Google is exploring Project Ellmann, an AI effort that would use phone data and photographs to tell individuals the story of their lives.
A Google team has proposed using artificial intelligence to construct a “bird’s-eye” view of consumers’ lives from mobile phone data such as photographs and search queries.
Project Ellmann
The objective of “Project Ellmann,” named after biographer and literary critic Richard David Ellmann, is to employ large language models (LLMs) like Gemini to absorb search results, detect patterns in a user’s images, construct a chatbot, and “answer previously impossible questions.” The project’s stated goal is to be “Your Life Story Teller.”
Whether Google intends to include these features in Google Photos or any other product is unclear. According to a corporate blog post, Google Photos has over 1 billion users and 4 trillion photographs and videos.
Project Ellmann is only one of many ways Google plans to use AI to build or improve its products. On Wednesday, Google introduced Gemini, its most capable AI model yet, which outperforms OpenAI’s GPT-4 in some scenarios. The company intends to license Gemini to a diverse set of clients via Google Cloud for use in their own apps. Gemini is multimodal, meaning it can process and interpret information beyond text, such as images, video, and audio.
According to documents obtained by CNBC, a product manager for Google Photos presented Project Ellmann alongside Gemini teams at a recent internal conference. The teams said they had spent the previous few months concluding that large language models are the best technology for making this bird’s-eye view of one’s life story a reality.
Ellmann could draw on biographies, previous moments, and subsequent photos to describe a user’s photos more deeply than “just pixels with labels and metadata,” according to the presentation. It proposes identifying a succession of life chapters, such as university years, Bay Area years, and parenting years.
According to the presentation, large language models could infer events such as the birth of a user’s child. “This LLM can use knowledge from higher in the tree to infer that this is Jack’s birth, and that he’s James and Gemma’s first and only child.”
Another example given by presenters was recognizing that one user had just attended a class reunion. “It’s exactly 10 years since he graduated and is full of faces not seen in 10 years, so it’s probably a reunion,” the team speculated in its presentation.
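The presentation does not describe how such inferences are implemented internally; the reunion example, though, states its reasoning explicitly (a photo taken exactly ten years after graduation, full of faces unseen for ten years). A minimal sketch of that heuristic, with all function and variable names invented for illustration, might look like:

```python
from datetime import date

def looks_like_reunion(photo_date, graduation_date, faces, last_seen,
                       gap_years=10):
    """Illustrative heuristic (not Google's actual method): flag a photo
    as a probable reunion if it was taken roughly `gap_years` after
    graduation and most faces in it have not been seen for about that long."""
    years_since_grad = (photo_date - graduation_date).days / 365.25
    if abs(years_since_grad - gap_years) > 0.5:
        return False
    # Count faces that have been absent from the library for ~gap_years.
    long_unseen = [f for f in faces
                   if (photo_date - last_seen[f]).days / 365.25 >= gap_years - 1]
    return len(long_unseen) >= len(faces) * 0.5

# Hypothetical data: a photo from June 2023, ten years after a 2013 graduation.
faces = ["ana", "ben", "cho", "dia"]
last_seen = {"ana": date(2013, 6, 1), "ben": date(2013, 6, 1),
             "cho": date(2022, 1, 1), "dia": date(2013, 6, 1)}
print(looks_like_reunion(date(2023, 6, 10), date(2013, 6, 10),
                         faces, last_seen))  # → True
```

In the system described by the presentation, an LLM presumably reaches this conclusion from photo metadata rather than hand-written rules, but the underlying signals (elapsed time, long-unseen faces) are the same ones the presenters cited.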
The company also showed “Ellmann Chat,” which carried the description: “Imagine opening ChatGPT but it already knows everything about your life. What would you ask it?”
It showed an example chat in which a person asked, “Do I have a pet?” The chatbot responded, “Yes, the user has a dog who wears a red raincoat,” before offering the dog’s name and the names of the two family members it is most frequently seen with.
Other slides showed Ellmann presenting an overview of the user’s eating habits: “You appear to appreciate Italian cuisine. There are numerous photographs of pasta meals and one of a pizza.” It also said the individual seemed to enjoy trying new foods, because one of their photographs included a menu featuring an unfamiliar cuisine.
According to the presentation, the system also identified, from the user’s screenshots, items the user was considering purchasing as well as their interests, work, and travel plans. It stated that the system would be able to recognize users’ favorite websites and apps, citing Google Docs, Reddit, and Instagram as examples.
If pursued, Project Ellmann could help Google compete with other tech giants in delivering more personalized life experiences.
Going ahead with memories
For years, Google Photos and Apple Photos have offered “memories” and built albums based on photo trends.
Google revealed in November that, with the aid of AI, Google Photos can now group similar photographs and organize screenshots into easily accessible albums.
In June, Apple said that its most recent software update would add the ability for its photos app to detect humans, dogs, and cats in photographs. It already recognizes faces and lets users search for them by name.
Photo recognition remains sensitive territory, however. Apple and Google continue to avoid labeling gorillas after Google’s software mislabeled Black individuals as gorillas in 2015. According to a New York Times investigation this year, both Apple and Google’s Android software, which powers the vast majority of the world’s smartphones, disabled the ability to visually search for primates out of concern over identifying a person as an animal.
Companies such as Google, Facebook, and Apple have added tools to suppress unwanted memories over time, but users have reported that they occasionally still surface, and that minimizing them requires flipping through many settings.