The Lord of the Rings in a Context Window: Google’s New Tools for Video Parsing and Audio Processing
Google is launching new tools that give developers new ways to tap the model’s ability to parse video and audio. The company also said it is adding new Gemini-powered features to its web-based coding tool, Project IDX, including ways for AI to debug and test code.
The entire Lord of the Rings trilogy could fit into that context window, Pichai tells me. This seems too specific, so I ask him: this has already happened, hasn’t it? Someone at Google is just checking to see if Gemini spots any continuity errors, trying to get it to understand the complicated lineage of Middle-earth, and seeing if maybe AI can finally make sense of Tom Bombadil. “I’m sure it has happened, or will happen, whichever one you prefer,” Pichai says with a laugh.
Pichai also thinks the larger context window will be hugely useful for businesses. He says it enables use cases where a lot of personal context can be supplied at the moment of the query. “Think of it as we have dramatically expanded the query window.” He imagines filmmakers asking the model what reviewers might say about their movie, and companies using it to look over financial records. He considers it one of the bigger breakthroughs the company has made.
Eventually, Pichai tells me, all these 1.0s and 1.5s and Pros and Ultras and corporate battles won’t really matter to users. People, he says, will simply be consuming the experiences. “It’s like using a smartphone without always paying attention to the processor underneath.” But at this moment, he says, everyone knows which chip is in their phone, because it is consequential. The underlying technology is shifting, he says. “People do care.”
Gemini Pro 1.5: Long-Context Reasoning and a Mixture-of-Experts Model
The model performs this sort of reasoning across every page and every word, and it really feels quite magical, says Oriol Vinyals.
In a demo, Google DeepMind showed Gemini Pro 1.5 analyzing a 402-page PDF of the Apollo 11 communications transcript. The model was asked to find humorous passages and highlighted moments such as the astronauts saying that a communications delay was due to a sandwich break. Another demo showed the model answering questions about specific actions in a Buster Keaton movie. The previous version of Gemini could answer such questions only for much shorter amounts of text or video. Google hopes that the new capabilities will allow developers to build new kinds of apps on top of the model.
Gemini Pro 1.5’s scores on several popular benchmarks show that it is also strong for its size. The new model exploits a technique previously invented by Google researchers to squeeze out more performance without requiring more computing power. The technique, called mixture of experts, selectively activates only the parts of a model that are best suited to a given task.
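To make the idea concrete, here is a minimal sketch of how a mixture-of-experts layer can route an input to only a few of its experts. It is a toy illustration, not a description of Gemini's architecture, which the article does not detail. The class name `ToyMoELayer`, the top-k softmax routing rule, the random weights, and the layer sizes are all assumptions chosen for illustration.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


class ToyMoELayer:
    """Hypothetical toy layer: a router picks top_k of num_experts per input."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # One routing vector per expert (assumed, randomly initialized).
        self.router = [
            [rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)
        ]
        # Each "expert" is a dim x dim linear map standing in for a larger block.
        self.experts = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def forward(self, x):
        # Score every expert, then keep only the top_k highest-scoring ones.
        scores = [dot(row, x) for row in self.router]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        chosen = ranked[: self.top_k]
        weights = softmax([scores[i] for i in chosen])
        # Only the chosen experts are evaluated; the others are skipped entirely.
        out = [0.0] * len(x)
        for w, i in zip(weights, chosen):
            expert_out = [dot(row, x) for row in self.experts[i]]
            out = [o + w * e for o, e in zip(out, expert_out)]
        return out, chosen


layer = ToyMoELayer(dim=4, num_experts=8, top_k=2)
output, active = layer.forward([1.0, 0.5, -0.3, 2.0])
print(f"active experts: {active} ({len(active)} of {len(layer.experts)} ran)")
```

In this sketch only two of the eight experts are evaluated for a given input, which is where the compute savings come from: the model can hold many parameters overall while each input pays for only a fraction of them.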
Google says that Gemini Pro 1.5 is as capable as its most powerful offering, Gemini Ultra, in many tasks, despite being a significantly smaller model. Hassabis says there is no reason why the same technique can’t be applied to boost the performance of Gemini Ultra.
The frenetic pace of progress in generative AI is at odds with worries about the risks the technology might pose. The company says that offering limited access to the software is a way to gather feedback on potential risks. It also says it has given researchers at the UK’s AI Safety Institute access to its most powerful models so that they can test them.