The Gemini Multi-Task Language Understanding Benchmark, and What We’re Expecting to Learn from the Bard Community in the Coming Era
Right now, Bard is still just a chatbot: you type, it types back. But there’s a new version of Bard coming soon that could be much more. A preview of the most powerful and capable version of the new large language model is going to be released next year. The model can accept both images and audio and video in addition to just text, which is what the multi- version of the model is called.
Bard is now running Gemini Pro, the middle tier of the Gemini series. It’s called Ultra because it’s the most capable and fastest, but it’s not the most small or fast because it isn’t meant for on-device tasks. It’s meant to be the Goldilocks version of the model, really: fast and efficient while still as capable as possible.
Right now, Gemini’s most basic models are text in and text out, but more powerful models like Gemini Ultra can work with images, video, and audio. And “it’s going to get even more general than that,” Hassabis says. “There’s still things like action, and touch — more like robotics-type things.” He says that Gemini will become more grounded and aware over time, as it gets more senses. These models don’t have the same biases and other problems that they did when they were hallucinating. The more they know, the better, says Hassabis.
So, let’s just get to the important question, shall we? OpenAI’s GPT-4 is more than ready to take on the competition. This has very clearly been on Google’s mind for a while. “We’ve done a very thorough analysis of the systems side by side, and the benchmarking,” Hassabis says. The Multi-Task Language Understanding benchmark is a well-established benchmark that compares the ability of two models to generate Python code. Hassabis thinks he is substantially ahead on 30 out of 32 benchmarks. Some of them are not very wide. Some of them are bigger.
The beginning of a larger project and a step change in itself can be seen by the conversations that Pichai and Hassabis had. Gemini is the model Google has been waiting for, the one it has been building toward for years, maybe even the one it should have had ready before OpenAI and ChatGPT took over the world.