Testing Gemini Advanced Against ChatGPT and DALL-E 3: Image Generation, Filipino Translation, and What It Means for Google's Chatbots
Gemini Advanced is capable. There's no denying it works better than the lower-tier Gemini, and it is at its best when integrated with Google's other products like Search and Maps. But for many "creative" requests that involve images, Gemini still has a long way to go. The chatbot can follow longer strings of instructions, but once photos enter the picture, you're probably better off choosing an AI model specifically designed to make pictures.
There’s still a lot of work needed to get consistent, accurate results from these chatbots, and people need to keep using them for the bots to learn how to best respond to questions. Here are some tests I ran to see how they held up.
Eerily, perhaps due to the specificity of the prompt, both chatbots returned very similar generated images. Gemini Ultra's dog photo, however, elicited what other Verge staff members described as "minor horror." Its dog has two tongues and an extra limb, and the fur's texture is so overemphasized that the whole thing just looks… wrong. I'm not sure a dog like that would still be happy in a field of daisies. ChatGPT, which calls on DALL-E 3 to generate its images, still produced something that obviously looks like a digital rendering of the dog, but at least it doesn't make you cringe.
Google said Gemini Ultra was made to handle "highly complex tasks," so I asked Gemini Advanced what those tasks were. One of the tasks it listed was translation. So I asked it to translate a few lines of the Philippine Patriotic Oath, a fairly obscure text, especially since the version I know has been changed several times in the past 20 years.
Gemini Advanced told me it couldn't assist with my request because it's only trained to respond in a subset of languages. I asked which languages it supports, but it said it couldn't give me a list of the languages it understands. I asked if it knew about Filipino, and it said yes. Gemini currently supports 40 languages, but there's no published list confirming whether Filipino is one of them.
Gemini Advanced can tap into other Google products, which worked in its favor here: it tapped Google Maps for both of my restaurant questions and returned a rundown of several Filipino and Ethiopian restaurants in New York City, attaching Google Maps coordinates for each.
I had asked ChatGPT the same questions a few days earlier (I was genuinely looking for new restaurants, not testing for this story), and its results were less accurate. The restaurant names were correct, these are establishments that do exist, but none of the locations were right. When I flagged the issue to ChatGPT, it came back with more accurate locations but a smaller list of restaurants. So in this case, Gemini clearly worked better for this request.
One of the main reasons someone like me would use a chatbot is to summarize complicated papers. I fed Gemini Advanced two paragraphs from a paper on image editing, one that gave me a throbbing headache the first time I read it, so I figured it was a fair test. To fully test its new abilities, I also wanted to see how the chatbot handles two different instructions strung together: in the same prompt, I asked it to summarize the paragraphs and then generate a 150-word article from them.
The summary was very well written. It really did give me a rundown of the concepts discussed in those two paragraphs, though it didn't "translate" them into plain language. I should have asked for that. Gemini then moved on to writing the article I asked for, and you know what? The summary was better than those 150 words.
Fitting the entire Lord of the Rings trilogy into a bigger context window, and why none of this may matter to users: a conversation with Sundar Pichai
Chatbots occupy a tricky space for users — they have to be a search engine, a creation tool, and an assistant all at once. That’s especially true for a chatbot coming from Google, which is increasingly counting on AI to supplement its search engine, its voice assistant, and just about every productivity tool in its arsenal.
There are a lot of improvements in Gemini 1.5: Gemini 1.5 Pro, the general-purpose model in Google’s system, is apparently on par with the high-end Gemini Ultra that the company only recently launched, and it bested Gemini 1.0 Pro on 87 percent of benchmark tests. It was created using a technique known as the mixture of experts, or MoE, which means it only runs part of the overall model when you send in a query. (Here’s a good explainer on the subject.) That approach should make the model faster for you and less wasteful for Google to run.
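For a rough sense of what that MoE routing looks like, here is a minimal, hypothetical sketch in Python. The expert count, router, and dimensions are invented for illustration and have nothing to do with Google's actual architecture; the point is only that a small router picks a few experts per query and the rest of the model never runs.

```python
# Toy mixture-of-experts routing sketch (illustrative only, not Google's model).
# A "router" scores each expert for the incoming query; only the top-scoring
# experts are run, and their outputs are blended together.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # hypothetical number of expert sub-networks
TOP_K = 2         # how many experts actually run per query
DIM = 16          # toy embedding size

# Each "expert" here is just a random linear layer, standing in for a
# full feed-forward block in a real transformer.
experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]
router = rng.normal(size=(DIM, NUM_EXPERTS))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a query embedding to TOP_K experts and mix their outputs."""
    scores = x @ router                   # one score per expert
    top = np.argsort(scores)[-TOP_K:]     # indices of the best-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()              # softmax over the chosen experts only
    # Only TOP_K of NUM_EXPERTS experts do any work for this query.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

query_embedding = rng.normal(size=DIM)
print(moe_forward(query_embedding).shape)  # (16,)
```

The practical upshot, as the explainer linked above covers in more depth, is that most of the model's parameters sit idle on any given query, which is why the approach is cheaper to serve.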
As he’s explaining this to me, Pichai notes offhandedly that you can fit the entire Lord of the Rings trilogy into that context window. This seems too specific, so I ask him: this has already happened, hasn’t it? Somewhere, someone is already feeding the whole saga in, digging through the complicated lore of Middle-earth, looking for continuity errors and trying to make sense of Tom Bombadil. “I’m sure it has happened,” Pichai says with a laugh, “or will happen — one of the two.”
Pichai says the larger context window is likely to be most useful for businesses, because it allows use cases where you can add context and information at the moment of the query. “Think of it as we have dramatically expanded the query window,” he says. He imagines filmmakers asking what reviewers might think of their movie, and companies using it to look over huge amounts of financial records. He sees it as one of the bigger breakthroughs the company has made.
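To make “adding context at the moment of the query” concrete, here is a small hypothetical sketch in Python. The helper function and the sample records are made up, and no real Gemini API is being called; it simply shows the pattern a bigger query window enables: bundling the source material and the question into one request instead of pre-digesting the documents.

```python
# Hypothetical sketch of "adding context at the moment of the query":
# rather than preparing a model on your documents ahead of time, you send
# the documents along with the question in a single, very large prompt.
# The helper below is invented for illustration; it is not a Google API.

def build_long_context_prompt(documents: list[str], question: str) -> str:
    """Bundle source material and a question into one prompt string."""
    context = "\n\n---\n\n".join(documents)
    return (
        "Answer the question using only the material below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
    )

# Example in the spirit of Pichai's use cases: a pile of financial records
# reviewed in one query instead of being summarized document by document.
records = ["Q1 revenue report ...", "Q2 revenue report ...", "Audit notes ..."]
prompt = build_long_context_prompt(records, "Which quarter had the weakest margins?")
print(len(prompt), "characters in the combined prompt")
```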
Eventually, Pichai tells me, all these 1.0s and 1.5s and Pros and Ultras and corporate battles won’t really matter to users. He says people will just be consuming experiences. “It’s like using a smartphone without always paying attention to the processor underneath.” But right now, he adds, we’re still in a phase where everyone knows about the chip in their phone, because the underlying technology is changing so fast. “People do care.”