DeepMind's new artificial-intelligence model has a robot slam-dunking a basketball

Learning to do things differently with the Gemini Robotics model, a new system from Google DeepMind's London office

Alexander Khazatsky, an artificial-intelligence researcher and co-founder of Collected Artificial Intelligence, a Berkeley, California, company focused on creating data, believes that the model, named Gemini Robotics, is a small but important step towards getting AI out of the chat window and into the physical world.

The company showed several robot arms equipped with the new model that respond to spoken commands and can do things such as fold paper, hand over vegetables and put a pair of glasses into a case. The robots rely on the model to connect the items they can see with possible actions in order to do what they are told, and the model is trained so that it generalizes across different robot hardware.

While progress has been made in each of these areas individually, DeepMind says, a single model now brings a drastic increase in performance across all three at once. That makes it possible to build robots that are more robust to changes in their environment, as well as more responsive.
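To picture how such a system is typically wired together, here is a minimal sketch, in Python, of a closed perception-to-action loop: a camera frame and an instruction go into a vision-language-action policy, a short chunk of low-level actions comes back, and the robot re-observes before acting again. Every name below (capture_frame, vla_policy, execute, Action) is a hypothetical placeholder, not DeepMind's actual interface.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Action:
    """End-effector delta plus a gripper command; deliberately hardware-agnostic."""
    delta_xyz: tuple[float, float, float]
    delta_rpy: tuple[float, float, float]
    gripper_open: bool


def capture_frame() -> bytes:
    """Placeholder for the robot's camera driver."""
    return b""


def vla_policy(frame: bytes, instruction: str) -> Sequence[Action]:
    """Placeholder for a vision-language-action model: maps what the robot
    sees and what it was told to a short chunk of actions."""
    return [Action((0.0, 0.0, -0.01), (0.0, 0.0, 0.0), gripper_open=False)]


def execute(action: Action) -> None:
    """Placeholder for the arm's low-level controller."""


def run(instruction: str, steps: int = 20) -> None:
    # Closed loop: re-observe after every action chunk so the robot can
    # react if the scene changes while it is working.
    for _ in range(steps):
        frame = capture_frame()
        for action in vla_policy(frame, instruction):
            execute(action)


run("put the glasses in the case")
```

Re-querying the policy after each short chunk, rather than planning a whole task up front, is what would let an arm of this kind react when objects move mid-task.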

The breakthroughs that gave rise to powerful chatbots, including OpenAI’s ChatGPT and Google’s Gemini, have in recent years raised hope of a similar revolution in robotics, but big hurdles remain.

In sci-fi tales, artificial intelligence often powers all sorts of clever, capable, and occasionally homicidal robots. A revealing limitation of today’s best AI is that, for now, it remains squarely trapped inside the chat window.

A team at the London-based firm started out with its most advanced vision-and-language model, Gemini 2.0, which had been trained by analyzing huge volumes of data.

They created a specialized version of the model designed to excel at reasoning tasks involving 3D physical and spatial understanding — for example, predicting an object’s trajectory or identifying the same part of an object in images taken from different angles.
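One way to picture such a query is to ask a model for the image coordinates of a named object part in two different views and compare the answers. The snippet below is purely illustrative; point_to is an invented stand-in, not a real Gemini Robotics call.

```python
from typing import Tuple


def point_to(image: bytes, query: str) -> Tuple[float, float]:
    """Stand-in for an embodied-reasoning model that returns a normalized
    (x, y) image coordinate for the named object part."""
    return (0.42, 0.61)


# Two photos of the same object taken from different angles (placeholders).
view_a = b""
view_b = b""

point_a = point_to(view_a, "the handle of the mug")
point_b = point_to(view_b, "the handle of the mug")
print(f"handle in view A at {point_a}, in view B at {point_b}")
```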

Finally, they further trained the model on data from thousands of hours of real, remotely operated robot demonstrations. This allowed the robotic brain to output real actions in much the same way that a chatbot uses learned associations to generate the next word in a sentence.
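That analogy can be made concrete with a toy behavior-cloning loop: discretize each recorded action into a token and train a network to predict the token a human operator actually chose, using the same cross-entropy objective a language model uses for next-word prediction. The PyTorch sketch below is a deliberately tiny stand-in; the network, dimensions and data are all invented for illustration, and in the pipeline described above the observation features would come from the pretrained Gemini backbone rather than from random tensors.

```python
import torch
import torch.nn as nn

NUM_ACTION_TOKENS = 256   # each recorded robot action discretized into one of 256 bins
OBS_DIM = 512             # stand-in for fused image + instruction features


class ActionHead(nn.Module):
    """Tiny stand-in for an action-prediction head on top of a pretrained backbone."""

    def __init__(self) -> None:
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ReLU())
        self.logits = nn.Linear(256, NUM_ACTION_TOKENS)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.logits(self.trunk(obs))


model = ActionHead()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# A fake "demonstration" batch: observation features and the action token
# a human operator actually chose at that moment.
obs = torch.randn(32, OBS_DIM)
expert_action = torch.randint(0, NUM_ACTION_TOKENS, (32,))

for step in range(100):
    optimizer.zero_grad()
    # Same objective a language model uses for next-word prediction,
    # applied to "next action" tokens instead.
    loss = loss_fn(model(obs), expert_action)
    loss.backward()
    optimizer.step()
```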