Replication in artificial intelligence: lessons from TabPFN, by Noah Hollmann, Samuel Müller, Frank Hutter and colleagues, and from the success of DeepSeek-R1
One thing has not changed: artificial-intelligence models remain black boxes, whether they are trained on synthetic or real-world data. Amid the exciting developments to come, we should not overlook studies that try to understand the how and why of AI. They are no less important than the publications that announce the breakthroughs.
Hollmann and colleagues’ work is an example of necessity spurring innovation: the researchers realized that there were not enough accessible real-world data sets to train their model, and so they found an alternative approach.
Enhancing trust in AI, along with minimizing harms, must remain a global priority, even though US President Donald Trump seems to have downgraded it. The president has rescinded an executive order by his predecessor that called on the National Institute of Standards and Technology (NIST) and AI companies to collaborate on improving both trust in and the safety of AI, including for the use of synthetic data. The word safety is absent from Trump’s new executive order, called ‘Removing barriers to US leadership in artificial intelligence’. Last November, NIST published a report on methods for authenticating AI content and tracking its provenance (see go.nature.com/42c21tn). Researchers should not let these efforts go to waste.
Synthetic data do not come free of risks, such as the danger of producing inaccurate results, or hallucinations. This is partly why it is important that such studies are replicated. Replication, a bedrock of science, reassures users that they can trust the results of their queries.
This advance is the work of computer scientists Noah Hollmann, Samuel Müller and Frank Hutter at the University of Freiburg, Germany, and their colleagues. Their model, TabPFN, is designed to analyse the kind of tabular data found in spreadsheets: rows and columns of values from which mathematical models can make predictions. TabPFN can make predictions on any small data set, from those used in accounting and finance to those from genomics and neuroscience. Remarkably, its predictions are accurate even though it was trained entirely without real-world data; instead, it was trained on 100 million randomly generated data sets.
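To make the idea concrete, here is a minimal sketch, in plain NumPy, of what one randomly generated tabular data set might look like. This is not the authors’ actual generation procedure (TabPFN samples far richer random causal structures); the linear rule, sizes and noise level below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_synthetic_table(n_rows=200, n_cols=5):
    """Draw one random labelled table: features plus a label produced by
    a randomly chosen hidden rule (a stand-in for the random structures
    a synthetic-data pipeline would sample)."""
    X = rng.normal(size=(n_rows, n_cols))        # the spreadsheet's cells
    w = rng.normal(size=n_cols)                  # hidden rule for this table
    noise = 0.1 * rng.normal(size=n_rows)
    y = (X @ w + noise > 0).astype(int)          # binary label column
    return X, y

# A model trained in this way would see millions of such tables;
# here we draw just one and confirm it looks like a small labelled table.
X, y = sample_synthetic_table()
print(X.shape, y.shape)
```

Each call produces a fresh small “spreadsheet” with its own hidden rule, which is the essence of training on synthetic rather than collected data.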
DeepSeek, a company based in China, showed that big sums might not be needed. It released DeepSeek-R1, a large language model (LLM) capable of step-by-step tasks analogous to human reasoning, reportedly at a fraction of the cost and computing power of existing LLMs. Its performance on chemistry and mathematics tasks was similar to that of the o1 LLM released by OpenAI last September. News that a capable AI model could be built so cheaply caused the price of some technology stocks to plummet.
January is not yet over, and 2025 is already proving to be a defining year for artificial intelligence (AI). On 21 January, just one day into his presidency, US President Donald Trump announced the Stargate Project, a joint venture between leading technology companies and financiers in the United States, Japan and the United Arab Emirates, which pledged US$500 billion to build AI infrastructure in the United States.
In preliminary tests of R1’s abilities on data-driven scientific tasks, taken from real papers in topics including bioinformatics, computational chemistry and cognitive neuroscience, the model matched o1’s performance, says Sun. Her team challenged both models to complete tasks from ScienceAgentBench, a collection of problems it created, which include analysing and visualizing data. Both models solved only around one-third of the challenges correctly. Running R1 through its application programming interface cost 13 times less than running o1, although R1 had a slower “thinking” time, notes Sun.
The strong performance and low cost of DeepSeek-R1 will make it more attractive for scientists to use LLMs in their research, says Sun, a researcher at Ohio State University. “Almost every colleague and collaborator working in AI is talking about it.”
Because R1 is cheap and open, it could be a game-changer for researchers who want to study the model at a fraction of the cost paid by users of its competitors. They can download it to their own servers and build on it free of charge, which is not possible with competing closed models.
In mathematics, too, R1 is showing promise. Frieder Simon, a mathematician and computer scientist at the University of Oxford, UK, challenged both models to create a proof in the abstract field of functional analysis and found R1’s argument more promising than o1’s. But because such models make mistakes, researchers need skills such as being able to tell a good proof from a bad one in order to benefit from them.