A team of researchers from Sakana AI, a Japanese AI startup founded last year by Google alumni that was reportedly valued at over $1 billion in June, this week published a paper titled "The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery" that is making big waves and could revolutionize how we conduct scientific research.
The paper’s authors, who (in addition to folks from Sakana) hail from the University of Oxford, the University of British Columbia, and the Vector Institute in Toronto, imagine a world where AI systems can independently generate novel research ideas, design and run experiments, analyze results, and even write up full scientific papers. And, while the system is far from perfect, the paper presents a comprehensive framework called The AI Scientist that aims to do exactly that: automate the entire scientific discovery process from start to finish.
Here's how it works: The AI Scientist uses large language models (specifically GPT-4o from OpenAI, Claude 3.5 Sonnet from Anthropic (#798), or the open-source Llama 3.1 405B (#806)) to do the heavy lifting. These models have already shown impressive capabilities in assisting human scientists with tasks like brainstorming ideas or writing code, but The AI Scientist takes things several steps further by chaining many steps of the scientific process together into a single automated pipeline.
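As a concrete (and entirely illustrative) sketch of what swappable LLM backbones can look like in practice, here's a minimal Python helper that routes a prompt to either the OpenAI or Anthropic SDK depending on the model name. To be clear, this is my own sketch, not code from the paper or the repo, and the model-name strings are simply the ones current as of this writing:

```python
# Illustrative only: a minimal wrapper that lets one pipeline swap between
# the backbone models named in the paper. Uses the public OpenAI and
# Anthropic Python SDKs; model-name strings may change over time.
from openai import OpenAI
import anthropic

def llm(prompt: str, model: str = "gpt-4o") -> str:
    """Send a single-turn prompt to the chosen model and return its reply."""
    if model.startswith("claude"):
        client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env
        resp = client.messages.create(
            model=model,  # e.g. "claude-3-5-sonnet-20240620"
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.content[0].text
    # Many hosting providers serve open models like Llama 3.1 405B through
    # OpenAI-compatible endpoints, so this branch covers both cases.
    client = OpenAI()  # reads OPENAI_API_KEY from env
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```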
The system starts by generating novel research ideas in a given field. It then designs experiments to test these ideas, writes the necessary code, and executes the experiments. After collecting and analyzing the results, The AI Scientist writes up a full scientific paper (typeset in LaTeX) describing its findings, complete with figures and proper formatting.
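To make that loop concrete, here's a heavily simplified sketch of the idea-to-paper pipeline. Again, this is my own pseudostructure rather than Sakana's actual implementation: the prompts, function names, and single-pass flow are all assumptions (the real system iterates and self-corrects at every stage), and llm is the model-calling helper sketched above.

```python
# Hypothetical, deliberately simplified version of the idea-to-paper loop.
# The real AI Scientist iterates and self-reflects at every stage; this
# one-pass sketch just shows how the stages chain together.
import subprocess
from typing import Callable

def run_ai_scientist(field: str, llm: Callable[[str], str]) -> str:
    # 1. Ideation: propose a novel, testable research idea in the field.
    idea = llm(f"Propose a novel, testable research idea in {field}.")

    # 2. Experimentation: write runnable code to test the idea...
    code = llm(f"Write a self-contained Python experiment to test: {idea}")
    with open("experiment.py", "w") as f:
        f.write(code)

    # ...and execute it. (Sandbox this step in practice; see the repo's
    # warning discussed below.)
    result = subprocess.run(
        ["python", "experiment.py"],
        capture_output=True, text=True, timeout=3600,
    )

    # 3. Analysis: interpret the raw experimental output.
    analysis = llm(f"Summarize the findings in this output:\n{result.stdout}")

    # 4. Write-up: produce a full LaTeX manuscript describing the work.
    return llm(f"Write a complete LaTeX paper.\nIdea: {idea}\nFindings: {analysis}")
```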
Separately, the researchers also developed an AI-powered review system to evaluate the quality of the generated papers. This automated reviewer provides feedback and assigns scores that are intended to be comparable to those of human reviewers at top machine learning conferences like NeurIPS.
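The reviewer is essentially another LLM call with a conference-style review rubric in the prompt. The sketch below is again my own approximation: the paper's actual reviewer prompt is far more detailed and includes self-reflection rounds, and the rubric fields here are assumptions loosely modeled on standard ML review forms.

```python
# Hypothetical sketch of an LLM-as-reviewer. The rubric is loosely modeled
# on conference review forms; the paper's real prompt is more elaborate.
import json
from typing import Callable

REVIEW_PROMPT = """You are a reviewer at a top machine-learning conference.
Review the paper below and reply ONLY with JSON containing:
"summary", "strengths", "weaknesses", "soundness" (1-4),
"presentation" (1-4), "contribution" (1-4), "overall" (1-10),
and "decision" ("accept" or "reject").

PAPER:
{paper_text}
"""

def review_paper(paper_text: str, llm: Callable[[str], str]) -> dict:
    """Score a generated paper, e.g. {"overall": 6, "decision": "accept", ...}."""
    return json.loads(llm(REVIEW_PROMPT.format(paper_text=paper_text)))
```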
In the paper, the team demonstrated The AI Scientist’s capabilities across three distinct areas of machine learning research: diffusion modeling, transformer-based language modeling, and learning dynamics. Based on judgments by their own AI reviewer (so perhaps take this with a grain of salt), they found that The AI Scientist could produce papers that exceed the acceptance threshold for top ML conferences.
One of the most impressive aspects is the system's cost-effectiveness. The researchers report that The AI Scientist can generate full research papers for as little as $15 each. This could dramatically expand access to cutting-edge research capabilities (although comparable human-led ML research typically costs orders of magnitude more than that).
Of course, like any cutting-edge research, there are tons of limitations and ethical considerations to work through. The current version of The AI Scientist is prone to some errors and hallucinations, and there are valid concerns about how this technology could impact the scientific publishing ecosystem. The researchers, for example, emphasize that papers generated by AI systems should be clearly labeled as such.
Looking to the future, the team envisions expanding The AI Scientist to other scientific domains beyond machine learning. They suggest that by integrating with robotic lab automation, similar systems could one day conduct experiments in fields like biology, chemistry, and materials science.
I encourage you to check out the full paper for more details on this fascinating development, including full examples of the generated ML papers that you can judge for yourself… some of which are, at least superficially, as compelling as papers designed and written up by human experts. I’ve also got a link to the associated GitHub repo, but be careful: it includes the ominous warning: “Caution! This codebase will execute LLM-written code. There are various risks and challenges associated with this autonomy. This includes e.g. the use of potentially dangerous packages, web access, and potential spawning of processes. Use at your own discretion. Please make sure to containerize and restrict web access appropriately.”

Heed that warning because, as covered by the mainstream press, The AI Scientist displayed some fairly concerning power-seeking behaviors with implications for AI safety. In one instance, for example, it edited its own code to remove the time constraints on how long a given agentic process could run, potentially allowing The AI Scientist to consume far more resources than its human creators intended. This kind of power-seeking behavior could be especially dangerous if a system with The AI Scientist’s level of autonomy had access to a robotic wet lab where real-world experiments are run, because it could end up manufacturing, say, novel and dangerous viral pathogens.
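If you do play with the repo, one straightforward way to follow that advice is to run any LLM-written code inside a locked-down container, with networking disabled and resource and time limits enforced from outside the process itself. Here's a minimal illustrative wrapper (my own sketch, not from the repo) that assumes Docker is installed:

```python
# Illustrative sandbox for LLM-written code: run the script inside a Docker
# container with no network and hard CPU/memory limits, and enforce a
# wall-clock cap both inside the container (coreutils `timeout`) and on the
# host as a backstop.
import subprocess
from pathlib import Path

def run_sandboxed(script: Path, timeout_s: int = 600) -> str:
    cmd = [
        "docker", "run", "--rm",
        "--network", "none",              # no web access
        "--memory", "2g", "--cpus", "1",  # cap RAM and CPU
        "-v", f"{script.parent.resolve()}:/work:ro",  # code mounted read-only
        "python:3.11-slim",
        "timeout", str(timeout_s),        # kill the experiment at the limit
        "python", f"/work/{script.name}",
    ]
    # Crucially, these limits live outside the experiment process: the agent
    # can rewrite its own script, but not the sandbox that contains it.
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout_s + 60
    )
    return result.stdout

# e.g. run_sandboxed(Path("experiment.py"), timeout_s=900)
```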
Despite these concerns, I (as usual) remain optimistic that we can work out safeguards against the most dangerous risks, and that this research represents a significant step towards realizing the potential of AI to be creative and productive in scientific discovery. While it's unlikely to fully replace human scientists anytime soon, The AI Scientist and systems like it could become powerful tools that accelerate innovation and help tackle some of the world's most pressing challenges, from clean-energy production to food security to healthcare.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.