
An experiment that has shaken the AI world.

Vietnam.vn EN
28/03/2026 13:01:00

An AI researcher's experiment is gaining attention for showing that an AI agent can experiment with and optimize models autonomously, opening up new avenues of research but also raising concerns within the community.

Andrej Karpathy is a freelance researcher and founder of Eureka Labs. Photo: The Information.

Andrej Karpathy, a renowned AI researcher who worked at OpenAI from its early days and later at Tesla, is gaining attention on X, where he has 1.9 million followers and a reputation for credible, often prescient statements about AI.

The viral post describes an experiment in which an AI coding agent ran a series of tests aimed at improving the training process of a small language model. Karpathy left the agent running continuously for two days, enough time to conduct 700 different tests.

Through these experiments, the system discovered 20 effective ways to reduce training time. This result, known as "autoresearch," increased training efficiency by 11% when applied to larger language models.
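The article does not include Karpathy's actual code, but the loop it describes (an agent proposes a change to the training setup, runs a test, and keeps only improvements) can be sketched roughly. Everything here is a hypothetical stand-in: `propose_change` and `run_training` are toy functions, not the real agent or training run.

```python
import random

def propose_change(history):
    """Stand-in for the coding agent: pick a tweak to the training setup.
    In the real experiment, an LLM edits the training code itself."""
    return {"lr_scale": random.choice([0.5, 1.0, 2.0]),
            "batch_scale": random.choice([0.5, 1.0, 2.0])}

def run_training(change):
    """Stand-in for one training run; returns time-to-target.
    This is a toy cost function, not a real model."""
    return 100.0 / (change["lr_scale"] * change["batch_scale"]) + random.random()

def autoresearch(n_tests):
    """Run n_tests experiments, keeping the best (fastest) configuration."""
    history, best = [], None
    for _ in range(n_tests):
        change = propose_change(history)
        score = run_training(change)      # lower is better (training time)
        history.append((change, score))
        if best is None or score < best[1]:
            best = (change, score)        # keep only improvements
    return best, history

best, history = autoresearch(n_tests=50)
print("best change:", best[0], "time:", round(best[1], 2))
```

The real system differs in that the agent rewrites the training code itself and reasons over the history of past experiments, rather than sampling from a fixed menu of tweaks.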

Tobias Lütke, CEO of Shopify, shared on X that he tried using "autoresearch" to optimize an AI model based on the company's internal data. Lütke said that after letting the system run overnight, it performed 37 tests and resulted in a 19% performance increase.

Many people are paying attention to "autoresearch" because it closely resembles the idea of self-improving AI systems, a concept previously only found in science fiction. While some researchers are eager to realize it, others are concerned about the prospect of AI being able to upgrade itself.

With this capability, an AI could continuously optimize its own code and training process in a loop. This could lead to what AI safety researchers call an intelligence explosion, in which machines surpass human cognitive abilities and slip beyond human control.

However, Andrej Karpathy's experiment hasn't reached that level yet. The AI agent in the "autoresearch" system only adjusts the training code and initial setup of a different, much smaller and less complex AI model.

The AI system is not yet capable of perfecting itself. Nevertheless, Karpathy emphasized that this experiment has significant implications for how AI labs will conduct research in the future and could significantly accelerate their development.

“Top LLM labs will eventually do this too,” Andrej Karpathy wrote on X. He acknowledged that scaling up would require more tooling, since his system only had to handle fine-tuning a single model, with the entire training process contained in 630 lines of Python code.

Karpathy added that implementing “autoresearch” only requires meeting technical requirements. “You task a swarm of agents with coordinating and fine-tuning small models, then pass the ideas to larger-scale testing, and humans only need to participate at the periphery,” he wrote.

Janakiram MSV, principal analyst at Janakiram & Associates, points out that the core component of “autoresearch” can be applied to many different agent systems. He views Karpathy’s post as a model of best practice for those working with AI agents: a guide file that clearly describes the tasks, constraints, things agents shouldn’t do, and stop conditions.
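Janakiram's outline suggests a guide file with four sections. The sketch below expresses one as a Python dict with a minimal validity check; all field names and contents are illustrative, not Karpathy's actual guide file.

```python
# Illustrative guide file for an autonomous research agent.
# The section names follow Janakiram's outline; the contents are invented.
AGENT_GUIDE = {
    "task": "Reduce wall-clock training time of the small LM "
            "without degrading validation loss.",
    "constraints": [
        "Only edit the training script and config, not the evaluation code.",
        "Each experiment must finish within a fixed time budget.",
    ],
    "forbidden": [
        "Do not change the validation dataset or metrics.",
        "Do not call external services.",
    ],
    "stop_conditions": [
        "48 hours of wall-clock time elapsed.",
        "No improvement in the last 100 experiments.",
    ],
}

def validate_guide(guide):
    """Sanity check: every section Janakiram lists must be present."""
    required = {"task", "constraints", "forbidden", "stop_conditions"}
    missing = required - guide.keys()
    if missing:
        raise ValueError(f"guide file missing sections: {sorted(missing)}")
    return True

print(validate_guide(AGENT_GUIDE))
```

The stop conditions matter most in practice: without them, an agent left running unattended for two days has no well-defined point at which to halt.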

However, some critics argue that Karpathy has essentially rediscovered part of the AutoML process, which companies like Google and Microsoft have used for years. AutoML also operates in an optimization loop, running a series of experiments to find the best data, the most suitable model architecture, and the optimal fine-tuning.
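The key structural feature of classic AutoML is that a human defines the search space up front and the system only picks points within it. A minimal grid search over a toy objective illustrates this; the hyperparameters and cost function here are invented for the example.

```python
from itertools import product

# Fixed, human-defined search space: the system can only pick
# combinations from these lists, never invent new options.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "hidden_size": [128, 256, 512],
}

def evaluate(config):
    """Toy objective standing in for a real training run (lower is better)."""
    return (abs(config["learning_rate"] - 1e-3) * 1e3
            + abs(config["hidden_size"] - 256) / 256)

def grid_search(space):
    """Exhaustively try every combination in the space, keep the best."""
    keys = list(space)
    best_cfg, best_score = None, float("inf")
    for values in product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = evaluate(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

cfg, score = grid_search(SEARCH_SPACE)
print("best config:", cfg)
```

This is the contrast Karpathy draws: a grid search can only enumerate a predefined space, whereas an LLM-driven agent can rewrite the training code itself and propose ideas no one enumerated in advance.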

Andrej Karpathy disputes the comparison with AutoML, arguing that the older methods are far inferior to "autoresearch." According to him, earlier systems cannot match an LLM, which can write its own code, learn from previous experiments, and even search the internet for new ideas.
