The system achieved state-of-the-art results in fixed-budget language model training, small-model training speed, and GPU kernel optimization, according to a company announcement.
The system automates the research loop for a target objective: it proposes an idea, implements it, runs an experiment, validates the result, and uses what it learns to choose the next experiment. It runs many research threads over long horizons, keeps useful context from prior experiments, and combines promising branches. Before treating improved performance as real progress, it checks results for reward hacking and run-to-run variance. The company has open-sourced training scripts and kernel implementations discovered by the system, publishing them on GitHub.
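As a rough illustration only, and not Recursive's actual implementation, the propose, run, validate, and update cycle described above might be sketched in Python as follows. Every name here (`run_experiment`, `propose`, `is_real_improvement`, `research_loop`), the toy loss function, and the two-standard-error acceptance threshold are assumptions made for the example. The sketch covers only the variance check; detecting reward hacks is not modeled.

```python
import random
import statistics


def run_experiment(config, seed):
    """Hypothetical stand-in for a training run: returns a noisy validation loss."""
    # Seed on both the config and the seed so each run has its own noise.
    rng = random.Random(f"{config['lr']:.6f}-{seed}")
    true_loss = (config["lr"] - 0.3) ** 2 + 1.0  # toy objective, minimized at lr=0.3
    return true_loss + rng.gauss(0, 0.01)


def propose(best_config, rng):
    """Hypothetical proposal step: perturb the best known configuration."""
    return {"lr": best_config["lr"] + rng.uniform(-0.1, 0.1)}


def evaluate(config, seeds):
    """Repeat the experiment across seeds and return (mean loss, sample std)."""
    losses = [run_experiment(config, s) for s in seeds]
    return statistics.mean(losses), statistics.stdev(losses)


def is_real_improvement(candidate, baseline, n, z=2.0):
    """Accept only if the gain exceeds z standard errors of the difference in means."""
    cand_mean, cand_std = candidate
    base_mean, base_std = baseline
    stderr = ((cand_std ** 2 + base_std ** 2) / n) ** 0.5
    return (base_mean - cand_mean) > z * stderr


def research_loop(steps=50, seeds=(0, 1, 2, 3, 4), rng_seed=0):
    rng = random.Random(rng_seed)
    best_config = {"lr": 0.9}  # deliberately poor starting point
    best_stats = evaluate(best_config, seeds)
    history = []  # (config, mean loss, accepted) for every experiment
    for _ in range(steps):
        candidate = propose(best_config, rng)
        stats = evaluate(candidate, seeds)
        accepted = is_real_improvement(stats, best_stats, len(seeds))
        history.append((candidate, stats[0], accepted))
        if accepted:
            # Only validated gains change what the next proposal builds on.
            best_config, best_stats = candidate, stats
    return best_config, best_stats, history


if __name__ == "__main__":
    config, (mean_loss, std_loss), history = research_loop()
    n_accepted = sum(1 for _, _, ok in history if ok)
    print(f"best lr={config['lr']:.3f}, loss={mean_loss:.4f}, accepted {n_accepted}/{len(history)}")
```

The design point the sketch tries to capture is that a candidate replaces the current best only when its gain is larger than the noise across repeated runs, so a single lucky seed is not mistaken for progress.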
The benchmarks were chosen for tight feedback loops and clear metrics. NanoChat Autoresearch, based on Andrej Karpathy's repository, tasks systems with training a small language model to the lowest validation loss within a fixed five-minute budget on a single GPU. NanoGPT Speedrun is a harder test: the benchmark asks how quickly a small GPT-style model can be trained to a fixed validation loss of 3.28 on the FineWeb text dataset using a single HGX H100 8-GPU node, and has been optimized by the community for over two years with 83 human record-setting contributions to the leaderboard. Recursive's best run reached the target in 77.3 seconds on 8× H100. SOL-ExecBench tests GPU kernel optimization toward hardware performance limits.
The demonstration arrives weeks after Recursive emerged from stealth in May with $650 million in funding at a $4.65 billion valuation, led by GV and Greycroft, with additional backing from AMD Ventures and NVIDIA. The company was founded in 2025 by researchers from OpenAI, Google DeepMind, Meta AI, Salesforce AI, and Uber AI, including Richard Socher, Tim Rocktäschel, Jeff Clune, Josh Tobin, and Tim Shi. The startup's central thesis is that the next leap in AI will come not from simply building larger models, but from automating the research process itself.
Recursive describes these results as early signs that its system can advance the frontier on AI training and infrastructure tasks when the goal is well-defined, measurable, and efficient to evaluate repeatedly. The key uncertainty is whether such results can generalize to domains where goals are less well-defined, harder to measure, and less efficient to evaluate, which are the characteristics of fundamental AI research breakthroughs. The broader question is whether systems that excel at optimizing narrow, highly instrumented tasks can scale to the open-ended scientific discovery that recursive self-improvement ultimately requires.