How an AI system learned to write expert-level scientific code

A new study shows how ERA combines large language models with tree search to rapidly build expert-level research software, outperforming leading benchmarks in tasks from single-cell genomics to COVID-19 hospitalization forecasting.

Study: An AI system to help scientists write expert-level empirical software. Image Credit: Molnia / Shutterstock

A recent study published in the journal Nature introduces Empirical Research Assistance (ERA), an artificial intelligence (AI) system that combines a large language model (LLM) with a tree search (TS) algorithm to automatically design and improve scientific software, potentially overcoming the slow, expertise-dependent process of manual software development. The system can generate expert-level solutions across various fields. In some cases, it outperformed human-developed and benchmark models on specific scorable scientific tasks, including the official CovidHub Ensemble used for coronavirus disease 2019 (COVID-19)-related hospitalization forecasting.

AI Scientific Software Background

Empirical software is crucial across many areas of scientific research because it allows scientists to model complex systems and diseases, ranging from fluid and atmospheric dynamics to social and biological processes. Developing such software, however, is slow, labor-intensive, and dependent on specialist expertise. Automating it could accelerate innovation and improve research efficiency.

ERA Tree Search Study Design

In the present study, researchers developed ERA to automatically generate and refine scientific software by optimizing quality scores. They treated the creation of scientific software as a “scorable task”: candidate programs were evaluated by how well their outputs scored on predefined performance metrics.

The system generates multiple software candidates, then rewrites and improves them in a feedback loop guided by performance signals from the scoring function. As an advancement over template-based generative programming (GP), ERA uses an LLM as a flexible engine to generate code by integrating domain knowledge from multiple possible solutions. Unlike systems that generate code from scratch, ERA can modify existing software candidates. ERA is also more versatile than AutoML, as it can rewrite almost any software. This includes everything from preparing and organizing data to running complex simulations and solving advanced mathematical problems.
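The feedback loop described above can be illustrated with a minimal Python sketch. It is not ERA's implementation: the `rewrite` callable stands in for the LLM, `score` stands in for the task's scoring function, and the names `refine`, `toy_score`, and `toy_rewrite` are hypothetical, chosen only for illustration. The toy "candidate" is a single number rather than a program.

```python
import random
from typing import Callable, Tuple


def refine(
    seed: float,
    rewrite: Callable[[float, random.Random], float],
    score: Callable[[float], float],
    rounds: int = 200,
    rng_seed: int = 0,
) -> Tuple[float, float]:
    """Greedy score-guided loop: rewrite the best candidate, keep improvements."""
    rng = random.Random(rng_seed)
    best, best_score = seed, score(seed)
    for _ in range(rounds):
        candidate = rewrite(best, rng)      # stand-in for an LLM rewriting code
        candidate_score = score(candidate)  # feedback from the scoring function
        if candidate_score > best_score:    # keep only rewrites that score higher
            best, best_score = candidate, candidate_score
    return best, best_score


def toy_score(x: float) -> float:
    """Hypothetical quality metric: higher is better, maximum at x = 3."""
    return -(x - 3.0) ** 2


def toy_rewrite(x: float, rng: random.Random) -> float:
    """Hypothetical 'rewrite': a small random change to the candidate."""
    return x + rng.uniform(-0.5, 0.5)


best, best_score = refine(0.0, toy_rewrite, toy_score)
print(best, best_score)
```

Because a rewrite is kept only when it scores higher, the best score never decreases. This greedy variant follows a single line of candidates, whereas a tree search keeps many alternatives open at once.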

The TS algorithm prioritizes promising candidates while ensuring systematic exploration of alternative implementations. Researchers can inject insights from research papers, textbooks, and search engine results into the LLM prompts, which enables knowledge-guided code evolution. The researchers also generated ‘recombinations’ of method pairs from summaries of their code, much as one might combine ideas from different sources, and then ran ERA with prompts for these recombinations to improve model solutions.
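One way to picture how a tree search prioritizes promising candidates is the Python sketch below. The selection rule (a candidate's score plus a UCB-style exploration bonus) is an assumption made for illustration; the article does not specify the rule ERA uses. The names `Node`, `tree_search`, and `recombination_prompts`, the toy scoring and rewrite functions, and the prompt wording are all hypothetical.

```python
import itertools
import math
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(eq=False, repr=False)
class Node:
    candidate: float
    score: float
    parent: Optional["Node"] = None
    visits: int = 0
    children: List["Node"] = field(default_factory=list)


def tree_search(
    seed: float,
    rewrite: Callable[[float, random.Random], float],
    score: Callable[[float], float],
    budget: int = 200,
    c: float = 0.5,
    rng_seed: int = 0,
) -> Node:
    """Expand one node per step, chosen by score plus an exploration bonus."""
    rng = random.Random(rng_seed)
    root = Node(seed, score(seed))
    nodes = [root]
    for step in range(1, budget + 1):
        def priority(n: Node) -> float:
            # High-scoring nodes are favored; rarely expanded nodes get a bonus.
            return n.score + c * math.sqrt(math.log(step + 1) / (n.visits + 1))

        node = max(nodes, key=priority)
        node.visits += 1
        new_candidate = rewrite(node.candidate, rng)  # stand-in for an LLM rewrite
        child = Node(new_candidate, score(new_candidate), parent=node)
        node.children.append(child)
        nodes.append(child)
    return max(nodes, key=lambda n: n.score)


def recombination_prompts(summaries: Dict[str, str]) -> List[str]:
    """Build one prompt per unordered pair of method summaries."""
    return [
        f"Combine the strengths of {a} ({summaries[a]}) and {b} ({summaries[b]})."
        for a, b in itertools.combinations(sorted(summaries), 2)
    ]


def toy_score(x: float) -> float:
    return -(x - 3.0) ** 2  # hypothetical metric, maximum at x = 3


def toy_rewrite(x: float, rng: random.Random) -> float:
    return x + rng.uniform(-0.5, 0.5)  # hypothetical small change


best = tree_search(0.0, toy_rewrite, toy_score)
prompts = recombination_prompts(
    {"method_a": "trend model", "method_b": "spread model", "method_c": "ensemble"}
)
print(best.candidate, best.score, len(prompts))
```

Unlike a purely greedy loop, every candidate stays in the tree, so a branch that looked weak early on can still be revisited later.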

The team evaluated ERA across various Kaggle playground competitions and six scientific benchmarks. These spanned bioinformatics, epidemiology, geospatial analysis, neuroscience, and numerical computation. They included tasks such as single-cell RNA sequencing (scRNA-seq) batch integration, COVID-19 hospitalization forecasting, time-series prediction, geospatial segmentation, neural activity modeling in zebrafish, and numerical integration problems.

Researchers assessed ERA’s performance using competition rankings and task-specific scoring systems. To predict COVID-19-related hospitalizations in the United States (US), they tested ERA using a rolling validation approach, in which models were optimized and selected using the preceding 6 weeks of data, while training used historical hospitalization records. They also verified performance using short CovidHub summaries without the original code, and on the General Time Series Forecasting Model Evaluation (GIFT-Eval) benchmark.
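The rolling validation idea can be sketched in Python as follows. This is a simplified illustration rather than the study's procedure: the two toy forecasters, the use of mean absolute error as the selection metric, and the synthetic weekly series are all assumptions, and the function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A forecaster maps the history so far to a one-week-ahead forecast.
Forecaster = Callable[[Sequence[float]], float]


def persistence(history: Sequence[float]) -> float:
    """Forecast next week as equal to the latest observation."""
    return history[-1]


def linear_trend(history: Sequence[float]) -> float:
    """Forecast next week by extending the latest week-over-week change."""
    return history[-1] + (history[-1] - history[-2])


def validation_error(
    model: Forecaster, series: Sequence[float], end: int, window: int = 6
) -> float:
    """Mean absolute one-week-ahead error over the `window` weeks before `end`."""
    errors = [abs(model(series[:t]) - series[t]) for t in range(end - window, end)]
    return sum(errors) / len(errors)


def rolling_select_and_forecast(
    models: Dict[str, Forecaster],
    series: Sequence[float],
    start: int,
    window: int = 6,
) -> List[Tuple[int, str, float]]:
    """For each week from `start`, pick the model that did best on the
    preceding `window` weeks, then forecast that week using only past data."""
    results = []
    for t in range(start, len(series)):
        best_name = min(
            models, key=lambda name: validation_error(models[name], series, t, window)
        )
        results.append((t, best_name, models[best_name](series[:t])))
    return results


# Synthetic weekly counts with steady growth (not real hospitalization data).
series = [10 + 2 * i for i in range(12)]
models = {"persistence": persistence, "linear_trend": linear_trend}
results = rolling_select_and_forecast(models, series, start=8)
for week, name, forecast in results:
    print(week, name, forecast, series[week])
```

The key property is that model selection for a given week only ever looks at earlier weeks, so the evaluation mimics real-time forecasting.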

Scientific Benchmark Performance Results

ERA consistently demonstrated expert-level performance across multiple scientific disciplines, and it outperformed human-developed methods and benchmark systems on several tasks. In bioinformatics, the system generated 40 new approaches for scRNA-seq analysis, surpassing leading methods on the OpenProblems leaderboard. One version of the Batch Balanced K-Nearest Neighbors (BBKNN) method developed by ERA improved overall performance by 14% compared with previously published approaches. Importantly, ERA preserved key biological signals during batch correction.

In epidemiology, the system produced 14 forecasting strategies that outperformed the official CovidHub Ensemble in predicting COVID-19-related hospitalizations in the US. ERA achieved a mean Weighted Interval Score (WIS) of 26, compared with a mean WIS of 29 for the official CovidHub Ensemble benchmark; lower scores indicate better performance. The system achieved this by recombining strengths from different modeling approaches, such as pairing statistical trend analysis with epidemiological disease-spread models. Many hybrid strategies developed using ERA’s TS algorithm also performed better than their parent models, highlighting the value of recombining methods.
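For readers unfamiliar with the metric, the WIS of a single forecast can be computed as in the Python sketch below, which follows the standard definition used in epidemic forecast evaluation: a weighted combination of the absolute error of the median and an interval score for each central prediction interval. The forecast values are made up for illustration and do not come from the study; only the mean scores of 26 and 29 are taken from the article. The function names are hypothetical.

```python
from typing import Dict, Tuple


def interval_score(lower: float, upper: float, y: float, alpha: float) -> float:
    """Score of a central (1 - alpha) prediction interval: its width plus
    penalties, scaled by 2 / alpha, when the observation falls outside it."""
    width = upper - lower
    below = (2.0 / alpha) * max(lower - y, 0.0)
    above = (2.0 / alpha) * max(y - upper, 0.0)
    return width + below + above


def weighted_interval_score(
    median: float, intervals: Dict[float, Tuple[float, float]], y: float
) -> float:
    """WIS for one forecast; `intervals` maps alpha to (lower, upper).
    The median gets weight 1/2 and each interval gets weight alpha / 2."""
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


# Illustrative forecast (made-up numbers): median 100, a 50% interval of
# 90 to 110, a 90% interval of 70 to 130, and an observed value of 120.
wis = weighted_interval_score(100.0, {0.5: (90.0, 110.0), 0.1: (70.0, 130.0)}, 120.0)
print(wis)  # approximately 11.2

# Relative improvement implied by the article's mean scores (lower is better).
relative_improvement = (29 - 26) / 29
print(round(relative_improvement * 100, 1))  # about 10.3 percent
```

A forecast is rewarded for narrow intervals that still contain the observed value, which is why a lower WIS indicates a sharper and better calibrated forecast.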

The system also demonstrated robust performance in time-series forecasting, geospatial image segmentation, brain activity estimation in zebrafish, and numerical integration tasks. In several cases, ERA exceeded leaderboard results from foundation models, deep learning systems, and traditional forecasting approaches. The system’s advantage stemmed from its ability to continuously explore and refine thousands of software variations while integrating external scientific knowledge from research papers, textbooks, and search engines.

Adding problem-specific guidance to the prompts considerably improved performance. As an example, researchers instructed ERA to create its own boosted decision tree (BDT) library without using existing software packages. They manually verified the results, confirming that ERA followed these instructions. The system also performed consistently well without publicly available code.

AI Research Automation Implications

The findings suggest that AI-driven systems such as ERA could dramatically speed up some forms of computational scientific work by reducing the time, expertise, and computational effort needed to develop advanced research software. By rapidly generating and refining high-performing solutions across various fields using a score-based optimization process, ERA may help researchers tackle complex scientific challenges more efficiently. The system can generate expert-level software in hours or days instead of weeks or months, potentially accelerating progress across multiple areas of science.

However, the authors stress that optimizing empirical predictive models is not the same as full scientific discovery, which also requires reasoning about mechanisms, causal relationships, theories, and mathematical frameworks. They also note broader safety risks if such systems lower the expertise barrier for deploying advanced computational models in sensitive domains.
