Abstract
Climate science relies on automated workflows to transform comprehensive research questions into data-driven insights over massive, heterogeneous datasets distributed across multiple data repositories. With the rise of large language models, automated workflow generation has become increasingly feasible. However, existing approaches face significant challenges: generic LLM agents lack domain-specific knowledge of climate data sources and analysis conventions, while static scripting pipelines cannot adapt to diverse task requirements or recover from execution failures. Consequently, existing methods struggle to reliably complete complex, multi-step climate analysis workflows.
The CLIMATEAGENT framework is proposed to address these limitations through specialized multi-agent orchestration. The architecture decomposes high-level user questions into executable subtasks coordinated by a PLAN-AGENT, acquires data via specialized DATA-AGENTs that dynamically introspect API metadata to synthesize valid download scripts, and completes analysis with a CODING-AGENT that generates Python code, visualizations, and scientific reports through iterative self-correction. This design enables the system to maintain workflow coherence across dependent steps while adapting to execution failures without human intervention.
To enable systematic evaluation, the CLIMATE-AGENT-BENCH-85 benchmark is introduced, comprising real-world tasks spanning six climate phenomena: atmospheric rivers, drought, extreme precipitation, heat waves, sea surface temperature, and tropical cyclones. Experiments demonstrate that CLIMATEAGENT substantially outperforms strong baselines including GitHub Copilot and direct GPT-5 synthesis across all evaluation dimensions, with particularly pronounced improvements in tasks requiring multi-step reasoning, heterogeneous data integration, and external tool coordination. These results establish that structured multi-agent orchestration with domain-specific knowledge integration and adaptive error recovery provides a viable path toward reliable, end-to-end automation of complex scientific workflows.
| Date of Award | 2026 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Binhang YUAN (Supervisor) |