
ArXiv cs.AI Weekly: Tracking Each Week's AI Agent Papers (First Week of May 2026)

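This week the ArXiv cs.AI category listed 98 papers, 30 of which focus on agent research. Multi-agent reasoning achieves Pareto-optimal test-time scaling, breaking through the compute-efficiency bottleneck of single-agent approaches; Agent Capsules cuts token consumption by 51% through quality-gated granularity control; RAG-Gym provides a systematic optimization framework for retrieval-augmented generation with language agents.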

AgentScout · 10-minute read
#arxiv #ai-agents #multi-agent #rag #reasoning #llm

Data Overview

  • Snapshot week: 2026-05-01 to 2026-05-07
  • Tracker: ArXiv AI Agent Papers (view all snapshots: /tech/ai-agents/data/?tracker=arxiv-cs-ai-weekly)
  • Update frequency: weekly
  • Primary sources: ArXiv cs.AI RSS, ArXiv cs.CL RSS, ArXiv API

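Key Facts

  • Who: cs.AI listed 98 papers this week; 30 agent-related submissions span the cs.AI, cs.CL, and cs.MA categories
  • What: a multi-agent reasoning paper introduces Pareto-optimal test-time scaling; Agent Capsules achieves a 51% token reduction; 10 papers reach a trend score of 7+
  • When: the week of May 1 to 7, 2026
  • Impact: multi-agent frameworks dominate with 15 papers; RAG optimization research grew 50% week over week; token efficiency has become a key design consideration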

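Methodology

This tracker monitors AI agent submissions in the ArXiv cs.AI and cs.CL categories. Papers are selected for relevance to agents, multi-agent systems, tool use, reasoning, and retrieval-augmented generation (RAG). Trend scores (1 to 10) are assigned based on novelty, citation velocity, practicality, and industry relevance. Data is collected through the ArXiv RSS feeds and ArXiv API queries. This snapshot covers papers published from May 1 to 7, 2026.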
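As a rough illustration of the API-based collection step, the sketch below builds an arXiv API query URL for the two monitored categories and the snapshot window. The function name `build_query_url`, the default `max_results`, and the exact query shape are assumptions made for illustration; the tracker's actual queries are not published here. Only the URL is constructed, so no network access is needed.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(categories, start_date, end_date, max_results=100):
    """Build an arXiv API URL for papers in `categories` submitted in a date window.

    Dates are YYYYMMDD strings; the window runs from 00:00 on the first day
    to 23:59 on the last day. (Hypothetical helper, not the tracker's own code.)
    """
    cats = " OR ".join(f"cat:{c}" for c in categories)
    window = f"submittedDate:[{start_date}0000 TO {end_date}2359]"
    params = {
        "search_query": f"({cats}) AND {window}",
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    # urlencode percent-escapes the brackets, colons, and spaces in the query.
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_query_url(["cs.AI", "cs.CL"], "20260501", "20260507")
print(url)
```

The response to such a request is an Atom feed; relevance filtering and trend scoring would happen afterwards and are not shown.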

This Week's Data

Top 20 Papers by Trend Score

| ArXiv ID | Title | Trend score | Key topics |
| --- | --- | --- | --- |
| 2605.01566 | Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling | 9 | multi-agent, reasoning, test-time scaling, compute efficiency |
| 2605.00410 | Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines | 8 | multi-agent, pipeline optimization, token efficiency, quality gating |
| 2502.13957 | RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation | 8 | RAG, agent optimization, systematic framework, language agents |
| 2605.01986 | 12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation | 7 | multi-agent, decision-making, evaluation, jury simulation |
| 2605.01604 | Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework | 7 | agentic AI, evaluation, failure modes, production deployment |
| 2605.03534 | SURE-RAG: Sufficiency and Uncertainty-Aware Evidence Verification for Selective RAG | 7 | RAG, evidence verification, uncertainty awareness, selective retrieval |
| 2605.03476 | CuraView: Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification | 7 | multi-agent, RAG, hallucination detection, medical AI, GraphRAG |
| 2412.20138 | TradingAgents: Multi-Agents LLM Financial Trading Framework | 7 | multi-agent, LLM, trading, financial analysis |
| 2410.02958 | AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | 7 | multi-agent, LLM, AutoML, automation |
| 2412.12881 | RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement | 7 | RAG, reasoning, MCTS, verification, refinement |
| 2605.01920 | A Language for Describing Agentic LLM Contexts | 6 | agentic, LLM, context specification, formal language |
| 2605.01208 | Faithful Mobile GUI Agents with Guided Advantage Estimator | 6 | agents, GUI, mobile, advantage estimator |
| 2605.01293 | Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning | 6 | agentic, neuro-symbolic, skill induction, long-horizon tasks |
| 2605.01879 | Sheaf-Theoretic Planning: A Categorical Foundation for Resilient Multi-Agent Autonomous Systems | 6 | multi-agent, autonomy, planning, sheaf theory |
| 2605.02910 | CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing | 6 | agents, reasoning, creativity, tool repurposing |
| 2605.02967 | AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines | 6 | RAG, optimization, pipelines, declarative |
| 2605.03228 | MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory | 6 | agents, LLM, safety, shadow memory, long-horizon threats |
| 2605.04018 | Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems | 6 | agentic, reasoning, retrieval, search systems |
| 2401.07324 | Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | 6 | multi-agent, tool use, LLM, decomposition |
| 2605.01102 | Towards Multi-Agent Autonomous Reasoning in Hydrodynamics | 5 | multi-agent, autonomy, reasoning, hydrodynamics |

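Featured Paper Summaries

Multi-Agent Reasoning (2605.01566): Demonstrates that multi-agent reasoning achieves Pareto-optimal compute efficiency in test-time scaling, systematically outperforming single-agent approaches. The paper provides empirical evidence that distributing reasoning across multiple agents yields better performance per unit of compute than scaling a single agent's reasoning.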
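To make the term "Pareto-optimal" concrete, here is a minimal sketch that picks out the non-dominated configurations from a set of (compute cost, accuracy) points. The configuration names and every number are made up for illustration and are not results from the paper; `pareto_front` is a hypothetical helper, not the authors' method.

```python
def pareto_front(points):
    """Return the points that no other point dominates, sorted by cost.

    Each point is (name, compute_cost, accuracy). A point is dominated when
    another point costs no more, scores no lower, and is strictly better on
    at least one of the two axes.
    """
    front = []
    for name, cost, acc in points:
        dominated = any(
            c <= cost and a >= acc and (c < cost or a > acc)
            for _, c, a in points
        )
        if not dominated:
            front.append((name, cost, acc))
    return sorted(front, key=lambda p: p[1])


# Hypothetical configurations (illustrative values only, not from the paper).
configs = [
    ("single-agent, short", 1.0, 0.60),
    ("single-agent, long", 4.0, 0.68),
    ("multi-agent, 3 agents", 3.0, 0.72),
    ("multi-agent, 5 agents", 6.0, 0.75),
]
front = pareto_front(configs)
for name, cost, acc in front:
    print(f"{name}: cost={cost}, accuracy={acc}")
```

In this toy data the long single-agent run drops off the front because the three-agent setup is both cheaper and more accurate, which is the kind of comparison the summary describes.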

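Agent Capsules (2605.00410): Introduces an adaptive execution runtime that treats multi-agent pipeline execution as an optimization problem under empirical quality constraints. It achieves a 51% token reduction relative to hand-written implementations while holding quality thresholds through dynamic granularity control.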
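One way to read "optimization under a quality constraint" is: among the available execution granularities, run the cheapest one whose measured quality still clears a threshold. The sketch below shows only that selection rule. The `Mode` class, the mode names, and all numbers are assumptions for illustration; the paper's actual runtime is not described in this report and may work quite differently.

```python
from dataclasses import dataclass


@dataclass
class Mode:
    name: str       # hypothetical granularity level
    tokens: int     # measured token cost of running the pipeline this way
    quality: float  # measured quality score in [0, 1]


def pick_mode(modes, min_quality):
    """Choose the cheapest mode whose measured quality clears the gate."""
    passing = [m for m in modes if m.quality >= min_quality]
    if not passing:
        # Nothing clears the gate: fall back to the highest-quality mode.
        return max(modes, key=lambda m: m.quality)
    return min(passing, key=lambda m: m.tokens)


# Illustrative measurements only (not taken from the paper).
modes = [
    Mode("merged call", tokens=4_900, quality=0.88),
    Mode("one call per agent", tokens=10_000, quality=0.92),
]
print(pick_mode(modes, min_quality=0.85).name)
print(pick_mode(modes, min_quality=0.90).name)
```

Raising the gate from 0.85 to 0.90 pushes the choice from the merged call to the finer-grained mode, so token savings are only taken when quality allows.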

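RAG-Gym (2502.13957): A comprehensive platform for systematically optimizing RAG language agents along three dimensions: prompt engineering, actor tuning, and critic training. It provides reproducible benchmarks and optimization protocols for RAG agent development.

12 Angry AI Agents (2605.01986): Evaluates multi-agent LLM decision-making through a cinematic jury-deliberation scenario, testing collaboration, consensus formation, and deliberative reasoning under conflicting evidence.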

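Week-over-Week Summary

| Metric | This week | Last week | Change |
| --- | --- | --- | --- |
| Total papers (cs.AI) | 98 | 30 | +227% |
| Agent-related papers | 30 | 25 | +20% |
| Multi-agent papers | 15 | 15 | 0% |
| RAG-related papers | 12 | 8 | +50% |
| High-impact papers (trend score 7+) | 10 | 8 | +25% |
| Reasoning-focused papers | 8 | 10 | -20% |
| Tool-use papers | 4 | 4 | 0% |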
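The change column follows from the two count columns. A minimal sketch of that arithmetic, using the counts from the table above; the helper name `pct_change` and the rounding to whole percentages are assumptions about how the figures were produced:

```python
def pct_change(this_week, last_week):
    """Week-over-week change as a signed, rounded percentage string."""
    if last_week == 0:
        raise ValueError("last week's count is zero; change is undefined")
    change = round((this_week - last_week) / last_week * 100)
    return f"{change:+d}%" if change else "0%"


# (this week, last week) pairs copied from the summary table.
metrics = {
    "Total papers (cs.AI)": (98, 30),
    "Agent-related papers": (30, 25),
    "Multi-agent papers": (15, 15),
    "RAG-related papers": (12, 8),
    "High-impact papers (trend score 7+)": (10, 8),
    "Reasoning-focused papers": (8, 10),
    "Tool-use papers": (4, 4),
}
for name, (now, before) in metrics.items():
    print(f"{name}: {pct_change(now, before)}")
```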

Last week's snapshot: arxiv-cs-ai-weekly-20260430

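Ecosystem Metrics

Category Distribution

| Category | Papers | Share |
| --- | --- | --- |
| cs.AI (Artificial Intelligence) | 45 | 45.9% |
| cs.CL (Computation and Language) | 35 | 35.7% |
| cs.MA (Multiagent Systems) | 8 | 8.2% |
| cs.LG (Machine Learning) | 5 | 5.1% |
| Other | 5 | 5.1% |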
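The share column can be recomputed from the paper counts. The short sketch below does so, assuming each share is the category count divided by the sum of all listed counts and rounded to one decimal place:

```python
# Paper counts copied from the category distribution table.
counts = {"cs.AI": 45, "cs.CL": 35, "cs.MA": 8, "cs.LG": 5, "Other": 5}

total = sum(counts.values())
shares = {cat: round(n / total * 100, 1) for cat, n in counts.items()}

print(f"total papers: {total}")
for cat, share in shares.items():
    print(f"{cat}: {share}%")
```

The counts sum to 98, the same figure reported as the weekly total.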

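Top Topics This Week

| Topic | Papers | Representative papers |
| --- | --- | --- |
| Multi-agent LLM frameworks | 15 | 2605.01566, 2605.00410, 2605.01986 |
| RAG optimization and evaluation | 12 | 2502.13957, 2605.03534, 2605.03476 |
| Agent reasoning and decision-making | 8 | 2605.01566, 2605.02910, 2605.04018 |
| Tool use and function calling | 4 | 2605.02910, 2401.07324 |
| Autonomous system design | 6 | 2605.01879, 2605.01102, 2605.01293 |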

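Keyword Frequency

| Keyword | Frequency | Week-over-week change |
| --- | --- | --- |
| agent | 28 | +7% |
| multi-agent | 15 | 0% |
| RAG | 12 | +50% |
| reasoning | 8 | -20% |
| autonomous | 6 | +50% |
| LLM | 6 | +20% |
| optimization | 5 | +67% |
| evaluation | 5 | +25% |
| tool-use | 4 | 0% |
| safety | 3 | +50% |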

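Trends and Observations

Emerging Patterns

  1. Pareto-optimal scaling in multi-agent reasoning: the first papers to explicitly address the compute-efficiency trade-offs of multi-agent test-time scaling have appeared, moving beyond the single-agent optimization paradigm.

  2. Token efficiency as a first-class design constraint: Agent Capsules' 51% token reduction marks a shift from capability-driven to efficiency-driven multi-agent pipeline design.

  3. Systematic RAG optimization frameworks: RAG-Gym introduces Gym-style environments for reproducible RAG agent optimization, similar to reinforcement-learning training paradigms.

  4. GraphRAG integration for hallucination detection: CuraView demonstrates an approach that combines knowledge graphs with multi-agent verification to improve medical AI reliability.

  5. Production evaluation frameworks emerge: several papers address failure modes, drift patterns, and deployment monitoring for agentic AI systems.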

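Notable Changes from Last Week

  • First Pareto-optimal scaling paper: explicit optimization of multi-agent compute efficiency
  • Quality-gated granularity control: a new paradigm for adaptive multi-agent pipeline execution
  • RAG-related papers up 50% week over week: retrieval and agent architectures continue to converge
  • Token allocation as a system design principle: a marginal token allocator framing is proposed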

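🔺 Exclusive Intelligence: Insights You Won't See Elsewhere

Confidence: High | Novelty score: 62/100

Routine coverage tracks individual paper releases, but the aggregate signal from this week's 30 agent papers reveals a strategic shift in research focus. Multi-Agent Reasoning (2605.01566) shows that multi-agent methods achieve Pareto-optimal scaling, fundamentally challenging the orthodoxy of single-agent scaling. Agent Capsules' 51% token reduction (2605.00410) confirms that efficiency gains can come from architectural optimization rather than from model improvements alone. RAG-Gym (2502.13957) introduces reproducible optimization protocols, addressing the "every RAG system is a custom build" problem that hinders enterprise adoption.

Key takeaway: platform teams should invest in multi-agent orchestration infrastructure now. Single-agent scaling is showing diminishing marginal returns, and a 51% token-efficiency gain translates directly into lower costs at production scale. RAG-Gym's systematic approach supports standardized evaluation and shortens the path from prototype to production.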
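As a back-of-the-envelope check on the cost claim, the sketch below applies a 51% token reduction to a monthly token bill. The monthly volume and the per-token price are hypothetical placeholders, not figures from the paper or from any provider; integer arithmetic in cents keeps the result exact.

```python
def tokens_after_reduction(baseline_tokens, reduction_pct):
    """Tokens left after cutting `reduction_pct` percent (integer arithmetic)."""
    return baseline_tokens * (100 - reduction_pct) // 100


def monthly_cost_cents(tokens, cents_per_million_tokens):
    """Cost in cents for `tokens` at a flat per-million-token price."""
    return tokens * cents_per_million_tokens // 1_000_000


# Hypothetical workload and price, chosen only to make the arithmetic visible.
baseline_tokens = 2_000_000_000   # tokens per month (assumption)
price = 300                       # cents per million tokens (assumption)

reduced_tokens = tokens_after_reduction(baseline_tokens, 51)
before = monthly_cost_cents(baseline_tokens, price)
after = monthly_cost_cents(reduced_tokens, price)
print(f"before: ${before / 100:,.2f}  after: ${after / 100:,.2f}  saved: ${(before - after) / 100:,.2f}")
```

With flat per-token pricing the cost falls by the same 51%; real bills with tiered pricing or differing input and output rates would not scale this cleanly.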

Historical Snapshots

View all historical snapshots: /tech/ai-agents/data/?tracker=arxiv-cs-ai-weekly

Sources


Full Paper List (30 papers)

| ArXiv ID | Title | Authors | Categories | Published | Trend score |
| --- | --- | --- | --- | --- | --- |
| 2605.01566 | Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling | Florian Valentin Wunderlich, Lars Benedikt Kaesberg, Jan Philip Wahle, Terry Ruas, Bela Gipp | cs.AI | 2026-05-06 | 9 |
| 2605.00410 | Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines | Aninda Ray | cs.CL, cs.AI | 2026-05-01 | 8 |
| 2502.13957 | RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation | Guangzhi Xiong, Qiao Jin, Xiao Wang, Yin Fang, Haolin Liu, Yifan Yang, Fangyuan Chen, Zhixing Song, Dengyu Wang, Minjia Zhang, Zhiyong Lu, Aidong Zhang | cs.CL, cs.AI | 2025-02-19 | 8 |
| 2605.01986 | 12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation | Ahmet Bahaddin Ersoz | cs.AI | 2026-05-06 | 7 |
| 2605.01604 | Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework | Mukund Pandey | cs.AI | 2026-05-06 | 7 |
| 2605.03534 | SURE-RAG: Sufficiency and Uncertainty-Aware Evidence Verification for Selective RAG | Jingxi Qiu, Zeyu Han, Cheng Huang | cs.CL | 2026-05-06 | 7 |
| 2605.03476 | CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification | Severin Ye, Xiao Kong, Xiaopeng He, Guangsu Yan, Dongsuk Oh | cs.CL | 2026-05-06 | 7 |
| 2412.20138 | TradingAgents: Multi-Agents LLM Financial Trading Framework | Yijia Xiao, Edward Sun, Di Luo, Wei Wang | q-fin.TR, cs.AI | 2024-12-28 | 7 |
| 2410.02958 | AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | Patara Trirat, Wonyong Jeong, Sung Ju Hwang | cs.LG, cs.AI | 2024-10-03 | 7 |
| 2412.12881 | RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement | Jinhao Jiang, Jiayi Chen, Junyi Li, Ruiyang Ren, Shijie Wang, Wayne Xin Zhao, Yang Song, Tao Zhang | cs.CL, cs.AI | 2024-12-17 | 7 |
| 2605.01920 | A Language for Describing Agentic LLM Contexts | Noga Peleg Pelc, Gal A. Kaminka, Yoav Goldberg | cs.AI | 2026-05-06 | 6 |
| 2605.01208 | Faithful Mobile GUI Agents with Guided Advantage Estimator | Haowen Hu, Pengzhou Cheng, Zheng Wu, Lingzhong Dong, Gongshen Liu, Zhuosheng Zhang | cs.AI | 2026-05-06 | 6 |
| 2605.01293 | Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning for Long-Horizon Agentic Tasks | Jie-Jing Shao, Haiyan Yin, Yueming Lyu, Xingrui Yu, Lan-Zhe Guo, Ivor Tsang, James Kwok, Yu-Feng Li | cs.AI | 2026-05-06 | 6 |
| 2605.01879 | Sheaf-Theoretic Planning: A Categorical Foundation for Resilient Multi-Agent Autonomous Systems | Manuel Hernandez, Eduardo Sanchez-Soto | cs.AI | 2026-05-06 | 6 |
| 2605.02910 | CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing | Cheng Qian, Hyeonjeong Ha, Jiayu Liu, Bingxiang He, Jeonghwan Kim, Jiateng Liu, Bingxuan Li, Aditi Tiwari, Dwip Dalal, Zhenhailong Wang, Xiusi Chen, Mahdi Namazifar, Yunzhu Li, Heng Ji | cs.AI | 2026-05-06 | 6 |
| 2605.02967 | AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines | Xintan Zeng, Yongchao Liu, Yice Luo, Jiajun Zhen | cs.AI | 2026-05-06 | 6 |
| 2605.03228 | MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory | Alexandria K. Vail, Marcelo Cicconet, Katie Aafjes-van Doorn, Ryan Maroney, Marc Aafjes | cs.CL | 2026-05-06 | 6 |
| 2605.04018 | Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems | Yilun Zhao, Jinbiao Wei, Tingyu Song, Siyue Zhang, Chen Zhao, Arman Cohan | cs.CL | 2026-05-06 | 6 |
| 2401.07324 | Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, Fei Huang | cs.AI, cs.CL | 2024-01-14 | 6 |
| 2605.01102 | Towards Multi-Agent Autonomous Reasoning in Hydrodynamics | Jinpai Zhao, Albert Cerrone, Joannes Westerink, Clint Dawson | cs.AI | 2026-05-06 | 5 |
| 2605.01101 | Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent | Shakeel Sheikh, Patrick Marmaroli, MD Sahidullah, Slim Ouni, Fabrice Hirsch, Goncalo Leal, Bjorn W Schuller | cs.AI | 2026-05-06 | 5 |
| 2605.00841 | AI Agents for Sustainable SMEs: A Green ESG Assessment Framework | Viet Trinh, Tan Nguyen, Minh-Huyen Phan, Quan Luu | cs.AI | 2026-05-06 | 5 |
| 2605.01214 | Agentic AI Systems Should Be Designed as Marginal Token Allocators | Siqi Zhu | cs.AI | 2026-05-06 | 5 |
| 2605.01758 | Catching the Infection Before It Spreads: Foresight-Guided Defense in Multi-Agent Systems | Yue Ma, Ziyuan Yang, Yi Zhang | cs.AI | 2026-05-06 | 5 |
| 2605.01675 | CP-SynC: Multi-Agent Zero-Shot Constraint Modeling in MiniZinc with Synthesized Checkers | Yuliang Song, Eldan Cohen | cs.AI | 2026-05-06 | 5 |
| 2605.03314 | When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning | Jiaqi Wei, Xuehang Guo, Pengfei Yu, Xiang Zhang, Wanli Ouyang, Siqi Sun, Qingyun Wang, Chenyu You | cs.CL | 2026-05-06 | 5 |
| 2605.00846 | ClinicBot: A Guideline-Grounded Clinical Chatbot with Prioritized Evidence RAG and Verifiable Citations | Navapat Nananukul, Mayank Kejriwal | cs.AI | 2026-05-06 | 5 |
| 2605.01789 | DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents | Qisong Zhang, Wenzhuo Wu, Zhuangzhuang Jia, Yunhao Yang, Huayu Zhang, Xianghao Zang, Zhixiang He, Zhongjiang He, Kongming Liang, Zhanyu Ma | cs.AI | 2026-05-06 | 5 |
| 2605.01847 | NeuroState-Bench: A Human-Calibrated Benchmark for Commitment Integrity in LLM Agent Profiles | Jia Xiao | cs.AI | 2026-05-06 | 5 |
