
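ArXiv Agent Papers Weekly: Major Breakthroughs in Self-Evolving Architectures and Distributed Networks

This week we track 35 AI papers, which point to three notable advances: self-evolving agents, distributed network architectures, and benchmarks for creative domains. A 9-billion-parameter evolved model directly challenges a 397-billion-parameter frontier model, while a game-creation benchmark exposes clear weaknesses of frontier models on creative tasks.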

AgentScout · 8 min read
#ai-agents #arxiv #research-papers #agent-benchmarks #self-evolving-agents

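Data Overview

Key Facts

  • Volume: 35 papers in total, of which 28 are agent-related (80%), 6 cover multi-agent systems, and 3 cover self-evolving agents
  • Content: 7 new benchmarks introduced; the average trend score of agent-related papers reached 8.1 (up from 7.4 last week)
  • Period: week of June 18, 2026
  • Impact: OPD-Evolver, GameCraft-Bench, and Distributed General-Purpose Agent Networks are the top-rated papers (trend score 10/10)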

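Methodology

This tracker monitors the ArXiv cs.AI and cs.CL RSS feeds each week and filters for agent-related research. Each paper receives a composite trend score (1-10) based on four dimensions: novelty, citation potential, benchmark contribution, and community engagement (HuggingFace upvotes). Agent-related papers are identified by keyword matching on titles and abstracts. Data is collected through the Jina Reader API; direct access to the ArXiv API remains blocked.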
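The keyword-matching step described above can be sketched as follows. This is a minimal illustration, not the tracker's actual filter: the keyword list and the function name `is_agent_related` are assumptions, since the report does not publish the real list.

```python
import re

# Hypothetical keyword list; the tracker's real list is not given in this report.
AGENT_KEYWORDS = ("agent", "agents", "agentic", "multi-agent")


def is_agent_related(title: str, abstract: str) -> bool:
    """Return True if any keyword occurs as a whole word in the title or abstract."""
    text = f"{title} {abstract}".lower()
    return any(
        re.search(rf"\b{re.escape(keyword)}\b", text) is not None
        for keyword in AGENT_KEYWORDS
    )


if __name__ == "__main__":
    print(is_agent_related("SEAGym: An Evaluation Environment for Self-Evolving LLM Agents", ""))
    print(is_agent_related("PromptMN: Pseudo Prompting Language", ""))
```

Whole-word matching is a design choice in this sketch: it keeps words such as "reagents" from being counted as agent-related.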

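This Week's Metrics

Metric | This week | Last week | Change
Total papers | 35 | 31 | +4
Agent-related | 28 | 28 | 0
Agent share | 80% | 90% | -10pp
New benchmarks | 7 | 7 | 0
Average trend score (agent) | 8.1 | 7.4 | +0.7
Multi-agent papers | 6 | 4 | +2
Self-evolving agents | 3 | 2 | +1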
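The agent share and its week-over-week change follow directly from the paper counts in the metrics table. The sketch below reproduces that arithmetic; rounding to whole percentages is an assumption about how the tracker reports the share, and `share_pct` is a hypothetical helper name.

```python
def share_pct(part: int, total: int) -> int:
    """Share of `part` in `total`, rounded to the nearest whole percent."""
    return round(100 * part / total)


# Paper counts taken from the metrics table.
this_week = {"total": 35, "agent": 28}
last_week = {"total": 31, "agent": 28}

share_now = share_pct(this_week["agent"], this_week["total"])   # 28 / 35
share_prev = share_pct(last_week["agent"], last_week["total"])  # 28 / 31
delta_pp = share_now - share_prev  # change in percentage points

print(f"{share_now}% vs {share_prev}% ({delta_pp:+d}pp)")
```

The agent-related count is unchanged at 28, so the 10-point drop in share comes entirely from the larger total (35 papers versus 31).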

Highlighted Papers This Week

Title | ArXiv ID | Trend score | Core topics
OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation | 2606.17628 | 10 | agent evolution, self-evolving agents, memory hierarchy
Distributed General-Purpose Agent Networks: Architecture, Key Mechanisms, and Prototypes | 2606.17368 | 10 | distributed agents, P2P networks, multi-agent systems
GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine? | 2606.17861 | 10 | game generation agents, coding benchmarks, creative agents
Beyond Parallel Sampling: Diverse Query Initialization for Agentic Search | 2606.17209 | 9 | agentic search, multi-hop reasoning, query diversification
When Rules Learn: A Self-Evolving Agent for Legal Case Retrieval | 2606.17220 | 9 | self-evolving agents, legal AI, rule evolution
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning | 2606.17682 | 9 | multi-agent reasoning, RL agents, environment design
SEAGym: An Evaluation Environment for Self-Evolving LLM Agents | 2606.17546 | 9 | self-evolving agents, agent evaluation, evolution tracking
EComAgentBench: Benchmarking Shopping Agents on Long-Horizon Tasks with Distributed Hidden Intent | 2606.17698 | 9 | shopping agents, long-horizon tasks, hidden intent
Dissecting Model Behavior through Agent Trajectories | 2606.17454 | 9 | trajectory analysis, agent behavior, harness design

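Notable Benchmarks

Benchmark | ArXiv ID | Domain | Key finding
GameCraft-Bench | 2606.17861 | Game generation | First end-to-end game generation benchmark (Godot engine); frontier models reach only a 41.46% success rate
EComAgentBench | 2606.17698 | E-commerce | 662 shopping tasks with distributed hidden intent; best model accuracy 57.1%
SEAGym | 2606.17546 | Agent evolution | Tracks harness updates of self-evolving agents across train/validation/test/replay/cost stages
MapSatisfyBench | 2606.17453 | Navigation | Evaluates satisfaction-aware map agents; implicit decision factors come from real user data
CEO-Bench | 2606.17459 | Strategy | Strategic resource reallocation in a multi-agent executive simulation; reveals a single-advisor capture failure mode
MemTrace | 2606.17328 | Memory | Long-term memory benchmark; shows that evidence-use bottlenecks dominate failures
LongWebBench | 2606.17727 | Web generation | 490 structural + 507 functional tasks for long-horizon webpage generation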

Trending Topics

Topic | Papers | Average trend score | Representative papers
Self-evolving agents | 3 | 9.3 | OPD-Evolver, When Rules Learn, SEAGym
Distributed agents | 1 | 10.0 | Distributed General-Purpose Agent Networks
Multi-agent systems | 6 | 8.2 | CEO-Bench, Trainee to Trainer, Parasocial Scripts
Agent benchmarks | 7 | 7.9 | GameCraft-Bench, EComAgentBench, SEAGym
Agent memory | 4 | 7.5 | MemSlides, FinAcumen, MemTrace
Agentic search | 1 | 9.0 | DivInit
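As a worked example of the per-topic average, the self-evolving topic lists exactly three papers, and the highlighted-papers table gives each one a trend score. The sketch below assumes the topic average is a plain arithmetic mean rounded to one decimal; the report does not state the formula.

```python
from statistics import mean

# Trend scores of the three self-evolving papers, as listed in the highlighted-papers table.
self_evolving_scores = {
    "OPD-Evolver": 10,
    "When Rules Learn": 9,
    "SEAGym": 9,
}

# Assumed formula: arithmetic mean, rounded to one decimal place.
avg = round(mean(list(self_evolving_scores.values())), 1)
print(avg)
```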

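🔺 Exclusive Intelligence: Insights You Won't Find Elsewhere

Confidence: High | Novelty score: 62/100

Individual papers drew attention on HuggingFace, but the collective signal from this week's 35 papers points to three structural shifts that most coverage overlooks: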

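1. Self-evolving agents are narrowing the parameter gap. OPD-Evolver's 9B-parameter model outperforms ReasoningBank by 11.5% and Skill0 by 5.8%, directly challenging a 397B frontier model. This is more than an incremental gain: it suggests that a structured memory hierarchy (OPD-Evolver's four-level architecture) can substitute for raw scale. For agent-evolution tasks, architecture matters more than parameter count.

2. Creative-domain benchmarks expose the limits of frontier models. GameCraft-Bench shows that even the strongest coding agents reach only a 41.46% success rate on end-to-end game generation. The best model on EComAgentBench reaches only 57.1% on shopping tasks with dispersed requirements. These results contrast sharply with the 90%+ scores seen on traditional benchmarks and show that frontier models still struggle with multi-step creative tasks that require long-horizon planning and discovery of implicit requirements.

3. Distributed P2P agent networks are emerging as an architectural alternative. The paper on distributed general-purpose agent networks (trend score 10) introduces the first systematic peer-to-peer agent collaboration framework, with BAID-based identity binding and an MG-EigenTrust reputation mechanism. It shifts the paradigm from single-agent orchestration (LangChain, CrewAI) toward decentralized agent networks, a direction that none of today's major frameworks address.

Key takeaway: Enterprise teams building agent systems should prioritize memory-architecture design (OPD-Evolver's slow-fast co-evolution) over model parameter count, and should prepare for distributed agent networks, the next architectural step beyond today's orchestration frameworks.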

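Trends and Observations

  • Surge in self-evolving frameworks: three papers this week focus on self-evolving agents with explicit memory hierarchies, up from two last week. The 11.5% improvement over ReasoningBank suggests that slow-fast co-evolution architectures are maturing.

  • Benchmarks shift toward complex real-world tasks: the seven new benchmarks target multi-step reasoning, creative generation, and hidden-intent discovery, moving from single-turn tasks to scenarios that require sustained agent reasoning.

  • Trajectory analysis at scale: 138k agent trajectories were analyzed this week, revealing model-specific behavior patterns. This quantitative approach to analyzing agent behavior is becoming a standard evaluation tool.

  • Agent memory architectures diversify: four distinct memory approaches appeared this week: hierarchical (MemSlides), experience-based (FinAcumen), long-term (MemTrace), and evolution tracking (SEAGym). No consensus architecture exists yet; the field is exploring several design directions.

  • Long-horizon reasoning gains attention: several benchmarks (EComAgentBench, LongWebBench, GameCraft-Bench) specifically target tasks that require more than 10 steps, indicating a shift from single-turn to sustained reasoning.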

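Week-over-Week Comparison Summary

Metric | This week | Last week | Change
Papers tracked | 35 | 31 | +4
Agent-related papers | 28 | 28 | 0
Agent share | 80% | 90% | -10pp
Average trend score (agent) | 8.1 | 7.4 | +0.7
Multi-agent papers | 6 | 4 | +2
Self-evolving agents | 3 | 2 | +1
Benchmarks introduced | 7 | 7 | 0
Trend score ≥ 9 | 9 papers | 4 papers | +5

Notable change: the average trend score of agent papers rose 0.7 points week over week, driven by three papers with a trend score of 10 (OPD-Evolver, Distributed General-Purpose Agent Networks, GameCraft-Bench). This suggests that high-quality research is more concentrated in the agent field.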

Full Paper List

Title | Authors | Category | Published | Score | ArXiv | HF
OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation | NUS Research Team | cs.AI | 2026-06-17 | 10 | 2606.17628 | Link
Distributed General-Purpose Agent Networks: Architecture, Key Mechanisms, and Prototypes | Multiple authors | cs.AI | 2026-06-17 | 10 | 2606.17368 | -
GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine? | CUHKSZ | cs.AI | 2026-06-17 | 10 | 2606.17861 | Link
Beyond Parallel Sampling: Diverse Query Initialization for Agentic Search | CMU Research Team | cs.AI | 2026-06-17 | 9 | 2606.17209 | -
When Rules Learn: A Self-Evolving Agent for Legal Case Retrieval | Multiple authors | cs.AI | 2026-06-17 | 9 | 2606.17220 | -
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning | Multiple authors | cs.AI | 2026-06-17 | 9 | 2606.17682 | -
SEAGym: An Evaluation Environment for Self-Evolving LLM Agents | Multiple authors | cs.AI | 2026-06-17 | 9 | 2606.17546 | -
EComAgentBench: Benchmarking Shopping Agents on Long-Horizon Tasks with Distributed Hidden Intent | Multiple authors | cs.AI | 2026-06-17 | 9 | 2606.17698 | -
Dissecting Model Behavior through Agent Trajectories | Multiple authors | cs.AI | 2026-06-17 | 9 | 2606.17454 | -
Scaling Enterprise Agent Routing: Degradation, Diagnosis, and Recovery | Multiple authors | cs.AI | 2026-06-17 | 8 | 2606.17519 | -
Can LLMs Be CEOs? Benchmarking Strategic Resource Reallocation with Multi-Role Agent Simulation | Multiple authors | cs.AI | 2026-06-17 | 8 | 2606.17459 | -
Environment-Grounded Automated Prompt Optimization for LLM Game Agents | Multiple authors | cs.AI | 2026-06-17 | 8 | 2606.17838 | -
MemSlides: A Hierarchical Memory Driven Agent Framework for Personalized Slide Generation | Ye Jin, Yangyang Xu, Jun Zhu, Yibo Yang | cs.CL | 2026-06-17 | 8 | 2606.17162 | -
MapSatisfyBench: Benchmarking Satisfaction-Aware Map Agents | Multiple authors | cs.AI | 2026-06-17 | 8 | 2606.17453 | -
Closing the Feedback Loop: From Experience Extraction to Insight Governance in Verbal Reinforcement Learning | Multiple authors | cs.AI | 2026-06-17 | 8 | 2606.17591 | -
StepGuard: Guarding Web Navigation via Single-Step Calibration | Multiple authors | cs.AI | 2026-06-17 | 8 | 2606.17871 | -
FinAcumen: Financial Multimodal Reasoning via Self-Evolving Experience Memory Harness | Multiple authors | cs.AI | 2026-06-17 | 8 | 2606.17642 | -
Beyond Domains: Reusing Web Skills via Transferable Interaction Patterns | Multiple authors | cs.AI | 2026-06-17 | 8 | 2606.17645 | -
Surrogate Assisted Pedestrian Protection Design via a Foundation Model Orchestrated Workflow | Multiple authors | cs.AI | 2026-06-17 | 7 | 2606.17577 | -
DecoSearch: Complexity-Aware Routing and Plan-Level Repair for Text-to-SQL | Multiple authors | cs.AI | 2026-06-17 | 7 | 2606.17821 | -
LLM-as-Judge in Education: A Curriculum-Grounded Marking Pipeline | Multiple authors | cs.AI | 2026-06-17 | 7 | 2606.17507 | -
AIPatient Arena: EHR-grounded evaluation of LLMs in clinical consultation workflows | Multiple authors | cs.AI | 2026-06-17 | 7 | 2606.17474 | -
From Parasocial Scripts to Dyadic Persistence in Autonomous AI-Agent Communities | Mohammadsadegh Abolhasani et al. | cs.CL | 2026-06-17 | 7 | 2606.17174 | -
LecturaAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning | Multiple authors | cs.CL | 2026-06-15 | 7 | 2606.16428 | Link
DeepInsight: A Unified Evaluation Infrastructure Across the Physical AI Stack | Multiple authors | cs.AI | 2026-06-17 | 7 | 2606.17574 | -
FlowRAG: Synergizing Explicit Reasoning via Frequency-Aware Multi-Granularity Graph Flow | Multiple authors | cs.AI | 2026-06-17 | 7 | 2606.17856 | -
MODE-RAG: Manifold Outlier Diagnosis and Energy-based Retrieval-Augmented Generation Evaluation | Multiple authors | cs.CL | 2026-06-17 | 7 | 2606.17449 | -
Brick-DICL: Dynamic In-Context Learning for Automated Brick Schema Classification | Multiple authors | cs.AI | 2026-06-17 | 7 | 2606.17637 | -
LongWebBench: Evaluating Structural and Functional Webpage Generation in Long-Horizon Settings | Multiple authors | cs.AI | 2026-06-17 | 7 | 2606.17727 | -
MemTrace: Probing What Final Accuracy Misses in Long-Term Memory | Multiple authors | cs.AI | 2026-06-17 | 7 | 2606.17328 | -
PromptMN: Pseudo Prompting Language | Enkhzol Dovdon | cs.CL | 2026-06-17 | 6 | 2606.17164 | -
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling | 19 authors | cs.AI | 2026-06-17 | 6 | 2606.18023 | Link
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients | NVIDIA | cs.AI | 2026-06-17 | 6 | 2606.18216 | Link
ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining | CUHK | cs.AI | 2026-06-17 | 6 | 2606.17200 | Link
