ArXiv AI Agent Papers Tracker — Week of Jun 18, 2026

Name: ArXiv AI Agent Papers Tracker — Week of Jun 18, 2026
Creator: AgentScout
Published: 2026-06-18T00:00:00.000Z
Keywords: ai-agents, arxiv, research-papers, agent-benchmarks, self-evolving-agents

This week’s 35 papers highlight advances in self-evolving agents, distributed P2P agent networks, and creative-domain benchmarks. OPD-Evolver, a 9B-parameter model, challenges 397B-parameter frontier models. GameCraft-Bench shows that frontier models still struggle with creative tasks.

AgentScout · Published Jun 18, 2026 · Updated Jun 18, 2026 · 8 min read

Data Overview

Snapshot Week: 2026-06-11 to 2026-06-18
Tracker: ArXiv AI Agent Papers Tracker (view all snapshots: /tech/ai-agents/data/?tracker=arxiv-cs-ai-weekly)
Update Frequency: Weekly
Primary Sources: ArXiv cs.AI RSS, ArXiv cs.CL RSS, HuggingFace Daily Papers

Key Facts

Who: 35 papers total, 28 agent-related (80%), 6 multi-agent systems, 3 self-evolving agents
What: 7 new benchmarks introduced; average trend score for agent papers reaches 8.1 (up from 7.4 last week)
When: Week of June 18, 2026
Impact: OPD-Evolver, GameCraft-Bench, and Distributed Agent Networks emerge as top-scoring papers (trend score 10/10)

Methodology

This tracker monitors the ArXiv cs.AI and cs.CL RSS feeds weekly and filters for agent-related research. Each paper receives a composite trend score (1-10) based on novelty, citation potential, benchmark contributions, and community engagement (HuggingFace likes). Agent-related papers are identified through keyword matching in titles and abstracts. Data is collected via the Jina Reader API; direct ArXiv API access remains blocked.
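The filtering and scoring steps can be sketched roughly as follows. This is a minimal illustration, not the tracker’s implementation: the keyword list, the equal weighting of the four factors, and the function names `is_agent_related` and `trend_score` are all assumptions, since the tracker does not publish its keywords or weights.

```python
import re

# Hypothetical keyword list; the tracker's real list is not published.
AGENT_KEYWORDS = ("agent", "agents", "agentic", "multi-agent")


def is_agent_related(title: str, abstract: str) -> bool:
    """Return True if any keyword appears as a whole word in the title or abstract."""
    text = f"{title} {abstract}".lower()
    return any(re.search(rf"\b{re.escape(kw)}\b", text) for kw in AGENT_KEYWORDS)


def trend_score(novelty: float, citation_potential: float,
                benchmark_contribution: float, engagement: float) -> int:
    """Combine four factor ratings (each 1-10) into a 1-10 composite score.

    Equal weights are an assumption made purely for illustration.
    """
    factors = (novelty, citation_potential, benchmark_contribution, engagement)
    composite = sum(factors) / len(factors)
    # Clamp to the 1-10 range used by the tracker.
    return max(1, min(10, round(composite)))
```

Whole-word matching avoids false positives from substrings (for example "reagent"), at the cost of missing papers that describe agents without using any of the listed terms.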

This Week’s Metrics

Metric	This Week	Last Week	Δ
Total papers	35	31	+4
Agent-related	28	28	0
Agent percentage	80%	90%	-10pp
New benchmarks	7	7	0
Avg trend score (agent)	8.1	7.4	+0.7
Multi-agent papers	6	4	+2
Self-evolving agents	3	2	+1
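The agent percentage and its change can be reproduced from the raw counts in the table. The short sketch below (the helper name `agent_share` is illustrative) shows why the share dropped by 10 percentage points even though the number of agent-related papers was unchanged: the total grew from 31 to 35.

```python
def agent_share(agent_papers: int, total_papers: int) -> int:
    """Agent-related share of all tracked papers, as a whole-number percentage."""
    return round(100 * agent_papers / total_papers)


this_week = {"total": 35, "agent": 28}
last_week = {"total": 31, "agent": 28}

share_now = agent_share(this_week["agent"], this_week["total"])   # 28 / 35
share_prev = agent_share(last_week["agent"], last_week["total"])  # 28 / 31
delta_pp = share_now - share_prev  # difference in percentage points, not percent

print(f"{share_now}% vs {share_prev}% ({delta_pp:+d}pp)")  # 80% vs 90% (-10pp)
```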

Top Papers This Week

Title	ArXiv ID	Trend Score	Key Topics
OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation	2606.17628	10	agent evolution, self-evolving agents, memory hierarchy
Distributed General-Purpose Agent Networks: Architecture, Key Mechanisms, and Prototypes	2606.17368	10	distributed agents, P2P networks, multi-agent systems
GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?	2606.17861	10	game generation agents, coding benchmarks, creative agents
Beyond Parallel Sampling: Diverse Query Initialization for Agentic Search	2606.17209	9	agentic search, multi-hop reasoning, query diversification
When Rules Learn: A Self-Evolving Agent for Legal Case Retrieval	2606.17220	9	self-evolving agents, legal AI, rule evolution
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning	2606.17682	9	multi-agent reasoning, RL agents, environment design
SEAGym: An Evaluation Environment for Self-Evolving LLM Agents	2606.17546	9	self-evolving agents, agent evaluation, evolution tracking
EComAgentBench: Benchmarking Shopping Agents on Long-Horizon Tasks with Distributed Hidden Intent	2606.17698	9	shopping agents, long-horizon tasks, hidden intent
Dissecting Model Behavior through Agent Trajectories	2606.17454	9	trajectory analysis, agent behavior, harness design

Notable Benchmarks This Week

Benchmark	ArXiv ID	Domain	Key Insight
GameCraft-Bench	2606.17861	Game Generation	First end-to-end game generation benchmark in Godot; frontier models achieve only 41.46% success
EComAgentBench	2606.17698	E-commerce	662 shopping tasks with distributed hidden intent; best model achieves 57.1% accuracy
SEAGym	2606.17546	Agent Evolution	Tracks harness updates across training/validation/test/replay/cost for self-evolving agents
MapSatisfyBench	2606.17453	Navigation	Evaluates satisfaction-aware map agents with implicit decision factors from real user data
CEO-Bench	2606.17459	Strategy	Strategic resource reallocation with multi-agent C-suite simulation; reveals single-advisor capture failure mode
MemTrace	2606.17328	Memory	Long-term memory benchmark revealing evidence use bottleneck dominates failures
LongWebBench	2606.17727	Web Generation	490 structural + 507 functional tasks for long-horizon webpage generation

Trending Topics

Topic	Paper Count	Avg Trend Score	Notable Papers
Self-evolving agents	3	9.3	OPD-Evolver, When Rules Learn, SEAGym
Distributed agents	1	10.0	Distributed General-Purpose Agent Networks
Multi-agent systems	6	8.2	CEO-Bench, Trainee to Trainer, Parasocial Scripts
Agent benchmarks	7	7.9	GameCraft-Bench, EComAgentBench, SEAGym
Agent memory	4	7.5	MemSlides, FinAcumen, MemTrace
Agentic search	1	9.0	DivInit (Beyond Parallel Sampling)
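The per-topic averages in the table above can be checked against the individual trend scores in the Top Papers table. The sketch below assumes the topic average is a plain arithmetic mean of the member papers’ scores, which the tracker does not state explicitly. It covers only the three topics whose member papers are all named in this snapshot; `topic_summary` is an illustrative helper name.

```python
from statistics import mean

# Trend scores taken from the Top Papers table in this snapshot.
topic_scores = {
    "Self-evolving agents": {"OPD-Evolver": 10, "When Rules Learn": 9, "SEAGym": 9},
    "Distributed agents": {"Distributed General-Purpose Agent Networks": 10},
    "Agentic search": {"Beyond Parallel Sampling": 9},
}


def topic_summary(scores):
    """Map each topic to (paper count, mean trend score rounded to one decimal)."""
    return {
        topic: (len(papers), round(float(mean(papers.values())), 1))
        for topic, papers in scores.items()
    }


for topic, (count, avg) in topic_summary(topic_scores).items():
    print(f"{topic}\t{count}\t{avg}")
```

Under that assumption the three rows come out as 9.3, 10.0, and 9.0, matching the table.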

🔺 Scout Intel: What Others Missed

Confidence: high | Novelty Score: 62/100

While individual papers receive attention on HuggingFace, the collective signal across this week’s 35 papers reveals three structural shifts that most coverage misses:

1. Self-evolving agents are closing the parameter gap. OPD-Evolver’s 9B-parameter model surpasses ReasoningBank by 11.5% and Skill0 by 5.8%, putting it in direct competition with 397B-parameter frontier models. The result suggests that structured memory hierarchies (four levels in OPD-Evolver) can partly substitute for raw scale: on agent evolution tasks, architecture may matter more than parameter count.

2. Creative-domain benchmarks expose frontier model limitations. On GameCraft-Bench, even the strongest coding agents achieve only 41.46% success on end-to-end game generation. On EComAgentBench, the best model reaches 57.1% accuracy on shopping tasks with scattered requirements. These results contrast sharply with 90%+ scores on traditional benchmarks and show that frontier models still struggle with multi-step creative tasks that require long-horizon planning and implicit requirement discovery.

3. Distributed P2P agent networks emerge as an architectural alternative. The Distributed General-Purpose Agent Networks paper (trend score 10) introduces the first systematic framework for peer-to-peer agent collaboration, with BAID-based identity binding and MG-EigenTrust reputation. This shifts the paradigm from centrally orchestrated agents (LangChain, CrewAI) to decentralized agent networks, a direction that no major framework currently addresses.

Key Implication: Enterprise teams building agent systems should prioritize memory architecture design (such as OPD-Evolver’s slow-fast co-evolution) over model parameter count, and should prepare for distributed agent networks as a likely next step beyond current orchestration frameworks.
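For readers unfamiliar with reputation in P2P networks, the sketch below shows the classic EigenTrust iteration that the name MG-EigenTrust alludes to. It is not the paper’s MG variant, whose details are not covered in this snapshot; the function name, the damping value `alpha`, and the example trust matrix are illustrative assumptions. Each peer’s local trust ratings are row-normalised, and global scores are obtained by repeatedly propagating trust through the network while mixing in a fixed set of pre-trusted peers.

```python
def eigentrust(local_trust, pretrusted, alpha=0.15, tol=1e-9, max_iter=1000):
    """Classic EigenTrust: global trust scores from pairwise local trust ratings.

    local_trust[i][j] is how much peer i trusts peer j (negative values count as 0).
    pretrusted is a non-empty set of peer indices trusted a priori.
    """
    n = len(local_trust)
    # Pre-trust distribution p: uniform over the pre-trusted peers.
    p = [1.0 / len(pretrusted) if i in pretrusted else 0.0 for i in range(n)]

    # Row-normalise local trust; a peer with no ratings falls back to p.
    c = []
    for row in local_trust:
        clipped = [max(v, 0.0) for v in row]
        total = sum(clipped)
        c.append([v / total for v in clipped] if total > 0 else p[:])

    t = p[:]
    for _ in range(max_iter):
        # t_next[j] = (1 - alpha) * sum_i c[i][j] * t[i] + alpha * p[j]
        t_next = [
            (1 - alpha) * sum(c[i][j] * t[i] for i in range(n)) + alpha * p[j]
            for j in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(t_next, t)) < tol:
            return t_next
        t = t_next
    return t


# Three peers: 0 and 1 trust each other, 2 trusts both, nobody trusts 2.
ratings = [
    [0, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
]
scores = eigentrust(ratings, pretrusted={0})
print([round(s, 3) for s in scores])
```

In this toy network the pre-trusted peer ranks highest and the peer that receives no ratings ends with a score of zero, which is the property that makes such schemes resistant to self-promotion by unknown peers.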

Trends & Observations

Self-evolving frameworks surge: Three papers this week focus on self-evolving agents with explicit memory hierarchies, up from two last week. OPD-Evolver’s 11.5% improvement over ReasoningBank signals that slow-fast co-evolution architectures are maturing.
Benchmark shift to complex real-world tasks: Seven new benchmarks target multi-step reasoning, creative generation, and hidden intent discovery—moving beyond single-turn tasks to scenarios requiring sustained agent reasoning.
Trajectory analysis at scale: 138k agent trajectories analyzed this week reveal model-specific behavioral patterns. This quantitative approach to agent behavior analysis is emerging as a standard evaluation tool.
Agent memory architectures diversify: Four distinct memory approaches emerged: hierarchical (MemSlides), experience-based (FinAcumen), long-term (MemTrace), and evolution-tracking (SEAGym). There is no consensus architecture yet; the field is exploring multiple design points.
Long-horizon reasoning gains attention: Multiple benchmarks (EComAgentBench, LongWebBench, GameCraft-Bench) specifically target tasks requiring 10+ steps, indicating the field’s shift from single-turn to sustained reasoning.

Week-over-Week Summary

Metric	This Week	Last Week	Δ
Papers tracked	35	31	+4
Agent-related papers	28	28	0
Agent percentage	80%	90%	-10pp
Avg trend score (agent)	8.1	7.4	+0.7
Multi-agent papers	6	4	+2
Self-evolving agents	3	2	+1
Benchmarks introduced	7	7	0
Trend score ≥ 9	9 papers	4 papers	+5

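Notable change: The average trend score for agent papers rose 0.7 points week-over-week (from 7.4 to 8.1), driven by three papers with a trend score of 10 (OPD-Evolver, Distributed Agent Networks, GameCraft-Bench). Because the trend score combines novelty, citation potential, benchmark contributions, and community engagement, this points to a higher concentration of high-interest work in the agent space.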

Full Paper List

Title	Authors	Category	Published	Score	ArXiv	HF
OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation	NUS Research Team	cs.AI	2026-06-17	10	2606.17628	Link
Distributed General-Purpose Agent Networks: Architecture, Key Mechanisms, and Prototypes	Multiple authors	cs.AI	2026-06-17	10	2606.17368	—
GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?	CUHKSZ	cs.AI	2026-06-17	10	2606.17861	Link
Beyond Parallel Sampling: Diverse Query Initialization for Agentic Search	CMU Research Team	cs.AI	2026-06-17	9	2606.17209	—
When Rules Learn: A Self-Evolving Agent for Legal Case Retrieval	Multiple authors	cs.AI	2026-06-17	9	2606.17220	—
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning	Multiple authors	cs.AI	2026-06-17	9	2606.17682	—
SEAGym: An Evaluation Environment for Self-Evolving LLM Agents	Multiple authors	cs.AI	2026-06-17	9	2606.17546	—
EComAgentBench: Benchmarking Shopping Agents on Long-Horizon Tasks with Distributed Hidden Intent	Multiple authors	cs.AI	2026-06-17	9	2606.17698	—
Dissecting Model Behavior through Agent Trajectories	Multiple authors	cs.AI	2026-06-17	9	2606.17454	—
Scaling Enterprise Agent Routing: Degradation, Diagnosis, and Recovery	Multiple authors	cs.AI	2026-06-17	8	2606.17519	—
Can LLMs Be CEOs? Benchmarking Strategic Resource Reallocation with Multi-Role Agent Simulation	Multiple authors	cs.AI	2026-06-17	8	2606.17459	—
Environment-Grounded Automated Prompt Optimization for LLM Game Agents	Multiple authors	cs.AI	2026-06-17	8	2606.17838	—
MemSlides: A Hierarchical Memory Driven Agent Framework for Personalized Slide Generation	Ye Jin, Yangyang Xu, Jun Zhu, Yibo Yang	cs.CL	2026-06-17	8	2606.17162	—
MapSatisfyBench: Benchmarking Satisfaction-Aware Map Agents	Multiple authors	cs.AI	2026-06-17	8	2606.17453	—
Closing the Feedback Loop: From Experience Extraction to Insight Governance in Verbal Reinforcement Learning	Multiple authors	cs.AI	2026-06-17	8	2606.17591	—
StepGuard: Guarding Web Navigation via Single-Step Calibration	Multiple authors	cs.AI	2026-06-17	8	2606.17871	—
FinAcumen: Financial Multimodal Reasoning via Self-Evolving Experience Memory Harness	Multiple authors	cs.AI	2026-06-17	8	2606.17642	—
Beyond Domains: Reusing Web Skills via Transferable Interaction Patterns	Multiple authors	cs.AI	2026-06-17	8	2606.17645	—
Surrogate Assisted Pedestrian Protection Design via a Foundation Model Orchestrated Workflow	Multiple authors	cs.AI	2026-06-17	7	2606.17577	—
DecoSearch: Complexity-Aware Routing and Plan-Level Repair for Text-to-SQL	Multiple authors	cs.AI	2026-06-17	7	2606.17821	—
LLM-as-Judge in Education: A Curriculum-Grounded Marking Pipeline	Multiple authors	cs.AI	2026-06-17	7	2606.17507	—
AIPatient Arena: EHR-grounded evaluation of LLMs in clinical consultation workflows	Multiple authors	cs.AI	2026-06-17	7	2606.17474	—
From Parasocial Scripts to Dyadic Persistence in Autonomous AI-Agent Communities	Mohammadsadegh Abolhasani et al.	cs.CL	2026-06-17	7	2606.17174	—
LecturaAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning	Multiple authors	cs.CL	2026-06-15	7	2606.16428	Link
DeepInsight: A Unified Evaluation Infrastructure Across the Physical AI Stack	Multiple authors	cs.AI	2026-06-17	7	2606.17574	—
FlowRAG: Synergizing Explicit Reasoning via Frequency-Aware Multi-Granularity Graph Flow	Multiple authors	cs.AI	2026-06-17	7	2606.17856	—
MODE-RAG: Manifold Outlier Diagnosis and Energy-based Retrieval-Augmented Generation Evaluation	Multiple authors	cs.CL	2026-06-17	7	2606.17449	—
Brick-DICL: Dynamic In-Context Learning for Automated Brick Schema Classification	Multiple authors	cs.AI	2026-06-17	7	2606.17637	—
LongWebBench: Evaluating Structural and Functional Webpage Generation in Long-Horizon Settings	Multiple authors	cs.AI	2026-06-17	7	2606.17727	—
MemTrace: Probing What Final Accuracy Misses in Long-Term Memory	Multiple authors	cs.AI	2026-06-17	7	2606.17328	—
PromptMN: Pseudo Prompting Language	Enkhzol Dovdon	cs.CL	2026-06-17	6	2606.17164	—
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling	19 authors	cs.AI	2026-06-17	6	2606.18023	Link
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients	NVIDIA	cs.AI	2026-06-17	6	2606.18216	Link
ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining	CUHK	cs.AI	2026-06-17	6	2606.17200	Link

Sources

ArXiv cs.AI RSS Feed — ArXiv, 2026-06-18
ArXiv cs.CL RSS Feed — ArXiv, 2026-06-18
HuggingFace Daily Papers — HuggingFace, 2026-06-17
