
ArXiv cs.AI Weekly Tracker - Week of May 28, 2026

Self-improving agent frameworks emerge with MUSE-Autoskill and SIA. FinHarness advances domain-specific safety, and QUACK advances multimodal agent auditing. An ICML 2026 paper identifies an RLHF vulnerability.

AgentScout · 8 min read
#arxiv #ai-papers #agents #weekly-tracker #self-improving-agents #safety-harness #rlhf #multimodal

Data Overview

  • Snapshot Week: 2026-05-22 to 2026-05-28
  • Tracker: ArXiv cs.AI Weekly Papers Tracker (view all historical snapshots: /tech/ai-agents/data/?tracker=arxiv-cs-ai-weekly)
  • Update Frequency: Weekly
  • Primary Sources: ArXiv cs.CL API, Brave Search

Key Facts

  • Who: 18 agent-related papers, drawn mainly from ArXiv cs.CL (the primary category this week because API rate limits blocked cs.AI and cs.MA)
  • What: Self-improving agent frameworks (MUSE-Autoskill, SIA) dominate; domain-specific safety harnesses emerge (FinHarness, ENPMR-Bench); RLHF vulnerability identified
  • When: Week of May 22-28, 2026
  • Impact: 36% agent-relevance rate; 3 multi-agent papers; avg trend score 5.2 for agent papers vs 2.4 overall

Methodology

Papers are collected weekly from ArXiv API queries targeting cs.CL, cs.AI, and related categories. Agent-related papers are identified through keyword matching on titles and abstracts (agent, multi-agent, autonomous, tool use, planning, reasoning). Trend Scores (1-10) are assigned based on relevance to core agent research themes, novelty of approach, and potential impact.
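The keyword filter described above can be sketched as follows. This is a minimal illustration, not the tracker's actual code: the keyword list comes from the paragraph above, while the matching rules (case-insensitive, whole-word, optional plural "s", hyphen or space treated alike) and the function names are assumptions.

```python
import re

# Keywords from the methodology paragraph. The matching rules below are
# assumptions made for illustration, not the tracker's documented behavior.
AGENT_KEYWORDS = ["agent", "multi-agent", "autonomous", "tool use", "planning", "reasoning"]


def _keyword_pattern(keyword: str) -> re.Pattern:
    # Split on spaces/hyphens so "tool use" also matches "tool-use",
    # and "multi-agent" also matches "multi agent".
    parts = re.split(r"[\s-]+", keyword)
    body = r"[\s-]?".join(re.escape(p) for p in parts)
    # Whole-word match with an optional plural "s", case-insensitive.
    return re.compile(rf"\b{body}s?\b", re.IGNORECASE)


PATTERNS = {kw: _keyword_pattern(kw) for kw in AGENT_KEYWORDS}


def matched_keywords(title: str, abstract: str = "") -> list[str]:
    """Return the keywords found in the title or abstract."""
    text = f"{title} {abstract}"
    return [kw for kw, pattern in PATTERNS.items() if pattern.search(text)]


def is_agent_related(title: str, abstract: str = "") -> bool:
    """A paper counts as agent-related if at least one keyword matches."""
    return bool(matched_keywords(title, abstract))
```

For example, `matched_keywords("FinHarness: An Inline Lifecycle Safety Harness for Finance LLM Agents")` returns `["agent"]`. A title with no keyword can still qualify through its abstract, which is why both fields are searched.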

This snapshot reflects papers submitted or updated during the week of May 22-28, 2026. Collection was limited by API rate limits on cs.AI and cs.MA categories; Brave Search provided supplementary coverage.
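As a rough sketch of the per-category collection step, the snippet below builds query URLs for the public arXiv API (`export.arxiv.org/api/query`) using its `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters. The page size, the fixed delay, and the helper name `build_query_url` are assumptions; the tracker's real queries are not shown in this report.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # public arXiv API endpoint
CATEGORIES = ["cs.CL", "cs.AI", "cs.MA"]  # categories named in this snapshot
DELAY_SECONDS = 3.0  # assumed pause between requests to stay within rate limits


def build_query_url(category: str, start: int = 0, max_results: int = 100) -> str:
    """Build a newest-first listing URL for one arXiv category (hypothetical helper)."""
    params = {
        "search_query": f"cat:{category}",
        "start": start,
        "max_results": max_results,  # assumed page size
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


urls = [build_query_url(category) for category in CATEGORIES]
```

A collector would fetch each URL in turn, sleep for `DELAY_SECONDS` between calls, and record which categories failed so that partial coverage like this week's can be reported.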

This Week’s Data

Top Papers by Trend Score

| Rank | Title | ArXiv ID | Trend | Key Innovation |
|---|---|---|---|---|
| 1 | MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation | 2605.27366 | 8 | Unified skill lifecycle management (creation, memory, evaluation, refinement) |
| 2 | SIA: Self Improving AI with Harness & Weight Updates | 2605.27276 | 8 | Combined harness and weight updates for autonomous improvement (56.6% LawBench gain) |
| 3 | FinHarness: An Inline Lifecycle Safety Harness for Finance LLM Agents | 2605.27333 | 7 | Finance-specific safety harness (ASR reduced from 38.3% to 15.0%) |
| 4 | QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents | 2605.27068 | 7 | Multimodal agent auditing (15.1% spatial hallucination, 50%+ baseless accusations) |
| 5 | Alignment Tampering: How RLHF Is Exploited to Optimize Misaligned Biases | 2605.27355 | 6 | RLHF vulnerability when LLM influences preference datasets (ICML 2026) |
| 6 | ENPMR-Bench: Benchmarking Proactive Memory Retrieval for Emotional Support Agents | 2605.27240 | 5 | Maslow-grounded proactive memory retrieval for emotional support |

| ArXiv ID | Title | Category | Trend | Focus |
|---|---|---|---|---|
| 2605.27366 | MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation | cs.AI | 8 | Self-improving, skill lifecycle |
| 2605.27276 | SIA: Self Improving AI with Harness & Weight Updates | cs.AI | 8 | Self-improving, meta-agent |
| 2605.27333 | FinHarness: An Inline Lifecycle Safety Harness for Finance LLM Agents | cs.CL | 7 | Safety harness, finance |
| 2605.27068 | QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents | cs.CL | 7 | Multimodal, auditing, hallucination |
| 2605.27355 | Alignment Tampering: How RLHF Is Exploited to Optimize Misaligned Biases | cs.AI | 6 | RLHF, alignment, safety |
| 2605.27240 | ENPMR-Bench: Benchmarking Proactive Memory Retrieval for Emotional Support Agents | cs.CL | 5 | Emotional support, memory |
| 2605.27294 | Separating Semantic Competition from Context Length in RAG Reading | cs.CL | 3 | RAG, retrieval |
| 2605.27220 | The Coverage Illusion: From Pre-retrieval Routing Failure to Post-retrieval Cascades in a Production RAG System | cs.CL | 3 | RAG, production |
| 2605.27156 | LitSeg: Narrative-Aware Document Segmentation for Literary RAG | cs.CL | 4 | RAG, segmentation |
| 2605.27110 | BAIT: Boundary-Guided Disclosure Escalation via Self-Conditioned Reasoning | cs.CR | 4 | Jailbreak, agent safety |
| 2605.27030 | Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling | cs.CL | 4 | Reasoning, test-time scaling |
| 2605.27190 | Learning When to Think While Listening in Large Audio-Language Models | cs.CL | 4 | Audio-language, reasoning |
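As a quick sanity check, the mean and maximum trend score of the twelve papers listed above can be computed directly. This sketch covers only the listed rows; whether these rows are the full set behind the reported agent-paper average is an assumption, since the snapshot counts 18 agent-related papers.

```python
from statistics import mean

# Trend scores copied from the twelve rows listed above (ArXiv ID -> score).
LISTED_TREND_SCORES = {
    "2605.27366": 8,
    "2605.27276": 8,
    "2605.27333": 7,
    "2605.27068": 7,
    "2605.27355": 6,
    "2605.27240": 5,
    "2605.27294": 3,
    "2605.27220": 3,
    "2605.27156": 4,
    "2605.27110": 4,
    "2605.27030": 4,
    "2605.27190": 4,
}

listed_average = mean(LISTED_TREND_SCORES.values())  # 63 / 12
listed_top = max(LISTED_TREND_SCORES.values())
```

The listed rows average 5.25 with a top score of 8, which is in line with the figures reported for agent papers this week.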

Week-over-Week Summary

| Metric | This Week | Last Week | Change |
|---|---|---|---|
| Total papers collected | 50 | 498 | -448 (-89.9%) |
| Agent-related papers | 18 | 167 | -149 (-89.2%) |
| Multi-agent systems | 3 | 28 | -25 (-89.3%) |
| Avg trend score (agent) | 5.2 | - | N/A |
| Top trend score | 8 | 10 | -2 |

Note: This week’s collection was affected by ArXiv API rate limits (the cs.AI and cs.MA queries were blocked; the cs.CL query succeeded). The roughly 90% drop in total papers reflects partial coverage, not an actual decline in submissions. Full coverage is expected to resume next week.
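The Change column can be recomputed from the two weekly counts. The sketch below assumes the percentage is the change relative to last week's count, rounded to one decimal place; the function name is illustrative. With conventional rounding the total-papers row comes out at -90.0% where the table shows -89.9%, which suggests the table truncates rather than rounds.

```python
def week_over_week(this_week: int, last_week: int) -> tuple[int, float]:
    """Return (absolute change, percent change relative to last week)."""
    delta = this_week - last_week
    percent = round(100.0 * delta / last_week, 1)
    return delta, percent


# (this week, last week) counts copied from the table above.
ROWS = {
    "Total papers collected": (50, 498),
    "Agent-related papers": (18, 167),
    "Multi-agent systems": (3, 28),
}

changes = {name: week_over_week(*counts) for name, counts in ROWS.items()}
```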

Ecosystem Metrics

| Category | Count | Percentage |
|---|---|---|
| Total papers scanned | 50 | 100% |
| Agent-related papers | 18 | 36.0% |
| Multi-agent systems | 3 | 6.0% |
| Safety-related | 4 | 8.0% |
| RAG-related | 4 | 8.0% |
| Reasoning | 5 | 10.0% |
| Multimodal | 2 | 4.0% |

Category Distribution

| Primary Category | Count | Percentage |
|---|---|---|
| cs.CL | 32 | 64.0% |
| cs.AI | 8 | 16.0% |
| cs.LG | 6 | 12.0% |
| cs.CV | 2 | 4.0% |
| cs.CR | 1 | 2.0% |
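The percentages above are each category's share of the 50 papers scanned. A minimal sketch of that calculation, with a completeness check added as an illustration (the variable names are assumptions):

```python
TOTAL_SCANNED = 50  # "Total papers scanned" from Ecosystem Metrics

# Counts copied from the category table above.
CATEGORY_COUNTS = {"cs.CL": 32, "cs.AI": 8, "cs.LG": 6, "cs.CV": 2, "cs.CR": 1}

# Share of all scanned papers, as a percentage with one decimal place.
shares = {
    category: round(100.0 * count / TOTAL_SCANNED, 1)
    for category, count in CATEGORY_COUNTS.items()
}

listed_total = sum(CATEGORY_COUNTS.values())
unlisted = TOTAL_SCANNED - listed_total  # papers not assigned to a listed category
```

The listed categories sum to 49 of the 50 scanned papers (98%), so one paper falls outside the five primary categories shown.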

Topic Clusters

| Cluster | Papers | Keywords |
|---|---|---|
| Self-improving agents | 3 | skill lifecycle, weight updates, meta-agent |
| Safety harnesses | 4 | finance, emotional support, jailbreak, RLHF |
| RAG optimization | 4 | retrieval, segmentation, coverage, competition |
| Multimodal auditing | 2 | hallucination, social deduction |
| Reasoning control | 2 | test-time scaling, audio-language |

  • Self-Improving Agent Architectures Converge: MUSE-Autoskill and SIA independently arrive at similar architectures that combine skill lifecycle management with weight and harness updates. This convergence suggests a canonical approach to agent autonomy may be emerging. MUSE-Autoskill supplies the lifecycle framework (creation, memory, evaluation, refinement), while SIA reports a 56.6% gain on LawBench from combined harness and weight updates.

  • Domain-Specific Safety Harnesses Emerge: Generic agent safety frameworks are giving way to specialized solutions. FinHarness targets finance LLM agents with a three-module architecture (Query Monitor, Tool Monitor, Cascade), reducing attack success rate from 38.3% to 15.0% while preserving benign approval rates. ENPMR-Bench addresses emotional support agents with Maslow-grounded proactive memory retrieval. This specialization trend indicates one-size-fits-all safety is insufficient for production deployment.

  • RLHF Structural Vulnerability Identified: Alignment Tampering (accepted to ICML 2026) demonstrates a fundamental flaw in RLHF’s preference feedback loop—when LLM outputs influence preference datasets, the training process can amplify misaligned biases rather than correct them. This is not an implementation bug but a structural vulnerability in the RLHF paradigm itself.

  • Multimodal Hallucination Persists: QUACK reveals that top vision-language models hallucinate 15.1% of spatial claims and make more than 50% of accusations without grounded evidence in social deduction scenarios. The framework introduces a systematic auditing methodology, but the results underscore that multimodal grounding remains unsolved.

  • RAG Understanding Deepens: Three RAG papers advance retrieval understanding from different angles: The Coverage Illusion exposes the gap between synthetic and real query distributions; LitSeg brings narrative-aware segmentation to literary works; Semantic Competition isolates retrieval interference from context length effects. Collectively, these suggest production RAG systems have systematic blind spots.

🔺 Scout Intel: What Others Missed

Confidence: high | Novelty Score: 65/100

This week’s ArXiv snapshot reveals three emerging patterns that mainstream coverage overlooks:

1. Self-improving agent convergence: MUSE-Autoskill and SIA independently arrive at similar architectures—skill lifecycle combined with weight/harness updates—suggesting this may become the canonical approach for agent autonomy. The convergence across research teams (Huawei, independent researchers) indicates a theoretical attractor rather than coincidence.

2. Domain-specific safety harnesses: FinHarness (finance) and ENPMR-Bench (emotional support) demonstrate that general agent safety frameworks need domain-specific tuning to achieve practical protection rates. FinHarness’s 38.3% to 15.0% ASR reduction comes from finance-specific modules (Query Monitor, Tool Monitor, Cascade) that understand transaction semantics. Generic safety benchmarks systematically overestimate protection for vertical applications.

3. RLHF structural vulnerability: Alignment Tampering (ICML 2026) shows RLHF’s preference feedback loop can be exploited—a fundamental flaw that may require rethinking post-training alignment. The paper demonstrates that when LLM outputs influence preference datasets, the optimization process amplifies undesired behaviors rather than correcting them. This has implications for all frontier model providers currently relying on RLHF as their primary alignment mechanism.

Key Implication: Teams deploying agents in production should evaluate domain-specific safety harnesses rather than relying on generic safety benchmarks. FinHarness’s 23.3-percentage-point ASR reduction (from 38.3% to 15.0%) suggests that generic safety measurement is currently misaligned with deployment reality.
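The ASR figures can be read two ways: as an absolute drop in percentage points or as a relative reduction. The sketch below computes both from the reported 38.3% and 15.0%; the function name is illustrative, and the relative figure is derived here rather than quoted from the paper.

```python
def asr_reduction(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative reduction in percent)."""
    absolute_pp = round(before_pct - after_pct, 1)
    relative_pct = round(100.0 * (before_pct - after_pct) / before_pct, 1)
    return absolute_pp, relative_pct


# Attack success rate before and after FinHarness, as reported above.
absolute_pp, relative_pct = asr_reduction(38.3, 15.0)
```

This gives a 23.3-percentage-point drop, or roughly a 60.8% relative reduction in attack success rate.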

Sources

Collection Note: This snapshot achieved partial coverage due to ArXiv API rate limits affecting cs.AI and cs.MA categories. Full coverage is expected to resume in next week’s snapshot.
