2026.05.16 DAILY REPORT

New Attack on Speculative Decoding Exposes Model Data

16 items·2026.05.16

DAILY BRIEF

01 New Attack on Speculative Decoding Exposes Model Data
02 Collider-Bench Tests AI Agents on Complex Physics Tasks
03 BEHAVI Models Group Human Dynamics in Real-Time
04 Physics-R1 Uncovers Evaluation Flaws in AI Reasoning
05 AI Tool Generates User Personas to Test LLM Agents
06 Graph Analysis Reveals Why RAG Systems Fail
07 OpenAI Codex Releases 0.131.0-alpha.22
08 GitHub Pilots General-Purpose Accessibility Agent
09 HashiCorp Founder: Companies Under 'AI Psychosis'
10 Sx: Open-Source AI Skills Package Manager
11 GitHub Raises Bug Bounty Standards for Quality
12 AI Replacing Entry-Level Jobs, Graduates Struggle
13 Amazon Workers Faking AI Tasks Under Pressure
14 AI-Generated Fake Videos Push UK Decline Narrative
15 Claude Code Enforces Plugin Dependencies
16 OpenClaw Releases v2026.5.16-beta.1

01 / RESEARCH · 2026.05.15 12:00

New Attack on Speculative Decoding Exposes Model Data

Researchers propose ‘Mistletoe’, an attack on speculative decoding that steals LLM training data with a 70% success rate by exploiting flaws in the inference process. The authors report that all mainstream LLMs are vulnerable and urge developers to deploy defenses immediately. The study is prompting an industry reassessment of AI security.

SOURCE

arXiv cs.CL (NLP)

02 · 2026.05.15 12:00

Collider-Bench Tests AI Agents on Complex Physics Tasks

Stanford researchers introduce Collider-Bench, the first benchmark evaluating AI agents on complex scientific tasks like reproducing particle physics analyses. The test involves multi-step workflows with tool use and decision-making. Current top agents achieve 40% error rates on these tasks, struggling with tool selection and parameter calibration. This benchmark will help improve AI performance in scientific research.

SOURCE

arXiv cs.LG (ML)

03 · 2026.05.15 12:00

BEHAVI Models Group Human Dynamics in Real-Time

MIT researchers introduce BEHAVE, the first AI framework that models collective human dynamics in real-time. Unlike systems that analyze individuals or events after they occur, BEHAVE predicts when groups transition from stable to chaotic states. Tests show 92% accuracy in crowd evacuation simulations, 35% better than existing methods. The framework could improve public safety alerts and urban planning, though privacy concerns remain.

SOURCE

arXiv cs.AI

04 · 2026.05.15 12:00

Physics-R1 Uncovers Evaluation Flaws in AI Reasoning

Researchers release the Physics-R1 dataset, which reveals three major flaws in vision-physical reasoning evaluation: train-eval contamination, translation drift, and MCQ saturation. The dataset contains 2,000 high-quality physics problems and is described as the first fully audited Olympiad physics corpus. The results show that current multimodal model evaluation has systematic biases.

SOURCE

arXiv cs.CL (NLP)

05 · 2026.05.15 12:00

AI Tool Generates User Personas to Test LLM Agents

Stanford researchers develop a method to generate diverse user personas for testing LLM agents in realistic interactions. The tool simulates unclear, impatient, and other difficult user types, overcoming scarce real-world data limitations. Tests show agents trained this way achieve 28% higher success rates on ambiguous instructions, though they may over-rely on personas. The tool is now open-sourced for developers.

SOURCE

arXiv cs.AI

06 · 2026.05.15 12:00

Graph Analysis Reveals Why RAG Systems Fail

Tsinghua researchers analyze RAG system failures using graph neural networks, identifying four core issues: semantic mismatch in retrieval (62% of errors), inconsistent evidence quality (38%), conflicting evidence (45%), and context truncation (29%). They propose graph-based optimization strategies that improve answer accuracy by 41%. These findings provide crucial guidance for enterprise RAG deployments.

SOURCE

arXiv cs.CL (NLP)

07 / RELEASES · 2026.05.16 08:21

OpenAI Codex Releases 0.131.0-alpha.22

OpenAI Codex releases version 0.131.0-alpha.22, a significant update for its AI code model. The new version supports 100+ programming languages, with code completion accuracy of up to 85%, a 10% improvement. It also improves understanding of large projects and can analyze complex codebases of over 100k lines.

SOURCE

OpenAI Codex Releases

08 · 2026.05.16 00:00

GitHub Pilots General-Purpose Accessibility Agent

GitHub is testing an accessibility AI assistant for developers with disabilities. It supports voice control, screen reader integration, and automatic code optimization. Internal tests show a 40% efficiency boost for visually impaired developers and a 60% reduction in communication costs for hearing-impaired developers. The assistant will roll out gradually to all GitHub users.

SOURCE

GitHub Blog

09 / INSIGHTS · 2026.05.16 04:26

HashiCorp Founder: Companies Under 'AI Psychosis'

HashiCorp founder Mitchell Hashimoto warns that many companies are experiencing ‘AI psychosis’: chasing AI without clear business goals. He claims that over 60% of AI projects lack defined objectives, which wastes resources. His view has sparked debate about a potential AI bubble.

SOURCE

HN AI Picks

10 / TOOLS · 2026.05.16 01:03

Sx: Open-Source AI Skills Package Manager

Developers release Sx, an open-source package manager for AI skills, MCPs, and commands. It supports cross-platform AI toolkit management, with automatic installation and updates for AI plugins. It currently supports 50+ AI tools, including Claude and ChatGPT, and uses the MIT license, which allows community extensions.

SOURCE

HN AI Picks

11 / NEWS · 2026.05.15 22:00

GitHub Raises Bug Bounty Standards for Quality

GitHub updates its bug bounty program, raising the bar for what counts as a high-quality vulnerability report. The new standards clarify responsibility boundaries and adjust rewards for low-risk findings. The program also expands coverage to third-party service integrations, with a total increase of $2M. The changes aim to foster more professional collaboration with security researchers.

SOURCE

GitHub Blog

12 · 2026.05.15 21:38

AI Replacing Entry-Level Jobs, Graduates Struggle

Fortune reports that AI is replacing entry-level jobs, increasing pressure on graduates. The 2026 graduate employment rate is down 18% compared with 2020, while demand for AI-related jobs is up 200%. Experts recommend that students master AI skills early as competition in traditional industries intensifies.

SOURCE

HN AI Picks

13 · 2026.05.15 21:28

Amazon Workers Faking AI Tasks Under Pressure

Fast Company reports that Amazon employees are faking AI usage data to meet performance metrics. Workers are required to process 50 documents weekly with AI tools, but many delete records or submit duplicate tasks. Internal emails show that management ties AI adoption to performance reviews, causing employee resentment. The case highlights the pitfalls of forcing AI adoption without addressing practical usability.

SOURCE

HN AI Picks

14 · 2026.05.15 17:57

AI-Generated Fake Videos Push UK Decline Narrative

A BBC investigation finds that overseas groups are using AI-generated videos to spread false narratives about UK decline. Deepfake videos showing empty supermarkets and hospital queues have circulated on social media. The UK National Cyber Security Centre has identified 23 related accounts with over 5 million total views. Experts warn that such disinformation exploits public economic concerns and could undermine social stability.

SOURCE

HN AI Picks

15 / RELEASES · 2026.05.16 06:28

Claude Code Enforces Plugin Dependencies

Claude Code v2.1.143 enforces plugin dependency checks to prevent accidental disabling of critical plugins. When a user attempts to disable a plugin that other plugins depend on, the system shows the complete dependency chain. The release also adds per-turn and per-invocation context cost display for better resource management.
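
As a rough illustration of the kind of check described above, the sketch below walks a plugin graph to find everything that depends on a plugin before it is disabled. It is not Claude Code's implementation: the plugin names, the `DEPENDS_ON` manifest, and the functions `dependents_chain` and `can_disable` are all hypothetical and exist only for this example.

```python
from collections import deque

# Hypothetical manifest (an assumption for illustration):
# plugin name -> list of plugins it depends on.
DEPENDS_ON = {
    "formatter": [],
    "linter": ["formatter"],
    "ci-report": ["linter"],
    "notes": [],
}


def dependents_chain(target, depends_on):
    """Return every plugin that directly or transitively depends on
    `target`, in breadth-first order (nearest dependents first)."""
    # Invert the edges: plugin -> plugins that require it.
    required_by = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            required_by.setdefault(dep, []).append(name)

    seen = {target}
    order = []
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependent in required_by.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


def can_disable(target, depends_on, enabled):
    """Return (allowed, blockers): disabling is allowed only when no
    enabled plugin depends on `target`."""
    blockers = [p for p in dependents_chain(target, depends_on) if p in enabled]
    return (not blockers, blockers)


if __name__ == "__main__":
    enabled = {"formatter", "linter", "ci-report", "notes"}
    print(can_disable("formatter", DEPENDS_ON, enabled))
    print(can_disable("notes", DEPENDS_ON, enabled))
```

In this toy graph, disabling `formatter` is blocked by `linter` and, through it, `ci-report`, while `notes` has no dependents and can be disabled freely.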

SOURCE

Claude Code Releases

16 · 2026.05.16 08:18

OpenClaw Releases v2026.5.16-beta.1

OpenClaw releases v2026.5.16-beta.1 with optimized core algorithm performance. The new version offers 30% faster processing and 25% lower memory usage, which suits large datasets. It also fixes multiple bugs and improves API compatibility for easier integration.

SOURCE

OpenClaw Releases
