2026.05.16 DAILY REPORT

New Attack on Speculative Decoding Exposes Model Data

16 items · 2026.05.16
01 / RESEARCH · 2026.05.15 12:00

New Attack on Speculative Decoding Exposes Model Data

Researchers propose ‘Mistletoe’, an attack on speculative decoding that extracts LLM training data with a reported 70% success rate by exploiting flaws in the inference process. All mainstream LLMs are reported to be vulnerable, and developers are urged to deploy defenses promptly. The study is prompting an industry reassessment of AI security.

02 · 2026.05.15 12:00

Collider-Bench Tests AI Agents on Complex Physics Tasks

Stanford researchers introduce Collider-Bench, the first benchmark evaluating AI agents on complex scientific tasks like reproducing particle physics analyses. The test involves multi-step workflows with tool use and decision-making. Current top agents show a 40% error rate on these tasks, struggling with tool selection and parameter calibration. The benchmark is intended to help improve AI performance in scientific research.

03 · 2026.05.15 12:00

BEHAVE Models Group Human Dynamics in Real-Time

MIT researchers introduce BEHAVE, the first AI framework that models collective human dynamics in real-time. Unlike systems that analyze individuals or events after they occur, BEHAVE predicts when groups transition from stable to chaotic states. Tests show 92% accuracy in crowd evacuation simulations, 35% better than existing methods. The framework could improve public safety alerts and urban planning, though privacy concerns remain.

04 · 2026.05.15 12:00

Physics-R1 Uncovers Evaluation Flaws in AI Reasoning

Researchers release the Physics-R1 dataset, which reveals three major flaws in the evaluation of vision-physical reasoning: train-eval contamination, translation drift, and MCQ saturation. It contains 2,000 high-quality physics problems and is described as the first fully audited Olympiad physics corpus. The results indicate that current multimodal model evaluation has systematic biases.

05 · 2026.05.15 12:00

AI Tool Generates User Personas to Test LLM Agents

Stanford researchers develop a method to generate diverse user personas for testing LLM agents in realistic interactions. The tool simulates unclear, impatient, and other difficult user types, overcoming scarce real-world data limitations. Tests show agents trained this way achieve 28% higher success rates on ambiguous instructions, though they may over-rely on personas. The tool is now open-sourced for developers.

06 · 2026.05.15 12:00

Graph Analysis Reveals Why RAG Systems Fail

Tsinghua researchers analyze RAG system failures using graph neural networks, identifying four core issues: semantic mismatch in retrieval (62% of errors), inconsistent evidence quality (38%), conflicting evidence (45%), and context truncation (29%). They propose graph-based optimization strategies that improve answer accuracy by 41%. These findings provide crucial guidance for enterprise RAG deployments.

07 / RELEASES · 2026.05.16 08:21

OpenAI Codex Releases 0.131.0-alpha.22

OpenAI Codex releases version 0.131.0-alpha.22, a significant update for its AI code model. The new version supports 100+ programming languages with code completion accuracy up to 85%, a 10% improvement. It also has an enhanced ability to understand large projects and can analyze complex codebases of over 100k lines.

08 · 2026.05.16 00:00

GitHub Pilots General-Purpose Accessibility Agent

GitHub is testing an accessibility AI assistant for developers with disabilities. It supports voice control, screen reader integration, and automatic code optimization. Internal tests show a 40% efficiency boost for visually impaired developers and a 60% reduction in communication costs for hearing-impaired developers. It will roll out gradually to all GitHub users.

09 / INSIGHTS · 2026.05.16 04:26

HashiCorp Founder: Companies Under 'AI Psychosis'

HashiCorp founder Mitchell Hashimoto warns that many companies are experiencing ‘AI psychosis’: chasing AI without clear business goals. He claims over 60% of AI projects lack defined objectives, causing resource waste. His view has sparked debate about whether an AI bubble is forming.

10 / TOOLS · 2026.05.16 01:03

Sx: Open-Source AI Skills Package Manager

Developers release Sx, an open-source package manager for AI skills, MCPs, and commands. It supports cross-platform AI toolkit management with auto-install and update features for AI plugins. It currently supports 50+ AI tools, including Claude and ChatGPT, and is released under the MIT license, which allows community extensions.

11 / NEWS · 2026.05.15 22:00

GitHub Raises Bug Bounty Standards for Quality

GitHub updates its bug bounty program, raising the bar for what counts as a high-quality vulnerability report. The new standards clarify responsibility boundaries and adjust rewards for low-risk findings. The program also expands coverage to third-party service integrations, with a $2M total increase. It aims to foster more professional security research collaboration.

12 · 2026.05.15 21:38

AI Replacing Entry-Level Jobs, Graduates Struggle

Fortune reports that AI is replacing entry-level jobs, increasing pressure on graduates. The 2026 graduate employment rate is down 18% versus 2020, while demand for AI-related jobs is up 200%. Experts recommend that students master AI skills early as competition in traditional industries intensifies.

13 · 2026.05.15 21:28

Amazon Workers Faking AI Tasks Under Pressure

Fast Company reports that Amazon employees are faking AI usage data to meet performance metrics. Workers are required to process 50 documents weekly with AI tools, but many delete records or submit duplicate tasks. Internal emails show management ties AI adoption to performance reviews, causing employee resentment. This highlights the pitfalls of forcing AI adoption without addressing practical usability.

14 · 2026.05.15 17:57

AI-Generated Fake Videos Push UK Decline Narrative

A BBC investigation finds overseas groups using AI-generated videos to spread false narratives about UK decline. Deepfake videos showing empty supermarkets and hospital queues have circulated on social media. The UK National Cyber Security Centre has identified 23 related accounts with over 5 million total views. Experts warn such disinformation exploits public economic concerns and could undermine social stability.

15 / RELEASES · 2026.05.16 06:28

Claude Code Enforces Plugin Dependencies

Claude Code v2.1.143 enforces plugin dependency checks to prevent accidental disabling of critical plugins. When users attempt to disable a plugin that others depend on, the system shows the complete dependency chain. The release also adds per-turn and per-invocation context cost display for better resource management.
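The dependency check described above can be illustrated with a minimal sketch. This is not Claude Code's implementation; the function name `blocking_chains`, the `depends_on` mapping, and the plugin names are all hypothetical and chosen only for illustration. The idea is to reverse the "depends on" edges and walk outward from the plugin being disabled, collecting every chain of plugins that would break.

```python
from collections import defaultdict

# Hypothetical plugin registry: plugin name -> plugins it depends on.
depends_on = {
    "parser": [],
    "linter": ["parser"],
    "formatter": ["parser"],
    "ci-report": ["linter"],
}


def blocking_chains(depends_on, target):
    """Return every chain of plugins that transitively depends on `target`.

    Each chain starts at `target` and ends at a plugin nothing else
    depends on. An empty result means `target` is safe to disable.
    """
    # Reverse the edges: plugin -> plugins that directly depend on it.
    dependents = defaultdict(list)
    for plugin, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(plugin)

    chains = []

    def walk(node, path):
        # Skip plugins already on the path so a cyclic registry terminates.
        children = [p for p in sorted(dependents[node]) if p not in path]
        if not children and len(path) > 1:
            chains.append(path)
        for child in children:
            walk(child, path + [child])

    walk(target, [target])
    return chains


for chain in blocking_chains(depends_on, "parser"):
    print(" <- ".join(chain))
```

With this sample registry, disabling `parser` would be refused and both chains reported, while `ci-report` has no dependents and could be disabled freely.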

16 · 2026.05.16 08:18

OpenClaw Releases v2026.5.16-beta.1

OpenClaw releases v2026.5.16-beta.1 with optimized core algorithm performance. The new version offers 30% faster processing and 25% lower memory usage, which suits large datasets. It also fixes multiple bugs and improves API compatibility for easier integration.
