2026.04.03 DAILY REPORT

Moonlake: Multimodal Interactive World Models

20 items · 2026.04.03

DAILY BRIEF

01 Moonlake: Multimodal Interactive World Models
02 Self-Routing: Parameter-Free Expert Routing from Hidden States
03 ParetoBandit: Budget-Paced Adaptive Routing for LLM Serving
04 Study: Tool-Integrated AI Reliability Bottleneck Identified
05 New Method: Trajectory Sampling for Agent Interactions
06 Decision-Centric Design for LLM Systems
07 The Silicon Mirror: Dynamic Behavioral Gating for Anti-Sycophancy
08 Gemma 4: Most Capable Open Models to Date
09 Gemini API Adds Flex and Priority Tiers
10 Vercel Launches Filesystem Snapshots in Sandbox
11 OpenAI Acquires TBPN to Accelerate AI Conversations
12 Smart PMs Using AI Coding to Cut Design Delays
13 AI Video Generation Costs $65 Per User Monthly
14 Claude Code v2.1.91 Adds MCP Tool Persistence
15 Waldium Launches AI-Human Blog Platform
16 Google Vids Now Offers Free AI Video Generation
17 Age Verification AI Group Secretly Backed by OpenAI
18 Men Abandon TV for YouTube as AI and Social Media Fatigue Rise
19 OpenAI Introduces Flexible Codex Pricing for Teams
20 Leaked Claude Code Files Reveal New Editor

01 / RESEARCH · 2026.04.03 01:55

Moonlake: Multimodal Interactive World Models

Chris Manning and Fan-yun Sun introduce Moonlake, a novel approach using game engine agents to build multiplayer, interactive world models that run long-term with multi-agent collaboration.

SOURCE

Latent Space

02 · 2026.04.02 12:00

Self-Routing: Parameter-Free Expert Routing from Hidden States

Researchers introduce Self-Routing, a parameter-free method for dynamically assigning experts in MoE models by analyzing hidden states directly. It eliminates the need for traditional routers, achieving 15% faster inference and 20% reduced memory usage while maintaining performance. This approach significantly lowers deployment costs and enhances efficiency for real-time LLM services.

SOURCE

arXiv cs.AI
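
To make the Self-Routing idea concrete, here is a minimal sketch of routing without a learned router. It rests on an assumption that is not taken from the paper: the first `num_experts` coordinates of a token's hidden state are read directly as expert logits. The function name `self_route` and the `top_k` parameter are hypothetical, chosen only for illustration.

```python
import math


def self_route(hidden, num_experts, top_k=2):
    """Pick experts for one token without a learned router.

    Assumption (illustrative only): the first `num_experts` entries of the
    hidden state serve as routing logits, so no router weights are needed.
    Returns a list of (expert_index, gate_weight) pairs.
    """
    logits = hidden[:num_experts]
    # Indices of the top_k largest logits, highest first.
    top = sorted(range(num_experts), key=lambda i: logits[i], reverse=True)[:top_k]
    # Numerically stable softmax over the selected logits only.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# A 6-dimensional hidden state routed across 4 hypothetical experts.
routes = self_route([0.5, 2.0, -1.0, 1.0, 9.9, 9.9], num_experts=4, top_k=2)
```

Because the logits come from the hidden state itself, the sketch has no router matrix to store or multiply, which is the kind of saving the summary describes.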

03 · 2026.04.02 12:00

ParetoBandit: Budget-Paced Adaptive Routing for LLM Serving

ParetoBandit optimizes multi-model LLM serving by dynamically balancing cost and quality. The system adapts routing in real-time to handle pricing changes, quality regressions, and new model launches across a 530x cost range. Tests show 40% cost reduction while maintaining 95% accuracy, making it ideal for cloud providers and enterprise AI platforms.

SOURCE

arXiv cs.LG (ML)
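
The cost and quality trade-off behind ParetoBandit can be illustrated with a small budget-paced bandit. This is not the paper's algorithm: it is a generic UCB router with a cost penalty that rises when spending runs above a per-request budget and falls when it runs below. The model names, costs, quality scores, and the `BudgetPacedRouter` class are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    cost: float              # hypothetical cost per request
    pulls: int = 0
    mean_quality: float = 0.0


class BudgetPacedRouter:
    def __init__(self, models, budget_per_request, step=0.05):
        self.models = models
        self.budget = budget_per_request
        self.step = step      # how quickly the cost penalty reacts
        self.penalty = 0.0    # current price of one unit of cost
        self.spent = 0.0
        self.requests = 0

    def choose(self):
        # Try every model once before comparing scores.
        for m in self.models:
            if m.pulls == 0:
                return m
        total = sum(m.pulls for m in self.models)

        def score(m):
            # UCB exploration bonus minus the budget-driven cost penalty.
            bonus = math.sqrt(2 * math.log(total) / m.pulls)
            return m.mean_quality + bonus - self.penalty * m.cost

        return max(self.models, key=score)

    def update(self, model, quality):
        model.pulls += 1
        model.mean_quality += (quality - model.mean_quality) / model.pulls
        self.spent += model.cost
        self.requests += 1
        # Raise the penalty after an over-budget request, lower it otherwise.
        self.penalty = max(0.0, self.penalty + self.step * (model.cost - self.budget))


def simulate(rounds=2000):
    # Fixed, made-up quality scores stand in for real feedback.
    true_quality = {"cheap": 0.7, "premium": 0.9}
    models = [Model("cheap", 1.0), Model("premium", 10.0)]
    router = BudgetPacedRouter(models, budget_per_request=3.0)
    for _ in range(rounds):
        m = router.choose()
        router.update(m, true_quality[m.name])
    return router
```

Calling `simulate()` returns a router whose `spent / requests` can be compared against the budget. The penalty plays the role of a pacing signal: the premium model is still used, but only as often as the budget allows.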

04 · 2026.04.02 12:00

Study: Tool-Integrated AI Reliability Bottleneck Identified

arXiv study reveals tool-integrated AI failures stem from both tool-use accuracy and intrinsic tool accuracy. Proposes a community-driven framework to improve reliability.

SOURCE

arXiv cs.AI

05 · 2026.04.02 12:00

New Method: Trajectory Sampling for Agent Interactions

arXiv paper proposes trajectory sampling and triage for agentic interactions, improving post-deployment efficiency in multi-step LLM systems.

SOURCE

arXiv cs.AI

06 · 2026.04.02 12:00

Decision-Centric Design for LLM Systems

arXiv paper argues LLM systems need explicit decision-making for controls like answering or tool-calling, improving transparency and maintainability.

SOURCE

arXiv cs.AI

07 · 2026.04.02 12:00

The Silicon Mirror: Dynamic Behavioral Gating for Anti-Sycophancy

The Silicon Mirror framework detects user manipulation tactics in real-time and adjusts AI behavior accordingly. It identifies coercive questions and false authorities, forcing LLMs to maintain objectivity even when users attempt persuasion. Tests show 89% accuracy in rejecting inappropriate requests while keeping normal interactions fluid. Ideal for customer service and educational applications.

SOURCE

arXiv cs.AI

08 / RELEASES · 2026.04.03 00:00

Gemma 4: Most Capable Open Models to Date

Google DeepMind released Gemma 4, the most capable open model to date, purpose-built for advanced reasoning and agentic workflows with improved inference capabilities over previous versions.

SOURCE

Google DeepMind Blog

09 · 2026.04.03 00:00

Gemini API Adds Flex and Priority Tiers

Google introduced two new inference tiers, Flex and Priority, to the Gemini API to help developers balance cost and latency, providing more flexible API usage options.

SOURCE

Google AI Blog

10 · 2026.04.03 00:00

Vercel Launches Filesystem Snapshots in Sandbox

Vercel added filesystem snapshots to Sandbox, enabling teams to capture and restore complete sandbox filesystem states. Initial engineering focus ensured reliability, preventing snapshot failures or data loss.

SOURCE

Vercel Blog

11 / NEWS · 2026.04.02 18:30

OpenAI Acquires TBPN to Accelerate AI Conversations

OpenAI acquired TBPN to accelerate global AI conversations and support independent media, expanding engagement with developers, businesses, and the broader tech community.

SOURCE

OpenAI News

12 / INSIGHTS · 2026.04.03 03:00

Smart PMs Using AI Coding to Cut Design Delays

Product managers use AI coding and vibe coding techniques to address design handoff bottlenecks, saving days per feature and weeks per quarter in product development cycles.

SOURCE

Replit Blog

13 · 2026.04.03 03:49

AI Video Generation Costs $65 Per User Monthly

OpenAI charges $20 monthly for Sora while incurring $65 in compute costs per user, making AI video generation an extremely expensive business model and sparking discussions about AI cost-effectiveness.

SOURCE

HN AI Picks

14 / RELEASES · 2026.04.03 07:45

Claude Code v2.1.91 Adds MCP Tool Persistence

Claude Code v2.1.91 added MCP tool result persistence via a _meta annotation, which supports larger results such as database schemas without truncation. It also added a setting to disable inline shell execution.

SOURCE

Claude Code Releases

15 / NEWS · 2026.04.03 00:00

Waldium Launches AI-Human Blog Platform

YC-backed Waldium launches agentic CMS for businesses, automating content creation and providing MCP server endpoints for AI agents to query directly.

SOURCE

Vercel Blog

16 / RELEASES · 2026.04.03 00:00

Google Vids Now Offers Free AI Video Generation

Google Vids integrates Lyria 3 and Veo 3.1 to offer high-quality video creation, editing, and sharing at no cost to users.

SOURCE

Google AI Blog

17 / NEWS · 2026.04.03 00:30

Age Verification AI Group Secretly Backed by OpenAI

A group pushing AI age verification is secretly backed by OpenAI, with the undisclosed funding aimed at influencing content regulation policies.

SOURCE

HN AI Picks

18 · 2026.04.02 17:43

Men Abandon TV for YouTube as AI and Social Media Fatigue Rise

Ofcom reports a significant shift as more men move from traditional TV to YouTube, driven by growing AI use and social media fatigue. The study finds that 35% of adult men now watch YouTube for more than 10 hours a week, alongside a 12% decrease in TV viewing. AI companion usage has surged by 27%, with 40% of 18-34-year-olds using AI for social companionship. Additionally, 28% of adults use social media for side hustles, including 35% of men. The data points to mounting pressure on traditional media from emerging platforms.

SOURCE

HN AI Picks

19 / RELEASES · 2026.04.02 18:00

OpenAI Introduces Flexible Codex Pricing for Teams

OpenAI introduces pay-as-you-go pricing for ChatGPT Business/Enterprise, offering teams flexible adoption and scaling without upfront costs.

SOURCE

OpenAI News

20 / NEWS · 2026.04.02 21:02

Leaked Claude Code Files Reveal New Editor

Leaked Claude Code files reveal a new file-based documentation system featuring a markdown editor and April Fools' functionality, hinting at the team's creative direction.

SOURCE

Ben's Bites
