OpenAI launches Agents SDK 2.0 with sandbox execution
OpenAI has updated the Agents SDK with native sandbox execution and a model-native harness, enabling developers to build secure, long-running agents that work across files and tools. The update significantly improves agent security and stability.
Cursor boosts PLG signups 5% with Vercel microfrontends
Cursor unified four web properties and roughly 100 routes under cursor.com using Vercel microfrontends, increasing PLG signups by 5% through experimentation. The company also expanded localization from 4 to 11 languages, enabling its growth team to iterate rapidly on product design and rebranding.
Gemini 3.1 Flash TTS: Next-gen expressive AI speech
Google DeepMind’s newest audio model, Gemini 3.1 Flash TTS, introduces granular audio tags for precise control over AI speech generation, enabling more expressive audio content with improved naturalness and finer control accuracy.
Replit Animation: Create videos in 10 minutes
Replit launched Animation, enabling users to create motion-style videos in minutes rather than the days required with professional tools like After Effects. The feature has generated over 10M organic impressions for Replit, letting anyone produce high-quality animations through simple commands.
Inside VAKRA: AI agent reasoning and failure analysis
Hugging Face dives deep into the VAKRA model, uncovering key findings about AI agent behavior in reasoning, tool usage, and failure patterns. The analysis reveals how agent systems behave in real-world applications and identifies limitations for building more reliable AI agents.
Agentic Systems Fail on Long-Horizon Tasks: Causes Identified
New arXiv research finds that LLM agents excel at short- and mid-horizon tasks but frequently break down on long-horizon tasks requiring extended, interdependent action sequences. The study identifies where and why these agentic systems fail, providing crucial insights for improvement.
Spatial Atlas Introduces Compute-Grounded Reasoning for Agent Benchmarks
New arXiv paper introduces compute-grounded reasoning (CGR) for spatial-aware research agents, resolving sub-problems deterministically before LLM generation. Spatial Atlas implements this paradigm, excelling in spatial-aware research benchmarks.
Order-Aware Hypergraph RAG Enhances LLM Retrieval Accuracy
New arXiv research introduces order-aware hypergraph RAG to address the unordered evidence problem in existing RAG systems. The method preserves knowledge order and context through hypergraph structures, significantly improving LLM retrieval accuracy and relevance.
Self-Distillation Zero: Binary Rewards to Dense Supervision
Stanford researchers propose Self-Distillation Zero, solving sparse supervision in AI training. The method enables models to convert binary rewards into dense supervision through self-revision, improving performance. Experiments show it outperforms existing methods in math reasoning and code generation, offering new approaches for training efficient AI models.
LoSA Speeds Up Diffusion Models by 40%
MIT researchers introduce LoSA, improving the efficiency of diffusion language models (DLMs) for long text generation. The method uses locality-aware sparse attention to boost speed by 40% while maintaining generation quality comparable to autoregressive models. DLMs generate tokens in any order, offering a promising alternative to traditional left-to-right generation.
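LoSA's exact attention pattern is not detailed in the summary above, but the general idea of locality-aware sparse attention can be sketched with a simple sliding-window mask: each token attends only to neighbors within a fixed window, cutting the quadratic cost of full attention. The window size, the bidirectional (non-causal) mask, and the NumPy implementation below are illustrative assumptions, not LoSA's actual design.

```python
import numpy as np

def local_attention_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: token i may attend to token j only when |i - j| <= window."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def sparse_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, window: int = 2) -> np.ndarray:
    """Scaled dot-product attention restricted to a local window (illustrative sketch)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    mask = local_attention_mask(q.shape[0], window)
    scores = np.where(mask, scores, -np.inf)   # block out-of-window positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because diffusion LMs decode bidirectionally rather than left-to-right, the mask here is symmetric around each position; a causal variant would additionally zero out future positions.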
Libretto Makes AI Browser Automations Deterministic
Libretto, a Skill+CLI tool, makes AI browser automation scripts deterministic by shifting from runtime prompts to generating real scripts that are predictable and debuggable, ideal for developers needing stable automation.
GitHub Copilot CLI: Personal command center
A GitHub engineer demonstrates building a personal organization command center using Copilot CLI. The post details the development process of this productivity tool and how AI assisted in creating workflow automation to boost developer efficiency.
HoloTab Launches as AI Browser Companion
HCompany launched HoloTab, an AI browser companion that analyzes web content in real time to provide smart assistance, including content summaries, information extraction, and interactive suggestions, enhancing browsing efficiency.
Does AI-assisted cognition endanger humanity?
Hacker News discussion thread with 217 points and 166 comments exploring potential risks of AI-assisted cognition to human development. The post sparks deep debate about the boundaries of human-AI collaboration.
GitHub Updates Developer Policies on Liability and Transparency
GitHub updated its developer policies focusing on intermediary liability, copyright, and transparency requirements. The company refreshed its Transparency Center with full 2025 data and mandated clearer declarations of AI-generated content usage and copyright attribution.
US Court: AI Chat Content Not Covered by Attorney-Client Privilege
In US v. Heppner, the Southern District of New York ruled that AI chat content is not protected by attorney-client privilege, as AI systems are considered third-party services. This precedent highlights legal risks for lawyers using AI tools for sensitive matters.
Allbirds Pivots to AI, Stock Surges 175%
Footwear brand Allbirds announced a pivot to AI, stating it will leverage the technology to develop sustainable materials. The stock surged 175% in a single day, more than doubling its market value. The CEO claims AI will help reduce physical sample production and carbon emissions, marking a traditional consumer brand’s acceleration into AI.
Quiet reflection on work in the AI era
A reflective piece urging us to contemplate the essence of human work during AI’s rapid evolution. The post questions how we can redefine creativity and work value in an AI-assisted environment amidst technological noise.
LLM-HYPER: LLM-Based Hypernetworks for Ad Personalization
New arXiv research proposes LLM-HYPER, a framework treating LLMs as hypernetworks to address the cold-start problem of new ads on online platforms. It generates personalized CTR prediction models without relying on historical user feedback data.
Elevated Errors Reported on Claude.ai, API and Claude Code
Claude.ai, its API, and Claude Code services are experiencing elevated errors affecting normal usage. According to the status page, the team is investigating the issue with increased error rates reported on API calls and code generation. Engineers are working on a fix.