LLMs Achieve Autonomous Optical Discovery
An arXiv paper presents what it describes as the first LLM agent system to achieve end-to-end autonomous scientific discovery on real optical platforms. By continuously revising its questions, methods, and claims, the system mimics the human research process and reports breakthroughs in optical experiments. The study points to LLMs’ potential to replace traditional human-led research in high-value scientific domains.
AutoSP Enables Long-Context LLM Training via Compiler Parallelism
The arXiv paper AutoSP proposes a compiler-based sequence parallelism method for long-context LLM training. It targets sequences of 100k to 1M+ tokens and overcomes limitations in existing training libraries, making long-document processing more efficient. The research offers a new path for improving large language model performance.
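As a rough illustration of what sequence parallelism means in general, the sketch below splits one long token sequence into contiguous shards, one per worker, so that no single device has to hold the whole sequence. This is not AutoSP's compiler pass; the function name `shard_sequence` and its parameters are hypothetical, and a real system would also have to exchange attention state between shards.

```python
from typing import List, Sequence


def shard_sequence(tokens: Sequence[int], num_workers: int) -> List[List[int]]:
    """Split one long token sequence into contiguous, near-equal shards.

    Hypothetical helper for illustration only; not part of AutoSP.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    base, extra = divmod(len(tokens), num_workers)
    shards: List[List[int]] = []
    start = 0
    for rank in range(num_workers):
        # The first `extra` workers each take one additional token.
        size = base + (1 if rank < extra else 0)
        shards.append(list(tokens[start:start + size]))
        start += size
    return shards


if __name__ == "__main__":
    shards = shard_sequence(list(range(10)), num_workers=4)
    print([len(s) for s in shards])  # shard sizes per worker
```

Contiguous shards keep neighboring tokens on the same worker, which is the usual starting point before any cross-worker communication is added.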
Web2BigTable: Bi-Level Multi-Agent System for Web-Scale Search
Cornell researchers developed Web2BigTable, a bi-level multi-agent LLM system that simultaneously handles deep reasoning on single targets and structured aggregation across multiple heterogeneous sources. It addresses two critical challenges in web search: deep reasoning and broad information extraction. The system outperformed existing methods by an average of 37% across 20 test tasks. This technology can enhance search engines, knowledge graph construction, and large-scale data analysis.
Health Coaching Agents: Dual-Stream Memory Detects Clinical Discrepancies
Researchers developed a dual-stream memory and reconciliation architecture to detect clinical discrepancies in health coaching agents. The system addresses the challenge of reconciling two imperfect information sources in long-term healthcare management: patient electronic records and agent memory. The architecture includes two memory streams: raw fact storage and context-aware retrieval. Experiments showed a 42% reduction in clinical errors on medical datasets, significantly improving decision accuracy. This technology can enhance long-term health monitoring systems.
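The reconciliation step can be pictured with a minimal sketch: compare what the patient record says with what the agent remembers, and flag every field where the two disagree. The data model here (flat string dictionaries, the `Discrepancy` class, the `find_discrepancies` function) is an assumption made for illustration and is not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Discrepancy:
    key: str
    record_value: Optional[str]  # value in the patient record, if any
    memory_value: Optional[str]  # value in the agent's memory, if any


def find_discrepancies(
    record: Dict[str, str], memory: Dict[str, str]
) -> List[Discrepancy]:
    """Flag every key whose value differs between the two sources.

    A key present in only one source is also reported, with None on
    the side where it is missing.
    """
    found: List[Discrepancy] = []
    for key in sorted(set(record) | set(memory)):
        r, m = record.get(key), memory.get(key)
        if r != m:
            found.append(Discrepancy(key, r, m))
    return found


if __name__ == "__main__":
    record = {"medication": "metformin 500mg", "allergy": "penicillin"}
    memory = {
        "medication": "metformin 1000mg",
        "allergy": "penicillin",
        "exercise": "daily walk",
    }
    for d in find_discrepancies(record, memory):
        print(d)
```

Reporting both values rather than picking a winner reflects the premise that both sources are imperfect; deciding which one to trust is a separate step.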
TRUST Framework for Decentralized AI Services
An arXiv paper introduces the TRUST v0.1 framework for decentralized AI services. The framework addresses reliability verification challenges for Large Reasoning Models and Multi-Agent Systems in high-stakes domains, using a distributed architecture to avoid single points of failure, attack vulnerabilities, and bias risks.
Code Agents Break Free, Claude Dominates Creativity
AI coding agents are beginning to exceed their original design constraints, while Claude maintains its lead in creative work. The current quiet period in tech news has prompted reflection on AI assistant development: code generation tools are expanding autonomously, while creative tasks remain dominated by Claude.
AI Water Use Below Public Perception
Research shows AI’s actual water consumption is significantly below public perception. A California Water Blog analysis finds that media focus on data center water use overlooks higher-consuming industries, exacerbating misconceptions about AI’s environmental impact. The analysis provides a data foundation for more objective assessment of AI’s environmental footprint.
Uber Spends Entire 2026 AI Budget on Claude Code in Four Months
Uber exhausted its entire $100M 2026 AI budget within four months by fully deploying Claude Code across its development teams. The company integrated Anthropic’s coding assistant to automatically debug and fix code errors, freezing other AI projects. Claude Code specializes in programming assistance and can detect and repair code defects. This move shows enterprises are rapidly adopting AI in software development, though relying on a single tool poses potential technical risks.
Adam Launches AI CAD Tool for Engineers
The Adam team has launched an AI CAD tool for professional mechanical engineers. Unlike standard text-to-3D tools, Adam provides transparent workflows and editable STL outputs, addressing engineers’ trust issues with ‘black box’ generation tools. The team previously presented text-to-CAD experiments on HN twice.
Loopsy: Cross-Machine Communication for Terminals and AI Agents
A developer released Loopsy, a tool enabling communication between terminals and AI agents across different machines. Initially designed for file transfer between MacBooks, it now supports command execution and AI agent collaboration. Users can coordinate multiple devices over a local network, such as running a coding agent on one machine while handling other tasks. Loopsy features customizable protocols and is suitable for various development scenarios, improving resource utilization and efficiency.
Risk-Sensitive Bandits: Memory Retrieval for LLM Coding Agents
Researchers proposed a risk-sensitive contextual bandit algorithm to optimize memory retrieval in LLM-based coding agents. The solution addresses the question of when to retrieve information from external memory, since current agents over-retrieve irrelevant data. The algorithm uses risk-aware mechanisms to retrieve memory only when it is highly relevant to the current failure. Experiments showed a 31% improvement in fix success rates on software engineering tasks with reduced computational overhead. This technology can enhance code debugging tools and intelligent development environments.
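To make the idea of a risk-sensitive retrieval decision concrete, here is a minimal sketch that scores each action by its average reward minus a penalty on reward variability, a common risk-sensitive criterion. It is not the paper's algorithm: the class `RiskSensitiveGate`, the discrete context labels, the two action names, and the reward values are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

ACTIONS = ("skip", "retrieve")


class RiskSensitiveGate:
    """Choose the action with the best mean-minus-std score per context."""

    def __init__(self, risk_aversion: float = 1.0) -> None:
        self.risk_aversion = risk_aversion
        # (context, action) -> [count, mean, M2], kept with Welford's method
        self.stats: Dict[Tuple[str, str], List[float]] = defaultdict(
            lambda: [0, 0.0, 0.0]
        )

    def update(self, context: str, action: str, reward: float) -> None:
        s = self.stats[(context, action)]
        s[0] += 1
        delta = reward - s[1]
        s[1] += delta / s[0]
        s[2] += delta * (reward - s[1])

    def score(self, context: str, action: str) -> float:
        n, mean, m2 = self.stats[(context, action)]
        if n < 2:
            return float("inf")  # try under-sampled actions first
        std = math.sqrt(m2 / (n - 1))
        return mean - self.risk_aversion * std

    def choose(self, context: str) -> str:
        return max(ACTIONS, key=lambda a: self.score(context, a))


if __name__ == "__main__":
    gate = RiskSensitiveGate(risk_aversion=1.0)
    for r in (1.0, 0.0, 1.0, 0.0):  # retrieval helps only sometimes
        gate.update("test_failure", "retrieve", r)
    for r in (0.2, 0.2, 0.2, 0.2):  # skipping is modest but steady
        gate.update("test_failure", "skip", r)
    print(gate.choose("test_failure"))
```

With `risk_aversion=0.0` the same observations favor retrieving, because its average reward is higher; the penalty term is what makes the gate hold back when retrieval outcomes are erratic.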
Claude Code Adds Gateway Model Support
Claude Code v2.1.126 adds support for Anthropic-compatible gateway models via the /v1/models endpoint. It also introduces a project purge command that deletes all Claude Code state data, including transcripts, tasks, file history, and config entries.
Vercel Sandbox Now Connects to External Postgres
Vercel Sandbox now supports connecting to external hosted Postgres databases including Neon, Supabase, AWS RDS, Nile, and Prisma Postgres. Developers can enable connections by adding database hosts to their Sandbox’s allowed domains. This update resolves firewall connectivity issues when SNI filtering is enabled.
Spotify Adds 'Verified' Badges for Human Artists
Spotify has launched a new feature that adds ‘Verified’ badges to human artists to distinguish them from AI-generated content. The move aims to address user confusion about AI-created content and to ensure artist authenticity. The feature is now live on Spotify, allowing users to identify purely human-created music through the verified badges.
OpenAI Codex Release 0.129.0-alpha.3
OpenAI Codex has released version 0.129.0-alpha.3, following the 0.129.0-alpha.2 release. The update continues the Codex series’ release cadence and brings new preview features to developers.