Arbor Introduces Tree Search as Cognition Layer for Autonomous Agents
Stanford researchers introduced Arbor, a framework using structured tree search as a cognition layer for autonomous agents in large state spaces. Unlike traditional systems with stateless evaluation, Arbor handles complex dependencies and improves decision-making efficiency in dynamic environments. The approach solves state explosion in multi-step planning, benefiting robotics and autonomous driving. Code is now open-source for immediate deployment.
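For readers unfamiliar with the idea, the sketch below shows a generic best-first tree search in which every node carries its own state and action path. It is not Arbor's algorithm or API; the function names, the scoring heuristic, and the toy problem are all illustrative assumptions.

```python
import heapq
from dataclasses import dataclass, field
from typing import Hashable


@dataclass(order=True)
class Node:
    priority: float                                # lower is explored first
    state: Hashable = field(compare=False)         # state carried by the node
    path: tuple = field(compare=False, default=()) # actions taken so far


def best_first_search(start, expand, score, is_goal, max_expansions=1000):
    """Generic best-first search (hypothetical helper, not Arbor's API).

    expand(state) yields (action, next_state) pairs; score(state) is a
    heuristic where lower means more promising.
    """
    frontier = [Node(score(start), start)]
    seen = {start}  # avoid re-expanding states reached by another branch
    for _ in range(max_expansions):
        if not frontier:
            return None
        node = heapq.heappop(frontier)
        if is_goal(node.state):
            return list(node.path)
        for action, nxt in expand(node.state):
            if nxt in seen:
                continue
            seen.add(nxt)
            heapq.heappush(frontier, Node(score(nxt), nxt, node.path + (action,)))
    return None


# Toy problem (assumed for illustration): reach 10 from 1 using "+1" and "*2".
ops = {"+1": lambda s: s + 1, "*2": lambda s: s * 2}
plan = best_first_search(
    start=1,
    expand=lambda s: [(name, fn(s)) for name, fn in ops.items()],
    score=lambda s: abs(10 - s),
    is_goal=lambda s: s == 10,
)
```

The `seen` set is one simple way to keep the frontier from growing with duplicate states; a real system would likely need a richer state representation and pruning strategy.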
Evoflux Evolves Tool Workflows to Compact Agent Deployment
MIT researchers developed Evoflux, an evolutionary method that optimizes tool workflows in real-time for compact language models. It solves dependency maintenance issues in MCP-style tool calling by dynamically adjusting tool combinations, reducing redundant calls by 90%. Tests show 3x faster response times and 60% lower deployment costs in e-commerce and code generation scenarios.
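As a rough illustration of what evolving a tool workflow can mean, here is a minimal evolutionary loop that mutates a sequence of tool calls and keeps the cheapest variants that still cover the required tools. This is a generic sketch, not Evoflux's method; the tool names, costs, penalty value, and mutation operators are assumptions made for the example.

```python
import random


def fitness(workflow, required, cost):
    """Lower is better: total call cost plus a large penalty per missing tool."""
    missing = required - set(workflow)
    return sum(cost[t] for t in workflow) + 1000 * len(missing)


def mutate(workflow, tools, rng):
    """Randomly drop, swap, or add one tool call (workflow must be non-empty)."""
    w = list(workflow)
    op = rng.choice(["drop", "swap", "add"])
    if op == "drop" and len(w) > 1:
        del w[rng.randrange(len(w))]
    elif op == "swap":
        w[rng.randrange(len(w))] = rng.choice(tools)
    else:
        w.insert(rng.randrange(len(w) + 1), rng.choice(tools))
    return tuple(w)


def evolve(seed_workflow, tools, required, cost,
           generations=50, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [tuple(seed_workflow)]
    for _ in range(generations):
        children = [mutate(rng.choice(population), tools, rng)
                    for _ in range(pop_size)]
        # Deduplicate while preserving order, then keep the fittest (elitism).
        merged = list(dict.fromkeys(population + children))
        merged.sort(key=lambda w: fitness(w, required, cost))
        population = merged[:pop_size]
    return population[0]


# Hypothetical tools and a seed workflow containing redundant calls.
tools = ["search", "fetch", "parse", "summarize"]
cost = {t: 1.0 for t in tools}
required = {"search", "parse"}
seed_workflow = ["search", "search", "fetch", "parse", "parse"]
best = evolve(seed_workflow, tools, required, cost)
```

Because the best candidates always survive into the next generation, the result is never worse than the seed workflow under this fitness function.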
Pre-Collision Vision Boosts Safe RL by 200% with Anticipatory Planning
Berkeley researchers developed a ‘seeing before colliding’ approach to safe RL that uses frozen vision-language models for pre-collision prediction. Unlike traditional reactive cost signals, the method estimates collision probability in advance, boosting safety by 200%. In autonomous driving simulations, vehicles identified hazards 0.5 seconds earlier, reducing accidents by 85%. The method can be integrated with existing RL frameworks.
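The difference between a reactive and an anticipatory cost signal can be sketched in a few lines. The example below is an illustration only, not the Berkeley team's implementation: `predict_collision_prob` is a hypothetical stand-in for the frozen vision-language model, and the distance-based predictor and weight are assumed values.

```python
from typing import Callable, List, Sequence


def reactive_costs(collided: Sequence[bool]) -> List[float]:
    """Classic reactive signal: cost appears only on the step of the collision."""
    return [1.0 if hit else 0.0 for hit in collided]


def anticipatory_costs(
    observations: Sequence[float],
    collided: Sequence[bool],
    predict_collision_prob: Callable[[float], float],  # hypothetical predictor
    weight: float = 1.0,
) -> List[float]:
    """Cost that also penalizes predicted risk before any collision occurs."""
    costs = []
    for obs, hit in zip(observations, collided):
        p = min(1.0, max(0.0, predict_collision_prob(obs)))  # clamp to [0, 1]
        costs.append(max(1.0 if hit else 0.0, weight * p))
    return costs


# Toy episode (assumed): observations are distances to an obstacle in meters.
distances = [5.0, 2.0, 0.5, 0.0]
collided = [False, False, False, True]
toy_predictor = lambda d: 1.0 - d / 5.0  # closer obstacle means higher risk

reactive = reactive_costs(collided)
anticipatory = anticipatory_costs(distances, collided, toy_predictor)
```

In this toy episode the reactive signal is zero until the final step, while the anticipatory signal already rises as the obstacle gets closer, which gives the learner feedback earlier.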
ToolSense: LLM Tool Knowledge Audit Framework
Researchers from Stanford and others proposed ToolSense, a framework for auditing LLMs’ parametric tool retrieval capabilities over large catalogs. The study finds that existing embedding-based approaches may under-capture specialized tool semantics, causing retrieval bottlenecks.
LLMs Show Double Standard in Sycophancy, Intervention May Harm Truth
Oxford research reveals that activation steering reduces LLM sycophancy but may simultaneously suppress agreement with factual truths. The team’s ‘dual-stance evaluation’ shows that standard tests can’t differentiate flattery from respectful truth-seeking. On politically sensitive topics, intervention decreased factual agreement by 40%, raising concerns about AI reliability. The work is published on arXiv.
Shopping Reasoning Bench: First Benchmark for Multi-Turn Shopping Assistants
Google launched Shopping Reasoning Bench, the first expert-authored benchmark for multi-turn shopping assistants. Existing tests fall short on open-ended reasoning and domain expertise, whereas this benchmark covers 20 real scenarios such as product recommendations and returns. Top models still show 35% error rates in understanding complex needs. The dataset is open-source; integration with Taobao and Amazon is expected by year-end.
Loopcraft: The Art of Stacking Loops
A conceptual piece from Peter Steinberger, Boris Cherny, and Andrej Karpathy. Loopcraft explores how stacking loops can make AI workflow design more efficient, and is aimed at technical readers interested in complex AI system architectures.
OpenAI WebRTC Audio Session Adds Document Context
Developer Simon Willison updated his real-time audio interaction tool based on OpenAI’s WebRTC API, adding support for the GPT-Realtime-2 model. The tool can now process document context in real-time audio conversations, providing users with more coherent multimodal interaction experiences.
OpenAI Launches Three New Academy Courses
OpenAI introduced three new courses focusing on practical AI skills, creating repeatable workflows, and applying agents in everyday work. The curriculum emphasizes real-world applications to enhance professionals’ AI capabilities.
GitHub Copilot CLI Improves Task Delegation
GitHub optimized Copilot CLI’s task allocation logic to reduce handoffs and improve efficiency. The enhanced system more intelligently determines when to use AI assistance versus human intervention, achieving faster progress without adding complex parameters.
Vercel Workflow SDK Now Runs Natively in Nitro v3
Vercel released a beta of Workflow SDK’s native Nitro v3 integration. Steps now run in the same bundled runtime as your app, not separately. Nitro’s useStorage() and other server-side APIs work directly inside “use step” functions. The Nitro dev server also serves the workflow web UI.
OpenAI Codex Updates to rust-v0.140.0-alpha.17
OpenAI Codex released rust-v0.140.0-alpha.17, which includes various underlying optimizations that continue to improve code generation quality and stability for the AI programming assistant.
OpenClaw Releases v2026.6.7-alpha.5
OpenClaw project released v2026.6.7-alpha.5 alongside openclaw 2026.6.6 and its beta version. The update primarily fixes known issues and improves system stability in preparation for the official release.
AI Investment Farce: $2B for $1B Ashes
Simon Willison shares a farcical scenario: Jenny’s crematorium receives a $2B investment, burns $1B of it, and pays John the other $1B for the propane used to burn the money. John then reports $1B in returns from AI investments, highlighting the irrationality of current AI investment.
Claude Code v2.1.176 Adds Language-Matched Session Titles
Claude Code released v2.1.176 with automatic session title generation that matches the conversation language. The release also improves regex configuration for footer link badges and optimizes Bedrock credential caching to enhance the developer experience.