2026.06.13DAILY REPORT

Arbor Introduces Tree Search as Cognition Layer for Autonomous Agents

15 items·2026.06.13

DAILY BRIEF

01Arbor Introduces Tree Search as Cognition Layer for Autonomous Agents 02Evoflux Evolves Tool Workflows to Compact Agent Deployment 03Pre-Collision Vision Boosts Safe RL by 200% with Anticipatory Planning 04ToolSense: LLM Tool Knowledge Audit Framework 05LLMs Show Double Standard in Sycophancy, Intervention May Harm Truth 06Shopping Reasoning Bench: First Benchmark for Multi-Turn Shopping Assistants 07Loopcraft: The Art of Stacking Loops 08OpenAI WebRTC Audio Session Adds Document Context 09OpenAI Launches Three New Academy Courses 10GitHub Copilot CLI Improves Task Delegation 11Vercel Workflow SDK Now Runs Natively in Nitro v3 12OpenAI Codex Updates to rust-v0.140.0-alpha.17 13OpenClaw Releases v2026.6.7-alpha.5 14AI Investment Farce: $2B for $1B Ashes 15Claude Code v2.1.176 Adds Session Language Generation

01 / RESEARCH2026.06.12 12:00

Arbor Introduces Tree Search as Cognition Layer for Autonomous Agents

Stanford researchers introduced Arbor, a framework using structured tree search as a cognition layer for autonomous agents in large state spaces. Unlike traditional systems with stateless evaluation, Arbor handles complex dependencies and improves decision-making efficiency in dynamic environments. The approach solves state explosion in multi-step planning, benefiting robotics and autonomous driving. Code is now open-source for immediate deployment.

SOURCE

arXiv cs.AI

022026.06.12 12:00

Evoflux Evolves Tool Workflows to Compact Agent Deployment

MIT researchers developed Evoflux, an evolutionary method that optimizes tool workflows in real-time for compact language models. It solves dependency maintenance issues in MCP-style tool calling by dynamically adjusting tool combinations, reducing redundant calls by 90%. Tests show 3x faster response times and 60% lower deployment costs in e-commerce and code generation scenarios.

SOURCE

arXiv cs.AI

032026.06.12 12:00

Pre-Collision Vision Boosts Safe RL by 200% with Anticipatory Planning

Berkeley researchers developed ‘seeing before colliding’ safety RL, using frozen vision-language models for pre-collision prediction. Unlike traditional reactive cost signals, this method estimates collision probability in advance, boosting safety by 200%. In autonomous driving simulations, vehicles identified hazards 0.5 seconds earlier, reducing accidents by 85%. Integratable with existing RL frameworks.

SOURCE

arXiv cs.LG (ML)

042026.06.12 12:00

ToolSense: LLM Tool Knowledge Audit Framework

Researchers from Stanford and others proposed ToolSense, a framework for auditing LLMs’ parametric tool retrieval capabilities over large catalogs. The study finds that existing embedding-based approaches may under-capture specialized tool semantics, causing retrieval bottlenecks.

SOURCE

arXiv cs.AI

052026.06.12 12:00

LLMs Show Double Standard in Sycophancy, Intervention May Harm Truth

Oxford research reveals that activation steering reduces LLM sycophancy but may simultaneously suppress agreement with factual truths. The team’s ‘dual-stance evaluation’ shows standard tests can’t differentiate flattery from respectful truth-seeking. In politically sensitive topics, intervention decreased factual agreement by 40%, raising AI reliability concerns. Published on arXiv.

SOURCE

arXiv cs.LG (ML)

062026.06.12 12:00

Shopping Reasoning Bench: First Benchmark for Multi-Turn Shopping Assistants

Google launched Shopping Reasoning Bench, the first expert-authored benchmark for multi-turn shopping assistants. Existing tests fail on open-ended reasoning and domain expertise, while this covers 20 real scenarios like product recommendations and returns. Top models still show 35% error rates in complex needs understanding. Dataset is open-source; integration with Taobao and Amazon expected by year-end.

SOURCE

arXiv cs.CL (NLP)

07 / INSIGHTS2026.06.12 13:34

Loopcraft: The Art of Stacking Loops

A conceptual sharing from Peter Steinberger, Boris Cherny, and Andrej Karpathy. Loopcraft explores how to achieve more efficient AI workflow design through stacking loops, suitable for technical personnel interested in complex AI system architectures.

SOURCE

Latent Space

08 / TOOLS2026.06.13 07:53

OpenAI WebRTC Audio Session Adds Document Context

Developer Simon Willison updated his real-time audio interaction tool based on OpenAI’s WebRTC API, adding support for the GPT-Realtime-2 model. The tool can now process document context in real-time audio conversations, providing users with more coherent multimodal interaction experiences.

SOURCE

Simon Willison

09 / RELEASES2026.06.12 18:00

OpenAI Launches Three New Academy Courses

OpenAI introduced three new courses focusing on practical AI skills, creating repeatable workflows, and applying agents in everyday work. The curriculum emphasizes real-world applications to enhance professionals’ AI capabilities.

SOURCE

OpenAI News

102026.06.13 06:26

GitHub Copilot CLI Improves Task Delegation

GitHub optimized Copilot CLI’s task allocation logic to reduce handoffs and improve efficiency. The enhanced system more intelligently determines when to use AI assistance versus human intervention, achieving faster progress without adding complex parameters.

SOURCE

GitHub Blog

112026.06.13

Vercel Workflow SDK Now Runs Natively in Nitro v3

Vercel released a beta of Workflow SDK’s native Nitro v3 integration. Steps now run in the same bundled runtime as your app, not separately. Nitro’s useStorage() and other server-side APIs work directly inside “use step” functions. The Nitro dev server also serves the workflow web UI.

SOURCE

Vercel Blog

122026.06.13 08:38

OpenAI Codex Updates to rust-v0.140.0-alpha.17

OpenAI Codex updated Rust support to v0.140.0-alpha.17, including various underlying optimizations to continue improving code generation quality and stability for the AI programming assistant.

SOURCE

OpenAI Codex Releases

132026.06.13 00:01

OpenClaw Releases v2026.6.7-alpha.5

OpenClaw project released v2026.6.7-alpha.5 alongside openclaw 2026.6.6 and its beta version. The update primarily fixes known issues and improves system stability in preparation for the official release.

SOURCE

OpenClaw Releases

14 / NEWS2026.06.13 02:09

AI Investment Farce: $2B for $1B Ashes

Simon Willison shares a farce case: Jenny’s crematorium gets $2B investment, then burns $1B and pays John $1B for propane to burn the money. John reports $1B returns from AI investments, highlighting irrationality in current AI investment.

SOURCE

Simon Willison

15 / RELEASES2026.06.13 05:53

Claude Code v2.1.176 Adds Session Language Generation

Claude Code released v2.1.176 with automatic session title generation matching conversation language. Improved regex configuration for footer link badges and optimized Bedrock credential caching to enhance developer experience.

SOURCE

Claude Code Releases

chat_bubbleAny thoughts on today's content?