2026.03.05 DAILY REPORT

Every Agent Needs a 'Box': Box Founder

16 items·2026.03.05
01 / NEWS2026.03.05 08:54

Every Agent Needs a 'Box': Box Founder

Box founder Aaron Levie notes strong industry discussion around code-review collaboration, which he reads as growing demand for efficient coding tools, and points readers to related content on improving development workflows.

02 / INSIGHTS2026.03.05 01:34

Anti-Patterns in Agent Engineering

Simon Willison identifies common anti-patterns in agent engineering to avoid, such as submitting unreviewed code to collaborators. He emphasizes developers should never submit pull requests without their own review, calling this a widespread and frustrating practice.

03 / RESEARCH2026.03.04 13:00

Federal Inference: Privacy-Preserving Model Collaboration

An arXiv paper introduces Federal Inference (FI) technology, allowing multiple independently trained models to collaborate during inference without sharing data or parameters. This research addresses privacy issues in distributed inference through encrypted protocols ensuring data security. Experiments show FI reduces data leakage risks by 90% while maintaining model performance, making it suitable for sensitive fields like healthcare and finance.
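The paper's FI protocol relies on encrypted exchanges; stripped of the cryptography, the core idea — models cooperating at inference while sharing only outputs, never data or parameters — can be sketched with hypothetical toy classifiers (not the paper's actual method):

```python
def model_a(x):
    # Toy stand-in for an independently trained classifier:
    # returns a probability distribution over two classes.
    return [0.8, 0.2] if x > 0 else [0.3, 0.7]

def model_b(x):
    # A second, separately trained model; its weights are never shared.
    return [0.6, 0.4] if x > 0 else [0.1, 0.9]

def federal_predict(x, models):
    """Combine output distributions only; no model reveals weights or data."""
    dists = [m(x) for m in models]
    n = len(dists)
    return [sum(d[i] for d in dists) / n for i in range(len(dists[0]))]

print(federal_predict(1.0, [model_a, model_b]))
```

In the actual protocol the exchanged distributions would travel over an encrypted channel, which is where the reported reduction in leakage risk comes from.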

04 / RELEASES2026.03.04 18:00

GPT-5.2 Pro Derives Graviton Amplitudes

OpenAI’s new preprint extends single-amplitude theory to gravitons, with GPT-5.2 Pro helping derive and validate non-zero graviton tree amplitudes in quantum gravity. This breakthrough could provide new tools for unified field theory.

05 / RESEARCH2026.03.04 13:00

Zipf-Preserving Alternative Model for Sequences

An arXiv paper introduces a novel symbolic-sequence alternative model that preserves Zipf's-law distributions and long-range correlations in language and DNA data. The model shows potential applications in natural language processing and bioinformatics.
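As a reminder of what "preserving Zipf's law" means, the rank-frequency check below (run on a hypothetical toy corpus, not the paper's model) verifies that the r-th most frequent token appears roughly f(1)/r times:

```python
from collections import Counter

def zipf_fit(tokens):
    """Rank tokens by frequency and compare observed f(r) to f(1)/r."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    top = freqs[0]
    return [(rank + 1, f, top / (rank + 1)) for rank, f in enumerate(freqs)]

# Toy corpus whose frequencies follow 1/rank exactly: 6, 3, 2
corpus = ["the"] * 6 + ["of"] * 3 + ["to"] * 2
for rank, observed, predicted in zipf_fit(corpus):
    print(rank, observed, predicted)
```

A generative model "preserves" the law if sequences it samples keep this 1/rank shape at scale.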

06 / RESEARCH2026.03.04 13:00

RxnNano Trains Compact LLMs for Chemistry

The arXiv paper ‘RxnNano’ proposes training compact LLMs through curriculum learning for chemical-reaction and retrosynthesis prediction. The model outperforms existing parameter-heavy methods on drug-discovery tasks while using 30% fewer parameters.

07 / TOOLS2026.03.05 02:00

LangSmith Launches CLI and Skill Suite

LangChain released CLI tools and its first skill suites, adding agent tracking, execution analysis, and performance evaluation to the LangSmith ecosystem. These features boosted Claude Code’s performance by 15% on test sets, enhancing the professional capabilities of AI coding agents.

08 / NEWS2026.03.04 11:11

Anthropic $19B Revenue, Qwen Team Shakeup

Anthropic’s annual revenue reached $19 billion, core Qwen members departed, and both Gemini and GPT shipped faster model updates. Market data shows competition among large-model makers intensifying as leading companies accelerate iteration.

09 / INSIGHTS2026.03.04 23:50

Qwen Team Changes Spark Concerns

Simon Willison noted multiple Qwen team members left within 24 hours after releasing the Qwen 3.5 open-source series. He worries the 3.5 models might be the team’s final work, suggesting future uncertainty despite recent open-source contributions.

10 / RESEARCH2026.03.04 13:00

ERI Benchmarks Engineering Model Capabilities

The Engineering Reasoning and Instruction (ERI) benchmark, released on arXiv, is the first engineering instruction dataset covering 9 disciplines (e.g., civil engineering) for training and evaluating capable LLMs and agents. It includes 5,000 complex instructions for engineering-domain evaluation.

11 / RESEARCH2026.03.04 13:00

Meta's NLLB-200 Shows Universal Concept Structures

Meta studied the NLLB-200 model (200 languages) to analyze if neural MT models learn language-agnostic concepts. The model exhibits cross-lingual concept mapping rather than surface-level clustering, achieving 0.78 accuracy in semantic similarity tasks—outperforming language-family baselines. This research informs multilingual model design for better cross-lingual reasoning.
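The 0.78 figure is a semantic-similarity accuracy from the study; the underlying measurement idea — translations should sit nearer in embedding space than unrelated words — can be sketched with hypothetical 3-d concept vectors (real NLLB-200 representations are far higher-dimensional):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical embeddings: a language-agnostic concept space should place
# translation pairs ("dog" / "perro") closer than unrelated word pairs.
emb = {
    "dog_en":   [0.9, 0.1, 0.0],
    "perro_es": [0.85, 0.15, 0.05],
    "tax_en":   [0.0, 0.2, 0.95],
}
print(cosine(emb["dog_en"], emb["perro_es"]))  # high: same concept
print(cosine(emb["dog_en"], emb["tax_en"]))    # low: unrelated concepts
```

Cross-lingual concept mapping, as opposed to surface clustering, means this ordering holds across language families rather than only within them.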

12 / TOOLS2026.03.05 02:00

LangChain Launches Skill Library Boosting AI Coding

LangChain released the first skill library with open-source tools (LangChain, LangGraph, Deep Agents) that enhanced Claude Code’s task completion rate from 29% to 95%. The open library enables developers to quickly build AI agents with professional coding capabilities, with tests showing integrated agents independently completed 85% of code generation tasks.

13 / RESEARCH2026.03.04 13:00

ATPO Algorithm Boosts Medical Dialogue Accuracy

An arXiv paper introduces the ATPO algorithm to optimize information retrieval in multi-turn medical dialogues. It addresses incomplete information in diagnosis by enhancing LLM interaction through tree strategy optimization. ATPO improves accuracy by 15% on medical QA tasks and can be integrated by developers into clinical dialogue systems.

14 / RESEARCH2026.03.04 13:00

MoE Models Need Router Calibration for Efficient Compression

Research shows MoE models face memory bottlenecks during deployment despite efficient scaling. The team introduces three compression paradigms—expert pruning, editing, and sharing—without retraining. Router calibration improves MoE inference speed by 40% while maintaining 92% performance, optimizing resource utilization for large-scale AI model deployment.
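Of the three paradigms named above, expert pruning guided by router statistics is the most direct to sketch. The function below is a hypothetical illustration, not the paper's method: it drops rarely dispatched experts and renormalizes the router's probability mass over the survivors, a simple form of router calibration:

```python
def prune_experts(router_counts, keep_fraction=0.5):
    """Drop the least-used experts based on router dispatch counts, then
    return calibrated routing weights over the surviving experts."""
    ranked = sorted(router_counts, key=router_counts.get, reverse=True)
    keep = ranked[: max(1, int(len(ranked) * keep_fraction))]
    total = sum(router_counts[e] for e in keep)
    # "Calibration": redistribute probability mass across kept experts.
    return {e: router_counts[e] / total for e in keep}

# Hypothetical dispatch counts for a 4-expert layer.
counts = {"e0": 500, "e1": 40, "e2": 300, "e3": 160}
kept = prune_experts(counts)
print(kept)  # keeps the two most-used experts, e0 and e2
```

No retraining is involved, which matches the paper's premise that compression happens purely at deployment time.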

15 / RESEARCH2026.03.04 13:00

SuperLocalMemory: Secure Multi-Agent Memory System

An arXiv paper introduces SuperLocalMemory, a memory system for multi-agent AI that uses architectural isolation and Bayesian trust scoring to defend against memory poisoning. It achieves 40% lower error rates in OWASP ASI06 attack scenarios through adaptive learning, making it suitable for sensitive-data applications.
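The paper pairs isolation with Bayesian trust scoring; the trust side can be sketched as a Beta-Bernoulli update per memory source (hypothetical class and method names, not the paper's API):

```python
class TrustScore:
    """Beta-Bernoulli trust estimate for one memory source: each verified
    write counts as a success, each detected poisoning attempt a failure."""

    def __init__(self, alpha=1.0, beta=1.0):  # Beta(1, 1) = uniform prior
        self.alpha, self.beta = alpha, beta

    def update(self, ok):
        if ok:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def score(self):
        # Posterior mean of the Beta distribution.
        return self.alpha / (self.alpha + self.beta)

source = TrustScore()
for outcome in [True, True, True, False]:  # 3 clean writes, 1 poisoning attempt
    source.update(outcome)
print(round(source.score, 3))
```

A system could then gate memory reads on this score, so a source whose trust decays stops influencing other agents, which is one plausible reading of the paper's "adaptive learning" defense.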

16 / RESEARCH2026.03.04 13:00

Memory Extraction in Diffusion Language Models

Diffusion language models (DLMs) differ from autoregressive models (ARMs) in how they handle memorization. Research shows DLMs resist direct memory extraction but may still expose memorized content during sampling. This finding has implications for model copyright and privacy, and suggests developers should pay closer attention to training-data cleaning.
