ESMFold2 and the Bitter Lesson: Large-Scale Data Drives Protein Prediction Breakthroughs
This article explores ESMFold2’s breakthroughs in protein structure prediction, emphasizing that large-scale datasets outweigh inductive bias. Developed by Alex Rives at BioHub, ESMFold2 leverages a deep learning model to achieve high-precision predictions, advancing programmable biology. Its performance surpasses traditional methods by achieving atomic-level accuracy on the CAMEO benchmark, providing a powerful tool for drug design and synthetic biology. Furthermore, the research indicates that model performance scales continuously with data volume. This validates the ‘bitter lesson’ in the biology domain, demonstrating that computation and massive data drive major progress in protein folding and AI-driven biological research.
Frontier AI Models Score Below 50% on ITBench-AA, First Benchmark for Agentic Enterprise IT
Artificial Analysis and IBM released ITBench-AA, the first benchmark designed to evaluate AI agents on real enterprise IT tasks such as incident troubleshooting, configuration management, and security compliance. All frontier models scored below 50%, revealing significant gaps in agents’ ability to autonomously handle enterprise IT operations. The benchmark gives developers and enterprise IT teams a concrete tool to measure how different models perform in real-world operational scenarios, moving beyond generic benchmark scores.
AI Inference Infrastructure Booms: Fireworks and Baseten Reach Decacorn Status, OpenRouter Raising
AI inference infrastructure companies Fireworks and Baseten have both completed major funding rounds, reaching decacorn valuations, while OpenRouter is currently raising. This reflects intense market demand for AI inference layer infrastructure. As model deployment scales, inference cost and efficiency become critical bottlenecks, driving capital toward inference providers. Developers should watch pricing and service changes across these platforms to choose the best fit for their use cases.
Cisco Partners with OpenAI Codex to Automate Defect Remediation and Scale AI-Native Development
Cisco has integrated OpenAI’s Codex into its enterprise engineering workflows. Specific applications include scaling AI-native development practices, accelerating Cisco AI Defense work, and automating software defect remediation. This represents another enterprise deployment case for Codex. For developers, it signals Codex’s expansion from individual coding assistance into large-scale enterprise engineering pipelines. Enterprise teams can reference Cisco’s approach to embed Codex into code review, defect detection, and automated remediation workflows.
Building Self-Improving Tax Agents with Codex: OpenAI, Thrive, and Crete Automate Tax Filing
OpenAI partnered with Thrive and Crete to demonstrate a self-improving tax agent built on Codex. The agent automates tax filing workflows, continuously improves accuracy, and accelerates processing. The system learns from each tax handling cycle to optimize subsequent performance. For developers, this illustrates how to build self-learning vertical domain agents with Codex: decompose domain tasks into iterable sub-processes and let the model accumulate data and improve with each execution.
YouTube to Automatically Label AI-Generated Videos
YouTube will automatically label AI-generated videos. When content is identified as synthetic or deepfake, the platform will attach an AI-generated label visible to users before viewing. This move follows last year’s requirement for creators to voluntarily disclose AI-generated content, marking a shift from manual reporting to automated detection. For content creators, this means the platform will auto-detect and label AI-generated content even without voluntary disclosure, reducing compliance costs but increasing the difficulty of technical evasion.
DuckDuckGo Visits Surge 28% After Google Pushes AI Search
DuckDuckGo saw a 28% increase in visits following Google’s push for AI search mode. Users seeking alternatives turned to the privacy-focused search engine, which doesn’t force AI features on them. This suggests strong market demand for traditional search experiences.
GEM Uses Geometric Entropy for Optimal LLM Data Mixing
LLM pre-training efficacy depends more on data composition than sheer volume. The paper introduces GEM (Geometric Entropy Mixing), which bypasses flaws in human taxonomies and Euclidean clustering to optimize data mixture. Data engineers can use this to improve pre-training pipelines and reduce trial-and-error costs.
InfoQuant Reshapes Activations for Low-Bit LLM Quantization
Low-bit activation quantization is a major bottleneck in efficient LLM deployment. InfoQuant reshapes activation distributions to fit low-bit uniform quantization better. It reduces memory footprint and inference costs without sacrificing accuracy, enabling smoother LLM deployment on edge devices.
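To see why the shape of the activation distribution matters for low-bit uniform quantization, a small sketch may help. This is not InfoQuant's method, which the summary does not detail. It is a plain symmetric uniform quantizer, with hypothetical helper names and made-up sample values, showing how a single outlier stretches the quantization grid and wipes out the small values.

```python
def quantize_uniform(values, bits):
    """Symmetric uniform quantization followed by dequantization.

    Illustrative only: the grid is scaled to the largest absolute value,
    so every level is evenly spaced between -max and +max.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 levels on each side for 4 bits
    scale = max(abs(v) for v in values) / qmax
    if scale == 0:
        return list(values)
    # Round to the nearest level, clamp to the grid, map back to floats.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mean_abs_error(original, quantized):
    return sum(abs(a - b) for a, b in zip(original, quantized)) / len(original)


# Hypothetical activations: a compact set, then the same set plus one outlier.
well_behaved = [-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0]
with_outlier = well_behaved + [20.0]

q_plain = quantize_uniform(well_behaved, bits=4)
q_outlier = quantize_uniform(with_outlier, bits=4)

# Compare the error on the same seven small values in both cases.
err_plain = mean_abs_error(well_behaved, q_plain)
err_outlier = mean_abs_error(well_behaved, q_outlier[:-1])

print(f"error without outlier: {err_plain:.4f}")
print(f"error with outlier:    {err_outlier:.4f}")
```

With the outlier present, the grid step grows so large that all seven small values round to zero, which is the kind of loss that reshaping the activation distribution is meant to avoid.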
AI Agents Age Too: Reliability Drops the Longer They Run
Current AI agent evaluations focus on day-one performance, missing long-term reliability. The paper introduces Agent Lifespan Engineering, quantifying how long agents remain reliable after deployment. This helps enterprise teams build lifecycle management mechanisms to prevent unpredictable degradation in production.
Strict Constraints Degrade Accuracy in Small Model Outputs
Forcing strict structured outputs (like JSON) on small language models under 3B parameters significantly reduces their factual accuracy. The paper quantifies this ‘constraint tax.’ Developers using local SLMs for tool calls must carefully balance format compliance with logical correctness.
SPEAR: Code-Augmented Prompt Optimization Improves LLM Tasks
SPEAR introduces the code-as-action paradigm into automatic prompt engineering (APE), allowing the optimizer to write and execute code for prompt refinement. This dynamic approach breaks fixed pipeline limits and improves LLM performance on downstream tasks. Developers can use it to build more robust agent workflows.
Self-Verification Distillation: Unlocking Proprietary Synthetic Data Pipelines in Language Models
A recent paper (arXiv:2605.26132v1) introduces Self-Verification Distillation, a novel approach enabling LLMs to autonomously improve their performance during post-training. The study explores whether models can achieve self-evolution without relying on labeled data, external teachers, or tool feedback. Starting solely with unlabeled seed prompts and lacking ground truth answers, the proposed mechanism allows the model to independently generate and verify synthetic data. This process effectively constructs a proprietary data pipeline. Experimental results demonstrate that this method successfully enhances model capabilities, offering a viable pathway for unsupervised self-evolution. Specific performance metrics and further implementation details are provided in the full paper.
Why AI Agents Cannot Maintain Software Systems
The article analyzes why AI agents struggle with real-world software maintenance. Agents lack an understanding of global architecture and long-term evolution logic, which limits them to local patches. Systemic refactoring still requires human engineers. This reminds tech managers to define AI coding tool boundaries clearly.
PostHog Shares Experience: Training In-House AI Models from Scratch
Product analytics platform PostHog detailed their experience training proprietary AI models. The guide covers the full pipeline from data prep to fine-tuning. It serves as a practical reference for tech teams looking to reduce reliance on closed-source APIs and control long-term infrastructure costs.
TechCrunch: Tech CEOs Are Suffering from AI Psychosis
TechCrunch reports that many tech CEOs are displaying "AI psychosis," blindly pursuing AI while ignoring basic product logic. Executives overpromise AI capabilities and divert resources from core features. This serves as a warning to investors and users to evaluate AI products based on practical problem-solving rather than marketing hype.
Claude Code v2.1.153 Adds skipLfs Option and npm Auto-Update Fix Notice
Claude Code v2.1.153 adds a skipLfs option for GitHub/Git plugin sources to skip Git LFS downloads during clone and update, speeding up operations on large repositories. When npm global install can’t auto-update, Claude Code now shows a one-time notice, and the /doctor command lists fixes. Status line commands now receive COLUMNS and LINES environment variables for better terminal display. Developers experiencing npm auto-update failures can follow the notice or /doctor guidance to resolve issues.
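As a rough illustration of what a status line command could do with the new variables, the sketch below reads COLUMNS and trims its output to fit the terminal width. The function name, the fallback width of 80, and the sample status text are assumptions made for this example. How Claude Code invokes the command is not described in this summary, so only the environment variable itself is taken from the release note.

```python
import os


def fit_to_width(text, env=None):
    """Trim a status line so it fits within the terminal width.

    Reads COLUMNS from the environment; falls back to 80 (an assumed
    default) when the variable is missing, non-numeric, or not positive.
    """
    env = os.environ if env is None else env
    try:
        columns = int(env.get("COLUMNS", "80"))
    except ValueError:
        columns = 80
    if columns <= 0:
        columns = 80
    if len(text) <= columns:
        return text
    if columns <= 3:
        return text[:columns]
    # Leave room for a three-character ellipsis marker.
    return text[: columns - 3] + "..."


if __name__ == "__main__":
    # Hypothetical status text; a real command would build its own.
    print(fit_to_width("main | 3 files changed | tests passing"))
```

LINES could be read the same way if a multi-line status display needs to cap its height.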
OpenAI Codex Releases 0.135.0-alpha.2
OpenAI Codex has released version 0.135.0-alpha.2. This version is currently in the alpha testing stage with no detailed changelog provided. Developers should note that alpha versions may be unstable and are advised to test in non-production environments before adoption.
OpenClaw Releases v2026.5.27-beta.1
OpenClaw has released v2026.5.27-beta.1, currently in beta testing. Previous releases include 2026.5.26 stable and v2026.5.27-alpha.1. No detailed changelog is available yet; users should watch for upcoming stable release notes.