2026.05.28 DAILY REPORT

ESMFold2 and the Bitter Lesson: Large-Scale Data Drives Protein Prediction Breakthroughs

19 items · 2026.05.28

DAILY BRIEF

01 ESMFold2 and the Bitter Lesson: Large-Scale Data Drives Protein Prediction Breakthroughs
02 Frontier AI Models Score Below 50% on ITBench-AA, First Benchmark for Agentic Enterprise IT
03 AI Inference Infrastructure Booms: Fireworks and Baseten Reach Decacorn Status, OpenRouter Raising
04 Cisco Partners with OpenAI Codex to Automate Defect Remediation and Scale AI-Native Development
05 Building Self-Improving Tax Agents with Codex: OpenAI, Thrive, and Crete Automate Tax Filing
06 YouTube to Automatically Label AI-Generated Videos
07 DuckDuckGo Visits Surge 28% After Google Pushes AI Search
08 GEM Uses Geometric Entropy for Optimal LLM Data Mixing
09 InfoQuant Reshapes Activations for Low-Bit LLM Quantization
10 AI Agents Age Too: Reliability Drops the Longer They Run
11 Strict Constraints Degrade Accuracy in Small Model Outputs
12 SPEAR: Code-Augmented Prompt Optimization Improves LLM Tasks
13 Self-Verification Distillation: Unlocking Proprietary Synthetic Data Pipelines in Language Models
14 Why AI Agents Cannot Maintain Software Systems
15 PostHog Shares Experience: Training In-House AI Models from Scratch
16 TechCrunch: Tech CEOs Are Suffering from AI Psychosis
17 Claude Code v2.1.153 Adds skipLFS Option and npm Auto-Update Fix Notice
18 OpenAI Codex Releases 0.135.0-alpha.2
19 OpenClaw Releases v2026.5.27-beta.1

01 / NEWS 2026.05.28 01:46

ESMFold2 and the Bitter Lesson: Large-Scale Data Drives Protein Prediction Breakthroughs

This article explores ESMFold2's breakthroughs in protein structure prediction, arguing that large-scale data matters more than built-in inductive bias. Developed by Alex Rives at BioHub, ESMFold2 is a deep learning model that produces high-precision structure predictions, advancing programmable biology. It surpasses traditional methods, reaching atomic-level accuracy on the CAMEO benchmark and providing a powerful tool for drug design and synthetic biology. The research also indicates that model performance keeps improving as data volume grows. This supports the 'bitter lesson' in biology: computation and massive data, rather than hand-designed structure, drive major progress in protein folding and AI-driven biological research.

SOURCE

Latent Space

02 / RELEASES 2026.05.28 01:20

Frontier AI Models Score Below 50% on ITBench-AA, First Benchmark for Agentic Enterprise IT

Artificial Analysis and IBM released ITBench-AA, the first benchmark designed to evaluate AI agents on real enterprise IT tasks such as incident troubleshooting, configuration management, and security compliance. All frontier models scored below 50%, revealing significant gaps in agents’ ability to autonomously handle enterprise IT operations. The benchmark gives developers and enterprise IT teams a concrete tool to measure how different models perform in real-world operational scenarios, moving beyond generic benchmark scores.

SOURCE

Hugging Face Blog

03 / NEWS 2026.05.27 11:33

AI Inference Infrastructure Booms: Fireworks and Baseten Reach Decacorn Status, OpenRouter Raising

AI inference infrastructure companies Fireworks and Baseten have both completed major funding rounds at decacorn valuations, while OpenRouter is currently raising a round. This reflects intense market demand for inference-layer infrastructure. As model deployment scales, inference cost and efficiency become critical bottlenecks, driving capital toward inference providers. Developers should watch pricing and service changes across these platforms to choose the best fit for their use cases.

SOURCE

Latent Space

04 / RELEASES 2026.05.27 19:00

Cisco Partners with OpenAI Codex to Automate Defect Remediation and Scale AI-Native Development

Cisco has integrated OpenAI’s Codex into its enterprise engineering workflows. Specific applications include scaling AI-native development practices, accelerating Cisco AI Defense work, and automating software defect remediation. This represents another enterprise deployment case for Codex. For developers, it signals Codex’s expansion from individual coding assistance into large-scale enterprise engineering pipelines. Enterprise teams can reference Cisco’s approach to embed Codex into code review, defect detection, and automated remediation workflows.

SOURCE

OpenAI News

05 2026.05.27 15:00

Building Self-Improving Tax Agents with Codex: OpenAI, Thrive, and Crete Automate Tax Filing

OpenAI partnered with Thrive and Crete to demonstrate a self-improving tax agent built on Codex. The agent automates tax filing workflows, continuously improves accuracy, and accelerates processing. The system learns from each tax-filing cycle to improve subsequent performance. For developers, this illustrates one way to build self-learning agents for a vertical domain with Codex: decompose domain tasks into repeatable sub-processes and let the model accumulate data and improve with each execution.

SOURCE

OpenAI News

06 / NEWS 2026.05.28 04:00

YouTube to Automatically Label AI-Generated Videos

YouTube will automatically label AI-generated videos. When content is identified as synthetic or deepfake, the platform will attach an AI-generated label that users see before viewing. This follows last year's requirement for creators to voluntarily disclose AI-generated content and marks a shift from manual reporting to automated detection. For content creators, the platform will detect and label AI-generated content even without voluntary disclosure, which lowers the disclosure burden but makes it harder to evade labeling.

SOURCE

HN AI Picks

07 2026.05.28 00:28

DuckDuckGo Visits Surge 28% After Google Pushes AI Search

DuckDuckGo saw a 28% increase in visits following Google's push for AI search mode. Users seeking alternatives turned to the privacy-focused search engine, which doesn't force AI features. This suggests strong demand for a traditional search experience.

SOURCE

HN AI Picks

08 / RESEARCH 2026.05.27 12:00

GEM Uses Geometric Entropy for Optimal LLM Data Mixing

LLM pre-training efficacy depends more on data composition than sheer volume. The paper introduces GEM (Geometric Entropy Mixing), which bypasses flaws in human taxonomies and Euclidean clustering to optimize data mixture. Data engineers can use this to improve pre-training pipelines and reduce trial-and-error costs.

SOURCE

arXiv cs.LG (ML)

09 2026.05.27 12:00

InfoQuant Reshapes Activations for Low-Bit LLM Quantization

Low-bit activation quantization is a major bottleneck in efficient LLM deployment. InfoQuant reshapes activation distributions to fit low-bit uniform quantization better. It reduces memory footprint and inference costs without sacrificing accuracy, enabling smoother LLM deployment on edge devices.
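
To make the bottleneck concrete, here is a minimal sketch of plain symmetric uniform quantization in Python. It is not InfoQuant's method, which this summary does not describe. The function names, the 4-bit setting, and the sample activations are all illustrative assumptions. It shows how a single outlier stretches the quantization grid and inflates error for the remaining values, the kind of distribution mismatch that reshaping activations is meant to reduce.

```python
def quantize_uniform(values, bits=4):
    """Symmetric uniform quantization: round to a signed integer grid, then dequantize."""
    levels = 2 ** (bits - 1) - 1  # 7 positive levels for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / levels if max_abs > 0 else 1.0
    return [round(v / scale) * scale for v in values]


def mean_squared_error(original, restored):
    return sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)


# Hypothetical activations: mostly small values plus one large outlier.
activations = [0.1, -0.2, 0.3, -0.4, 0.5, 20.0]
small = activations[:-1]

# With the outlier present, the step size is 20/7 (about 2.86),
# so every small value rounds to zero.
with_outlier = quantize_uniform(activations)
error_small_with_outlier = mean_squared_error(small, with_outlier[:-1])

# Without the outlier, the step size is 0.5/7 (about 0.07),
# so the small values are preserved much more closely.
without_outlier = quantize_uniform(small)
error_small_alone = mean_squared_error(small, without_outlier)

print(error_small_with_outlier, error_small_alone)
```

The error on the small values is far larger when the outlier sets the scale, which is why the shape of the activation distribution matters so much at low bit widths.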

SOURCE

arXiv cs.LG (ML)

10 2026.05.27 12:00

AI Agents Age Too: Reliability Drops the Longer They Run

Current AI agent evaluations focus on day-one performance, missing long-term reliability. The paper introduces Agent Lifespan Engineering, quantifying how long agents remain reliable after deployment. This helps enterprise teams build lifecycle management mechanisms to prevent unpredictable degradation in production.
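
As a rough illustration of what measuring reliability over time could look like, the sketch below tracks a rolling success rate over a chronological log of task outcomes and reports when it first falls below a threshold. This is a hypothetical metric written for this digest, not the paper's definition of agent lifespan. The function name, window size, threshold, and sample log are all assumptions.

```python
def first_unreliable_task(outcomes, window=5, threshold=0.8):
    """Return the 1-based task count at which the rolling success rate over
    the last `window` tasks first drops below `threshold`, or None if it never does.

    `outcomes` is a chronological list of 1 (success) and 0 (failure).
    """
    for end in range(window, len(outcomes) + 1):
        recent = outcomes[end - window:end]
        if sum(recent) / window < threshold:
            return end
    return None


# Hypothetical production log: reliable at first, then increasingly flaky.
log = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
print(first_unreliable_task(log))
```

A day-one evaluation would only see the leading run of successes in a log like this one; the rolling view is what exposes the later decline.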

SOURCE

arXiv cs.AI

11 2026.05.27 12:00

Strict Constraints Degrade Accuracy in Small Model Outputs

Forcing strict structured outputs (like JSON) on small language models under 3B parameters significantly reduces their factual accuracy. The paper quantifies this ‘constraint tax.’ Developers using local SLMs for tool calls must carefully balance format compliance with logical correctness.
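
One way to keep the two concerns separate is to score format compliance and answer correctness independently. The sketch below is a hypothetical evaluation helper, not the paper's methodology. The `answer` key, the sample outputs, and the function name are assumptions made for illustration.

```python
import json


def score_outputs(raw_outputs, expected_answers):
    """Return (format_compliance_rate, accuracy) for a batch of model outputs.

    An output is format-compliant if it parses as a JSON object with an "answer" key.
    It counts as correct only if it is compliant and the answer matches.
    """
    compliant = 0
    correct = 0
    for raw, expected in zip(raw_outputs, expected_answers):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if not isinstance(parsed, dict) or "answer" not in parsed:
            continue
        compliant += 1
        if parsed["answer"] == expected:
            correct += 1
    total = len(expected_answers)
    return compliant / total, correct / total


# Hypothetical outputs: valid and right, valid but wrong, and not valid JSON.
outputs = ['{"answer": "Paris"}', '{"answer": "Lyon"}', "The answer is Paris"]
expected = ["Paris", "Paris", "Paris"]

compliance_rate, accuracy = score_outputs(outputs, expected)
print(f"format compliance: {compliance_rate:.2f}, accuracy: {accuracy:.2f}")
```

Reporting both numbers side by side makes it visible when a stricter output format raises compliance while accuracy falls.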

SOURCE

arXiv cs.LG (ML)

12 2026.05.27 12:00

SPEAR: Code-Augmented Prompt Optimization Improves LLM Tasks

SPEAR introduces the code-as-action paradigm into automatic prompt engineering (APE), allowing the optimizer to write and execute code for prompt refinement. This dynamic approach breaks fixed pipeline limits and improves LLM performance on downstream tasks. Developers can use it to build more robust agent workflows.

SOURCE

arXiv cs.CL (NLP)

13 2026.05.27 12:00

Self-Verification Distillation: Unlocking Proprietary Synthetic Data Pipelines in Language Models

A recent paper (arXiv:2605.26132v1) introduces Self-Verification Distillation, a novel approach enabling LLMs to autonomously improve their performance during post-training. The study explores whether models can achieve self-evolution without relying on labeled data, external teachers, or tool feedback. Starting solely with unlabeled seed prompts and lacking ground truth answers, the proposed mechanism allows the model to independently generate and verify synthetic data. This process effectively constructs a proprietary data pipeline. Experimental results demonstrate that this method successfully enhances model capabilities, offering a viable pathway for unsupervised self-evolution. Specific performance metrics and further implementation details are provided in the full paper.

SOURCE

arXiv cs.CL (NLP)

14 / INSIGHTS 2026.05.27 21:46

Why AI Agents Cannot Maintain Software Systems

The article analyzes why AI agents struggle with real-world software maintenance. Agents lack an understanding of global architecture and of how a system evolves over time, which limits them to local patches. Systemic refactoring still requires human engineers. For tech managers, this is a reminder to define clear boundaries for AI coding tools.

SOURCE

HN AI Picks

15 2026.05.28 00:08

PostHog Shares Experience: Training In-House AI Models from Scratch

Product analytics platform PostHog detailed their experience training proprietary AI models. The guide covers the full pipeline from data prep to fine-tuning. It serves as a practical reference for tech teams looking to reduce reliance on closed-source APIs and control long-term infrastructure costs.

SOURCE

HN AI Picks

16 2026.05.27 23:20

TechCrunch: Tech CEOs Are Suffering from AI Psychosis

TechCrunch reports that many tech CEOs are displaying a kind of AI psychosis, blindly pursuing AI while ignoring basic product logic. Executives overpromise AI capabilities and divert resources from core features. This serves as a warning to investors and users to evaluate AI products on practical problem-solving rather than marketing hype.

SOURCE

HN AI Picks

17 / TOOLS 2026.05.28 08:52

Claude Code v2.1.153 Adds skipLFS Option and npm Auto-Update Fix Notice

Claude Code v2.1.153 adds a skipLfs option for GitHub/Git plugin sources to skip Git LFS downloads during clone and update, speeding up operations on large repositories. When npm global install can’t auto-update, Claude Code now shows a one-time notice, and the /doctor command lists fixes. Status line commands now receive COLUMNS and LINES environment variables for better terminal display. Developers experiencing npm auto-update failures can follow the notice or /doctor guidance to resolve issues.
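
As an illustration of how a status line command might use these variables, the sketch below reads `COLUMNS` and `LINES` from the environment and truncates its output to the terminal width. The function names, the 80 by 24 fallback, and the sample text are assumptions. The release note does not say what else is passed to the command, so the sketch relies on nothing beyond the two variables.

```python
import os


def terminal_size_from_env(env=None, default_columns=80, default_lines=24):
    """Read COLUMNS and LINES, falling back to defaults when unset or invalid."""
    env = os.environ if env is None else env

    def read_int(name, default):
        try:
            value = int(env.get(name, ""))
        except ValueError:
            return default
        return value if value > 0 else default

    return read_int("COLUMNS", default_columns), read_int("LINES", default_lines)


def fit_to_width(text, columns):
    """Truncate text to the given width, marking the cut with '...'."""
    if len(text) <= columns:
        return text
    if columns <= 3:
        return text[:columns]
    return text[: columns - 3] + "..."


if __name__ == "__main__":
    columns, _lines = terminal_size_from_env()
    # Hypothetical status text; a real command would build its own.
    print(fit_to_width("main | 3 files changed | tests passing", columns))
```

Falling back to defaults keeps the script usable when the variables are missing or malformed, for example when it is run outside the status line.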

SOURCE

Claude Code Releases

18 2026.05.28 06:08

OpenAI Codex Releases 0.135.0-alpha.2

OpenAI Codex has released version 0.135.0-alpha.2. This version is currently in the alpha testing stage with no detailed changelog provided. Developers should note that alpha versions may be unstable and are advised to test in non-production environments before adoption.

SOURCE

OpenAI Codex Releases

19 2026.05.28 08:51

OpenClaw Releases v2026.5.27-beta.1

OpenClaw has released v2026.5.27-beta.1, currently in beta testing. Previous releases include 2026.5.26 stable and v2026.5.27-alpha.1. No detailed changelog is available yet; users should watch for upcoming stable release notes.

SOURCE

OpenClaw Releases
