HumanMCP Dataset: Evaluating MCP Tool Retrieval Performance

02 2026.03.02 13:00

Universal Semantic Chunking Framework for Long Documents

Researchers introduced a universal semantic chunking framework for topic segmentation in ultra-long documents. Instead of relying on fixed window sizes, as traditional approaches do, it uses discriminative models to place chunk boundaries. The framework performs strongly on information retrieval and document understanding tasks, improving accuracy by 15% on documents of more than 1 million words.
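
To make the contrast with fixed windows concrete, here is a minimal sketch of boundary-based chunking. It assumes a discriminative model has already scored every gap between adjacent sentences with a boundary probability. The function name `chunk_by_boundaries`, the scores, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
from typing import List


def chunk_by_boundaries(sentences: List[str],
                        boundary_probs: List[float],
                        threshold: float = 0.5) -> List[List[str]]:
    """Split sentences into chunks wherever a boundary score reaches the threshold.

    boundary_probs[i] is the (hypothetical) model score that a topic
    boundary falls between sentences[i] and sentences[i + 1].
    """
    if len(boundary_probs) != max(len(sentences) - 1, 0):
        raise ValueError("need exactly one boundary score per sentence gap")

    chunks: List[List[str]] = []
    current: List[str] = []
    for i, sentence in enumerate(sentences):
        current.append(sentence)
        is_last = i == len(sentences) - 1
        # Close the chunk at the end of the document or at a predicted boundary.
        if is_last or boundary_probs[i] >= threshold:
            chunks.append(current)
            current = []
    return chunks
```

Unlike a fixed window, the chunk sizes here follow the predicted topic boundaries, so one chunk can hold two sentences and the next two hundred.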

03 2026.03.02 13:00

Representation Erasure Preference Optimization Reduces Toxic LLM Outputs

Researchers proposed a representation erasure preference optimization method that significantly reduces the probability of toxic output in large language models. While maintaining model performance, the approach cuts the rate of harmful content generation by 40%, outperforming the traditional DPO and NPO algorithms and offering new insights for safe AI deployment.
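
For readers unfamiliar with the DPO baseline mentioned above, the sketch below computes the standard DPO loss for a single preference pair. It illustrates the baseline only, not the proposed representation erasure method. The function name, argument names, and the default `beta` are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of responses.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written as softplus(-logits) for numerical stability.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))
```

The loss falls as the policy raises the chosen response's likelihood relative to the reference and lowers the rejected one's.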

04 / TOOLS 2026.03.02 08:45

OpenAI Codex Updates to Version 0.107.0-alpha.9

OpenAI Codex released version 0.107.0-alpha.9, the ninth alpha update in recent months. The update focuses on performance optimizations and bug fixes, continuing the Codex series’ rapid iteration pace, which is aimed at improving code generation quality and stability.

SOURCE: OpenAI Codex Releases

05 2026.03.02 13:03

OpenClaw 2026.3.1: Adaptive Inference Now Default

OpenClaw released version 2026.3.1, setting Anthropic Claude 4.6’s default inference level to adaptive while reserving lower settings for other high-performance models. The update includes a new built-in HTTP health check endpoint to enhance container gateway monitoring capabilities.
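
As a generic illustration of what an HTTP health check endpoint does, the sketch below serves a small JSON status document using only the Python standard library. It is not OpenClaw's implementation. The `/healthz` path, the response body, and the names `HealthHandler` and `make_server` are assumptions made for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /healthz with a small JSON status document."""

    def do_GET(self) -> None:
        if self.path == "/healthz":  # hypothetical path, not OpenClaw's
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format: str, *args) -> None:
        pass  # keep probe traffic out of stderr


def make_server(port: int = 0) -> ThreadingHTTPServer:
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), HealthHandler)
```

Calling `make_server(8080).serve_forever()` would start the server. A container orchestrator can then poll the endpoint and restart the gateway when it stops returning 200.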

SOURCE: OpenClaw Releases

06 / RESEARCH 2026.03.02 13:00

Smart LLM Framework Cuts AML News False Positives

Researchers introduced an intelligent LLM framework for screening news in financial anti-money laundering compliance. It tackles the high false positive rate of traditional keyword search by using semantic understanding to improve screening accuracy. Piloted successfully at multiple banks, it reduces false positives by 70%.

07 2026.03.02 13:00

Task-Lens: Analyzing Low-Resource Indian Speech Datasets

Task-Lens is an analysis tool for low-resource Indian language speech datasets. It addresses the limited awareness of which task-specific resources exist for low-resource languages, using cross-task utility analysis to optimize dataset configuration. The research shows that the method improves NLP model performance in multilingual settings and applies to both speech recognition and NLP research. Developers can use it to quickly identify high-quality datasets and reduce data collection costs.

08 2026.03.02 13:00

U-CAN: Utility-Aware Forgetting for Generative Recommenders

U-CAN is a method for forgetting user data in generative recommendation systems. It uses utility-aware contrastive decay to precisely remove sensitive user information while preserving recommendation functionality. Experiments show that it reduces the encoding of sensitive attributes without significantly lowering recommendation accuracy, making it suitable for privacy protection scenarios. Companies can use the technique to process user logs in a compliant way and reduce the risk of data leakage.

09 2026.03.02 13:00

Causal Identification from Counterfactual Data

This paper addresses counterfactual identification in Pearl’s causal hierarchy, presenting completeness and boundary results. The research extends causal identification beyond traditional observational and interventional data, proving that it remains feasible under more complex conditions. Experiments show that the method accurately handles multivariate counterfactual scenarios, providing a new tool for causal machine learning. Researchers can use the framework to build more robust causal models.

10 2026.03.02 13:00

Truncated Step Sampling for RAG with Process Rewards

This research introduces a retrieval-augmented reasoning method that uses truncated step sampling with process rewards. It addresses the credit assignment problem of traditional reinforcement learning by applying process rewards within multi-step trajectories. Experiments show that the method reduces reasoning latency by 40% while maintaining accuracy comparable to Search-R1. It applies to complex reasoning tasks that require real-time feedback, such as interactive search engine Q&A.

11 2026.03.02 13:00

Long-Range Frequency Tuning in Quantum ML

This research proposes a long-range frequency tuning method for quantum machine learning. By optimizing Fourier series truncation of angle encoding, it significantly reduces quantum circuit depth requirements. Experiments show the method reduces parameter complexity to O(ω) while maintaining universal function approximation capability. It enhances QML model training efficiency on resource-constrained quantum computing devices.
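
As background for the Fourier series truncation mentioned above, a well-known result is that a real-valued quantum model with angle encoding can be written as a truncated Fourier series in its input. The notation below is generic and is an assumption for illustration, not the paper's own. Here $\Omega$ is the set of frequencies made accessible by the encoding gates, and the coefficients $c_\omega$ depend on the trainable parameters $\theta$.

```latex
f_\theta(x) = \sum_{\omega \in \Omega} c_\omega(\theta)\, e^{i \omega x},
\qquad c_{-\omega}(\theta) = \overline{c_\omega(\theta)}
```

Under this view, the encoding fixes which frequencies the model can express, while training only adjusts their coefficients.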

12 2026.03.02 13:00

Causal POMDP for Distribution Shift Planning

This research introduces a causal partially observable Markov decision process framework to solve distribution shift problems in real-world environments. The method captures the impact of state distribution changes on planning through environmental dynamics modeling. Experiments demonstrate 25% higher planning success rates than traditional methods in dynamic environments. It applies to autonomous driving and robot control scenarios requiring environmental adaptation.

13 2026.03.02 13:00

CiteAudit: Benchmark for LLM Citation Verification

CiteAudit is the first benchmark specifically designed to verify the authenticity of large language model citations. The study reveals the severity of LLM-generated false citations, showing mainstream models have error rates up to 18%. The benchmark includes over 10,000 pairs of real and fake citations to assess models’ literature retrieval and verification capabilities. Research institutions can use it to review paper citation quality and prevent academic misconduct.

14 2026.03.02 13:00

Brain-OF: Multimodal Brain Imaging Foundation Model

Brain-OF is the first multimodal brain imaging foundation model to support fMRI, EEG, and MEG simultaneously. It fuses the three modalities through unified spatiotemporal feature extraction. Experiments show 12% higher accuracy on brain region classification tasks than single-modal models. The model facilitates cross-modal analysis in neuroscience and could help doctors diagnose brain diseases more precisely.

15 2026.03.02 13:00

Reinforcement Learning Optimizes Min-Max TSP

This research proposes a reinforcement learning approach to solve the min-max multi-traveling salesman problem. A four-stage framework of construction, merging, solving, and adaptation effectively optimizes multi-path planning. Experiments show this method reduces the longest path length by 15% while maintaining overall efficiency. It applies to logistics and vehicle routing scenarios requiring balanced load distribution.
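
The min-max objective itself is simple to state: among all salesmen's tours, minimize the length of the longest one. The sketch below only evaluates that objective for a given set of routes; it does not implement the paper's four-stage framework. The function names and the Euclidean distance metric are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(depot: Point, stops: Sequence[Point]) -> float:
    """Length of depot -> stops in order -> depot (Euclidean)."""
    path = [depot, *stops, depot]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def min_max_objective(depot: Point, routes: List[Sequence[Point]]) -> float:
    """Min-max mTSP objective: the length of the longest single tour."""
    return max(tour_length(depot, route) for route in routes)
```

Two solutions with the same total distance can differ widely on this objective, which is why it suits scenarios that need balanced workloads across vehicles.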

16 2026.03.02 13:00

FHIRPath-QA: First FHIR-Based EHR Q&A System

FHIRPath-QA is the first executable query system for electronic health records based on FHIR standards. It generates accurate answers directly from EHR data. Testing shows 89% accuracy in clinical question answering, far surpassing traditional interfaces. It enables patients to query medical records independently, helping non-professionals understand complex healthcare data.

17 2026.03.02 13:00

EvoX Boosts Algorithm Optimization Performance by 35%

Meta researchers released EvoX, a tool that combines LLM optimization with evolutionary search to automate algorithm development across domains. Experiments show average performance improvements of 35% on program generation, prompt optimization, and algorithm design tasks, outperforming existing AlphaEvolve solutions. EvoX speeds up optimization by reusing historical evaluation data and is suitable for AI model tuning and automated code generation. Developers can use its API to integrate it into existing workflows.
