---
id: 20260526-T0-05
title: "Transcoders可追踪视觉大模型幻觉来源：看清图像输入如何变成文字输出"
title_en: "Transcoders Trace Visual Grounding and Hallucinations in VLMs"
url: https://ai.daily.yangsir.net/daily/20260526-T0-05
issue_date: 2026-05-26
publish_date: 2026-05-25T04:00:00.000Z
category: research
source_name: "arXiv cs.LG (ML)"
source_url: https://arxiv.org/abs/2605.22902
---

# Transcoders Trace the Sources of VLM Hallucinations: How Image Inputs Become Text Outputs

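Vision-language models (VLMs) perform well on multimodal reasoning, but the mechanism by which visual input is turned into text output remains unclear. Existing work analyzes the static residual stream with sparse autoencoders (SAEs), which makes it hard to capture the dynamic computation that happens during generation. The Transcoders approach traces causal paths through a VLM as it generates, pinpointing the source features behind visual grounding and hallucination. Researchers can use it to analyze the specific step at which a model produces an incorrect description, giving interpretability support for making VLMs more reliable.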

## English Version

**Transcoders Trace Visual Grounding and Hallucinations in VLMs**

Vision-Language Models (VLMs) perform well on multimodal reasoning, but how visual inputs become text outputs remains unclear. Existing work applies sparse autoencoders (SAEs) to static residual-stream activations and so misses the dynamic computation during generation. Transcoders trace causal paths during VLM generation, pinpointing the features behind visual grounding and hallucinations. Researchers can use them to identify where a model produces an incorrect description, providing interpretability support for improving VLM reliability.
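The summary does not describe the architecture, so the following is only a minimal sketch of the SAE versus transcoder distinction as it is commonly formulated in the interpretability literature: an SAE reconstructs the same activation it reads, while a transcoder uses sparse latents to predict a layer's output from its input. That formulation is an assumption here and may differ from the paper's. Every name (`SparseCoder`, `sae_loss`, `transcoder_loss`, `feature_contributions`) is hypothetical, and the weights are random and untrained.

```python
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


class SparseCoder:
    """Encoder -> ReLU latents -> decoder (hypothetical toy, untrained)."""

    def __init__(self, d_in, d_latent, d_out, seed=0):
        rng = random.Random(seed)
        self.W_enc = [[rng.gauss(0, 0.5) for _ in range(d_in)]
                      for _ in range(d_latent)]
        self.b_enc = [0.0] * d_latent
        self.W_dec = [[rng.gauss(0, 0.5) for _ in range(d_latent)]
                      for _ in range(d_out)]
        self.b_dec = [0.0] * d_out

    def encode(self, x):
        # ReLU keeps latents non-negative, so sum(z) is their L1 norm.
        pre = add(matvec(self.W_enc, x), self.b_enc)
        return [max(0.0, p) for p in pre]

    def decode(self, z):
        return add(matvec(self.W_dec, z), self.b_dec)


def sae_loss(coder, activation, l1=1e-3):
    """SAE objective: reconstruct the same activation that was encoded."""
    z = coder.encode(activation)
    return mse(coder.decode(z), activation) + l1 * sum(z)


def transcoder_loss(coder, layer_in, layer_out, l1=1e-3):
    """Transcoder objective: predict the layer's output from its input."""
    z = coder.encode(layer_in)
    return mse(coder.decode(z), layer_out) + l1 * sum(z)


def feature_contributions(coder, layer_in):
    """Split the predicted output into one additive vector per latent."""
    z = coder.encode(layer_in)
    d_out = len(coder.W_dec)
    return [[coder.W_dec[j][i] * z_i for j in range(d_out)]
            for i, z_i in enumerate(z)]
```

Because the decoder is linear, the predicted output is the decoder bias plus the sum of the per-latent vectors returned by `feature_contributions`. That additive split is the kind of property that makes it possible to attribute an output to individual features, though the paper's actual tracing procedure is not specified in this summary.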

---

**Source**: [arXiv cs.LG (ML)](https://arxiv.org/abs/2605.22902)

**Details page**: https://ai.daily.yangsir.net/daily/20260526-T0-05

---

*智语观潮 · Daily — https://ai.daily.yangsir.net/llms.txt*