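Takeaway 1: Work with hallucination
My understanding
Hallucination is not a defect of GenAI but an unavoidable by-product of any useful GenAI system: next-token prediction always returns the answer the model rates as most probable, and removing all uncertainty would leave nothing but truisms. Traditional software is "exact input, exact output", while GenAI is "fuzzy input, fuzzy output". This is a fundamental difference in paradigm, not a temporary engineering flaw. The short-term strategy is to merge GenAI into the existing ecosystem and control hallucination through external means (RAG, document constraints); in the long run, the "fuzzy I/O" pattern itself may give rise to entirely new product forms and opportunities. Only by treating hallucination as a basic property can we move from passively waiting for "AI to fix the bug" to actively asking how to design systems that tame it.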
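Related links
- Ch05-L04 Research: Generative AI is built on consensus. The research basis for the hallucination mechanism echoes the practical conclusions of this lesson.
- Ch05-L07 Actionable guide: Which tasks to hand over to GenAI. Working with hallucination leads directly to judging the boundaries of task delegation.
- Ch04-L10 Learning 4: Risk management. Systematic control of hallucination corresponds to the risk management strategy in Module 3.
- Ch07-L06 The Temperature parameter in LLM inference. Temperature is the direct mechanism for trading off hallucination against creativity.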
Original text
Lesson 42 of 68 Takeaway 1: Work with hallucination
This conclusion is not a research finding but our conjecture, based on our understanding of the research. Take it with a grain of salt.
Hallucination is fundamental to large language models because the underlying mechanism is next-token prediction. Every output is the outcome the model rates as most probable. In other words, the model always gives the answer it is most confident in. To the model, there is no way of telling a right answer from a wrong one.
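A minimal sketch of the mechanism described above, assuming greedy decoding over a toy vocabulary. The prompt, the tokens, and the scores in `toy_logits` are hypothetical and chosen so that the wrong answer scores highest; no real model is involved. The point is that the selection step only sees probabilities and has no notion of truth.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def greedy_next_token(logits):
    """Return the most probable token and its probability."""
    probs = softmax(logits)
    token = max(probs, key=probs.get)
    return token, probs[token]


# Hypothetical scores for the prompt "The capital of Australia is".
# The incorrect answer "Sydney" is deliberately scored highest.
toy_logits = {"Sydney": 2.0, "Canberra": 1.5, "Melbourne": 0.5}

token, prob = greedy_next_token(toy_logits)
print(token, round(prob, 3))  # the top-scoring token wins, correct or not
```

Nothing in `greedy_next_token` checks the answer against the world, which is why any correction has to come from outside the prediction step.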
Hallucination is inevitable for useful GenAI systems because the world is not deterministic. Truisms are useless. In fact, this module is an example: in our attempt to provide useful predictions and conclusions, we have to step outside of being correct all the time.
Even with traditional ML systems, we have to allow models to explore, or they will very quickly get stuck in a saddle point or a local optimum. For GenAI models, we have to allow flexibility, and given their next-token prediction nature, this inevitably creates hallucination.
Hallucination can be controlled by workarounds outside of the model. For example, New Bing uses search engines to fact-check and tells the model to construct its answers around the search engine results. Providing a document as context and constraining the GenAI's responses to that document can also help.
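The document-constraint workaround can be sketched as a prompt builder. This is an illustration under stated assumptions, not how New Bing or any specific product works: `build_grounded_prompt`, the instruction wording, and the sample sources are all hypothetical, and the resulting string would still need to be sent to a model of your choice.

```python
def build_grounded_prompt(question, documents):
    """Assemble a prompt that restricts the answer to the supplied sources."""
    numbered = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(documents, start=1)
    )
    return (
        "Answer the question using only the sources below.\n"
        "Cite the source number for each claim.\n"
        "If the sources do not contain the answer, reply \"I don't know\".\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}\n"
    )


# Hypothetical sources, e.g. search results or passages from a document.
sources = [
    "Canberra is the capital city of Australia.",
    "Sydney is the most populous city in Australia.",
]

prompt = build_grounded_prompt("What is the capital of Australia?", sources)
print(prompt)
```

The explicit "I don't know" escape hatch matters in this design: without it, the model is still pushed to produce its most probable answer even when the sources are silent.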
Software is exact input, exact output, while GenAI is fuzzy input, fuzzy output. GenAI has great potential, but the ecosystem is not ready yet. Every time we prompt GPT, it can give a different answer; meanwhile, we can prompt GPT in multiple different ways to complete the same task. This fuzzy input and fuzzy output is fundamentally different from traditional software.
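One common source of the "different answer every time" behaviour is sampling from the next-token distribution rather than always taking the top token. The sketch below assumes temperature-scaled sampling, which this passage does not name; the vocabulary and scores are hypothetical.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng):
    """Sample one token; lower temperature concentrates mass on the top token."""
    if temperature <= 0:
        # Treat zero temperature as greedy decoding: always the top token.
        return max(logits, key=logits.get)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in scaled.values()]
    return rng.choices(list(scaled.keys()), weights=weights, k=1)[0]


# Hypothetical scores for the prompt "The sky today is".
toy_logits = {"blue": 2.0, "grey": 1.5, "clear": 1.0}
rng = random.Random(0)  # fixed seed so the run is repeatable

greedy = {sample_with_temperature(toy_logits, 0.0, rng) for _ in range(200)}
sampled = {sample_with_temperature(toy_logits, 1.0, rng) for _ in range(200)}

print(greedy)           # a single token: the same input gives the same output
print(sorted(sampled))  # several tokens: the same input gives varying output
```

With temperature at zero the toy model behaves like exact-output software for a fixed prompt; raising it trades that repeatability for variety.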
In the short to mid term, the most effective way of using GenAI is to merge it with the existing ecosystem, which is what we teach here. However, I hope we keep paying attention to this fuzzy input, fuzzy output nature of GenAI. In the long run, as the ecosystem evolves towards this pattern, big opportunities may emerge from it, because this pattern allows us to achieve something completely new.