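Learning 4: Risk management

My understanding

The most fundamental difference between GenAI and traditional tools is uncertainty: it "may succeed or may fail," so we should not bet too heavily on any single approach out of a fixed mindset. Risk management comes down to two principles: fail fast and early (probe for hallucinations or errors early in the process so that defects do not silently propagate downstream), and cost control (set time checkpoints in advance and make the stop-or-continue decision before becoming emotionally invested). The example of probing financial data with Copilot shows how to find and fix a hallucination before generating the formal report, rather than discovering that the numbers are wrong after the full report is finished. Keeping a flexible perspective and moving forward defensively is the basic posture for protecting return on investment in an AI world full of uncertainty.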

English Original

Looking back at how we built this tool, it’s clear that we approached it defensively. Before fully committing to the project, we ran a few probing experiments to gauge the potential cost, because we were unsure whether it was feasible, either technically or in terms of what the AI was intelligent enough to do. Once the success criteria and tests were in place, we would know immediately if the system didn’t work, which left us room to consider alternatives.

We never had a fixed mindset that we must use AI. Instead, we gradually leaned towards an AI-based solution, driven by promising early experiment results. And we did this for a reason.

Stepping back, we are already familiar with traditional tools, including their deterministic nature and well-defined capabilities. It is therefore natural to carry over the habits formed with those tools and fall into a fixed mindset: that we must use AI, or must not use AI, to solve a certain problem. However, for most people AI remains a tool full of uncertainty, and our initial assessment may still be wrong despite the principles we have learned. It is usually much better to maintain a flexible and dynamic viewpoint as a project progresses. This flexibility is a weapon that guards against over-investment in an uncertain world. Beyond opportunity sizing, there are more techniques for handling uncertainty.

That is risk management. Know where the risks may come from, assess them proactively, and limit their impact when bad things happen. It comes down to two principles: fail fast and early, and cost control. We have already seen their applications in the case study. Let’s explore how to apply them in other scenarios.

Fail Fast and Early

Fail fast and fail early is an effective way to combat hallucination, a common weak point of GenAI. The AI often presents false facts in a confident and natural way, sometimes appearing more genuine than correct content. While developing tricks to alleviate hallucination is useful, having a mechanism to detect it is even more important. This way, when hallucinations occur, we can recognize the problem and fix it right away.

For example, if we want to write reports based on information available online, we could manually perform an online search, identify the results, and copy-paste the content into our report or spreadsheet. However, a potentially better approach is to ask AI for help. This scenario is typically subject to hallucination. With a risk management mindset, before assigning the task of generating a full-fledged report, we could first conduct quick probes to check if the tool we plan to use has hallucination issues.

We use Copilot here because it combines web search with AI summarization capabilities, providing more robustness against hallucination. By using Copilot, we can perform initial tests to see how well it handles the information and whether it introduces any false content. This approach allows us to detect and address issues early, ensuring the reliability of the final output.

Example Prompt

=======================================================

Give me a table comparing the key financial numbers in 2024 Q1’s ER between Meta and Apple.

=======================================================

Copilot’s Response

The screenshot shows the first part of the answer with more text and links at the bottom, which are not included in the screenshot. We can use these provided links to cross-check whether the numbers are correct. In this case, we found that the numbers for Apple and Microsoft were incorrect, indicating that hallucination occurred.
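The cross-check described above can also be done mechanically once the numbers from the linked sources have been typed in by hand. The following is only a minimal sketch under stated assumptions: `find_mismatches` is a hypothetical helper, the metric names are illustrative, and the values are placeholders rather than real financial data.

```python
import math
from typing import Dict, List


def find_mismatches(
    claimed: Dict[str, float],
    verified: Dict[str, float],
    rel_tol: float = 0.01,
) -> List[str]:
    """List metrics that the AI answer omits or reports differently from the source."""
    problems = []
    for metric, expected in verified.items():
        if metric not in claimed:
            problems.append(f"{metric}: missing from AI answer")
            continue
        got = claimed[metric]
        # Allow small rounding differences; flag anything beyond rel_tol.
        if not math.isclose(got, expected, rel_tol=rel_tol):
            problems.append(f"{metric}: AI said {got}, source says {expected}")
    return problems


# Placeholder values for illustration only; NOT real financial data.
ai_answer = {"revenue": 100.0, "net_income": 30.0}
from_source = {"revenue": 100.0, "net_income": 25.0}

mismatches = find_mismatches(ai_answer, from_source)
if mismatches:
    print("Probe failed, do not build the report yet:")
    for line in mismatches:
        print(" -", line)
```

A non-empty result is the signal to stop and fix the setup before asking for anything larger.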

To address this, we can adjust the settings in Copilot. When using Copilot, there is an option to specify whether we want it to be more creative or more precise. The default setting is balanced. Let’s move it to the more precise side and try again. This adjustment should help reduce hallucination and provide more accurate information.

This time, it provides the correct information, which we can verify using the links. This shows that the approach works, and we also obtain accurate information without hallucination. In subsequent rounds, we could copy-paste these verified numbers to avoid potential mistakes, or place more trust in Copilot to gather more information from search.

Behind this simple discovery and fix of a hallucination is a deeper strategy: move any potential failure forward so that it fails early. Instead of leaving a bug hidden in the program and discovering it when we actually deploy the product, we make it fail fast right after the code comes out of ChatGPT. Instead of finding the financial numbers wrong after we receive a fancy report with conclusions drawn from them, we make them fail fast right after Copilot generates them. This approach effectively combats uncertainty. If there is a decent chance of failure or uncertainty, we should make it fail fast and early to limit its impact and reduce the time spent correcting it.
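One way to picture "moving the failure forward" is to place a cheap check directly after the generation step, so that the expensive step never runs on bad input. This is a hedged sketch, not a prescribed workflow: `run_with_gate`, `EarlyFailure`, and the three stage functions are hypothetical names, and the draft is a stand-in for whatever the AI tool returns.

```python
from typing import Callable, Dict, List


class EarlyFailure(Exception):
    """Raised when the cheap check rejects an intermediate result."""


def run_with_gate(
    generate: Callable[[], Dict[str, float]],
    check: Callable[[Dict[str, float]], List[str]],
    expensive_step: Callable[[Dict[str, float]], str],
) -> str:
    draft = generate()
    problems = check(draft)  # cheap, runs immediately after generation
    if problems:
        # Stop here, before any downstream work depends on the draft.
        raise EarlyFailure("; ".join(problems))
    return expensive_step(draft)


calls: List[str] = []


def generate() -> Dict[str, float]:
    calls.append("generate")
    return {"revenue": -1.0}  # stand-in for a hallucinated value


def check(draft: Dict[str, float]) -> List[str]:
    calls.append("check")
    return ["revenue is negative"] if draft["revenue"] < 0 else []


def build_report(draft: Dict[str, float]) -> str:
    calls.append("report")
    return f"Report based on {draft}"


try:
    run_with_gate(generate, check, build_report)
except EarlyFailure as err:
    print("Stopped early:", err)

print(calls)  # the report stage does not appear
```

The point of the design is the ordering: the check costs seconds, while a report built on wrong numbers costs the whole report.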

Cost Control

When using a hammer, we know for sure it will knock the nail into the wood. However, when using GenAI, it’s common to be uncertain whether we can get things done, simply because most of us haven’t accumulated sufficient experience. It’s easy to fall into the rabbit hole of exploring different options to improve results, especially when the task is subjective.

If we step back, this problem isn’t new with GenAI. We can always polish our writing, brainstorm more ideas, or try out another potential solution. All open-ended exploratory tasks carry the risk of over-investing time. GenAI, being powerful, flexible, and fun to use, makes it particularly easy to sink more time into it. But the solution is also well known: define clear success criteria (which we are already doing) and set explicit checkpoints on the cost.

It’s best to consider decision-making points beforehand. For example, you might decide to spend five minutes on an idea and, if it doesn’t work, give up. The importance of making a decision beforehand lies in the fact that, once you start, emotional attachment and a natural desire to continue will emerge. The best time to make this decision is before you begin, when you’re not yet emotionally attached.
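The same idea can be written down as a small time-boxed loop in which the budget is fixed before the first attempt runs. This is a minimal sketch, assuming each attempt is a callable and success is judged by a predefined criterion; `explore_with_budget` and its parameters are hypothetical names, and the 300-second budget mirrors the five-minute example.

```python
import time
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def explore_with_budget(
    attempts: Iterable[Callable[[], T]],
    is_good_enough: Callable[[T], bool],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[T]:
    """Try attempts in order; return None once the budget is used up."""
    deadline = clock() + budget_seconds  # committed to before starting
    for attempt in attempts:
        if clock() >= deadline:
            return None  # checkpoint reached: give up as decided in advance
        result = attempt()
        if is_good_enough(result):
            return result
    return None


# Two stand-in attempts; the success criterion is defined up front.
drafts = [lambda: "too vague", lambda: "meets the criteria"]
best = explore_with_budget(drafts, lambda d: d.startswith("meets"), budget_seconds=300)
print(best)
```

The budget is only checked between attempts, so a single long attempt can still overrun it; the sketch captures the decision rule, not hard enforcement.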

These principles aren’t new, but for many roles that don’t typically involve exploratory work, this approach may require special attention.