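Learning 2: Document management for AI

My understanding

In AI-assisted programming, the AI essentially acts as a translator. It needs two inputs: your intention (in natural language) and documentation for the target interface (describing the "target language"). When the AI's training data lacks up-to-date documentation for a library, supplying that documentation yourself is what unlocks the capability. This is called document management, and it is probably the single most important skill in AI-assisted programming. AI-friendly documentation, ranked from most to least effective: code interfaces and comments (most precise), sample code (easy to find but may introduce ambiguity), and descriptive text (not ideal but better than nothing). Maintaining AI-friendly documentation also compounds: the easier your own tools are for AI to understand, the less effort it takes to build on them later, and that is also a differentiating product advantage for future users.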


Original text

Lesson 31 of 68: Learning 2: Document management for AI

The previous example is very interesting. We were writing programs and got blocked because AI was not generating the correct code to invoke GPT-4. In the pre-AI era, being blocked in programming typically meant lacking coding expertise, such as not understanding a programming language’s details or having difficulty debugging code. It sounds ridiculous that a lack of documents would be the issue because documents are readily available online. The heavy lifting usually involves adapting the document or sample code and making it work, rather than finding the documents.

However, look at what we did just now. It was actually the opposite. Most of our time wasn’t spent on dealing with the code but on handling documents. We were blocked because the AI didn’t have access to the document during its training. The solution was to find the document and feed it to the AI. To some extent, the result of our building effort isn’t programs but documents. The AI does the last mile delivery, translating the documents and prompts we provide into computer-compatible code for us.

So from an abstract perspective, in AI-assisted programming, AI’s role is essentially that of a translator. It needs two pieces of information to perform its job: your intention and a document describing the target language. The former is just natural language and is always available. The latter could be its existing knowledge (e.g., GPT knows Python well) or a separate document (e.g., documentation for new Python libraries). This document is usually the troublemaker and something we need to supply to the AI if it didn’t receive sufficient training on it. This is called document management and is probably the most important skill for AI-assisted programming.
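The two inputs described above can be sketched as a small prompt builder. This is a minimal illustration, not part of the original lesson: the function name `build_prompt` and the wording of the connecting sentence are assumptions.

```python
from typing import Optional


def build_prompt(intent: str, reference_doc: Optional[str] = None) -> str:
    """Combine the user's intent with optional reference material into one prompt.

    `intent` is the natural-language request. `reference_doc` is the material
    describing the "target language" (interfaces, sample code, or prose docs).
    When the model already knows the target well, it can be left out.
    """
    parts = [intent.strip()]
    if reference_doc and reference_doc.strip():
        # The connecting sentence is a hypothetical wording, not a required format.
        parts.append("Here is reference material for the target interface:")
        parts.append(reference_doc.strip())
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        "Write a Python program that calls the chat API once.",
        "client = openai.OpenAI()\nclient.chat.completions.create(...)",
    ))
```

When the reference material is omitted, the prompt is just the intent, which corresponds to relying on the model's existing knowledge.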

There is a caveat, though. Traditional “documents” for humans might not be the best form for AI, especially because of the context window length limit and AI’s difficulty in understanding messy context windows. Feeding the entire website of OpenAI API documents to ChatGPT is hard and expensive. Therefore, document construction for AI-assisted programming is a specialized expertise.
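To make the context window constraint concrete, here is a rough sketch of a budget check. The ratio of about four characters per token is only a rule-of-thumb assumption, and both function names are hypothetical. A real tokenizer would give exact counts.

```python
from typing import List


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate. The default ratio is an assumption, not a measurement."""
    return int(len(text) / chars_per_token) + 1


def select_within_budget(snippets: List[str], max_tokens: int) -> List[str]:
    """Keep snippets in the given order until the estimated budget is used up."""
    chosen: List[str] = []
    used = 0
    for snippet in snippets:
        cost = estimate_tokens(snippet)
        if used + cost > max_tokens:
            break  # stop at the first snippet that does not fit
        chosen.append(snippet)
        used += cost
    return chosen


if __name__ == "__main__":
    docs = ["a" * 40, "b" * 40, "c" * 40]
    print(len(select_within_budget(docs, max_tokens=25)))
```

Because the snippets are taken in order, putting the most precise material first means it is the last to be cut.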

There are some common practices on how to properly construct a document to effectively use GPT’s context window:

Use Code and Comments Directly: AI is trained to read and write code with ease, so don’t hesitate to directly paste the code (interfaces) and comments as part of the prompt, even if it’s long. This is an effective and precise communication method with AI.

Include Sample Code on Usage: Sample code is another effective way to construct the document. It lets the AI infer the correct usage by analogy, is usually easier to find, and tends to be shorter. However, sample code covers only the cases it happens to show, so it may introduce ambiguity and potentially cause hallucination.

Descriptive Text Documents: These are the traditional documents we refer to, which are easy for humans to digest but lack the precision needed for AI to write functional programs. While not ideal, they are better than nothing.

By following these practices, you can construct documents that help AI effectively utilize its context window and generate accurate, functional code.
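For the first practice, one way to keep a long file within the context window is to paste only its interface: signatures and docstrings without the function bodies. The sketch below uses Python's standard `ast` module (Python 3.9 or later for `ast.unparse`). The function name `extract_interface` is an assumption, and the sketch covers top-level functions only.

```python
import ast


def extract_interface(source: str) -> str:
    """Return top-level function signatures and docstrings, dropping the bodies."""
    tree = ast.parse(source)
    lines = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
            returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            lines.append(f"{keyword} {node.name}({ast.unparse(node.args)}){returns}:")
            doc = ast.get_docstring(node)
            if doc:
                # The result is meant for a prompt, so multi-line docstrings
                # are not re-indented here.
                lines.append(f'    """{doc}"""')
            lines.append("    ...")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = 'def add(a, b):\n    """Add two numbers."""\n    return a + b\n'
    print(extract_interface(sample))
```

The output keeps what the AI needs to call the code correctly and omits the implementation details that consume context.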

It’s a good time to apply what we’ve learned to build another fundamental block for future use.

Exercise

=======================================================

If you want to write a Python function (instead of the JavaScript we showed in the previous lesson) that invokes the GPT API to explain what a context window is for LLMs, what should you do?

=======================================================

This exercise is also interesting. If you directly ask ChatGPT to write a Python program without document management, it can actually generate a correct program invoking GPT-4 APIs, as demonstrated in the following screenshot:

This is probably because ChatGPT encountered much more Python sample code compared to JavaScript during the training process, so it learned how to write Python to call GPT-4. However, it might fail on some machines:

This is quite confusing, but the reason is similar to what we saw in the last lesson. When GPT-4-Turbo, the model behind our latest ChatGPT, was trained, the OpenAI Python library was still in a testing phase. After the model's knowledge cutoff date, the library released version 1.0 with breaking changes to its interface. A program that ChatGPT writes against the older interface therefore fails on any machine with version 1.0 or later installed, while it still runs where an older version is installed. So we still need document management if we want to use the latest and greatest library.
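Whether generated code fails "on some machines" depends on which major version of the library is installed. The sketch below reads the installed version with the standard library and reports which call style applies. The helper names are hypothetical. The two call styles in the comments are the pre-1.0 module-level call and the 1.x client-based call.

```python
from importlib import metadata
from typing import Optional

# Pre-1.0 style (removed in 1.0): openai.ChatCompletion.create(model=..., messages=...)
# 1.x style:                      openai.OpenAI().chat.completions.create(model=..., messages=...)


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '1.12.0'."""
    return int(version.split(".")[0])


def installed_openai_major() -> Optional[int]:
    """Return the installed openai major version, or None if it is not installed."""
    try:
        return major_version(metadata.version("openai"))
    except metadata.PackageNotFoundError:
        return None


def expected_call_style(major: int) -> str:
    """Name the chat-completion call that matches a given major version."""
    if major >= 1:
        return "client.chat.completions.create"
    return "openai.ChatCompletion.create"


if __name__ == "__main__":
    major = installed_openai_major()
    if major is None:
        print("openai is not installed")
    else:
        print(f"openai {major}.x expects {expected_call_style(major)}")
```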

As we showed before, the OpenAI Python library is large, and finding a comprehensive list of all the functions is difficult. Therefore, we use the second approach: we directly copy the content of examples/demo.py and include it in the prompt.

Example prompt

=======================================================

Write a python program to invoke GPT-4-Turbo to answer my question of “What is a context window in LLM?” We don’t need streaming.

Here is an example program:

<Content of examples/demo.py>

=======================================================

Example response (code only):

=======================================================

#!/usr/bin/env -S poetry run python

import openai

# Create an OpenAI API client
client = openai.OpenAI()

# Non-streaming request to GPT-4-Turbo
print("----- standard request -----")
completion = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {
            "role": "user",
            "content": "What is a context window in LLM?",
        },
    ],
)
print(completion.choices[0].message.content)

=======================================================

This program works with the latest version of the library.
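The exercise asks for a function rather than a script, so the response above could be wrapped as follows. This is a sketch under stated assumptions: the name `ask_gpt` and the optional `client` parameter are not from the lesson. Passing the client in makes the function testable without network access. When it is omitted, the function assumes `openai` 1.0 or later is installed and that the `OPENAI_API_KEY` environment variable is set.

```python
def ask_gpt(question: str, client=None, model: str = "gpt-4-turbo") -> str:
    """Send one non-streaming chat request and return the reply text."""
    if client is None:
        # Imported lazily so the function can be exercised with a stand-in client.
        import openai

        client = openai.OpenAI()
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return completion.choices[0].message.content


if __name__ == "__main__":
    print(ask_gpt("What is a context window in LLM?"))
```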

This is again instructive. If OpenAI had provided such an AI-friendly document, we could have saved all this time and effort and used the updated library right away. Similarly, when you build a product, an AI-friendly document makes it very easy for your customers to use AI-assisted programming, which gives them an immediate competitive advantage over non-AI users. This, in turn, gives you a competitive advantage over competitors who have not realized the importance of AI-friendly product design. The effect also compounds in your own building process: when you build your own tools, paying attention to document management makes them AI-friendly, and when you later build more complicated tools on top of them, they are readily usable by AI.

Let’s apply this mindset of building AI-friendly products to our Gmail Prioritizer. In our specific case, because our code is very short, the best document is the code itself. But just to demonstrate the case where the code is too long to fit in the context window, we ask ChatGPT to generate a document for us. Here the keyword is “docstring”. It’s a specific format of comments/documents especially suitable for programming.

Example prompt:

=======================================================

Read the following code, and write a docstring. The result will be used to describe the function to an AI. So it knows how to write a program using this function. Make it precise and brief. Be specific on the format of the input variables and return values.

function isGenAIEmail(email) {

}

=======================================================

Example output:

=======================================================

/**
 * Determines if a given email is related to Generative AI by using OpenAI’s GPT-3.5-turbo model.
 *
 * @param {Object} email - The email object containing body as a string representing the email content.
 * @returns {boolean} - Returns true if the email is identified as related to Generative AI, otherwise false.
 *
 * - The email object should have a body field that contains the text content of the email.
 * - The function sends up to the first 10,000 characters of the email body to the OpenAI API for classification.
 * - The classification is done by prompting the API to return “yes” or “no” based on relevance to Generative AI.
 * - Ensure the YOUR_OPENAI_API_KEY placeholder is replaced with an actual API key before using this function.
 */
function isGenAIEmail(email) {
    // Function implementation
}

=======================================================

We will use this document in the upcoming lessons.
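For readers working in Python, the same idea maps onto a native docstring. The sketch below is a hypothetical Python counterpart of `isGenAIEmail`, with the implementation deliberately omitted. The names `is_gen_ai_email` and `interface_stub` are assumptions for illustration. It shows how the signature and docstring can be rendered as text for a later prompt using the standard `inspect` module.

```python
import inspect


def is_gen_ai_email(email: dict) -> bool:
    """Report whether an email is related to Generative AI.

    Args:
        email: A dict with a "body" key holding the email text as a string.

    Returns:
        True if the email is classified as related to Generative AI, else False.
    """
    # Hypothetical stub: the classification logic is intentionally left out.
    raise NotImplementedError("implementation omitted in this sketch")


def interface_stub(func) -> str:
    """Render a function's signature and docstring as text for a prompt."""
    return f"def {func.__name__}{inspect.signature(func)}:\n{inspect.getdoc(func)}"


if __name__ == "__main__":
    print(interface_stub(is_gen_ai_email))
```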