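Why Are We Discussing Open-Source Models?

My Understanding

Open-source models trail closed-source commercial models in intelligence, but they fill many scenarios that closed-source offerings cannot cover: privacy and compliance rules that forbid sending PII off-premises, offline environments with unstable bandwidth, and cost constraints from very high throughput or massive data processing. The deeper value is control: open-source models allow fine-grained adjustment of safety alignment and let us build proprietary vertical-domain models through fine-tuning or distillation, optimizing latency and cost at the same time. From a competitive standpoint, self-hosting an open-source model builds a technical moat and helps management and investors see the technical owner's contribution more clearly. Open and closed source are not an either-or choice but complementary tools: commercial models handle general needs, and open-source models open up long-tail scenarios.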

Original Text

Lesson 45 of 68: Why Are We Discussing Open-Source Models?

Since the release of ChatGPT, we have made several predictions. Over the past few years, large language models (LLMs) have developed significantly in both form and commercial value, yet many of our initial predictions still largely hold. One prediction we have held consistently, and still believe is correct, is that closed-source LLMs will stay ahead of open-source models in intelligence and technical capability.

This is why, in building this course, we have focused largely on commercial, closed-source models. So why explore the world of open-source models and toolchains in this module? Because even after the other modules cover the major application scenarios for commercial models, many applications remain that closed-source models alone cannot drive. In other words, across business scenarios, and especially long-tail ones, open-source models can provide value that closed-source models cannot.

Business Scenarios that Commercial LLMs Cannot Cover

Specifically, LLMs are a very versatile technology and can support many different application scenarios. However, not every scenario can be solved by calling an API. Some scenarios face legal compliance constraints: the system cannot connect to the external internet, or cannot send personally identifiable information (PII), such as sensitive financial or medical data, to remote servers. In other cases bandwidth is the limit: a service aimed mainly at remote areas with unstable mobile signal cannot rely on a remote API over the internet to deliver low latency and high reliability.

In some cases, even when network and compliance constraints allow us to use an API, provider quotas prevent us from getting enough throughput. For example, when accessing Anthropic Claude models through AWS Bedrock, the quota for Claude 3.5 Sonnet 1022 is only 50 queries per minute (QPM), that is, at most 50 requests per minute, which is very low. If business demand exceeds this limit, we must either pay for provisioned throughput at a cost of hundreds of dollars per hour or apply various engineering optimizations to reduce the demand on API throughput.
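To make the quota concrete, the arithmetic can be sketched as follows. Only the 50 QPM figure comes from the text; the backlog of 100,000 requests is a hypothetical workload chosen for illustration, and `minutes_to_drain` is a made-up helper name.

```python
import math

def minutes_to_drain(num_requests: int, quota_per_minute: int) -> int:
    """Minimum whole minutes needed to send num_requests under a per-minute quota."""
    if quota_per_minute <= 0:
        raise ValueError("quota_per_minute must be positive")
    return math.ceil(num_requests / quota_per_minute)

QUOTA_QPM = 50       # quota quoted in the text
BACKLOG = 100_000    # hypothetical workload, not a figure from the lesson

minutes = minutes_to_drain(BACKLOG, QUOTA_QPM)
print(f"{BACKLOG} requests at {QUOTA_QPM}/min take at least {minutes} min (~{minutes / 60:.1f} h)")
print(f"Daily ceiling at this quota: {QUOTA_QPM * 60 * 24} requests")
```

Under these assumptions, even a moderately sized batch job takes more than a day, which is why the quota forces a choice between provisioned throughput and engineering workarounds.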

Another possible scenario is that the business requires massive data processing. For example, responding to or mining data from Bitcoin transactions would incur huge costs if every call went to a commercial API. For many valuable business scenarios, then, relying solely on closed-source APIs that bill per token cannot meet the need. Having a local, private API with no call limits is a very valuable option.
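A rough way to reason about this trade-off is to compare per-token billing against time-based billing for self-hosted hardware. The sketch below does only that arithmetic. Every number in it (record count, tokens per record, price per million tokens, GPU throughput, GPU hourly price) is a hypothetical placeholder rather than a real quote, and the conclusion depends entirely on the values you plug in.

```python
def api_cost_usd(num_records: int, tokens_per_record: int,
                 usd_per_million_tokens: float) -> float:
    """Cost of processing every record through a token-billed API."""
    total_tokens = num_records * tokens_per_record
    return total_tokens / 1_000_000 * usd_per_million_tokens

def self_hosted_cost_usd(num_records: int, records_per_hour: float,
                         usd_per_gpu_hour: float) -> float:
    """Cost of processing every record on hardware billed by the hour."""
    hours = num_records / records_per_hour
    return hours * usd_per_gpu_hour

# All values below are hypothetical placeholders for illustration only.
NUM_RECORDS = 100_000_000
api = api_cost_usd(NUM_RECORDS, tokens_per_record=500, usd_per_million_tokens=3.0)
local = self_hosted_cost_usd(NUM_RECORDS, records_per_hour=20_000, usd_per_gpu_hour=2.0)

print(f"Token-billed API: ${api:,.0f}")
print(f"Self-hosted:      ${local:,.0f}")
```

The useful part is the shape of the two formulas: API cost grows with total tokens, while self-hosted cost grows with processing time, so higher local throughput directly lowers the bill.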

Flexibility and Optimization Potential

However, the value of open-source models and toolchains goes beyond privacy and cost. A more fundamental advantage is that they give us greater control over the entire system. For example, people who have used vision-capable models like GPT-4o or Claude 3.5 to understand images may have noticed that these models sometimes refuse a request with a reason such as being unable to extract personal information, even though the request had nothing to do with facial recognition or anything similar. In other words, the safety alignment applied by LLM service providers may not be precise enough, or may not fully match our needs, and this can significantly hinder certain business scenarios. With an open-source model, the entire architecture and the weights are under our control, so we can adjust its alignment through fine-tuning and related methods so that it accepts and handles such tasks.

Similarly, in some niche vertical domains, commercial solutions may not fully meet the accuracy bar. A company that has accumulated a large amount of labeled data can improve a model's performance in its domain through fine-tuning or continued training. The model keeps its background knowledge about the world, its logical reasoning ability, and its natural language understanding, and gains specialized knowledge of the vertical domain on top, which lets it complete the relevant tasks.

Another common case arises in specific business operations. On one hand, a large model can have high latency, which hurts the customer experience; on the other hand, prompts may need to be long to cover specific requirements, which drives up API costs. A typical solution is to distill a smaller LLM so that it performs at the level of the larger model within that particular domain. This reduces latency and improves the user experience.
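The lesson does not say which distillation method it has in mind. As one illustration, the classic soft-label formulation trains the small model (the student) to match the large model's (the teacher's) output distribution, softened by a temperature. The sketch below computes that loss for a single prediction in plain Python; the function names and the example logits are hypothetical. In practice, LLM distillation is also often done by fine-tuning the small model on text generated by the large one.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)   # teacher's soft targets
    q = softmax(student_logits, temperature)   # student's prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical logits over a three-token vocabulary.
teacher = [4.0, 1.0, 0.5]
good_student = [3.8, 1.1, 0.4]
poor_student = [0.5, 4.0, 1.0]

print(distillation_loss(good_student, teacher))
print(distillation_loss(poor_student, teacher))
```

A student whose distribution is close to the teacher's gets a loss near zero, so minimizing this quantity over domain-specific inputs pulls the small model toward the large model's behavior in that domain.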

Open-source toolchains can offer similar benefits. On one hand, they provide very flexible customization options; on the other hand, they are sometimes even easier to use than existing commercial solutions. Take the commonly used RAG (Retrieval-Augmented Generation) as an example. With a pre-built commercial solution (such as GPTs, or uploading files directly to GPT), it is take it or leave it: use it if it works well, and if it does not, there is no way to improve it. With a pipeline such as Open WebUI, however, we can customize and even enhance many of the individual modules, for example by choosing a specific embedding model and its parameters at indexing time, and so tune RAG performance without writing code.
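To show which knobs such a pipeline exposes, here is a minimal retrieval sketch in which the chunk size and the embedding function are both swappable. It is not Open WebUI's API or configuration: all function names are hypothetical, and the bag-of-words `bow_embed` is a toy stand-in for a real embedding model.

```python
import math
from collections import Counter

def chunk_text(text, chunk_size):
    """Split text into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def bow_embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, embed=bow_embed, top_k=1):
    """Return the top_k chunks most similar to the query under the given embedding."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

doc = ("Refunds are processed within five days. "
       "Shipping is free for orders over fifty dollars. "
       "Support is available by email on weekdays.")
chunks = chunk_text(doc, chunk_size=6)   # tunable indexing parameter
print(retrieve("how long do refunds take", chunks))
```

Changing `chunk_size` or passing a different `embed` function changes what gets retrieved, which is the kind of adjustment a closed, pre-built RAG product does not let you make.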

Competitive Advantage and Perceived Value

From a pragmatic perspective, open-source models and toolchains have another significant value: they can change how others perceive your worth as a technical owner. Whether in a large corporation or a startup, when we build LLM-based solutions for users and the solution is merely a wrapper around commercial APIs, we face two major challenges. First, customization and optimization are difficult; second, the solution is relatively easy for others to replicate. As a result, both managers (in corporations) and investors (in startups) struggle to see the value in such a solution. If we instead serve our own model, built through methods like distillation and fine-tuning, we gain a competitive edge in both business and technology, build a deeper moat, and make it easier for management or investors to recognize the value of the technology and of your contribution.

In summary, while open-source models may not be the most technologically advanced, they open up new territory for us: they let us better support certain applications and strike a better balance between flexibility and ease of use. They also help more people understand the value we create as technical owners. This is why we decided to dedicate an entire module to open-source LLMs.