Build a Retrieval Augmented Generation (RAG) App #

https://python.langchain.com/docs/tutorials/rag/

One of the most powerful applications enabled by LLMs is sophisticated question-answering (Q&A) chatbots. These are applications that can answer questions about specific source information. They use a technique known as Retrieval Augmented Generation (RAG).

This tutorial will show how to build a simple Q&A application over a text data source. Along the way we'll go over a typical Q&A architecture and highlight additional resources for more advanced Q&A techniques. We'll also see how LangSmith can help us trace and understand our application. LangSmith will become increasingly helpful as our application grows in complexity.

If you're already familiar with basic retrieval, you might also be interested in a high-level overview of different retrieval techniques.

What is RAG? #

RAG is a technique for augmenting LLM knowledge with additional data.

LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data available up to the point in time they were trained on. If you want to build AI applications that can reason about private data or data introduced after a model's cutoff date, you need to augment the model's knowledge with the specific information it needs. The process of retrieving the appropriate information and inserting it into the model prompt is known as Retrieval Augmented Generation (RAG).

LangChain has a number of components designed to help build Q&A applications, and RAG applications more generally.

Note: here we focus on Q&A over unstructured data. If you are interested in RAG over structured data, check out our tutorial on doing question answering over SQL data.

Concepts #

A typical RAG application has two main components:

Indexing: a pipeline for ingesting data from a source and indexing it. This usually happens offline.

Retrieval and generation: the actual RAG chain, which takes the user query at run time, retrieves the relevant data from the index, and then passes it to the model.

The most common full sequence from raw data to answer looks like this:

Indexing #

  1. Load: First we need to load our data. This is done with Document Loaders.

  2. Split: Text splitters break large Documents into smaller chunks. This is useful both for indexing data and for passing it into a model, since large chunks are harder to search over and won't fit in a model's finite context window.

  3. Store: We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a VectorStore and an Embeddings model.

(Figure: rag_indexing.png, diagram of the indexing pipeline)

Retrieval and generation #

  1. Retrieve: Given a user input, relevant splits are retrieved from storage using a Retriever.

  2. Generate: A ChatModel produces an answer using a prompt that includes both the question and the retrieved data.

(Figure: rag_retrieval_generation.png, diagram of the retrieval and generation flow)

Setup #

Jupyter Notebook #

This tutorial, like many others, runs in a Jupyter Notebook. For instructions on how to install it, see here.

Installation #

This tutorial requires installing the following packages:

pip install -Uq langchain langchain-community langchain-chroma

For more details, see our Installation guide.

Loading environment variables #

Configure OPENAI_API_KEY, OPENAI_BASE_URL, MODEL_NAME, and EMBEDDING_MODEL_NAME in a .env file:

pip install python-dotenv
from dotenv import load_dotenv
assert load_dotenv()

import os
MODEL_NAME = os.environ.get("MODEL_NAME")
EMBEDDING_MODEL_NAME = os.environ.get("EMBEDDING_MODEL_NAME")
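
For reference, a minimal .env might look like the following sketch; all values here are placeholders (the model names are only examples), so substitute your own:

# .env (placeholder values)
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1
MODEL_NAME=gpt-4o-mini
EMBEDDING_MODEL_NAME=text-embedding-3-small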

LangSmith tracing configuration (optional) #

Skipped here; see here.
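
If you do enable tracing, the typical setup (at the time of writing) is to export the LangSmith environment variables before starting the notebook; a sketch:

export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="..."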

Preview #

In this tutorial we'll build an app that answers questions about the content of a website. The specific website we will use is the LLM Powered Autonomous Agents blog post by Lilian Weng, which allows us to ask questions about the contents of the post.

We can create a simple indexing pipeline and RAG chain to do this in roughly 20 lines of code:

pip install -qU langchain-openai
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model=MODEL_NAME)

beautifulsoup4

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It lets you navigate, search, and modify the parse tree with your parser of choice, in a familiar way.

pip install beautifulsoup4
import bs4
from langchain import hub
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load, chunk and index the contents of the blog.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=OpenAIEmbeddings(model=EMBEDDING_MODEL_NAME),
)

# Retrieve and generate using the relevant snippets of the blog.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")
print(prompt)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is Task Decomposition?")
'Task Decomposition is a process where a complex task is broken down into smaller, simpler steps or subtasks. This technique is utilized to enhance model performance on complex tasks by making them more manageable. It can be done by using language models with simple prompting, task-specific instructions, or with human inputs.'

Detailed walkthrough #

Let's go through the above code step by step to really understand what's going on.

1. Indexing: Load #

We first need to load the blog post contents. We can use Document Loaders for this: objects that load in data from a source and return a list of Documents. A Document is an object with page_content (str) and metadata (dict).

In this case we'll use the WebBaseLoader, which uses urllib to load HTML from web URLs and BeautifulSoup to parse it to text. We can customize the HTML-to-text parsing by passing parameters into the BeautifulSoup parser via bs_kwargs (see the BeautifulSoup docs). In this case only HTML tags with class "post-content", "post-title", or "post-header" are relevant, so we'll remove all others.

import bs4
from langchain_community.document_loaders import WebBaseLoader

# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()

len(docs[0].page_content)
43131
print(docs[0].page_content[:500])
      LLM Powered Autonomous Agents

Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In

Go deeper:

DocumentLoader: An object that loads data from a data source and returns it as a list of Documents.

  • Docs: Detailed documentation on how to use DocumentLoaders.
  • Integrations: 160+ integrations to choose from.
  • Interface: API reference for the base interface.

2. Indexing: Split #

Our loaded document is over 42,000 characters long. This is too long to fit in the context window of many models. Even for models that could fit the full post in their context window, finding information in very long inputs can be difficult.

To handle this we'll split the Document into chunks for embedding and vector storage. This should help us retrieve only the most relevant bits of the blog post at run time.

In this case we'll split our documents into chunks of 1000 characters with 200 characters of overlap between chunks. The overlap helps mitigate the possibility of separating a statement from important context related to it. We use the RecursiveCharacterTextSplitter, which recursively splits the document using common separators (like newlines) until each chunk is the appropriate size. This is the recommended text splitter for generic text use cases.

We set add_start_index=True so that the character index at which each split Document starts within the initial Document is preserved as the metadata attribute "start_index".

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)

len(all_splits)
66
len(all_splits[0].page_content)
969
all_splits[10].metadata
{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
 'start_index': 7056}

Go deeper:

TextSplitter: An object that splits a list of Documents into smaller chunks. A subclass of DocumentTransformers.

DocumentTransformer: An object that performs a transformation on a list of Document objects.

3. Indexing: Store #

Now we need to index our 66 text chunks so that we can search over them at runtime. The most common way to do this is to embed the contents of each document split and insert these embeddings into a vector database (or vector store). When we want to search over our splits, we take a text search query, embed it, and perform some sort of "similarity" search to identify the stored splits with the most similar embeddings to our query embedding. The simplest similarity measure is cosine similarity: we measure the cosine of the angle between each pair of embeddings (which are high-dimensional vectors).
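
For intuition, cosine similarity between two embedding vectors can be computed in a few lines of NumPy. This is only an illustrative sketch; the vector store computes similarity internally with optimized routines:

import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = dot(a, b) / (|a| * |b|)
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0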

We can embed and store all of our document splits in a single command using the Chroma vector store and the OpenAIEmbeddings model.

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(
    documents=all_splits,
    embedding=OpenAIEmbeddings(model=EMBEDDING_MODEL_NAME),
)

Go deeper

Embeddings: Wrapper around a text embedding model, used for converting text to embeddings.

  • Docs: Detailed documentation on how to use embeddings.
  • Integrations: 30+ integrations to choose from.
  • Interface: API reference for the base interface.

VectorStore: Wrapper around a vector database, used for storing and querying embeddings.

  • Docs: Detailed documentation on how to use vector stores.
  • Integrations: 40+ integrations to choose from.
  • Interface: API reference for the base interface.

This completes the Indexing portion of the pipeline. At this point we have a queryable vector store containing the chunked contents of our blog post. Given a user question, we should ideally be able to return the snippets of the blog post that answer the question.
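
We can sanity-check this by querying the vector store directly, before wiring up the full chain; a quick sketch:

# Return the chunks most similar to the query.
results = vectorstore.similarity_search("What is Task Decomposition?")
print(results[0].page_content[:200])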

4. Retrieval and Generation: Retrieve #

Now let's write the actual application logic. We want to create a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents and initial question to a model, and returns an answer.

First we need to define our logic for searching over documents. LangChain defines a Retriever interface, which wraps an index that can return relevant Documents given a string query.

The most common type of Retriever is the VectorStoreRetriever, which uses the similarity search capabilities of a vector store to facilitate retrieval. Any VectorStore can easily be turned into a Retriever with VectorStore.as_retriever().

retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})

retrieved_docs = retriever.invoke("What are the approaches to Task Decomposition?")

len(retrieved_docs)
6
print(retrieved_docs[0].page_content)
Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Go deeper

Vector stores are commonly used for retrieval, but there are other ways to do retrieval, too.

Retriever: An object that returns a list of Documents given a text query.
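
Even with a vector store retriever, the search behavior can be tuned. For example, maximal marginal relevance (MMR) search balances similarity to the query against diversity among the returned chunks; a minimal sketch:

# MMR retrieval: trade off query similarity against result diversity.
mmr_retriever = vectorstore.as_retriever(
    search_type="mmr", search_kwargs={"k": 6}
)
mmr_docs = mmr_retriever.invoke("What are the approaches to Task Decomposition?")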

5. Retrieval and Generation: Generate #

Let's put it all together into a chain that takes a question, retrieves relevant documents, constructs a prompt, passes that to a model, and parses the output.

We'll use the gpt-4o-mini OpenAI chat model, but any LangChain LLM or ChatModel could be substituted in.

pip install -qU langchain-openai
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model=MODEL_NAME)
from langchain import hub

prompt = hub.pull("rlm/rag-prompt")

example_messages = prompt.invoke(
    {"context": "filler context", "question": "filler question"}
).to_messages()

example_messages
[HumanMessage(content="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: filler question \nContext: filler context \nAnswer:")]
print(example_messages[0].content)
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: filler question 
Context: filler context 
Answer:

We'll use the LCEL Runnable protocol to define the chain, allowing us to:

  • pipe together components and functions in a transparent way
  • automatically trace our chain in LangSmith
  • get streaming, async, and batched calling out of the box

Here is the implementation:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

for chunk in rag_chain.stream("What is Task Decomposition?"):
    print(chunk, end="", flush=True)
Task decomposition involves breaking down a complex task into smaller, manageable steps to facilitate problem-solving. This can be achieved using large language models (LLMs) through simple prompts, task-specific instructions, or human inputs. The process enhances the model's performance by allowing it to tackle each step systematically.
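
Because the whole chain is itself a Runnable, batched and async calls also come for free; for example, a sketch:

# Answer several questions in one batched call.
answers = rag_chain.batch(
    ["What is Task Decomposition?", "What is Chain of Thought?"]
)

# In an async context (e.g. a Jupyter cell), the same chain can be awaited:
# answer = await rag_chain.ainvoke("What is Task Decomposition?")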

Let's dissect the LCEL to understand what's going on.

First: each of these components (retriever, prompt, llm, etc.) is an instance of Runnable. This means that they implement the same methods, such as sync and async .invoke, .stream, and .batch, which makes them easier to connect together. They can be connected into a RunnableSequence (another Runnable) via the | operator.

LangChain will automatically cast certain objects to runnables when it meets the | operator. Here, format_docs is cast to a RunnableLambda, and the dict with "context" and "question" is cast to a RunnableParallel. The details are less important than the bigger point: every object involved is a Runnable.
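
Written out without the automatic coercions, the first element of the chain is equivalent to something like this sketch:

from langchain_core.runnables import RunnableLambda, RunnableParallel, RunnablePassthrough

# Explicit form of {"context": retriever | format_docs, "question": RunnablePassthrough()}
context_and_question = RunnableParallel(
    context=retriever | RunnableLambda(format_docs),
    question=RunnablePassthrough(),
)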

Now let's trace how the input question flows through the runnables above.

As we've seen above, the input to prompt is expected to be a dict with keys "context" and "question". So the first element of this chain builds runnables that compute both of these from the input question:

  • retriever | format_docs passes the question through the retriever, generating a list of Documents, and then on to format_docs to produce a string;
  • RunnablePassthrough() passes through the input question unchanged.

That is, if you constructed

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
)

then chain.invoke(question) would build a fully formatted prompt, ready for inference. (Note: when developing with LCEL, it can be practical to test with sub-chains like this.)
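
For instance, invoking the sub-chain lets you inspect the fully formatted prompt before spending any tokens on inference; a sketch (chain.invoke here returns a prompt value whose messages can be printed):

prompt_value = chain.invoke("What is Task Decomposition?")
print(prompt_value.to_messages()[0].content[:300])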

The last steps of the chain are llm, which runs the inference, and StrOutputParser(), which just plucks the string content out of the LLM's output message.

You can analyze the individual steps of this chain via its LangSmith trace.

Built-in chains #

If preferred, LangChain includes convenience functions that implement the above LCEL. We compose two functions:

  • create_stuff_documents_chain specifies how retrieved context is fed into the prompt and LLM. In this case we will "stuff" the contents into the prompt, i.e., we will include all retrieved context without any summarization or other processing. It largely implements our rag_chain above, with input keys context and input. It generates an answer using the retrieved context and the query.

  • create_retrieval_chain adds the retrieval step and propagates the retrieved context through the chain, providing it alongside the final answer. It has input key input, and includes input, context, and answer in its output.

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)


question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

response = rag_chain.invoke({"input": "What is Task Decomposition?"})
print(response["answer"])
Task decomposition is the process of breaking down a complicated task into smaller, manageable steps or subgoals. It can be achieved using various methods, such as prompting a language model, giving task-specific instructions, or incorporating human inputs. This approach enhances understanding and efficiency in problem-solving.

Returning sources:

Often in Q&A applications it's important to show users the sources that were used to generate the answer. LangChain's built-in create_retrieval_chain will propagate retrieved source documents through to the output in the "context" key:

for document in response["context"]:
    print(document)
    print()
page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}

page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 1585}

page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 2192}

page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}

page_content='Resources:\n1. Internet access for searches and information gathering.\n2. Long Term memory management.\n3. GPT-3.5 powered Agents for delegation of simple tasks.\n4. File output.\n\nPerformance Evaluation:\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\n2. Constructively self-criticize your big-picture behavior constantly.\n3. Reflect on past decisions and strategies to refine your approach.\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}

page_content='Resources:\n1. Internet access for searches and information gathering.\n2. Long Term memory management.\n3. GPT-3.5 powered Agents for delegation of simple tasks.\n4. File output.\n\nPerformance Evaluation:\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\n2. Constructively self-criticize your big-picture behavior constantly.\n3. Reflect on past decisions and strategies to refine your approach.\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 29630}
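
Since each retrieved Document carries its source URL in its metadata, deduplicating the citations for display takes only a couple of lines; a sketch:

# Collect the distinct source URLs behind the answer.
sources = {doc.metadata["source"] for doc in response["context"]}
for src in sorted(sources):
    print(src)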

Customizing the prompt: As shown above, we can load prompts (e.g., this RAG prompt) from the Prompt hub. The prompt can also be easily customized:

from langchain_core.prompts import PromptTemplate

template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}

Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is Task Decomposition?")
'Task Decomposition is the process of breaking down a complex task into smaller, manageable steps or subgoals. This can be achieved through various methods, such as prompting a language model, using task-specific instructions, or providing human inputs. It helps enhance performance by allowing a systematic approach to solving difficult problems. Thanks for asking!'

Next steps #

We've covered the steps to build a basic Q&A app over data:

  • Loading data with a Document Loader
  • Chunking the indexed data with a Text Splitter to make it more easily usable by a model
  • Embedding the data and storing it in a vector store
  • Retrieving the previously stored chunks in response to incoming questions
  • Generating an answer using the retrieved chunks as context

There are plenty of features, integrations, and extensions to explore in each of the sections above. Along with the Go deeper sources mentioned above, good next steps include:

  • Return sources: Learn how to return source documents
  • Streaming: Learn how to stream outputs and intermediate steps
  • Add chat history: Learn how to add chat history to your app
  • Retrieval conceptual guide: A high-level overview of specific retrieval techniques
  • Build a local RAG application: Create an app similar to the one above using all local components