
Build a Conversational RAG Application #

https://python.langchain.com/docs/tutorials/qa_chat_history/

Prerequisites

This tutorial assumes familiarity with the following concepts:

In many Q&A applications we want to allow the user to have a back-and-forth conversation, which means the application needs some form of "memory" of past questions and answers, and some logic for incorporating those into its current thinking.

In this guide we focus on adding logic for incorporating historical messages. Further details on managing chat history are covered here.

We will cover two approaches:

  1. Chains, in which we always execute a retrieval step;
  2. Agents, in which we give an LLM discretion over whether and how to execute a retrieval step (or multiple steps).

For the external knowledge source, we will use the same blog post by Lilian Weng, LLM Powered Autonomous Agents, that we used in the RAG tutorial.

Setup #

Dependencies #

We'll use OpenAI embeddings and a simple in-memory vector store, but everything shown here works with any Embeddings, VectorStore, or Retriever.

We'll use the following packages:

pip install --upgrade --quiet langchain langchain-community beautifulsoup4

Loading environment variables #

Configure OPENAI_API_KEY, OPENAI_BASE_URL, MODEL_NAME, and EMBEDDING_MODEL_NAME in a .env file:

pip install python-dotenv

from dotenv import load_dotenv
assert load_dotenv()

import os
MODEL_NAME = os.environ.get("MODEL_NAME")
EMBEDDING_MODEL_NAME = os.environ.get("EMBEDDING_MODEL_NAME")

LangSmith tracing setup (optional) #

Omitted; see here.

Chains #

Let's first revisit the Q&A app we built in the RAG tutorial over the LLM Powered Autonomous Agents blog post by Lilian Weng.

pip install -qU langchain-openai

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model=MODEL_NAME)
import bs4
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load, chunk and index the contents of the blog to create a retriever.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = InMemoryVectorStore.from_documents(
    documents=splits, embedding=OpenAIEmbeddings(model=EMBEDDING_MODEL_NAME)
)
retriever = vectorstore.as_retriever()


# 2. Incorporate the retriever into a question-answering chain.
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
response = rag_chain.invoke({"input": "What is Task Decomposition?"})
response["answer"]

'Task Decomposition is the process of breaking down a complex task into smaller, manageable sub-tasks to simplify execution and enhance understanding. It often utilizes techniques like Chain of Thought (CoT) prompting, where the model is guided to "think step by step." This approach helps clarify the model\'s reasoning and improves performance on difficult tasks.'

Note that we have used the built-in chain constructors create_stuff_documents_chain and create_retrieval_chain, so the basic ingredients of our solution are:

  1. retriever;
  2. prompt;
  3. LLM.

This will simplify the process of incorporating chat history.

Adding chat history #

The chain we have built uses the input query directly to retrieve relevant context. But in a conversational setting, the user query might require conversational context to be understood. For example, consider this exchange:

Human: "What is Task Decomposition?"

AI: "Task decomposition involves breaking down complex tasks into smaller and simpler steps to make them more manageable for an agent or model."

Human: "What are common ways of doing it?"

In order to answer the second question, our system needs to understand that "it" refers to "Task Decomposition".

We'll need to update two things about our existing app:

  1. Prompt: update our prompt to support historical messages as an input.
  2. Contextualizing the question: add a sub-chain that takes the latest user question and reformulates it in the context of the chat history. This can be thought of simply as building a new "history aware" retriever. Whereas before we had:
    • query -> retriever
  Now we will have:
    • (query, conversation history) -> LLM -> rephrased query -> retriever

Contextualizing the question

First we'll need to define a sub-chain that takes historical messages and the latest user question, and reformulates the question if it makes reference to any information in the historical messages.

We'll use a prompt that includes a MessagesPlaceholder variable under the name "chat_history". This allows us to pass in a list of messages to the prompt using the "chat_history" input key, and these messages will be inserted after the system message and before the human message containing the latest question.
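As a quick illustration (this snippet is not part of the original tutorial, and the example messages are invented), invoking such a prompt shows how the placeholder expands into the message list:

from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Hypothetical demo prompt: the placeholder is filled from the "chat_history" input key.
demo_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Rephrase the latest question as a standalone question."),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

prompt_value = demo_prompt.invoke(
    {
        "chat_history": [
            HumanMessage(content="What is Task Decomposition?"),
            AIMessage(content="It breaks a complex task into smaller steps."),
        ],
        "input": "What are common ways of doing it?",
    }
)
# The resulting messages are: system, then the two history messages, then the latest human turn.
print(prompt_value.to_messages())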

Note that we leverage a helper function create_history_aware_retriever for this step, which manages the case where chat_history is empty, and otherwise applies prompt | llm | StrOutputParser() | retriever in sequence.

create_history_aware_retriever constructs a chain that accepts keys input and chat_history as input, and has the same output schema as a retriever.

from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)

This chain prepends a rephrasing of the input query to our retriever, so that the retrieval incorporates the context of the conversation.
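As a small sketch (not part of the original tutorial; the history messages are invented for illustration), the history-aware retriever can be invoked directly and returns a list of Documents, just like a plain retriever:

from langchain_core.messages import AIMessage, HumanMessage

# The follow-up question is rephrased against the chat history before retrieval.
docs = history_aware_retriever.invoke(
    {
        "input": "What are common ways of doing it?",
        "chat_history": [
            HumanMessage(content="What is Task Decomposition?"),
            AIMessage(content="It breaks a complex task into smaller steps."),
        ],
    }
)
print(len(docs), docs[0].page_content[:100])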

Now we can build our full QA chain. This is as simple as updating the retriever to be our new history_aware_retriever.

Again, we will use create_stuff_documents_chain to generate a question_answer_chain, with input keys context, chat_history, and input: it accepts the retrieved context alongside the conversation history and the query to generate an answer. A more detailed explanation can be found here.

We build our final rag_chain with create_retrieval_chain. This chain applies the history_aware_retriever and question_answer_chain in sequence, retaining intermediate outputs such as the retrieved context for convenience. It has input keys input and chat_history, and includes input, chat_history, context, and answer in its output.

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)


question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

Let's try this. Below we ask a question and a follow-up question that requires contextualization to return a sensible response. Because our chain includes a "chat_history" input, the caller needs to manage the chat history. We can achieve this by appending input and output messages to a list:

from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

question = "What is Task Decomposition?"
ai_msg_1 = rag_chain.invoke({"input": question, "chat_history": chat_history})
chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content=ai_msg_1["answer"]),
    ]
)

second_question = "What are common ways of doing it?"
ai_msg_2 = rag_chain.invoke({"input": second_question, "chat_history": chat_history})

print(ai_msg_2["answer"])

Common ways of task decomposition include: (1) using simple prompting techniques like asking the model for steps or subgoals, (2) employing task-specific instructions tailored to the task at hand, and (3) incorporating human inputs to guide the decomposition process. These methods help break down complex tasks into simpler components for easier management.

Stateful management of chat history #

Note

This section of the tutorial previously used the RunnableWithMessageHistory abstraction. You can access that version of the documentation in the v0.2 docs.

As of LangChain's v0.3 release, we recommend that LangChain users take advantage of LangGraph persistence to incorporate memory into new LangChain applications.

If your code already relies on RunnableWithMessageHistory or BaseChatMessageHistory, you do not need to make any changes. We do not plan on deprecating this functionality in the near future; it works for simple chat applications, and any code that uses RunnableWithMessageHistory will continue to work as expected.

Please see How to migrate to LangGraph Memory for more details.

We have added application logic for incorporating chat history, but we are still manually threading it through our application. In production, a Q&A application will usually persist the chat history into a database, and be able to read and update it appropriately.

LangGraph implements a built-in persistence layer, making it ideal for chat applications that support multiple conversational turns.

Wrapping our chat model in a minimal LangGraph application allows us to automatically persist the message history, simplifying the development of multi-turn applications.

LangGraph comes with a simple in-memory checkpointer, which we use below. See its documentation for more detail, including how to use different persistence backends (e.g., SQLite or Postgres).

For a detailed walkthrough of how to manage message history, head to the How to add message history (memory) guide.

from typing import Sequence

from langchain_core.messages import BaseMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, StateGraph
from langgraph.graph.message import add_messages
from typing_extensions import Annotated, TypedDict


# We define a dict representing the state of the application.
# This state has the same input and output keys as `rag_chain`.
class State(TypedDict):
    input: str
    chat_history: Annotated[Sequence[BaseMessage], add_messages]
    context: str
    answer: str


# We then define a simple node that runs the `rag_chain`.
# The `return` values of the node update the graph state, so here we just
# update the chat history with the input message and response.
def call_model(state: State):
    response = rag_chain.invoke(state)
    return {
        "chat_history": [
            HumanMessage(state["input"]),
            AIMessage(response["answer"]),
        ],
        "context": response["context"],
        "answer": response["answer"],
    }


# Our graph consists only of one node:
workflow = StateGraph(state_schema=State)
workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

# Finally, we compile the graph with a checkpointer object.
# This persists the state, in this case in memory.
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
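MemorySaver keeps checkpoints only for the lifetime of the process. As a rough sketch of a persistent alternative, and assuming the optional langgraph-checkpoint-sqlite package is installed (its import path and constructor may differ between langgraph versions), a SQLite-backed checkpointer could be swapped in like this:

# Hedged sketch: persist checkpoints to a SQLite file instead of process memory.
# Assumes `pip install langgraph-checkpoint-sqlite`; the API may vary by langgraph version.
import sqlite3

from langgraph.checkpoint.sqlite import SqliteSaver

conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
sqlite_checkpointer = SqliteSaver(conn)
persistent_app = workflow.compile(checkpointer=sqlite_checkpointer)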

This application out-of-the-box supports multiple conversation threads. We pass in a config dict that specifies a unique identifier for a thread, controlling which thread is run. This enables the application to support interactions with multiple users.

config = {"configurable": {"thread_id": "abc123"}}

result = app.invoke(
    {"input": "What is Task Decomposition?"},
    config=config,
)
print(result["answer"])

Task decomposition is the process of breaking down a complicated task into smaller, manageable steps. It often involves techniques like Chain of Thought (CoT), where a model is prompted to think step-by-step to simplify complex tasks. This approach enhances model performance by making it easier to tackle each subtask individually.

result = app.invoke(
    {"input": "What is one way of doing it?"},
    config=config,
)
print(result["answer"])

One way of doing task decomposition is by using simple prompting, such as asking the model, "What are the subgoals for achieving XYZ?" This method encourages the model to identify and outline the smaller tasks needed to accomplish the larger goal.

The conversation history can be inspected via the state of the application:

chat_history = app.get_state(config).values["chat_history"]
for message in chat_history:
    message.pretty_print()

================================ Human Message =================================

What is Task Decomposition?
================================== Ai Message ==================================

Task decomposition is the process of breaking down a complicated task into smaller, more manageable steps. Techniques like Chain of Thought (CoT) and Tree of Thoughts enhance this process by guiding models to think step by step and explore multiple reasoning possibilities. This approach helps in simplifying complex tasks and provides insight into the model's reasoning.
================================ Human Message =================================

What is one way of doing it?
================================== Ai Message ==================================

One way of doing task decomposition is by using simple prompting, such as asking the model, "What are the subgoals for achieving XYZ?" This method encourages the model to identify and outline the smaller tasks needed to accomplish the larger goal.
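Because checkpoints are keyed by the thread identifier, a different thread_id starts from an empty history and does not see the "abc123" conversation. A quick sketch (the thread id below is arbitrary):

# A fresh thread_id gets its own isolated chat history.
other_config = {"configurable": {"thread_id": "xyz789"}}

result = app.invoke(
    {"input": "What is Task Decomposition?"},
    config=other_config,
)
# Only this thread's single question/answer pair is stored: 2 messages.
print(len(app.get_state(other_config).values["chat_history"]))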

Tying it together #

[Figure: conversational_retrieval_chain.png]

For convenience, we tie together all of the necessary steps in a single code cell:

from typing import Sequence

import bs4
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, StateGraph
from langgraph.graph.message import add_messages
from typing_extensions import Annotated, TypedDict

llm = ChatOpenAI(model=MODEL_NAME, temperature=0)


### Construct retriever ###
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = InMemoryVectorStore.from_documents(
    documents=splits, embedding=OpenAIEmbeddings(model=EMBEDDING_MODEL_NAME)
)
retriever = vectorstore.as_retriever()


### Contextualize question ###
contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)


### Answer question ###
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)


### Statefully manage chat history ###
class State(TypedDict):
    input: str
    chat_history: Annotated[Sequence[BaseMessage], add_messages]
    context: str
    answer: str


def call_model(state: State):
    response = rag_chain.invoke(state)
    return {
        "chat_history": [
            HumanMessage(state["input"]),
            AIMessage(response["answer"]),
        ],
        "context": response["context"],
        "answer": response["answer"],
    }


workflow = StateGraph(state_schema=State)
workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
config = {"configurable": {"thread_id": "abc123"}}

result = app.invoke(
    {"input": "What is Task Decomposition?"},
    config=config,
)
print(result["answer"])

Task decomposition is the process of breaking down a complicated task into smaller, more manageable steps. Techniques like Chain of Thought (CoT) and Tree of Thoughts enhance this process by guiding models to think step by step and explore multiple reasoning possibilities. This approach helps in simplifying complex tasks and improving the model's performance.

result = app.invoke(
    {"input": "What is one way of doing it?"},
    config=config,
)
print(result["answer"])

One way of doing task decomposition is by using simple prompting, such as asking the model, "What are the subgoals for achieving XYZ?" This method encourages the model to identify and outline the smaller steps needed to complete the larger task.

Agents #

Agents leverage the reasoning capabilities of LLMs to make decisions during execution. Using agents allows you to offload some discretion over the retrieval process. Although their behavior is less predictable than chains, they offer some advantages in this context:

  • Agents generate the input to the retriever directly, without necessarily needing us to explicitly build in contextualization, as we did above;
  • Agents can execute multiple retrieval steps in service of a query, or refrain from executing a retrieval step altogether (e.g., in response to a generic greeting from a user).

Retrieval tool #

Agents can access "tools" and manage their execution. In this case, we will convert our retriever into a LangChain tool to be wielded by the agent:

from langchain_core.tools.retriever import create_retriever_tool

tool = create_retriever_tool(
    retriever,
    "blog_post_retriever",
    "Searches and returns excerpts from the Autonomous Agents blog post.",
)
tools = [tool]

Tools are LangChain Runnables, and implement the usual interface:

tool.invoke("task decomposition")

'Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.\n\nFig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\n\n(3) Task execution: Expert models execute on the specific tasks and log results.\nInstruction:\n\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user\'s request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.\n\nFig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)\nThe system comprises of 4 stages:\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.\nInstruction:'
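Because tools are Runnables, the rest of the standard interface is available too; for example (a small sketch, with queries chosen arbitrarily), we can batch several queries or use the async variant:

# Batch returns one string result per query.
results = tool.batch(["task decomposition", "tree of thoughts"])

# Async variant, usable inside an async context:
# result = await tool.ainvoke("task decomposition")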

Agent constructor #

Now that we have defined the tools and the LLM, we can create the agent. We will be using LangGraph to construct the agent. Currently we are using a high-level interface to construct the agent, but the nice thing about LangGraph is that this high-level interface is backed by a low-level, highly controllable API, in case you want to modify the agent logic.

from langgraph.prebuilt import create_react_agent

agent_executor = create_react_agent(llm, tools)

We can now try it out. Note that so far it is not stateful (we still need to add in memory).

query = "What is Task Decomposition?"

for event in agent_executor.stream(
    {"messages": [HumanMessage(content=query)]},
    stream_mode="values",
):
    event["messages"][-1].pretty_print()
================================ Human Message =================================

What is Task Decomposition?
================================== Ai Message ==================================
Tool Calls:
  blog_post_retriever (call_WKHdiejvg4In982Hr3EympuI)
 Call ID: call_WKHdiejvg4In982Hr3EympuI
  Args:
    query: Task Decomposition
================================= Tool Message =================================
Name: blog_post_retriever

Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to think step by step to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the models thinking process.

Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

(3) Task execution: Expert models execute on the specific tasks and log results.
Instruction:

With the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user's request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.

Fig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)
The system comprises of 4 stages:
(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.
Instruction:
================================== Ai Message ==================================

Task Decomposition is a process used in complex problem-solving where a larger task is broken down into smaller, more manageable sub-tasks. This approach enhances the ability of models, particularly large language models (LLMs), to handle intricate tasks by allowing them to think step by step.

There are several methods for task decomposition:

1. **Chain of Thought (CoT)**: This technique encourages the model to articulate its reasoning process by thinking through the task in a sequential manner. It transforms a big task into smaller, manageable steps, which also provides insight into the model's thought process.

2. **Tree of Thoughts**: An extension of CoT, this method explores multiple reasoning possibilities at each step. It decomposes the problem into various thought steps and generates multiple thoughts for each step, creating a tree structure. The evaluation of each state can be done using breadth-first search (BFS) or depth-first search (DFS).

3. **Prompting Techniques**: Task decomposition can be achieved through simple prompts like "Steps for XYZ" or "What are the subgoals for achieving XYZ?" Additionally, task-specific instructions can guide the model, such as asking it to "Write a story outline" for creative tasks.

4. **Human Inputs**: In some cases, human guidance can be used to assist in breaking down tasks.

Overall, task decomposition is a crucial component in planning and executing complex tasks, allowing for better organization and clarity in the problem-solving process.

We can again take advantage of LangGraph's built-in persistence to save stateful updates to memory:

from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()

agent_executor = create_react_agent(llm, tools, checkpointer=memory)

This is all we need to construct a conversational RAG agent.

Let's observe its behavior. Note that if we input a query that does not require a retrieval step, the agent does not execute one:

config = {"configurable": {"thread_id": "abc123"}}

for event in agent_executor.stream(
    {"messages": [HumanMessage(content="Hi! I'm bob")]},
    config=config,
    stream_mode="values",
):
    event["messages"][-1].pretty_print()

================================ Human Message =================================

Hi! I'm bob
================================== Ai Message ==================================

Hello Bob! How can I assist you today?

Further, if we input a query that does require a retrieval step, the agent generates the input to the tool:

query = "What is Task Decomposition?"

for event in agent_executor.stream(
    {"messages": [HumanMessage(content=query)]},
    config=config,
    stream_mode="values",
):
    event["messages"][-1].pretty_print()
================================ Human Message =================================

What is Task Decomposition?
================================== Ai Message ==================================
Tool Calls:
  blog_post_retriever (call_0rhrUJiHkoOQxwqCpKTkSkiu)
 Call ID: call_0rhrUJiHkoOQxwqCpKTkSkiu
  Args:
    query: Task Decomposition
================================= Tool Message =================================
Name: blog_post_retriever

Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to think step by step to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the models thinking process.

Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

(3) Task execution: Expert models execute on the specific tasks and log results.
Instruction:

With the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user's request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.

Fig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)
The system comprises of 4 stages:
(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.
Instruction:
================================== Ai Message ==================================

Task Decomposition is a technique used to break down complex tasks into smaller, more manageable steps. This approach is particularly useful in the context of autonomous agents and large language models (LLMs). Here are some key points about Task Decomposition:

1. **Chain of Thought (CoT)**: This is a prompting technique that encourages the model to "think step by step." By doing so, it can utilize more computational resources to decompose difficult tasks into simpler ones, making them easier to handle.

2. **Tree of Thoughts**: An extension of CoT, this method explores multiple reasoning possibilities at each step. It decomposes a problem into various thought steps and generates multiple thoughts for each step, creating a tree structure. This can be evaluated using search methods like breadth-first search (BFS) or depth-first search (DFS).

3. **Methods of Decomposition**: Task decomposition can be achieved through:
   - Simple prompting (e.g., asking for steps to achieve a goal).
   - Task-specific instructions (e.g., requesting a story outline for writing).
   - Human inputs to guide the decomposition process.

4. **Execution**: After decomposition, expert models execute the specific tasks and log the results, allowing for a structured approach to complex problem-solving.

Overall, Task Decomposition enhances the model's ability to tackle intricate tasks by breaking them down into simpler, actionable components.

Note above that instead of inserting our query verbatim into the tool, the agent stripped unnecessary words like "what" and "is". The same principle allows the agent to use the context of the conversation when necessary:

query = "What according to the blog post are common ways of doing it? redo the search"

for event in agent_executor.stream(
    {"messages": [HumanMessage(content=query)]},
    config=config,
    stream_mode="values",
):
    event["messages"][-1].pretty_print()
================================ Human Message =================================

What according to the blog post are common ways of doing it? redo the search
================================== Ai Message ==================================
Tool Calls:
  blog_post_retriever (call_bZRDF6Xr0QdurM9LItM8cN7a)
 Call ID: call_bZRDF6Xr0QdurM9LItM8cN7a
  Args:
    query: common ways of Task Decomposition
================================= Tool Message =================================
Name: blog_post_retriever

Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to think step by step to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the models thinking process.

Resources:
1. Internet access for searches and information gathering.
2. Long Term memory management.
3. GPT-3.5 powered Agents for delegation of simple tasks.
4. File output.

Performance Evaluation:
1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.
2. Constructively self-criticize your big-picture behavior constantly.
3. Reflect on past decisions and strategies to refine your approach.
4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.

(3) Task execution: Expert models execute on the specific tasks and log results.
Instruction:

With the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user's request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.
================================== Ai Message ==================================

According to the blog post, common ways to perform Task Decomposition include:

1. **Simple Prompting**: Using straightforward prompts such as "Steps for XYZ.\n1." or "What are the subgoals for achieving XYZ?" to guide the model in breaking down the task.

2. **Task-Specific Instructions**: Providing specific instructions tailored to the task at hand, such as asking for a "story outline" when writing a novel.

3. **Human Inputs**: Involving human guidance or input to assist in the decomposition process, allowing for a more nuanced understanding of the task requirements.

These methods help in transforming complex tasks into smaller, manageable components, facilitating better planning and execution.

Note that the agent was able to infer that "it" in our query refers to "task decomposition", and generated a reasonable search query as a result; in this case, "common ways of Task Decomposition".

Tying it together #

For convenience, we tie together all of the necessary steps in a single code cell:

import bs4
from langchain.tools.retriever import create_retriever_tool
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

memory = MemorySaver()
llm = ChatOpenAI(model=MODEL_NAME, temperature=0)


### Construct retriever ###
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = InMemoryVectorStore.from_documents(
    documents=splits, embedding=OpenAIEmbeddings(model=EMBEDDING_MODEL_NAME)
)
retriever = vectorstore.as_retriever()


### Build retriever tool ###
tool = create_retriever_tool(
    retriever,
    "blog_post_retriever",
    "Searches and returns excerpts from the Autonomous Agents blog post.",
)
tools = [tool]


agent_executor = create_react_agent(llm, tools, checkpointer=memory)
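As with the earlier examples, the compiled agent can then be driven with a thread-scoped config; a brief usage sketch (the thread id is arbitrary):

from langchain_core.messages import HumanMessage

config = {"configurable": {"thread_id": "demo-thread"}}

# Stream a conversation against the agent, scoped to one thread.
for event in agent_executor.stream(
    {"messages": [HumanMessage(content="What is Task Decomposition?")]},
    config=config,
    stream_mode="values",
):
    event["messages"][-1].pretty_print()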

Next steps #

We've covered the steps to build a basic conversational Q&A application:

  • We used chains to build a predictable application that generates search queries for each user input;
  • We used agents to build an application that "decides" when and how to generate search queries.

To explore different types of retrievers and retrieval strategies, visit the retrievers section of the how-to guides.

For a detailed walkthrough of LangChain's conversation memory abstractions, visit the How to add message history (memory) guide. To learn more about agents, head to the Agents modules.
