LangChain v0.2 Advanced Tutorial: Building a RAG Application

baijin, 2024-12-18 14:42:03

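One of the most powerful applications enabled by LLMs is the question-answering (Q&A) chatbot: an application that can answer questions about a specific domain or data source. The technique behind these applications is Retrieval-Augmented Generation (RAG).

This tutorial shows how to build a simple Q&A application over a text data source. Along the way we look closely at a typical Q&A architecture and highlight ways to extend its capabilities. We will also see how LangSmith helps us trace and understand the application, which becomes more and more useful as the application grows in complexity.

What is RAG?

RAG is a technique for augmenting the knowledge available to an LLM with additional data.

LLMs can reason about a wide range of topics, but their knowledge is limited to the public data available up to the point in time at which they were trained. If you want to build an AI application that works with private data, or with data newer than the model's cutoff date, you need to supply that information to the model yourself. Retrieving the relevant information and inserting it into the model's prompt is what RAG does.

LangChain provides many components for building Q&A applications and making them work well.

Note: this tutorial focuses on Q&A over unstructured data.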

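Concepts

A typical RAG application has two main components:

Indexing: a pipeline that loads data from a source and builds an index over it.

Retrieval and generation: the actual RAG workflow, which at run time takes the user's query, retrieves the relevant data from the index, and passes both to the model.

Indexing

Load: first we load the data using DocumentLoaders.
Split: Text splitters then break large documents into smaller chunks. Small chunks are useful both for indexing and for passing data to a model, because large chunks are harder to search over and may not fit within the model's input limit.
Store: we need somewhere to store and index the chunks so that they can be searched later. This is usually done with a vector store and an embedding model.

Retrieval and generation

Retrieve: given the user's input, the relevant chunks are fetched from storage with a retriever.
Generate: a ChatModel/LLM answers the question using a prompt that combines the user's question with the retrieved data.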

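Setup

Jupyter Notebook

This tutorial (like most of the others) uses Jupyter notebooks and assumes that you will too. Jupyter notebooks are well suited to learning how to work with LLM systems, or to prototyping, because things often go wrong along the way (unexpected output, an API that is down, and so on). Working step by step in an interactive environment lets you debug quickly and learn as you go.

For installing and configuring Jupyter Notebook, please refer to its own documentation.

LangSmith

LangSmith installation and usage were covered earlier in this series and are not repeated here.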

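Preview

In this chapter we build a Q&A application over an online blog post, "LLM Powered Autonomous Agents", and ask questions about its content.

We can build an indexing pipeline and a RAG chain for this in roughly 20 lines of code.

Detailed walkthrough

Let's go through how that pipeline works, step by step.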

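1. Indexing: Load

First we need to load the content of the blog post. We use DocumentLoaders for this: objects that load data from a source and return a list of Documents. A Document is itself an object, with page_content (a string) and metadata (a dict).

In this tutorial we use WebBaseLoader, which uses urllib to load HTML from web URLs and BeautifulSoup to parse it into text. The HTML-to-text parsing can be customized by passing parameters to the BeautifulSoup parser through the loader's bs_kwargs argument. Here only HTML elements with the class "post-content", "post-title", or "post-header" are relevant, so everything else is dropped.

The example below loads the post and returns the length of the first document's text.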

import bs4
from langchain_community.document_loaders import WebBaseLoader

# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()

len(docs[0].page_content)

Print the first 500 characters of the first document:

print(docs[0].page_content[:500])

    LLM Powered Autonomous Agents
   
Date: June 23, 2023 | Estimated Reading Time: 31 min | Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In
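As a rough illustration of what the loading step produces, the sketch below mimics it with the standard library only. A hypothetical ClassFilter parser keeps text from elements whose class is in an allow-list (the role SoupStrainer plays above), and a stand-in Document dataclass holds the text and metadata. The HTML snippet, the names ClassFilter and Document, and the example URL are assumptions made for illustration; this is not how LangChain or BeautifulSoup implement it.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Document:
    """Stand-in for a loaded document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class ClassFilter(HTMLParser):
    """Collect text only from elements whose class is in `keep`.

    Simplification: assumes every element inside kept content has a
    closing tag (void tags such as <br> are not handled).
    """

    def __init__(self, keep):
        super().__init__()
        self.keep = set(keep)
        self.depth = 0  # > 0 while inside a kept element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.keep.intersection(classes):
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.parts.append(data.strip())


# Hypothetical page: only the title and content should survive.
html = (
    '<html><nav class="menu">Home</nav>'
    '<h1 class="post-title">Agents</h1>'
    '<div class="post-content"><p>Planning matters.</p></div>'
    "<footer>Footer text</footer></html>"
)

parser = ClassFilter(keep=("post-title", "post-header", "post-content"))
parser.feed(html)
parser.close()

doc = Document(
    page_content="\n".join(parser.parts),
    metadata={"source": "https://example.com/post"},  # hypothetical URL
)
print(doc.page_content)
```

The navigation and footer text are dropped, which is the same effect the parse_only filter has on the real blog post.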

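2. Indexing: Split

The loaded document is over 42k characters long. That is too long to fit into the input of many models. Even models that can accept 42k characters of text can struggle to find the relevant information in such a long input.

To handle this, we split the document into chunks, then embed the chunks and put them in a vector store. This lets us retrieve only the most relevant parts of the blog post at run time.

In this tutorial we split the document into chunks of up to 1000 characters, with 200 characters of overlap between adjacent chunks. The overlap reduces the risk of separating a statement from the context it depends on. We use RecursiveCharacterTextSplitter, which recursively splits the document on common separators, such as newlines, until each chunk is an appropriate size. It is the recommended text splitter for generic text.

We set add_start_index=True so that the character index at which each chunk starts within the original document is recorded in the chunk's metadata as start_index.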

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)

len(all_splits)

This returns the number of chunks produced by the split.

Check the length of the first chunk:

len(all_splits[0].page_content)

This returns the length of the first chunk in characters.

Inspect the metadata of the 11th chunk:

all_splits[10].metadata

The metadata contains two keys: the source and the start index of this chunk.

{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
 'start_index': 7056}
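To make chunk_size, chunk_overlap, and start_index concrete, here is a minimal sketch of fixed-width splitting with overlap. It is deliberately simpler than RecursiveCharacterTextSplitter: it cuts at fixed character offsets and ignores separators, so it only illustrates how the overlap and the recorded start index relate. The function name split_with_overlap and the synthetic text are assumptions.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into fixed-width chunks that share chunk_overlap characters.

    Not separator-aware: a simplification of what a real splitter does.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(
            {
                "page_content": text[start:start + chunk_size],
                "metadata": {"start_index": start},
            }
        )
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


# Synthetic 2500-character text (digits repeated), for illustration only.
text = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(text)

for chunk in chunks:
    print(chunk["metadata"]["start_index"], len(chunk["page_content"]))
```

With a step of 800 characters (1000 minus 200), each chunk begins with the last 200 characters of the previous one, and start_index lets you map a chunk back to its position in the original text.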

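3. Indexing: Store

Now we need to index the 66 text chunks so that we can search over them at run time. The most common approach is to embed the content of each chunk and insert the resulting vectors into a vector database (vector store). To search over the chunks, we take a text query, embed it, and run a similarity search to find the stored chunks whose embeddings are most similar to the query embedding. The simplest similarity measure is cosine similarity: the cosine of the angle between two vectors in the high-dimensional embedding space.

We use the Chroma vector store and the OpenAIEmbeddings model to embed and store all of the document chunks.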

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())
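The cosine similarity mentioned above can be written out in a few lines. This is a plain-Python sketch for intuition, using tiny hand-made vectors as stand-ins for real embeddings; it says nothing about how Chroma computes or indexes similarity internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings".
query = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]
orthogonal = [3.0, 0.0, -1.0]

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
```

Vectors pointing in the same direction score close to 1 regardless of their length, and orthogonal vectors score 0, which is why the measure is a convenient proxy for "these two texts are about the same thing".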

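4. Retrieval and generation: Retrieve

Now let's write the actual application logic. We want a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents (chunks) and the original question to a model, and returns the model's answer.

First we need to define the logic for searching over documents. LangChain defines a Retriever interface that wraps an index and returns relevant documents (chunks) for a string query.

The most common type of retriever is the VectorStoreRetriever, which uses the similarity search capabilities of a vector store. Any VectorStore can be turned into a retriever with VectorStore.as_retriever(). The example below creates a retriever, queries it with "What are the approaches to Task Decomposition?", and counts the documents (chunks) it returns: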

retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})

retrieved_docs = retriever.invoke("What are the approaches to Task Decomposition?")

len(retrieved_docs)

The result is 6: one document for each of the six chunks requested with k=6.

Print the page_content of the first chunk:

print(retrieved_docs[0].page_content)

This prints the content of that chunk:

Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
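The retriever's contract (a string in, the k best-matching chunks out) can be sketched without any vector store. In the toy version below, the number of shared words stands in for embedding similarity; that scoring rule, the class name ToyRetriever, and the sample chunks are all assumptions made for illustration.

```python
import heapq
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ToyRetriever:
    """Return the k chunks that share the most words with the query."""

    def __init__(self, chunks, k=6):
        self.chunks = list(chunks)
        self.k = k

    def invoke(self, query):
        query_words = tokenize(query)
        scored = (
            (len(query_words & tokenize(chunk)), index)
            for index, chunk in enumerate(self.chunks)
        )
        # Highest score first; ties are broken by original order.
        top = heapq.nlargest(self.k, scored, key=lambda p: (p[0], -p[1]))
        return [self.chunks[index] for score, index in top if score > 0]


chunks = [
    "Task decomposition splits a task into steps.",
    "Memory stores past observations.",
    "Tree of Thoughts extends task decomposition.",
    "Tools let the agent call APIs.",
]
retriever = ToyRetriever(chunks, k=2)
results = retriever.invoke("What are the approaches to Task Decomposition?")
print(results)
```

A real retriever replaces the word-overlap score with a similarity search over embeddings, but the calling pattern is the same: configure k once, then call invoke with a query string.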

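5. Retrieval and generation: Generate

Let's put everything together into a chain that takes a question, retrieves relevant documents, constructs a prompt, passes the prompt to a model, and parses the model's output.

Here we use GPT-3.5-turbo, but any other chat model can be substituted.

We use the LCEL Runnable protocol to define the chain. A chain that follows this protocol lets us:

Pipe components and functions together in a transparent way;
Automatically trace the chain in LangSmith;
Get streaming, async, and batched calling out of the box.

In the code below, the function format_docs joins the chunks returned by the retriever into a single string, rag_chain defines the chain, and the final loop streams the model's output as it is generated.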

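from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# Assumed setup, not shown in the original listing: the chat model named in
# the text and a RAG prompt pulled from the prompt hub.
llm = ChatOpenAI(model="gpt-3.5-turbo")
prompt = hub.pull("rlm/rag-prompt")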


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

for chunk in rag_chain.stream("What is Task Decomposition?"):
    print(chunk, end="", flush=True)

Below is the streamed output.

Task Decomposition is a process where a complex task is broken down into smaller, more manageable steps or parts. This is often done using techniques like "Chain of Thought" or "Tree of Thoughts", which instruct a model to "think step by step" and transform large tasks into multiple simple tasks. Task decomposition can be prompted in a model, guided by task-specific instructions, or influenced by human inputs.

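Let's take a closer look at LCEL to understand how the code above works.

First, each of these components (retriever, prompt, llm, and so on) is an instance of Runnable. (Note: Runnable is both a protocol and a class.) Instances that support the protocol implement the same methods, such as sync and async .invoke, .stream, and .batch, and that shared interface is what makes it possible to connect them. They are connected with the | operator into a RunnableSequence, which is itself a Runnable.

When LangChain encounters the | operator, it automatically converts certain objects into runnables. Here, the function format_docs is converted to a RunnableLambda, and the dict with the keys "context" and "question" is converted to a RunnableParallel (so its two entries run in parallel on the same input). The important point is that every object in the chain is a runnable; the finer details can be picked up later.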
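To see how a | operator can chain steps and coerce plain functions and dicts, here is a minimal sketch built on Python's __or__ and __ror__ hooks. The class Step and the helper coerce are hypothetical names; this is not LangChain's implementation, and the dict branches here run one after another rather than in parallel.

```python
class Step:
    """A tiny stand-in for a runnable: wraps a function behind .invoke()."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # step | other: run self first, then feed its output to other.
        right = coerce(other)
        return Step(lambda value: right.invoke(self.invoke(value)))

    def __ror__(self, other):
        # other | step, where other is a plain function or dict.
        return coerce(other) | self


def coerce(obj):
    """Turn a Step, a dict of branches, or a callable into a Step."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        branches = {key: coerce(value) for key, value in obj.items()}
        # Every branch receives the same input (sequentially in this sketch).
        return Step(lambda value: {k: b.invoke(value) for k, b in branches.items()})
    if callable(obj):
        return Step(obj)
    raise TypeError(f"cannot turn {type(obj).__name__} into a Step")


shout = Step(str.upper) | (lambda text: text + "!")
chain = {"context": shout, "question": lambda text: text} | Step(
    lambda inputs: f"{inputs['context']} / {inputs['question']}"
)
print(chain.invoke("hi"))
```

The dict on the left of | is what fans a single input out into named values, which is the same shape the chain above uses to produce "context" and "question" from one user question.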

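Let's look at how the input question flows through the chain.

As the code above shows, the prompt needs values for two variables: "context" and "question". The first element of the chain is therefore a runnable that computes both values from the user's question:

retriever | format_docs: passes the question to the retriever, which returns the relevant documents (chunks); format_docs then joins that list of documents into a single string.
RunnablePassthrough(): passes the user's question through unchanged as the value of question.

The following chain covers the steps from user input to prompt: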

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
)

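Calling chain.invoke(question) on the chain above produces a formatted prompt that is ready to be sent to the model for inference. (Note: when developing with LCEL, it is practical to test sub-chains like this one on their own.)

The next steps of the chain are llm, which runs the inference, and StrOutputParser(), a simple output parser that extracts the string content from the message returned by the llm.

You can analyze each individual step of the chain in LangSmith.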

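Built-in chains

LangChain also provides convenience functions that assemble these components into chains that follow LCEL:

create_stuff_documents_chain: specifies how the retrieved context is fed into a prompt and an LLM. In this tutorial the contents are "stuffed" into the prompt, that is, all retrieved context is included without any processing; for large documents one could instead summarize or otherwise transform the content first. It largely implements the rag_chain above and takes two input keys: context and input.
create_retrieval_chain: adds the retrieval step and passes the retrieved context through the chain. When calling it you only provide the input key; its output contains three keys, input, context, and answer, where answer holds the generated response.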

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)


question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

response = rag_chain.invoke({"input": "What is Task Decomposition?"})
print(response["answer"])

Task Decomposition is a process in which complex tasks are broken down into smaller and simpler steps. Techniques like Chain of Thought (CoT) and Tree of Thoughts are used to enhance model performance on these tasks. The CoT method instructs the model to think step by step, decomposing hard tasks into manageable ones, while Tree of Thoughts extends CoT by exploring multiple reasoning possibilities at each step, creating a tree structure of thoughts.
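The input, context, and answer keys described for create_retrieval_chain can be pictured with a small stand-in. The sketch below only mimics the shape of the output dict; make_retrieval_chain, toy_retrieve, and toy_answer are hypothetical helpers, with no retrieval or model call behind them.

```python
def make_retrieval_chain(retrieve, answer_from_context):
    """Return a callable that mimics the input/context/answer output shape."""

    def invoke(inputs):
        # Retrieval step: look up context for the "input" value.
        context = retrieve(inputs["input"])
        # Answering step: sees both the original input and the context.
        answer = answer_from_context({**inputs, "context": context})
        return {**inputs, "context": context, "answer": answer}

    return invoke


def toy_retrieve(query):
    # Placeholder for a real retriever call.
    return ["chunk about planning", "chunk about memory"]


def toy_answer(inputs):
    # Placeholder for the prompt + LLM step.
    return f"Answered {inputs['input']!r} using {len(inputs['context'])} chunks."


chain = make_retrieval_chain(toy_retrieve, toy_answer)
response = chain({"input": "What is Task Decomposition?"})
print(sorted(response))
print(response["answer"])
```

Because the retrieved context travels alongside the answer, the caller can inspect exactly which chunks the answer was based on.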

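Returning sources

In Q&A applications it is often important to show users the sources used to generate an answer, such as which blog post, book, or URL it came from. With LangChain's built-in create_retrieval_chain, the retrieved source documents are returned in the output under the "context" key.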

for document in response["context"]:
    print(document)
    print()

In the documents printed below, each Document has two fields, page_content and metadata; the source is stored in metadata.

page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}

page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 1585}

page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 2192}

page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}

page_content='Resources:\n1. Internet access for searches and information gathering.\n2. Long Term memory management.\n3. GPT-3.5 powered Agents for delegation of simple tasks.\n4. File output.\n\nPerformance Evaluation:\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\n2. Constructively self-criticize your big-picture behavior constantly.\n3. Reflect on past decisions and strategies to refine your approach.\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}

page_content='Resources:\n1. Internet access for searches and information gathering.\n2. Long Term memory management.\n3. GPT-3.5 powered Agents for delegation of simple tasks.\n4. File output.\n\nPerformance Evaluation:\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\n2. Constructively self-criticize your big-picture behavior constantly.\n3. Reflect on past decisions and strategies to refine your approach.\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 29630}
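When showing sources to a user, a list of distinct source URLs is usually enough. The sketch below collects them from a response-shaped dict while preserving first-seen order. The Document dataclass is a stand-in with the same two fields as the printed documents, and the URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in with the two fields seen in the printed output."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_sources(documents):
    """Return distinct metadata['source'] values in first-seen order."""
    seen = {}  # dicts preserve insertion order
    for doc in documents:
        source = doc.metadata.get("source")
        if source is not None:
            seen.setdefault(source, None)
    return list(seen)


# Hypothetical response in the same shape as the chain output.
response = {
    "answer": "...",
    "context": [
        Document("Chunk 1", {"source": "https://example.com/a", "start_index": 0}),
        Document("Chunk 1", {"source": "https://example.com/a"}),
        Document("Chunk 2", {"source": "https://example.com/b"}),
        Document("Chunk without metadata"),
    ],
}

sources = unique_sources(response["context"])
print(sources)
```

Using metadata.get means documents without a source key are skipped rather than raising an error.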

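Customizing the prompt

As shown above, we can load prompts from the prompt hub (for example, a RAG prompt). You can also write your own. Below, template is a prompt template with two variables, context and question: question is the user's query, and context is the retrieved supporting text. The custom prompt custom_rag_prompt is created from the template with PromptTemplate.from_template.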

from langchain_core.prompts import PromptTemplate

template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}

Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is Task Decomposition?")
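The placeholder mechanics of such a template can be tried with Python's built-in string formatting. This sketch uses str.format and string.Formatter as stand-ins to show how {context} and {question} are discovered and filled; it is not how PromptTemplate is implemented, and the helper names and sample values are assumptions.

```python
from string import Formatter

template = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}

Helpful Answer:"""


def template_variables(text):
    """List the placeholder names that appear in a format string."""
    return sorted({name for _, name, _, _ in Formatter().parse(text) if name})


def render(text, **variables):
    """Fill the placeholders; raises KeyError if a variable is missing."""
    return text.format(**variables)


print(template_variables(template))

filled = render(
    template,
    context="Chunk A\n\nChunk B",  # stand-in for the formatted retrieved chunks
    question="What is Task Decomposition?",
)
print(filled)
```

Checking the variable names up front makes it easy to confirm that the dict feeding the prompt supplies exactly the keys the template expects.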
