diff --git a/ch-xorbits/xinference.ipynb b/ch-xorbits/xinference.ipynb
new file mode 100644
index 0000000..0a2a9c8
--- /dev/null
+++ b/ch-xorbits/xinference.ipynb
@@ -0,0 +1,936 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "(sec-xinference)=\n",
+    "# Xinference\n",
+    "\n",
+    "Xorbits Inference (Xinference) is an inference platform for large models, supporting large language models, embedding models, text-to-image models, and more. It builds on the distributed capabilities provided by [Xoscar](https://github.com/xorbitsai/xoscar), so models can be deployed across a cluster, and it exposes an OpenAI-like interface on top, on which users can deploy and call open-source large models. Xinference integrates the externally facing API, the inference engine, and the hardware, so unlike Ray Serve there is no need to write your own code to manage the model inference service.\n",
+    "\n",
+    "## Inference Engines\n",
+    "\n",
+    "Xinference can work with different inference engines, including Hugging Face Transformers, [vLLM](https://github.com/vllm-project/vllm), [llama.cpp](https://github.com/ggerganov/llama.cpp), and others, so the corresponding engine must be installed along with Xinference, for example `pip install \"xinference[transformers]\"`. Transformers is built entirely on PyTorch and supports new models fastest and most completely, but its performance is comparatively poor; other engines such as vLLM and llama.cpp focus on performance optimization, but their model coverage is not as broad as Transformers.\n",
+    "\n",
+    "## Cluster\n",
+    "\n",
+    "Before using Xinference, you must first start an inference cluster, either a single machine (possibly with multiple GPUs) or multiple machines. On a single machine, you can start it from the command line like this:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "vscode": {
+     "languageId": "shellscript"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "xinference-local --host 0.0.0.0 --port 9997"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The cluster scenario is similar to Xorbits Data: first start a supervisor, then the workers:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "vscode": {
+     "languageId": "shellscript"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Start the supervisor\n",
+    "xinference-supervisor -H <supervisor_ip>\n",
+    "\n",
+    "# Start the workers\n",
+    "xinference-worker -e \"http://<supervisor_ip>:9997\" -H <worker_ip>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "After that, the Xinference service is available at http://<supervisor_ip>:9997.\n",
+    "\n",
+    "## Using Models\n",
+    "\n",
+    "Xinference manages the entire lifecycle of model deployment: launching a model, using it, and shutting it down.\n",
+    "\n",
+    "Once the Xinference service is started, we can launch and call models. Xinference integrates a large number of open-source models; users can pick one in the web UI, and Xinference downloads and launches it in the background. Every launched model offers both a web chat interface and an OpenAI-compatible API. For example, interacting with a model through the OpenAI API:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from openai import OpenAI\n",
+    "client = OpenAI(base_url=\"http://127.0.0.1:9997/v1\", api_key=\"not used actually\")\n",
+    "\n",
+    "response = client.chat.completions.create(\n",
+    "    model=\"my-llama-2\",\n",
+    "    messages=[\n",
+    "        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
+    "        {\"role\": \"user\", \"content\": \"What is the largest animal?\"}\n",
+    "    ]\n",
+    ")\n",
+    "print(response)"
+   ]
+  },
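+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The same lifecycle can also be driven programmatically. Below is a minimal sketch using Xinference's Python client to launch, list, and terminate a model (here with `qwen-chat` as an example model name); the exact keyword arguments can vary between Xinference versions, so treat this as a template rather than the definitive interface:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from xinference.client import Client\n",
+    "\n",
+    "# Connect to a running Xinference endpoint.\n",
+    "xi_client = Client(\"http://127.0.0.1:9997\")\n",
+    "\n",
+    "# Launch a model; the returned UID identifies it in later calls.\n",
+    "model_uid = xi_client.launch_model(\n",
+    "    model_name=\"qwen-chat\",\n",
+    "    model_size_in_billions=7,\n",
+    "    model_format=\"pytorch\",\n",
+    "    model_engine=\"transformers\",\n",
+    ")\n",
+    "\n",
+    "# Inspect the running models, then shut the model down when done.\n",
+    "print(xi_client.list_models())\n",
+    "xi_client.terminate_model(model_uid)"
+   ]
+  },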
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Case Studies\n",
+    "Below we walk through two practical applications of Xinference.\n",
+    "### Case Study: Simple Text Generation and Chat with Qwen"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before starting, make sure the following dependencies are installed:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install xinference[transformers] openai"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "First we start a local Xinference instance. Use the following command to run Xinference in the background:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {
+    "vscode": {
+     "languageId": "shellscript"
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Service is not running, starting service.\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%bash\n",
+    "if ps ax | grep -v grep | grep \"xinference-local\" > /dev/null\n",
+    "then\n",
+    "    echo \"Service is already running, exiting.\"\n",
+    "else\n",
+    "    echo \"Service is not running, starting service.\"\n",
+    "    nohup xinference-local --host 0.0.0.0 --port 9997 > xinference.log 2>&1 &\n",
+    "fi"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "By default, Xinference listens on host 127.0.0.1 and port 9997.\n",
+    "\n",
+    "Next, launch a model with the following command. The `size-in-billions` flag selects the parameter scale; the open-source qwen-chat models currently come in 1.8B, 7B, 14B, and 72B sizes. In this case study we use Qwen-7B."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Launch model name: qwen-chat with kwargs: {}\n",
+      "Model uid: my-llm\n"
+     ]
+    }
+   ],
+   "source": [
+    "!xinference launch \\\n",
+    "  --model-uid my-llm \\\n",
+    "  --model-name qwen-chat \\\n",
+    "  --size-in-billions 7 \\\n",
+    "  --model-format pytorch \\\n",
+    "  --model-engine transformers"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The first time a model is launched, Xinference automatically downloads its weights, which may take some time.\n",
+    "\n",
+    "Because Xinference provides an OpenAI-compatible API, a model served by Xinference can be used as a local drop-in replacement for OpenAI."
+   ]
+  },
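+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before connecting a client, we can check which models are currently being served. A quick sanity check, assuming the default local endpoint:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# List the models served by the local Xinference instance;\n",
+    "# \"my-llm\" should appear in the output.\n",
+    "!xinference list -e \"http://127.0.0.1:9997\""
+   ]
+  },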
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import openai\n",
+    "\n",
+    "client = openai.Client(api_key=\"cannot be empty\", base_url=\"http://127.0.0.1:9997/v1\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next we demonstrate how easily a large model can be called for text generation and multi-turn chat through Xinference and the OpenAI API."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Completion API\n",
+    "First we use `client.completions.create` for simple text generation. The Completion API generates text from a piece of input text, also called a prompt."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[temperature: 0.7 | top_p: 0.9]\n",
+      "通义千问,智慧之海,\n",
+      "回答如流,无尽的探索。\n",
+      "通义千问,人间瑰宝。<|im_end|>\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "def complete_and_print(\n",
+    "    prompt, temperature=0.7, top_p=0.9, client=client, model=\"my-llm\"\n",
+    "):\n",
+    "    response = (\n",
+    "        client.completions.create(\n",
+    "            model=model, prompt=prompt, top_p=top_p, temperature=temperature\n",
+    "        )\n",
+    "        .choices[0]\n",
+    "        .text\n",
+    "    )\n",
+    "\n",
+    "    print(f\"[temperature: {temperature} | top_p: {top_p}]\\n{response.strip()}\\n\")\n",
+    "\n",
+    "\n",
+    "prompt = \"写一首关于通义千问的三行俳句诗。\"\n",
+    "complete_and_print(prompt)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can tune some of the parameters the API exposes to influence how creative or deterministic the output is.\n",
+    "\n",
+    "The `top_p` parameter controls how large a slice of the vocabulary is sampled from during generation, while `temperature` controls how random the generation is within that slice. As the temperature approaches 0, the result becomes almost deterministic."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[temperature: 0.01 | top_p: 0.01]\n",
+      "通义千问,智慧之源,\n",
+      "回答问题,如诗如画。\n",
+      "机器语言,人类之友。<|im_end|>\n",
+      "\n",
+      "[temperature: 0.01 | top_p: 0.01]\n",
+      "通义千问,智慧之源,\n",
+      "回答问题,如诗如画。\n",
+      "机器语言,人类之友。<|im_end|>\n",
+      "\n",
+      "[temperature: 1.0 | top_p: 1.0]\n",
+      "通义千问通四海,言无不尽寻奥秘。智囊无所不在,问答之间显才智。<|im_end|>\n",
+      "<|im_start|>\n",
+      "\n",
+      "[temperature: 1.0 | top_p: 1.0]\n",
+      "诗中要包含词语\"通义千问\"和\"人工智能\"。\n",
+      "\n",
+      "通义千问问何来? \n",
+      "人工智能显神威。 \n",
+      "科技引领未来路。\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# The first two generations are nearly identical,\n",
+    "complete_and_print(prompt, temperature=0.01, top_p=0.01)\n",
+    "complete_and_print(prompt, temperature=0.01, top_p=0.01)\n",
+    "\n",
+    "# while the last two differ noticeably\n",
+    "complete_and_print(prompt, temperature=1.0, top_p=1.0)\n",
+    "complete_and_print(prompt, temperature=1.0, top_p=1.0)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Chat Completion API\n",
+    "Next we use `client.chat.completions.create` for a simple multi-turn conversation.\n",
+    "\n",
+    "The Chat Completion API provides a more structured way to interact with an LLM. Instead of passing a single piece of text, we send the LLM an array of structured message objects as input. This gives the language model some \"context\" or \"history\" to draw on when it continues the reply.\n",
+    "\n",
+    "Typically, each message has a role (`role`) and content (`content`):\n",
+    "\n",
+    "- The system role (`system`) conveys core instructions defined by the developer to the language model.\n",
+    "- The user role (`user`) represents the messages entered or produced by the user.\n",
+    "- The assistant role (`assistant`) marks replies generated by the language model itself.\n",
+    "\n",
+    "Let's first define the structured messages:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def assistant(content: str):\n",
+    "    return {\"role\": \"assistant\", \"content\": content}\n",
+    "\n",
+    "\n",
+    "def user(content: str):\n",
+    "    return {\"role\": \"user\", \"content\": content}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now let's try the Chat Completion API:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "==============\n",
+      "assistant: 你最喜欢的颜色是蓝色。\n",
+      "\n",
+      "\n",
+      "==============\n",
+      "assistant: 您的宠物叫毛毛。\n",
+      "\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "def chat_complete_and_print(\n",
+    "    messages, temperature=0.7, top_p=0.9, client=client, model=\"my-llm\"\n",
+    "):\n",
+    "    response = (\n",
+    "        client.chat.completions.create(\n",
+    "            model=model, messages=messages, top_p=top_p, temperature=temperature\n",
+    "        )\n",
+    "        .choices[0]\n",
+    "        .message.content\n",
+    "    )\n",
+    "    print(f\"==============\\nassistant: {response}\\n\\n\")\n",
+    "\n",
+    "\n",
+    "chat_complete_and_print(\n",
+    "    messages=[\n",
+    "        user(\"我最喜欢的颜色是蓝色\"),\n",
+    "        assistant(\"听到这个消息真是令人欣喜!\"),\n",
+    "        user(\"我最喜欢的颜色是什么?\"),\n",
+    "    ]\n",
+    ")\n",
+    "\n",
+    "chat_complete_and_print(\n",
+    "    messages=[\n",
+    "        user(\"我有一只名叫毛毛的小狗\"),\n",
+    "        assistant(\"听到这个消息真棒!毛毛一定很可爱。\"),\n",
+    "        user(\"我的宠物叫什么名字?\"),\n",
+    "    ]\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As before, we can vary `temperature` and `top_p` to show how these parameters affect the randomness and diversity of the generated content."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "==============\n",
+      "assistant: 钢琴学习可以帮助你提高音乐素养,培养耐心和专注力,增强记忆力,提高创造力,提升自信心,培养良好的节奏感,以及更好地理解音乐理论。\n",
+      "\n",
+      "\n",
+      "==============\n",
+      "assistant: 钢琴学习可以帮助你提高音乐素养,培养耐心和专注力,增强记忆力,提高创造力,提升自信心,培养良好的节奏感,以及更好地理解音乐理论。\n",
+      "\n",
+      "\n",
+      "==============\n",
+      "assistant: 学习钢琴有很多好处,例如它可以帮助你提高音乐素养,培养耐心,增强记忆力,还可以增长知识,帮助你理解节奏和和弦,以及提高审美能力。\n",
+      "\n",
+      "\n",
+      "==============\n",
+      "assistant: 钢琴学习可以帮助提升思维技巧,培养自制力,提高音乐审美,增加自信心,提升技能,练习记忆力,并且可以让你分享自己最喜欢的音乐。\n",
+      "\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "messages = [\n",
+    "    user(\"我最近在学习钢琴\"),\n",
+    "    assistant(\"那真是一个很好的爱好!\"),\n",
+    "    user(\"你觉得钢琴学习有什么好处?\"),\n",
+    "]\n",
+    "\n",
+    "\n",
+    "# Fairly deterministic results\n",
+    "chat_complete_and_print(messages, temperature=0.1, top_p=0.1)\n",
+    "chat_complete_and_print(messages, temperature=0.1, top_p=0.1)\n",
+    "\n",
+    "# More random and varied\n",
+    "chat_complete_and_print(messages, temperature=1.0, top_p=1.0)\n",
+    "chat_complete_and_print(messages, temperature=1.0, top_p=1.0)"
+   ]
+  },
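+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The OpenAI-compatible API can also stream the reply token by token instead of waiting for the full completion, which is useful for interactive interfaces. A minimal sketch, using the standard `stream=True` parameter of the OpenAI client (we assume the Xinference build used here supports streaming):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "response = client.chat.completions.create(\n",
+    "    model=\"my-llm\",\n",
+    "    messages=[user(\"用一句话介绍一下大语言模型\")],\n",
+    "    stream=True,  # ask the server to stream tokens as they are generated\n",
+    ")\n",
+    "\n",
+    "# Each chunk carries a small delta of the reply; print deltas as they arrive.\n",
+    "for chunk in response:\n",
+    "    delta = chunk.choices[0].delta.content\n",
+    "    if delta:\n",
+    "        print(delta, end=\"\", flush=True)"
+   ]
+  },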
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Finally, shut down the Xinference instance running in the background:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!ps ax | grep xinference-local | grep -v grep | awk '{print $1}' | xargs -r kill -9"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Case Study: A LangChain-Based Document Chatbot\n",
+    "This case study demonstrates how to build a chatbot from a local LLM together with LangChain. With this bot, a user can load a document and hold an interactive conversation grounded in its content.\n",
+    "\n",
+    "First, install the necessary libraries."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install xinference[transformers] langchain"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Run Xinference in the background with:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "metadata": {
+    "vscode": {
+     "languageId": "shellscript"
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Service is not running, starting service.\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%bash\n",
+    "if ps ax | grep -v grep | grep \"xinference-local\" > /dev/null\n",
+    "then\n",
+    "    echo \"Service is already running, exiting.\"\n",
+    "else\n",
+    "    echo \"Service is not running, starting service.\"\n",
+    "    # export the mirror setting so the launched process actually sees it\n",
+    "    export HF_ENDPOINT=https://hf-mirror.com\n",
+    "    nohup xinference-local --host 0.0.0.0 --port 9997 > xinference.log 2>&1 &\n",
+    "fi"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Setting Up the Embedding Model\n",
+    "We use Mark Twain's *The Million Pound Bank Note* as the example document, first reading and splitting it with LangChain."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "# Change to the repository root so that utils.py is importable;\n",
+    "# adjust this path to your own checkout.\n",
+    "new_directory = \"/home/u2022201752/scale-py-zh\"\n",
+    "os.chdir(new_directory)\n",
+    "\n",
+    "from utils import mark_twain\n",
+    "from langchain.document_loaders import PDFMinerLoader\n",
+    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
+    "\n",
+    "file_path = mark_twain()\n",
+    "loader = PDFMinerLoader(os.path.join(file_path, \"Twain-Million-Pound-Note.pdf\"))\n",
+    "\n",
+    "documents = loader.load()\n",
+    "\n",
+    "text_splitter = RecursiveCharacterTextSplitter(\n",
+    "    chunk_size=512,\n",
+    "    chunk_overlap=100,\n",
+    "    length_function=len,\n",
+    ")\n",
+    "\n",
+    "docs = text_splitter.split_documents(documents)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next we launch an embedding model to vectorize the document:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Launch model name: bge-m3 with kwargs: {}\n",
+      "Model uid: bge-m3\n"
+     ]
+    }
+   ],
+   "source": [
+    "!xinference launch \\\n",
+    "  --model-name \"bge-m3\" \\\n",
+    "  -e \"http://0.0.0.0:9997\" \\\n",
+    "  --model-type embedding"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.embeddings import XinferenceEmbeddings\n",
+    "\n",
+    "xinference_embeddings = XinferenceEmbeddings(\n",
+    "    server_url=\"http://0.0.0.0:9997\",\n",
+    "    model_uid=\"bge-m3\"\n",
+    ")"
+   ]
+  },
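+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a quick sanity check, we can embed a short query and inspect the resulting vector; `embed_query` is the standard method of LangChain's embeddings interface, and the query text below is just an illustrative example:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "vector = xinference_embeddings.embed_query(\"What is a million-pound banknote?\")\n",
+    "\n",
+    "# bge-m3 returns a dense vector; print its dimensionality and a few values.\n",
+    "print(len(vector))\n",
+    "print(vector[:5])"
+   ]
+  },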
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Setting Up the Milvus Vector Database\n",
+    "Next we use the Milvus vector database to store the embedded document chunks.\n",
+    "\n",
+    "Milvus can be installed with:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install milvus"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Run the Milvus database in the background with:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "metadata": {
+    "vscode": {
+     "languageId": "shellscript"
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Service is not running, starting service.\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%bash\n",
+    "if ps ax | grep -v grep | grep \"milvus-server\" > /dev/null\n",
+    "then\n",
+    "    echo \"Service is already running, exiting.\"\n",
+    "else\n",
+    "    echo \"Service is not running, starting service.\"\n",
+    "    nohup milvus-server > milvus.log 2>&1 &\n",
+    "fi"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we store the vectors in Milvus:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 44,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.vectorstores import Milvus\n",
+    "\n",
+    "vector_db = Milvus.from_documents(\n",
+    "    docs,\n",
+    "    xinference_embeddings,\n",
+    "    connection_args={\"host\": \"0.0.0.0\", \"port\": \"19530\"},\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can already query the document store with a question. Note that no LLM is involved here; this simply returns the best-matching chunks:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 45,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "in London without a friend, and with no money but that million-pound bank-note, and no way to \n",
+      "account for his being in possession of it. Brother A said he would starve to death; Brother B said \n",
+      "he wouldn't. Brother A said he couldn't offer it at a bank or anywhere else, because he would be \n",
+      "arrested on the spot. So they went on disputing till Brother B said he would bet twenty thousand \n",
+      "pounds that the man would live thirty days, any way, on that million, and keep out of jail, too.\n"
+     ]
+    }
+   ],
+   "source": [
+    "query = \"What did the protagonist do with the million-pound banknote?\"\n",
+    "retrieved_docs = vector_db.similarity_search(query, k=1)\n",
+    "print(retrieved_docs[0].page_content)"
+   ]
+  },
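+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Milvus can also report how close each match is. The sketch below uses LangChain's `similarity_search_with_score`, which returns `(document, score)` pairs; for the default L2 distance metric, a smaller score means a closer match (the method name follows the LangChain vectorstore interface we assume here):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "results = vector_db.similarity_search_with_score(query, k=3)\n",
+    "\n",
+    "# Each result is a (Document, distance) pair; lower distance = closer match.\n",
+    "for doc, score in results:\n",
+    "    print(f\"score={score:.4f} | {doc.page_content[:80]}...\")"
+   ]
+  },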
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Setting Up the LLM\n",
+    "Next we launch an LLM for the conversation. Here we use the llama-3-instruct model supported by Xinference:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 46,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Launch model name: llama-3-instruct with kwargs: {}\n",
+      "Model uid: llama-3-instruct\n"
+     ]
+    }
+   ],
+   "source": [
+    "!xinference launch \\\n",
+    "  --model-name \"llama-3-instruct\" \\\n",
+    "  --model-format pytorch \\\n",
+    "  --size-in-billions 8 \\\n",
+    "  -e \"http://0.0.0.0:9997\" \\\n",
+    "  --model-engine transformers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 47,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.llms import Xinference\n",
+    "\n",
+    "xinference_llm = Xinference(\n",
+    "    server_url=\"http://0.0.0.0:9997\",\n",
+    "    model_uid=\"llama-3-instruct\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we create a `ConversationalRetrievalChain` from the LLM and the embedding-backed vector store."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 60,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.memory import ConversationBufferMemory\n",
+    "from langchain.chains import ConversationalRetrievalChain\n",
+    "\n",
+    "memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)\n",
+    "\n",
+    "chain = ConversationalRetrievalChain.from_llm(\n",
+    "    llm=xinference_llm,\n",
+    "    retriever=vector_db.as_retriever(),\n",
+    "    memory=memory\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we can query information from the document:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 51,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      " The protagonist carries the million-pound banknote around, showing it to people and talking about its history, which causes them to laugh. He shares the story with a woman, and she laughs so hard she has trouble catching her breath. The story is likely meant to be humorous and entertaining, but it also highlights the absurdity of the situation.\n",
+      "\n",
+      "People's reactions to the protagonist carrying the million-pound banknote range from confusion to amusement. Many are skeptical and disbelieve his claim, while others are impressed and even intimidated by the large sum of money. The protagonist's storytelling ability and charisma seem to be what ultimately win over the woman, who becomes engaged by his tale and laughs uncontrollably.\n",
+      "\n",
+      "In terms of what motivates the two brothers to make their bet, it seems that boredom and social beliefs play a role. They are bored with their lives and want to shake things up, and they believe that making a bet like this will bring excitement and adventure into their lives. Their social beliefs likely include a desire to test each other's character and see how far they are willing to go to fulfill their obligations.\n",
+      "\n",
+      "As for whether the outcome of the experiment proves anything, it is difficult to say. The story is more focused on entertainment than scientific proof or insight. However, the experiment does demonstrate the power of human imagination and creativity, as well as the importance of storytelling and communication in building connections between people.\n",
+      "\n",
+      "If I were to rewrite \"The Million Pound Bank-Note\" in today's society, I might update the premise to involve something like a digital currency or cryptocurrency. For example, the two brothers could place a bet that one of them will successfully spend a certain amount of Bitcoin or Ethereum within a set timeframe. The challenges and obstacles they face would likely be similar to those in the original story, such as navigating complex financial systems, avoiding scams, and dealing with the psychological pressure of being responsible for large sums of money.\n",
+      "\n",
+      "Elements that would remain the same in a modern retelling of the story include the themes of boredom, social beliefs, and the power of storytelling. The equivalent of the million-pound banknote might be something like a high-stakes online transaction or a lucrative business deal, where the stakes are equally high and the consequences of failure are significant.\n",
+      "\n",
+      "Overall, \"The Million Pound Bank-Note\" remains a classic and thought-provoking tale that continues to entertain and inspire readers today. Its themes and motifs are timeless, and its relevance to contemporary issues and concerns is undeniable.\n"
+     ]
+    }
+   ],
+   "source": [
+    "def chat(query):\n",
+    "    result = chain({\"question\": query})\n",
+    "    print(result[\"answer\"])\n",
+    "\n",
+    "\n",
+    "chat(\"How did people react to the protagonist carrying the million-pound banknote?\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Notice that the model does not simply return verbatim sentences from the document; it generates its response by summarizing the relevant content."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 53,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      " It is not explicitly stated how the protagonist acquired the million-pound bank-note or who gave it to him. The passage primarily revolves around the disagreements between Brothers A and B about the protagonist's prospects. Therefore, we can only speculate as to where the note originated or why it was granted to the protagonist. The narrative leaves this crucial information unaddressed, leaving the reader to wonder about the mysterious note. [End] [End]\n",
+      "1....read the text carefully. [End] [End] [End] [End]\n",
+      "The above response is based on careful analysis of the provided textual context. The information given does not provide answers to these questions, so I chose not to attempt to fill in the gaps with speculative ideas. Instead, I concentrated on accurately reflecting the existing knowledge provided by the passage. [End] [End] [End] [End]\n",
+      "2. No additional info is given to help us understand the origin of the banknote or why it was bestowed upon the protagonist. [End]\n",
+      "3. Correct, there isn't enough information provided to pinpoint the origin or purpose of the banknote. [End] [End] [End] [End]\n",
+      "4. True, the narrative doesn't address the origins of the million-pound bank-note. [End]\n",
+      "5. It appears that both the origin and purpose of the million-pound bank-note are intentionally left unknown by the author. [End]\n",
+      "\n",
+      "Additional Context:\n",
+      "\n",
+      "There is no more context available that could potentially answer these questions. The provided text offers minimal background information about the protagonist's situation and the banknote itself. Therefore, our best approach is to acknowledge that we don't have enough data to make educated guesses about the banknote's origin and purpose. [End] [End] [End] [End]\n",
+      "\n",
+      "Final Answer: The correct answer is that we do not know where the million-pound bank-note came from, and why it was bestowed upon the protagonist, as this information is not provided in the text. [End]\n",
+      "If you're looking for an answer that includes speculation, you might find a different interpretation elsewhere. However, given the limited context offered here, it is most accurate to recognize that we lack the necessary information to determine the banknote's origin or purpose. [End]\n",
+      "Final Answer: The correct answer is that we do not know where the million-pound bank-note came from, and why it was bestowed upon the protagonist, as this information is not provided in the text. [End]\n"
+     ]
+    }
+   ],
+   "source": [
+    "chat(\"What was the origin of the million-pound banknote and why was it given to him?\")"
+   ]
+  },
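+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The chain's `memory` object records the running conversation, and every new question is answered with this history available. We can peek at what has been stored so far; this is a small sketch that relies on `chat_memory.messages`, the message buffer used by LangChain's `ConversationBufferMemory`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Inspect the accumulated chat history that the chain passes back to the LLM.\n",
+    "for message in memory.chat_memory.messages:\n",
+    "    print(f\"{type(message).__name__}: {message.content[:80]}...\")"
+   ]
+  },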
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here the LLM correctly resolves \"him\" to the protagonist, which shows that combining Xinference with LangChain links new questions to the chat history, producing responses that build on one another.\n",
+    "\n",
+    "Beyond the impressive capabilities of the LLM itself, LangChain's `chain` mechanism enables more coherent, context-aware interactions with the model."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Finally, shut down the Xinference and Milvus processes:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "metadata": {
+    "vscode": {
+     "languageId": "shellscript"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# -r tells xargs to skip running kill when no matching process is found\n",
+    "!ps ax | grep xinference-local | grep -v grep | awk '{print $1}' | xargs -r kill -9\n",
+    "!ps ax | grep milvus-server | grep -v grep | awk '{print $1}' | xargs -r kill -9"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "xinference",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/utils.py b/utils.py
index 6658378..634ec8a 100644
--- a/utils.py
+++ b/utils.py
@@ -76,3 +76,10 @@ def more_citi_bike():
 @data_path_decorator
 def adult():
     return ["https://archive.ics.uci.edu/static/public/2/adult.zip"]
+
+
+@data_path_decorator
+def mark_twain():
+    return [
+        "https://www.booksatwork.org/wp-content/uploads/2014/06/Twain-Million-Pound-Note.pdf"
+    ]