LangGraph: Design and Implementation

Chapter 16: Prebuilt Agent Components

Author: 杨艺韬 · 13,111 characters


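16.1 Introduction

Earlier chapters examined LangGraph's low-level infrastructure: StateGraph, Channel, Pregel scheduling, Checkpoint, Send, Runtime, and Store. These primitives are highly flexible, but building a complete Agent directly on them takes a lot of boilerplate: defining the state schema, creating a ToolNode, writing conditional-edge routing, and handling errors and retries.

The langgraph.prebuilt module exists to remove that boilerplate. On top of the primitives it provides a set of battle-tested higher-level components: the create_react_agent factory function builds a complete ReAct Agent in one line; ToolNode wraps parallel tool execution, error handling, and state injection; tools_condition provides the standard conditional route; ValidationNode validates tool calls against a schema; InjectedState and InjectedStore let tools access graph state and persistent storage directly.

This chapter works from the source code of these components and analyzes how they combine the low-level capabilities into a developer-friendly high-level API while remaining fully extensible.

Key points

create_react_agent factory function: the full build flow from parameters to compiled graph
ToolNode implementation: parallel execution, error handling, state injection, Command support
tools_condition routing: the standard loop condition for an Agent
ValidationNode: Pydantic schema validation of tool calls
InjectedState, InjectedStore, ToolRuntime: dependency injection at the tool level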

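16.1.1 API migration since v1.0: `create_react_agent` is deprecated, `ToolNode` stays

Readers should know the migration status of the APIs in this chapter up front (this matches the note at the start of Chapter 17; since LangGraph v1.0 the change runs through several chapters):

create_react_agent: marked @deprecated(LangGraphDeprecatedSinceV10); the source comment (chat_agent_executor.py:274) points to langchain.agents.create_agent as the migration target
AgentState / AgentStatePydantic / AgentStateWithStructuredResponse / AgentStateWithStructuredResponsePydantic: also deprecated, migrate to langchain.agents
ToolNode ✅ still the recommended entry point and not deprecated: both the old and the new path use it as a low-level primitive

The ToolNode class docstring at tool_node.py:619 reads: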

For standard ReAct-style agents, use [`create_agent`][langchain.agents.create_agent]
instead. It uses `ToolNode` internally with sensible defaults for the agent loop,
conditional routing, and error handling.

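In other words, create_agent (the new API in langchain.agents) still uses ToolNode internally; it only packages the default configuration for "agent loop + routing + error handling".

Old and new architecture side by side:

Purpose	v0.x usage (now deprecated)	Recommended usage in v1.0+
Standard ReAct Agent	`from langgraph.prebuilt import create_react_agent`	`from langchain.agents import create_agent`
Custom tool-execution node	`from langgraph.prebuilt import ToolNode`	Unchanged (ToolNode is a stable low-level primitive)
Tool conditional routing	`from langgraph.prebuilt import tools_condition`	Unchanged
State injection (InjectedState/InjectedStore)	Unchanged	Unchanged

Why ToolNode stays while create_react_agent moves out: the layering principle.

ToolNode is essentially a node in LangGraph's graph-execution layer, tightly coupled to StateGraph/Pregel, so it belongs in LangGraph
create_react_agent is a preset for one "shape of Agent application", decoupled from any particular execution engine, so it belongs in the LangChain layer

The create_react_agent sections that follow still present the classic implementation, because it helps readers understand how create_agent is assembled internally (the new create_agent does similar things). When sample code contains from langgraph.prebuilt import create_react_agent, keep in mind that it is deprecated but still works; new projects should migrate according to the table above.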

16.2 The create_react_agent factory function

16.2.1 Signature overview

create_react_agent is defined in langgraph/prebuilt/chat_agent_executor.py and is the one-stop entry point for building a ReAct Agent:

def create_react_agent(
    model: str | LanguageModelLike | Callable,
    tools: Sequence[BaseTool | Callable | dict] | ToolNode,
    *,
    prompt: Prompt | None = None,
    response_format: StructuredResponseSchema | None = None,
    pre_model_hook: RunnableLike | None = None,
    post_model_hook: RunnableLike | None = None,
    state_schema: StateSchemaType | None = None,
    context_schema: type[Any] | None = None,
    checkpointer: Checkpointer | None = None,
    store: BaseStore | None = None,
    interrupt_before: list[str] | None = None,
    interrupt_after: list[str] | None = None,
    debug: bool = False,
    version: Literal["v1", "v2"] = "v2",
    name: str | None = None,
) -> CompiledStateGraph:

16.2.2 Build flow

flowchart TB
    subgraph "Parameter resolution"
        Model[model parameter] --> |str| InitModel["init_chat_model()"]
        Model --> |Runnable| BindTools["bind_tools(tools)"]
        Model --> |Callable| DynModel[Dynamic model selection]
        Tools[tools parameter] --> |list| CreateTN[Create ToolNode]
        Tools --> |ToolNode| UseTN[Use as-is]
    end

    subgraph "Graph construction"
        CreateTN --> AddNodes
        UseTN --> AddNodes
        AddNodes[Add nodes] --> Agent["'agent' node<br/>prompt + LLM"]
        AddNodes --> ToolsN["'tools' node<br/>ToolNode"]
        AddNodes --> |optional| PreHook["'pre_model_hook' node"]
        AddNodes --> |optional| PostHook["'post_model_hook' node"]
        Agent --> AddEdges[Add edges]
        AddEdges --> CondEdge["Conditional edge<br/>agent -> tools / END"]
        AddEdges --> BackEdge["tools -> agent"]
    end

    subgraph "Compilation"
        AddEdges --> Compile["compile(<br/>checkpointer, store,<br/>interrupt_before/after)"]
        Compile --> CSG[CompiledStateGraph]
    end

16.2.3 Model handling

create_react_agent accepts the model in three forms:

# 1. String identifier (requires the langchain package)
graph = create_react_agent("openai:gpt-4", tools)

# 2. LangChain ChatModel instance
from langchain_openai import ChatOpenAI
graph = create_react_agent(ChatOpenAI(model="gpt-4"), tools)

# 3. Dynamic model-selection function
def select_model(state, runtime: Runtime[ModelContext]):
    if runtime.context.use_premium:
        return ChatOpenAI(model="gpt-4").bind_tools(tools)
    return ChatOpenAI(model="gpt-3.5-turbo").bind_tools(tools)

graph = create_react_agent(select_model, tools, context_schema=ModelContext)

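For a static model, the framework calls bind_tools automatically. If the model has already been bound through model.bind_tools(), the framework checks that the bound tools match the tools argument.

16.2.4 v1 vs v2

The version parameter controls the execution strategy of the tool node: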

graph LR
    subgraph "v1: one node handles all tool calls"
        AI1[AIMessage<br/>tool_calls: A, B, C] --> TN1[ToolNode]
        TN1 --> |runs A B C in parallel| Result1[three ToolMessages]
    end

    subgraph "v2: Send API dispatches tool calls"
        AI2[AIMessage<br/>tool_calls: A, B, C] --> Send2{Send dispatch}
        Send2 --> TN2a["ToolNode(call A)"]
        Send2 --> TN2b["ToolNode(call B)"]
        Send2 --> TN2c["ToolNode(call C)"]
    end

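v2 uses the Send API to dispatch each tool_call to its own ToolNode instance. The advantages of this design:

Finer interrupt granularity: a single tool call can be interrupted and resumed on its own
Timeout isolation: one tool timing out does not affect the others
More precise checkpoints: each tool call has its own checkpoint state

16.2.3-bis pre_model_hook / post_model_hook / response_format: the extra graph topology, leaving the agent loop aside

The create_react_agent signature has three parameters that are easy to overlook (listed in §16.2.1 but not expanded there):

Parameter	Node name	Position	Typical use
`pre_model_hook`	`"pre_model_hook"`	before agent	trimming or compressing messages (e.g. summarizing a long history), RAG injection
`post_model_hook`	`"post_model_hook"`	after agent (before tool_calls routing)	approval, logging, a last interception before structured-response generation
`response_format`	`"generate_structured_response"`	before termination	uses `with_structured_output` to turn the last message into a Pydantic object

chat_agent_executor.py:787-828 shows the actual order in which the topology is assembled (this is the tool_calling_enabled=False branch, which is easier to follow):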

workflow = StateGraph(state_schema, context_schema)
workflow.add_node("agent", RunnableCallable(call_model, acall_model), ...)
if pre_model_hook is not None:
    workflow.add_node("pre_model_hook", pre_model_hook)
    workflow.add_edge("pre_model_hook", "agent")
    entrypoint = "pre_model_hook"
else:
    entrypoint = "agent"

workflow.set_entry_point(entrypoint)

if post_model_hook is not None:
    workflow.add_node("post_model_hook", post_model_hook)
    workflow.add_edge("agent", "post_model_hook")

if response_format is not None:
    workflow.add_node("generate_structured_response", ...)
    if post_model_hook is not None:
        workflow.add_edge("post_model_hook", "generate_structured_response")
    else:
        workflow.add_edge("agent", "generate_structured_response")

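A few behaviors are implied by the code but not spelled out in the documentation:

1. Dynamic choice of entrypoint: if pre_model_hook is present it becomes the first node after START; otherwise agent takes over directly. Making the front node optional is leaner than always inserting a no-op pre_model_hook: one node fewer means one Checkpoint record and one scheduling step fewer.

2. post_model_hook can gate structured-response generation: look at the condition at line 817. When both post_model_hook and response_format are present, the edge is post_model_hook → generate_structured_response rather than agent → generate_structured_response. This implies a subtle contract: post_model_hook can do more than audit; by modifying messages in the state it can influence the structured response generated afterwards. One practical use is having post_model_hook clean up the AIMessage content before it reaches the structured extractor.

3. The tool-calling branch (should_continue at line 830): with tool_calls the default route is the tools node; without tool_calls the path is agent → post_model_hook → (response_format ? generate_structured_response : END). post_model_hook sits in the gap where the LLM has answered but the tools have not run yet, that is, it is called before the "continue the loop?" decision. This gives it real leverage: it can edit the tool_calls of the last message in state.messages and thereby change what should_continue decides. That is particularly useful for approval-style agents: post_model_hook raises an interrupt, waits for a human to confirm, and puts the tool_call back once it is approved.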
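The wiring rules in the excerpt can be restated as a small pure function. This is only a sketch for reasoning about the topology: hook_edges is a hypothetical helper that does not exist in LangGraph, it covers only the edges shown in the excerpt (no edges to END), and "__start__" is used here simply as a label for the entry edge.

```python
def hook_edges(pre_model_hook: bool, post_model_hook: bool,
               response_format: bool) -> list[tuple[str, str]]:
    """Return the static edges the excerpt would add for the given options."""
    edges: list[tuple[str, str]] = []

    # The entry point is the optional pre_model_hook, otherwise the agent itself.
    if pre_model_hook:
        entrypoint = "pre_model_hook"
        edges.append(("pre_model_hook", "agent"))
    else:
        entrypoint = "agent"
    edges.insert(0, ("__start__", entrypoint))

    if post_model_hook:
        edges.append(("agent", "post_model_hook"))

    if response_format:
        # With a post_model_hook, the structured response is generated after it.
        source = "post_model_hook" if post_model_hook else "agent"
        edges.append((source, "generate_structured_response"))

    return edges


all_options = hook_edges(True, True, True)
only_format = hook_edges(False, False, True)
```

With all three options enabled the chain reads pre_model_hook → agent → post_model_hook → generate_structured_response, which makes it easy to see why post_model_hook can influence the structured response.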

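16.2.4-bis _should_bind_tools: three outcomes for "has the model already bound tools?"

Once create_react_agent has the model and the tools, it must decide whether to call model.bind_tools(tools) on the user's behalf. The logic is not complicated but has many pitfalls. _should_bind_tools at chat_agent_executor.py:173 walks through the decision in five steps and ends in one of three outcomes: bind (return True), skip (return False), or raise: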

def _should_bind_tools(model, tools, num_builtin=0):
    # Step 1: for a RunnableSequence (prompt | model), dig out the model inside
    if isinstance(model, RunnableSequence):
        model = next(
            (step for step in model.steps
             if isinstance(step, (RunnableBinding, BaseChatModel))),
            model,
        )

    # Step 2: not a RunnableBinding (bind_tools was never called) → bind
    if not isinstance(model, RunnableBinding):
        return True

    # Step 3: a Binding without tools in kwargs → something else was bound (e.g. stop sequences) → bind
    if "tools" not in model.kwargs:
        return True

    # Step 4: already bound; verify that the bound tools match the tools argument
    bound_tools = model.kwargs["tools"]
    if len(tools) != len(bound_tools) - num_builtin:
        raise ValueError(...)

    # Step 5: compare the sets of names (both OpenAI and Anthropic schemas are supported)
    tool_names = set(tool.name for tool in tools)
    bound_tool_names = set()
    for bound_tool in bound_tools:
        if bound_tool.get("type") == "function":     # OpenAI
            bound_tool_name = bound_tool["function"]["name"]
        elif bound_tool.get("name"):                  # Anthropic
            bound_tool_name = bound_tool["name"]
        else:
            continue
        bound_tool_names.add(bound_tool_name)

    if missing_tools := tool_names - bound_tool_names:
        raise ValueError(...)
    return False

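Some easy-to-miss points:

1. Automatic unwrapping of RunnableSequence. If the user writes model = prompt | ChatOpenAI().bind_tools(tools), what arrives is RunnableSequence([prompt, RunnableBinding]). The next(...) call iterates over model.steps and takes the first RunnableBinding/BaseChatModel as the real model, so the prompt-engineering layer can be wrapped into a chain before it is passed to create_react_agent and the framework still finds the model inside. If the chain contains no Binding/ChatModel at all, the default of next falls back to the original chain, the check not isinstance(model, RunnableBinding) in step 2 holds, and bind_tools is called again. That may not be what the user wanted, but the fault then lies with the user, who passed a chain the framework cannot interpret.

2. OpenAI vs Anthropic tool schemas. The OpenAI style is {"type": "function", "function": {"name": "...", ...}}; the Anthropic style is {"name": "...", "description": ..., "input_schema": ...}. The code recognizes both. Anything else falls under "unknown tool type so we'll ignore it": it is skipped silently rather than failing. This is a compatibility choice: a new LLM vendor may introduce a new schema, and the framework should not crash because of it.

3. What does num_builtin imply? This parameter allows the model to have N more bound tools than the tools argument contains. N usually corresponds to OpenAI built-in tools such as {"type": "code_interpreter"} or {"type": "web_search"}. Users do not declare these in tools, yet they appear in model.kwargs["tools"]. The offset lets the total of "LLM built-in tools + user-declared tools" add up.

The existence of this validation logic reflects a design judgment: create_react_agent trusts users to call bind_tools themselves but does not trust them to get it right, so it pays for one extra set difference to guarantee that the tools passed in and the tools already bound agree. This "trust but verify" style runs through LangGraph's prebuilt layer.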
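To make the name comparison concrete, here is a standalone sketch of steps 4 and 5 that works on plain dicts. The function names bound_tool_names and check_bound_tools are illustrative only, and the error messages are placeholders rather than the library's actual wording.

```python
def bound_tool_names(bound_tools: list[dict]) -> set[str]:
    """Collect tool names from OpenAI-style and Anthropic-style tool dicts."""
    names: set[str] = set()
    for tool in bound_tools:
        if tool.get("type") == "function":   # OpenAI style
            names.add(tool["function"]["name"])
        elif tool.get("name"):               # Anthropic style
            names.add(tool["name"])
        # Unknown shapes (for example built-in tools) are skipped silently.
    return names


def check_bound_tools(tool_names: set[str], bound_tools: list[dict],
                      num_builtin: int = 0) -> None:
    """Raise ValueError if the declared tools and the bound tools disagree."""
    if len(tool_names) != len(bound_tools) - num_builtin:
        raise ValueError("number of tools does not match the bound tools")
    missing = tool_names - bound_tool_names(bound_tools)
    if missing:
        raise ValueError(f"tools not bound to the model: {sorted(missing)}")


bound = [
    {"type": "function", "function": {"name": "search"}},  # OpenAI style
    {"name": "calc", "input_schema": {}},                  # Anthropic style
    {"type": "code_interpreter"},                          # built-in, no name
]
check_bound_tools({"search", "calc"}, bound, num_builtin=1)  # passes
```

Dropping num_builtin=1 in the last call makes the count check fail, which is exactly the situation the offset is meant to absorb.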

16.2.5 The remaining_steps safety mechanism

create_react_agent uses the RemainingSteps managed value to prevent infinite loops:

class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]
    remaining_steps: NotRequired[RemainingSteps]

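When remaining_steps drops below 2 while the LLM is still requesting tool calls, the Agent returns a friendly termination message instead of raising GraphRecursionError.

16.2.5-bis The actual decision table of _are_more_steps_needed

The "drops below 2" statement above comes from _are_more_steps_needed (chat_agent_executor.py:620); the actual check is a small decision table: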

def _are_more_steps_needed(state, response):
    has_tool_calls = isinstance(response, AIMessage) and response.tool_calls
    all_tools_return_direct = (
        all(call["name"] in should_return_direct for call in response.tool_calls)
        if isinstance(response, AIMessage)
        else False
    )
    remaining_steps = _get_state_value(state, "remaining_steps", None)
    if remaining_steps is not None:
        if remaining_steps < 1 and all_tools_return_direct:
            return True
        elif remaining_steps < 2 and has_tool_calls:
            return True
    return False

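The two fallback conditions:

Remaining steps	tool_calls situation	Result
`< 1`	every tool is `return_direct` (returns directly, no further loop)	terminate, return "need more steps"
`< 2`	any tool_call is present	terminate, return "need more steps"

Why 1 in one case and 2 in the other? A return_direct=True tool ends the run as soon as it finishes, so from the current frame only 1 further step is needed (the tool execution itself). A normal tool must hand control back to the agent node so the LLM can look at the result, which takes 2 steps. The thresholds are an exact accounting of whether the remaining steps are enough to finish the last lap. Note that, in the excerpt as written, the second branch also fires for return_direct tool calls when remaining_steps is 1, because has_tool_calls is true for them as well.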
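The table can be checked with a pure-Python restatement of the two branches. This sketch takes the three derived values as plain arguments instead of reading them from state and response, so it is a simplification of the excerpt, not the library function itself.

```python
from typing import Optional


def are_more_steps_needed(remaining_steps: Optional[int],
                          has_tool_calls: bool,
                          all_tools_return_direct: bool) -> bool:
    """Mirror the two fallback branches of the excerpt on plain values."""
    if remaining_steps is None:
        # No managed value in the state: never terminate early.
        return False
    if remaining_steps < 1 and all_tools_return_direct:
        return True
    if remaining_steps < 2 and has_tool_calls:
        return True
    return False


# (remaining_steps, has_tool_calls, all_tools_return_direct) -> terminate early?
decisions = {
    (2, True, False): are_more_steps_needed(2, True, False),
    (1, True, False): are_more_steps_needed(1, True, False),
    (0, True, True): are_more_steps_needed(0, True, True),
    (0, False, False): are_more_steps_needed(0, False, False),
}
```

Two steps left with pending tool calls is still enough to continue; one step left is not.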

The "friendly termination message" is returned verbatim as:

AIMessage(
    id=response.id,
    content="Sorry, need more steps to process this request.",
)

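Note that the message reuses the id of response. This means the new AIMessage replaces the original AIMessage (the one carrying tool_calls) instead of being appended. Why? If a new AIMessage were appended while the original kept its tool_calls, _validate_chat_history would raise "tool_calls without corresponding ToolMessage" on the next round (see §16.2.5-ter). Replacing the message in place under the same id is the only way to satisfy both conditions at once: a consistent chat history and not executing the requested tools.

This replace-by-id technique was set up in Chapter 13 (Checkpoint), in the discussion of the message reducer: when the add_messages reducer meets a new message with an existing id, it overwrites the old one. Here the rule is applied in practice, and the framework's own guard code follows the reducer semantics it defined.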
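The replace-by-id rule can be illustrated with a toy reducer over plain dicts. merge_by_id is a hypothetical stand-in written for this example; it imitates only the "same id overwrites" behavior described above and none of the other features of add_messages.

```python
def merge_by_id(existing: list[dict], new: list[dict]) -> list[dict]:
    """Append new messages, but overwrite in place when the id already exists."""
    merged = list(existing)
    position = {msg["id"]: i for i, msg in enumerate(merged)}
    for msg in new:
        if msg["id"] in position:
            merged[position[msg["id"]]] = msg   # same id: replace the old message
        else:
            position[msg["id"]] = len(merged)
            merged.append(msg)
    return merged


history = [
    {"id": "h1", "role": "human", "content": "What is the weather?"},
    {"id": "a1", "role": "ai", "content": "", "tool_calls": [{"id": "call_1"}]},
]
# The termination message reuses the id of the AI message that carried tool_calls.
termination = {
    "id": "a1",
    "role": "ai",
    "content": "Sorry, need more steps to process this request.",
}
final_history = merge_by_id(history, [termination])
```

After the merge the history still has two messages and no dangling tool_calls, so a later history validation has nothing to complain about.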

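16.2.5-ter _validate_chat_history: why LLM providers require every tool_call to have a matching ToolMessage

Before turning the state into LLM input, _get_model_input_state runs _validate_chat_history (line 243). The check looks minor, but the reasoning behind it deserves a clear explanation: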

def _validate_chat_history(messages):
    all_tool_calls = [tc for m in messages if isinstance(m, AIMessage) for tc in m.tool_calls]
    tool_call_ids_with_results = {
        m.tool_call_id for m in messages if isinstance(m, ToolMessage)
    }
    tool_calls_without_results = [
        tc for tc in all_tool_calls
        if tc["id"] not in tool_call_ids_with_results
    ]
    if tool_calls_without_results:
        raise ValueError(
            "Found AIMessages with tool_calls that do not have a corresponding ToolMessage. "
            "... Every tool call (LLM requesting to call a tool) in the message history MUST have a corresponding ToolMessage "
            "(result of a tool invocation to return to the LLM) - this is required by most LLM providers."
        )

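Why does the framework do this O(N) pass before every model call? Because the API protocols of LLM providers (OpenAI, Anthropic, Google, and others) require it: if the previous assistant message contains tool_calls, the following messages must include tool messages with the matching ids, or the API responds with a 400 error.

LangGraph did not invent this contract; it is a constraint of the underlying APIs. LangGraph moves the check up to the graph layer and raises before anything is sent to the LLM, instead of letting the provider reject the request and translating the HTTP error back. The benefits:

More precise error location: it can name the tool_calls that are missing results (tool_calls_without_results[:3]) instead of reporting "OpenAI returned 400".
Lower token cost: when validation fails no request is sent, so no API money is spent.
Earlier error signal: if a custom node forgets to produce a ToolMessage, the problem surfaces here, before the request leaves the process, rather than as a provider-side rejection.

The check also explains why replacing the AIMessage under the same id on termination (§16.2.5-bis) is mandatory: if the original AIMessage kept its tool_calls and no ToolMessage were added, _validate_chat_history would raise on the spot. The framework's own code has to obey the framework's own validation rules, which is a sign of consistent API design.

16.3 ToolNode implementation

16.3.1 Core responsibilities

ToolNode is the hub of tool execution. It is responsible for:

Extracting tool_calls from the last AIMessage
Looking up the matching tool implementation
Injecting state, Store, and other dependencies
Executing tools in parallel
Handling errors and returning results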

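class ToolNode(RunnableCallable):
    """A node that runs tools requested by an AI model."""

    def __init__(
        self,
        tools: Sequence[BaseTool | Callable],
        *,
        name: str = "tools",
        tags: list[str] | None = None,
        # The default is a callable, not True (see §16.3.2-bis)
        handle_tool_errors: bool | str | Callable | tuple = _default_handle_tool_errors,
    ):
        # Abridged excerpt: resolved_tools stands for the tools after plain
        # callables have been converted to BaseTool instances
        self.tools_by_name = {tool.name: tool for tool in resolved_tools}
        self.handle_tool_errors = handle_tool_errors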

16.3.2 Tool execution flow

sequenceDiagram
    participant State as Graph state
    participant TN as ToolNode
    participant Tool as Concrete tool
    participant Result as Result

    State->>TN: last AIMessage
    TN->>TN: extract tool_calls

    loop each tool_call
        TN->>TN: look up tool by name
        TN->>TN: inject InjectedState / InjectedStore
        TN->>Tool: invoke(args)
        alt normal return
            Tool-->>TN: tool output
            TN->>Result: ToolMessage(content=output)
        else tool raises
            Tool-->>TN: Exception
            alt handle_tool_errors=True
                TN->>Result: ToolMessage(content=error_msg, status="error")
            else handle_tool_errors=False
                TN-->>State: re-raise exception
            end
        end
    end

    TN->>State: {"messages": [ToolMessage, ...]}

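16.3.2-bis Source check: what is the real default of `handle_tool_errors`?

Community material often writes the default of ToolNode.__init__ as handle_tool_errors=True, which makes it look as if every error is caught and turned into a ToolMessage. The actual source, libs/prebuilt/langgraph/prebuilt/tool_node.py:750, says: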

handle_tool_errors: bool
| str
| Callable[..., str]
| type[Exception]
| tuple[type[Exception], ...] = _default_handle_tool_errors,

The default is not True but _default_handle_tool_errors, a callable defined at line 381:

def _default_handle_tool_errors(e: Exception) -> str:
    """Default error handler for tool errors.

    If the tool is a tool invocation error, return its message.
    Otherwise, raise the error.
    """
    if isinstance(e, ToolInvocationError):
        return e.message
    raise e

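This code implies an important shift in mental model: by default, LangGraph swallows only errors that are the LLM's fault, not errors that are the tool's fault.

ToolInvocationError means argument validation failed: the LLM supplied invalid arguments, Pydantic validation failed, and the error occurred before the tool ran. By default these errors become a ToolMessage so the LLM can retry with a fix (the problem is the LLM's, after all).
Any other exception (a ValueError from a DB timeout, requests.ConnectionError, a KeyError from a business-logic bug) is raised by default and aborts the whole graph.

Only when the user passes handle_tool_errors=True explicitly does the node catch everything. The docstring (starting at line 677) describes the default policy: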

Defaults to a callable that:
- Catches tool invocation errors (due to invalid arguments provided by the
    model) and returns a descriptive error message
- Ignores tool execution errors (they will be re-raised)

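This design is misread again and again. Not swallowing business exceptions by default is an opinion LangGraph holds: a business error should make the system fail fast, be saved by the Checkpoint in a failed state, and wait for a human operator to decide, rather than quietly turning into a ToolMessage string the LLM sees, which hides the error.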
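The difference between the two kinds of error can be simulated without LangGraph. In this sketch ToolInvocationError is a local stand-in class and run_tool is a hypothetical miniature of the error-handling path; only the shape of default_handle_tool_errors follows the excerpt above.

```python
class ToolInvocationError(Exception):
    """Stand-in for the real class: the model supplied invalid arguments."""

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.message = message


def default_handle_tool_errors(e: Exception) -> str:
    """Return a message for invocation errors, re-raise everything else."""
    if isinstance(e, ToolInvocationError):
        return e.message
    raise e


def run_tool(tool, handler) -> dict:
    """Hypothetical miniature of the node's try/except around a tool call."""
    try:
        return {"status": "ok", "content": tool()}
    except Exception as e:
        return {"status": "error", "content": handler(e)}


def bad_arguments():
    raise ToolInvocationError("field 'city' is required")


def database_down():
    raise ConnectionError("db timeout")


# The model's mistake becomes an error message it can react to.
llm_fault = run_tool(bad_arguments, default_handle_tool_errors)

# The tool's own failure propagates and would abort the graph.
try:
    run_tool(database_down, default_handle_tool_errors)
    propagated = False
except ConnectionError:
    propagated = True
```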

16.3.3 Error-handling strategies

handle_tool_errors accepts several kinds of configuration:

# Explicit True (not the default): catch every error and return the error message
ToolNode(tools, handle_tool_errors=True)
# ToolMessage(content="Error: ... Please fix your mistakes.")

# Custom error message
ToolNode(tools, handle_tool_errors="Something went wrong, try again.")

# Custom error-handling function
def my_handler(e: ValueError) -> str:
    return f"Got a value error: {e}"
ToolNode(tools, handle_tool_errors=my_handler)

# Catch only specific exception types
ToolNode(tools, handle_tool_errors=(ValueError, TypeError))

# Do not catch errors; let them propagate
ToolNode(tools, handle_tool_errors=False)

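When handle_tool_errors is a callable, the framework analyzes its type annotations through _infer_handled_types to infer which exception types it can handle:

def _infer_handled_types(handler: Callable) -> tuple[type[Exception], ...]:
    """Analyze the handler's type annotations to infer the exception types it handles."""
    sig = inspect.signature(handler)
    # Inspect the type annotation of the first parameter
    type_hints = get_type_hints(handler)
    # Union types such as Union[ValueError, TypeError] are supported
    ...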

16.3.3-bis TOOL_CALL_ERROR_TEMPLATE: the default error message verbatim

Which default error template is used when handle_tool_errors=True? tool_node.py, line 109:

TOOL_CALL_ERROR_TEMPLATE = "Error: {error}\n Please fix your mistakes."

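Note the last sentence, "Please fix your mistakes." This instruction goes straight to the LLM. It is part of the error message and at the same time a light hint for the next round of model reasoning ("you got it wrong, fix it"). Embedding prompt engineering in a framework default reflects a judgment the LangGraph designers made about LLM behavior: an error message is not just a log entry, it is also part of the next prompt. A polite "please fix" makes the LLM more likely to attempt a repair than to fall into an apology loop.

This is a LangGraph trait worth remembering: the framework defaults carry opinions about how LLMs respond. If your agent keeps apologizing after an error without changing its arguments, customizing this template is a likely remedy. For example, "Error: {error}\nThe arguments you provided did not conform to the schema. Retry with corrected arguments." tends to work better than the generic "fix your mistakes".

16.3.3-ter GraphBubbleUp: three kinds of interrupt that are never swallowed

_execute_tool_sync (tool_node.py:947-957) contains a special piece of exception handling: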

# GraphInterrupt is a special exception that will always be raised.
# It can be triggered in the following scenarios,
# Where GraphInterrupt(GraphBubbleUp) is raised from an `interrupt` invocation
# most commonly:
# (1) a GraphInterrupt is raised inside a tool
# (2) a GraphInterrupt is raised inside a graph node for a graph called as a tool
# (3) a GraphInterrupt is raised when a subgraph is interrupted inside a graph
#     called as a tool
# (2 and 3 can happen in a "supervisor w/ tools" multi-agent architecture)
except GraphBubbleUp:
    raise

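This code carves GraphBubbleUp (including GraphInterrupt) out of the scope of handle_tool_errors: however error handling is configured, the exception is re-raised unchanged as soon as it is detected. The comment states the reason, and the three scenarios correspond to what Chapters 11 and 13 covered:

Scenario (1), a tool calls interrupt() directly: human-confirmation tools ("ask the user before placing the order") need to hand control back to Pregel so the Checkpoint can save the "waiting for a human reply" state.
Scenario (2), a subgraph is called as a tool: when the subgraph interrupts, the interrupt must pass up through ToolNode and must not be swallowed as a "tool execution error".
Scenario (3), supervisor multi-agent: a supervisor calls a worker agent as a tool and the worker interrupts; this too must bubble up to the supervisor's Pregel loop.

Without this exception, human-in-the-loop tools could not work inside ToolNode at all. The interrupt would be swallowed by handle_tool_errors=True and turned into an ordinary ToolMessage, the LLM would see "Error: GraphInterrupt(…)" and politely answer "OK, I will keep that in mind", and the graph would keep running. The semantics of interrupt would be destroyed.

This is the ordered layering of LangGraph's error handling: application-level errors are swallowed (wrong tool arguments), control-flow exceptions pass through (an interrupt must pause the graph). Understanding the boundary helps you avoid swallowing an interrupt by accident in your own wrapper. The excerpt places except GraphBubbleUp: raise ahead of the general handler, which suggests that a broad except clause would otherwise catch it; a try/except inside a custom wrap_tool_call should therefore re-raise GraphBubbleUp first, before any except Exception or broader clause.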
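The ordering of the except clauses can be tried out in isolation. The sketch below uses local stand-in classes and assumes that GraphBubbleUp derives from Exception, as the order of the clauses in the excerpt suggests; guarded_wrapper is a hypothetical wrapper and returns a plain dict instead of a ToolMessage.

```python
class GraphBubbleUp(Exception):
    """Stand-in for the control-flow base exception (assumed Exception subclass)."""


class GraphInterrupt(GraphBubbleUp):
    """Stand-in for the exception raised by an interrupt."""


def guarded_wrapper(request, execute):
    """Turn ordinary failures into an error result, let control flow pass."""
    try:
        return execute(request)
    except GraphBubbleUp:
        # Must come before the broad clause, otherwise it would be swallowed.
        raise
    except Exception as exc:
        return {"status": "error",
                "content": f"Error: {exc!r}\n Please fix your mistakes."}


def failing_tool(request):
    raise ValueError("bad input")


def interrupting_tool(request):
    raise GraphInterrupt("waiting for approval")


handled = guarded_wrapper({}, failing_tool)

try:
    guarded_wrapper({}, interrupting_tool)
    interrupt_escaped = False
except GraphInterrupt:
    interrupt_escaped = True
```

Swapping the two except clauses would turn the interrupt into an ordinary error result, which is the failure mode described above.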

16.3.4 ToolCallRequest and interceptors

v2 introduces ToolCallRequest and ToolCallWrapper, which enable a middleware pattern for tool calls:

@dataclass
class ToolCallRequest:
    tool_call: ToolCall        # the tool call itself
    tool: BaseTool | None      # the tool instance
    state: Any                 # the current graph state
    runtime: ToolRuntime       # the tool runtime

    def override(self, **overrides) -> ToolCallRequest:
        """Create a modified copy of the request."""
        return replace(self, **overrides)

The interceptor pattern lets you insert custom logic before and after tool execution:

ToolCallWrapper = Callable[
    [ToolCallRequest, Callable[[ToolCallRequest], ToolMessage | Command]],
    ToolMessage | Command,
]

# Interceptor example: retry logic
def retry_wrapper(request, execute):
    for attempt in range(3):
        result = execute(request)
        if isinstance(result, ToolMessage) and result.status != "error":
            return result
    return result

# Interceptor example: modifying arguments
def sanitize_wrapper(request, execute):
    modified_call = {**request.tool_call, "args": sanitize(request.tool_call["args"])}
    return execute(request.override(tool_call=modified_call))

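16.3.4-bis _infer_handled_types: inferring exceptions from type annotations

As mentioned earlier, when handle_tool_errors is a callable the framework infers from its type annotations which exceptions it can handle. The inference function _infer_handled_types lives at tool_node.py:442, and its implementation is more involved than a plain inspect.signature call: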

sig = inspect.signature(handler)
params = list(sig.parameters.values())
if params:
    # If it's a method, the first argument is typically 'self' or 'cls'
    if params[0].name in ["self", "cls"] and len(params) == 2:
        first_param = params[1]
    else:
        first_param = params[0]

    type_hints = get_type_hints(handler)
    if first_param.name in type_hints:
        origin = get_origin(first_param.annotation)
        if origin in [Union, UnionType]:
            args = get_args(first_param.annotation)
            if all(issubclass(arg, Exception) for arg in args):
                return tuple(args)
            # ... otherwise raise ValueError("all types must be Exception subclasses")
        exception_type = type_hints[first_param.name]
        if Exception in exception_type.__mro__:
            return (exception_type,)
        # ... otherwise raise ValueError("must be an Exception type")
# If no type information is available, return (Exception,) for backwards compatibility.
return (Exception,)

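Three technical details are worth unpacking:

1. Method detection. If the handler is written as a method (for example def handle(self, e: ValueError) -> str: ... inside a class) and its signature still exposes self, the real exception parameter is the second one. The code checks params[0].name in ["self", "cls"] and len(params) == 2, and both conditions must hold: the name is self/cls and there are exactly two parameters. With three parameters (e.g. def handle(self, e, extra_context)) the check does not match and the code falls back to the first parameter, which is self. Going by the excerpt, if self has no annotation the function then returns the (Exception,) fallback and the annotation on e is silently ignored; if self is annotated with a non-exception type, a ValueError is raised. Either way, keeping the handler signature minimal is required for the inference to work, not just a matter of style.

2. Union support. def my_handler(e: ValueError | TypeError) -> str is correctly recognized as "handles ValueError or TypeError", provided every member of the Union is an Exception subclass. Writing def my_handler(e: ValueError | str) -> str, with str mixed in, raises ValueError: "All types in the error handler error annotation must be Exception types."

3. Fallback for missing type hints. If the handler has no type annotation at all, the function returns (Exception,), meaning "handles every exception". The comment says "for backwards compatibility": users of early LangGraph versions may have written handlers without annotations, and an upgrade must not break their code. This is a typical case of backward compatibility taking priority over precision.

The function also illustrates a Python practice: choosing behavior at runtime from type annotations is feasible (inspect.signature + get_type_hints), but UnionType (the X | Y syntax of Python 3.10+) and typing.Union need careful handling. get_origin may return two different objects, and the code lists both [Union, UnionType] to support both spellings.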

16.3.5 Support for Command return values

A tool can return a Command object to control the graph's execution flow directly:

@tool
def transfer_to_agent(agent_name: str) -> Command:
    """Hand the conversation over to another Agent."""
    return Command(goto=agent_name, update={"transferred": True})

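ToolNode recognizes return values of type Command and propagates them straight into the graph's control flow instead of wrapping them in a ToolMessage.

16.3.5-bis msg_content_output: how an arbitrary return type becomes ToolMessage.content

In the LLM chat protocol, ToolMessage.content can only be a str or a list of content blocks. A tool function, however, may return any Python object: a dict, a Pydantic instance, a dataclass, even None. The conversion between the two is the job of msg_content_output (tool_node.py:307):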

def msg_content_output(output: Any) -> str | list[dict]:
    """Convert tool output to `ToolMessage` content format."""
    if isinstance(output, str) or (
        isinstance(output, list)
        and all(
            isinstance(x, dict) and x.get("type") in TOOL_MESSAGE_BLOCK_TYPES
            for x in output
        )
    ):
        return output
    # Technically a list of strings is also valid message content, but it's
    # not currently well tested that all chat models support this.
    # And for backwards compatibility we want to make sure we don't break
    # any existing ToolNode usage.
    try:
        return json.dumps(output, ensure_ascii=False)
    except Exception:
        return str(output)

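Three tiers of conversion:

Already a str: returned as-is.
Already valid content blocks: returned as-is (every dict in the list must have a type field whose value is in the TOOL_MESSAGE_BLOCK_TYPES allowlist, for example "text" or "image_url").
Everything else: json.dumps(..., ensure_ascii=False) is tried first, with str(output) as the fallback on failure.

The ensure_ascii=False argument matters: it keeps Chinese, Japanese, emoji, and other non-ASCII characters intact instead of turning them into escape sequences such as \u4e2d\u6587. Without it, a tool that returns {"city": "北京"} would send {"city": "\u5317\u4eac"} to the LLM. Most models can recover the text, but tokens are wasted and the meaning becomes fragile. The community has hit this pitfall repeatedly, and the framework bakes the fix into its default.

One comment in the code is also worth reading: "Technically a list of strings is also valid message content, but it's not currently well tested that all chat models support this." The author openly acknowledges a case the API specification allows but real LLMs support inconsistently, and takes the conservative path: a list of strings is not passed through unchanged. This is the classic engineering lesson that what the documentation permits and what works in practice are two different things; an open-source framework carries a backward-compatibility burden heavier than the specification itself.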
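The effect of ensure_ascii and of the str fallback is easy to verify with the standard library alone. to_message_content below is a hypothetical helper that reproduces only the third tier (json.dumps, then str) on values that are neither strings nor content blocks.

```python
import json
from typing import Any


def to_message_content(output: Any) -> str:
    """Serialize tool output as JSON, falling back to str() on failure."""
    try:
        return json.dumps(output, ensure_ascii=False)
    except Exception:
        return str(output)


payload = {"city": "北京"}

escaped = json.dumps(payload)            # default: non-ASCII is escaped
readable = to_message_content(payload)   # non-ASCII is kept as-is
fallback = to_message_content({1})       # a set is not JSON-serializable
```

escaped contains the \u5317\u4eac escape sequence, readable keeps 北京 unchanged, and fallback is the plain string form of the set.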

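16.3.5-ter INVALID_TOOL_NAME_ERROR_TEMPLATE: protection against hallucinated tool names

_validate_tool_call (tool_node.py:1259) checks whether the tool name requested by the LLM exists. If the LLM hallucinates a tool that does not exist (say it writes web_search for search_web), the framework does not raise and bring the whole graph down; it replies with a ToolMessage: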

INVALID_TOOL_NAME_ERROR_TEMPLATE = (
    "Error: {requested_tool} is not a valid tool, try one of [{available_tools}]."
)

def _validate_tool_call(self, call):
    requested_tool = call["name"]
    if requested_tool not in self.tools_by_name:
        all_tool_names = list(self.tools_by_name.keys())
        content = INVALID_TOOL_NAME_ERROR_TEMPLATE.format(
            requested_tool=requested_tool,
            available_tools=", ".join(all_tool_names),
        )
        return ToolMessage(
            content, name=requested_tool, tool_call_id=call["id"], status="error"
        )

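"try one of [...]" is yet another instruction handed to the LLM. The error message lists the available tool names, which amounts to hinting inside the conversation that the model can pick from these. This is far friendlier than raising KeyError: 'web_search' and crashing the graph, and the LLM will very likely pick the right name on the next round (fuzzy matching of names that are spelled similarly but not identically is something LLMs do well).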
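The message the LLM receives can be previewed by formatting the template directly. In this sketch invalid_tool_reply is a hypothetical function that returns a plain dict in place of a ToolMessage; the template string is the one quoted above.

```python
from typing import Optional

INVALID_TOOL_NAME_ERROR_TEMPLATE = (
    "Error: {requested_tool} is not a valid tool, try one of [{available_tools}]."
)


def invalid_tool_reply(call: dict, tools_by_name: dict) -> Optional[dict]:
    """Return an error reply for an unknown tool name, or None if it exists."""
    requested_tool = call["name"]
    if requested_tool in tools_by_name:
        return None
    content = INVALID_TOOL_NAME_ERROR_TEMPLATE.format(
        requested_tool=requested_tool,
        available_tools=", ".join(tools_by_name),
    )
    return {
        "content": content,
        "name": requested_tool,
        "tool_call_id": call["id"],
        "status": "error",
    }


tools_by_name = {"search_web": object(), "get_weather": object()}
reply = invalid_tool_reply({"name": "web_search", "id": "call_1"}, tools_by_name)
```

The reply keeps the original tool_call_id, so the chat history stays consistent while the model is told which names are valid.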

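The three *_ERROR_TEMPLATE constants (TOOL_CALL_ERROR_TEMPLATE, INVALID_TOOL_NAME_ERROR_TEMPLATE, TOOL_EXECUTION_ERROR_TEMPLATE) together form ToolNode's error-message DSL: they are not only error messages but also the next prompt. Chapter 7 did not expand on this design philosophy when it covered Pregel error propagation; this chapter is its most concrete application.

16.3.6 _inject_tool_args: the safety layer that stops the LLM from forging InjectedToolArg values

InjectedState and InjectedStore give tools the ability to see the graph's internal state. That ability comes with an obvious security concern: can the LLM forge tool_calls.args to smuggle in a state or store and bypass what the framework intends?

The answer lies in a piece of code at the end of _inject_tool_args (tool_node.py:1387-1395):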

# Strip any caller-supplied values for injected args, then add
# back only trusted values. This prevents an LLM from forging
# hidden InjectedToolArg fields via ToolCall.args.
stripped_args = {
    k: v
    for k, v in tool_call_copy["args"].items()
    if k not in injected.all_injected_keys
}
tool_call_copy["args"] = {**stripped_args, **injected_args}
return tool_call_copy

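The code does two things:

Strip: every key in the LLM-generated args that has the same name as an injected field is dropped.
Restore: the trusted values the framework itself took from state/store/runtime are written over them.

As a scenario: suppose the tool is def save_note(content: str, store: Annotated[BaseStore, InjectedStore]) and the LLM generates {"content": "...", "store": {"evil_key": "..."}} (forging a store in order to write something into the system). The code first removes the store key from the LLM's args and then puts back the real tool_runtime.store, so the forged data never reaches the tool.

This is a silent safety layer that is on by default and cannot be bypassed. The docstring does not advertise it, but the source comment states the purpose: "This prevents an LLM from forging hidden InjectedToolArg fields via ToolCall.args." The LLM is assumed to be untrusted input, which is the baseline of LangGraph's security model. It follows the same philosophy as the tamper-resistance principle for Checkpoints in Chapter 13: system state may only change through controlled channels and cannot be injected directly by the business layer, or by the LLM.

In practice you get this protection without any extra configuration. Knowing that it exists answers a common question: "Can I let the LLM pass in a custom store?" No, the framework strips it. If your business really needs the LLM to choose a store, do not use InjectedStore (that path is reserved for the framework); use an ordinary parameter and map it in business code.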
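The strip-then-restore step reduces to two dictionary operations, shown here on plain values. inject_trusted_args is a hypothetical helper for illustration; the string "TRUSTED_STORE" stands in for the real store object.

```python
def inject_trusted_args(llm_args: dict, injected_keys: set, trusted: dict) -> dict:
    """Drop caller-supplied values for injected keys, then add trusted ones."""
    stripped = {k: v for k, v in llm_args.items() if k not in injected_keys}
    return {**stripped, **trusted}


# The model tries to smuggle in its own "store".
llm_args = {"content": "remember this", "store": {"evil_key": "..."}}

final_args = inject_trusted_args(
    llm_args,
    injected_keys={"store"},
    trusted={"store": "TRUSTED_STORE"},
)
```

The tool receives the model's content argument unchanged, while the store value always comes from the trusted side.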

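16.3.7 _parse_input: how the four input formats are recognized

The ToolNode docstring (tool_node.py:634-644) says it supports several input formats. The actual recognition happens in _parse_input (line 1215), which makes four checks in order: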

def _parse_input(self, input):
    if isinstance(input, list):
        # Case A: a plain list of ToolCalls (programmatic use / tests)
        if isinstance(input[-1], dict) and input[-1].get("type") == "tool_call":
            return cast(list[ToolCall], input), "tool_calls"
        # Case B: a list of messages
        input_type = "list"
        messages = input
    elif (isinstance(input, dict)
          and input.get("__type") == "tool_call_with_context"):
        # Case C: a single tool_call_with_context sent through the Send API
        return [input["tool_call"]], "tool_calls"
    elif isinstance(input, dict) and (messages := input.get(self._messages_key, [])):
        # Case D: the standard graph-state dict
        input_type = "dict"
    elif messages := getattr(input, self._messages_key, []):
        # Case D': dataclass/BaseModel state
        input_type = "dict"
    else:
        raise ValueError("No message found in input")

    # Find the last AIMessage in messages
    try:
        latest_ai_message = next(
            m for m in reversed(messages) if isinstance(m, AIMessage)
        )
    except StopIteration:
        raise ValueError("No AIMessage found in input")

    tool_calls = list(latest_ai_message.tool_calls)
    return tool_calls, input_type

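The value of this code lies in the order of the checks. They must follow the priority below, or inputs will be mismatched:

list whose last item has type == "tool_call" comes first: this allows direct calls such as ToolNode([...]).invoke([{"name": "...", "args": {...}, "id": "1", "type": "tool_call"}]) (for tests and programmatic use). Note that input[-1] is checked rather than input[0]; this mode usually has a single element, but in principle several may be passed.
dict with __type == "tool_call_with_context" comes second: this is the wrapper format the Send API uses when v2 dispatches calls (§16.7.1). Note that the key is get("__type") with two underscores, not get("type"), to avoid clashing with ordinary business fields; users are very unlikely to define a dunder-style key such as __type in their own state.
dict that contains messages_key comes third: the most common graph-state format, {"messages": [...]}.
BaseModel with a messages_key attribute is the final fallback: it supports dataclass/Pydantic state schemas.

The isinstance(input[-1], dict) check cannot be moved any earlier, otherwise [AIMessage(...)] could be misidentified (message objects sometimes behave in dict-like ways). Recognizing inputs from the outer layer inward and from the special case to the general case is a classic way to write polymorphic dispatch in Python.

Picking the AIMessage out of messages is deliberate too: reversed(messages) is used because the most recent AI message is the source of the tool_calls to process now. If the history contains several AI/Tool rounds, only the last round counts. This is the standard semantics of the ReAct loop. Chapter 2 (StateGraph basics) explained that messages is append-only, and reversed here applies that invariant directly.

16.4 tools_condition routing

16.4.1 Implementation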

def tools_condition(
    state: list[AnyMessage] | dict[str, Any] | BaseModel,
    messages_key: str = "messages",
) -> Literal["tools", "__end__"]:
    """Conditional routing: if tool_calls present, route to 'tools'; else END."""
    if isinstance(state, list):
        ai_message = state[-1]
    elif isinstance(state, dict):
        ai_message = state[messages_key][-1]
    elif messages := getattr(state, messages_key, None):
        ai_message = messages[-1]
    else:
        raise ValueError(f"No messages found in state: {state}")

    if hasattr(ai_message, "tool_calls") and len(ai_message.tool_calls) > 0:
        return "tools"
    return "__end__"

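This function implements the core loop logic of a ReAct Agent: if the LLM output contains tool_calls, continue by executing the tools; otherwise end. It supports three state formats: list, dict, and BaseModel.

16.4.2 Using it in a graph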

builder = StateGraph(AgentState)
builder.add_node("agent", call_model)
builder.add_node("tools", ToolNode(tools))
builder.add_edge(START, "agent")
builder.add_conditional_edges("agent", tools_condition)
builder.add_edge("tools", "agent")
graph = builder.compile()

graph LR
    Start[START] --> Agent[agent]
    Agent -->|tool_calls present| Tools[tools]
    Agent -->|no tool_calls| End[END]
    Tools --> Agent

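16.5 ValidationNode

16.5.1 Design motivation

In extraction and structured-output scenarios we often need the LLM to generate data that conforms to a specific schema. ValidationNode does not execute tools; it uses Pydantic to check whether the arguments of a tool call are valid: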

class ValidationNode(RunnableCallable):
    def __init__(
        self,
        schemas: Sequence[BaseTool | type[BaseModel] | Callable],
        *,
        format_error: Callable | None = None,
        name: str = "validation",
    ):
        self.schemas_by_name: dict[str, type[BaseModel]] = {}
        for schema in schemas:
            # Three kinds of input are supported: BaseModel, BaseTool, Callable
            ...

16.5.2 Validation flow

sequenceDiagram
    participant LLM as Model
    participant Val as ValidationNode
    participant State as State

    LLM->>Val: AIMessage with tool_calls
    loop each tool_call
        Val->>Val: look up schema by name
        Val->>Val: schema.model_validate(args)
        alt validation passes
            Val->>State: ToolMessage(content=validated_json)
        else validation fails
            Val->>State: ToolMessage(content=error, is_error=True)
        end
    end
    State->>LLM: regenerate (if there were errors)

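16.5.3 Usage example

from typing import Annotated, Literal

from langchain_anthropic import ChatAnthropic
from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ValidationNode
from pydantic import BaseModel, field_validator

class SelectNumber(BaseModel):
    a: int

    @field_validator("a")
    @classmethod
    def a_must_be_meaningful(cls, v):
        if v != 37:
            raise ValueError("Only 37 is allowed")
        return v

builder = StateGraph(Annotated[list, add_messages])
llm = ChatAnthropic(model="claude-3-5-haiku-latest").bind_tools([SelectNumber])
builder.add_node("model", llm)
builder.add_node("validation", ValidationNode([SelectNumber]))
builder.add_edge(START, "model")

def should_validate(state: list) -> Literal["validation", "__end__"]:
    # Validate only when the model actually requested a tool call
    if state[-1].tool_calls:
        return "validation"
    return END

def should_reprompt(state: list) -> Literal["model", "__end__"]:
    # Helper not shown in the original excerpt; assumes ValidationNode marks
    # failures with additional_kwargs["is_error"] on the ToolMessage
    for msg in state[::-1]:
        if msg.type == "ai":
            return END  # reached the AI message: no tool call failed
        if msg.additional_kwargs.get("is_error"):
            return "model"  # send the errors back for another attempt
    return END

builder.add_conditional_edges("model", should_validate)
builder.add_conditional_edges("validation", should_reprompt)
graph = builder.compile()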

16.5.4 Source check: ValidationNode is deprecated too, and the default error message verbatim

Open libs/prebuilt/langgraph/prebuilt/tool_validator.py at line 43:

@deprecated(
    "ValidationNode is deprecated. Please use `create_agent` from `langchain.agents` with custom tool error handling.",
    category=LangGraphDeprecatedSinceV10,
)
class ValidationNode(RunnableCallable):

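ValidationNode was deprecated in v1.0 together with create_react_agent. Going forward, the "argument validation + LLM retry" pattern is expected to go through create_agent with custom tool error handling (the handle_tool_errors route). The ValidationNode class itself is still present (it only emits a warning), so existing code keeps running.

The default error-formatting function (tool_validator.py:34):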

def _default_format_error(
    error: BaseException,
    call: ToolCall,
    schema: type[BaseModel] | type[BaseModelV1],
) -> str:
    """Default error formatting function."""
    return f"{repr(error)}\n\nRespond after fixing all validation errors."

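The final sentence, "Respond after fixing all validation errors.", is another prompt-level instruction handed directly to the LLM, in the same spirit as TOOL_CALL_ERROR_TEMPLATE from §16.3.3-bis. The wording is more precise, though: "Respond after fixing" adds a behavioral instruction that a plain "please fix" lacks, namely fix first, then respond. For extraction and schema-validation scenarios this explicit "repair, then answer" ordering tends to work better, because the LLM does not hover between "I know it is wrong" and "but I do not know how to fix it".

Note that the module imports both pydantic.v1.BaseModel and pydantic.BaseModel (lines 28-30). Existing schemas in the LangChain ecosystem were written against both Pydantic versions, so ValidationNode has to accept either. Compatibility code that supports two generations of a base library at once is routine in the fast-moving Python AI ecosystem; the framework hides "I support both" inside the implementation, so users need not care whether their schema is v1 or v2.

16.5.5 _filter_validation_errors: Don't expose framework-internal fields to the LLM

Back to _filter_validation_errors at tool_node.py:508. When tool-argument validation fails and the framework raises ToolInvocationError so that the LLM can retry, this function strips from the error message every entry related to InjectedState, InjectedStore, or ToolRuntime: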

def _filter_validation_errors(
    validation_error: ValidationError,
    injected_args: _InjectedArgs | None,
) -> list[ErrorDetails]:
    # Collect all injected argument names
    injected_arg_names: set[str] = set()
    if injected_args:
        if injected_args.state:
            injected_arg_names.update(injected_args.state.keys())
        if injected_args.store:
            injected_arg_names.add(injected_args.store)
        if injected_args.runtime:
            injected_arg_names.add(injected_args.runtime)

    filtered_errors: list[ErrorDetails] = []
    for error in validation_error.errors():
        # Check if error location contains any injected argument
        # ... drop the error entries that belong to injected arguments

The docstring explains the motivation (tool_node.py:512-519):

When a tool invocation fails validation, only errors for arguments that the LLM
controls should be included in error messages. This ensures the LLM receives
focused, actionable feedback about parameters it can actually fix. System-injected
arguments (state, store, runtime) are filtered out since the LLM has no control
over them.

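In one sentence: the errors the LLM sees must be things it can fix; internally injected fields that it cannot change are cut out of the error.

A concrete scenario: suppose the tool is def save(content: str, runtime: ToolRuntime) -> str. Pydantic validates all parameters. If the args generated by the LLM omit content, and the framework also hits a type mismatch while injecting runtime (say runtime is None because no store was passed), the raw ValidationError might look like this: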

[
  {"loc": ("content",), "msg": "field required", ...},
  {"loc": ("runtime",), "msg": "none is not an allowed value", ...}
]

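Without filtering, the LLM sees two errors and may try to generate {"content": "...", "runtime": "fake"}. That fake runtime is then stripped by _inject_tool_args (§16.3.6), validation fails again, and the run falls into an error loop. With filtering, the LLM sees only "field required: content" and can fix it in one attempt.

This is another instance of the engineering view that "error feedback to the LLM is also part of the prompt". When Chapter 3 discussed "prompt engineering as a first-class citizen", we noted that many of LangGraph's default behaviors are really prompt strategies frozen into code. _filter_validation_errors is one of them: error messages must be designed with care rather than dumped wholesale on the LLM.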
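
The filtering idea can be shown with plain dicts shaped like the error entries above. This is a simplified sketch, not the library function: it assumes the first element of loc is the top-level argument name, and filter_injected_errors is a hypothetical helper.

```python
def filter_injected_errors(errors: list, injected_names: set) -> list:
    """Keep only error entries whose location does not start at an injected argument."""
    kept = []
    for err in errors:
        loc = err.get("loc", ())
        # Assumption: the first element of `loc` is the top-level argument name.
        if loc and loc[0] in injected_names:
            continue
        kept.append(err)
    return kept


raw_errors = [
    {"loc": ("content",), "msg": "field required"},
    {"loc": ("runtime",), "msg": "none is not an allowed value"},
]
# Only the entry the LLM can act on survives.
visible = filter_injected_errors(raw_errors, injected_names={"runtime"})
```

With the runtime entry gone, the feedback names only the content argument, the one field the model is able to correct.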

16.6 InjectedState and InjectedStore

16.6.1 InjectedState

InjectedState lets a tool function read the graph's current state without declaring state fields as explicit tool parameters:

from typing import Annotated

from langchain_core.tools import tool
from langgraph.prebuilt import InjectedState

@tool
def get_context(
    question: str,
    state: Annotated[dict, InjectedState]
) -> str:
    """Answer a question based on the conversation history."""
    messages = state["messages"]
    # content may be a str or a list of content blocks, so coerce to str
    context = "\n".join(str(m.content) for m in messages[-5:])
    return f"Based on context: {context}\nAnswer: ..."

A parameter marked with InjectedState does not appear in the tool's schema (so the LLM will not try to fill it); ToolNode injects it automatically at execution time.
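
How an Annotated marker can separate LLM-visible parameters from injected ones is easy to show with the typing module. The sketch below is only an illustration of that idea: InjectedMarker, split_params, and call_with_injection are hypothetical names, and the real detection logic in ToolNode is more involved.

```python
import inspect
from typing import Annotated, get_args, get_origin, get_type_hints


class InjectedMarker:
    """Hypothetical stand-in for a marker such as InjectedState."""


def get_context(question: str, state: Annotated[dict, InjectedMarker]) -> str:
    return f"{question} ({len(state['messages'])} messages)"


def split_params(func):
    """Return (llm_visible, injected) parameter names for `func`."""
    hints = get_type_hints(func, include_extras=True)
    visible, injected = [], []
    for name in inspect.signature(func).parameters:
        hint = hints.get(name)
        metadata = get_args(hint)[1:] if get_origin(hint) is Annotated else ()
        # A parameter is injected when the marker (class or instance) is in the metadata.
        is_injected = any(
            m is InjectedMarker or isinstance(m, InjectedMarker) for m in metadata
        )
        (injected if is_injected else visible).append(name)
    return visible, injected


def call_with_injection(func, llm_args: dict, state: dict):
    """Fill injected parameters from graph state; the LLM supplies only the visible ones."""
    _, injected_names = split_params(func)
    return func(**llm_args, **{name: state for name in injected_names})


visible, injected = split_params(get_context)
answer = call_with_injection(get_context, {"question": "hi"}, {"messages": ["a", "b"]})
```

Only the names in visible would be advertised to the model; the injected ones are filled in by the caller at execution time.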

16.6.2 InjectedStore

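Similarly, InjectedStore gives a tool direct access to the Store:

import hashlib
from langgraph.prebuilt import InjectedStore
from langgraph.store.base import BaseStore

@tool
def save_note(
    content: str,
    store: Annotated[BaseStore, InjectedStore]
) -> str:
    """Save a note."""
    # Use a content digest as the key: built-in hash() of a str differs between processes
    key = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
    store.put(("notes",), f"note_{key}", {"content": content})
    return "Note saved."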

16.6.3 ToolRuntime: A Unified Tool Runtime

LangGraph 1.1.6 introduced ToolRuntime, which unifies InjectedState, InjectedStore, and the other injection points:

class ToolRuntime(_DirectlyInjectedToolArg, Generic[ContextT, StateT]):
    """Runtime context automatically injected into tools."""

    context: ContextT          # runtime context (shared with Runtime)
    store: BaseStore | None    # persistent store (shared with Runtime)
    stream_writer: StreamWriter  # stream writer (shared with Runtime)
    config: RunnableConfig     # tool-specific: current config
    state: StateT              # tool-specific: graph state
    tool_call_id: str          # tool-specific: tool call ID

graph TB
    subgraph "Runtime (node level)"
        R_ctx[context]
        R_store[store]
        R_sw[stream_writer]
        R_prev[previous]
        R_ei[execution_info]
        R_si[server_info]
    end

    subgraph "ToolRuntime (tool level)"
        TR_ctx[context]
        TR_store[store]
        TR_sw[stream_writer]
        TR_config[config]
        TR_state[state]
        TR_tcid[tool_call_id]
    end

    R_ctx -.->|shared| TR_ctx
    R_store -.->|shared| TR_store
    R_sw -.->|shared| TR_sw

An example tool that uses ToolRuntime:

from langchain_core.tools import tool
from langgraph.prebuilt import ToolRuntime

@tool
def smart_tool(query: str, runtime: ToolRuntime) -> str:
    """A tool that can reach all of the runtime information."""
    # Read the graph state
    history = runtime.state["messages"]

    # Read from the Store
    cached = runtime.store.get(("cache",), query) if runtime.store else None
    if cached:
        return cached.value["result"]

    # Read the run context
    user_id = runtime.context.user_id if runtime.context else "anon"

    # Stream a progress update
    runtime.stream_writer({"status": "processing", "user": user_id})

    result = f"Result for {query} by {user_id}"

    # Cache the result
    if runtime.store:
        runtime.store.put(("cache",), query, {"result": result})

    return result

16.6.3-bis Source check: ToolRuntime really has 8 fields, not 6

The ToolRuntime field list in §16.6.3 is missing two entries. Compare it with the actual dataclass definition at tool_node.py:1610:

@dataclass
class ToolRuntime(_DirectlyInjectedToolArg, Generic[ContextT, StateT]):
    state: StateT
    context: ContextT
    config: RunnableConfig
    stream_writer: StreamWriter
    tool_call_id: str | None
    store: BaseStore | None
    execution_info: ExecutionInfo | None = None   # ← omitted in §16.6.3
    server_info: ServerInfo | None = None         # ← omitted in §16.6.3

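execution_info and server_info were added later. They pass execution metadata (run ID, parent graph, step number) and LangGraph Platform server context (deployment ID, assistant ID) through to tools. Both fields are Optional and default to None: they are absent in local development and are filled in by the scheduler when running on LangGraph Platform.

ToolRuntime has three more engineering details that are easy to miss:

1. "No Annotated wrapper is needed" (docstring, line 1573). InjectedState and InjectedStore must be written as Annotated[dict, InjectedState], whereas ToolRuntime works with a plain runtime: ToolRuntime. The difference is that ToolRuntime inherits from _DirectlyInjectedToolArg, the framework's marker class for "recognizable from a bare type annotation". When _get_all_injected_args scans a tool signature and finds a parameter whose type is a subclass of _DirectlyInjectedToolArg, it treats that parameter as an injection point without requiring Annotated. This lightens the type-declaration burden on the user and gives a more concise API entry point.

2. A self-referential note in the docstring: "This is a marker class used for type checking and detection. The actual runtime object will be constructed during tool execution." It reminds the reader that the state: StateT you see when jumping to the ToolRuntime definition in an IDE does not hold a value "right now"; the real ToolRuntime instance is built by ToolNode each time the tool is called. Accessing these fields by type inside a tool is safe, but users should not instantiate ToolRuntime and pass it by hand.

3. _InjectedArgs is precomputed when ToolNode is initialized (defined at tool_node.py:564, initialized at :771 as self._injected_args: dict[str, _InjectedArgs] = {}). Each tool's signature is scanned once and stored in a dict; execution only looks it up and does not repeat the reflection. This is a typical "move reflection off the hot path into initialization" optimization, the same pattern as "channel subscriptions are fixed at compile time" from the Pregel discussion in Chapter 7.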
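
Points 1 and 3 can be combined into one small sketch: detect a directly annotated marker subclass, and do that scan once at construction time. All names here (DirectlyInjected, FakeRuntime, MiniToolNode) are hypothetical stand-ins for illustration; the real ToolNode tracks more kinds of injected arguments.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable, Optional, get_type_hints


class DirectlyInjected:
    """Hypothetical marker base, playing the role of _DirectlyInjectedToolArg."""


@dataclass
class FakeRuntime(DirectlyInjected):
    """Hypothetical, reduced stand-in for ToolRuntime."""
    state: dict
    tool_call_id: Optional[str] = None


def find_runtime_param(func: Callable) -> Optional[str]:
    """Return the name of the parameter annotated with a DirectlyInjected subclass."""
    hints = get_type_hints(func)
    for name in inspect.signature(func).parameters:
        hint = hints.get(name)
        if isinstance(hint, type) and issubclass(hint, DirectlyInjected):
            return name
    return None


class MiniToolNode:
    def __init__(self, tools: list):
        self.tools = {t.__name__: t for t in tools}
        # Signatures are scanned once here, not on every call.
        self.runtime_param = {name: find_runtime_param(t) for name, t in self.tools.items()}

    def run(self, call: dict, state: dict) -> Any:
        args = dict(call["args"])
        param = self.runtime_param[call["name"]]  # plain dict lookup on the hot path
        if param is not None:
            # The runtime object is built per call, never supplied by the caller.
            args[param] = FakeRuntime(state=state, tool_call_id=call["id"])
        return self.tools[call["name"]](**args)


def count_messages(prefix: str, runtime: FakeRuntime) -> str:
    return f"{prefix}: {len(runtime.state['messages'])} (call {runtime.tool_call_id})"


node = MiniToolNode([count_messages])
result = node.run({"id": "c1", "name": "count_messages", "args": {"prefix": "n"}},
                  {"messages": [1, 2, 3]})
```

The tool author writes a bare runtime: FakeRuntime annotation, and the node decides at init time which parameter to fill.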

16.7 ToolCallWithContext: v2's Internal Mechanism

16.7.1 Data Structure

In v2, each tool call is dispatched through the Send API to its own ToolNode task. ToolCallWithContext is the payload that Send carries:

class ToolCallWithContext(TypedDict):
    tool_call: ToolCall
    __type: Literal["tool_call_with_context"]
    state: Any

16.7.1-bis The two code paths for v1/v2 in should_continue

The real difference between v1 and v2 in create_react_agent lives in the else branch of should_continue (chat_agent_executor.py:830-859). The code is short enough to quote in full:

def should_continue(state):
    messages = _get_state_value(state, "messages")
    last_message = messages[-1]
    # If there is no function call, then we finish
    if not isinstance(last_message, AIMessage) or not last_message.tool_calls:
        if post_model_hook is not None:
            return "post_model_hook"
        elif response_format is not None:
            return "generate_structured_response"
        else:
            return END
    # Otherwise if there is, we continue
    else:
        if version == "v1":
            return "tools"                         # ← v1: route to a single tools node
        elif version == "v2":
            if post_model_hook is not None:
                return "post_model_hook"
            return [
                Send(
                    "tools",
                    ToolCallWithContext(
                        __type="tool_call_with_context",
                        tool_call=call,
                        state=state,
                    ),
                )
                for call in last_message.tool_calls
            ]                                       # ← v2: emit one Send per tool_call

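Revisiting the difference between v1 and v2:

v1 (returns the string "tools"): triggers ordinary conditional-edge routing. A single ToolNode task receives all tool_calls and parallelizes them internally. The checkpoint granularity is "agent step → tools step → agent step", and every tool call in the tools step is bundled into one super-step.

v2 (returns list[Send]): when Pregel encounters a list of Send objects, it schedules each Send as its own task. This is equivalent to spawning N parallel tasks for the "tools" node within the same super-step, and each task's payload is a ToolCallWithContext (a single tool_call plus the current state). The checkpoint granularity becomes "each tool_call is an independent task".

This design explains why an interrupt in v2 can "interrupt a single tool call": every Send task is an independent Pregel task, and Pregel's interrupt_before/interrupt_after fire per task, so each tool call can be intercepted separately. v1 cannot do this, because the whole tools step is one atomic task.

Another opinion that can be read from the code is that the default switch from v1 to v2 is a breaking one: the version parameter defaults to "v2". If you have a custom ToolNode or middleware written in the v0.x era that assumes "the tools node receives all calls at once", after upgrading to v1.0 you will find that it receives a ToolCallWithContext for a single tool_call. That is the real reason _parse_input (§16.3.7) special-cases __type == "tool_call_with_context": it is not historical baggage but a necessary consequence of v2 being the default.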

16.7.2 Dispatch Flow

sequenceDiagram
    participant Agent as agent node
    participant Route as conditional edge
    participant Send as Send API
    participant TN1 as ToolNode instance 1
    participant TN2 as ToolNode instance 2

    Agent->>Route: AIMessage(tool_calls=[A, B])
    Route->>Send: [Send("tools", {tool_call: A, state}),<br/>Send("tools", {tool_call: B, state})]
    Send->>TN1: ToolCallWithContext(tool_call=A)
    Send->>TN2: ToolCallWithContext(tool_call=B)
    TN1-->>Route: ToolMessage for A
    TN2-->>Route: ToolMessage for B

As a result, each tool call gets its own checkpoint, interrupt capability, and error isolation within the Send framework.
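
The fan-out itself is a small piece of logic. The sketch below mirrors the v1/v2 branch using a hypothetical Send dataclass and plain dict messages, so it runs without LangGraph; it illustrates the routing shape only and is not the library code.

```python
from dataclasses import dataclass
from typing import Any, Union

END = "__end__"  # assumption: stand-in for langgraph's END sentinel


@dataclass
class Send:
    """Hypothetical stand-in for langgraph's Send: a target node plus a payload."""
    node: str
    arg: Any


def route(state: dict, version: str = "v2") -> Union[str, list]:
    last = state["messages"][-1]
    calls = last.get("tool_calls") or []
    if not calls:
        return END
    if version == "v1":
        return "tools"  # one task receives every tool call
    # v2-style: one Send, and therefore one task, per tool call
    return [
        Send("tools", {"__type": "tool_call_with_context", "tool_call": c, "state": state})
        for c in calls
    ]


state = {"messages": [{"tool_calls": [{"id": "A", "name": "search"},
                                      {"id": "B", "name": "calc"}]}]}
v1_result = route(state, "v1")
v2_result = route(state, "v2")
```

In the v2 shape each payload carries exactly one tool_call together with the state, which is what lets a scheduler treat every call as a separately resumable unit.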

16.8 Design Decisions

16.8.1 Why does create_react_agent accept a str for model?

graph = create_react_agent("openai:gpt-4", tools)

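This is a convenience: string model identifiers are supported through langchain.chat_models.init_chat_model. During rapid prototyping, developers do not need to import and instantiate a concrete ChatModel class.

16.8.2 Why does ToolNode support handle_tool_errors?

In an agent loop, tool failures are the norm rather than the exception. The LLM may generate invalid tool arguments, and an external API may be temporarily unavailable. If every tool failure aborted the whole graph run, the user experience would be poor. The default behavior of handle_tool_errors is to turn the error into a ToolMessage, giving the LLM a chance to correct the mistake and retry.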
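
The "turn failures into messages" idea can be sketched in a few lines. This is an assumption-laden illustration: run_tool_safely is a hypothetical helper, messages are plain dicts, and the error wording is made up for the example (the library's actual template is the subject of §16.3.3-bis). The sketch also ignores the interrupts that, per §16.3.3-ter, the real ToolNode never swallows.

```python
from typing import Callable


def run_tool_safely(tool: Callable, call: dict, handle_errors: bool = True) -> dict:
    """Run one tool call; on failure return an error message dict instead of raising."""
    try:
        content = str(tool(**call["args"]))
        status = "success"
    except Exception as exc:
        if not handle_errors:
            raise  # opt out: let the failure abort the run
        # Hypothetical wording, chosen for this sketch only.
        content = f"Error: {exc!r}\n Please fix your mistakes."
        status = "error"
    return {"type": "tool", "tool_call_id": call["id"], "content": content, "status": status}


def divide(a: float, b: float) -> float:
    return a / b


ok = run_tool_safely(divide, {"id": "1", "args": {"a": 6, "b": 3}})
failed = run_tool_safely(divide, {"id": "2", "args": {"a": 1, "b": 0}})
```

The failed call still produces a message tied to its tool_call_id, so the conversation stays well-formed and the model can retry with different arguments.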

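16.8.3 Why does v2 use Send instead of internal parallelism?

In v1, ToolNode runs tool calls in parallel internally on a thread pool. v2 switches to the Send API for these reasons:

Checkpoint granularity: each Send task has its own checkpoint, so interrupt and resume are more precise
Human-in-the-loop: interrupt_before can be set for an individual tool call, enabling fine-grained approval
Architectural consistency: Pregel's task scheduling is reused instead of introducing a separate parallelism mechanism inside ToolNode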

16.8.4 InjectedState vs ToolRuntime

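InjectedState and InjectedStore are the earlier injection mechanism and rely on Annotated type markers. ToolRuntime is the newer unified approach that merges all injection points into one object. ToolRuntime is recommended for new code:

# Old style (still supported)
@tool
def old_tool(x: int, state: Annotated[dict, InjectedState]) -> str:
    """Illustrative tool that reads injected state."""
    ...

# New style (recommended)
@tool
def new_tool(x: int, runtime: ToolRuntime) -> str:
    """Illustrative tool that reads the unified runtime."""
    state = runtime.state  # same capability
    store = runtime.store
    ...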

16.9 Relationships Between Components

graph TB
    CRA["create_react_agent"] -->|creates| SG[StateGraph]
    CRA -->|configures| TN[ToolNode]
    CRA -->|uses| TC[tools_condition]

    SG -->|compiles to| CSG[CompiledStateGraph]

    TN -->|uses| TCR[ToolCallRequest]
    TN -->|uses| TCW[ToolCallWrapper]
    TN -->|injects| IS[InjectedState]
    TN -->|injects| ISt[InjectedStore]
    TN -->|injects| TR[ToolRuntime]

    TC -->|routes to| TN
    TC -->|routes to| END_[END]

    CSG -->|executes on| Pregel[Pregel scheduler]
    Pregel -->|Send API| TN

16.7.3 Concurrent execution in v1: get_executor_for_config

We have seen how v2 dispatches with Send. So how does a single ToolNode in v1 run several tool_calls concurrently? tool_node.py:817 gives the answer:

with get_executor_for_config(config) as executor:
    outputs = list(
        executor.map(self._run_one, tool_calls, input_types, tool_runtimes)
    )

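A single executor.map runs all tool_calls in parallel. get_executor_for_config comes from langchain_core.runnables.config; it returns a ThreadPoolExecutor sized by the max_concurrency entry of the config (default None, meaning no explicit limit), and the executor is shut down automatically when the with block exits.

Two properties are "hidden" by the default behavior:

1. By default, no explicit thread-pool size is set. If a single LLM turn returns 100 tool_calls, v1's ToolNode submits all 100 to the pool at once. With max_concurrency left as None, the only cap is ThreadPoolExecutor's own default worker count (min(32, CPU count + 4) on current CPython), not a limit chosen by LangGraph. For purely I/O-bound tools (HTTP, databases) this is rarely a problem; for CPU-bound tools (local inference, file processing) the threads contend for the GIL and can end up slower than sequential execution. To control concurrency, set max_concurrency explicitly in the config, for example {"max_concurrency": 5}. The v2 Send path is also subject to Pregel's concurrency limits, but the configuration entry point differs.

2. The async path uses asyncio.gather (_afunc, starting at tool_node.py:824). It runs on a single event loop rather than a thread pool, which suits I/O-bound work. Most LangChain tools are async-friendly, so new projects are advised to run the whole graph async.

This ties in with the Chapter 7 discussion of Pregel's concurrency model ("the task scheduler slices by max_concurrency"): concurrency limits at the graph-engine level are driven by config, and ToolNode simply reuses the executor derived from that config instead of inventing its own concurrency primitive. This kind of "don't reinvent the wheel" restraint is key to maintaining a framework over the long term.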
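
The executor.map pattern is plain standard-library Python, so its two key behaviors (bounded workers, results in input order) can be demonstrated in isolation. In this sketch run_one and run_all are hypothetical names and the tool is a short sleep standing in for I/O; it does not use get_executor_for_config.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def run_one(call: dict) -> str:
    """Hypothetical I/O-bound tool: sleeps briefly, then echoes its call id."""
    time.sleep(0.01)
    return f"done:{call['id']}"


def run_all(calls: list, max_concurrency: Optional[int] = None) -> list:
    # max_workers=None lets ThreadPoolExecutor pick its own default pool size.
    with ThreadPoolExecutor(max_workers=max_concurrency) as executor:
        # map() yields results in input order, regardless of completion order.
        return list(executor.map(run_one, calls))


calls = [{"id": str(i)} for i in range(8)]
outputs = run_all(calls, max_concurrency=4)
```

Because map() preserves input order, the resulting messages line up with the original tool_calls even when the calls finish out of order.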

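16.9-bis Where this chapter sits in the book

Back to the overall arc of the book: the earlier chapters laid out LangGraph's underlying skeleton, and this chapter unfolds the surface API that wraps that skeleton into a ready-to-use agent. A few connections are worth spelling out:

Builds on Chapter 2 (StateGraph): inside create_react_agent, the sequence builder = StateGraph(...); builder.add_node("agent", call_model); builder.add_node("tools", ToolNode(tools)); builder.add_conditional_edges(...) is an enlarged version of the Chapter 2 sample code. Once you understand StateGraph's three staples (add_node, add_edge, compile), create_react_agent reads as a preset that "users could write themselves, but would have to write many times".
Builds on Chapter 7 (Pregel scheduling): "v2 dispatches each tool_call through the Send API" (§16.2.4) depends directly on the Pregel super-step model from Chapter 7, where Send is the only way to "create an extra task" within a super-step. All the checkpoint, interrupt, and error-isolation properties of parallel tool execution come from Pregel scheduling Send tasks like ordinary tasks, with no special-casing.
Builds on Chapter 13 (Checkpoint): the design in §16.3.3-ter where GraphBubbleUp always propagates. Why must an interrupt propagate? Because when Pregel hits an interrupt it has to write the partially completed state of the current super-step into the checkpoint, then resume from there once the user sends a Command. If ToolNode swallowed the interrupt, the checkpoint would never be written and the "waiting for a human reply" state could not be saved. Checkpoint usability depends on interrupts being able to propagate.
Builds on Chapter 15 (Store): InjectedStore is the tool-level consumer interface for Store, and the "strip and re-add" safety layer in _inject_tool_args implements the constraint that "the Store's key space must not be touched by the LLM".
Sets up Chapter 17 (multi-agent): Command(goto=agent_name), covered at the end of this chapter, is the communication primitive of the supervisor/swarm architectures in Chapter 17. After a tool such as transfer_to_agent returns a Command, Pregel jumps to the node named by goto; passing control between agents is essentially "a tool returns a Command". The next chapter shows how this capability composes into full multi-agent orchestration.

This pattern of "every chapter connecting once to a lower or a higher layer" is no accident. By moving create_react_agent to the langchain layer in v1.0 (§16.1.1), LangGraph's designers reinforced exactly this layering: LangGraph owns the "stateful graph execution engine" and LangChain owns "application shapes and prompt patterns". This boundary is key to understanding the whole LangChain/LangGraph ecosystem. Readers who previously confused "when to use langgraph and when to use langchain" should now be able to give themselves a clear answer.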

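16.9-ter Which prebuilt objects this chapter covered

To help you check whether this chapter has walked the full export surface of langgraph.prebuilt: open libs/prebuilt/langgraph/prebuilt/__init__.py, and the actual __all__ contains only 7 names:

Name	Section	Status
`create_react_agent`	§16.2	deprecated (migrate to langchain.agents.create_agent)
`ToolNode`	§16.3	kept, stable low-level primitive
`tools_condition`	§16.4	kept
`ValidationNode`	§16.5	deprecated (use create_agent + custom handle_tool_errors)
`InjectedState`	§16.6.1	kept
`InjectedStore`	§16.6.2	kept
`ToolRuntime`	§16.6.3	kept, recommended new style

Each of these 7 objects has a dedicated passage in this chapter, with none left out, so finishing this chapter means you have covered the entire public surface of langgraph.prebuilt. The module has been kept deliberately small since v1.0: application-level abstractions live in langchain, and graph-execution primitives stay in langgraph. This boundary was stressed in §16.1.1 and §16.9-bis and is restated here a third time because it is the line beginners most often get wrong.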

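16.10 Summary

This chapter analyzed the design and implementation of LangGraph's prebuilt agent component layer. create_react_agent is a one-stop factory function that wraps StateGraph construction, ToolNode configuration, and conditional routing into a single call, while its parameters (prompt, response_format, pre/post_model_hook, and so on) keep it fully configurable.

ToolNode is the core executor of this component set. It handles the complexity of tool execution: argument injection, parallel execution, error handling, and Command propagation. v2 uses the Send API for finer-grained dispatch, giving each tool call its own checkpoint and interrupt capability.

ToolRuntime unifies tool-level dependency injection by packaging state, store, context, config, stream_writer, and tool_call_id (plus the optional execution_info and server_info from §16.6.3-bis) into one type-safe object. Because InjectedState and InjectedStore remain supported for backward compatibility, developers can choose whichever injection style they prefer.

These prebuilt components reflect LangGraph's layered design philosophy: the bottom layer provides flexible primitives (StateGraph, Channel, Send), the top layer provides ready-to-use solutions (create_react_agent, ToolNode), and the middle layer connects the two through standardized interfaces (BaseStore, Runtime, StreamWriter). Developers can start quickly with the top-layer components, or customize deeply once they understand the underlying principles.

The next chapter explores multi-agent patterns: how to use LangGraph to build Supervisor, Swarm, hierarchical, and collaborative agent architectures.

Source locations for this chapter: all references come from the libs/prebuilt/langgraph/prebuilt/ directory of the langgraph-latest repository: tool_node.py (1892 lines; ToolNode/InjectedState/InjectedStore/ToolRuntime/tools_condition), chat_agent_executor.py (1015 lines; create_react_agent and the AgentState family), tool_validator.py (221 lines; ValidationNode), interrupt.py (105 lines; human-in-the-loop confirmation helpers), and __init__.py (21 lines; public exports). Version-wise, this chapter describes the source after the @deprecated(LangGraphDeprecatedSinceV10) annotations took effect; for migration guidance, the comparison table in §16.1.1 is authoritative.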
