Taming LLMs: How to Get Large Models to Output JSON Reliably

Updated on Feb 09, 2025

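As AI application developers, we wrestle with one core tension every day: large language models (LLMs) are probabilistic, while software engineering is deterministic.

An LLM is, by nature, a text completion engine. Its training data is Shakespeare's poetry, GitHub code, and Wikipedia, and its instinct is to produce fluent natural language.

But our backend APIs need JSON.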

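You have almost certainly seen this: you painstakingly design a prompt that asks for JSON only. When the model is in a good mood it returns standard JSON; when it is not, it opens with "Sure! Here is the JSON you requested: { ... }". Worse still, the JSON it generates is missing a bracket, your JSON.parse() throws, and the whole program crashes.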
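Both failure modes are easy to reproduce without calling any model. The sketch below wraps JSON.parse so that bad output becomes a value the caller can inspect instead of an exception; the names ParseResult and tryParseJson are hypothetical helpers introduced here for illustration.

```typescript
// Either the parsed value or a description of what went wrong.
type ParseResult =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

// Wrap JSON.parse so malformed model output is reported, not thrown.
function tryParseJson(raw: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// A chatty preamble, a missing closing brace, and a clean response.
const chatty = tryParseJson('Sure! Here is the JSON you requested: {"name": "Alice"}');
const truncated = tryParseJson('{"name": "Alice"');
const clean = tryParseJson('{"name": "Alice"}');

console.log(chatty.ok, truncated.ok, clean.ok); // false false true
```

Catching the error only stops the crash. It does nothing to make the model produce valid output, which is what the three stages below address.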

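Today we take a close look at three stages of getting an LLM to output structured data reliably: JSON Mode at the entry level, Tool Calling at the intermediate level, and finally Zod Self-Correction.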

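Stage 1: Prompt Engineering & JSON Mode

Before OpenAI shipped official support, all we could do was plead with the model through the prompt.

1. Basic prompt techniques

Give examples (Few-Shot): show the input and the expected output explicitly.
Hard constraints: stress in the System Prompt: "Do not output any explanation. Output valid JSON only."
Pre-filling: a signature trick with Anthropic's Claude. Pre-fill the start of the Assistant message with a { so that the model continues with the rest of the JSON.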
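The three techniques can be combined in a single message list. The following is a minimal, provider-neutral sketch: the Message type, buildMessages, and joinPrefill are hypothetical names, and the few-shot pair about "Bob" is made up for illustration. With Anthropic's Messages API the system prompt is passed as a separate system parameter rather than as a message, so the list would need adapting.

```typescript
type Role = "system" | "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// The text we put in the assistant's mouth to steer it toward JSON.
const PREFILL = "{";

// Strict system instruction + one few-shot pair + a prefilled assistant turn.
function buildMessages(input: string): Message[] {
  return [
    { role: "system", content: "Do not output any explanation. Output valid JSON only." },
    // Few-shot: one worked input/output pair (illustrative data).
    { role: "user", content: "My name is Bob and I am 25." },
    { role: "assistant", content: '{"name": "Bob", "age": 25}' },
    // The real request.
    { role: "user", content: input },
    // Pre-fill: the model continues from here.
    { role: "assistant", content: PREFILL },
  ];
}

// The model's reply continues after the prefill, so it lacks the leading "{".
// Re-attach the prefill before parsing.
function joinPrefill(continuation: string): unknown {
  return JSON.parse(PREFILL + continuation);
}

const messages = buildMessages("My name is Alice and I am 30.");
console.log(messages.length); // 5
```

The detail that is easy to miss is in joinPrefill: the response contains only the continuation, so parsing it without first re-attaching the prefilled { fails.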

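2. OpenAI JSON Mode

OpenAI introduced the response_format: { type: "json_object" } parameter. This largely solves the "mismatched brackets" problem: the model is constrained so that the tokens it generates form syntactically valid JSON (the output can still be cut off if generation stops at the token limit).

⚠️ Key pitfall: even with JSON Mode on, the word "JSON" must appear explicitly in your messages, typically in the System Prompt. Otherwise the API returns a 400 Bad Request error.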

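import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "system",
      content: "You are a helpful assistant designed to output JSON." // the word "JSON" must appear
    },
    { role: "user", content: "Who won the world series in 2020?" }
  ],
  response_format: { type: "json_object" },
});

console.log(completion.choices[0].message.content);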

Limitation: it guarantees valid syntax only, not a valid schema. It may return { "winner": "Dodgers" }, or it may return { "team": "Dodgers", "year": 2020 }. You still have to do the type checking yourself.
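What "doing the type checking yourself" looks like can be sketched with a hand-written type guard. The SeriesAnswer shape and the isSeriesAnswer function are assumptions made for this example; the article does not prescribe a target schema.

```typescript
// The shape we hope the model returns (hypothetical target schema).
interface SeriesAnswer {
  winner: string;
  year: number;
}

// Type guard: narrows unknown to SeriesAnswer only if both fields
// exist with the right primitive types.
function isSeriesAnswer(value: unknown): value is SeriesAnswer {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.winner === "string" && typeof record.year === "number";
}

// Both strings are valid JSON, but only the first matches the schema.
const expected: unknown = JSON.parse('{ "winner": "Dodgers", "year": 2020 }');
const drifted: unknown = JSON.parse('{ "team": "Dodgers", "year": 2020 }');

console.log(isSeriesAnswer(expected), isSeriesAnswer(drifted)); // true false
```

Guards like this get tedious as schemas grow, which is part of the motivation for the schema libraries discussed in Stage 3.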

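Stage 2: Function Calling (Tool Use)

This is currently one of the most robust approaches used in production: it generates JSON through the model's fine-tuned Tool Calling ability.

The idea: instead of telling the model "please generate JSON", we declare through the API "I have a function extract_info(name: string, age: number); please call it for me".

To call that function, the model is pushed by its fine-tuning objective to generate JSON arguments that conform to the Schema. This is a strong learned tendency rather than a hard guarantee, so validating the arguments before use is still prudent.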

const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "My name is Alice and I am 30." }],
  tools: [{
    type: "function",
    function: {
      name: "extract_info",
      parameters: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "number" }
        },
        required: ["name", "age"]
      }
    }
  }],
  tool_choice: { type: "function", function: { name: "extract_info" } } // force this tool to be called
});

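// tool_calls can be absent, so guard before indexing into it
const toolCall = completion.choices[0].message.tool_calls?.[0];
if (!toolCall) throw new Error("Model did not return a tool call");
const json = JSON.parse(toolCall.function.arguments);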

This approach is considerably more stable than JSON Mode, because it constrains field names and types as well as syntax.
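Tool-call arguments arrive as a JSON string, so a lightweight check against the declared parameters can catch the occasional mismatch before the data reaches business code. The sketch below handles only flat objects with primitive fields, like the extract_info parameters above; ObjectSchema and checkArguments are hypothetical names, and this is not a full JSON Schema validator.

```typescript
type PrimitiveType = "string" | "number" | "boolean";

// A deliberately tiny subset of JSON Schema: flat object, primitive fields.
interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: PrimitiveType }>;
  required: string[];
}

// Same parameters as the extract_info tool definition.
const extractInfoParameters: ObjectSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    age: { type: "number" },
  },
  required: ["name", "age"],
};

// Returns a list of problems; an empty list means the arguments pass.
function checkArguments(schema: ObjectSchema, rawArguments: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawArguments);
  } catch {
    return ["arguments are not valid JSON"];
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return ["arguments must be a JSON object"];
  }
  const args = parsed as Record<string, unknown>;
  const problems: string[] = [];
  // Every required field must be present.
  for (const key of schema.required) {
    if (!(key in args)) problems.push(`missing required field "${key}"`);
  }
  // Every declared field that is present must have the declared type.
  for (const [key, spec] of Object.entries(schema.properties)) {
    if (key in args && typeof args[key] !== spec.type) {
      problems.push(`field "${key}" should be ${spec.type}, got ${typeof args[key]}`);
    }
  }
  return problems;
}

console.log(checkArguments(extractInfoParameters, '{"name": "Alice", "age": "30"}'));
```

Returning a list of problems instead of throwing keeps the messages available for reuse, for example as feedback to the model.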

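Stage 3: Auto-Correction with Instructor & Zod (Self-Correction)

If you write TypeScript, you surely know Zod. If you write Python, you have seen Pydantic. Can these schema validation libraries be combined with an LLM?

That is exactly what the Instructor library (and implementations built on the same idea) does. Its core logic is Validation + Retry.

We can build a loop like this: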

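The LLM generates JSON.
The code parses it with Zod (schema.safeParse()).
Branch A: if parsing succeeds, return the data directly. Happy path!
Branch B: if parsing fails (for example, Zod reports Expected number, received string), we capture the error.
Feedback loop: we construct a new User Message containing the invalid JSON the LLM just produced plus the specific error message reported by Zod, and feed it back to the LLM. For example:

"The JSON you just generated has errors. Field 'age' should be a number, but you gave a string. Please fix it."
The LLM receives the feedback, performs Self-Correction, and regenerates.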
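The feedback step is where most of the leverage sits, so it is worth seeing how such a message might be assembled. In this sketch the Issue interface mirrors the path and message fields that Zod attaches to each validation issue; buildFeedbackMessage and the exact wording of the message are assumptions for illustration.

```typescript
// Minimal stand-in for a validation issue: where it occurred and why.
interface Issue {
  path: (string | number)[];
  message: string;
}

// Build the user message that is fed back to the model: the rejected
// output plus one line per problem, so the model knows what to change.
function buildFeedbackMessage(badJson: string, issues: Issue[]): string {
  const lines = issues.map((issue) => {
    const field = issue.path.length > 0 ? issue.path.join(".") : "(root)";
    return `- ${field}: ${issue.message}`;
  });
  return [
    "The JSON you just generated has errors.",
    "Your output:",
    badJson,
    "Problems:",
    ...lines,
    "Please return the corrected JSON only.",
  ].join("\n");
}

const feedback = buildFeedbackMessage('{"name": "Alice", "age": "30"}', [
  { path: ["age"], message: "Expected number, received string" },
]);
console.log(feedback);
```

Listing each problem with its field path gives the model something specific to fix, which tends to work better than a bare "invalid output, try again".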

Implementation sketch

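import { z } from "zod";
import { OpenAI } from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const UserSchema = z.object({
  name: z.string(),
  skills: z.array(z.string()).max(3, "Max 3 skills only") // LLMs often ignore count limits
});

async function generateStructuredData(prompt: string, retries = 3) {
  const history: OpenAI.Chat.ChatCompletionMessageParam[] = [
    // JSON Mode requires the word "JSON" to appear in the messages
    { role: "system", content: "Reply with a single JSON object only." },
    { role: "user", content: prompt }
  ];

  while (retries > 0) {
    const response = await openai.chat.completions.create({
      model: "gpt-4o", // assumed settings; substitute your own
      messages: history,
      response_format: { type: "json_object" }
    });
    const content = response.choices[0].message.content ?? "";

    // Collect either a syntax error or a schema error as feedback text
    let feedback: string;
    try {
      const parsed = UserSchema.safeParse(JSON.parse(content));
      if (parsed.success) {
        return parsed.data; // Success!
      }
      feedback = parsed.error.message;
    } catch (err) {
      feedback = `Invalid JSON: ${(err as Error).message}`;
    }

    // Validation failed: feed the error back to the model
    console.warn(`Attempt failed: ${feedback}. Retrying...`);

    history.push({ role: "assistant", content });
    history.push({
      role: "user",
      content: `Validation Error: ${feedback}. Please fix the JSON.`
    });

    retries--;
  }

  // Every attempt failed: fail loudly instead of returning undefined
  throw new Error("No valid output after all retries");
}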

This pattern is very powerful. It lets us express complex business rules in the Schema (for example .min(), .regex(), .refine()) and rely on the LLM's reasoning ability to satisfy them.
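To watch the loop enforce a business rule without spending API credits, the model can be replaced by a stub. The sketch below is self-contained and uses a hand-written validator in place of Zod; generateWithRetry, validateUser, and fakeModel are hypothetical names, and the stub's behavior (too many skills first, a compliant answer after feedback) is scripted purely for demonstration.

```typescript
type Validation<T> = { ok: true; data: T } | { ok: false; error: string };
type Validator<T> = (value: unknown) => Validation<T>;
// A "model" is anything that maps the conversation so far to a reply.
type Generate = (history: string[]) => Promise<string>;

interface User {
  name: string;
  skills: string[];
}

// Hand-written stand-in for a schema with a business rule: at most 3 skills.
const validateUser: Validator<User> = (value) => {
  if (typeof value !== "object" || value === null) return { ok: false, error: "Expected object" };
  const { name, skills } = value as Record<string, unknown>;
  if (typeof name !== "string") return { ok: false, error: "name: Expected string" };
  if (!Array.isArray(skills) || !skills.every((s) => typeof s === "string")) {
    return { ok: false, error: "skills: Expected array of strings" };
  }
  if (skills.length > 3) return { ok: false, error: "skills: Max 3 skills only" };
  return { ok: true, data: { name, skills: skills as string[] } };
};

// Validation + Retry: on failure, append the bad output and the error, then ask again.
async function generateWithRetry<T>(
  prompt: string,
  generate: Generate,
  validate: Validator<T>,
  retries = 3
): Promise<T> {
  const history = [prompt];
  for (let attempt = 0; attempt < retries; attempt++) {
    const output = await generate(history);
    let error: string;
    try {
      const result = validate(JSON.parse(output));
      if (result.ok) return result.data;
      error = result.error;
    } catch {
      error = "Output was not valid JSON";
    }
    history.push(output, `Validation Error: ${error}. Please fix the JSON.`);
  }
  throw new Error(`No valid output after ${retries} attempts`);
}

// Scripted stub: over-generates on the first call, complies once feedback is present.
const fakeModel: Generate = async (history) =>
  history.length === 1
    ? '{"name": "Alice", "skills": ["ts", "go", "rust", "sql"]}'
    : '{"name": "Alice", "skills": ["ts", "go", "rust"]}';

generateWithRetry("Describe Alice as JSON.", fakeModel, validateUser).then((user) =>
  console.log(user.skills.length)
);
```

Passing the generator in as a parameter also makes the retry logic unit-testable, independent of whichever LLM client is used in production.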

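Summary

At its core an LLM is a text completer and has no built-in notion of structure. Structured output is the bridge between AI (the probabilistic world) and traditional software (the deterministic world).

Simple scenarios: JSON Mode + Prompt.
Intermediate scenarios: Function Calling.
Complex scenarios (strict validation required): Instructor / Zod Loop.

Only once you have mastered this technique can you truly embed AI into automated business workflows, rather than just building a chatbot.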