Taming LLMs: A Guide to Reliable Structured JSON Outputs

As AI application developers, we fight a central conflict every day: Large Language Models (LLMs) are probabilistic, while software engineering is deterministic.

LLMs are natively text completion engines. Their training data consists of Shakespeare, GitHub code, and Wikipedia. Their instinct is to generate fluid, conversational language.

But our backends and APIs need JSON.

You’ve likely experienced it: you design a complex prompt asking for JSON, and while it works 90% of the time, the model occasionally returns: "Sure! Here is the JSON you requested: { ... }" Or worse, it misses a closing brace, causing your JSON.parse() to throw an exception and crash your application.
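The classic band-aid for the chatty-preamble case is to scrape the brace-delimited span out of the reply before parsing. A minimal sketch (the function name extractJsonObject is illustrative, and note the limitation: it can strip a "Sure! Here is the JSON:" prefix, but it cannot rescue a reply that is missing its closing brace):

```typescript
// Salvage a JSON object from a reply that wraps it in prose,
// e.g. 'Sure! Here is the JSON you requested: { ... }'.
function extractJsonObject(text: string): unknown {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }
  // JSON.parse still throws if the span itself is malformed
  // (e.g. a missing closing brace) -- the failure mode the
  // techniques below are designed to eliminate.
  return JSON.parse(text.slice(start, end + 1));
}
```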

Today, we dive into the three stages of stable structured output: from the basic JSON Mode to advanced Tool Calling, and finally, the ultimate Zod Self-Correction.

Phase 1: Prompt Engineering & JSON Mode

Before official API support, we had to rely on begging the model via prompts.

1. Basic Prompting Heuristics

  • Few-Shot Examples: Explicitly show an Input/Output pair.
  • Rigid Constraints: Use a system prompt emphasizing “Do not output any explanation. Output valid JSON only.”
  • Pre-filling (Claude Style): Start the assistant’s message with a {. This baits the model into completing the rest of the JSON object.
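The pre-filling trick can be sketched in a few lines. This assumes an Anthropic-style API where the final assistant message seeds the model's reply; buildPrefilledRequest and parsePrefilledResponse are hypothetical helpers, and the API client call itself is omitted:

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Seed the assistant turn with "{" so the model continues the
// JSON object instead of opening with conversational filler.
const PREFILL = "{";

function buildPrefilledRequest(userPrompt: string): Message[] {
  return [
    { role: "user", content: userPrompt },
    // The trailing assistant message is treated as the start of the reply.
    { role: "assistant", content: PREFILL },
  ];
}

// The API returns only the continuation, so re-attach the prefill
// before parsing.
function parsePrefilledResponse(continuation: string): unknown {
  return JSON.parse(PREFILL + continuation);
}
```

For example, if the model continues with `"name": "Alice"}`, parsePrefilledResponse reassembles and parses the full object `{"name": "Alice"}`.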

2. Native JSON Mode

OpenAI and other providers now offer a response_format: { type: "json_object" } parameter. This virtually eliminates “invalid syntax” errors: decoding is constrained so that the output always parses as valid JSON.

⚠️ A Critical Pitfall: Even with JSON Mode enabled, you must include the word “JSON” in your system prompt. Otherwise, the API will throw a 400 Bad Request.

import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { 
      role: "system", 
      content: "You are a helpful assistant designed to output JSON." // Must mention "JSON"
    },
    { role: "user", content: "Who won the world series in 2020?" }
  ],
  response_format: { type: "json_object" },
});

Limitation: It guarantees valid syntax, but not a valid schema. It might return { "winner": "Dodgers" } today and { "team": "Dodgers", "year": 2020 } tomorrow. You still need manual type checking.
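In practice, “manual type checking” means guarding the parsed result yourself before touching its fields. A minimal sketch (WinnerResult is an illustrative shape, not part of any API):

```typescript
interface WinnerResult {
  winner: string;
}

// JSON Mode guarantees syntax, not shape: verify the fields yourself
// before trusting the parsed object.
function isWinnerResult(value: unknown): value is WinnerResult {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as Record<string, unknown>).winner === "string"
  );
}
```

The drifted `{ "team": "Dodgers", "year": 2020 }` response fails this guard, so the failure surfaces at the boundary instead of deep inside your business logic.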

Phase 2: Tool Calling (Function Calling)

This is currently the most robust industry standard. It leverages the model’s fine-tuned Tool Calling capability to generate JSON.

Instead of saying “Give me JSON,” we define a tool, e.g. extract_info(name: string, age: number).

To “call” this function, the model is compelled—by its training-time fine-tuning objectives—to generate JSON arguments that strictly follow your defined schema.

const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "My name is Alice and I am 30." }],
  tools: [{
    type: "function",
    function: {
      name: "extract_info",
      parameters: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "number" }
        },
        required: ["name", "age"]
      }
    }
  }],
  tool_choice: { type: "function", function: { name: "extract_info" } } // Force the call
});

const json = JSON.parse(completion.choices[0].message.tool_calls[0].function.arguments);

This is significantly more stable than JSON Mode because it constrains field names and data types at the model level.

Phase 3: Auto-Correction with Instructor & Zod

If you use TypeScript, you live by Zod. If you’re a Pythonista, you use Pydantic. Can we combine these validation libraries with LLMs?

The Instructor library (and the philosophy behind it) is built for this. Its core logic is: Validation + Retry.

We can build a loop like this:

  1. LLM generates JSON.
  2. Code parses it with Zod (schema.safeParse()).
  3. Path A: If valid, return the data. The “Happy Path.”
  4. Path B: If invalid (e.g., Zod reports Expected number, received string), we capture the error.
  5. Feedback Loop: Construct a new User Message containing the invalid JSON and the specific Zod error.

    “The JSON you generated is invalid. Field ‘age’ should be a number, but you provided a string. Please fix it.”

  6. LLM receives the feedback, performs a “Self-Correction,” and regenerates the content.

Implementation Concept (Pseudo-code)

import { z } from "zod";
const UserSchema = z.object({
  name: z.string(),
  skills: z.array(z.string()).max(3, "Max 3 skills allowed") // LLMs often ignore count limits
});

async function generateStructuredData(prompt: string, retries = 3) {
  const history: { role: "user" | "assistant"; content: string }[] = [
    { role: "user", content: prompt }
  ];

  while (retries > 0) {
    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: history,
      response_format: { type: "json_object" },
    });
    const content = response.choices[0].message.content ?? "";

    // Guard against syntax errors too, and feed them back like schema errors
    let json: unknown;
    try {
      json = JSON.parse(content);
    } catch {
      history.push({ role: "assistant", content });
      history.push({ role: "user", content: "Invalid JSON syntax. Output valid JSON only." });
      retries--;
      continue;
    }

    const parsed = UserSchema.safeParse(json);
    if (parsed.success) return parsed.data;

    // Capture the Zod error and send it back to the LLM for correction
    history.push({ role: "assistant", content });
    history.push({ 
      role: "user", 
      content: `Validation Error: ${parsed.error.message}. Please fix the JSON.` 
    });
    retries--;
  }

  throw new Error("Failed to produce valid structured data after all retries");
}

This pattern is incredibly powerful. It allows us to define complex business rules in the schema and leverage LLM reasoning to satisfy those constraints.

Summary

The essence of an LLM is text completion; it doesn’t naturally understand structure. Structured data is the bridge between the probabilistic world of AI and the deterministic world of traditional software.

  • Simple Scenarios: Use JSON Mode + Prompting.
  • Medium Scenarios: Use Tool Calling.
  • Complex Scenarios: Use the Instructor / Zod Loop for strict validation.

Mastering this bridge is what allows you to embed AI into automated business processes rather than just building simple chat interfaces.