Structured Output Failures: Why JSON Mode Returns Broken Data

You switched to JSON mode specifically to stop wrestling with unparseable LLM responses. Yet here you are, staring at a json.JSONDecodeError or, worse, a response that parsed fine but contains data you can't trust. JSON mode is not a silver bullet — it has a surprisingly specific set of failure modes that catch most developers off guard.

This article breaks down why structured output still goes wrong, how to detect each failure type, and what you can do in your code to handle them reliably.

What you'll learn

The difference between syntactic validity and semantic correctness in LLM outputs
The most common ways JSON mode fails even when the model is "following instructions"
How to write prompts and schemas that reduce failure rates
Practical validation patterns you can add to your pipeline today
When to retry, when to fall back, and when to escalate to a human

Prerequisites

This article assumes you're calling an LLM API (OpenAI, Anthropic, or similar) and have already enabled JSON mode or are passing a JSON schema. Basic familiarity with Python and JSON is assumed. Code examples use Python, but the concepts apply to any language.

What JSON Mode Actually Guarantees

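It's worth being precise about what JSON mode promises. Most providers guarantee only that a complete response will be syntactically valid JSON, meaning it can be parsed without raising a decode error. That's it. Even that guarantee has a catch: a response cut off at the token limit can still be invalid, as Failure Type 1 shows.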

JSON mode does not guarantee that the response matches your expected schema. It does not guarantee that field values are the correct type, that required keys are present, or that the data makes logical sense for your use case. Those guarantees are your responsibility.

A response can be perfectly valid JSON and completely useless to your application at the same time.
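To make that gap concrete, here is a minimal sketch. The payload and the key names (name, age) are made up for illustration; the only point is that a successful json.loads call says nothing about the shape of the result.

```python
import json

# Hypothetical model output: valid JSON, but not the shape the app expects.
response_text = '{"person": "Alice", "age": "thirty"}'

data = json.loads(response_text)  # parses without raising

# Suppose the application expected a "name" string and an integer "age".
has_name = "name" in data
age_is_int = isinstance(data.get("age"), int)

print(has_name, age_is_int)
```

Both checks come back false even though parsing succeeded, which is exactly the case the decode step cannot catch.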

OpenAI's newer "Structured Outputs" feature (distinct from plain JSON mode) does enforce a schema more strictly, but it supports only a subset of JSON Schema, and it still has edge cases around optional fields, string enumerations, and deeply nested objects.

Failure Type 1: Truncated Responses

Truncation is one of the most common failure modes, and it's sneaky: the truncated JSON is invalid even though the model was following your instructions right up to the point where it ran out of tokens.

You'll see errors like:

import json

response_text = '{"name": "Alice", "scores": [10, 20, 30'
json.loads(response_text)  # Raises json.JSONDecodeError

The fix has two parts. First, always set max_tokens high enough to accommodate your expected output, with headroom. If your schema produces responses averaging 400 tokens, set max_tokens to at least 800. Second, check the finish reason before you try to parse:

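import json

def parse_structured_response(api_response):
    # Assumes an OpenAI-style Chat Completions response object; other
    # providers name this field differently (Anthropic uses stop_reason).
    choice = api_response.choices[0]

    # Anything other than "stop" (for example "length") means the output
    # may be incomplete, so refuse to parse it.
    if choice.finish_reason != "stop":
        raise ValueError(
            f"Response did not finish cleanly. finish_reason={choice.finish_reason}"
        )

    return json.loads(choice.message.content)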

If finish_reason is "length", the model hit the token limit mid-output. Retry with a higher max_tokens value, not with the same settings.
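The retry advice above can be sketched as a small loop. This is one possible approach, not a definitive implementation: call_model is a hypothetical callable that takes a max_tokens value and returns an OpenAI-style response object, and doubling the budget on each attempt is an arbitrary choice.

```python
import json
from typing import Any, Callable

def parse_with_retry(
    call_model: Callable[[int], Any],
    max_tokens: int = 800,
    max_attempts: int = 3,
) -> dict:
    """Call the model, doubling max_tokens whenever the output was truncated.

    call_model is a hypothetical callable: it takes a max_tokens value and
    returns a response with .choices[0].finish_reason and .message.content.
    """
    for _ in range(max_attempts):
        response = call_model(max_tokens)
        choice = response.choices[0]

        if choice.finish_reason == "length":
            max_tokens *= 2  # truncated: give the next attempt more room
            continue

        if choice.finish_reason != "stop":
            # Not a truncation, so a bigger budget is unlikely to help.
            raise ValueError(f"Unexpected finish_reason={choice.finish_reason}")

        return json.loads(choice.message.content)

    raise RuntimeError(f"Response still truncated after {max_attempts} attempts")
```

Capping the number of attempts matters: a prompt that asks for unbounded output will otherwise keep consuming tokens without ever finishing.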

Failure Type 2: Schema Mismatches

The model read your schema description in the system prompt and produced JSON, but the structure doesn't match what you described. A field you said would be an array comes back as a string. A required key is missing entirely. A numeric field is a quoted string.

This happens most often when your schema description is in plain English rather than a formal schema, or when the schema is complex enough that the model "forgets" part of it mid-generation.

The solution is two-pronged: use formal schema enforcement where the provider supports it, and always validate the parsed output against your schema in code.

from jsonschema import validate, ValidationError
import json

EXPECTED_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "tags": {
            "type": "array",
            "items": {"type": "string"}
        }
    },
    "required": ["name", "age", "tags"]
}

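def validate_response(raw_json: str) -> dict:
    # json.loads raises json.JSONDecodeError if the text is not valid JSON
    data = json.loads(raw_json)
    try:
        validate(instance=data, schema=EXPECTED_SCHEMA)
    except ValidationError as e:
        # Chain the original exception so the full validation context is kept
        raise ValueError(f"Schema validation failed: {e.message}") from e
    return data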

The jsonschema library is your first line of defense. Use it on every response, not just when you suspect something is wrong.
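For the other prong, provider-side enforcement, the request payload might look like the sketch below. It assumes the OpenAI Structured Outputs request shape (a response_format of type json_schema with strict enabled); the helper name build_response_format and the schema name "person" are hypothetical. Strict mode is understood here to require that every property be listed in required and that additionalProperties be false, so check your provider's current documentation before relying on those details.

```python
import copy

PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "age", "tags"],
}

def build_response_format(name: str, schema: dict) -> dict:
    """Wrap a JSON schema in an OpenAI-style strict response_format payload.

    Only the top-level object is adjusted; nested objects would need the
    same treatment. The input schema is not modified.
    """
    strict_schema = copy.deepcopy(schema)
    # Assumption: strict mode wants no extra keys and all properties required.
    strict_schema["additionalProperties"] = False
    strict_schema["required"] = list(strict_schema.get("properties", {}))
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": strict_schema},
    }

response_format = build_response_format("person", PERSON_SCHEMA)
```

The resulting dict would be passed as the response_format argument of the request. Even with enforcement switched on, keep the in-code validation: it still catches truncated output and protects you if you change providers.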

Failure Type 3: Semantically Wrong Values

This one is the most dangerous because it passes every syntactic and schema check. The JSON is valid, the structure matches, the types are correct — and the values are wrong.

Examples: a sentiment field that says "positive" for clearly negative text, a date field that contains a plausible but incorrect date, a confidence score of 0.95 for an extraction the model clearly hallucinated.

There's no general solution here, but you can catch many cases with domain-specific validation:

from datetime import date

def validate_event(data: dict) -> dict:
    # Check enumeration values explicitly
    valid_sentiments = {"positive", "negative", "neutral"}
    if data["sentiment"] not in valid_sentiments:
        raise ValueError(f"Unexpected sentiment value: {data['sentiment']}")

    # Check ranges
    if not (0.0 <= data["confidence"] <= 1.0):
        raise ValueError(f"Confidence out of range: {data['confidence']}")

    # Check date plausibility
    event_date = date.fromisoformat(data["event_date"])
    if event_date < date(2000, 1, 1) or event_date > date(2100, 1, 1):
        raise ValueError(f"Implausible date: {event_date}")

    return data

The key principle: validate business rules, not just data types. Your schema can say a field is a string; only your application knows which strings are valid.
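One domain-specific check that catches some hallucinated extractions is confirming that an extracted string actually appears in the source text. The sketch below assumes a hypothetical quote field and a simple case- and whitespace-insensitive match. It is an illustration rather than a general hallucination detector, and it will reject legitimate paraphrases.

```python
import re

def _normalize(text: str) -> str:
    # Collapse runs of whitespace and ignore case so formatting
    # differences do not cause false mismatches.
    return re.sub(r"\s+", " ", text).strip().lower()

def check_quote_is_grounded(data: dict, source_text: str) -> dict:
    """Raise ValueError if data["quote"] does not occur in source_text.

    "quote" is a hypothetical field name used for illustration.
    """
    quote = _normalize(data["quote"])
    if not quote:
        raise ValueError("Quote is empty")
    if quote not in _normalize(source_text):
        raise ValueError(f"Quote not found in source text: {data['quote']!r}")
    return data
```

A check like this only covers fields that are supposed to be copied verbatim from the input. Derived values such as sentiment or confidence need other signals, such as spot checks by a human reviewer.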

Failure Type 4: Prompt Injection in Structured Fields

If you're generating structured output from user-supplied input (summarizing user text, extracting data from user documents), you're vulnerable to prompt injection inside structured fields.

A user's document might contain text like:
