Introduction
Three weeks into launching a fintech startup's payment processing API, we got our first angry enterprise client call. Their system had been submitting payment requests for 72 hours with amount: "149.99", a string instead of a number. Our API accepted it without complaint. The database stored it. The ledger service silently skipped those transactions because it couldn't do arithmetic on a string. By the time anyone noticed, 2,400 payments worth $340,000 were sitting in limbo.
The fix took 10 minutes. The reconciliation took two weeks. The client trust? Months to rebuild.
Our API had syntax validation: JSON.parse() confirmed every payload was valid JSON. But valid JSON and correct JSON are two entirely different things. That incident taught me that JSON validation is a three-layer problem, and most teams only solve the first layer.
This guide covers all three layers:
- syntax validation (is it valid JSON?),
- structural validation (does it match the shape you expect?),
- and semantic validation (do the values actually make sense?).
I'll show you how to implement each layer with real code in TypeScript, Python, and Java, and how to wire it all into your CI/CD pipeline so broken JSON never reaches production again.
The Three Levels of JSON Validation
Most developers think of validation as a binary: either JSON.parse() succeeds or it doesn't. In reality, there are three distinct levels, and each catches a different class of bug.
Level 1 - Syntax Validation (Is It Valid JSON?)
This is the baseline. Does the payload conform to RFC 8259? Are brackets matched? Are strings properly quoted? Are there no trailing commas?
// Invalid syntax: trailing comma
{"name": "Mahesh", "age": 30,}
// Valid syntax
{"name": "Mahesh", "age": 30}
Syntax validation catches typos, copy-paste errors, and malformed serialization output. It's necessary but nowhere near sufficient.
Level 2 - Structural Validation (Does It Match the Expected Shape?)
The payload is valid JSON, but does it have the fields you expect? Are the types correct? Is email a string? Is items an array? Is quantity a positive integer?
// Valid JSON but structurally wrong for a payment endpoint
{
"amount": "149.99",
"currency": 840,
"customer": {
"email": null,
"name": 12345
}
}
This payload passes JSON.parse() without issue. But amount should be a number, currency should be a string like "USD", email shouldn't be null for a payment, and name definitely shouldn't be a number. Structural validation with JSON Schema catches all of these.
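To make the idea concrete, here is a rough sketch of what a structural check does under the hood, hand-rolled in Python for the payload above. The function name check_payment_shape and the error strings are assumptions made for this illustration; in practice a JSON Schema validator does this work for you.

```python
import json
from numbers import Real


def check_payment_shape(payload: dict) -> list[str]:
    """Return a list of structural problems; an empty list means the shape is OK."""
    errors = []

    amount = payload.get("amount")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, Real):
        errors.append("amount must be a number")

    if not isinstance(payload.get("currency"), str):
        errors.append("currency must be a string")

    customer = payload.get("customer")
    if not isinstance(customer, dict):
        errors.append("customer must be an object")
    else:
        for field in ("email", "name"):
            if not isinstance(customer.get(field), str):
                errors.append(f"customer.{field} must be a string")

    return errors


raw = '{"amount": "149.99", "currency": 840, "customer": {"email": null, "name": 12345}}'
problems = check_payment_shape(json.loads(raw))
for problem in problems:
    print(problem)
```

Every field in the sample payload is flagged even though json.loads() accepted it, which is exactly the gap between Level 1 and Level 2.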
Level 3 - Semantic Validation (Do the Values Make Business Sense?)
The structure is correct (amount is a number, currency is a string), but does amount: -50.00 make sense for a payment? Does currency: "XYZ" correspond to a real ISO 4217 code? Is the shipping_date in the past for a new order?
// Valid JSON, correct structure, ❌ semantically wrong
{
"amount": -50.0,
"currency": "XYZ",
"shipping_date": "2020-01-01",
"quantity": 0
}
Semantic validation requires business logic. No generic tool can do this for you because it's application-specific. But the tooling you choose for Level 2 often provides hooks for Level 3 (custom validators, refinements, custom formats).
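As a minimal sketch of what such business rules can look like, the Python below checks the four problems in the payload above. The function name, the KNOWN_CURRENCIES subset, and the rule that quantity must be at least 1 are assumptions for illustration; your own rules will differ.

```python
from datetime import date

# Assumption: a small illustrative subset, not the full ISO 4217 list.
KNOWN_CURRENCIES = {"USD", "EUR", "GBP", "CAD", "AUD"}


def check_payment_semantics(payload: dict, today: date) -> list[str]:
    """Business-rule checks; assumes the payload already passed structural validation."""
    errors = []
    if payload["amount"] <= 0:
        errors.append("amount must be greater than zero")
    if payload["currency"] not in KNOWN_CURRENCIES:
        errors.append(f"unknown currency code: {payload['currency']}")
    if date.fromisoformat(payload["shipping_date"]) < today:
        errors.append("shipping_date is in the past")
    if payload["quantity"] < 1:
        errors.append("quantity must be at least 1")
    return errors


bad_payment = {
    "amount": -50.0,
    "currency": "XYZ",
    "shipping_date": "2020-01-01",
    "quantity": 0,
}
# The current date is passed in so the check stays deterministic and testable.
semantic_errors = check_payment_semantics(bad_payment, today=date(2024, 1, 15))
for message in semantic_errors:
    print(message)
```

Passing today as a parameter rather than calling date.today() inside the function is a deliberate choice here: it keeps the rule easy to unit test.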
Syntax Validation - The Basics Done Right
Why JSON.parse() Error Messages Are Terrible
Every JavaScript developer has seen this:
SyntaxError: Unexpected token } in JSON at position 847
Position 847. In a 2,000-character payload. No line number. No column. No indication of what was expected instead. You're left counting characters manually or pasting into a tool.
The problem is that JSON.parse() was designed for parsing, not for error reporting. It bails at the first issue and gives you minimal context.
Building a Helpful Error Reporter
Here's a wrapper that provides actually useful error messages:
function validateJSONSyntax(input) {
try {
const parsed = JSON.parse(input);
return { valid: true, data: parsed, errors: [] };
} catch (error) {
const position = error.message.match(/position (\d+)/)?.[1];
if (position) {
const pos = parseInt(position, 10);
const lines = input.substring(0, pos).split('\n');
const line = lines.length;
const column = lines[lines.length - 1].length + 1;
const context = input.substring(
Math.max(0, pos - 30),
Math.min(input.length, pos + 30)
);
return {
valid: false,
data: null,
errors: [
{
message: error.message,
line,
column,
context: `...${context}...`,
hint: `Check around line ${line}, column ${column}`,
},
],
};
}
return { valid: false, data: null, errors: [{ message: error.message }] };
}
}
This gives you line numbers, column positions, and surrounding context: the difference between a 30-second fix and a 10-minute hunt.
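For comparison, here is a sketch of the same reporter in Python, where less work is needed because json.JSONDecodeError already exposes msg, pos, lineno, and colno. The function name validate_json_syntax and the 30-character context window are choices made for this example.

```python
import json


def validate_json_syntax(text: str) -> dict:
    """Parse text and, on failure, report line, column, and nearby context."""
    try:
        return {"valid": True, "data": json.loads(text), "errors": []}
    except json.JSONDecodeError as exc:
        start = max(0, exc.pos - 30)
        end = min(len(text), exc.pos + 30)
        return {
            "valid": False,
            "data": None,
            "errors": [
                {
                    "message": exc.msg,
                    "line": exc.lineno,
                    "column": exc.colno,
                    "context": text[start:end],
                }
            ],
        }


# A missing comma between the two properties.
report = validate_json_syntax('{\n  "a": 1\n  "b": 2\n}')
print(report["errors"][0])
```

The exact message text and reported position can vary between Python versions for some errors, so it is safer to log the whole error entry than to match on the message.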
For quick syntax checks without writing code, tools like OnlineJSONFormatt's validator pinpoint the exact line and character causing the error, with clear descriptions of what's wrong and what was expected.
The 5 Most Common Syntax Errors
From my experience fixing hundreds of broken JSON payloads, these account for ~90% of syntax failures:
| Error | Example | Fix |
|---|---|---|
| Trailing comma | {"a": 1,} | Remove the final comma |
| Single quotes | {'name': 'Alice'} | Replace with double quotes |
| Unquoted keys | {name: "Alice"} | Wrap keys in double quotes |
| Comments | {"a": 1} // config | Remove comments entirely |
| Missing comma | {"a": 1 "b": 2} | Add comma between properties |
If you're dealing with JSON that has these issues regularly (common when hand-editing config files or copy-pasting from documentation), the Fix JSON tool auto-repairs all five patterns deterministically.
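If you want to see how a strict parser reacts to each row of the table, the short Python sketch below feeds the five examples to the standard json module and collects the messages. The helper name explain_failures is made up for this illustration.

```python
import json

COMMON_MISTAKES = {
    "trailing comma": '{"a": 1,}',
    "single quotes": "{'name': 'Alice'}",
    "unquoted keys": '{name: "Alice"}',
    "comments": '{"a": 1} // config',
    "missing comma": '{"a": 1 "b": 2}',
}


def explain_failures(samples: dict[str, str]) -> dict[str, str]:
    """Map each sample's label to the parser's message, or "ok" if it parsed."""
    results = {}
    for label, text in samples.items():
        try:
            json.loads(text)
            results[label] = "ok"
        except json.JSONDecodeError as exc:
            results[label] = f"{exc.msg} (line {exc.lineno}, column {exc.colno})"
    return results


failures = explain_failures(COMMON_MISTAKES)
for label, message in failures.items():
    print(f"{label}: {message}")
```

All five samples are rejected; the wording of each message depends on the Python version you run.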
JSON Schema - Structural Validation That Scales
JSON Schema is a vocabulary that lets you describe the structure of your JSON documents. Think of it as TypeScript types, but for JSON payloads, and it works across every language.
Writing Your First Schema (Real Example)
Let's say you're building an e-commerce API. Your POST /orders endpoint expects this payload:
{
"customer_id": "cust_8x7k2m",
"items": [
{
"product_id": "prod_abc123",
"quantity": 2,
"unit_price": 29.99
}
],
"shipping_address": {
"street": "742 Evergreen Terrace",
"city": "Springfield",
"state": "IL",
"zip": "62704",
"country": "US"
},
"payment_method": "card_live_9f8e7d",
"notes": "Leave at front door"
}
Here's the JSON Schema that validates this structure:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "CreateOrder",
"description": "Schema for POST /orders request body",
"type": "object",
"required": ["customer_id", "items", "shipping_address", "payment_method"],
"properties": {
"customer_id": {
"type": "string",
"pattern": "^cust_[a-z0-9]+$",
"description": "Customer identifier in format cust_xxxxx"
},
"items": {
"type": "array",
"minItems": 1,
"maxItems": 50,
"items": {
"type": "object",
"required": ["product_id", "quantity", "unit_price"],
"properties": {
"product_id": {
"type": "string",
"pattern": "^prod_[a-z0-9]+$"
},
"quantity": {
"type": "integer",
"minimum": 1,
"maximum": 999
},
"unit_price": {
"type": "number",
"minimum": 0.01,
"maximum": 99999.99
}
},
"additionalProperties": false
}
},
"shipping_address": {
"type": "object",
"required": ["street", "city", "state", "zip", "country"],
"properties": {
"street": { "type": "string", "minLength": 1, "maxLength": 200 },
"city": { "type": "string", "minLength": 1, "maxLength": 100 },
"state": { "type": "string", "minLength": 2, "maxLength": 2 },
"zip": { "type": "string", "pattern": "^[0-9]{5}(-[0-9]{4})?$" },
"country": {
"type": "string",
"enum": ["US", "CA", "GB", "AU", "DE", "FR"]
}
},
"additionalProperties": false
},
"payment_method": {
"type": "string",
"pattern": "^card_live_[a-z0-9]+$"
},
"notes": {
"type": "string",
"maxLength": 500
}
},
"additionalProperties": false
}
This schema catches: missing required fields, wrong types (unit_price as a string), invalid formats (bad customer ID pattern), out-of-range values (quantity of 0 or 1000), and unexpected extra fields.
Validating with Ajv (Node.js)
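Ajv is one of the fastest JSON Schema validators for JavaScript. Here's how to use it with the schema above:
// The schema declares draft 2020-12, so use Ajv's 2020 build;
// the default export targets draft-07 and rejects this $schema.
import Ajv2020 from 'ajv/dist/2020';
import addFormats from 'ajv-formats';
const ajv = new Ajv2020({ allErrors: true, verbose: true });
addFormats(ajv);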
const schema = {
/* the schema from above */
};
const validate = ajv.compile(schema);
function validateOrder(payload) {
const valid = validate(payload);
if (!valid) {
const errors = validate.errors.map(err => ({
path: err.instancePath || '/',
message: err.message,
params: err.params,
// Human-readable: "items/0/quantity must be >= 1"
readable: `${err.instancePath || 'root'} ${err.message}`,
}));
return { valid: false, errors };
}
return { valid: true, errors: [] };
}
// Example: This will fail validation
const badOrder = {
customer_id: 'cust_8x7k2m',
items: [{ product_id: 'prod_abc', quantity: 0, unit_price: 'free' }],
shipping_address: {
street: '',
city: 'Springfield',
state: 'Illinois',
zip: 'abc',
country: 'US',
},
payment_method: 'card_live_9f8e7d',
};
const result = validateOrder(badOrder);
// result.errors:
// - /items/0/quantity must be >= 1
// - /items/0/unit_price must be number
// - /shipping_address/street must NOT have fewer than 1 characters
// - /shipping_address/state must NOT have more than 2 characters
// - /shipping_address/zip must match pattern "^[0-9]{5}(-[0-9]{4})?$"
Notice how each error tells you exactly where the problem is (JSON Pointer path) and what's wrong. Compare that to JSON.parse(), which just says "it's fine" for this payload.
Advanced Patterns: Conditional Schemas
Real APIs often have conditional requirements. For example, if shipping_method is "express", then phone_number becomes required:
{
"if": {
"properties": { "shipping_method": { "const": "express" } },
"required": ["shipping_method"]
},
"then": {
"required": ["phone_number"],
"properties": {
"phone_number": { "type": "string", "pattern": "^\\+[1-9]\\d{1,14}$" }
}
}
}
Generating Schemas from Existing JSON
If you're retrofitting validation onto an existing API, writing schemas from scratch is tedious. Tools like json-schema-generator or quicktype can reverse-engineer a schema from sample payloads:
# Generate schema from sample JSON
npx quicktype --src sample-response.json --src-lang json --lang schema -o order-schema.json
Then refine the generated schema: add required fields, tighten pattern constraints, set minimum/maximum bounds. The generated schema is a starting point, not the final product.
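To show why a generated schema is only a starting point, here is a deliberately naive inference sketch in Python. It is not how quicktype or json-schema-generator work internally; infer_schema is a hypothetical helper that looks at a single sample and guesses types.

```python
import json


def infer_schema(value):
    """Infer a deliberately loose JSON Schema fragment from one sample value."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        # Only the first element is inspected; mixed arrays need manual review.
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
            # Every observed key is marked required; relax this by hand.
            "required": sorted(value),
        }
    raise TypeError(f"not a JSON value: {value!r}")


sample = json.loads(
    '{"customer_id": "cust_8x7k2m", "items": [{"quantity": 2, "unit_price": 29.99}]}'
)
draft_schema = infer_schema(sample)
print(json.dumps(draft_schema, indent=2))
```

The output knows that quantity is an integer, but nothing about its valid range, the customer_id pattern, or which fields are truly optional. Those constraints come from you.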
Runtime Validation in Application Code
JSON Schema is great for API boundaries, but inside your application code, you want validation that integrates with your type system. Here's how the three major ecosystems handle it.
TypeScript: Zod - Parse, Don't Validate
Zod has become a de facto standard for TypeScript runtime validation. The philosophy is "parse, don't validate": instead of checking if data matches a type, you parse it into a typed value or get a detailed error.
import { z } from 'zod';
const OrderItemSchema = z.object({
product_id: z.string().regex(/^prod_[a-z0-9]+$/),
quantity: z.number().int().min(1).max(999),
unit_price: z.number().min(0.01).max(99999.99),
});
const ShippingAddressSchema = z.object({
street: z.string().min(1).max(200),
city: z.string().min(1).max(100),
state: z.string().length(2),
zip: z.string().regex(/^[0-9]{5}(-[0-9]{4})?$/),
country: z.enum(['US', 'CA', 'GB', 'AU', 'DE', 'FR']),
});
const CreateOrderSchema = z.object({
customer_id: z.string().regex(/^cust_[a-z0-9]+$/),
items: z.array(OrderItemSchema).min(1).max(50),
shipping_address: ShippingAddressSchema,
payment_method: z.string().regex(/^card_live_[a-z0-9]+$/),
notes: z.string().max(500).optional(),
});
// Type is automatically inferred; no separate interface needed
type CreateOrder = z.infer<typeof CreateOrderSchema>;
// Usage in an Express/Fastify handler
function handleCreateOrder(req, res) {
const result = CreateOrderSchema.safeParse(req.body);
if (!result.success) {
// result.error.issues gives you detailed, path-aware errors
const errors = result.error.issues.map(issue => ({
path: issue.path.join('.'),
message: issue.message,
code: issue.code,
}));
return res.status(400).json({ errors });
}
// result.data is fully typed as CreateOrder
const order = result.data;
// TypeScript knows order.items[0].quantity is a number here
}
The key advantage: your validation schema is your type definition. No drift between what you validate and what TypeScript thinks the type is.
Python: Pydantic - Validation as Data Models
Pydantic is Python's answer to the same problem. It uses Python type annotations to define validation rules:
from pydantic import BaseModel, Field, field_validator
from typing import Optional
from enum import Enum

class Country(str, Enum):
    US = "US"
    CA = "CA"
    GB = "GB"
    AU = "AU"
    DE = "DE"
    FR = "FR"

class OrderItem(BaseModel):
    product_id: str = Field(pattern=r"^prod_[a-z0-9]+$")
    quantity: int = Field(ge=1, le=999)
    unit_price: float = Field(ge=0.01, le=99999.99)

class ShippingAddress(BaseModel):
    street: str = Field(min_length=1, max_length=200)
    city: str = Field(min_length=1, max_length=100)
    state: str = Field(min_length=2, max_length=2)
    zip: str = Field(pattern=r"^[0-9]{5}(-[0-9]{4})?$")
    country: Country

class CreateOrder(BaseModel):
    customer_id: str = Field(pattern=r"^cust_[a-z0-9]+$")
    items: list[OrderItem] = Field(min_length=1, max_length=50)
    shipping_address: ShippingAddress
    payment_method: str = Field(pattern=r"^card_live_[a-z0-9]+$")
    notes: Optional[str] = Field(default=None, max_length=500)

    @field_validator('items')
    @classmethod
    def validate_total_amount(cls, items):
        total = sum(item.quantity * item.unit_price for item in items)
        if total > 50000:
            raise ValueError(f'Order total ${total:.2f} exceeds $50,000 limit')
        return items

# Usage in a FastAPI endpoint
from fastapi import FastAPI
app = FastAPI()

@app.post("/orders")
async def create_order(order: CreateOrder):
    # Pydantic validates automatically; invalid requests get 422 responses
    # with detailed error messages including field paths
    total = sum(item.quantity * item.unit_price for item in order.items)
    return {"order_id": "ord_new123", "total": round(total, 2)}
Notice the @field_validator: that's Level 3 semantic validation (business rule: order total can't exceed $50,000) integrated directly into the structural validation model.
Java: Jackson + Bean Validation - Enterprise-Grade Validation
In Java/Spring ecosystems, Jackson handles deserialization while Bean Validation (JSR 380, now Jakarta Validation) handles constraint checking:
import com.fasterxml.jackson.annotation.JsonProperty;
import jakarta.validation.Valid;
import jakarta.validation.constraints.*;
import java.util.List;
public class CreateOrderRequest {
@NotNull
@Pattern(regexp = "^cust_[a-z0-9]+$", message = "Invalid customer ID format")
@JsonProperty("customer_id")
private String customerId;
@NotNull
@Size(min = 1, max = 50, message = "Order must have 1-50 items")
@Valid // Triggers validation on nested objects
private List<OrderItem> items;
@NotNull
@Valid
@JsonProperty("shipping_address")
private ShippingAddress shippingAddress;
@NotNull
@Pattern(regexp = "^card_live_[a-z0-9]+$")
@JsonProperty("payment_method")
private String paymentMethod;
@Size(max = 500)
private String notes;
}
public class OrderItem {
@NotNull
@Pattern(regexp = "^prod_[a-z0-9]+$")
@JsonProperty("product_id")
private String productId;
@NotNull
@Min(value = 1)
@Max(value = 999)
private Integer quantity;
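@NotNull
@DecimalMin(value = "0.01")
@DecimalMax(value = "99999.99")
@JsonProperty("unit_price")
// BigDecimal avoids binary rounding; the spec does not guarantee @DecimalMin/@DecimalMax on double
private java.math.BigDecimal unitPrice;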
}
// Spring Boot controller: validation happens automatically
@RestController
@RequestMapping("/orders")
public class OrderController {
@PostMapping
public ResponseEntity<?> createOrder(
@Valid @RequestBody CreateOrderRequest request) {
// If validation fails, Spring returns 400 with field-level errors
// If we reach here, the request is structurally valid
Order order = orderService.create(request);
return ResponseEntity.status(201).body(order);
}
@ExceptionHandler(MethodArgumentNotValidException.class)
public ResponseEntity<Map<String, Object>> handleValidation(
MethodArgumentNotValidException ex) {
List<Map<String, String>> errors = ex.getBindingResult()
.getFieldErrors()
.stream()
.map(err -> Map.of(
"field", err.getField(),
"message", err.getDefaultMessage(),
"rejected", String.valueOf(err.getRejectedValue())
))
.toList();
return ResponseEntity.badRequest().body(Map.of("errors", errors));
}
}
The @Valid annotation on nested objects is crucial: without it, Jackson still deserializes the nested JSON, but Bean Validation never checks the nested constraints. I've seen this miss bugs in production where the top-level object was validated but nested arrays of items were not.
Validation in CI/CD Pipelines
Runtime validation catches bad data at request time. But what about the JSON files in your repository: config files, fixture data, feature flags, translation files? Those need validation at commit time, not deploy time.
The Case for CI/CD JSON Validation
Here's a real scenario: A team of 8 developers maintains a monorepo with 47 JSON config files. Feature flags, environment configs, i18n translations, API mock fixtures. Every month, at least one deployment fails because someone:
- Added a trailing comma to config.production.json
- Misspelled a feature flag key in flags.json
- Added a translation key in English but forgot French (fr.json)
- Changed the structure of a test fixture without updating the test
All of these are catchable before the code reaches production if you validate JSON in CI.
Complete GitHub Actions Workflow
name: JSON Validation
on:
  pull_request:
    paths:
      - '**/*.json'
      - '!package-lock.json'
      - '!node_modules/**'
jobs:
  validate-json:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install validation tools
        run: npm install -g ajv-cli ajv-formats
      - name: Syntax check all JSON files
        run: |
          echo "Checking JSON syntax..."
          find . -name "*.json" \
            -not -path "./node_modules/*" \
            -not -path "./.next/*" \
            -not -name "package-lock.json" \
            -exec sh -c '
              for file; do
                if ! python3 -m json.tool "$file" > /dev/null 2>&1; then
                  echo "❌ Invalid JSON: $file"
                  python3 -m json.tool "$file" 2>&1 | tail -1
                  exit 1
                fi
              done
            ' _ {} +
          echo "✅ All JSON files have valid syntax"
      - name: Validate config files against schemas
        run: |
          # Validate environment configs
          ajv validate -s schemas/env-config.schema.json \
            -d "config/*.json" \
            --all-errors --verbose
          # Validate feature flags
          ajv validate -s schemas/feature-flags.schema.json \
            -d "config/flags/*.json" \
            --all-errors --verbose
          # Validate i18n translation files
          ajv validate -s schemas/translations.schema.json \
            -d "locales/**/*.json" \
            --all-errors --verbose
      - name: Check translation completeness
        run: |
          # Ensure all locale files have the same keys as English
          node scripts/check-translation-keys.js
      - name: Validate test fixtures match API schemas
        run: |
          # --spec=draft2020 is needed if the schema declares draft 2020-12,
          # as the CreateOrder schema above does; ajv-cli defaults to draft-07
          ajv validate -s schemas/api/create-order.schema.json \
            -d "tests/fixtures/orders/*.json" \
            --spec=draft2020 --all-errors
This workflow runs only when JSON files change (saving CI minutes), skips package-lock.json and node_modules, and validates at three levels: syntax, schema compliance, and cross-file consistency.
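The workflow calls scripts/check-translation-keys.js without showing it. As a rough idea of what such a cross-file check involves, here is a Python sketch of the core comparison. The function names and the in-memory sample data are hypothetical; a real script would load each locale file from disk and exit non-zero when the report is non-empty.

```python
def flatten_keys(tree: dict, prefix: str = "") -> set[str]:
    """Collect dotted key paths such as "checkout.title" from nested translations."""
    keys = set()
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            keys |= flatten_keys(value, path)
        else:
            keys.add(path)
    return keys


def missing_translations(reference: dict, locales: dict[str, dict]) -> dict[str, list[str]]:
    """For each locale, list reference keys it lacks; complete locales are omitted."""
    expected = flatten_keys(reference)
    report = {}
    for name, tree in locales.items():
        missing = sorted(expected - flatten_keys(tree))
        if missing:
            report[name] = missing
    return report


# Hypothetical in-memory data; a real script would json.load() each locale file.
en = {"checkout": {"title": "Checkout", "pay": "Pay now"}, "greeting": "Hello"}
fr = {"checkout": {"title": "Paiement"}, "greeting": "Bonjour"}
de = {"checkout": {"title": "Kasse", "pay": "Jetzt bezahlen"}, "greeting": "Hallo"}

gaps = missing_translations(en, {"fr": fr, "de": de})
print(gaps)
```

A schema alone cannot express "every key in en.json must also exist in fr.json", which is why this layer needs a small custom script.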
Pre-Commit Hooks for Instant Feedback
CI catches errors, but waiting for a pipeline to tell you about a trailing comma is slow. Pre-commit hooks give instant local feedback:
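// .lintstagedrc.json (run by the .husky/pre-commit hook via: npx lint-staged)
{
  "*.json": ["prettier --check"]
}
Prettier fails on JSON it cannot parse, so this check doubles as a syntax gate. Avoid handing lint-staged's file list to python3 -m json.tool: it accepts one input file and treats a second path as its output file.
Or with lint-staged configured in package.json: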
{
"lint-staged": {
"*.json": ["prettier --check", "node scripts/validate-json-schema.js"],
"config/**/*.json": ["ajv validate -s schemas/env-config.schema.json -d"]
}
}
The combination of pre-commit hooks (fast, local) and CI validation (authoritative, can't be skipped) creates a safety net that catches JSON errors at two points before they reach production.
Common JSON Validation Pitfalls
After implementing validation across dozens of services, here are the mistakes I see teams make repeatedly:
Pitfall 1: The "Valid JSON, Wrong Data" Blind Spot
The most dangerous bugs pass syntax validation. Your monitoring shows zero JSON parse errors, so you assume data quality is fine. Meanwhile, age: "twenty-five" is flowing through your system because nobody added structural validation.
Fix: Treat JSON.parse() success as the minimum bar, not the finish line. Every API endpoint needs schema validation.
Pitfall 2: Overly Strict Schemas That Break on Optional Fields
I've seen teams deploy a schema that marks every field as required, then get paged at 2 AM because a partner API started omitting an optional field they'd never seen missing before.
// ❌ Too strict: will reject valid payloads with optional fields omitted
{
"required": ["id", "name", "email", "phone", "avatar_url", "bio", "created_at"]
}
// ✅ Only require what's truly mandatory
{
"required": ["id", "name", "email"],
"properties": {
"phone": { "type": "string" },
"avatar_url": { "type": "string", "format": "uri" },
"bio": { "type": "string" }
}
}
Fix: Start permissive, tighten gradually. Use additionalProperties: false only on endpoints you fully control. For third-party API responses, validate only the fields you actually use.
Pitfall 3: Validation Error Messages That Don't Help
// ❌ Useless
{ "error": "Validation failed" }
// ❌ Slightly better but still frustrating
{ "error": "Invalid input", "field": "items" }
// ✅ Actually helpful
{
"errors": [
{
"path": "items[0].quantity",
"message": "Must be at least 1",
"received": 0,
"expected": "integer >= 1"
},
{
"path": "shipping_address.zip",
"message": "Must match US ZIP code format",
"received": "abc",
"expected": "string matching ^[0-9]{5}(-[0-9]{4})?$"
}
]
}
Fix: Always include the field path, what was expected, and what was received. Your API consumers (including your future self) will thank you.
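Most validators report the location as a list of path segments, so producing the helpful shape above is mostly a formatting job. The Python sketch below shows one way to do it; format_path and build_error are names invented for this example, not part of any library.

```python
def format_path(parts: list) -> str:
    """Render ["items", 0, "quantity"] as "items[0].quantity"."""
    rendered = ""
    for part in parts:
        if isinstance(part, int):
            rendered += f"[{part}]"
        else:
            rendered += f".{part}" if rendered else str(part)
    return rendered


def build_error(parts: list, message: str, received, expected: str) -> dict:
    """Assemble one error entry with the path, the bad value, and the expectation."""
    return {
        "path": format_path(parts),
        "message": message,
        "received": received,
        "expected": expected,
    }


error = build_error(["items", 0, "quantity"], "Must be at least 1", 0, "integer >= 1")
print(error)
```

Be careful about echoing received values for sensitive fields such as card numbers or passwords; mask or omit those.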
Pitfall 4: Performance Impact on Hot Paths
JSON Schema validation with Ajv is fast (~microseconds for simple schemas), but if you're validating on every request in a high-throughput service (10,000+ req/s), it adds up. I measured a 3ms overhead per request on a complex schema with nested arrays.
Fix: Compile schemas once at startup (ajv.compile(schema)), not on every request. For extremely hot paths, consider validating only in non-production environments or sampling a percentage of requests.
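If you do go the sampling route, a deterministic rule keyed on a request identifier makes failures reproducible, since the same request is always either validated or skipped. The sketch below is one possible approach in Python; should_validate and the request ID format are assumptions for illustration.

```python
import zlib


def should_validate(request_id: str, sample_percent: int) -> bool:
    """Deterministically select roughly sample_percent% of requests by ID hash."""
    bucket = zlib.crc32(request_id.encode("utf-8")) % 100
    return bucket < sample_percent


# Roughly one in ten of these hypothetical request IDs is selected.
selected = sum(should_validate(f"req-{i}", 10) for i in range(10_000))
print(f"validated {selected} of 10000 requests")
```

Using a hash of the ID instead of random.random() means a replayed request takes the same path, which helps when you are debugging a payload that slipped through.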
Pitfall 5: Schema Drift Between Services
Service A publishes events with schema v2. Service B still validates against schema v1. Events get rejected silently. Nobody notices until a downstream report is missing data.
Fix: Store schemas in a shared registry (like a schemas/ directory in a shared repo, or a proper schema registry like Confluent's). Version schemas explicitly. Run compatibility checks in CI when schemas change.
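A compatibility check in CI can start very small. The following Python sketch compares two versions of a schema and flags changes that could reject payloads the old version accepted. It only looks at top-level required, properties, and type, and the function name breaking_changes is hypothetical; dedicated schema registries perform far more thorough checks.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """List changes in `new` that could reject payloads valid under `old`."""
    problems = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})

    newly_required = set(new.get("required", [])) - set(old.get("required", []))
    for name in sorted(newly_required):
        problems.append(f"field became required: {name}")

    for name in sorted(old_props):
        if name not in new_props:
            # A dropped property only breaks producers if extras are rejected.
            if new.get("additionalProperties") is False:
                problems.append(f"field removed and extras are forbidden: {name}")
        elif old_props[name].get("type") != new_props[name].get("type"):
            problems.append(f"type changed for {name}")

    return problems


v1 = {
    "required": ["id"],
    "properties": {"id": {"type": "string"}, "total": {"type": "number"}},
}
v2 = {
    "required": ["id", "currency"],
    "properties": {
        "id": {"type": "string"},
        "total": {"type": "string"},
        "currency": {"type": "string"},
    },
}
issues = breaking_changes(v1, v2)
for issue in issues:
    print(issue)
```

Failing the build when this list is non-empty forces a conversation about versioning before the change reaches a consumer.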
Building a Validation Strategy for Your Team
Here's the checklist I use when setting up JSON validation for a new project:
What to Validate
| Layer | What | Where | Tool |
|---|---|---|---|
| Syntax | All JSON files in repo | Pre-commit + CI | python -m json.tool, Prettier |
| Structure | API request/response bodies | Runtime (middleware) | Zod, Pydantic, Jackson + Bean Validation, Ajv |
| Structure | Config files | CI pipeline | Ajv CLI + JSON Schema |
| Semantic | Business rules (amounts, dates, ranges) | Application code | Custom validators, Zod .refine() |
| Consistency | Translation files, feature flags | CI pipeline | Custom scripts |
| Compatibility | Schema changes between services | CI pipeline | Schema registry + diff tools |
Implementation Order
- Week 1: Add syntax validation to CI (catches the low-hanging fruit)
- Week 2: Write JSON Schemas for your most critical config files
- Week 3: Add runtime validation (Zod/Pydantic) to your highest-traffic API endpoints
- Week 4: Add pre-commit hooks for instant local feedback
- Ongoing: Expand schema coverage as you touch each endpoint
You don't need to validate everything on day one. Start with the endpoints that have caused production incidents, then expand coverage incrementally.
Quick Validation During Development
When you're iterating on JSON structures during development (testing API payloads, debugging config issues, or verifying schema compliance), having a fast feedback loop matters. The OnlineJSONFormatt editor provides real-time syntax validation as you type, which is useful for quick checks before committing. For structural validation, pair it with your schema definitions in CI for the authoritative check.
Conclusion
JSON validation is a spectrum, not a checkbox. JSON.parse() succeeding doesn't mean your data is correct; it means your brackets match. Real validation happens at three layers:
- Syntax: Is it valid JSON? (Catch typos and formatting errors)
- Structure: Does it match the expected shape? (Catch type mismatches and missing fields)
- Semantics: Do the values make business sense? (Catch logical errors)
The tools exist for every ecosystem: JSON Schema for language-agnostic structural validation, Zod for TypeScript, Pydantic for Python, Jackson + Bean Validation for Java. The CI/CD integration is straightforward with GitHub Actions and pre-commit hooks.
The cost of not validating is always higher than the cost of implementing it. That fintech startup I mentioned at the beginning? They now validate at all three levels. The schema catches 95% of bad payloads before they hit business logic. The remaining 5% get caught by semantic validators. Zero reconciliation incidents in the 18 months since.
Start with syntax validation in CI this week. Add schema validation to your most critical endpoint next week. Build from there.