Engineering blog

Real engineering problems we ran into while building an LLM API gateway.

GLM 5.2 Reasoning Effort: the Setting That Cuts Cost 20x (Measured)

June 24, 2026 · glm · coding · llm-gateway · cost · reasoning

The same coding answer costs $0.0031 with reasoning effort set right versus $0.062 on GLM 5.2's unbounded default: 20x cheaper and 30x faster. How to set the dial per task.

Claude Fable 5 Won't Run Under ZDR: 30-Day Retention Is Mandatory

June 12, 2026 · claude-fable-5 · data-retention · compliance

Zero data retention (ZDR) orgs get a 400 error on claude-fable-5, with no opt-out on the Claude API, Bedrock, Vertex or Foundry. What it breaks for HIPAA/COPPA, and the routing fix.

LLM Prompt Caching: The Complete 2026 Guide

May 26, 2026 · prompt-cache · series-overview · llm-architecture

A five-part series on LLM prompt caching: KV cache architecture, provider comparison, Python tutorial, model-by-use-case matrix, LangChain integration.