Engineering blog
Real engineering problems we ran into while building an LLM API gateway.
-
LLM Prompt Caching: The Complete 2026 Guide
A four-part series on LLM prompt caching: KV cache architecture, a provider comparison, a working Python tutorial, and a best-model-by-use-case decision matrix.
-
LLM Prompt Caching #4: Best Model for Chat, RAG & Agents
Decision matrix matching LLM workloads (chatbots, RAG APIs, AI agents) to the right model and caching strategy. Real 2026 pricing, with cost math per scenario.
-
LLM Prompt Caching #3: Working Python Tutorial
Measured prompt-cache savings across Claude, GPT-5, Gemini 2.5, DeepSeek-v4 and Qwen3 via Synthorai's OpenAI-compatible gateway. Real usage.cost and TTFT.
-
LLM Prompt Caching #2: Compare Claude, GPT, Gemini, DeepSeek
Anthropic Claude, OpenAI GPT-5, Gemini 2.5, DeepSeek-v4 and Qwen3 expose prompt caching in five different shapes. A measured 2026 feature comparison.
-
LLM Prompt Caching #1: How KV Cache & TTL Work
How LLM prompt caching actually works: Transformer attention math behind K/V reuse, the memory-compute tradeoff that shapes TTL, and why it cuts cost and TTFT.