Engineering blog

Real engineering problems we ran into while building an LLM API gateway.

  • LLM Prompt Caching: The Complete 2026 Guide

    May 26, 2026 · prompt-cache · series-overview · llm-architecture

    A four-part series on LLM prompt caching: KV cache architecture, provider comparison, working Python tutorial, and best-model-by-use-case decision matrix.

  • LLM Prompt Caching #4: Best Model for Chat, RAG & Agents

    May 25, 2026 · llm-selection · agents · rag · chatbot

    Decision matrix matching LLM workload — chatbots, RAG APIs, AI agents — to the right model and caching strategy. Real 2026 pricing, cost math per scenario.

  • LLM Prompt Caching #3: Working Python Tutorial

    May 24, 2026 · prompt-cache · tutorial · python

    Measured prompt-cache savings across Claude, GPT-5, Gemini 2.5, DeepSeek-v4 and Qwen3 via Synthorai's OpenAI-compatible gateway, with real usage.cost and TTFT numbers.

  • LLM Prompt Caching #2: Compare Claude, GPT, Gemini, DeepSeek

    May 23, 2026 · prompt-cache · llm-providers · evaluation

    Anthropic Claude, OpenAI GPT-5, Gemini 2.5, DeepSeek-v4 and Qwen3 expose prompt caching in five different shapes — measured 2026 feature comparison.

  • LLM Prompt Caching #1: How KV Cache & TTL Work

    May 22, 2026 · prompt-cache · transformer · llm-architecture

    How LLM prompt caching actually works: Transformer attention math behind K/V reuse, the memory-compute tradeoff that shapes TTL, and why it cuts cost and TTFT.