<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Synthorai Engineering Blog</title><description>LLM gateway engineering notes — BYOK, prompt caching, billing under partial failures, and protocol translation.</description><link>https://synthorai.io/</link><language>en</language><atom:link href="https://synthorai.io/blog/rss.xml" rel="self" type="application/rss+xml"/><managingEditor>hello@synthorai.io (Synthorai)</managingEditor><webMaster>hello@synthorai.io (Synthorai)</webMaster><item><title>LLM Prompt Caching: The Complete 2026 Guide</title><link>https://synthorai.io/blog/llm-prompt-caching-complete-guide/</link><guid isPermaLink="true">https://synthorai.io/blog/llm-prompt-caching-complete-guide/</guid><description>A four-part series on LLM prompt caching: KV cache architecture, provider comparison, working Python tutorial, and best-model-by-use-case decision matrix.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><category>prompt-cache</category><category>series-overview</category><category>llm-architecture</category></item><item><title>LLM Prompt Caching #4: Best Model for Chat, RAG &amp; Agents</title><link>https://synthorai.io/blog/best-llm-by-use-case-chat-api-agent/</link><guid isPermaLink="true">https://synthorai.io/blog/best-llm-by-use-case-chat-api-agent/</guid><description>Decision matrix matching LLM workload — chatbots, RAG APIs, AI agents — to the right model and caching strategy. Real 2026 pricing, cost math per scenario.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><category>llm-selection</category><category>agents</category><category>rag</category><category>chatbot</category></item><item><title>LLM Prompt Caching #3: Working Python Tutorial</title><link>https://synthorai.io/blog/prompt-caching-tutorial-code-examples/</link><guid isPermaLink="true">https://synthorai.io/blog/prompt-caching-tutorial-code-examples/</guid><description>Measured prompt-cache savings across Claude, GPT-5, Gemini 2.5, DeepSeek-v4 and Qwen3 via Synthorai&apos;s OpenAI-compatible gateway. Real usage.cost and TTFT.</description><pubDate>Sun, 24 May 2026 00:00:00 GMT</pubDate><category>prompt-cache</category><category>tutorial</category><category>python</category></item><item><title>LLM Prompt Caching #2: Compare Claude, GPT, Gemini, DeepSeek</title><link>https://synthorai.io/blog/provider-caching-comparison/</link><guid isPermaLink="true">https://synthorai.io/blog/provider-caching-comparison/</guid><description>Anthropic Claude, OpenAI GPT-5, Gemini 2.5, DeepSeek-v4 and Qwen3 expose prompt caching in five different shapes — measured 2026 feature comparison.</description><pubDate>Sat, 23 May 2026 00:00:00 GMT</pubDate><category>prompt-cache</category><category>llm-providers</category><category>evaluation</category></item><item><title>LLM Prompt Caching #1: How KV Cache &amp; TTL Work</title><link>https://synthorai.io/blog/llm-prompt-caching-explained/</link><guid isPermaLink="true">https://synthorai.io/blog/llm-prompt-caching-explained/</guid><description>How LLM prompt caching actually works: Transformer attention math behind K/V reuse, the memory-compute tradeoff that shapes TTL, and why it cuts cost and TTFT.</description><pubDate>Fri, 22 May 2026 00:00:00 GMT</pubDate><category>prompt-cache</category><category>transformer</category><category>llm-architecture</category></item></channel></rss>