Claude Fable 5: Caching, Tokenizer & Cost vs Opus 4.6

Contents
  1. Availability
  2. The headline: Fable 5 is on the new tokenizer
  3. Caching behavior: the contract is unchanged
  4. TTL behavior: both windows honored
  5. The cost story: 2x price x 1.45x tokens
  6. Migration checklist (Opus → Fable 5)
  7. Bottom line
  8. FAQ

claude-fable-5 is now available on the Synthorai gateway. If you cache against the Claude line, the good news is that the caching and TTL contract is a carry-over: same cache_control markers, same 5-minute and 1-hour TTLs, same write premiums, same deep read discount. Your caching code moves over by changing one string.

The thing to budget for isn’t the cache mechanics — it’s the bill. Fable 5 lists at 2x the Opus token price, and it tokenizes the same English text into ~45% more tokens than Opus 4.6 (it’s on the post-4.6 tokenizer, identical to Opus 4.8). Those two multipliers stack. This post measures all of it so you don’t have to.

All numbers below were measured against https://synthorai.io/ (Anthropic-native /v1/messages) on 2026-06-10 with one stable English system prompt (~6.6K tokens on the old tokenizer, ~9.6K on the new one), max_tokens small, single sequential run. Cost figures are read from the gateway usage.cost field. The ratios (token counts, write premium, read discount, cross-model cost) are the portable part; absolute dollars scale with your prompt. Reproduce against your own prompt before quoting them.


Availability

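import os
from anthropic import Anthropic

# Placeholders for illustration: substitute your own values.
SYSTEM_PROMPT = "..."   # long, stable prefix; must clear the cache-eligibility floor
question = "..."        # per-request user message

anth = Anthropic(
    api_key=os.environ["SYNTHORAI_KEY"],
    base_url="https://synthorai.io/",   # SDK appends /v1/messages
)

msg = anth.messages.create(
    model="claude-fable-5",             # the only line that changes
    max_tokens=512,
    system=[
        # cache_control marks the end of the cacheable prefix (5-minute TTL by default)
        {"type": "text", "text": SYSTEM_PROMPT,
         "cache_control": {"type": "ephemeral"}},
    ],
    messages=[{"role": "user", "content": question}],
)
print(msg.usage)   # input_tokens, cache_creation_input_tokens, cache_read_input_tokens, cost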

Swap claude-opus-4-6 → claude-fable-5 and nothing in your caching path needs to move. Fable 5 is an Anthropic-native model with a 1M-token context window. One behavioral note: it is a reasoning model and emits thinking tokens by default. Even a trivial “reply OK” returned output_tokens_details.thinking_tokens > 0 in our runs, where Opus 4.6/4.8 returned zero. Budget output tokens accordingly. The mechanics behind cache_control are covered in the caching tutorial; the architecture of why the cache exists is in Part 1 of the series.


The headline: Fable 5 is on the new tokenizer

The token count for the Opus line jumped at the 4.7 generation: the same English text that counted as ~6.6K tokens on 4.6 counts as ~9.6K on 4.8. Fable 5 lands on the new side — identical text reports the exact same token count as Opus 4.8.

Model | Input tokens (identical text) | Tokenizer generation
claude-opus-4-6 | 6,614 | pre-4.7
claude-opus-4-8 | 9,619 | post-4.7
claude-fable-5 | 9,619 | post-4.7 (identical to 4.8)

The same system prompt is ~45% more tokens on Fable 5 than on Opus 4.6 (9,619 / 6,614 = 1.45). This is the single most important number to internalize before you migrate, because every downstream figure — cost, the 1,024-token cache-eligibility floor, your per-call budget — is computed in tokens.
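The ratio is plain arithmetic on the measured counts. A minimal sketch for repeating it on your own prompt, assuming you have already read input token counts off the usage object for each model (the helper name token_inflation is ours, not part of any SDK):

```python
def token_inflation(old_tokens: int, new_tokens: int) -> float:
    """Return how many times larger the new token count is than the old one."""
    if old_tokens <= 0:
        raise ValueError("old_tokens must be positive")
    return new_tokens / old_tokens


# Input token counts measured in this post for the identical system prompt.
measured = {
    "claude-opus-4-6": 6614,
    "claude-opus-4-8": 9619,
    "claude-fable-5": 9619,
}

baseline = measured["claude-opus-4-6"]
for model, tokens in measured.items():
    ratio = token_inflation(baseline, tokens)
    print(f"{model}: {tokens} tokens, {ratio:.2f}x vs Opus 4.6")
```

Swap in your own counts before relying on the 1.45 figure; the inflation depends on the text being tokenized.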

We’re describing a measured observation — identical text, identical token count on Fable 5 and Opus 4.8, ~45% above Opus 4.6 — most consistent with the tokenizer/vocabulary update that shipped at the 4.7 generation. If you’re coming from 4.6 or earlier, re-measure; if you’re coming from 4.7/4.8, expect parity.


Caching behavior: the contract is unchanged

We ran the same no-cache / cold-write / warm-read sequence on each model. The discount structure is identical end to end — Fable 5 honors cache_control and reports the same usage fields (cache_creation_input_tokens, cache_read_input_tokens, and the ephemeral_5m / ephemeral_1h buckets).

Model | 5m cache write | 1h cache write | Warm read
claude-opus-4-6 | 1.25x | 2.00x | ~9% of no-cache
claude-opus-4-8 | 1.25x | 2.00x | ~6% of no-cache
claude-fable-5 | 1.24x | 1.99x | ~6% of no-cache

Two invariants hold across all three:

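  • Write premium ≈ 1.25x (5m), ≈ 2x (1h). The first (cold) call costs ~1.25x the no-cache price to populate a 5-minute entry, or ~2x for a 1-hour entry. The 5-minute entry breaks even on the first hit (1.25 + 0.06 is well under two no-cache calls); the 1-hour entry needs a second hit, because 2.0 + 0.06 is still just over two.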
  • Read discount ≈ 90%+. A warm cache read on Fable 5 cost ~6% of the no-cache call — a ~94% discount, in line with (slightly better than) Anthropic’s documented ~90% cached-read economics. Reads stay deeply discounted regardless of TTL.

The percentages are flat across the line. As with the Opus 4.7 → 4.8 step, the higher absolute bill on Fable 5 is a price-and-token story, not a cache-economics story — covered next.
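To see how quickly a cache write pays for itself, here is a small sketch that counts warm reads until the cached path is cheaper than sending the same prefix uncached every time. It works in units of one no-cache call and uses the Fable 5 multipliers from the table above; the function names are illustrative.

```python
def cached_cost(write_multiplier: float, read_fraction: float, hits: int) -> float:
    """Cost of one cold write plus `hits` warm reads, in units of one no-cache call."""
    return write_multiplier + read_fraction * hits


def break_even_hits(write_multiplier: float, read_fraction: float) -> int:
    """Smallest number of warm reads at which caching beats not caching."""
    if read_fraction >= 1.0:
        raise ValueError("a warm read must be cheaper than a no-cache call")
    hits = 0
    # Without caching, the same 1 + hits calls cost exactly 1 + hits units.
    while cached_cost(write_multiplier, read_fraction, hits) >= 1 + hits:
        hits += 1
    return hits


# Fable 5 multipliers measured in this post.
print(break_even_hits(1.24, 0.06))  # 5-minute entry
print(break_even_hits(1.99, 0.06))  # 1-hour entry
```

With these inputs it prints 1 for the 5-minute entry and 2 for the 1-hour entry: a 1-hour write followed by a single read still costs slightly more than two uncached calls.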


TTL behavior: both windows honored

Fable 5 supports the same two TTLs as the rest of the line: a 5-minute sliding default and an opt-in 1-hour window. We isolated each TTL with a unique prefix per call (so no stale entry could contaminate the result) and confirmed the usage object reports the correct bucket — cache_creation.ephemeral_5m_input_tokens or ephemeral_1h_input_tokens.

# 1-hour TTL — same marker syntax on Fable 5 as on the Opus line
"cache_control": {"type": "ephemeral", "ttl": "1h"}

The 1-hour write costs ~2x no-cache (vs ~1.25x for the 5-minute write), and reads stay at the deep discount regardless of TTL — identical to Opus 4.6/4.8. If you picked 5m for live chat and 1h for agents with human-in-the-loop pauses on Opus, keep those choices on Fable 5.
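If you switch TTLs per workload, it can help to build the marker in one place. A minimal sketch, assuming only the two TTL forms shown in this post (no ttl key for the 5-minute default, "1h" for the long window); the helper names and the workload policy are hypothetical:

```python
from typing import Any, Dict, Optional


def cached_system_block(text: str, ttl: Optional[str] = None) -> Dict[str, Any]:
    """Build a system text block carrying a cache_control marker.

    ttl=None leaves the 5-minute default; ttl="1h" opts in to the 1-hour window.
    """
    if ttl not in (None, "1h"):
        raise ValueError(f"unsupported ttl: {ttl!r}")
    cache_control: Dict[str, Any] = {"type": "ephemeral"}
    if ttl is not None:
        cache_control["ttl"] = ttl
    return {"type": "text", "text": text, "cache_control": cache_control}


def ttl_for_workload(has_long_pauses: bool) -> Optional[str]:
    """Hypothetical policy: 1h for agents that wait on a human, default 5m otherwise."""
    return "1h" if has_long_pauses else None


block = cached_system_block("stable system prompt", ttl_for_workload(True))
print(block["cache_control"])
```

The returned dict drops straight into the system list of messages.create, in place of the literal block in the Availability example.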


The cost story: 2x price x 1.45x tokens

Here is where Fable 5 actually differs. Two things push the bill up, and they multiply.

1. List price is 2x the Opus tier.

Model | Input ($/M) | Output ($/M) | Cache read ($/M)
claude-opus-4-6 / 4-8 | 5 | 25 | 0.5
claude-fable-5 | 10 | 50 | 1

2. The same text is ~45% more tokens than on 4.6 (the tokenizer shift above).

Multiply them and the same English prompt costs materially more. Measured against the identical system prompt on each model (gateway usage.cost, same single run):

Comparison | Token ratio | Price ratio | Same-prompt cost ratio (measured)
Fable 5 vs Opus 4.8 | 1.00x | 2.0x | 2.0x
Fable 5 vs Opus 4.6 | 1.45x | 2.0x | 2.9x

So against Opus 4.8 (same tokenizer), Fable 5 is a clean 2x — pure price premium. Against Opus 4.6, the tokenizer change compounds the price change into roughly 2.9x the cost for the same prompt. Your cache discount is unchanged, but the absolute base it applies to is ~2.9x larger than it was on 4.6. If you sized a per-call budget against 4.6, re-do it.
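The compounding can be checked from list prices and token counts alone. The sketch below is a back-of-envelope model, not the measurement itself: it uses the input list prices and token counts quoted in this post and ignores output tokens and caching. The constant and function names are ours.

```python
# USD per million input tokens, from the price table in this post.
INPUT_PRICE_PER_MTOK = {
    "claude-opus-4-6": 5.0,
    "claude-opus-4-8": 5.0,
    "claude-fable-5": 10.0,
}

# Input token counts measured for the identical system prompt.
MEASURED_TOKENS = {
    "claude-opus-4-6": 6614,
    "claude-opus-4-8": 9619,
    "claude-fable-5": 9619,
}


def input_cost(model: str, tokens: int) -> float:
    """Uncached input cost in USD for `tokens` input tokens on `model`."""
    return tokens * INPUT_PRICE_PER_MTOK[model] / 1_000_000


def cost_ratio(model: str, baseline: str) -> float:
    """Same-prompt input cost of `model` relative to `baseline`."""
    return (input_cost(model, MEASURED_TOKENS[model])
            / input_cost(baseline, MEASURED_TOKENS[baseline]))


for baseline in ("claude-opus-4-8", "claude-opus-4-6"):
    print(f"Fable 5 vs {baseline}: {cost_ratio('claude-fable-5', baseline):.1f}x")
```

The model gives 2.0x against Opus 4.8 and 2.9x against Opus 4.6, matching the ratios measured from usage.cost.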

A practical consequence: re-check the 1,024-token cache-eligibility floor. Anthropic only caches prefixes at or above a minimum size. A prompt that sat just under the floor on 4.6 (in old-tokenizer tokens) may clear it on Fable 5 (~45% more tokens), and any size estimate built on the old count will understate the real figure. Always read cache_creation_input_tokens / cache_read_input_tokens from the live response rather than estimating from a local tokenizer that may not match.
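One way to act on that advice is to classify each response from its usage fields instead of predicting eligibility up front. A minimal sketch, assuming the usage object has been converted to a plain dict (with the official SDK, msg.usage.model_dump() does this); cache_outcome is a hypothetical helper name:

```python
from typing import Any, Mapping


def cache_outcome(usage: Mapping[str, Any]) -> str:
    """Classify a response as a cache hit, a cache write, or not cached."""
    created = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    if read > 0:
        return "hit"
    if created > 0:
        return "write"
    # Neither counter moved: the prefix was below the floor or carried no marker.
    return "not cached"


# Illustrative usage dicts, not measured values.
print(cache_outcome({"input_tokens": 12, "cache_creation_input_tokens": 9619}))
print(cache_outcome({"input_tokens": 12, "cache_read_input_tokens": 9619}))
print(cache_outcome({"input_tokens": 900}))
```

A "not cached" result on a prompt you expected to cache is the signal to check the prefix size against the floor.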


Migration checklist (Opus → Fable 5)

  • Caching code carries over verbatim. cache_control markers, breakpoint count (up to 4), ttl: "1h", usage-field names — all identical.
  • TTL choices carry over. 5m for live/session workloads, 1h for bursty/agent-with-pauses.
  • Discount economics carry over. ~90%+ read, ~1.25x write (5m), ~2x write (1h).
  • ⚠️ Re-budget on absolute cost. Fable 5 is ~2x Opus per token, and ~2.9x the same-prompt cost vs Opus 4.6. The discount percentage is unchanged; the base it applies to is not.
  • ⚠️ Re-measure token counts if coming from 4.6 or earlier (expect ~45% more for the same text). From 4.7/4.8, expect parity.
  • ⚠️ Account for default thinking tokens. Fable 5 emits reasoning tokens by default — they bill at the output rate ($50/M). Cap or disable thinking if you don’t need it.
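For the last item, a rough way to see what default thinking costs per call is to price the thinking_tokens counter at the output rate. This is a sketch under assumptions: it reads output_tokens_details.thinking_tokens from a usage dict (the field observed in our runs), uses the $50/M output list price quoted above, and the function name is ours.

```python
from typing import Any, Mapping

OUTPUT_PRICE_PER_MTOK = 50.0  # Fable 5 output list price from this post, USD


def thinking_cost(usage: Mapping[str, Any]) -> float:
    """Estimate the USD cost of the thinking tokens reported in one response."""
    details = usage.get("output_tokens_details") or {}
    thinking_tokens = details.get("thinking_tokens") or 0
    return thinking_tokens * OUTPUT_PRICE_PER_MTOK / 1_000_000


# Illustrative figure only: 2,000 thinking tokens on a single call.
example = {"output_tokens": 2010, "output_tokens_details": {"thinking_tokens": 2000}}
print(f"${thinking_cost(example):.4f}")
```

Logging this per request shows whether thinking is a rounding error or a real share of the bill for your traffic.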

Bottom line

For a team already caching against Claude, claude-fable-5 is an easy integration: the entire caching and TTL surface is stable, so there’s nothing to relearn and no code to rewrite. It is not an easy budget swap from Opus 4.6 — between the 2x token price and the ~45% tokenizer inflation, the same prompt runs ~2.9x the cost. Confirm your numbers against the live usage object, decide whether you need the default thinking tokens, and size the cache breakpoints against the new token counts.

For the full caching playbook — prompt structure, hit-rate debugging, TTL-aware patterns — see the four-part series starting with How KV Cache & TTL Work and the working Python tutorial.


FAQ

Do I need to change my cache_control code to use Fable 5? No. The marker syntax, breakpoint limit, and TTL options are identical to the Opus line. Change the model field and nothing else in the caching path.

Did the cache read discount change on Fable 5? No. A warm read is a small single-digit fraction of the no-cache input price (~90%+ off) — we measured ~94% on Fable 5, consistent with Anthropic’s documented cached-read economics.

Does Fable 5 support the 1-hour TTL? Yes. {"type": "ephemeral", "ttl": "1h"} works exactly as on Opus. The 1-hour write costs ~2x no-cache; the 5-minute write ~1.25x. Reads stay deeply discounted on both.

Why is the same prompt so much more expensive on Fable 5 than on Opus 4.6? Two stacked multipliers: Fable 5 lists at 2x the per-token price, and the same English text counts as ~45% more tokens (it uses the post-4.6 tokenizer). Together that’s ~2.9x the cost for an identical prompt. The cache discount is unchanged.

Is Fable 5 a drop-in replacement for Opus 4.8? On the caching/TTL surface and token counts, yes — token counts are identical, so the only delta is the 2x price and Fable 5’s default thinking tokens. We don’t publish capability benchmarks we haven’t run; for quality and reasoning claims, see Anthropic’s model card.


Verification: all token-count, cost, write-premium, and read-discount figures measured against https://synthorai.io/ on 2026-06-10 using the official anthropic SDK, single tenant, single sequential run. Cost is read from the gateway usage.cost field; cross-model and premium/discount ratios are computed from those measured costs and are independent of any account-level promotion. Discount/premium ratios cross-checked against Anthropic Prompt Caching docs. Warm-read latency (TTFT) was dominated by network jitter in our run and is omitted as unreliable. Your numbers will vary with prompt, region, and load.
