11.1 When and why you might switch
Realistic triggers include:
- Cost. A new model from the same provider, or a different
provider entirely, comes in at, say, half the per-token price. At
thousands of memories the saving is modest; at millions it is
substantial. Either way, it is real.
- Quality. A newer or larger model produces measurably better
recall on your kind of facts. Embedding-model leaderboards (e.g.,
MTEB) move every few months; a model that was state of the art two
years ago has often been overtaken since.
- Provider deprecation. Your current provider announces end-of-life
for the model you embedded with. Migrating before they yank the API
is cheaper than scrambling afterward.
- Vendor diversity / data sovereignty. You decide you no
longer want your memory texts sent to a third party for
embedding, so you move to a self-hosted model (for example, a BGE
model served through Ollama or loaded with sentence-transformers).
- Local privacy / air-gap. The previous trigger taken to its limit:
the entire pipeline runs on-host, with no outbound traffic at all.
- You changed your mind about the dimension. You started with
a 1536-dim model because it was the default, and now want to drop to
768 to save disk and index memory, or move up to 3072 for higher
quality.
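The cost and dimension triggers both scale with the number of memories. The sketch below is a back-of-envelope calculator, assuming vectors are stored as float32 (4 bytes per value) and ignoring index overhead, memory text, and metadata. The function names `raw_vector_bytes` and `embedding_cost_usd` are illustrative, and the token count and per-million-token prices in the demo are hypothetical placeholders, not any provider's actual rates.

```python
# Back-of-envelope sizing for an embedding switch.
# ASSUMPTIONS (hypothetical): vectors are stored as float32, and the
# token counts and prices used in the demo are placeholders to replace
# with your own numbers.

BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_memories: int, dim: int) -> int:
    """Size of the stored vectors alone: no index overhead, no text, no metadata."""
    return num_memories * dim * BYTES_PER_FLOAT32


def embedding_cost_usd(num_memories: int, avg_tokens: int, usd_per_million_tokens: float) -> float:
    """One-time cost of (re-)embedding every memory at a flat per-token price."""
    total_tokens = num_memories * avg_tokens
    return total_tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    for n in (10_000, 1_000_000):
        # Raw vector storage at the three dimensions mentioned above.
        for dim in (768, 1536, 3072):
            mb = raw_vector_bytes(n, dim) / 1_000_000
            print(f"{n:>9,} memories x {dim:>4} dims: {mb:>9,.1f} MB raw")
        # Hypothetical: 50 tokens per memory, an old price vs. half that price.
        old = embedding_cost_usd(n, avg_tokens=50, usd_per_million_tokens=0.10)
        new = embedding_cost_usd(n, avg_tokens=50, usd_per_million_tokens=0.05)
        print(f"{n:>9,} memories: ${old:,.2f} vs ${new:,.2f} to embed everything once")
```

At a million memories, 1536 dims comes to about 6.1 GB of raw float32 vectors and 768 dims to about 3.1 GB, so halving the dimension halves raw storage; an approximate-nearest-neighbour index adds its own overhead on top of that.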
You are not changing the LLMs that use the memory store. Claude,
ChatGPT, and Gemini connect over MCP and send plain text; the server
does the embedding, so they don’t care which embedding model produced
the vectors they’re searching. The CLI’s teach subcommand uses an
independent LLM for fact extraction (separate config, separate
endpoint, separate API key) and does not affect the shape of the data
the server stores. Switching the server’s embedding provider has no
effect on the teach LLM, and vice versa.
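Why clients are insulated from the switch can be shown with a toy model. This is a hypothetical sketch, not the server’s actual code: `MemoryServer`, `keyword_embedder`, and `switch_model` are invented names, and the "embedding model" is a trivial keyword counter standing in for a real one. The point it illustrates is that only the server ever calls the embedder, for stored texts and queries alike, so swapping models (and re-embedding) changes nothing about what a client sends.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Embedder = Callable[[str], Vector]


def keyword_embedder(vocab: List[str]) -> Embedder:
    """Toy stand-in for an embedding model: one axis per vocabulary word,
    valued by how often that word appears. Dimension == len(vocab)."""
    def embed(text: str) -> Vector:
        words = text.lower().split()
        return [float(words.count(term)) for term in vocab]
    return embed


def cosine(a: Vector, b: Vector) -> float:
    # Vectors of different dimensions cannot be compared at all.
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryServer:
    """Hypothetical server that owns the only embedding model in the system."""

    def __init__(self, embed: Embedder) -> None:
        self.embed = embed
        self.rows: List[Tuple[str, Vector]] = []

    def store(self, text: str) -> None:
        # Stored text is embedded with the server's current model.
        self.rows.append((text, self.embed(text)))

    def search(self, query: str, k: int = 1) -> List[str]:
        # Clients send plain text; the query is embedded with the same
        # model as the stored rows, so the two sides always match.
        q = self.embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

    def switch_model(self, embed: Embedder) -> None:
        # Swapping models means re-embedding every stored text with the new one.
        self.embed = embed
        self.rows = [(text, embed(text)) for text, _ in self.rows]


if __name__ == "__main__":
    server = MemoryServer(keyword_embedder(["cat", "mat", "stock", "price"]))
    server.store("cat on mat")
    server.store("stock price drop")
    print(server.search("where is the cat"))  # ['cat on mat']
    server.switch_model(keyword_embedder(["cat", "stock", "drop"]))
    print(server.search("where is the cat"))  # ['cat on mat'], same client call
```

In this picture, the teach LLM would sit in front of `store()` and hand it extracted text, so it is just as unaffected by which embedder the server uses.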