1AIVault
Recall

Semantic Search

Local embeddings — via Xenova Transformers (ONNX), llama.cpp (GGUF), or a remote embedding API — so AI clients can recall memories by meaning, not just keywords.

Overview

What Semantic Search does

Keyword search misses the memory you wrote three months ago using different words. Semantic recall finds it by meaning, which is how your AI tools think about retrieval anyway.

Search, Similarity

Why it matters

The payoff for your AI memory

ONNX embedding models run locally via Xenova Transformers

GGUF embedding models run via llama.cpp for higher quality on capable machines

Remote embedding API option if you want to offload embedding computation

Semantic recall exposed to AI clients through the MCP `vault_search` tool
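
As a rough illustration of how a client might invoke `vault_search`: MCP tool calls are JSON-RPC 2.0 requests using the `tools/call` method. The sketch below only builds such a request. The argument names `query` and `limit` are assumptions made for illustration; the tool's actual input schema is not described on this page.

```python
import json


def build_vault_search_request(query: str, limit: int = 5, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request targeting `vault_search`.

    NOTE: the `query` and `limit` argument names are hypothetical; check the
    schema the server advertises via `tools/list` for the real ones.
    """
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "vault_search",
            "arguments": {"query": query, "limit": limit},
        },
    }
    return json.dumps(request)


# A natural-language description rather than a list of keywords.
payload = build_vault_search_request("notes about migrating the billing database")
```

In practice an MCP client library would normally send this message for you; the raw payload is shown only to make the shape of the call visible.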

How it works

From first launch to reusable memory

  1. Open search in the app or have a connected AI client query the vault.

  2. Enter a natural-language description of the memory you want, not just keywords.

  3. Review results ranked by conceptual similarity and open the closest match.

  4. Rephrase to steer toward a different angle of the same topic when needed.
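
To make "ranked by conceptual similarity" concrete, here is a minimal sketch of one common way to rank by meaning: compare the query's embedding to each memory's embedding with cosine similarity and sort by score. This is an assumption about the general technique, not a description of 1AIVault's internals, and the three-dimensional vectors are toy stand-ins for real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, memories, top_k=3):
    """Return the top_k (score, text) pairs, highest similarity first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy vectors; a real embedding model would produce hundreds of dimensions.
memories = [
    ("Switched the billing service to Postgres", [0.9, 0.1, 0.0]),
    ("Recipe for sourdough starter", [0.0, 0.2, 0.9]),
    ("Database migration checklist", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # stands in for "how did we move our database?"
results = rank_by_similarity(query_vec, memories, top_k=2)
```

Rephrasing the query, as in step 4, amounts to producing a different query vector, which can reorder the results even though the stored memories are unchanged.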

FAQ

Common questions

Does semantic search require a GPU?

No. The default ONNX model runs on CPU. GGUF models can use Metal on macOS or CUDA on Linux if you've set up llama.cpp accordingly.

What happens when I change embedding models?

1AIVault triggers a background backfill that re-embeds every entry with the new model. The vault stays usable while it runs; until the backfill completes, search results draw on a mix of entries embedded with the old model and entries embedded with the new one.
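
How the backfill is implemented is not documented here, but the general idea can be sketched. The example below assumes each entry records which model produced its vector, so a background worker can re-embed stale entries in small batches while the rest of the vault stays readable. The `Entry` type, `backfill_batch` function, and `fake_embed` stand-in are all hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Entry:
    text: str
    vector: List[float]
    model_id: str  # which embedding model produced `vector` (assumed field)


def backfill_batch(
    entries: List[Entry],
    embed: Callable[[str], List[float]],
    new_model_id: str,
    batch_size: int = 2,
) -> int:
    """Re-embed up to `batch_size` stale entries; return how many were updated."""
    updated = 0
    for entry in entries:
        if updated >= batch_size:
            break
        if entry.model_id != new_model_id:
            entry.vector = embed(entry.text)
            entry.model_id = new_model_id
            updated += 1
    return updated


def fake_embed(text: str) -> List[float]:
    # Stand-in for a real embedding model call.
    return [float(len(text)), 1.0]


entries = [
    Entry("alpha", [0.1], "old"),
    Entry("beta", [0.2], "old"),
    Entry("gamma", [0.3], "old"),
]

# A background worker would call this repeatedly until nothing is stale.
while backfill_batch(entries, fake_embed, "new") > 0:
    pass
```

Between batches, some entries carry old-model vectors and others carry new-model vectors, which is why results can be uneven until the last batch finishes.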

Local memory, shared everywhere

Give every AI tool the same memory.

Start free, import real conversations, and reuse your memory across every AI agent you already use.