Recall

Semantic Search

Embeddings generated locally with Xenova Transformers (ONNX) or llama.cpp (GGUF), or through a remote embedding API, so AI clients can recall memories by meaning, not just keywords.

Overview

What Semantic Search does

Keyword search misses the memory you wrote three months ago using different words. Semantic recall finds it by meaning, which is how your AI tools think about retrieval anyway.

Why it matters

The payoff for your AI memory

ONNX embedding models run locally via Xenova Transformers

GGUF embedding models run via llama.cpp for higher quality on capable machines

Remote embedding API option if you want to offload embedding work from your machine

Semantic recall exposed to AI clients through the MCP `vault_search` tool
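As a rough illustration of what an AI client might send, the sketch below builds a JSON-RPC 2.0 `tools/call` request, the message shape the Model Context Protocol uses to invoke a tool. Only the tool name `vault_search` comes from this page; the argument names `query` and `limit` are assumptions made for the example, and the actual input schema may differ.

```python
import json


def build_vault_search_request(query, request_id=1, limit=5):
    """Build an MCP tools/call request for the vault_search tool.

    The "query" and "limit" argument names are hypothetical; check the
    schema the server reports via tools/list before relying on them.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "vault_search",
            "arguments": {"query": query, "limit": limit},
        },
    }


request = build_vault_search_request("that note about retrying failed webhooks")
payload = json.dumps(request)  # serialized form sent over the MCP transport
```

In practice an MCP client library would construct and send this message for you; the sketch only shows the shape of the call.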

How it works

From first launch to reusable memory

1. Open search in the app or have a connected AI client query the vault.
2. Enter a natural-language description of the memory you want, not just keywords.
3. Review results ranked by conceptual similarity and open the closest match.
4. Rephrase to steer toward a different angle of the same topic when needed.
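To make "ranked by conceptual similarity" concrete, here is a minimal sketch that ranks entries by cosine similarity between a query vector and stored vectors. Cosine similarity is a common choice for embedding search, but this page does not say which metric the app uses, so treat it as an assumption. The three-dimensional vectors and the entry titles are invented toy data; real embedding models produce vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, entries, top_k=3):
    """Return the top_k (score, title) pairs, highest similarity first."""
    scored = [(cosine_similarity(query_vec, vec), title) for title, vec in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy data: hand-written vectors standing in for real embeddings.
entries = [
    ("Notes on database connection pooling", [0.9, 0.1, 0.0]),
    ("Recipe ideas for the weekend", [0.0, 0.2, 0.9]),
    ("Fixing Postgres timeout errors", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]  # pretend embedding of "why does my DB keep timing out"

results = rank_by_similarity(query_vec, entries, top_k=2)
titles = [title for _, title in results]
```

Rephrasing a query changes the query vector, which is why step 4 can surface a different angle of the same topic.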

FAQ

Common questions

Does semantic search require a GPU?

No. The default ONNX model runs on CPU. GGUF models can use Metal on macOS or CUDA on Linux if you've set up llama.cpp accordingly.

What happens when I change embedding models?

1AIVault triggers a background backfill that re-embeds every entry with the new model. The vault stays usable while it runs; until the backfill completes, search results reflect a mix of old and new embeddings.
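One way to picture a backfill like this is a batched loop that re-embeds only the entries still tagged with the old model. The sketch below is an illustration under stated assumptions, not 1AIVault's implementation: the entry fields (`text`, `model`, `vector`), the model names, and the stand-in `embed_with` function are all hypothetical.

```python
def embed_with(model_name, text):
    """Stand-in embedder (hypothetical); a real one would run the model."""
    return [float(len(text)), float(len(model_name))]


def backfill(entries, new_model, batch_size=2):
    """Re-embed entries not yet on new_model, one batch at a time.

    Written as a generator so the caller can do other work, such as
    serving searches, between batches. Yields the running count of
    re-embedded entries after each batch.
    """
    pending = [e for e in entries if e["model"] != new_model]
    done = 0
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for entry in batch:
            entry["vector"] = embed_with(new_model, entry["text"])
            entry["model"] = new_model
        done += len(batch)
        yield done


# Toy vault: three entries embedded with a hypothetical old model.
vault = [
    {"text": "webhook retry policy", "model": "model-a", "vector": [0.0, 0.0]},
    {"text": "standup notes", "model": "model-a", "vector": [0.0, 0.0]},
    {"text": "deploy checklist", "model": "model-a", "vector": [0.0, 0.0]},
]

progress = list(backfill(vault, "model-b"))
```

Because each entry records which model produced its vector, a search that runs mid-backfill can tell old embeddings from new ones.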

Local memory, shared everywhere

Give every AI tool the same memory.

Start free, import real conversations, and reuse your memory across every AI agent you already use.
