Memoryport gives LLMs persistent, queryable memory using Arweave for permanent storage and LanceDB for local vector search. Every conversation is stored permanently and retrieved semantically — so your AI never forgets.
Works with Claude Code, Cursor, Open WebUI, Ollama, and any OpenAI-compatible tool.
Download the latest release for your platform:
| Platform | Download |
|---|---|
| macOS (Apple Silicon & Intel) | Download .dmg |
| Linux (x64) | Download .AppImage |
The desktop app includes a setup wizard, dashboard, and manages all services automatically.
curl -fsSL https://memoryport.ai/install | shThen run the setup wizard:
uc initThat's it. Restart your editor — Memoryport auto-captures conversations and surfaces relevant context.
# Prerequisites: Rust 1.91+, protoc (brew install protobuf), Node.js 18+, pnpm
cargo build --release
cd ui && pnpm install && pnpm build # Dashboard294ms query latency at 500M tokens — brute force with 100% recall. No approximate indexing needed.
| Context Space | Chunks | p50 Latency |
|---|---|---|
| 100K tokens | 266 | 1ms |
| 1M tokens | 2,666 | 3ms |
| 10M tokens | 26,666 | 9ms |
| 100M tokens | 266,666 | 61ms |
| 500M tokens | 1,333,333 | 294ms |
Tested with nomic-embed-text (768d, local via Ollama). Compacted LanceDB, no cloud APIs required.
| Tool | Method | Setup |
|---|---|---|
| Claude Code | API Proxy | uc init configures automatically (sets ANTHROPIC_BASE_URL) |
| Cursor | API Proxy | Set ANTHROPIC_BASE_URL=http://127.0.0.1:9191 |
| Open WebUI | Ollama Proxy | Set Ollama URL to http://127.0.0.1:9191 in Settings → Connections |
| Ollama (terminal) | Ollama Proxy | OLLAMA_HOST=http://127.0.0.1:9191 ollama run llama3 |
| Continue.dev | Ollama/OpenAI Proxy | Set endpoint to http://127.0.0.1:9191 |
| Any OpenAI SDK app | API Proxy | OPENAI_BASE_URL=http://127.0.0.1:9191 |
| Claude Code (MCP) | MCP Server | uc init registers MCP automatically |
| Cursor (MCP) | MCP Server | uc init registers MCP automatically |
The proxy handles all three API formats on a single port (9191):
- Anthropic
/v1/messages - OpenAI
/v1/chat/completions - Ollama
/api/chat,/api/generate,/api/tags, and all/api/*routes
Memoryport supports two retrieval modes, configurable per-request:
User sends a message
│
▼
Proxy intercepts transparently
│
├─ Quality gating (skip greetings, commands, trivial queries)
├─ Search memory for relevant context
├─ Inject context into the message as plain text
├─ Forward to LLM (Anthropic, OpenAI, Ollama)
├─ Capture user message + assistant response
│ ├─ Sanitize (strip system prompts, internal commands)
│ ├─ Embed and store in LanceDB
│ └─ Optionally sync to Arweave (permanent storage)
└─ Return response to user
User sends a message
│
▼
Proxy injects a memory search tool into the request
│
├─ LLM decides what to search for and calls the tool
├─ Proxy executes the search, returns results to LLM
├─ LLM may search again (up to max_rounds)
├─ LLM produces final response with full context
├─ Capture and store conversation
└─ Return response to user
Multi-turn lets the model iteratively refine its memory queries — useful for complex questions that need multiple pieces of context. Toggle between modes in the dashboard Settings or via [proxy.agentic] enabled in config. See the AMP specification for the protocol details.
The Tauri desktop app includes a full dashboard. For CLI users, run the stack manually:
./dev.sh start # Build and start server + proxy + UIPages:
- Dashboard — context space, indexed chunks, session browser, semantic search
- Analytics — activity sparklines, storage growth, type/source distribution, sync status
- Integrations — toggle MCP server, API proxy, Ollama capture on/off with live status
- Settings — embedding provider, model, API key, smart gating, encryption, Arweave wallet
uc init # Interactive setup wizard
uc store "text" -t knowledge # Store a chunk
uc query "search term" # Full retrieval pipeline (gated + reranked + assembled)
uc retrieve "search" # Raw vector search (bypasses gating)
uc proxy # Start the API proxy
uc delete --tx-id <id> # Logical deletion (destroy encryption key)
uc rebuild-index -u <id> # Rebuild index from Arweave
uc status # Index stats
uc flush # Flush pending writes| Tool | Description |
|---|---|
uc_auto_store |
Silently store a conversation turn (called automatically) |
uc_store |
Store text with explicit metadata |
uc_query |
Semantic search with full retrieval pipeline |
uc_retrieve |
Raw ranked results |
uc_get_session |
Full conversation history for a session |
uc_list_sessions |
List all stored sessions |
uc_status |
System status |
~/.memoryport/uc.toml (created by uc init):
[arweave]
gateway = "https://arweave.net"
turbo_endpoint = "https://upload.ardrive.io"
# wallet_path = "~/.memoryport/wallet.json"
[index]
path = "~/.memoryport/index"
embedding_dimensions = 768
[embeddings]
provider = "ollama" # or "openai"
model = "nomic-embed-text"
dimensions = 768
[retrieval]
max_context_tokens = 50000
similarity_top_k = 50
recency_window = 20
gating_enabled = true # Three-gate system: skip greetings, route by embedding, filter low quality
# query_expansion = true # LLM generates alternative search terms
# hyde = true # Embed hypothetical answer instead of raw query
# llm_model = "gpt-4o-mini"
[encryption]
# enabled = true
# passphrase_env = "UC_MASTER_PASSPHRASE"
[proxy]
listen = "127.0.0.1:9191"crates/
├── uc-arweave/ # Arweave client (wallet, ANS-104, Turbo, GraphQL)
├── uc-embeddings/ # Embedding + LLM providers (OpenAI, Ollama)
├── uc-core/ # Core engine (chunk, index, retrieve, rerank, assemble, encrypt, gate)
├── uc-cli/ # CLI binary with setup wizard
├── uc-mcp/ # MCP server (stdio, 7 tools, 2 resources)
├── uc-proxy/ # Multi-protocol API proxy (Anthropic + OpenAI + Ollama)
├── uc-server/ # Multi-tenant hosted API server + dashboard
└── uc-tauri/ # Tauri desktop app (macOS, Linux)
ui/ # React 19 dashboard (Vite + Tailwind)
- All data on Arweave is encrypted with AES-256-GCM (per-batch random keys)
- Master key derived from passphrase via Argon2id
- Logical deletion: destroy batch key → ciphertext permanently unreadable
- Proxy sanitizes system prompts, internal commands, and meta-requests before storage
docker compose upEnvironment variables:
OPENAI_API_KEY— for embeddings (if using OpenAI)UC_ADMIN_API_KEY— admin API key for user managementUC_SERVER_LISTEN— listen address (default0.0.0.0:8080)UC_SERVER_DATA_DIR— data directory (default/var/lib/uc-server)
# Create a user
curl -X POST http://localhost:8080/admin/users \
-H "Authorization: Bearer $UC_ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{"email": "user@example.com"}'
# Returns: { "user_id": "...", "api_key": "uc_..." }
# Store context
curl -X POST http://localhost:8080/v1/store \
-H "Authorization: Bearer uc_..." \
-H "Content-Type: application/json" \
-d '{"text": "Arweave uses pay-once permanent storage", "chunk_type": "knowledge"}'
# Query
curl -X POST http://localhost:8080/v1/query \
-H "Authorization: Bearer uc_..." \
-H "Content-Type: application/json" \
-d '{"query": "How does Arweave pricing work?"}'If you lose your local data or set up on a new machine, Pro users can rebuild their memory from Arweave:
- Install Memoryport on the new machine
- Open Settings → Arweave Storage
- Enter your API key
- Click "Rebuild from Arweave"
All encrypted batches are fetched from the permanent storage network and re-indexed locally. Your encryption key never leaves your machine — data is decrypted client-side during rebuild.
Evaluated on LongMemEval, a benchmark for long-term memory in chat assistants. Tests retrieval and answer accuracy on the standard split (longmemeval_s) with ~115K token haystacks per question.
Answer Accuracy (full 500 questions, gpt-4o reader, gpt-4o-mini judge):
| Category | Accuracy | Session Recall | n |
|---|---|---|---|
| single-session-assistant | 91.1% | 87% | 56 |
| single-session-user | 60.0% | 56% | 70 |
| knowledge-update | 53.3% | 72% | 78 |
| single-session-preference | 36.7% | 53% | 30 |
| temporal-reasoning | 27.1% | 36% | 133 |
| multi-session | 27.1% | 47% | 133 |
| Overall | 43.5% | 61.1% | 500 |
Note: the full 500-question run places all questions' haystacks in a shared index (~250K chunks). In production, each user has an isolated index, which gives better retrieval quality — our 100-question runs (isolated context) consistently score 60-63%.
Session Recall (48-question oracle split, local embeddings):
| Category | Recall | n |
|---|---|---|
| knowledge-update | 100% | 8 |
| multi-session | 100% | 8 |
| single-session-user | 100% | 8 |
| single-session-assistant | 100% | 8 |
| single-session-preference | 100% | 8 |
| temporal-reasoning | 87.5% | 8 |
| Overall | 97.9% | 48 |
Key retrieval improvements validated across 41 experiments:
- Temporal fallback (retry without time filter when too few results)
- Date-enriched embeddings (prepend date to chunks before embedding)
- Date-prefixed retrieve responses (LLMs see explicit dates per chunk)
- Round-level conversation storage (user+assistant pairs as single embeddings)
- Chronological session ordering in assembled context
See tests/longmemeval/autoresearch/results.tsv for the full experiment optimization log. Autoresearch framework (tests/longmemeval/autoresearch/) enables automated experiment iteration.
| Metric | Result |
|---|---|
| Insert throughput | 25 chunks/sec (1K) → 13 chunks/sec (10K) |
| Retrieval accuracy | 91% recall@10 across 6 topic categories |
| Query latency (p50) | 265ms (1K chunks), 361ms (oracle dataset) |
| Index size | ~50MB at 10K chunks |
Prevents unnecessary retrieval on simple messages:
| Gate | What it does | Latency |
|---|---|---|
| Gate 1: Rules | Skip greetings, commands, short queries. Force memory references, temporal queries. | ~0ms |
| Gate 2: Embedding routing | Compare query embedding against "needs retrieval" vs "skip" centroids. | ~0ms (reuses existing embedding) |
| Gate 3: Quality threshold | Drop results below relevance score. | ~0ms (checks existing scores) |
Measures the latency added by the proxy in each mode using a mock upstream (50ms simulated LLM delay):
| Mode | p50 | p95 | mean | Overhead vs direct |
|---|---|---|---|---|
| Direct (no proxy) | 58ms | 61ms | 58ms | — |
| Single-turn (context injection) | 108ms | 120ms | 107ms | +49ms |
| Multi-turn (agentic loop, 1 round) | 139ms | 145ms | 139ms | +81ms |
Single-turn overhead is dominated by embedding + LanceDB search. Multi-turn adds one extra round trip to the upstream for tool execution.
Run benchmarks yourself:
# LongMemEval session recall (oracle split, fast)
python3 tests/longmemeval/run_benchmark.py --questions 50 --dataset oracle
# LongMemEval answer accuracy (standard split, requires OpenAI API key)
python3 tests/longmemeval/run_answer_accuracy.py --questions 100 --dataset s --answer-model gpt-4o
# Autoresearch optimization loop (iterates experiments overnight)
python3 tests/longmemeval/autoresearch/prepare.py --questions 100
# Stress test
python3 tests/stress/generate.py --chunks 10000
python3 tests/stress/benchmark.py
# Latency benchmark (requires mock upstream + proxy pointed at it)
python3 tests/latency/mock_upstream.py --port 8199 &
python3 tests/latency/benchmark.py --proxy http://127.0.0.1:9292 --mock http://127.0.0.1:8199How Memoryport compares to other AI memory tools:
| Memoryport | Supermemory | claude-mem | |
|---|---|---|---|
| Architecture | Local-first + permanent backup | Cloud SaaS | Local plugin |
| Language | Rust | TypeScript | TypeScript |
| Storage | LanceDB (local) + Arweave (permanent) | PostgreSQL (their cloud) | SQLite (local only) |
| Encryption | AES-256-GCM per-batch, Argon2id key | Delegated to cloud infra | None |
| Data ownership | You (local + on-chain) | Them (cloud) | You (local files) |
| Multi-tool | Proxy + MCP (Claude Code, Cursor, Ollama, OpenAI) | API-based | Claude Code only |
| Capture method | Transparent API proxy (zero-config) | Explicit API calls | Lifecycle hooks |
| Desktop app | Signed Tauri native app | None (web only) | Localhost web viewer |
| Open protocol | AMP | No | No |
| Self-hosting | Default (runs locally) | Enterprise only | Default (runs locally) |
| Scale benchmark | 500M tokens, 294ms p50 | Not published | Not published |
| Retrieval accuracy | 43.5% answer accuracy / 500q, 97.9% session recall (LongMemEval) | 84.6% answer accuracy (LongMemEval, GPT-5) | Not published |
| Permanent storage | Arweave (pay once, stored forever) | No | No |
| License | Apache-2.0 | MIT | AGPL-3.0 |
See CONTRIBUTING.md.
Apache-2.0

