GitHub - t8/memoryport: Local-first permanent, persistent memory for all agents and humans

Website · Download · AMP Spec

Memoryport gives LLMs persistent, queryable memory using Arweave for permanent storage and LanceDB for local vector search. Every conversation is stored permanently and retrieved semantically — so your AI never forgets.

Works with Claude Code, Cursor, Open WebUI, Ollama, and any OpenAI-compatible tool.

Install

Desktop App

Download the latest release for your platform:

Platform	Download
macOS (Apple Silicon & Intel)	Download .dmg
Linux (x64)	Download .AppImage

The desktop app includes a setup wizard, dashboard, and manages all services automatically.

CLI

curl -fsSL https://memoryport.ai/install | sh

Then run the setup wizard:

uc init

That's it. Restart your editor — Memoryport auto-captures conversations and surfaces relevant context.

Build from Source

# Prerequisites: Rust 1.91+, protoc (brew install protobuf), Node.js 18+, pnpm
cargo build --release
cd ui && pnpm install && pnpm build  # Dashboard

Performance

294ms query latency at 500M tokens — brute force with 100% recall. No approximate indexing needed.

Context Space	Chunks	p50 Latency
100K tokens	266	1ms
1M tokens	2,666	3ms
10M tokens	26,666	9ms
100M tokens	266,666	61ms
500M tokens	1,333,333	294ms

Tested with nomic-embed-text (768d, local via Ollama). Compacted LanceDB, no cloud APIs required.

Supported Integrations

Tool	Method	Setup
Claude Code	API Proxy	`uc init` configures automatically (sets `ANTHROPIC_BASE_URL`)
Cursor	API Proxy	Set `ANTHROPIC_BASE_URL=http://127.0.0.1:9191`
Open WebUI	Ollama Proxy	Set Ollama URL to `http://127.0.0.1:9191` in Settings → Connections
Ollama (terminal)	Ollama Proxy	`OLLAMA_HOST=http://127.0.0.1:9191 ollama run llama3`
Continue.dev	Ollama/OpenAI Proxy	Set endpoint to `http://127.0.0.1:9191`
Any OpenAI SDK app	API Proxy	`OPENAI_BASE_URL=http://127.0.0.1:9191`
Claude Code (MCP)	MCP Server	`uc init` registers MCP automatically
Cursor (MCP)	MCP Server	`uc init` registers MCP automatically

The proxy handles all three API formats on a single port (9191):

Anthropic /v1/messages
OpenAI /v1/chat/completions
Ollama /api/chat, /api/generate, /api/tags, and all /api/* routes

How It Works

Memoryport supports two retrieval modes, configurable per-request:

Single-turn (default)

User sends a message
  │
  ▼
Proxy intercepts transparently
  │
  ├─ Quality gating (skip greetings, commands, trivial queries)
  ├─ Search memory for relevant context
  ├─ Inject context into the message as plain text
  ├─ Forward to LLM (Anthropic, OpenAI, Ollama)
  ├─ Capture user message + assistant response
  │   ├─ Sanitize (strip system prompts, internal commands)
  │   ├─ Embed and store in LanceDB
  │   └─ Optionally sync to Arweave (permanent storage)
  └─ Return response to user

Multi-turn (agentic retrieval)

User sends a message
  │
  ▼
Proxy injects a memory search tool into the request
  │
  ├─ LLM decides what to search for and calls the tool
  ├─ Proxy executes the search, returns results to LLM
  ├─ LLM may search again (up to max_rounds)
  ├─ LLM produces final response with full context
  ├─ Capture and store conversation
  └─ Return response to user

Multi-turn lets the model iteratively refine its memory queries — useful for complex questions that need multiple pieces of context. Toggle between modes in the dashboard Settings or via [proxy.agentic] enabled in config. See the AMP specification for the protocol details.

Dashboard

The Tauri desktop app includes a full dashboard. For CLI users, run the stack manually:

./dev.sh start    # Build and start server + proxy + UI

Pages:

Dashboard — context space, indexed chunks, session browser, semantic search
Analytics — activity sparklines, storage growth, type/source distribution, sync status
Integrations — toggle MCP server, API proxy, Ollama capture on/off with live status
Settings — embedding provider, model, API key, smart gating, encryption, Arweave wallet

CLI

uc init                  # Interactive setup wizard
uc store "text" -t knowledge  # Store a chunk
uc query "search term"   # Full retrieval pipeline (gated + reranked + assembled)
uc retrieve "search"     # Raw vector search (bypasses gating)
uc proxy                 # Start the API proxy
uc delete --tx-id <id>   # Logical deletion (destroy encryption key)
uc rebuild-index -u <id> # Rebuild index from Arweave
uc status                # Index stats
uc flush                 # Flush pending writes

MCP Tools

Tool	Description
`uc_auto_store`	Silently store a conversation turn (called automatically)
`uc_store`	Store text with explicit metadata
`uc_query`	Semantic search with full retrieval pipeline
`uc_retrieve`	Raw ranked results
`uc_get_session`	Full conversation history for a session
`uc_list_sessions`	List all stored sessions
`uc_status`	System status

Configuration

~/.memoryport/uc.toml (created by uc init):

[arweave]
gateway = "https://arweave.net"
turbo_endpoint = "https://upload.ardrive.io"
# wallet_path = "~/.memoryport/wallet.json"

[index]
path = "~/.memoryport/index"
embedding_dimensions = 768

[embeddings]
provider = "ollama"              # or "openai"
model = "nomic-embed-text"
dimensions = 768

[retrieval]
max_context_tokens = 50000
similarity_top_k = 50
recency_window = 20
gating_enabled = true            # Three-gate system: skip greetings, route by embedding, filter low quality
# query_expansion = true         # LLM generates alternative search terms
# hyde = true                    # Embed hypothetical answer instead of raw query
# llm_model = "gpt-4o-mini"

[encryption]
# enabled = true
# passphrase_env = "UC_MASTER_PASSPHRASE"

[proxy]
listen = "127.0.0.1:9191"

Architecture

crates/
├── uc-arweave/      # Arweave client (wallet, ANS-104, Turbo, GraphQL)
├── uc-embeddings/   # Embedding + LLM providers (OpenAI, Ollama)
├── uc-core/         # Core engine (chunk, index, retrieve, rerank, assemble, encrypt, gate)
├── uc-cli/          # CLI binary with setup wizard
├── uc-mcp/          # MCP server (stdio, 7 tools, 2 resources)
├── uc-proxy/        # Multi-protocol API proxy (Anthropic + OpenAI + Ollama)
├── uc-server/       # Multi-tenant hosted API server + dashboard
└── uc-tauri/        # Tauri desktop app (macOS, Linux)

ui/                  # React 19 dashboard (Vite + Tailwind)

Security

All data on Arweave is encrypted with AES-256-GCM (per-batch random keys)
Master key derived from passphrase via Argon2id
Logical deletion: destroy batch key → ciphertext permanently unreadable
Proxy sanitizes system prompts, internal commands, and meta-requests before storage

Deployment

Docker

docker compose up

Environment variables:

OPENAI_API_KEY — for embeddings (if using OpenAI)
UC_ADMIN_API_KEY — admin API key for user management
UC_SERVER_LISTEN — listen address (default 0.0.0.0:8080)
UC_SERVER_DATA_DIR — data directory (default /var/lib/uc-server)

Hosted API

# Create a user
curl -X POST http://localhost:8080/admin/users \
  -H "Authorization: Bearer $UC_ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"email": "user@example.com"}'
# Returns: { "user_id": "...", "api_key": "uc_..." }

# Store context
curl -X POST http://localhost:8080/v1/store \
  -H "Authorization: Bearer uc_..." \
  -H "Content-Type: application/json" \
  -d '{"text": "Arweave uses pay-once permanent storage", "chunk_type": "knowledge"}'

# Query
curl -X POST http://localhost:8080/v1/query \
  -H "Authorization: Bearer uc_..." \
  -H "Content-Type: application/json" \
  -d '{"query": "How does Arweave pricing work?"}'

Data Recovery

If you lose your local data or set up on a new machine, Pro users can rebuild their memory from Arweave:

Install Memoryport on the new machine
Open Settings → Arweave Storage
Enter your API key
Click "Rebuild from Arweave"

All encrypted batches are fetched from the permanent storage network and re-indexed locally. Your encryption key never leaves your machine — data is decrypted client-side during rebuild.

Benchmarks

LongMemEval (ICLR 2025)

Evaluated on LongMemEval, a benchmark for long-term memory in chat assistants. Tests retrieval and answer accuracy on the standard split (longmemeval_s) with ~115K token haystacks per question.

Answer Accuracy (full 500 questions, gpt-4o reader, gpt-4o-mini judge):

Category	Accuracy	Session Recall	n
single-session-assistant	91.1%	87%	56
single-session-user	60.0%	56%	70
knowledge-update	53.3%	72%	78
single-session-preference	36.7%	53%	30
temporal-reasoning	27.1%	36%	133
multi-session	27.1%	47%	133
Overall	43.5%	61.1%	500

Note: the full 500-question run places all questions' haystacks in a shared index (~250K chunks). In production, each user has an isolated index, which gives better retrieval quality — our 100-question runs (isolated context) consistently score 60-63%.

Session Recall (48-question oracle split, local embeddings):

Category	Recall	n
knowledge-update	100%	8
multi-session	100%	8
single-session-user	100%	8
single-session-assistant	100%	8
single-session-preference	100%	8
temporal-reasoning	87.5%	8
Overall	97.9%	48

Key retrieval improvements validated across 41 experiments:

Temporal fallback (retry without time filter when too few results)
Date-enriched embeddings (prepend date to chunks before embedding)
Date-prefixed retrieve responses (LLMs see explicit dates per chunk)
Round-level conversation storage (user+assistant pairs as single embeddings)
Chronological session ordering in assembled context

See tests/longmemeval/autoresearch/results.tsv for the full experiment optimization log. Autoresearch framework (tests/longmemeval/autoresearch/) enables automated experiment iteration.

Stress Test (10K chunks)

Metric	Result
Insert throughput	25 chunks/sec (1K) → 13 chunks/sec (10K)
Retrieval accuracy	91% recall@10 across 6 topic categories
Query latency (p50)	265ms (1K chunks), 361ms (oracle dataset)
Index size	~50MB at 10K chunks

Three-Gate Retrieval Gating

Prevents unnecessary retrieval on simple messages:

Gate	What it does	Latency
Gate 1: Rules	Skip greetings, commands, short queries. Force memory references, temporal queries.	~0ms
Gate 2: Embedding routing	Compare query embedding against "needs retrieval" vs "skip" centroids.	~0ms (reuses existing embedding)
Gate 3: Quality threshold	Drop results below relevance score.	~0ms (checks existing scores)

Proxy Latency Overhead

Measures the latency added by the proxy in each mode using a mock upstream (50ms simulated LLM delay):

Mode	p50	p95	mean	Overhead vs direct
Direct (no proxy)	58ms	61ms	58ms	—
Single-turn (context injection)	108ms	120ms	107ms	+49ms
Multi-turn (agentic loop, 1 round)	139ms	145ms	139ms	+81ms

Single-turn overhead is dominated by embedding + LanceDB search. Multi-turn adds one extra round trip to the upstream for tool execution.

Run benchmarks yourself:

# LongMemEval session recall (oracle split, fast)
python3 tests/longmemeval/run_benchmark.py --questions 50 --dataset oracle

# LongMemEval answer accuracy (standard split, requires OpenAI API key)
python3 tests/longmemeval/run_answer_accuracy.py --questions 100 --dataset s --answer-model gpt-4o

# Autoresearch optimization loop (iterates experiments overnight)
python3 tests/longmemeval/autoresearch/prepare.py --questions 100

# Stress test
python3 tests/stress/generate.py --chunks 10000
python3 tests/stress/benchmark.py

# Latency benchmark (requires mock upstream + proxy pointed at it)
python3 tests/latency/mock_upstream.py --port 8199 &
python3 tests/latency/benchmark.py --proxy http://127.0.0.1:9292 --mock http://127.0.0.1:8199

Comparison

How Memoryport compares to other AI memory tools:

	Memoryport	Supermemory	claude-mem
Architecture	Local-first + permanent backup	Cloud SaaS	Local plugin
Language	Rust	TypeScript	TypeScript
Storage	LanceDB (local) + Arweave (permanent)	PostgreSQL (their cloud)	SQLite (local only)
Encryption	AES-256-GCM per-batch, Argon2id key	Delegated to cloud infra	None
Data ownership	You (local + on-chain)	Them (cloud)	You (local files)
Multi-tool	Proxy + MCP (Claude Code, Cursor, Ollama, OpenAI)	API-based	Claude Code only
Capture method	Transparent API proxy (zero-config)	Explicit API calls	Lifecycle hooks
Desktop app	Signed Tauri native app	None (web only)	Localhost web viewer
Open protocol	AMP	No	No
Self-hosting	Default (runs locally)	Enterprise only	Default (runs locally)
Scale benchmark	500M tokens, 294ms p50	Not published	Not published
Retrieval accuracy	43.5% answer accuracy / 500q, 97.9% session recall (LongMemEval)	84.6% answer accuracy (LongMemEval, GPT-5)	Not published
Permanent storage	Arweave (pay once, stored forever)	No	No
License	Apache-2.0	MIT	AGPL-3.0

Contributing

See CONTRIBUTING.md.

License

Apache-2.0

Name		Name	Last commit message	Last commit date
Latest commit History 102 Commits
.github		.github
.next		.next
config		config
crates		crates
docs		docs
packaging		packaging
tests		tests
ui		ui
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
TECHNICAL_SPEC.md		TECHNICAL_SPEC.md
dev.sh		dev.sh
docker-compose.yml		docker-compose.yml
install.sh		install.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Install

Desktop App

CLI

Build from Source

Performance

Supported Integrations

How It Works

Single-turn (default)

Multi-turn (agentic retrieval)

Dashboard

CLI

MCP Tools

Configuration

Architecture

Security

Deployment

Docker

Hosted API

Data Recovery

Benchmarks

LongMemEval (ICLR 2025)

Stress Test (10K chunks)

Three-Gate Retrieval Gating

Proxy Latency Overhead

Comparison

Contributing

License

About

Uh oh!

Releases 5

Packages

Uh oh!

Contributors 1

Languages

Folders and files

Latest commit

History

Repository files navigation

Install

Desktop App

CLI

Build from Source

Performance

Supported Integrations

How It Works

Single-turn (default)

Multi-turn (agentic retrieval)

Dashboard

CLI

MCP Tools

Configuration

Architecture

Security

Deployment

Docker

Hosted API

Data Recovery

Benchmarks

LongMemEval (ICLR 2025)

Stress Test (10K chunks)

Three-Gate Retrieval Gating

Proxy Latency Overhead

Comparison

Contributing

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 5

Packages 0

Uh oh!

Contributors 1

Languages

Packages