This document covers how Brain works internally: storage, indexing, sync, analytics, and the MCP server.
Brain has four layers:
flowchart TD
subgraph Interface["CLI / MCP Server"]
A["Commands parse args, call core, format output"]
end
subgraph Core["Core Layer"]
B["config.ts entry.ts index-db.ts receipts.ts"]
end
subgraph Storage
C["Git Repo\n(persistent)"]
D["SQLite DB\n(cache)"]
E["Receipt Files\n(in repo)"]
end
Interface --> Core
Core --> C
Core --> D
Core --> E
- Git repo is the source of truth for entries. It's a regular git repository that can be hosted anywhere.
- SQLite database is a local cache for fast search. It can be rebuilt from the repo at any time.
- Receipt files are stored in the git repo under
_analytics/so they sync across team members.
Entries are markdown files with YAML frontmatter, stored in two directories:
repo/
├── guides/
│ ├── k8s-deployment-guide.md
│ └── ci-pipeline-setup.md
└── skills/
└── react-testing-patterns.md
The filename (without .md) is the entry ID (slug). Slugs are generated from titles by lowercasing, removing special characters, and replacing spaces with hyphens.
Each entry file has YAML frontmatter parsed by gray-matter:
---
title: "K8s Deployment Guide" # Required
author: "alice" # Required
created: "2026-03-20T10:00:00.000Z" # Required, ISO 8601
updated: "2026-03-22T14:30:00.000Z" # Required, ISO 8601
tags: # Optional, array of strings
- kubernetes
- k8s
type: guide # Required: "guide" or "skill"
status: active # Optional: "active" (default), "stale", "archived"
summary: "Step-by-step guide..." # Optional
related_repos: # Optional
- https://github.com/team/infra
related_tools: # Optional
- kubectl
- helm
---Malformed entries (missing required fields, invalid type/status) are skipped with a warning during scanning rather than causing a hard failure.
scanEntries(repoPath) reads all .md files from guides/ and skills/, parses each with gray-matter, validates required frontmatter fields, and returns an array of Entry objects. This is called during sync, push, and index rebuild.
Brain maintains a SQLite database at ~/.brain/cache.db for fast full-text search. The database uses WAL journal mode.
CREATE TABLE entries (
id TEXT PRIMARY KEY,
title TEXT NOT NULL,
type TEXT NOT NULL,
author TEXT NOT NULL,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL,
tags TEXT, -- comma-separated
status TEXT DEFAULT 'active',
related_repos TEXT, -- comma-separated
related_tools TEXT, -- comma-separated
summary TEXT,
content TEXT NOT NULL,
file_path TEXT NOT NULL
);
CREATE VIRTUAL TABLE entries_fts USING fts5(
title, tags, content, summary,
content='entries', content_rowid='rowid'
);Three triggers keep the FTS5 index in sync with the entries table:
- After INSERT: adds the new row to FTS5
- After DELETE: removes the row from FTS5
- After UPDATE: deletes the old row and inserts the new one
rebuildIndex(db, entries) does a full rebuild:
- Deletes all rows from
entries - Runs
INSERT INTO entries_fts(entries_fts) VALUES('rebuild')to resync FTS5 - Inserts all entries in a single transaction
This runs on brain sync, brain join/brain connect, and brain push.
- User query is sanitized: FTS5 operators (
AND,OR,NOT,NEAR) are stripped, special characters removed, each term wrapped in double quotes - FTS5
MATCHquery runs withORDER BY rank(BM25 scoring) - If FTS5 fails (malformed query), falls back to
LIKEsearch ontitle,content,tags - Results are returned as
Entry[]sorted by relevance
- Zero infrastructure (no Elasticsearch, no external service)
- Fast (sub-millisecond for typical knowledge base sizes)
- BM25 relevance ranking built in
- Ships with
better-sqlite3(no native extension compilation issues)
brain sync runs git pull --ff-only and rebuilds the index.
1. Load config
2. Scan entries before pull (get IDs)
3. git pull --ff-only
4. Scan entries after pull (get IDs)
5. Diff: added = after - before, removed = before - after
6. Rebuild FTS5 index from disk
7. Update lastSync in config
Fast-forward-only pulls prevent merge conflicts. If the pull fails (divergent histories), the user needs to resolve manually in ~/.brain/repo/.
1. Read content from --file
2. Create Entry object with frontmatter
3. Write to guides/<slug>.md or skills/<slug>.md
4. git add <file> && git commit && git push
5. Rebuild FTS5 index
6. Record read receipt
Read receipts track who reads what. They are JSON files stored in the git repo so they sync across team members.
repo/_analytics/receipts/
├── 2026-03-21/
│ ├── alice-k8s-deployment-guide-a1b2c3.json
│ └── bob-k8s-deployment-guide-d4e5f6.json
└── 2026-03-23/
└── alice-ci-pipeline-setup-789abc.json
Directory names are dates (YYYY-MM-DD). File names follow the pattern {reader}-{entryId}-{random6hex}.json. The random suffix prevents collisions when the same person reads the same entry multiple times in a day.
{
"entry_id": "k8s-deployment-guide",
"reader": "bob",
"timestamp": "2026-03-23T09:15:00.000Z",
"source": "cli"
}The source field distinguishes CLI reads from MCP reads.
| Action | Source |
|---|---|
brain show <id> |
cli |
brain digest (for each displayed entry) |
cli |
brain push (for the author) |
cli |
MCP get_entry |
mcp |
MCP search_knowledge (for each result) |
mcp |
getStats(repoPath, author, period):
- Computes a cutoff date from the period string (
7d→ 7 days ago) - Lists date directories in
_analytics/receipts/that are >= cutoff - Reads all
.jsonfiles in matching directories - Filters by timestamp >= cutoff
- Groups by
entry_id, counts total reads and unique readers - Returns sorted by read count (descending)
This is a file-scan operation, not a database query. Stats are computed fresh on each call.
Receipts are committed to git and synced via brain sync. This means:
- Your own reads appear immediately in your stats
- Teammate reads appear after the next
brain sync - Stats are not real-time across the team — they are eventually consistent
This is intentional. Real-time stats would require a server. Git-based receipts give us analytics with zero infrastructure.
The MCP server (src/mcp/server.ts) uses the @modelcontextprotocol/sdk package with stdio transport.
flowchart LR
A["MCP Client\n(Claude, Copilot)"] <-->|stdio| B["Brain MCP Server"]
B --- C["Tools (5)"]
B --- D["Resources (2)"]
B --- E["Context:\nBrainConfig + SQLite DB"]
loadConfig()— reads~/.brain/config.yamlcreateIndex(dbPath)— opens/creates SQLite DB with schema- Creates
BrainMcpContextwith config and db reference - Registers 5 tools and 2 resources
- Scans entries from disk and rebuilds the FTS5 index
- Connects to stdio transport
All tools and resources receive a shared BrainMcpContext containing the config and database handle. This means:
- All tools share the same database connection
- The index is only built once at startup
- Config is loaded once and shared
On SIGINT/SIGTERM, the server closes the SQLite database before exiting. This prevents WAL file corruption that can occur if the process is killed while a write is in progress.
src/
├── index.ts ← CLI entry (Commander program, command registration)
├── types.ts ← Shared interfaces (Entry, BrainConfig, Receipt, etc.)
├── commands/ ← One file per CLI command
│ └── *.ts ← Each exports a Commander Command
├── core/ ← Business logic (no CLI or MCP dependencies)
│ ├── config.ts ← Load/save ~/.brain/config.yaml
│ ├── entry.ts ← Parse/serialize markdown entries, scan directories
│ ├── index-db.ts ← SQLite FTS5 create/rebuild/search/query
│ ├── receipts.ts ← Record and aggregate read receipts
│ └── repo.ts ← Git operations (init, join, sync)
├── mcp/ ← MCP server layer
│ ├── server.ts ← Server creation and lifecycle
│ ├── tools.ts ← Tool registrations (push, search, whats_new, get, stats)
│ └── resources.ts ← Resource registrations (digest, stats)
└── utils/ ← Pure utilities
├── git.ts ← Git wrapper (simple-git)
├── output.ts ← Terminal table formatting
├── slug.ts ← Title → slug conversion
├── time.ts ← Time window parsing, relative dates
└── url.ts ← URL validation and credential sanitization
flowchart LR
commands/ --> core/ --> utils/
mcp/ --> core/
commands/ --> utils/
Commands and MCP handlers depend on core. Core depends on utils. There are no circular dependencies. The core layer has no knowledge of CLI or MCP concerns — it is purely business logic operating on entries, config, and git.