A personal memory system for Claude Code (and how I built it)

Auto-capture everything. Search anything. Forget nothing.

Every agent conversation starts from zero. It’s called a cold start.

How often have you wished you could simply recall that thing you did or learned during a session a few weeks back?

Maybe you kept a spec, a changelog, or git history, or used Claude’s native memory system to restore context later.

But what about context from other sources?

Team or client meeting transcripts. Learnings from YouTube videos or podcasts.

Project notes. Journal entries. Insightful articles. Contacts. Documents.

The point is that loads of relevant context is scattered across different tools and formats, without a great way to surface it on demand.

This was an itch I needed to scratch.

So I built a memory system.

It’s not perfect by any means. There are many like it.

But this one is mine. And it works for me.

The system

Everything flows into one place: my Obsidian vault.

Not manually. An automated pipeline.

Here’s what feeds into it:

  • Claude Code sessions sync automatically via hooks. Every conversation gets exported as a markdown file with full metadata: session ID, date, skills used, files touched. 835 sessions and counting.
  • Meeting transcripts from Granola sync via /meetings-sync, which pulls from the local cache and backfills full speaker-attributed transcripts from the API. 273 meetings, all searchable.
  • YouTube videos and podcasts get ingested via /media-ingest. YouTube uses yt-dlp for metadata and transcripts. Podcasts get transcribed locally with Whisper on my M1 Pro. No API calls, no cost, no data leaving my machine.
  • Journal entries and notes I write directly in Obsidian.

It all lands as markdown with structured YAML frontmatter.

Same format. Same vault. Same search index.
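
To illustrate, a synced session note might start with frontmatter along these lines. The field names here are my guess at the shape, not QMD's exact schema, and the values are made up:

```yaml
---
type: session
collection: sessions
date: 2025-06-12
skills: [recall, meetings-sync]
files_touched:
  - src/auth/middleware.ts
---
```

One consistent frontmatter shape is what lets a single index treat a meeting transcript and a coding session as the same kind of thing.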

Making it searchable

Having everything in one vault is table stakes. The magic is QMD, an on-device hybrid search engine that indexes all of it.

QMD runs three local models via Metal GPU acceleration:

  • An embedding model that chunks every document and creates vector representations
  • A query expansion model that rewrites your search to catch what you meant
  • A reranking model that scores results by relevance

So I can run qmd query "that conversation about auth middleware" and get relevant results across sessions, meetings, YouTube notes, podcasts, and journal entries. Under 2 seconds. No API calls. Fully local.

Right now my index has 1,100+ files and 11,000+ embedded chunks across six collections: sessions, meetings, YouTube, podcasts, notes, and journal.

Recall: the interface

The search engine is useless if I have to leave my workflow to use it. So I built /recall, a Claude Code skill that acts as the front door to the whole system.

/recall classifies every query into one of two modes:

Temporal: “What did I work on yesterday?” reads directly from Claude Code’s native session files. No index needed. It scans the raw JSONL transcripts and gives me a table of sessions with timestamps, message counts, and the ability to expand any one.
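
The temporal path is just parsing. A sketch of the per-session summary, assuming each JSONL line has a `type` and an ISO-8601 `timestamp` (roughly the shape of Claude Code's transcript files, though exact fields vary by version):

```python
import json

def summarize_session(jsonl_lines):
    """Count user/assistant messages and find the session's time span."""
    count = 0
    timestamps = []
    for line in jsonl_lines:
        record = json.loads(line)
        if record.get("type") in ("user", "assistant"):
            count += 1
        if "timestamp" in record:
            timestamps.append(record["timestamp"])
    # ISO-8601 strings sort lexicographically, so min/max give the span
    span = (min(timestamps), max(timestamps)) if timestamps else None
    return {"messages": count, "span": span}

sample = [
    '{"type": "user", "timestamp": "2025-06-12T09:01:00Z"}',
    '{"type": "assistant", "timestamp": "2025-06-12T09:01:42Z"}',
]
summary = summarize_session(sample)
```

Run that over every transcript file in a project directory and you have the session table without touching the search index at all.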

Topic: “What did we discuss about pricing strategy?” hits QMD’s hybrid search across all six collections. BM25 keyword matching, vector similarity, and query expansion, combined in a single pass.

Two things that matter

After a few months of running this system, two insights stand out:

Auto-capture or it doesn’t exist

I built it this way because the moment I have to think about capturing something, I won’t. Sessions sync via hooks: events that fire automatically when I send a message or when Claude stops responding.
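
Wiring that up in Claude Code's hooks config looks roughly like this. The event names (`UserPromptSubmit`, `Stop`) come from Claude Code's hooks system; the script path is a placeholder of mine:

```json
{
  "hooks": {
    "UserPromptSubmit": [
      { "hooks": [{ "type": "command", "command": "~/scripts/export-session.sh" }] }
    ],
    "Stop": [
      { "hooks": [{ "type": "command", "command": "~/scripts/export-session.sh" }] }
    ]
  }
}
```

Because the export script runs on every prompt and every completion, there is no "remember to save" step anywhere in the loop.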

Meetings, media, and embeddings each have their own skill. And when I want to bring everything current at once, I run /vault-sync, which exports sessions, syncs meetings, and re-indexes the full vault in one pass.

If any of this required manual effort (copying transcripts, exporting notes, tagging files), I wouldn’t do it. Nobody would. The capture has to be invisible.

Local-first changes everything

All three models run on my machine. Every search hits a local SQLite index. No API calls, no latency, no cost per query, no data leaving my laptop.

So I actually use it. I /recall multiple times per session without thinking about rate limits or token costs. Search takes 1.7 seconds, not 10. My meeting transcripts, journal entries, and session history never touch a third-party server.

What this unlocks

When Claude Code can search across 800+ sessions, 270+ meetings, and dozens of media notes, it stops being a stateless tool.

It remembers what we discussed last Tuesday. It knows what the client said in that call. It can find the YouTube video where someone explained the pattern I’m trying to implement.

It’s not artificial memory. It’s my memory, indexed and searchable by the agent I work with every day.

The context window resets every session. The vault doesn’t.