
Your AI Conversation History Is a Gold Mine. Here's How to Extract It.

After accumulating 300+ AI conversations, I discovered they contained something valuable — and completely inaccessible. This is the framework I built to fix that.

Felipe Díaz Marín · 7 min read

At some point in the last two years, you had a genuinely useful conversation with an AI. Maybe you worked through a financial decision, a health question, a project architecture, a career dilemma. The AI helped. You nodded along. You closed the tab.

It is now completely gone.

Not deleted — still in your history somewhere. But buried under 300 other threads with titles like "help me think through this" and "quick question about X." No index. No search that returns structured facts. No way to ask "what did we decide about this?" across the entire archive.

I ran into this problem after accumulating a few hundred conversations across multiple topics — finances, health, projects, career, teaching. All of it valuable. None of it accessible. I spent several weeks trying to fix it, hit every failure mode, redesigned from scratch twice, and eventually built something that works. What follows is the framework.

Why the Obvious Approach Fails

The obvious first move — ask an LLM to summarize your conversations — produces something worthless. Not because the model fails, but because summarization is the wrong task.

Here is what a naive extraction looks like:

"The user discussed their investment portfolio and the associated tax implications. They also explored health optimization strategies and had a conversation about career positioning."

Congratulations. You now know that you talk to AI about the same things everyone talks to AI about. This tells you nothing you can act on six months from now.

The same trap catches every variation: "extract key insights," "what are the most important things from this conversation," "give me a summary of what we decided." All produce prose descriptions of what happened. None produce structured facts you can query later.

There are three additional failure modes that make this harder than it looks:

Relative time references decay immediately. "Last week I got the results back" tells you nothing a year later. Every temporal reference must be resolved to an absolute date at extraction time, or it becomes noise.

Without primary keys, facts duplicate. If the same topic appears across eight conversations over several months, a naive extraction produces eight separate records for the same information — none of them authoritative.

Volume creates false confidence. Running a script that produces 300 summaries from 300 conversations feels like progress. It is not. You have 300 descriptions of descriptions, and the actual facts are still buried.
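The first of these failure modes has a mechanical fix: resolve every relative reference against the conversation's own timestamp at extraction time. A minimal sketch, assuming you have each conversation's date from the export; the phrase-to-offset table is illustrative, not exhaustive:

```python
from datetime import date, timedelta

# Illustrative mapping from relative phrases to day offsets.
RELATIVE_OFFSETS = {
    "yesterday": 1,
    "last week": 7,
    "last month": 30,
}

def resolve_relative(phrase: str, conversation_date: date) -> date:
    """Anchor a relative time phrase to the date the conversation happened."""
    offset = RELATIVE_OFFSETS.get(phrase.lower().strip())
    if offset is None:
        raise ValueError(f"unknown relative phrase: {phrase!r}")
    return conversation_date - timedelta(days=offset)

# "Last week I got the results back", said in a conversation dated
# 2024-09-08, becomes an absolute date that still means something later.
resolved = resolve_relative("last week", date(2024, 9, 8))
```

A real pipeline would pass the phrase to the LLM with the conversation date in the prompt and have it emit absolute dates directly; the point is that the anchor date must be supplied, not inferred.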

The Core Principle

The only test that matters: can you cite a specific value from the extracted record?

Good extraction:

investment:
  asset_type: index fund
  ticker: VOO
  quantity: 12.5 shares
  cost_basis: $3,240
  acquisition_date: 2024-03-15
  platform: Fidelity
  tax_lot: open

Bad extraction: "User discussed index fund investments and portfolio strategy"

Good extraction:

health_protocol:
  supplement: Vitamin D3
  dose: 4000 IU/day
  timing: with morning meal
  started: 2024-09-01
  context: bloodwork showed deficiency (18 ng/mL)
  reviewing_at: 2025-03-01

Bad extraction: "User takes Vitamin D and discussed supplement optimization"

The difference is not about thoroughness. It is about extracting a different type of object. Descriptions are prose. Facts are structured records with typed fields. You are not summarizing conversations. You are filling schemas.
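One way to make "filling schemas" concrete is to define the record type before extracting anything, so incomplete facts fail loudly. A sketch using a Python dataclass; the field names mirror the investment example above and are assumptions, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class InvestmentFact:
    """A structured record with typed fields, not a prose description."""
    asset_type: str
    ticker: str
    quantity: float
    cost_basis_usd: float
    acquisition_date: str   # ISO 8601, resolved at extraction time
    platform: str

fact = InvestmentFact(
    asset_type="index fund",
    ticker="VOO",
    quantity=12.5,
    cost_basis_usd=3240,
    acquisition_date="2024-03-15",
    platform="Fidelity",
)
# A prose summary cannot answer "what was the cost basis?"; a record can.
```

Omitting any field raises a `TypeError` at construction time, which is exactly the failure you want: a schema that refuses to hold a vague description.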

The Framework

Phase 0: Domain Discovery

Before writing a single schema or running a single extraction, read your conversation list. Not the conversations themselves — just the titles and dates. Spend 30 minutes grouping similar ones.

Do not start with a taxonomy you invent in advance. Start with the taxonomy your actual conversations reveal. What you think your life looks like and what you actually talked to an AI about for two years are genuinely different things.

In practice, a few hundred conversations across a working professional's history typically cluster into six to ten domains. Common ones: finances, health, projects, career, learning, research, personal decisions.

The discovery step matters because it prevents you from building elaborate schemas for domains with only two conversations, and surfaces recurring themes you had not consciously recognized.
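The grouping pass can start as something as crude as keyword matching over titles. A sketch, assuming the keyword lists below are placeholders you replace after actually reading your own titles, not a taxonomy to adopt:

```python
from collections import Counter

# Hypothetical domain keywords, discovered by skimming titles.
DOMAIN_KEYWORDS = {
    "finances": ["portfolio", "tax", "invest", "budget"],
    "health": ["supplement", "bloodwork", "sleep", "protocol"],
    "career": ["interview", "promotion", "positioning"],
}

def classify_title(title: str) -> str:
    """Assign a conversation title to the first domain whose keywords match."""
    lowered = title.lower()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return domain
    return "unsorted"

titles = [
    "help me think through this portfolio rebalance",
    "quick question about Vitamin D bloodwork",
    "interview prep for ops role",
]
counts = Counter(classify_title(t) for t in titles)
```

The `unsorted` bucket is the interesting output: titles that match nothing are either a domain you have not named yet or conversations not worth extracting.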

Phase 1: Schema Design

A schema is a list of typed fields with examples, designed before extraction begins, for one domain at a time.

Two fields are non-negotiable in every schema regardless of domain: source_conversation_id and source_quote. These are your provenance chain. Without them, extracted facts become untraceable assertions within months. You will not remember where they came from, and you will not be able to audit or correct them.

{
  "asset_type": "index fund",
  "ticker": "VOO",
  "quantity": 12.5,
  "cost_basis_usd": 3240,
  "acquisition_date": "2024-03-15",
  "platform": "Fidelity",
  "source_conversation_id": "conv_2024_03_portfolio_review",
  "source_quote": "I picked up 12.5 shares of VOO at around $259 each on March 15th",
  "extraction_date": "2026-03-12"
}

Primary keys must also be defined per schema before any extraction runs. A primary key is the combination of fields that makes a record unique — for investment holdings, that might be (ticker, platform, acquisition_date). Without primary keys, the deduplication logic has nothing to operate on.
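With primary keys declared, deduplication reduces to a dictionary keyed on those fields. A minimal sketch using the key tuple from the investment example; the field names are the same assumptions as above:

```python
PRIMARY_KEY = ("ticker", "platform", "acquisition_date")

def record_key(record: dict) -> tuple:
    """The combination of field values that makes a record unique."""
    return tuple(record[field] for field in PRIMARY_KEY)

def merge(knowledge_base: dict, incoming: list[dict]) -> dict:
    """Re-extractions of the same fact land on the same key: one record, not eight."""
    for record in incoming:
        knowledge_base[record_key(record)] = record
    return knowledge_base

kb = {}
merge(kb, [
    {"ticker": "VOO", "platform": "Fidelity",
     "acquisition_date": "2024-03-15", "quantity": 12.5},
    # The same fact, extracted again from a later conversation.
    {"ticker": "VOO", "platform": "Fidelity",
     "acquisition_date": "2024-03-15", "quantity": 12.5},
])
```

The blind overwrite here is deliberately naive; Phase 3 below replaces it with explicit operations so that changes are auditable rather than silent.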

Phase 2: Extraction

With schemas defined, extraction becomes a structured prompt engineering task. The prompt does three things: provides the schema with field-level instructions, provides 2-3 example completions (few-shot), and provides the raw conversation text.

The few-shot examples are the most important part. Before prompting any LLM, do the first 5-10 extractions manually. Fill the schema yourself, field by field. This calibrates what good looks like for your specific data and reveals where your schema needs adjustment.

What to skip: greetings, LLM suggestions you did not adopt, hypothetical discussions that went nowhere, debugging exchanges with no decision outcome. Extract only confirmed decisions and explicitly stated facts.
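The three-part prompt can be assembled mechanically; only the LLM call itself is provider-specific and omitted here. A sketch, with the schema and examples as placeholders you supply from your own manual extractions:

```python
import json

def build_extraction_prompt(schema: dict, few_shot: list[dict],
                            conversation: str) -> str:
    """Schema with field instructions, then manual examples, then the raw text."""
    parts = [
        "Extract structured facts matching this schema. "
        "Skip greetings, suggestions the user did not adopt, and hypotheticals. "
        "Extract only confirmed decisions and explicitly stated facts.",
        "SCHEMA:\n" + json.dumps(schema, indent=2),
        "EXAMPLES:\n" + "\n\n".join(json.dumps(e, indent=2) for e in few_shot),
        "CONVERSATION:\n" + conversation,
    ]
    return "\n\n".join(parts)
```

Because the examples come from your own hand-filled records, the model is imitating your calibrated output rather than inventing a format.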

Phase 3: Maintenance

Define four operations for every incoming fact:

  • ADD: new record, primary key not yet in the knowledge base
  • UPDATE: existing record, a field value has changed
  • CORRECT: existing record was wrong (extraction error, not just outdated)
  • NOOP: already stored accurately, discard

Every extraction pass should produce a diff. Without an audit trail, your knowledge base drifts in ways you cannot trace.

Phase 4: Human Validation

Never write extracted facts directly to your knowledge base without a review step. Errors compound — a wrong value in one record propagates to every downstream calculation or decision that references it.

Present extracted facts in batches of 5-10. For each fact, show the extracted value alongside the source_quote. If the quote does not support the extracted value, it is a hallucination. Reject it. If you keep correcting the same field type, your schema is ambiguous — fix it.
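A cheap pre-filter before the human pass: if an extracted value's literal form never appears in the source quote, flag the record. This catches only the crudest hallucinations (paraphrased values will slip through), so it narrows the review batch rather than replacing it. A sketch:

```python
def quote_supports(record: dict, value_fields: list[str]) -> bool:
    """True if every checked value appears verbatim in the source quote."""
    quote = record["source_quote"].lower()
    return all(str(record[field]).lower() in quote for field in value_fields)

record = {
    "ticker": "VOO",
    "quantity": 12.5,
    "source_quote": "I picked up 12.5 shares of VOO at around $259 each",
}
ok = quote_supports(record, ["ticker", "quantity"])  # values match the quote
```

Records that fail the check go to the top of the review queue; records that pass still get read, just faster.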

Storage Architecture

Three layers serve different purposes:

JSON files per domain — source of truth. Version-controlled, agent-readable, programmatically queryable. This is what the extraction process writes to.

Human-readable documents — for browsing. Notion, Craft, Obsidian, whatever you actually use. Generated from the JSON layer, not maintained directly.

Vector embeddings — for semantic recall. A local vector store (Qdrant, Chroma) where each record is embedded with metadata. This enables queries like "what do I know about my supplement protocol" without knowing the exact domain or field name.

The discipline: never edit facts in the human-readable layer. Always edit the JSON, then regenerate. Otherwise the layers diverge and you lose your source of truth.
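The "edit JSON, then regenerate" discipline is a one-way render function from the source-of-truth layer to the browsing layer. A minimal sketch that turns one domain file into a markdown document; the filenames and the title-field fallbacks are illustrative:

```python
import json
from pathlib import Path

def render_domain(json_path: Path, md_path: Path) -> None:
    """Generate the human-readable layer from the JSON layer, never the reverse."""
    records = json.loads(json_path.read_text())
    lines = [f"# {json_path.stem}", ""]
    for record in records:
        # Pick a display title from whichever identifying field exists.
        title = record.get("ticker") or record.get("supplement") or "record"
        lines.append(f"## {title}")
        for field, value in record.items():
            lines.append(f"- **{field}**: {value}")
        lines.append("")
    md_path.write_text("\n".join(lines))
```

Because the markdown is always a derived artifact, deleting it costs nothing, which is the property that keeps the layers from diverging.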

Getting Started

  1. Export your conversation history. ChatGPT, Claude, and Gemini all support full history export — usually under Settings → Data Export.

  2. Read through 20-30 conversation titles. Group similar ones. What themes keep appearing? These are your domains.

  3. Pick the one domain where losing the information would cost you the most. For most people this is finances or health — where the cost of an error is highest.

  4. Design a schema for that domain. Ten to fifteen fields maximum. source_conversation_id, source_quote, and extraction_date are required on every record.

  5. Extract 5 conversations from that domain manually — fill the schema yourself, field by field. Do not use an LLM for this step. The goal is to calibrate what a correct extraction looks like before you automate anything.

  6. Use your manual extractions as few-shot examples. Now run LLM extraction on the next 20 conversations in that domain.

  7. Review every output against the source quote before writing to your knowledge base.

That is the first pass. You now have a structured, queryable record of your most important domain.

The full architecture — vector store, multiple domains, automated sync — comes later. The minimum viable version is a single JSON file you trust, with provenance you can check, covering the domain that matters most. Start there.

If you have built something similar — different domains, different schema patterns, a better maintenance protocol — I would be curious to know what your approach looked like. Get in touch.


Felipe Díaz Marín has twenty years of hospitality operations experience across Chile, Malaysia, Spain, and France. He is a lecturer in organizational leadership, marketing, and entrepreneurship at CY Cergy Paris Université, and advises hotel and F&B teams on operational transformation. Based in Paris.