
Karpathy Wants an LLM Knowledge Base Product. Here's the Missing Layer.

Andrej Karpathy described the ideal LLM knowledge base workflow — ingest, compile, query, lint. But there's no verification layer. Your wiki can be internally consistent and completely wrong.

“I think there is room here for an incredible new product instead of a hacky collection of scripts.”

— Andrej Karpathy, April 2026

Karpathy dropped a thread this week describing how he uses LLMs to build personal knowledge bases — indexing papers, articles, and repos into a raw directory, then having an LLM compile a wiki of markdown files. He uses Obsidian as the frontend. He runs “health checks” to lint the wiki. He built a custom search engine over it.

The workflow is genuinely good. But there's a gap in it that matters more than anything else he described: nothing verifies whether the compiled knowledge is actually true.

The Verification Gap

Karpathy's pipeline: ingest raw sources, compile a wiki, query it, lint it, enhance it. Each step is LLM-powered. But every step inherits the same blind spot — the LLM that compiles the wiki can misinterpret a paper. The LLM that lints it can't tell if the underlying data is outdated. The LLM that answers questions synthesizes confidently from potentially wrong summaries.

His “linting” step finds inconsistencies within the wiki. That's groundedness — does the wiki agree with itself? What it doesn't do is check whether the wiki agrees with reality.

Wiki article: “Transformer attention is O(n) with FlashAttention v3”

LLM lint: Consistent with 3 other wiki articles. PASS.

Reality: FlashAttention is IO-aware but still O(n²) in compute.

The wiki is internally consistent. It's also wrong. At ~400K words and ~100 articles, errors like this compound silently.

What the “Incredible Product” Needs

Karpathy laid out the requirements implicitly:

  1. Document ingestion — upload raw sources (papers, articles, repos, internal docs)
  2. Compiled knowledge — structured, searchable, interlinked
  3. Q&A — ask complex questions, get researched answers
  4. Linting / health checks — find inconsistencies, impute missing data
  5. Verification — check compiled knowledge against external reality

Items 1–4 are what he built with scripts. Item 5 is the missing layer.

Upload Your Docs. Shield the Output.

VEROQ's Knowledge Base lets you upload your documents — papers, internal reports, research notes, whatever your corpus is — and then verify any LLM output against them and against external evidence in a single call.

Step 1: Upload your knowledge

python
from veroq import Veroq

client = Veroq()

# Upload your compiled wiki, papers, internal docs
client.knowledge.upload(
    file="research-wiki/transformer-attention.md",
    collection="ml-research",
    chunk_strategy="semantic"  # auto-chunks by topic
)

Step 2: Shield any LLM output against your docs + reality

python
result = client.shield(
    text=llm_response,
    source="claude-opus-4-6",
    knowledge_base="ml-research"  # verify against your uploaded docs
)

# Two layers of verification:
# 1. Groundedness — does it match YOUR documents?
# 2. Factual — does it match external evidence?
for claim in result.claims:
    print(f"{claim.text}")
    print(f"  Your docs:  {claim.groundedness}")  # supported | contradicted
    print(f"  External:   {claim.verdict}")        # supported | contradicted
    if claim.correction:
        print(f"  Fix:        {claim.correction}")

Applied to Karpathy's Workflow

Here's how this maps to each step he described:

Data Ingest

Upload raw sources to a Knowledge Base collection. Semantic chunking handles papers, markdown, code. No manual directory management.
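
For a whole directory of sources, a minimal ingestion loop might look like this. The glob pattern and collection name are illustrative; the upload call is the same one shown in Step 1:

python
from pathlib import Path

from veroq import Veroq

client = Veroq()

# Upload every markdown source in the raw directory (paths are illustrative)
for path in Path("research-wiki").glob("**/*.md"):
    client.knowledge.upload(
        file=str(path),
        collection="ml-research",
        chunk_strategy="semantic",
    )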

Q&A

Search your knowledge base with knowledge.search(). Ask your LLM the question. Then shield() the answer to verify it against both your docs and external evidence.
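
Sketched out, the loop is search, generate, shield. The knowledge.search() parameters and the llm client here are assumptions, not verbatim VEROQ API:

python
question = "Is attention in FlashAttention sub-quadratic?"

# Exact search parameters are assumed; `llm` is your own model client
hits = client.knowledge.search(query=question, collection="ml-research")
answer = llm.generate(question, context=hits)

# Verify the answer against your docs and external evidence
result = client.shield(text=answer, knowledge_base="ml-research")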

Linting

Run shield() over every article in your wiki. Claims contradicted by external evidence get flagged with corrections. Claims unsupported by your own docs get flagged as ungrounded.
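
As a sketch, a wiki-wide lint is just shield() in a loop. The file layout is a placeholder; the claim fields match the ones shown in Step 2:

python
from pathlib import Path

# Shield every article; flag claims that fail either verification layer
for article in Path("research-wiki").glob("**/*.md"):
    result = client.shield(
        text=article.read_text(),
        knowledge_base="ml-research",
    )
    for claim in result.claims:
        if claim.groundedness != "supported" or claim.verdict != "supported":
            print(f"{article}: {claim.text}")
            if claim.correction:
                print(f"  suggested fix: {claim.correction}")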

Output Verification

Before filing LLM-generated reports back into the wiki, verify them. Only verified claims get committed to the knowledge base. Errors get corrected before they compound.

The Compounding Error Problem

Karpathy notes that his explorations “always add up in the knowledge base” — LLM answers get filed back into the wiki to enhance it for further queries.

This is powerful and dangerous. If an early answer contains a subtle error, every subsequent query that touches that article inherits it. The error gets cited, reinforced, and eventually treated as established fact within the wiki.

Verification at the write boundary is the fix. Before any LLM output gets committed back into the knowledge base:

python
# Before filing an LLM answer back into the wiki
answer = llm.generate(question, context=wiki_articles)
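# (llm, save_to_wiki, flag_for_review are placeholders for your own stack)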

# Verify before committing
result = client.shield(
    text=answer,
    knowledge_base="ml-research"
)

if result.trust_score >= 0.85:
    # Safe to add to wiki
    save_to_wiki(answer)
else:
    # Flag for review — don't let errors compound
    flag_for_review(answer, result.claims)

The Ephemeral Wiki

The most interesting part of Karpathy's post is the extrapolation at the end:

“You could imagine that every question to a frontier grade LLM spawns a team of LLMs to automate the whole thing: iteratively construct an entire ephemeral wiki, lint it, loop a few times, then write a full report.”

This is exactly what VEROQ's Verified Swarm does. Five specialized agents construct a verified answer: corpus search, web research, evidence scoring, synthesis, and verification. Each agent's output is checked before the next agent uses it. The final output comes with a trust score, evidence chain, and verification receipt.

python
# The "ephemeral wiki" pattern — one call
result = client.swarm.run(
    query="What are the current limitations of linear attention?",
    knowledge_base="ml-research",  # ground in your docs
    depth="deep"                   # 5-agent verified pipeline
)

# result.output — verified synthesis
# result.evidence — source chain with reliability scores
# result.receipt — shareable verification proof
# result.trust_score — 0.0 to 1.0

From Scripts to API

Karpathy's setup works for a researcher with the skills to vibe-code a search engine and wire up Obsidian plugins. The product version replaces each piece of that setup with an API call:

Karpathy's Scripts → API

  • Manual directory of .md files → knowledge.upload() with semantic chunking
  • Custom search engine (vibe-coded) → knowledge.search(), full-text + semantic
  • LLM linting for consistency → shield(), groundedness + factual verification
  • Obsidian as viewer → any frontend (JSON responses, not files)
  • No verification layer → two-layer verification on every query
  • Single-user local → an API that works in CI/CD, agents, pipelines

Try It

Upload a collection of documents. Ask a question. Shield the answer. The whole loop takes three API calls.

bash
pip install veroq
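
Then, end to end. The same caveats as above apply: the knowledge.search() parameters and the llm client are assumptions:

python
from veroq import Veroq

client = Veroq()

# 1. Upload (file path and collection name are placeholders)
client.knowledge.upload(
    file="notes/linear-attention.md",
    collection="ml-research",
    chunk_strategy="semantic",
)

# 2. Ask: search your docs, answer with your own LLM client
question = "What are the current limitations of linear attention?"
hits = client.knowledge.search(query=question, collection="ml-research")
answer = llm.generate(question, context=hits)

# 3. Shield the answer
result = client.shield(text=answer, knowledge_base="ml-research")
print(result.trust_score)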