Part IV: External-Brain Tutorial — Build Your Own Research OS

Chapter 10: Starting Without Your Own LLM — Claude Code/Codex + Obsidian

Written: 2026-05-22 · Last updated: 2026-05-22

10.1 You Can Start Without a GPU

This chapter's readers share one constraint: you do not have your own LLM infrastructure. No GPU cluster, no in-house model hosting, no enterprise OpenAI account on the company's card. You are a first-year PhD, a one-person R&D cell in industry, or a domain expert with thin ML infrastructure. (Readers of Part II already know this — all six LLM Wiki OSS implementations in the April-2026 cohort assume a SaaS LLM. That is not a disclaimer; it is an entry point.)

Two pieces of good news. First, the LLM Wiki pattern Karpathy released in April 2026 is closer to a file convention than to a model ^[6]. Its artifacts (raw markdown, agent-instruction files such as AGENTS.md / CLAUDE.md, and a git diff workflow) survive when the model changes. Second, the entry cost of Claude Code and Codex is roughly 1/100 of self-hosted GPU infrastructure, and roughly 1/1000 of a large multi-agent run such as AAR ^[2]; §10.2 gives the numbers. Karpathy himself says it in the gist: "copy-paste this idea file into your favorite LLM agent, e.g. Codex, Claude Code, OpenCode, Pi" ^[6].

The four-layer taxonomy in (Chapter 2) named L1 LLM Wiki as the foundational layer to build first. The 6-level maturity ladder in (Chapter 3) puts L2 (LLM Wiki) as the first level where knowledge starts to accumulate rather than evaporate after the chat window closes. The three chapters of Part IV walk that road step by step. This chapter is Day 1: from dropping a single PDF in five minutes, to a vault skeleton in thirty, to a research repo with seven hook-enforced rules within a week.

The predecessor survey From Claude Code to Codex (hereafter C2C-IV) covered similar territory in shorter form ^[12]. That book's Ch10 introduced the Karpathy pattern itself; Ch11 showed the Obsidian operation. This book, a month later, rewrites the same material at tutorial depth. Six OSS implementations doubled in number during that month, Codex 0.128 shipped the /goal command ^[8], and Claude Code's subagent/hook/skill/MCP foursome settled into a recognizable operational pattern ^[2]. This chapter compresses that month-newer material into onboarding-ready form.

10.2 Installation — The Smallest Toolchain

Figure 10.3: Obsidian vault skeleton — Drafts/, raw/, wiki/, agents/, decisions/, glossary/ folder tree — illustration by author (gpt-image assisted)

Tools first. What do you install?

1. Obsidian: plain-text markdown vault editor, free. Backlinks and graph view at the cursor are the decisive difference from a generic text editor. Download it from obsidian.md; on first launch, choose "Create new vault" and point it at a folder.

2. Claude Code (Anthropic CLI): terminal-based agent. Run npm install -g @anthropic-ai/claude-code, then claude. First launch asks for an API key (create one at console.anthropic.com; a billing card is required, but usage is pay-per-use, and onboarding fits in $10-30/month). It exposes all four primitives: subagent, hook, skill, MCP ^[2].

3. Codex CLI (OpenAI): terminal-based agent. Run npm install -g @openai/codex, then codex. Version 0.128.0 onward ships the /goal long-horizon command in stable form, which makes "queue a 5-hour ingest before bed" a real workflow ^[8]. Tecton & Tide's six-hour run report confirms /goal survives a 5-hour pause ^[11].

4. git: almost certainly already installed. If not, run brew install git (macOS) or sudo apt install git (Ubuntu).

Right after installation, the smallest vault skeleton (skip the Obsidian wizard, do it from the shell):


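# Brace expansion below needs bash or zsh
mkdir -p ~/research-vault/{raw/{papers,daily-notes,inbox},wiki/{concepts,claims,open-questions,dead-ends},agents,logs}
cd ~/research-vault
touch AGENTS.md CLAUDE.md README.md log.md TODO.md
# git does not track empty directories; placeholder files keep the skeleton in the first commit
find . -type d -empty -exec touch {}/.gitkeep \;
git init && git add -A && git commit -m "init: empty research vault"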

That is the smallest functioning LLM Wiki. Two directories more than Karpathy's original — dead-ends/ (G14 negative-result capture) and inbox/ (holding pen for un-triaged sources). Both are justified in (Chapter 6) as wiki-rot defenses.

Cost transparency (G10): the above environment costs roughly $15-25/month for Anthropic API + $10-20 for OpenAI API + $0 for Obsidian + negligible disk. About 1/100 the cost of self-hosted GPU, in the territory of a junior researcher's monthly coffee budget. Compare against AAR's $18k for a 5-day 9-instance Opus 4.6 run ^[2]: this chapter's pattern is 1/1000 of that — not a smaller AAR, but the base layer everyone can start from before AAR is feasible.

Figure 10.1: The four-tool stack for starting without your own LLM — Obsidian (vault editor) + Claude Code/Codex (CLI agents) + git (change history). Every artifact lives as markdown on local disk.

10.3 Minimum Vault — The Skeleton Validated in Part II

(Chapter 6) proposed a directory schema designed against wiki rot. Simplified for onboarding, that prescriptive schema reduces to the following tree.


research-vault/
├── raw/                         # L1: original sources, immutable
│   ├── papers/                  # PDFs, arXiv exports
│   ├── daily-notes/             # daily notes (handwritten by you)
│   └── inbox/                   # un-triaged sources
│
├── wiki/                        # L2: maintained by the LLM
│   ├── concepts/                # technical concepts and terms
│   ├── claims/                  # source-anchored claims (claim schema enforced)
│   ├── open-questions/          # unresolved questions
│   └── dead-ends/               # refuted hypotheses (G14)
│
├── agents/                      # specialist agent configs
│   ├── literature-reviewer.md
│   └── statistician.md
│
├── logs/                        # ingest/query/lint changes
│   └── 2026-05-22.md
│
├── AGENTS.md                    # L3: instructions for Codex
├── CLAUDE.md                    # L3: instructions for Claude Code
├── README.md                    # for the human reader
├── log.md                       # updated at session close
└── TODO.md                      # unfinished work

As (Chapter 4) showed, Karpathy's original works with only raw/ and wiki/ ^[6]. The structure above adds four research-specific directories — claims/, open-questions/, dead-ends/, logs/. They earn their place only in research contexts; personal PKM does not need them. As Wenhao Yu's Zettelkasten-perspective review points out, too many directories are themselves a wiki-rot cause ^[14]. The structure above is at the upper edge of what a single human can hold in their head.

claim page minimum fields (the (Chapter 6) schema):


---
type: claim
source: raw/papers/karpathy2026-autoresearch.pdf
confidence: medium
scope: nanochat-scale GPT-2 training
contradicts: []
---

# Claim: autoresearch reduces Time-to-GPT-2 by 11%

**Evidence**: Karpathy reports 700 experiments / 2 days / 11% reduction
in Time-to-GPT-2 metric on nanochat training loop ^[6].

**Verified by**: bswen2026autoresearch700 independent verification ^[16].

**Scope**: nanochat-scale (single-node GPU), GPT-2 architecture only.

**Open question**: does the 11% transfer to GPT-4-scale training?
→ see wiki/open-questions/autoresearch-scaling.md

This schema is what blocks unsourced claims from entering the wiki. The first hook in (Chapter 6)'s table — citation check — is the gate that enforces this format.
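The schema check can also be run outside git, for example during a weekly lint. The following is a minimal sketch, not the (Chapter 6) implementation: it assumes the front matter is the flat `key: value` block shown above, and the function names (`parse_front_matter`, `claim_problems`, `check_claims`) are hypothetical.

```python
from pathlib import Path

# Field names taken from the claim page example above.
REQUIRED_FIELDS = ("type", "source", "confidence", "scope", "contradicts")


def parse_front_matter(text: str) -> dict[str, str]:
    """Parse the flat `key: value` block between the first two `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing '---': treat as missing front matter


def claim_problems(text: str) -> list[str]:
    """Return human-readable problems; an empty list means the page passes."""
    fields = parse_front_matter(text)
    if not fields:
        return ["missing or unterminated front matter"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in fields]
    if fields.get("type") not in (None, "claim"):
        problems.append("type must be 'claim'")
    if "source" in fields and not fields["source"]:
        problems.append("source is empty")
    return problems


def check_claims(vault: Path) -> dict[str, list[str]]:
    """Map each failing page under wiki/claims/ to its list of problems."""
    report: dict[str, list[str]] = {}
    for page in sorted((vault / "wiki" / "claims").glob("*.md")):
        problems = claim_problems(page.read_text(encoding="utf-8"))
        if problems:
            report[page.name] = problems
    return report
```

Calling `check_claims(Path.home() / "research-vault")` returns an empty dict when every claim page carries all five fields. A real vault with nested YAML would need a proper YAML parser instead of the line-based parsing assumed here.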

10.4 Writing Your First AGENTS.md / CLAUDE.md

Figure 10.4: CLAUDE.md sample — Think Before Coding, Simplicity First, Surgical Changes, Goal-Driven Execution — illustration by author (gpt-image assisted)

Next, the agent needs to know what your vault looks like. The template below is the smallest working instruction file.

AGENTS.md and CLAUDE.md carry near-identical content; two files exist only because Codex and Claude Code each look for their own filename by convention ^[2]. (See §10.7 on cross-vendor portability.)


# Research Vault — Agent Instructions

## Research Rules (Mandatory)

1. Every wiki/claims/ page must have a source ID.
   No source ID, no commit.
2. Never modify raw/. New info goes to wiki/ or inbox/ only.
3. Run unit tests or dry-run before executing experiment code.
4. Send only summaries of internal data to external models.
   No raw patient/customer/revenue rows in prompts.
5. Wet-lab and robot commands need explicit human approval.
6. When results differ from expectations, log 3 likely causes and
   3 follow-up experiments in wiki/dead-ends/.
7. Update report.md and log.md at the end of every session.

## Vault Structure

- `raw/papers/`: PDFs, READ ONLY
- `raw/daily-notes/`: daily notes, READ ONLY
- `raw/inbox/`: triage queue, MOVE-OUT-ONLY (read and clear)
- `wiki/concepts/`: concept pages, EDIT OK
- `wiki/claims/`: claim-schema-enforced, EDIT OK
- `wiki/open-questions/`: unresolved questions, EDIT OK
- `wiki/dead-ends/`: refuted hypotheses, APPEND ONLY (never delete)
- `logs/YYYY-MM-DD.md`: session logs, APPEND ONLY

## Standard Operations

"ingest [paper]" → read raw/papers/[paper], create wiki/claims/, wiki/concepts/ pages
"discuss [topic]" → search wiki/, synthesize answer with citations
"lint" → check for orphan links, missing sources, contradictions
"weekly review" → list pages modified this week, flag low-confidence claims

The seven rules come from §9.1 of the ChatGPT seed. Rules 5 and 6 carry the most weight. Rule 5 enforces the wet-lab safety pattern from (Chapter 9)'s RoboChem-Flex; rule 6 enforces the G14 negative-result capture from (Chapter 6).
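Because the two instruction files are meant to carry near-identical content, they tend to drift once one of them is edited by hand. A small drift check is sketched below. Treating AGENTS.md as the canonical copy is an assumption made for illustration, and `instruction_drift` is a hypothetical helper name.

```python
import difflib
from pathlib import Path


def instruction_drift(vault: Path, canonical: str = "AGENTS.md",
                      mirror: str = "CLAUDE.md") -> list[str]:
    """Return unified-diff lines between the two files (empty list if identical)."""
    a = (vault / canonical).read_text(encoding="utf-8").splitlines()
    b = (vault / mirror).read_text(encoding="utf-8").splitlines()
    return list(difflib.unified_diff(a, b, fromfile=canonical, tofile=mirror, lineterm=""))
```

An empty result means the files match. Any non-empty result can be printed at session close so the difference is either copied across or recorded as intentional.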

10.5 Drop Your First PDF — Five-Minute Workflow

Enough theory. What do you actually do? Here is the five-minute scenario for dropping your first paper.


# 1. Download a PDF from arXiv
cd ~/research-vault
curl -L https://arxiv.org/pdf/2408.06292 -o raw/papers/sakana2024-aiscientist.pdf

# 2. Launch Claude Code (or codex)
claude

Inside Claude Code:


> ingest raw/papers/sakana2024-aiscientist.pdf

Within five minutes, the following files appear (this is a real trace from the author's vault running the same command):

wiki/claims/sakana-end-to-end-loop.md — "Sakana v1 automates ideation→experiment→paper→review end-to-end ^[17]"
wiki/concepts/agentic-tree-search.md — explanation of the tree-search variant Sakana v2 introduced
wiki/open-questions/sakana-novelty-validity.md — "Is Sakana's novelty assessment reliable? → see Schmidgall 2025 critique"
logs/2026-05-22.md — "ingested sakana2024-aiscientist.pdf, generated 3 wiki pages, 1 open question"

This is the smallest functioning unit of an LLM Wiki. One ingest produces four interconnected pages. The links are expressed as [[wiki/open-questions/sakana-novelty-validity]] wikilinks. Obsidian draws the graph view in real time.

Important: do not trust the four pages outright. As (Chapter 11) will show, terry's four months of operation and Aimaker's four-month longitudinal report ^[15] agree on one number: roughly 30% of pages ingested in the first week get rewritten, merged, or deleted in the first lint sweep a month later. That is not failure; it is normal wiki maintenance. Run lint once a week in claude or codex — that maintenance loop is itself the defense against wiki rot.
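Part of the weekly lint can run deterministically, without an agent. The sketch below covers only the link side of lint: it reports wikilinks whose target page does not exist and wiki pages that nothing links to. It assumes links use the vault-relative form shown in §10.5 (`[[wiki/...]]`, optionally with `|alias` or `#heading`), and `lint_links` is a hypothetical name. Contradiction detection is left to the agent.

```python
import re
from pathlib import Path

# Capture the link target, stopping before an alias (|) or heading anchor (#).
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def lint_links(vault: Path) -> dict[str, list[str]]:
    """Report wikilinks with no target page and wiki pages nobody links to."""
    pages = {p.relative_to(vault).with_suffix("").as_posix(): p
             for p in (vault / "wiki").rglob("*.md")}
    by_stem = {Path(name).name: name for name in pages}  # allow short [[page-name]] links
    linked: set[str] = set()
    broken: list[str] = []
    for name, path in sorted(pages.items()):
        for target in WIKILINK.findall(path.read_text(encoding="utf-8")):
            target = target.strip()
            resolved = target if target in pages else by_stem.get(target)
            if resolved is None:
                broken.append(f"{name} -> {target}")
            else:
                linked.add(resolved)
    return {"broken": broken, "orphans": sorted(set(pages) - linked)}
```

Pages listed under `orphans` are candidates for the merge-or-delete decisions described above, not automatic deletions.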

10.6 Subagents · Skills · Hooks · MCP — Four Mechanisms for Research

Claude Code and Codex are more than chat interfaces. Four mechanisms map onto distinct research use cases ^[2].

Subagent — task-specific specialist

A specialized assistant with its own context window, system prompt, and tool permissions. Four examples:


agents/literature-reviewer.md
agents/statistician.md
agents/safety-reviewer.md
agents/paper-formatter.md

literature-reviewer is allowed to call PubMed/arXiv MCP and write to wiki/ only. statistician runs numpy/scipy and writes to results/. The reason to separate them is context hygiene — keeping PubMed hits from polluting a statistical analysis mid-flight. C2C-IV Ch10 covered this in a paragraph ^[12]; a month of operation suggests that each subagent owning its own short instruction file is more stable than several subagents sharing one large file.

Skill — reusable workflow package

A bundle of instructions, scripts, and resources for a particular task. Examples: paper-ingest, claim-audit, weekly-review. Astro-Han's karpathy-llm-wiki is the cleanest example of the format — a Claude Code/Cursor/Codex-compatible skill manifest that packages the raw→wiki→query→lint workflow into one install ^[3]. terry's own vault uses /post, /paper, /paper-search, /survey as skill commands (see (Chapter 11)).

Hook — deterministic gate

A deterministic shell command that runs at a specific point in the agent lifecycle. The seven hooks listed in (Chapter 6) are enforced by exactly this mechanism.

Hook	Function	Implementation hint
citation check	Refuse commits when `wiki/claims/*.md` lacks a source ID	pre-commit hook: `[ -n "$(grep -L '^source:' wiki/claims/*.md)" ] && exit 1`
raw immutability	Block modifications to `raw/`	git pre-commit: `git diff --cached --name-only --diff-filter=MD -- raw/ | grep -q . && { echo "raw is immutable"; exit 1; }`
test-before-run	Require tests before experiment code	tool-call hook: when agent runs `python experiment.py`, run `pytest tests/` first
data boundary	Block sensitive data leaving the boundary	API call hook: grep prompt content for patient/customer ID patterns
robot safety	Approval gate on equipment commands	tool-call hook: stdin-approval before MCP tools matching equipment domain
report sync	Confirm report.md/log.md updated after results	post-tool hook: check both files' mtime is inside the current session
negative-result capture	Auto-log refuted hypotheses to dead-ends/	post-experiment hook: if exit code ≠ 0, create a stub in `wiki/dead-ends/`

The last hook is the prescriptive G14 schema slot proposed in (Chapter 6). No OSS implementation in the corpus has this yet — it is one of the prescriptive contributions of this book.
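Since no implementation exists to copy from, here is one possible shape for the negative-result hook, written as a wrapper around the experiment command. It is a sketch under assumptions: the stub file name, the stub layout, and the function name `run_and_capture` are all hypothetical. It follows the table's trigger (a non-zero exit code) and leaves blank slots for the three causes and three follow-ups that rule 6 asks for.

```python
import shlex
import subprocess
from datetime import date
from pathlib import Path

# Hypothetical stub layout; adapt the sections to your own dead-ends schema.
STUB_TEMPLATE = """\
## {day}: `{command}` exited with code {code}

Last stderr lines:

{stderr}

Likely causes (fill in three):
1.
2.
3.

Follow-up experiments (fill in three):
1.
2.
3.

"""


def run_and_capture(cmd: list[str], vault: Path) -> int:
    """Run an experiment; on a non-zero exit code, append a stub under wiki/dead-ends/."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        dead_ends = vault / "wiki" / "dead-ends"
        dead_ends.mkdir(parents=True, exist_ok=True)
        stub = dead_ends / f"{date.today().isoformat()}-failed-runs.md"
        tail = "\n".join(result.stderr.splitlines()[-5:]) or "(no stderr output)"
        entry = STUB_TEMPLATE.format(
            day=date.today().isoformat(),
            command=shlex.join(cmd),
            code=result.returncode,
            stderr=tail,
        )
        with stub.open("a", encoding="utf-8") as fh:  # append only, never overwrite
            fh.write(entry)
    return result.returncode
```

A crash is not the same thing as a refuted hypothesis, so the stub is only a prompt: the human or agent still has to decide whether the entry stays in dead-ends/ or gets replaced by a bug fix.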

MCP — Model Context Protocol

The standardized interface that connects Claude/Codex to PubMed, internal databases, ELNs, equipment APIs. Examples:

pubmed-mcp — search and fetch abstracts
arxiv-mcp — arXiv metadata
obsidian-mcp — vault-internal search (ekadetov's LLM Wiki package wires this in automatically ^[4])
internal PostgreSQL — query internal experiment logs (paired with the data-boundary hook)
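The data-boundary hook paired with the internal database can start as a plain pattern scan over the outgoing prompt. The sketch below assumes made-up identifier formats (`PT-` and `CUST-` prefixes) purely for illustration; the patterns and the names `boundary_violations` and `guard_prompt` are hypothetical and need replacing with whatever your own data uses.

```python
import re

# Hypothetical identifier formats; replace with the patterns your own data uses.
SENSITIVE_PATTERNS = {
    "patient_id": re.compile(r"\bPT-\d{6}\b"),
    "customer_id": re.compile(r"\bCUST-\d{5,}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def boundary_violations(prompt: str) -> list[str]:
    """Return the names of sensitive patterns found in an outgoing prompt."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(prompt)]


def guard_prompt(prompt: str) -> str:
    """Raise before the prompt leaves the machine if it contains sensitive identifiers."""
    hits = boundary_violations(prompt)
    if hits:
        raise ValueError(f"data boundary: prompt contains {', '.join(hits)}")
    return prompt
```

A pattern scan catches identifiers, not meaning, so it complements rule 4 (send summaries only) rather than replacing it.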

MCP is also the substrate for paper-to-agent (Chapter 8). Stanford Paper2Agent auto-converts paper code into MCP servers ^[10]; that is the L3 transition in (Chapter 12).

Figure 10.2: The four mechanisms (Subagent / Skill / Hook / MCP) and how they bind to the LLM Wiki — subagents get partial vault permissions, skills package reusable workflows, hooks are deterministic gates, MCP connects external data and tools.

10.7 Claude Code vs Codex — An Honest Take on Host Portability

The most common question at this point: "do I learn Claude Code first, or Codex first?"

The honest answer: the files port, but the agent loops do not (G13). There are three layers:

Layer	Claude Code	Codex 0.128	Portability
File convention	`CLAUDE.md`, `.claude/agents/*.md`, vault markdown	`AGENTS.md`, `.codex/agents/*.toml`, vault markdown	clean — markdown reads on both
Slash commands	`/agents`, `/skills`, `/hooks`	`/goal`, `/init`, `/exec`	partial — `/agents` has no direct dual; `/goal` differs from Claude's long-horizon pattern
Agent loop primitives	subagent (separate context), hook (event-driven), skill	`/goal` (persistent), worktrees (parallel), permission profiles	not portable — running the same workflow on both requires redesign

C2C-IV's conclusion ^[12] holds a month later: as long as knowledge sits in AGENTS.md, HANDOFF.md, TODO.md plain-text rather than inside CLAUDE.md or subagent memory, Claude Code ↔ Codex transitions are possible. But moving a 9-subagent workflow onto Codex /goal is a lossy conversion. The auto-translation tools claude2codex and cc2codex say this explicitly — subagents become multi_agent=true skills, and most hooks are dropped ^[12].

Practical recommendations:

Fast interactive experimentation with several specialists in parallel → Claude Code
Five-hour autonomous ingests, overnight runs, worktree-parallel branches → Codex 0.128 /goal
Don't know either well → pick one and go a month deep. By month two the other becomes easier; the files survive the switch.

Codex 0.128's /goal deserves a specific note. Shipped 2026-04-30 ^[8], Willison's explainer calls it "the OEM-blessed Ralph loop" ^[13]. Tecton & Tide's six-hour run confirms /goal survives a 5-hour pause ^[11]. Concretely: "queue a 5-hour ingest at bedtime, check results at breakfast" is now a real workflow — batch-mode LLM Wiki maintenance.

10.8 The Seven Hook Rules — Operational Safety Net

The hook table from (Chapter 6) revisited, this time with implementation hints. You do not need all seven on day one. Start with 1 and 2, switch on 3–7 as your work reaches data, experiments, and equipment.


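#!/bin/bash
# .git/hooks/pre-commit (citation check + raw immutability)
# Make it executable once: chmod +x .git/hooks/pre-commit

# Hook 1: citation check on the staged version of each added/modified claim page
while IFS= read -r f; do
  if ! git show ":$f" | grep -q '^source:'; then
    echo "ERROR: $f missing 'source:' field"
    exit 1
  fi
done < <(git diff --cached --name-only --diff-filter=AM -- 'wiki/claims/*.md')

# Hook 2: raw immutability. Adding new sources is allowed; modifying or deleting
# existing ones is not. raw/inbox/ is exempt because triage moves files out of it.
if git diff --cached --name-only --no-renames --diff-filter=MD -- raw/ \
    | grep -qv '^raw/inbox/'; then
  echo "ERROR: existing files under raw/ are immutable. Add corrections as new files."
  exit 1
fi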

As (Chapter 11) will show, terry's vault runs these hooks at two layers — both as git hooks and as Claude Code tool_use hooks. The duplication is the obvious safety net: the agent sometimes edits files outside the git workflow.
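The agent-side layer can be sketched as a small script that a Claude Code pre-tool hook invokes. Two things here are assumptions to verify against the current Claude Code hook documentation: that the hook receives a JSON event on stdin with the target path under `tool_input.file_path`, and that exit code 2 blocks the tool call. The function names are hypothetical, and this is not terry's actual hook.

```python
import json
import sys
from pathlib import Path

# READ ONLY folders from the vault layout; raw/inbox/ stays writable for triage.
PROTECTED = ("raw/papers", "raw/daily-notes")


def is_protected(file_path: str, vault: str) -> bool:
    """True if file_path resolves to a location inside a read-only raw/ folder."""
    try:
        rel = Path(file_path).resolve().relative_to(Path(vault).resolve())
    except ValueError:
        return False  # outside the vault: not this hook's concern
    return any(rel.as_posix().startswith(prefix + "/") for prefix in PROTECTED)


def decide(event: dict, vault: str) -> int:
    """Return the hook's exit code: 0 to allow, 2 to block (assumed convention)."""
    file_path = event.get("tool_input", {}).get("file_path", "")
    if file_path and is_protected(file_path, vault):
        print(f"raw/ is immutable: refusing to edit {file_path}", file=sys.stderr)
        return 2
    return 0


def main() -> int:
    """Entry point for the hook script: read the event from stdin and decide."""
    return decide(json.load(sys.stdin), vault=".")
```

A hook script would end with `sys.exit(main())`. Unlike the git hook, this check fires before the file is written, which is what covers edits the agent makes outside the commit workflow.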

10.9 First-Week Checklist

Having read this chapter, you can finish the following inside 24 hours:

[ ] Obsidian + Claude Code (or Codex) installed
[ ] ~/research-vault/ skeleton created + git init
[ ] AGENTS.md / CLAUDE.md written from the §10.4 template
[ ] One arXiv PDF downloaded into raw/papers/
[ ] ingest [paper] run once
[ ] Verify that 3–4 pages were created under wiki/ (claims, concepts, open questions)
[ ] git commit -m "first ingest" passes (the citation hook should let it through)

That is the L0 (one-shot chat) → L2 (LLM Wiki) jump. (Chapter 11) shows what this looks like after months of use: how terry's four-month-old vault evolved, and how it extended into terryum.ai. (Chapter 12) is the step-by-step roadmap from L2 all the way to L5/L6.

The one-line takeaway: with no GPU, no self-hosted LLM, and roughly $30 a month in API calls, the smallest working unit of an external brain is something you can stand up today. Karpathy's BYOAI / files-over-apps slogan, discussed in (Chapter 4), is operational, not aspirational ^[6].

References

Anthropic (2026). Claude Code memory + subagent documentation. Anthropic Developer Docs.
Anthropic (2026). Automated Alignment Researchers — Using LLMs to scale scalable oversight. Anthropic Research, 2026-04-14.
Astro-Han (2026). Astro-Han/karpathy-llm-wiki — Agent Skills-compatible LLM Wiki package. GitHub.
ekadetov (2026). ekadetov/llm-wiki — Claude Code plugin for persistent compounding KBs in Obsidian. GitHub.
Karpathy, A., Y. He, X. Lee, et al. (2026). LLM Wiki — A pattern for building personal knowledge bases using LLM agents. GitHub Gist, 2026-04-04.
Karpathy, A. (2026). Farzapedia reply — personalization argument for LLM Wiki. X (Twitter), 2026.
OpenAI (2026). Custom instructions with AGENTS.md. OpenAI Codex Docs.
OpenAI Codex Team (2026). Codex CLI 0.128.0 release notes. OpenAI Codex Changelog, 2026-04-30.
Park, J. (GeekNews) (2026). Forget RAG: Karpathy's LLM Wiki and a new knowledge-management paradigm. GeekNews, 2026. [GeekNews / Park, 2026]
Stanford Paper2Agent Team (2025). Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents. arXiv:2509.06917.
Tecton and Tide (2026). /goal: The Six-Hour Codex Run That Survived a Five-Hour Pause. Tecton and Tide Blog. [Tecton and Tide, 2026]
Um, T. (terryum) (2026). From Claude Code to Codex — A Migration Note. terryum.ai post, 2026-04-24. [Um, 2026]
Willison, S. (2026). Codex CLI 0.128.0 adds /goal. Simon Willison's Weblog, 2026-04-30.
Yu, W. (2026). What Is Karpathy's LLM Wiki? A Zettelkasten User's Honest Review. Personal Blog, 2026.
Aimaker (2026). AI-powered second brain from LLM Wiki — 4-month report. Aimaker Substack, 2026.
bswen (2026). What Results Did 700 Autoresearch Experiments Achieve Overnight? Medium, 2026-03-30. [bswen, 2026]
Lu, C., Lu, C., et al. (2024). The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. arXiv:2408.06292.