Chapter 4: Karpathy's LLM Wiki Pattern — The 16M-View Retrospective
4.1 A single gist, 2026-04-04
On 2026-04-04 Andrej Karpathy published a one-page GitHub Gist titled "LLM Wiki" [3]. The body was not a product announcement but a pattern description: keep raw sources in a directory, byte-for-byte unchanged; let an LLM agent read them and maintain a markdown wiki on top; use that wiki as the substrate for every subsequent question, summary, or extension. The gist opens with the instruction "copy-paste this idea file into your favorite LLM agent, e.g. Codex, Claude Code, OpenCode, Pi." In other words, the pattern ships as an idea file, meant to be pasted into whatever agent host the reader uses [10].
The launch tweet drew more than 16M impressions [3]. By early May 2026 the gist had passed 19k stars and 4k forks, and at least six OSS implementations had appeared within thirty days [3]. In the same month MindStudio, Cognition, Denser, Analytics Vidhya, Agentpedia, Lobster Pack, and WebEdge published explanatory deep-reads [5]; Starmorph's "Full Beginner Setup Guide" video became the canonical setup tutorial [12]; and in Korean, GeekNews crystallised the "BYOAI / files-over-apps" vocabulary inside a week [13]. The OSS and content matrix surveyed in (Chapter 5) is entirely the product of the six weeks that followed the gist.
Why did a single-page gist do this? That question is the spine of this chapter. The short answer is that the pattern was the last line of an eighty-year tradition of external-brain ideas, and 2026 was the first year the agent layer existed to maintain it.
4.2 The metaphor: Obsidian as IDE, LLM as programmer, wiki as codebase
A one-liner from Karpathy's launch thread compresses the pattern best [3]: Obsidian is the IDE, the LLM is the programmer, the wiki is the codebase. Unpacking:
- codebase = wiki/. An artefact that lives on disk rather than in a head. It can be read, diff'd, reverted via git, shared with another person, and ported to another model.
- programmer = the LLM. The agent — not the human — writes, refactors, and lints this codebase. The human plays PR reviewer: read the diff, approve, reject, request changes.
- IDE = Obsidian (or Cursor, VS Code + Claude Code, Codex). The surface the human uses to read the wiki and pull it into their own thinking. Karpathy specifies Obsidian, but the load-bearing claim is not the IDE — it is that markdown files are first-class citizens for both human and LLM simultaneously.
The metaphor's implications are larger than they look. Everything the phrase "knowledge management system" usually suggests — databases, tag trees, elaborate graph views — is pushed to the side. The core is to let an LLM agent work on a markdown vault the way a programmer works on a codebase in an IDE. The discipline of code review (diff, lint, conventions, tests) becomes the discipline of wiki maintenance. This is why the research-grade schema proposed in (Chapter 6) is not arbitrary prescription — it is a direct port of codebase hygiene to a markdown vault.
4.3 The three-layer structure
Karpathy's gist cuts the vault into three layers [3]. (Chapter 10) of claude-to-codex Part IV already introduced these once; Part II re-establishes them as the canonical unit of analysis for a research-grade external brain.
L1 — raw/ : an immutable source-of-truth
vault/
└── raw/
    ├── papers/       # PDFs, arXiv originals
    ├── articles/     # blogs, news, Substack
    ├── notes/        # meeting memos, journals, handwriting scans
    ├── bookmarks/    # bookmarks, tweet captures
    └── transcripts/  # video and podcast transcripts
The agent reads this layer; it does not write. Raw-immutability anchors every downstream inference. Agentpedia argues that this one rule is the single most consequential line in the gist — the operational property that prevents wiki rot more than any other [9].
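The raw-immutability rule can be checked mechanically. The following is a minimal sketch, not something the gist prescribes: it assumes a hash manifest is taken before and after an agent run, and the function names `snapshot` and `violations` are hypothetical. New files are allowed (that is what ingest adds); modified or deleted files are reported.

```python
import hashlib
from pathlib import Path


def snapshot(raw_dir: Path) -> dict[str, str]:
    """Map each file under raw_dir (relative POSIX path) to its SHA-256 digest."""
    return {
        p.relative_to(raw_dir).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }


def violations(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List raw files that were modified or deleted; newly added files are fine."""
    return sorted(
        path for path, digest in before.items() if after.get(path) != digest
    )
```

In use, a wrapper would call `snapshot` on raw/ before the agent runs, call it again afterwards, and refuse to commit if `violations` returns anything.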
L2 — wiki/ : the LLM-authored markdown synthesis
vault/
└── wiki/
    ├── entities/        # people, projects, companies, models
    ├── concepts/        # technical concepts, definitions
    ├── summaries/       # paper and article summaries
    ├── comparisons/     # comparison / contrast pages
    ├── claims/          # explicit claims with evidence
    ├── contradictions/  # contradiction pages
    └── open-questions/  # unresolved questions
The agent reads L1 and writes and updates this layer. Humans may edit too, but most edits come from the agent. The human reviews via git diff.
L3 — schema/ : CLAUDE.md, AGENTS.md
An instruction file telling the agent how this vault is organised and how it is to behave: "people live under wiki/entities/. Never modify raw/. Every claim must carry a source ID." Claude Code reads CLAUDE.md; Codex reads AGENTS.md [14]. The fact that these are effectively the same file under two names is itself becoming a cross-vendor convention — taken up again in (Chapter 6).
Three primitives operate on this vault:
- ingest: a new raw arrives → wiki pages created or updated
- query: a question is asked → the wiki is traversed → an answer plus citations is returned
- lint: broken links, duplicates, contradictions, and citation-less claims are reconciled
That ingest and lint come before query is the decisive break from RAG. RAG does all its work at query time; LLM Wiki front-loads synthesis to ingest time and maintenance time.
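The timing difference can be made concrete with a toy model. This sketch is an illustration under loud assumptions: `TinyWiki` is a hypothetical in-memory stand-in for the vault, `synthesize` stands in for an LLM call, and each source produces exactly one page, which is much simpler than a real wiki. The point it shows is only that synthesis runs once per ingest and never at query time.

```python
from typing import Callable


class TinyWiki:
    """Toy in-memory wiki: synthesis runs at ingest, never at query."""

    def __init__(self, synthesize: Callable[[str], str]) -> None:
        self.synthesize = synthesize      # stand-in for an LLM call (assumption)
        self.pages: dict[str, str] = {}   # source ID -> synthesised page text
        self.synthesis_calls = 0          # counts how often synthesis ran

    def ingest(self, source_id: str, raw_text: str) -> None:
        """Synthesise once and persist the result as a page."""
        self.synthesis_calls += 1
        self.pages[source_id] = self.synthesize(raw_text)

    def query(self, term: str) -> list[tuple[str, str]]:
        """Return (source ID, page) pairs mentioning term; no synthesis here."""
        needle = term.lower()
        return [
            (source_id, page)
            for source_id, page in sorted(self.pages.items())
            if needle in page.lower()
        ]
```

However many times `query` is called, `synthesis_calls` stays equal to the number of ingested sources. A query-time design would instead pay the synthesis cost on every question.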
4.4 CLAUDE.md / AGENTS.md — why instruction files are load-bearing
WebEdge's deep-dive is particularly useful here [11]. CLAUDE.md is not a README. It is the operating contract the agent honours, and six sections appear almost identically across every implementation reviewed.
- Ingest rules — when a new raw arrives, which wiki pages get created, and how they merge with existing ones.
- Citation rules — every claim must carry a source ID pointing into raw/. Citation-less claims fail to save.
- Contradiction-flagging rules — when new raw conflicts with an existing claim, both sides are recorded under contradictions/.
- Link conventions — wikilink format, duplicate-link prevention, broken-link lint rules.
- Lint commands — the lint commands to be run periodically, and how each violation is handled.
- Tool boundaries — when and with what permission the agent may invoke external tools (web search, databases, experiment apparatus).
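Two of the rules above, citation rules and link conventions, lend themselves to a mechanical lint. The sketch below is hypothetical throughout: it assumes each bullet line in a page is one claim, that a source ID is written as `[src: raw/...]`, and that links use the `[[target]]` or `[[target|alias]]` wikilink form. None of these formats is specified by the sources reviewed here.

```python
import re

WIKILINK = re.compile(r"\[\[([^\]|#]+)")            # [[target]] or [[target|alias]]
SOURCE_ID = re.compile(r"\[src:\s*raw/[^\]\s]+\]")  # assumed citation format


def lint_page(name: str, text: str, known_pages: set[str]) -> list[str]:
    """Return human-readable violations for one wiki page."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Assumed convention: every bullet line is a claim and needs a source ID.
        if line.lstrip().startswith("- ") and not SOURCE_ID.search(line):
            problems.append(f"{name}:{lineno}: claim without source ID")
        # Every wikilink target must name a page that exists.
        for target in WIKILINK.findall(line):
            if target.strip() not in known_pages:
                problems.append(f"{name}:{lineno}: broken link [[{target.strip()}]]")
    return problems
```

A save hook could run `lint_page` on every page the agent touched and reject the write when the returned list is non-empty, which is one way to read "citation-less claims fail to save."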
Agent skills, hooks, subagents, and MCP are downstream of this operating contract. The seven hook-enforced rules proposed in (Chapter 10) of this book begin here. One point worth emphasising: the instruction file is not a list of commands but readable prose. To borrow Karpathy's phrasing — if a human cannot read it and agree, the LLM will not follow it either.
Lobster Pack's "idea file" framing is decisive at exactly this point [10]. What Karpathy shared was not code but the idea file. It works pasted into Codex, Claude Code, OpenCode, Cursor, or Pi. In the era of code, the instruction was bound to the host. In the era of agents, the instruction is host-agnostic markdown. This small format shift is the infrastructural reason six OSS implementations could appear in thirty days.
4.5 index.md, log.md, schema.md — the centre of long-term operation
For a vault to survive past the first month, three meta-files are required. MindStudio, Starmorph, and Fulkerson converge on almost identical conclusions [5].
- index.md: the vault's sitemap and the entry point for every agent query. lint uses it as the baseline against which orphan pages and broken links are caught.
- log.md: a machine-readable history of ingest, query, and lint operations — when, what, from which raw, into which page. Fulkerson treats this as the minimum unit of production observability for a wiki [16].
- schema.md: the schema for the wiki pages themselves — required fields for a claim page, an entity page, a contradiction page. (Chapter 6) extends this for research use.
These three files do more than enforce organisational hygiene. They force the agent to treat the wiki not as "one large thing to remember" but as a structured file system. Aimaker's four-month longitudinal report, drawn from a single user, observed that vaults missing log.md began "to repeat themselves" [17]. The sample is n=1, but the signal connects directly to the wiki-rot discussion of (Chapter 6).
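As an illustration of how index.md and log.md might be used by tooling, here is a small sketch. The entry layout (`timestamp | operation | raw | page`) and the helper names `log_line` and `orphan_pages` are assumptions made for this example; the sources describe what the files record, not a concrete format.

```python
import re
from datetime import datetime, timezone

INDEX_LINK = re.compile(r"\[\[([^\]|#]+)")  # wikilinks listed in index.md


def log_line(op: str, raw: str, page: str, when: datetime) -> str:
    """Format one log.md entry: when, what, from which raw, into which page."""
    if op not in {"ingest", "query", "lint"}:
        raise ValueError(f"unknown operation: {op}")
    # Normalise to UTC so entries from different machines sort consistently.
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"- {stamp} | {op} | {raw} | {page}"


def orphan_pages(index_text: str, pages: set[str]) -> list[str]:
    """Pages that exist in the vault but are never linked from index.md."""
    linked = {target.strip() for target in INDEX_LINK.findall(index_text)}
    return sorted(pages - linked)
```

Appending one `log_line` per operation gives the agent a history it can read back before acting, and `orphan_pages` is the index-as-baseline check that lint relies on.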
4.6 RAG vs LLM Wiki — when do you synthesise?
The most common misreading is "how is this any different from RAG?" The largest of the three durable camps in the first-day Hacker News thread asked exactly that [18], and Denser — itself a RAG vendor — answered honestly from the other side [7]. The clean summary:
| Dimension | Vanilla RAG (Lewis 2020 + Karpukhin 2020 + Johnson 2019) | Atlas (Izacard 2022) | LLM Wiki (Karpathy 2026) |
|---|---|---|---|
| Memory unit | vector-DB chunks | jointly trained retriever + generator | human-readable markdown pages |
| When synthesis happens | query-time | query-time (learned augmentation) | ingest-time + maintenance-time |
| Knowledge accumulation | weak — re-interpreted per question | weak — absorbed into weights | strong — concepts, claims, contradictions compound |
| Transparency | low — chunks hidden | low — weights opaque | high — files, git, diffs |
| Infra dependence | retrieval stack | training pipeline | markdown + git |
| Fits best | one-shot Q&A | knowledge-intensive QA | long-running research, hypothesis management, literature tracking |
| Main failure modes | retrieval miss, chunk pollution | weight staleness, refresh cost | wiki rot, synthesis error, lost provenance |
The decisive variable is when synthesis happens. RAG synthesises at query time; Atlas synthesises into weights; LLM Wiki synthesises at ingest and maintenance time, persisting the result to disk [19].
Denser's load-bearing insight is the cost/quality trade-off [7]. Ingest-time synthesis is more expensive up front but cheaper and more accurate at query time when the same corpus is queried repeatedly. A researcher re-reads the same set of papers dozens of times. Having RAG re-interpret the same chunks with the same LLM in the same way on every query is wasted work. LLM Wiki computes that synthesis once and leaves the result cached on disk.
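The trade-off reduces to a break-even count. A minimal sketch, assuming a deliberately simple cost model that is not taken from Denser: the wiki pays a one-off ingest cost plus a small per-query cost, RAG pays a larger per-query cost, and maintenance (lint) cost is ignored. The function name and all figures in the usage note are hypothetical.

```python
import math


def break_even_queries(
    ingest_cost: float, rag_query_cost: float, wiki_query_cost: float
) -> float:
    """Smallest query count n with ingest_cost + n * wiki < n * rag.

    Costs can be in any single unit (tokens, dollars). Returns math.inf
    when the wiki is not cheaper per query, so it never pays back.
    """
    saving = rag_query_cost - wiki_query_cost
    if saving <= 0:
        return math.inf
    # n must strictly exceed ingest_cost / saving.
    return math.floor(ingest_cost / saving) + 1
```

With made-up figures of 200,000 tokens to ingest, 12,000 tokens per RAG query, and 2,000 tokens per wiki query, `break_even_queries(200_000, 12_000, 2_000)` returns 21: the two approaches cost the same at 20 queries, and the wiki is cheaper from the 21st onward. For a corpus queried only once or twice, the same arithmetic favours RAG.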
Conversely, LLM Wiki is overkill for one-shot Q&A. Denser settles the point exactly: the two patterns are complementary, not competitive. This book accepts that framing. LLM Wiki is not a replacement for RAG; it is a compiler that adds to a research corpus a kind of synthesis that RAG cannot reach.
A note on the most-quoted figure in the launch coverage. MindStudio reports a case study of "383 files + 100 meeting transcripts → 95% Claude-token reduction vs naive RAG" [5]. The qualitative claim (a wiki is cheaper than re-RAG for repeated queries on the same corpus) is true. The specific 95% number is vendor-reported and not independently benchmarked; it depends heavily on corpus shape, query distribution, and the comparison baseline. G2 in (Chapter 5) revisits this directly.
4.7 Eighty years of PKM — Bush → Luhmann → Ahrens → Karpathy
Karpathy's gist hit 16M views partly because it felt fresh, and partly because it was the last line of an eighty-year tradition. The second half is what this section makes visible.
1945 — Vannevar Bush, the Memex [23]. In the July 1945 Atlantic, Bush — then director of the OSRD — diagnosed hierarchical indexing as the wrong substrate for human thought and proposed the Memex: a microfilm-and-desk apparatus to follow "associative trails" across primary documents. The Memex is the historical root from which Engelbart NLS, Nelson Xanadu, the World Wide Web, Wikipedia, Roam, and Obsidian all descend. The 2026 LLM Wiki adds exactly one line to this 80-year proposal: the associative trail is now automatically maintained by an LLM agent.
1992 — Niklas Luhmann, Zettelkasten [24]. Luhmann's essay describes a slip-box of ~90,000 hand-numbered cards linked by ID — a "communication partner" that surprised its owner with connections he himself had not designed. The two disciplines codified here — atomic notes and dense linking — anchor every later PKM system. The moment an LLM Wiki agent surfaces an unbidden contradictions/ page to its user is the digital analogue of Luhmann's "surprising second mind."
1998 — Andy Clark and David Chalmers, The Extended Mind [25]. The parity principle: if an external resource reliably plays the functional role of a cognitive process, it belongs to the mind. Otto's notebook deserves the same epistemic standing as Inga's biological memory. This book's stance — that an LLM Wiki is not a tool one uses but the external cortex of one's cognition — rests directly on this 1998 argument. A 28-year-old philosophy-of-mind paper supplies the ethical and epistemic warrant for the markdown vault of 2026.
2017 — Sönke Ahrens, How to Take Smart Notes [26]. Ahrens codifies Luhmann into a workable modern workflow: fleeting → literature → permanent notes. The "permanent note must work for a future stranger reading it without context" discipline is exactly the atomic-note design constraint the 2026 LLM Wiki implementations inherit. Both Astro-Han and ekadetov cite atomic-note discipline explicitly as a design constraint [36].
2017 — Andrej Karpathy, Software 2.0 [4]. The essay reframes neural networks as a new programming substrate where weights — not code — are the executable knowledge under version control. The 2026 LLM Wiki is the same frame translated sideways: replace weights with markdown pages. The same author named two substrates nine years apart.
2019–2020 — Johnson et al., FAISS; Karpukhin et al., DPR; Lewis et al., RAG [19]. The period in which vector-store retrieval became the industrial default, and the baseline against which LLM Wiki explicitly defines itself.
2022 — Izacard et al., Atlas [22]. A jointly trained retriever and generator: with only 64 training examples on Natural Questions, the 11B-parameter Atlas outperforms a 540B-parameter parametric-only model by about 3 points. The strongest RAG baseline. Karpathy's gist does not dismiss Atlas; it positions it as "synthesis at a different time."
2022–2023 — MemGPT (Packer et al.); Voyager (Wang et al.); Generative Agents (Park et al.); Reflexion (Shinn et al.); ReAct (Yao et al.); Toolformer (Schick et al.); CoT (Wei et al.) [27]. The substrate of LLM-era external memory and agentic reasoning. MemGPT borrows OS virtual memory to page "main context ↔ archival." Karpathy's LLM Wiki generalises that archival store from a vector index to a human-readable markdown vault. Voyager's skill library is the cleanest pre-Karpathy proof of the "wiki accumulates, RAG re-retrieves" distinction. Generative Agents' memory-stream + reflection architecture is the existence proof that ingest and lint can be run by the agent itself.
2024 — FutureHouse WikiCrow and PaperQA2 [34]. Eighteen months before Karpathy, FutureHouse already demonstrated LLM-maintained wikis with PaperQA2 and the WikiCrow demo. The target was superhuman literature search, not personal knowledge. Karpathy's 2026 contribution was to generalise the same pattern from a company-grade system to a personal idea file. This is the exact reason this chapter — and (Chapter 1) — treats Karpathy as an integrator rather than the inventor (gap G15).
2026 — Karpathy, LLM Wiki gist + Farzapedia follow-up [3]. In the 2026-04-12 Farzapedia reply, Karpathy names the four properties an LLM Wiki should have: Explicit (a memory artefact you can read and audit), Yours (file ownership, no app lock-in), Files over apps (markdown durability), BYOAI (model-agnostic). Farza's personal wiki compresses 2,500 raw inputs (diary entries, Apple Notes, iMessage) into roughly 400 wiki documents. GeekNews crystallised these four properties for the Korean community and made "BYOAI / files-over-apps" canonical vocabulary in that locale [13].
The eighty-year arc in one sentence: the external-brain idea has existed since 1945; what 2026 added is the agent that maintains it automatically. Karpathy's largest contribution is not the invention of a pattern but the compression of an 80-year line into a one-page idea file whose launch drew more than 16 million impressions. This is why this book treats Karpathy as integrator throughout (Chapter 1) and the rest of Part II.
4.8 Dissent and counter-takes — kept on record
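To avoid a triumphal close, the main strands of dissent are placed on the record here: the three durable camps from the day-one Hacker News thread, followed by three published critiques.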
"This is just RAG with extra steps" (the largest of three durable camps in the day-one HN thread) [18]. Reply: on the surface the two patterns look alike until query-time synthesis is distinguished from ingest-time synthesis. But Denser, Cognition, and Karpathy himself converge on the same point: for a research environment that re-queries the same corpus repeatedly, ingest-time synthesis is meaningfully better on both cost and quality [6]. For one-shot Q&A, RAG remains the right answer.
"Model collapse — isn't this training on the model's own output?" (cleanly answered in the HN thread by user gojomo) [18]. Reply: LLM Wiki is inference-time organisation, not training-time recursion. raw/ is immutable; wiki/ is re-synthesised from raw/ each time; weights never update. The distinction neutralises the model-collapse concern.
"10M+ context windows will make this obsolete" (the third HN camp). Reply: partly correct. The larger the context window, the smaller the marginal saving of ingest-time synthesis. But (a) git diffs, citations, and human-readable prose remain valuable independent of context size, and (b) for any researcher whose mental model of the field must itself stay alive, the LLM Wiki is not a tool but an external cortex [25]. That survives regardless of window size.
The Zettelkasten critique [38]. Wenhao Yu argues that LLM Wiki short-circuits the personal cognitive work — the linking-as-thinking step — that makes Zettelkasten valuable. The vault grows while the user's mental model does not: cognitive offloading as a failure mode. (Chapter 11) takes this on directly. The brief reply: Yu is correct. Without atomic-note discipline (Ahrens) enforced at the schema level, an LLM Wiki becomes a second junk drawer rather than a second brain. The (Chapter 6) schema treats this as a first-class concern.
The graph-DB-missing argument [39]. Wikilink graphs degrade at scale. For use cases where citation graphs, experiment lineage, or ACL boundaries are the load-bearing structure, a real graph-DB layer is required. This book accepts that critique. File-only purism is a sensible default at personal and small-team scale; at enterprise and lab-lineage scale, hybrid layering is correct. OmegaWiki's 9 entity types × 9 edge types knowledge graph sitting on top of a wiki is the contemporary existence proof [35].
Enterprise reality check [40]. RBAC, retention, audit, and classification metadata — none of these appear in Karpathy's single-user gist. An enterprise LLM Wiki has to add file-level RBAC, immutable edit history, and retention policy. (Chapter 6) treats the enterprise dimension at its close.
A final note on G15 — the Karpathy self-citation loop [42]. The six OSS implementations this book reviews and most of the analyst pieces are downstream of Karpathy's framing. Counter-takes (Infranodus, Wenhao Yu, AI Critique, innobu) are minority voices. This book marks that dependence explicitly here and in (Chapter 1), and positions Karpathy not as origin but as integrator of a line stretching back to Bush 1945. That conscious positioning is the survey's response to G15.
4.9 Onward — the OSS and content matrix
Karpathy's gist defined the pattern. The six weeks that followed are the story of how the pattern unfolded into code, prose, and video. (Chapter 5) ships the matrix: six OSS implementations (Astro-Han, lucasastorian, ussumant, ekadetov, OmegaWiki, Mcptube), the blog landscape, the video lineage, and the HN/Reddit/GeekNews viral arc. It is the most information-dense chapter of this book. (Chapter 6) then proposes a research-grade schema on top of that landscape, framed honestly against an empirical hole that has not yet been measured.
References
- [1] Karpathy, A. (2026). LLM Wiki — A pattern for building personal knowledge bases using LLMs. GitHub Gist, 2026-04-04.
- [2] Karpathy, A. (2026). LLM Wiki announcement (Twitter/X thread). 2026-04-04.
- [3] Karpathy, A. (2026). Farzapedia reply — personalization argument for LLM Wiki. 2026-04-12.
- [4] Karpathy, A. (2017). Software 2.0. Medium.
- [5] MindStudio (2026). What Is Andrej Karpathy's LLM Wiki? How to Build a Personal Knowledge Base With Claude Code. MindStudio Blog.
- [6] Cognition AI (2026). llm-wiki: the reference implementation of Karpathy's self-building AI memory pattern. Cognition blog (re-syndicated).
- [7] Denser.ai (2026). From RAG to LLM Wiki: What Karpathy's idea means for AI knowledge bases. Denser.ai blog.
- [8] Analytics Vidhya (2026). LLM Wiki Revolution: How Andrej Karpathy's Idea is Changing AI. Analytics Vidhya blog.
- [9] Agentpedia (2026). Karpathy's LLM Wiki: The Complete Guide to His Idea File. Agentpedia blog.
- [10] Lobster Pack (2026). Karpathy's LLM Wiki and the rise of "idea files" — why sharing instructions beats sharing code. Lobster Pack blog.
- [11] WebEdge (2026). Karpathy's LLM Knowledge Base System: Full Breakdown of His CLAUDE.md Schema. MindStudio Blog (WebEdge attribution).
- [12] Starmorph (2026). Karpathy's LLM Wiki: Step-by-step setup guide. Starmorph blog.
- [13] Park, J. (2026). RAG is forgotten: Karpathy's "LLM Wiki" and a new knowledge-management paradigm (Korean). GeekNews / WikiDocs blog.
- [14] Anthropic (2026). Claude Code documentation. Anthropic docs.
- [15] OpenAI (2026). Custom instructions with AGENTS.md (Codex). OpenAI Developers Portal.
- [16] Fulkerson, A. (2026). Karpathy's Pattern for an LLM Wiki in Production. aaronfulkerson.com blog.
- [17] Aimaker (2026). AI-powered second brain from LLM Wiki — 4-month report. Aimaker Substack.
- [18] Hacker News community (2026). LLM Wiki — example of an "idea file" (Hacker News front-page thread). 2026-04-04.
- [19] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020. arXiv:2005.11401.
- [20] Karpukhin, V., Oğuz, B., Min, S., Lewis, P., Wu, L., Edunov, S., et al. (2020). Dense Passage Retrieval for Open-Domain Question Answering. EMNLP 2020. arXiv:2004.04906.
- [21] Johnson, J., Douze, M., and Jégou, H. (2019). Billion-scale similarity search with GPUs (FAISS). IEEE Transactions on Big Data. arXiv:1702.08734. DOI:10.1109/TBDATA.2019.2921572.
- [22] Izacard, G., Lewis, P., Lomeli, M., Hosseini, L., Petroni, F., Schick, T., et al. (2022). Atlas: Few-shot Learning with Retrieval Augmented Language Models. JMLR 2023. arXiv:2208.03299.
- [23] Bush, V. (1945). As We May Think (the Memex proposal). The Atlantic Monthly, July 1945.
- [24] Luhmann, N. (1992). Communicating with Slip Boxes — An Empirical Account. Universität Bielefeld (translated essay).
- [25] Clark, A. and Chalmers, D. (1998). The Extended Mind. Analysis 58 (1): 7-19. DOI:10.1093/analys/58.1.7.
- [26] Ahrens, S. (2017). How to Take Smart Notes. Book (CreateSpace / Independently Published).
- [27] Packer, C., Wooders, S., Lin, K., Fang, V., Patil, S. G., Stoica, I., and Gonzalez, J. E. (2023). MemGPT: Towards LLMs as Operating Systems. arXiv:2310.08560.
- [28] Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., Fan, L., and Anandkumar, A. (2023). Voyager: An Open-Ended Embodied Agent with Large Language Models. TMLR 2024. arXiv:2305.16291.
- [29] Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., and Bernstein, M. S. (2023). Generative Agents: Interactive Simulacra of Human Behavior. UIST 2023. arXiv:2304.03442. DOI:10.1145/3586183.3606763.
- [30] Shinn, N., Cassano, F., Berman, E., Gopinath, A., Narasimhan, K., and Yao, S. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. NeurIPS 2023. arXiv:2303.11366.
- [31] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., and Cao, Y. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023. arXiv:2210.03629.
- [32] Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., and Scialom, T. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. NeurIPS 2023. arXiv:2302.04761.
- [33] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., and Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS 2022. arXiv:2201.11903.
- [34] FutureHouse (2024). PaperQA2: Superhuman scientific literature search (FutureHouse announcement). FutureHouse blog.
- [35] skyllwt (DAIR Lab, PKU) (2026). OmegaWiki — Wiki-centric full-lifecycle AI research platform on Claude Code. GitHub.
- [36] Astro-Han (2026). Astro-Han/karpathy-llm-wiki — Agent Skills-compatible LLM Wiki for Claude Code/Cursor/Codex. GitHub.
- [37] ekadetov (2026). ekadetov/llm-wiki — Claude Code plugin for persistent compounding KBs in Obsidian. GitHub.
- [38] Yu, W. (2026). What Is Karpathy's LLM Wiki? A Zettelkasten User's Honest Review. yu-wenhao.com blog.
- [39] Infranodus (2026). Infranodus on LLM Wiki — graph DBs as the missing layer. Infranodus blog.
- [40] innobu (2026). Karpathy's LLM Wiki: Second Brain and the Enterprise Reality Check 2026. innobu blog.
- [41] AI Critique (2026). Andrej Karpathy's latest concept 'LLM Wiki' and the future of enterprise knowledge. AI Critique blog.
- [42] Critical Analyst (2026). Research gap analysis — gaps.md (internal). terry-surveys repo.