Chapter 5: The OSS and Content Matrix That Exploded (April-2026 Focus)
5.1 Six weeks of explosion — what made this possible now
As (Chapter 4) recounted, Karpathy's gist was a single-page idea file [3]. This chapter collects what happened in the following six weeks into one set of tables. Within thirty days, six independent OSS implementations appeared on GitHub; MindStudio, Cognition, Denser, Analytics Vidhya, Agentpedia, Lobster Pack, WebEdge, and Starmorph published deep-reads in near-simultaneous succession; Starmorph, Nate Herk, and Paige produced the canonical setup-guide video lineage; Hacker News generated three front-page Show HN threads in addition to the launch thread; and in Korean, GeekNews crystallised the "BYOAI / files-over-apps" vocabulary within two weeks [13]; [32].
Why now? Three pieces of infrastructure matured in the same quarter. (a) Context-window expansion — Claude Code and Codex's 100K–1M context windows made it possible to hold an entire vault in scope. (b) Agent SDK stabilisation — subagents, hooks, skills, MCP, and the AGENTS.md convention converged into roughly the same shape across vendors [36]. (c) The "idea file" format itself — the realisation that what you copy-paste is markdown instruction, not code, and that this makes the pattern host-agnostic [19]. Without these three meeting in the same quarter, the same gist would not have produced six OSS implementations within thirty days regardless of the tweet's view count.
This chapter promises one thing: it will not tell you which implementation to install on the basis of star count. Stars measure attention, not function. The stance taken in (Chapter 4) holds: the tables below are feature matrices, not performance rankings [38]. What this chapter collects is the information a reader needs to decide which implementation fits which use case.
5.2 Six OSS implementations — the feature matrix
This is the most information-dense section of the book. Table 5.1 compares the six GitHub implementations along uniform dimensions. The comparison is by positioning, not by stars.
Table 5.1 — Six OSS LLM Wiki implementations (feature matrix, May 2026)
| Dimension | Astro-Han/karpathy-llm-wiki | lucasastorian/llmwiki | ussumant/llm-wiki-compiler | ekadetov/llm-wiki | skyllwt/OmegaWiki | 0xchamin/Mcptube |
|---|---|---|---|---|---|---|
| Positioning | Agent Skills package | upload + MCP hosted service | knowledge compiler plugin | Obsidian × Claude Code binding | full-lifecycle research platform | YouTube-to-wiki adapter |
| Primary integrations | Claude Code / Cursor / Codex | Claude (MCP) | Claude Code / Codex | Claude Code + Obsidian | Claude Code (23 skills) | Claude Desktop / Cursor / VS Code Copilot (MCP) |
| Ingest path | user's own raw/ directory → SKILL.md workflow | web upload UI for documents | markdown directory or codebase → compiled into topic-wiki | Obsidian vault → ingest command | paper PDF + web + notes → KG ingest pipeline | YouTube URL → transcript + scene frames |
| Wiki structure | concept / entity / summary / contradiction pages + examples/ | per-uploaded-document wiki + chat | topic pages (idempotent, re-buildable) | Obsidian wikilinks + Dataview frontmatter + qmd hybrid search | 9 entity types × 9 edge types KG + claim/idea/experiment pages | YouTube entity/concept pages + scene-level frame references |
| Distinguishing feature | SKILL.md standardised packaging — an installable idea file | "service" shape (lowest activation cost) | wiki-as-build-artifact — re-runnable | Obsidian graph view + Dataview leverage | full research lifecycle + a second-LLM reviewer + anti-repetition memory of failed experiments | scene-change perceptual chunking + vision-model captioning |
| License | MIT (presumed; verify) | (verify) | (verify) | (verify) | (verify) | (verify) |
| Stars (May 2026) | ~495 | ~896 | ~49 (plus atomicstrata fork etc.) | (small star count but Obsidian power-user adoption) | (the most ambitious; star count not captured — fact-checker to verify) | Show HN front-page exposure (item 47754559) |
| Recent commits | 2026-04~05 multi-release cadence | 2026-04~05 active | 2026-04 commits + active forks | 2026-04 active | DAIR Lab (PKU) active | v0.1 → v2 (mcptube-vision) migration in progress |
| Main limitation / GAP | host-specific template branches — cross-host portability is partial | dependence on the hosted service (self-host option needs checking) | small star count — early adoption is limited | small star count — community signal is weak | high complexity — entry barrier is high | YouTube-only modality |
| Primary sources | [4] | [11]; [13] (Show HN) | [6] | [7] | [8] | [9]; [13] (Show HN) |
G2 emphasis: this table is not a benchmark. Star counts are an attention proxy, not a functional comparison. No primary source this survey reviewed provides a controlled comparison across all six on (a) the same raw/ corpus, (b) the same agent host, and (c) the same measurable outputs. The analyst pieces (Agentpedia, MindStudio, Analytics Vidhya, WebEdge) also rank by stars or curated vibe [14]. (5.4) sketches what a real benchmark would have to control for.
5.2.1 Six in one line each
A five-second summary after the table:
- Astro-Han — the first attempt to make Karpathy's gist an installable Agent Skill, packaging compatibility for Claude Code / Cursor / Codex in a single bundle [4].
- lucasastorian — an upload-first hosted service. Its higher star count is best explained by activation cost: less-technical users can adopt it without installing anything [11].
- ussumant — makes the verb compile explicit. The wiki is treated as a re-runnable build artefact [6].
- ekadetov — treats the Obsidian vault as a first-class citizen, leveraging wikilinks, Dataview, and qmd hybrid search [7].
- OmegaWiki — the most ambitious of the six. The wiki is the SSOT (single source of truth) for the entire research lifecycle, with 23 Claude Code skills sharing a single vault [8].
- Mcptube — extends the modality to YouTube; the first demonstration that the same pattern operates beyond text corpora [9].
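The "re-runnable build artefact" idea in the ussumant entry can be illustrated with a small change-detection step. This is a minimal sketch under stated assumptions, not code from any of the six repositories: `plan_rebuild`, the manifest shape, and the file names are hypothetical, and a real compiler would also have to track deletions and prompt changes.

```python
import hashlib


def digest(text: str) -> str:
    """Content hash used to decide whether a source changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_rebuild(sources: dict, manifest: dict) -> tuple:
    """Return (sorted names of sources to recompile, updated manifest).

    `manifest` maps source name -> content hash from the previous run.
    Running twice on unchanged input yields an empty work list, which is
    the idempotence property the feature matrix refers to.
    """
    new_manifest = {name: digest(body) for name, body in sources.items()}
    stale = sorted(
        name for name, h in new_manifest.items() if manifest.get(name) != h
    )
    return stale, new_manifest


# Hypothetical two-file corpus.
sources = {
    "notes/rag.md": "RAG retrieves chunks at query time.",
    "notes/wiki.md": "A wiki compiles knowledge ahead of time.",
}
first, manifest = plan_rebuild(sources, {})        # cold build: everything
second, manifest = plan_rebuild(sources, manifest)  # nothing changed
sources["notes/rag.md"] += " Updated."
third, manifest = plan_rebuild(sources, manifest)   # only the edited file
```

The design choice worth noting is that the wiki pages themselves are never the source of truth here; only the raw sources and the manifest decide what gets rebuilt.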
5.2.2 OmegaWiki — why it deserves its own paragraph
Of the six, OmegaWiki is the one this chapter pulls aside for separate treatment. The other five treat LLM Wiki as a personal external brain; OmegaWiki treats it as the SSOT of an entire research lifecycle [8]. Concretely:
- 23 Claude Code skills share a single wiki: paper ingestion → knowledge graph → gap detection → idea generation → experiment design → paper writing → peer-review response.
- The knowledge graph carries 9 entity types and 9 edge types. Edge predicates include extends / contradicts / supports / inspired_by / tested_by / invalidates / supersedes / addresses_gap / derived_from.
- Failed experiments and discovered knowledge gaps are stored as "anti-repetition memory" — a schema-level guard against re-trying the same dead end.
- A second LLM is wired in as an independent reviewer of ideas / experiments / drafts.
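To make the list above concrete, the nine edge predicates and the anti-repetition memory can be modelled as a small typed structure. The sketch below is an illustration only: the class names, the normalisation rule, and the example records are assumptions, and it does not reproduce OmegaWiki's actual schema or its nine entity types, which the source does not enumerate.

```python
from dataclasses import dataclass, field
from enum import Enum


class Edge(str, Enum):
    """The nine edge predicates named in the text."""
    EXTENDS = "extends"
    CONTRADICTS = "contradicts"
    SUPPORTS = "supports"
    INSPIRED_BY = "inspired_by"
    TESTED_BY = "tested_by"
    INVALIDATES = "invalidates"
    SUPERSEDES = "supersedes"
    ADDRESSES_GAP = "addresses_gap"
    DERIVED_FROM = "derived_from"


@dataclass(frozen=True)
class Triple:
    source: str
    edge: Edge
    target: str


def _key(description: str) -> str:
    # Hypothetical normalisation: lowercase and collapse whitespace.
    return " ".join(description.lower().split())


@dataclass
class Vault:
    triples: set = field(default_factory=set)
    failed: dict = field(default_factory=dict)  # normalised description -> reason

    def link(self, source: str, edge: str, target: str) -> None:
        # Edge(edge) raises ValueError for predicates outside the schema.
        self.triples.add(Triple(source, Edge(edge), target))

    def record_failure(self, description: str, reason: str) -> None:
        self.failed[_key(description)] = reason

    def already_failed(self, description: str):
        """Return the recorded reason, or None if this was never tried."""
        return self.failed.get(_key(description))


# Illustrative records, not taken from any real vault.
vault = Vault()
vault.link("idea-1", "addresses_gap", "gap-3")
vault.record_failure("Fine-tune on synthetic wiki pages", "no gain over baseline")
```

Checking `vault.already_failed(...)` before proposing an experiment is one way a schema-level guard against repeated dead ends could work; rejecting unknown predicates in `link` keeps the graph restricted to the declared edge types.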
OmegaWiki is the single most direct primary source linking Part II's LLM Wiki to Part III's AI Scientist genealogy. Its appearance in (Chapter 7) is not a coincidence — it is the natural extension of the same pattern. The fact that it ships from DAIR Lab at Peking University is itself a notable case of academic discipline returning as OSS code [38].
5.3 Show HN and community threads — how the pattern was received
The OSS code shows what was built. The community threads show which questions survived. Within the same six weeks, Hacker News carried one launch thread and three Show HNs.
Table 5.2 — Four HN threads tracked
| Thread | Date | Which questions dominated |
|---|---|---|
| Karpathy launch ([13], item 47640875) | 2026-04-04~05 | (a) "is this just RAG with extra steps?" (the largest camp) (b) "model-collapse risk" — gojomo's answer (inference-time organisation, not training-time recursion) closed the thread (c) "10M-token contexts will make this obsolete" |
| Show HN: lucasastorian/llmwiki ([13], item 47656181) | 2026-04 mid | (a) upload-first UX (b) MCP wiring (c) "why not RAG" returns |
| Show HN: Mcptube ([13], item 47754559) | 2026-04 late | (a) why fixed-token chunking is the wrong unit for video (b) scene-change-based chunking (c) idempotency for re-uploaded videos |
| Show HN: agent-maintained wiki ([13], item 47899844) | 2026-05 early | (a) lint reliability (b) ingest concurrency: what happens when two agents write the same page? (c) production reliability |
The arc is unmistakable: the first thread asks "what is this?"; the last thread asks "how do we keep it from breaking in production?" Lint thrash, ingest concurrency, idempotency — within six weeks, community questions moved from conceptual to operational. That movement is itself evidence the pattern has passed through hype and is now beginning to mature [13].
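The Mcptube thread's point about fixed-token chunking being the wrong unit for video can be made concrete with a toy example. The sketch below is a simplified stand-in, assuming each sampled frame has already been reduced to a small numeric signature (for instance a coarse brightness histogram); the function names, the threshold, and the sample data are hypothetical and do not describe Mcptube's actual pipeline.

```python
from bisect import bisect_right


def scene_starts(signatures: list, threshold: float) -> list:
    """Return indices of frames that open a new scene.

    A new scene starts when the mean absolute difference between
    consecutive frame signatures exceeds `threshold`.
    """
    if not signatures:
        return []
    cuts = [0]
    for i in range(1, len(signatures)):
        prev, cur = signatures[i - 1], signatures[i]
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            cuts.append(i)
    return cuts


def chunk_by_scene(segments: list, cut_times: list) -> list:
    """Group (start_time, text) transcript segments by the scene they start in."""
    chunks = [[] for _ in cut_times]
    for start, text in segments:
        idx = max(bisect_right(cut_times, start) - 1, 0)
        chunks[idx].append(text)
    return [" ".join(parts) for parts in chunks if parts]


# Five frames sampled at one frame per second (illustrative values).
frame_times = [0.0, 1.0, 2.0, 3.0, 4.0]
signatures = [[0.1, 0.9], [0.1, 0.9], [0.8, 0.2], [0.8, 0.2], [0.8, 0.25]]
cuts = scene_starts(signatures, threshold=0.3)
cut_times = [frame_times[i] for i in cuts]

segments = [
    (0.0, "Intro to the pattern."),
    (1.2, "Three layers."),
    (2.1, "Now the demo."),
    (4.0, "Ingest runs."),
]
chunks = chunk_by_scene(segments, cut_times)
```

Chunk boundaries here follow what is on screen rather than a token budget, so a transcript sentence stays attached to the scene it was spoken over.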
5.4 G2 — the open-benchmark gap
(Chapter 4) flagged it briefly; this chapter makes it explicit. There is no single benchmark in any primary source this survey reviewed that compares the six OSS implementations on a level playing field [38]. The survey's contribution here is to sketch what such a benchmark would have to measure.
Proposal — sketch of an LLM Wiki Bench
- Fixed input: the same raw/ corpus across all systems. E.g., 30 papers (arXiv 2024-08~2026-04) + 20 daily notes + 10 meeting transcripts.
- Fixed agent host: the same model at the same context size. Pin to either Claude Sonnet 4.7 or Codex GPT-5.5.
- Fixed query set: a five-set held-out evaluation. Factual Q (n=20), comparison Q (n=10), hypothesis-inference Q (n=10), etc.
- Measured outputs:
  - page count (number of wiki pages produced)
  - claim-with-source ratio (fraction of claims carrying a source ID)
  - redundancy rate (% duplicate pages)
  - query latency (per-question response time)
  - query precision on the held-out QA set
  - cost per ingest+query+lint cycle (USD)
- Reproducibility: same seed, same ingest order, same lint frequency.
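Three of the measured outputs in this proposal reduce to simple ratios, and a scoring harness for them might look like the sketch below. It is not the benchmark itself and produces no results about any of the six systems; the definitions are assumptions made for illustration (a duplicate page is one whose whitespace-normalised body matches an earlier page, and a query counts as correct on exact normalised match), and all sample data is invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    text: str
    source_id: Optional[str] = None


def _norm(text: str) -> str:
    return " ".join(text.lower().split())


def claim_with_source_ratio(claims: list) -> float:
    """Fraction of claims that carry a source ID."""
    if not claims:
        return 0.0
    return sum(1 for c in claims if c.source_id) / len(claims)


def redundancy_rate(pages: dict) -> float:
    """Fraction of pages whose normalised body duplicates an earlier page."""
    if not pages:
        return 0.0
    seen, duplicates = set(), 0
    for body in pages.values():
        key = _norm(body)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(pages)


def fraction_correct(answers: dict, gold: dict) -> float:
    """Share of held-out questions answered with an exact normalised match."""
    if not gold:
        return 0.0
    hits = sum(1 for q, a in gold.items() if _norm(answers.get(q, "")) == _norm(a))
    return hits / len(gold)


# Invented sample data for illustration only.
claims = [Claim("A", "src-1"), Claim("B"), Claim("C", "src-2")]
pages = {"rag": "RAG basics", "rag-copy": "rag   basics", "wiki": "Wiki", "lint": "Lint"}
answers = {"q1": "Raw files are immutable", "q2": "Weekly"}
gold = {"q1": "raw files are immutable", "q2": "monthly"}
```

A real run would replace exact match with a graded rubric and near-duplicate detection with something stronger than string equality; the point of the sketch is only that each metric needs an explicit, pinned definition before systems can be compared.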
This survey does not run that benchmark. Stating clearly that no one has run it yet is the survey's response to G2. Anyone with a weekend can run it, and if the results turn the table in (5.2) into a ranking, this book welcomes that outcome [38].
5.5 Analyst writing and guides — the blog/article matrix
The same six weeks produced an explosion of analytical writing. The key sources already touched on in (Chapter 4) are collected here as one table.
Table 5.3 — Key analyst pieces (2026-04~05)
| Source | Timing | Vantage | Main contribution |
|---|---|---|---|
| MindStudio [14] | mid 2026-04 | post-explosion | most-cited explainer; "three-layer" vocabulary; 95% token-reduction case study (vendor-reported) |
| Cognition [15] | mid 2026-04 | agent vendor | crystallised the "reference implementation" framing; "files over apps" canonicalised in English |
| Denser [16] | mid 2026-04 | RAG vendor | framed RAG vs Wiki as a cost/quality trade-off; coined "compiled knowledge artefact" |
| Analytics Vidhya [17] | late 2026-04 | mainstream DS audience | the signal that the pattern crossed from niche to mainstream; step-by-step Claude Code guide |
| Agentpedia [18] | mid 2026-04 | analyst blog | "three-layer OS" analytical read; identified raw-immutability as the load-bearing rule against wiki rot |
| Lobster Pack [19] | late 2026-04 | conceptual blog | "idea file" vocabulary — why the same pattern shows up across multiple hosts |
| WebEdge [20] | 2026-04~05 | hosted by MindStudio | crystallised the CLAUDE.md six-section schema |
| Starmorph [22] | mid 2026-04 | blog + video | added inbox/ and logs/ directories the gist omitted; later picked up by WebEdge and Cognition |
| Data Science Dojo [23] | late 2026-04 | educational | "five papers in thirty minutes" — the lowest-friction onboarding |
| Joshi (Medium) [24] | early 2026-04 | first-person walkthrough | validated the pattern with bare directories before any OSS tooling existed — the pre-tooling baseline |
| Fulkerson [25] | 2026-04-12 | infra-eng practitioner | production operation: backup, replication, multi-device sync |
| Aimaker [26] | 2026-04~05 (4-month report) | longitudinal n=1 | wiki rot, weekly lint cadence, concept-page duplication |
| Yu (Zettelkasten review) [27] | late 2026-04 | Zettelkasten power-user | cognitive-offloading critique — revisited in (Chapter 11) |
| Infranodus [28] | late 2026-04 | graph-DB vendor | wikilinks degrade at scale — proposes a KG layer |
| innobu [29] | 2026-04~05 | enterprise lens | RBAC / retention / audit — the enterprise reality check |
| AI Critique [30] | 2026-05-08 | enterprise KMS analysis | competitive analysis vs Notion AI / Atlassian Rovo / Glean |
| Global Advisors [31] | 2026-04-06 | strategy-consulting glossary | lexical canonicalisation two days after the gist — the fastest term-of-art crystallisation in the corpus |
Korean community — its own paragraph
GeekNews crystallised the four properties for Korean readers within days of Karpathy's Farzapedia reply: Explicit / Yours / Files over apps / BYOAI [32]. The post named the Korean SaaS KMS products this threatens, and framed BYOAI as a procurement consideration, not a hobbyist preference. As the G4 gap in (Chapter 4) noted, English-language analyst pieces rarely cite this Korean thread [38]. The Korean edition of this survey is where that gap gets filled.
TiddlyWiki — cross-pollination
The strongest evidence that the LLM Wiki pattern is not bound to Karpathy's exact file structure is the TiddlyWiki community's absorption of it [33]. Logseq, Obsidian, TiddlyWiki — any markdown-or-wikitext store with link semantics can host the pattern.
5.6 The video matrix — the "Full Beginner Setup Guide" lineage
Video moved a beat faster than prose. Within a month, a setup-guide-shaped video had become canonical.
Table 5.4 — Key videos (2026-04~05)
| Video | Upload | Where | Contribution |
|---|---|---|---|
| Nate Herk, "Karpathy 10x'ed Claude Code" [34] | 2026-04-05 (day after the gist) | YouTube | coined the phrasing "LLM Wiki is a 10x tool — it converts a one-shot chat tool into a compounding personal knowledge engine"; carried into claude-to-codex (Chapter 10) |
| Starmorph, "LLM Wiki — Full Beginner Setup Guide" [22] | mid 2026-04 | YouTube | first canonical setup-guide; CLAUDE.md authoring, raw/ + wiki/ creation, first ingest pass on Claude Code |
| Paige, "Second-brain setup using Karpathy's LLM Wiki" [35] | late 2026-04 | YouTube | a personal-productivity variant rather than a research variant — same pattern, different archetype |
| Mcptube channel + Show HN demo [9] | late 2026-04 | YouTube + GitHub README | a demo of converting video into an LLM Wiki source — first video of the modality extension |
The video train diverged faster than the prose train. Herk coined the catch line; Starmorph established the step-by-step canon; Paige produced the personal-productivity variant; Mcptube demonstrated the modality extension. Within a single month, four videos covered four distinct personas of the same pattern. View counts for the setup-guide videos were not captured precisely at retrieval time and are pending fact-checker verification [38].
5.7 What changed over six weeks — the temporal arc
Table 5.5 re-projects all of the above onto a timeline.
Table 5.5 — 2026-04-04 to 2026-05-22 temporal arc
| Date | Event | Note |
|---|---|---|
| 2026-04-04 | Karpathy gist + launch tweet [3] | day 0 — 16M+ tweet impressions |
| 2026-04-04~05 | HN launch thread [13] | the "RAG with extra steps" debate begins; gojomo's model-collapse rebuttal settles part of the thread |
| 2026-04-05 | Nate Herk "10x" video [34] | vocabulary crystallised |
| 2026-04-06 | Global Advisors glossary entry [31] | analyst-lexicon canonicalisation (D+2) |
| 2026-04-08~14 | MindStudio, Cognition, Denser, Agentpedia deep-reads [14] | "three-layer" vocabulary crystallised |
| 2026-04-08~10 | Joshi Medium walkthrough [24] | pre-tooling baseline |
| 2026-04-10~15 | Astro-Han, lucasastorian, ussumant, ekadetov first releases [4] | four of the six OSS land |
| 2026-04-12 | Karpathy's Farzapedia reply [3] | "Explicit / Yours / Files over apps / BYOAI" four properties crystallised |
| 2026-04-12 | Fulkerson production-pattern post [25] | infra-eng lens enters the conversation |
| 2026-04-15 | GeekNews Korean canonicalisation [32] | Korean-language vocabulary established |
| 2026-04 mid-late | Starmorph step-by-step + video [22] | setup-guide canon |
| 2026-04 late | Lobster Pack "idea file" framing [19] | host-portability made conscious |
| 2026-04 mid to late | Show HN: lucasastorian, Mcptube [13] | "is this RAG?" gives way to operational questions |
| 2026-04 late | Analytics Vidhya mainstream explainer [17] | mainstream-DS entry |
| 2026-04 late | OmegaWiki appears [8] | scope extends to full research lifecycle |
| 2026-04~05 | Yu, Infranodus, innobu counter-takes [27] | dissent voice forms |
| 2026-05 early | Show HN: agent-maintained wiki [13] | thread questions move to lint/concurrency |
| 2026-05-08 | AI Critique enterprise analysis [30] | enterprise KMS threat analysis |
| 2026-05 mid-late | Aimaker 4-month retrospective [26] | longitudinal n=1 appears |
The shape this table exposes is the core of this chapter's claim: over six weeks, community attention moved from "what is this?" to "how do I follow along?" to "where does this break?" The conceptual → operational → longitudinal three-phase progression is also the spine of Part II of this book. This chapter sits at phase two (operational); (Chapter 6) takes on phase three (longitudinal: wiki rot).
5.8 Closing — what the matrix says
This chapter said one thing with five tables: in six weeks, LLM Wiki moved from a one-page idea file to a generation of external-brain tooling. The shape of that move:
- On the code side, six OSS implementations occupied distinct positions: Agent Skills package / hosted service / compiler / Obsidian binding / research-lifecycle platform / video-modality extension.
- On the analyst side, MindStudio set the vocabulary, Cognition the reference position, Denser the cost/quality frame, Agentpedia the raw-immutability justification, Lobster Pack the host-portability language, WebEdge the CLAUDE.md six-section schema.
- On the video side, Herk supplied the catch line, Starmorph the setup-guide canon, Paige the personal-productivity variant, and Mcptube the modality extension.
- On the counter-take side, Yu (Zettelkasten), Infranodus (graph DB), innobu and AI Critique (enterprise reality) formed a dissent voice. This survey does not bury them as minority noise; it carries them explicitly.
- On the temporal axis, six weeks moved through conceptual → operational → longitudinal. (Chapter 6) takes on the most-unmeasured risk of the longitudinal phase: wiki rot.
A final reiteration of one thing this chapter did not promise. It does not tell you which OSS implementation to install. The tables are feature matrices, not performance rankings. Until someone runs the LLM Wiki Bench sketched in (5.4) over a weekend, this survey does not lean on star counts. That honesty is the survey's response to G2 [38].
References
- [1] Karpathy, A. (2026). LLM Wiki - A pattern for building personal knowledge bases using LLMs. GitHub Gist, 2026-04-04.
- [2] Karpathy, A. (2026). LLM Wiki announcement (Twitter/X thread). 2026-04-04. [Karpathy, 2026]
- [3] Karpathy, A. (2026). Farzapedia reply - personalization argument for LLM Wiki. 2026-04-12. [Karpathy, 2026]
- [4] Astro-Han (2026). Astro-Han/karpathy-llm-wiki - Agent Skills-compatible LLM Wiki for Claude Code/Cursor/Codex. GitHub. [Astro-Han, 2026]
- [5] Astorian, L. (2026). lucasastorian/llmwiki - Open-source LLM Wiki with document upload + Claude MCP. GitHub. [Astorian, 2026]
- [6] ussumant (2026). ussumant/llm-wiki-compiler - Claude Code plugin: markdown knowledge → topic-based wiki. GitHub. [ussumant, 2026]
- [7] ekadetov (2026). ekadetov/llm-wiki - Claude Code plugin for persistent compounding KBs in Obsidian. GitHub. [ekadetov, 2026]
- [8] skyllwt (DAIR Lab, PKU) (2026). OmegaWiki - Wiki-centric full-lifecycle AI research platform on Claude Code. GitHub. [skyllwt, 2026]
- [9] 0xchamin (2026). Mcptube - Karpathy's LLM Wiki applied to YouTube (transcripts + vision frames). GitHub + Hacker News Show HN. [0xchamin, 2026]
- [10] Hacker News community (2026). LLM Wiki - example of an 'idea file' (Hacker News front-page thread). 2026-04-04. [HN, 2026]
- [11] Astorian, L. and Hacker News community (2026). Show HN: LLM Wiki - Open-Source Implementation of Karpathy's LLM Wiki (lucasastorian). 2026-04. [HN, 2026]
- [12] Hacker News community (2026). Show HN: A Karpathy-style LLM wiki your agents maintain (Markdown and Git). 2026-05. [HN, 2026]
- [13] 0xchamin and Hacker News community (2026). Show HN: Mcptube - Karpathy's LLM Wiki idea applied to YouTube videos. 2026-04. [HN, 2026]
- [14] MindStudio (2026). What Is Andrej Karpathy's LLM Wiki? How to Build a Personal Knowledge Base With Claude Code. MindStudio Blog. [MindStudio, 2026]
- [15] Cognition AI (2026). llm-wiki: the reference implementation of Karpathy's self-building AI memory pattern. Cognition blog (re-syndicated). [Cognition, 2026]
- [16] Denser.ai (2026). From RAG to LLM Wiki: What Karpathy's idea means for AI knowledge bases. Denser.ai blog. [Denser, 2026]
- [17] Analytics Vidhya (2026). LLM Wiki Revolution: How Andrej Karpathy's Idea is Changing AI. Analytics Vidhya blog. [Analytics Vidhya, 2026]
- [18] Agentpedia (2026). Karpathy's LLM Wiki: The Complete Guide to His Idea File. Agentpedia blog. [Agentpedia, 2026]
- [19] Lobster Pack (2026). Karpathy's LLM Wiki and the rise of "idea files" - why sharing instructions beats sharing code. Lobster Pack blog. [Lobster Pack, 2026]
- [20] WebEdge (2026). Karpathy's LLM Knowledge Base System: Full Breakdown of His CLAUDE.md Schema. MindStudio Blog (WebEdge attribution). [WebEdge, 2026]
- [21] Starmorph (2026). Karpathy's LLM Wiki: Step-by-step setup guide. Starmorph blog. [Starmorph, 2026]
- [22] Starmorph (2026). Karpathy's LLM Wiki - Full Beginner Setup Guide (video). YouTube. [Starmorph, 2026]
- [23] Data Science Dojo (2026). The LLM Wiki Pattern by Andrej Karpathy - 5-paper, 30-minute tutorial. Data Science Dojo blog. [Data Science Dojo, 2026]
- [24] Joshi, U. (2026). Andrej Karpathy's LLM Wiki: Create your own knowledge base. Medium. [Joshi, 2026]
- [25] Fulkerson, A. (2026). Karpathy's Pattern for an LLM Wiki in Production. aaronfulkerson.com blog. [Fulkerson, 2026]
- [26] Aimaker (2026). AI-powered second brain from LLM Wiki - 4-month report. Aimaker Substack. [Aimaker, 2026]
- [27] Yu, W. (2026). What Is Karpathy's LLM Wiki? A Zettelkasten User's Honest Review. yu-wenhao.com blog. [Yu, 2026]
- [28] Infranodus (2026). Infranodus on LLM Wiki - graph DBs as the missing layer. Infranodus blog. [Infranodus, 2026]
- [29] innobu (2026). Karpathy's LLM Wiki: Second Brain and the Enterprise Reality Check 2026. innobu blog. [innobu, 2026]
- [30] AI Critique (2026). Andrej Karpathy's latest concept 'LLM Wiki' and the future of enterprise knowledge. AI Critique blog. [AI Critique, 2026]
- [31] Global Advisors / Quantified Strategy Consulting (2026). Term: LLM Wiki - Andrej Karpathy. Global Advisors blog. [Global Advisors, 2026]
- [32] Park, J. (2026). RAG is forgotten: Karpathy's "LLM Wiki" and a new knowledge-management paradigm (Korean). GeekNews / WikiDocs blog. [Park, 2026]
- [33] TiddlyWiki community (2026). Riding the wave of Andrej Karpathy's 'LLM Wiki' (Talk TW). TiddlyWiki Talk forum. [TiddlyWiki, 2026]
- [34] Herk, N. (2026). Karpathy 10x'ed Claude Code (LLM Wiki framing video). YouTube. [Herk, 2026]
- [35] Paige (2026). Second-brain setup using Karpathy's LLM Wiki (video). YouTube. [Paige, 2026]
- [36] Anthropic (2026). Claude Code documentation. Anthropic docs. [Anthropic, 2026]
- [37] OpenAI (2026). Custom instructions with AGENTS.md (Codex). OpenAI Developers Portal. [OpenAI, 2026]
- [38] Critical Analyst (2026). Research gap analysis - gaps.md (internal). terry-surveys repo. [Critical Analyst, 2026]