My Second Braincell
I spent $2 and a weekend building a personal wiki that’s already more useful than five years of Notion
For those who don’t know Andrej Karpathy’s work on LLMs, he recently shared an idea that stuck with me. Rather than having an LLM answer questions from raw documents every time you ask, what if it read your documents once, built a structured wiki from them, and maintained that wiki for itself permanently? He called it the llm-wiki and published it as a concept rather than code, because in the age of AI agents, it’s the idea that matters more than any specific execution.
This is a concept I have experimented with previously and a problem I have tried to solve: “I have a large amount of documentation I have written about a variety of things. A lot of it has changed over time; some of it is inaccurate, some is conflicting, some is out of date, but most of it is golden material, and with traditional RAG over vector databases it is very hard to discern which is which.”
As a summary: an LLM trawls through your knowledge bases and creates its own wiki from everything it has read, marking material that is contradictory along the way. It also creates an index for faster reference when you ask a question. The benefit: you do not load the entire knowledge base on every query, which drastically saves context window space as well as token spend. Because everything is saved to files instead of living in ephemeral memory, it persists across sessions and restarts. Additionally, it loads the index of the wiki at the start of each session, so it understands what knowledge is available to it and loads only what it needs. If it needs more, it makes the decision to go back and open up further pages.
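The index-first loading step can be sketched in a few lines. This is my own illustrative Python, not Karpathy’s code or my actual setup; the index format (a JSON map from topic keywords to page files) and the naive keyword matching are assumptions made for the example:

```python
import json
import tempfile
from pathlib import Path

def load_index(wiki_dir: Path) -> dict:
    """Read the index the model wrote at build time: topic keywords -> page file."""
    return json.loads((wiki_dir / "index.json").read_text())

def select_pages(index: dict, question: str) -> list[str]:
    """Load only pages whose topic words appear in the question,
    instead of stuffing the whole knowledge base into context."""
    q = question.lower()
    return [page for topic, page in index.items()
            if any(word in q for word in topic.lower().split())]

# Tiny demo wiki: two pages and an index, roughly as a build step might leave them.
wiki = Path(tempfile.mkdtemp())
(wiki / "auth.md").write_text("# Auth\nHow login works...")
(wiki / "deploy.md").write_text("# Deploy\nRelease process...")
(wiki / "index.json").write_text(json.dumps({
    "authentication login": "auth.md",
    "deployment release": "deploy.md",
}))

index = load_index(wiki)
# Only the auth page matches, so the deploy page never enters the context window.
print(select_pages(index, "How does login authentication work?"))
```

In the real setup the selection is done by the model itself after reading the index, not by keyword matching, but the shape of the saving is the same: one small index file read up front, full pages opened only on demand.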
So far I have been experimenting with it for a few days, testing it on everything from technical architecture docs to personal research. I have been asking it a lot of questions, and it is definitely more powerful than the free tier of Claude.ai. Obviously it would be a lot more expensive if I used Sonnet or later Mythos, but the model I have been using with great success is simply DeepSeek, as cheap as possible. The most expensive part of the process is the initial setup, which cost a grand total of… $2.
This isn’t to say that it is the absolute death of RAG plus vector databases; the method has its potential flaws. There is a risk of hallucination when the summaries are first created from your source of truth, and from then on it answers from those summaries. This is highly efficient, and it will help you find places where your sources of truth are lacking. However, the divergence from the source of truth means I would not trust this in a production environment. I would only trust it with personal notes, on things I would still self-reference or already have a level of understanding of.
How did I implement it?
Open source as much as possible. Originally I wanted to self-host a smaller model to query the wiki after using a cloud model to build it, to keep costs low. However, my graphics card is not powerful enough for that to be feasible at this scale.
- opencode
- deepseek reasoner (for high quality wiki creation)
- deepseek chat (for cheaper querying)
- A lot of research into how best to set up the folder structure and AGENTS.md file
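Since the folder structure did most of the heavy lifting for me, here is a rough sketch of what the agent’s working directory can look like. Karpathy published no reference layout, so every file and folder name below is a hypothetical example from my own experimenting, not any official spec:

```text
wiki/
├── AGENTS.md        # standing instructions: read index.md first, open pages on demand
├── index.md         # the topic -> page map the model maintains itself
├── pages/
│   ├── architecture.md
│   └── deployment.md
└── conflicts.md     # material the model flagged as contradictory between sources
```

The AGENTS.md file is what makes the lazy loading happen: it tells opencode to load the index at session start and to fetch individual pages only when a question actually needs them.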
Summary
This method is genuinely amazing for personal knowledge management. When I used it on my own code bases and personal projects, it was a great help. I genuinely recommend it as an experiment to learn from.
Yes, the wiki contents are added to a git repository, and no, I will not be making it public.