/ project
StarCraft II theorycrafter
A factually-grounded strategy tool that refuses to hallucinate.
ProblemLLMs reason well but are unreliable on precise facts. For a game where unit costs and timings have to be exact, a model that confidently invents numbers is worse than useless.
What it doesIt answers build-order and strategy questions for a specific patch by grounding every factual claim in a database and using the model only for the reasoning on top. It's a two-layer design where an SQLite fact store (populated from a replay parser) holds the truth and the LLM is never asked to recall it.
OutcomeA concrete pattern for building grounded LLM tools. It's the same idea that powers my investing pipeline, applied here to a game I know at the top level.
Architecture
A StarCraft II strategy tool built on a hard separation between facts and reasoning, so the LLM can reason about strategy without being trusted to recall exact numbers.
[replay files] ──► [parser] ──► [SQLite fact store]
(units, costs, build timings — patch-specific)
│
[user question] ──► [reasoning layer (LLM)] ──◄ queries facts, never recalls them
│
▼
[grounded answer]Layer one is an SQLite database of ground-truth game facts — unit costs, build timings, tech requirements for a specific patch — populated from parsed replays. Layer two is the LLM, which answers build-order and matchup questions by querying those facts rather than recalling them from training, so it can't hallucinate a number.
Technical decisions
Two-layer fact/reason split
the same pattern as my investing tools' grounding — the DB holds truth, the model holds judgment. It's the cleanest defense against confident fabrication in a domain where a wrong unit cost invalidates the whole answer.
Patch-pinned facts
the store is tied to a specific game patch, so answers stay correct as balance changes — you re-parse for a new patch rather than hoping the model "knows" the current numbers.
Replay parser as the source of truth
facts come from parsed game data, not manual entry or model recall, which keeps the store accurate and refreshable.
uv for environment management
fast, reproducible Python environments across the sibling repos.
What broke / what I learned
The whole project is a bet that grounding beats scale for factual reliability — and the build confirmed it: the moment the model is allowed to "remember" a number instead of querying it, accuracy collapses. Enforcing that the reasoning layer can only read facts, never invent them, is the entire game. It's the same lesson as stock-vetter, learned in a domain I know cold.