nghia dang · software engineer
← back to work

/ project

StarCraft II theorycrafter

A factually-grounded strategy tool that refuses to hallucinate.

stack Python · SQLite · LLM reasoning layer · replay parsing · uv
scope solo build
status in progress

ProblemLLMs reason well but are unreliable on precise facts. For a game where unit costs and timings have to be exact, a model that confidently invents numbers is worse than useless.

What it doesIt answers build-order and strategy questions for a specific patch by grounding every factual claim in a database and using the model only for the reasoning on top. It's a two-layer design where an SQLite fact store (populated from a replay parser) holds the truth and the LLM is never asked to recall it.

OutcomeA concrete pattern for building grounded LLM tools. It's the same idea that powers my investing pipeline, applied here to a game I know at the top level.

/ architecture

Architecture

A StarCraft II strategy tool built on a hard separation between facts and reasoning, so the LLM can reason about strategy without being trusted to recall exact numbers.

  [replay files] ──► [parser] ──► [SQLite fact store]
                                    (units, costs, build timings — patch-specific)
                                          │
  [user question] ──► [reasoning layer (LLM)] ──◄ queries facts, never recalls them
                                          │
                                          ▼
                                  [grounded answer]

Layer one is an SQLite database of ground-truth game facts — unit costs, build timings, tech requirements for a specific patch — populated from parsed replays. Layer two is the LLM, which answers build-order and matchup questions by querying those facts rather than recalling them from training, so it can't hallucinate a number.

/ technical decisions

Technical decisions

Two-layer fact/reason split

the same pattern as my investing tools' grounding — the DB holds truth, the model holds judgment. It's the cleanest defense against confident fabrication in a domain where a wrong unit cost invalidates the whole answer.

Patch-pinned facts

the store is tied to a specific game patch, so answers stay correct as balance changes — you re-parse for a new patch rather than hoping the model "knows" the current numbers.

Replay parser as the source of truth

facts come from parsed game data, not manual entry or model recall, which keeps the store accurate and refreshable.

uv for environment management

fast, reproducible Python environments across the sibling repos.

/ what broke

What broke / what I learned

The whole project is a bet that grounding beats scale for factual reliability — and the build confirmed it: the moment the model is allowed to "remember" a number instead of querying it, accuracy collapses. Enforcing that the reasoning layer can only read facts, never invent them, is the entire game. It's the same lesson as stock-vetter, learned in a domain I know cold.

A grounded build-order answer (capture to be added).
← back to work