Day 01: scaffolding and the plan
Date: 2026-06-16 · Week 1 · Phase 0 Foundations
What I added today
Repo skeleton for nanoserve: package layout under src/nanoserve/, the 100-day plan in docs/PLAN.md, the system design in docs/ARCHITECTURE.md, and three diagrams in docs/diagrams/. Every module is a stub that names the week it gets implemented.
Why it matters
Setting the done line before writing a single forward pass is the whole game. v1 stops at a correct, batched, served engine running Llama-3.2-1B. Speculative decoding and tensor parallelism are explicitly out, they are the v2 teaser. The structure exists so the daily work is “fill in the next stub,” never “what do I do today.”
What I learned
The hard part of an inference engine is not the transformer, it is the memory and the scheduling. The architecture doc already makes that clear: layers and model are standard, the two ideas that earn the name “engine” are the paged KV cache and continuous batching. Naming that on Day 1 sets the right focus for the next 14 weeks.
Diagram
architecture-overview.svg — the life of a request.
Tomorrow
Week 1 menu: download Llama-3.2-1B, inspect config.json, fill in config.py, start the safetensors name mapping in loader.py.
Post angle: I am building an AI inference engine from scratch in 100 days. Not a wrapper, the actual thing: paged KV cache, continuous batching, a Triton kernel. Day 1 is the map. Here is what the next 100 days build and why.