ToolRadar

amitshekhariitbhu/llm-internals

A structured, step-by-step walkthrough of how LLMs actually work under the hood — tokenization, attention mechanisms, inference optimization — written for engineers who want to understand the machinery, not just call an API. What sets it apart is depth with accessibility: most resources either oversimplify to blog-post level or drop you into a dense academic paper. This sits in the practical middle. If you build on top of models and keep hitting walls because you do not know why a certain prompt behaves strangely or why quantization breaks your use case, this is the reference you have been missing. Reservation: it is a learning resource, not a tool — so measure your ROI in understanding, not shipped features. That said, a solid mental model of attention mechanisms will save more debugging hours than most libraries will. -> Best for: an AI engineer or technical PM who ships LLM features but skipped the foundational theory
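To give a flavor of the machinery the repo walks through, here is a minimal sketch of scaled dot-product attention in NumPy. This is the standard textbook formulation, not code from the repo itself; the array shapes are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # Scores: how well each query matches every key,
    # scaled by sqrt(d_k) to keep softmax gradients stable.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Softmax over keys: each query gets a probability
    # distribution over the positions it attends to.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: a weighted average of the value vectors.
    return weights @ v, weights

# Toy example: 4 positions, embedding dimension 8 (assumed sizes).
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))
out, weights = scaled_dot_product_attention(q, k, v)
```

The whole mechanism is two matrix multiplies and a softmax — which is exactly why a mental model of it pays off when debugging prompt or quantization issues.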