How I Structure a FastAPI Backend with LLM Features (From a Real Project)

A practicing engineer shares the folder structure, separation-of-concerns decisions, and routing patterns they actually used when wiring LLM calls into a production FastAPI service. This is not a toy tutorial with a single endpoint and a hardcoded prompt. The write-up walks through how to isolate LLM logic from business logic, how to handle async calls without turning your codebase into a tangle of awaits, and how to keep prompt management from leaking into every module. The author explains the reasoning behind each structural choice, which is more useful than a scaffold you just copy and paste. For anyone who has built an MVP by stapling OpenAI calls directly into route handlers, this provides a credible upgrade path before the mess compounds. The honest reservation is that this is one person's approach on one project, so it will not cover every deployment scenario, team size, or LLM provider quirk. Treat it as a strong starting point for discussion, not a definitive standard. Worth an hour of your time if you are at the point where adding one more feature feels riskier than it should. -> Best for: solo backend builders moving their LLM-powered app from prototype to maintainable product.
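
To make the separation described above concrete, here is a minimal sketch of the general idea. It is not the author's code. Every name in it (`PROMPTS`, `render_prompt`, `LLMClient`, `FakeLLMClient`, `SummaryService`, `summarize_handler`) is hypothetical, and it uses only the standard library so it runs without a provider SDK or API key.

```python
import asyncio
from dataclasses import dataclass
from string import Template
from typing import Protocol

# Hypothetical prompt registry: templates live in one place, not in handlers.
PROMPTS = {
    "summarize": Template("Summarize the following text in one sentence:\n$text"),
}


def render_prompt(name: str, **params: str) -> str:
    """Look up a template by name and fill in its parameters."""
    return PROMPTS[name].substitute(**params)


class LLMClient(Protocol):
    """The only interface the business logic knows about."""

    async def complete(self, prompt: str) -> str: ...


@dataclass
class FakeLLMClient:
    """Stand-in for a real provider wrapper; returns a canned reply."""

    reply: str = "stub summary"

    async def complete(self, prompt: str) -> str:
        await asyncio.sleep(0)  # yield to the event loop like real I/O would
        return self.reply


@dataclass
class SummaryService:
    """Business logic: validates input, picks a prompt, cleans the output."""

    llm: LLMClient

    async def summarize(self, text: str) -> str:
        if not text.strip():
            raise ValueError("text must not be empty")
        prompt = render_prompt("summarize", text=text)
        return (await self.llm.complete(prompt)).strip()


async def summarize_handler(text: str, service: SummaryService) -> dict[str, str]:
    """Thin handler: a single await, no prompt text, no provider details."""
    return {"summary": await service.summarize(text)}


if __name__ == "__main__":
    service = SummaryService(llm=FakeLLMClient())
    print(asyncio.run(summarize_handler("FastAPI is a web framework.", service)))
```

In a FastAPI app, a function shaped like `summarize_handler` would become the route function and the service would typically be supplied through `Depends`. Swapping providers or testing with a fake then only touches the client class.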