walkinglabs/hands-on-modern-rl
A structured, open curriculum that walks from basic RL all the way through RLHF, RLVR (RL with verifiable rewards), and agentic system design. Not a paper dump or a link list — the repo is built as a runnable, hands-on course with the explicit goal of bridging the gap between toy Gym environments and the alignment techniques that are actually shaping model behavior right now. The value is the sequencing: most resources either stop at PPO or jump straight to constitutional AI with nothing in between; this fills the middle. Reservation: it is a curriculum, not a library — you are committing time, not downloading a dependency. If you want to understand why your RLVR training run is misbehaving, or how to design reward models for an agent product, it is worth the investment. -> Best for: AI engineer or technical PM building agentic products who needs to stop treating RL as a black box
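To make the RLVR term concrete: the core idea is that the reward signal comes from a programmatic verifier (an exact-match check, a unit test, a proof checker) rather than a learned reward model. A minimal sketch, with entirely illustrative names — `verifiable_reward` and the `Answer:` convention are assumptions for the example, not anything from this repo:

```python
# Sketch of the RLVR idea: reward is computed by a deterministic verifier,
# not predicted by a learned reward model. Names here are illustrative.

def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Return 1.0 if the completion's final answer matches the reference, else 0.0."""
    marker = "Answer:"  # toy convention: the model ends with "Answer: <value>"
    if marker not in completion:
        return 0.0  # unverifiable format gets zero reward
    answer = completion.split(marker, 1)[1].strip()
    return 1.0 if answer == reference_answer.strip() else 0.0

print(verifiable_reward("Step 1... Step 2... Answer: 42", "42"))   # 1.0
print(verifiable_reward("I think the result is 42", "42"))          # 0.0
```

The binary, checkable nature of this signal is exactly why RLVR debugging differs from RLHF debugging — there is no reward-model miscalibration to chase, but reward hacking against the verifier's format becomes the failure mode instead.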