raindrop-ai/workshop
Most agent toolchains stop at generation: the agent writes code, maybe runs it, and you eyeball the result. This project closes the loop by letting the coding agent author and execute its own evaluation harness. That is a genuinely different posture: instead of you writing evals after the fact, the agent is responsible for defining what good looks like and checking its own output against that definition. The practical payoff is faster iteration on prompts and tool configs without a human in the quality-check seat. The repo is early, so expect rough edges and a thin README; this is a framework to build with, not a product to plug in. Worth pulling the code and reading the architecture before the concept gets absorbed into one of the larger agent platforms and loses its sharp edges. Reservation: early-stage, sparse docs, needs an engineer willing to invest setup time. -> Best for: AI engineer building or maintaining production coding-agent pipelines