Your AI Agent Can Code — But Can It Grade Its Own Homework? Hamel Husain's Evals Skills Kit

Hamel Husain has released evals-skills, a set of skills for AI product evaluation. It targets the blind spots agents show on complex tasks, especially telling different types of hallucination apart, so that agents can use eval platforms effectively.