Mastering AI Agent Evaluation with Evals.do