Evaluate, Score, Improve
Quantify the performance of your AI agents, functions, and workflows. Define metrics, run evaluations against datasets, and ensure your AI meets quality and safety standards.
evals.do
{
  "evaluationId": "eval_8a7d6e8f4c",
  "agentId": "customer-support-agent-v2",
  "status": "completed",
  "overallScore": 4.15,
  "passed": false,
  "metrics": [
    { "name": "accuracy", "score": 4.3, "threshold": 4, "passed": true },
    { "name": "helpfulness", "score": 4.6, "threshold": 4.2, "passed": true },
    { "name": "tone", "score": 3.55, "threshold": 4.5, "passed": false }
  ],
  "evaluatedAt": "2024-10-27T10:30:00Z"
}
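In the sample result above, `overallScore` (4.15) equals the unweighted mean of the three metric scores, and `passed` is `false` because `tone` (3.55) falls below its 4.5 threshold even though `accuracy` and `helpfulness` pass. The Python sketch below reproduces that summary logic under two assumptions that evals.do does not state: the overall score is an unweighted mean, and a metric passes when its score is greater than or equal to its threshold. `MetricResult` and `summarize` are hypothetical names for illustration, not part of any evals.do API.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class MetricResult:
    name: str
    score: float
    threshold: float

    @property
    def passed(self) -> bool:
        # Assumption: a score equal to the threshold counts as passing.
        return self.score >= self.threshold


def summarize(metrics: list[MetricResult]) -> dict:
    """Build a result summary shaped like the sample JSON (hypothetical helper)."""
    # Assumption: overallScore is the unweighted mean of the metric scores,
    # rounded to two decimals.
    overall = round(fmean(m.score for m in metrics), 2)
    return {
        "overallScore": overall,
        # The evaluation passes only if every individual metric passes,
        # which is why a decent overall score can still yield passed = False.
        "passed": all(m.passed for m in metrics),
        "metrics": [
            {"name": m.name, "score": m.score, "threshold": m.threshold, "passed": m.passed}
            for m in metrics
        ],
    }


# Metric values copied from the sample response.
metrics = [
    MetricResult("accuracy", 4.3, 4.0),
    MetricResult("helpfulness", 4.6, 4.2),
    MetricResult("tone", 3.55, 4.5),
]
result = summarize(metrics)
print(result["overallScore"], result["passed"])
```

Gating on every metric rather than on the overall score means a single weak dimension, here tone, blocks the evaluation even when the average looks healthy.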