Integrating Human Feedback for Better Agent Evaluation