Crafting the Perfect Dataset for AI Evaluation