Leveraging Datasets for Effective AI Component Evaluation