The Critical Role of Datasets in AI Evaluation