Repeatable Success: Ensuring Reproducibility in AI Evaluation