Efficient Benchmarking via Bias-Bounded Subset Selection
| Published in: | IEEE transactions on pattern analysis and machine intelligence. - 1979. - PP(2025), dated 12 Aug. |
|---|---|
| Main author: | |
| Other authors: | , , , , , |
| Format: | Online article |
| Language: | English |
| Published: | 2025 |
| Access to parent work: | IEEE transactions on pattern analysis and machine intelligence |
| Subjects: | Journal Article |
| Summary: | Evaluating AI systems, particularly large models, is an essential yet computationally expensive task. The use of extensive benchmarks often leads to substantial computational/human costs that may even exceed those of pretraining. The efficiency of AI model evaluation focuses on estimating the model's score on the full benchmark based on its responses to a smaller subset. Various empirical selection methods have been proposed to identify valuable subsets within these benchmarks. In this paper, we formally define and approximate the subset selection problem inherent in efficient evaluation. We prove that this problem actually optimizes a submodular function and that a unified subset can be identified using a simple greedy algorithm. Importantly, this approach is the first to provide theoretical guarantees of bias control and generalizability in score estimation. Using language models as a case study, experimental results across 11 different benchmarks validate its superiority in estimating model scores and maintaining ranking consistency. It can achieve accurate score estimation using no more than 30% of the full benchmark, thus facilitating efficient and sparse benchmark design. |
| Description: | Date revised: 12.08.2025; published: Print-Electronic; citation status: Publisher |
| ISSN: | 1939-3539 |
| DOI: | 10.1109/TPAMI.2025.3598031 |
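
The summary states that the subset selection problem optimizes a submodular function and can be solved with a simple greedy algorithm, but this record does not give the paper's actual objective. The sketch below is therefore only an illustration of the general technique: it swaps in the standard facility-location function (a well-known monotone submodular objective) over a non-negative item-similarity matrix. The names `greedy_select`, `estimate_score`, and `facility_location`, the Gaussian similarity, and the toy "difficulty" data are all assumptions made for this example, not the authors' method.

```python
import math
import random


def facility_location(similarity, subset):
    """Assumed objective: how well `subset` covers every benchmark item."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in similarity)


def greedy_select(similarity, budget):
    """Greedily add the item with the largest marginal coverage gain."""
    n = len(similarity)
    selected = []
    best_cover = [0.0] * n  # best similarity of each item to the current subset
    for _ in range(min(budget, n)):
        best_j, best_gain = None, -1.0
        for j in range(n):
            if j in selected:
                continue
            gain = sum(max(similarity[i][j] - best_cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.append(best_j)
        best_cover = [max(best_cover[i], similarity[i][best_j]) for i in range(n)]
    return selected


def estimate_score(similarity, selected, subset_scores):
    """Weight each selected item by how many benchmark items it represents."""
    n = len(similarity)
    counts = [0] * len(selected)
    for i in range(n):
        k = max(range(len(selected)), key=lambda k: similarity[i][selected[k]])
        counts[k] += 1
    return sum(c * s for c, s in zip(counts, subset_scores)) / n


# Toy benchmark (hypothetical data): each item has a scalar "difficulty",
# and the evaluated model answers an item correctly iff difficulty < 0.6.
random.seed(0)
difficulty = [random.random() for _ in range(60)]
scores = [1 if d < 0.6 else 0 for d in difficulty]
sim = [[math.exp(-((a - b) ** 2) / 0.02) for b in difficulty] for a in difficulty]

budget = int(0.3 * len(difficulty))  # 30% of the benchmark, as in the summary
selected = greedy_select(sim, budget)
estimate = estimate_score(sim, selected, [scores[j] for j in selected])
full_score = sum(scores) / len(scores)
print("subset size:", len(selected))
print("estimated score:", estimate, "full score:", full_score)
```

For monotone submodular objectives such as this one, greedy selection is known to reach at least a (1 - 1/e) fraction of the optimal objective value under a cardinality budget. Whether that bound corresponds to the bias guarantee mentioned in the summary cannot be determined from this record.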