Scale Events
Livestream
NeurIPS: Are We Learning Yet? A Meta-Review of Evaluation Failures Across Machine Learning

Evaluation issues often undermine the validity of results in machine learning research. In collaboration with researchers from Stanford University, the University of California, Berkeley, and the University of Washington, we conducted a meta-review of 100+ survey papers to identify common benchmark evaluation problems across subfields. In some cases, several years’ worth of reported progress in a field may be misstated. The papers we reviewed cover a broad range of subfields, including computer vision, deep reinforcement learning, recommender systems, and natural language processing. We found a consistent set of failure modes, which we organized into a systematic taxonomy.

Speakers
Thomas Liao
Research Scientist (ML) @ Scale AI
Agenda
6:52 PM
8:23 PM
Stage 1
Presentation
Event has finished
February 03, 8:00 PM, GMT
Online