Comparing Sequential Forecasters

Journal Article
Consider two forecasters, each making a single prediction for a sequence of events over time. The authors ask a relatively basic question: how might one compare these forecasters, either online or post hoc, while avoiding unverifiable assumptions on how the forecasts and outcomes were generated? In this paper, the authors present a rigorous answer to this question by designing novel sequential inference procedures for estimating the time-varying difference in forecast scores. To do this, they employ confidence sequences (CSs), which are sequences of confidence intervals that can be continuously monitored and are valid at arbitrary data-dependent stopping times ("anytime-valid"). The widths of their CSs adapt to the underlying variance of the score differences. Underlying the construction is a game-theoretic statistical framework in which the authors further identify e-processes and p-processes for sequentially testing a weak null hypothesis, namely whether one forecaster outperforms another on average (rather than always). Their methods make no distributional assumptions on the forecasts or outcomes; the main theorems apply to any bounded scores, and alternative methods are provided for unbounded scores. The authors empirically validate their approaches by comparing real-world baseball and weather forecasters.
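To make the core idea concrete, here is a minimal sketch (not the authors' exact procedure): score each forecaster per round with a bounded proper scoring rule such as the Brier score, and wrap the running average of score differences in an anytime-valid confidence sequence. The function name `score_diff_cs`, the simulated data, and the sub-Gaussian stitched boundary with constants in the style of Howard et al. (2021) are all assumptions of this illustration; the paper's own CSs are variance-adaptive (empirical-Bernstein-style), which this simplified version is not.

```python
import numpy as np

def brier(p, y):
    """Brier score of probability forecast p for binary outcome y (bounded in [0, 1])."""
    return (p - y) ** 2

def score_diff_cs(p_a, p_b, y, alpha=0.05):
    """
    Anytime-valid confidence sequence for the running average (conditional expected)
    Brier-score difference between forecasters A and B.

    Illustrative only: uses a sub-Gaussian stitched boundary (constants in the style
    of Howard et al., 2021) rather than the paper's variance-adaptive CS. Score
    differences lie in [-1, 1], hence are 1-sub-Gaussian once centered.
    Returns (running mean, lower bound, upper bound), one entry per time step.
    """
    d = brier(p_a, y) - brier(p_b, y)            # per-round score differences in [-1, 1]
    t = np.arange(1, len(d) + 1)
    mean = np.cumsum(d) / t                      # running average score difference
    radius = 1.7 * np.sqrt((np.log(np.log(2 * t)) + 0.72 * np.log(5.2 / alpha)) / t)
    return mean, mean - radius, mean + radius

# Toy comparison: a sharp forecaster versus a constant climatological baseline.
rng = np.random.default_rng(0)
truth = rng.uniform(size=2000)                   # latent event probabilities
y = rng.binomial(1, truth)                       # observed binary outcomes
p_a = np.clip(truth + rng.normal(0, 0.05, size=truth.size), 0, 1)  # forecaster A
p_b = np.full_like(truth, truth.mean())          # forecaster B: constant baseline
mean, lo, hi = score_diff_cs(p_a, p_b, y)
print(f"t=2000: mean diff {mean[-1]:+.3f}, 95% CS [{lo[-1]:+.3f}, {hi[-1]:+.3f}]")
```

Because the interval is valid uniformly over time, it can be monitored after every new event and the comparison stopped as soon as the interval excludes zero, without inflating the error rate; the paper's variance-adaptive construction yields tighter intervals than this bounded-difference sketch.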