bootstrap¶
Bootstrap confidence interval calculation for forecaster rankings.
This module provides simplified bootstrap CI estimation for the nightly API, focusing on symmetric confidence intervals around point estimates.
Key principle: Bootstrap resampling is done SEPARATELY for each forecaster, sampling with replacement from their own predictions only. This properly estimates the uncertainty in each forecaster’s individual score.
Granularity: The resampling unit depends on the input data. When the input contains per-market rows (one row per individual market outcome), bootstrap resamples at the market level. When the input contains per-event rows (one row per problem/event with scores averaged across markets), bootstrap resamples at the event level. Market-level resampling is the default and captures both within-event and across-event variance.
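The distinction above can be made concrete with a small pandas sketch: starting from hypothetical per-market rows (column names here are illustrative, not the module's required schema), averaging within each event produces the per-event rows that would switch bootstrap to event-level resampling.

```python
import pandas as pd

# Hypothetical per-market rows: one row per (forecaster, event, market outcome).
market_rows = pd.DataFrame({
    "forecaster":  ["a", "a", "a", "b", "b"],
    "event":       ["e1", "e1", "e2", "e1", "e2"],
    "brier_score": [0.10, 0.30, 0.20, 0.25, 0.15],
})

# Per-event rows: scores averaged across markets within each event, so a
# bootstrap over these rows resamples whole events rather than markets.
event_rows = (
    market_rows
    .groupby(["forecaster", "event"], as_index=False)["brier_score"]
    .mean()
)
```

Market-level rows keep the within-event spread (e.g. forecaster `a`'s 0.10 vs 0.30 on `e1`), which is why market-level resampling captures both variance components.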
Performance: the inner resampling loop is fully vectorized — for each
forecaster we draw a single (num_samples, n_rows) index matrix and
compute all bootstrap statistics in one batch of numpy operations, instead
of running a Python-level for loop over num_samples iterations. This
makes the dominant cost of the nightly recompute small enough that bootstrap
CI is no longer the bottleneck.
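The vectorized inner loop described above can be sketched as follows for a single forecaster. The function name and signature are assumptions for illustration, not the module's API; only the technique (one index-matrix draw, batched numpy reductions) mirrors the documented behavior.

```python
import numpy as np

def bootstrap_se(scores: np.ndarray, weights: np.ndarray,
                 num_samples: int = 1000, seed: int = 42) -> float:
    """Vectorized bootstrap SE of a weighted mean (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(scores)
    # One draw produces every bootstrap sample's indices: shape (num_samples, n).
    idx = rng.choice(n, size=(num_samples, n), replace=True)
    s = scores[idx]    # (num_samples, n) resampled scores
    w = weights[idx]   # matching resampled weights
    # Weighted mean of each bootstrap sample, computed in one batch.
    stats = (s * w).sum(axis=1) / w.sum(axis=1)
    # SE of the statistic = standard deviation of the bootstrap distribution.
    return float(stats.std(ddof=1))
```

No Python-level loop over `num_samples` remains; the cost is a handful of numpy array operations regardless of the number of bootstrap iterations.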
Functions¶
- `compute_bootstrap_ci`: Compute bootstrap confidence intervals for forecaster scores.
Module Contents¶
- bootstrap.compute_bootstrap_ci(result_df: pandas.DataFrame, score_col: str, adjusted_weights: numpy.ndarray, bootstrap_config: Dict = None, aggregation: str = 'mean') → Tuple[Dict[str, float], Dict[str, Tuple[float, float]]]¶
Compute bootstrap confidence intervals for forecaster scores.
This function performs weighted bootstrap resampling of individual predictions to estimate confidence intervals for forecaster scores. It uses symmetric CIs around the point estimate.
IMPORTANT: Resampling is done SEPARATELY for each forecaster. For each forecaster we draw an entire (num_samples, n_rows) index matrix in one `np.random.choice` call and compute the bootstrap statistic for every sample at once via vectorized numpy ops, instead of running a Python loop over the bootstrap iterations.

- Args:
  - result_df: DataFrame with columns (forecaster, score_col, adjusted_weight) where each row is an individual prediction
  - score_col: Name of the score column (‘brier_score’ or ‘average_return’)
  - adjusted_weights: Array of adjusted weights for each prediction (same length as result_df)
  - bootstrap_config: Dictionary with bootstrap parameters:
num_samples: Number of bootstrap samples (default: 1000)
ci_level: Confidence level (default: 0.95)
num_se: Number of standard errors for CI bounds (default: None, uses ci_level)
random_seed: Random seed for reproducibility (default: 42)
show_progress: Ignored (kept for backwards compatibility — the vectorized implementation has no per-iteration progress to show)
- aggregation: Aggregation mode for forecaster scores.
‘mean’: weighted mean of score_col
‘sum’: sum of score_col
‘roi’: sum(score_col) / sum(cost), requiring a ‘cost’ column in result_df.
- Returns:
  Tuple of (standard_errors, confidence_intervals) where:
  - standard_errors: Dict mapping forecaster -> SE of the score
  - confidence_intervals: Dict mapping forecaster -> (lower, upper) bounds
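The contract above can be illustrated end to end with a hypothetical re-implementation for `aggregation='mean'`. This is a sketch of the documented behavior (per-forecaster resampling, vectorized index draw, symmetric CI around the weighted-mean point estimate), not the module's actual code; `bootstrap_cis` and its defaults are assumptions.

```python
import numpy as np
import pandas as pd
from statistics import NormalDist

def bootstrap_cis(result_df: pd.DataFrame, score_col: str,
                  num_samples: int = 1000, ci_level: float = 0.95,
                  seed: int = 42):
    """Hypothetical sketch of compute_bootstrap_ci for aggregation='mean'."""
    rng = np.random.default_rng(seed)
    z = NormalDist().inv_cdf(0.5 + ci_level / 2.0)  # ~1.96 for a 95% CI
    ses, cis = {}, {}
    # Resample each forecaster's own predictions separately.
    for name, grp in result_df.groupby("forecaster"):
        s = grp[score_col].to_numpy()
        w = grp["adjusted_weight"].to_numpy()
        point = (s * w).sum() / w.sum()  # weighted-mean point estimate
        # One (num_samples, n_rows) index matrix per forecaster.
        idx = rng.choice(len(s), size=(num_samples, len(s)), replace=True)
        stats = (s[idx] * w[idx]).sum(axis=1) / w[idx].sum(axis=1)
        se = float(stats.std(ddof=1))
        ses[name] = se
        cis[name] = (point - z * se, point + z * se)  # symmetric around point
    return ses, cis
```

The symmetric construction means the interval is always centered on the point estimate; only its width comes from the bootstrap distribution.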