algo

Attributes

DEFAULT_BOOTSTRAP_CONFIG

Functions

add_individualized_market_baselines_to_scores(...)

Add individualized market baseline scores for each forecaster at aggregation time.

rank_forecasters_by_score(→ pandas.DataFrame)

Return a rank_df with columns (forecaster, rank, score).

add_market_baseline_predictions(→ pandas.DataFrame)

Turn the forecasts from a given forecaster into market baseline predictions.

compute_brier_score(→ pandas.DataFrame)

Calculate the Brier score for the forecasts using row-by-row processing.

compute_average_return_neutral(→ pandas.DataFrame)

Each forecaster is given a fixed $1000 budget spread across ALL markets they participate in.

compute_calibration_ece(→ pandas.DataFrame)

Calculate the Expected Calibration Error (ECE) for each forecaster.

compute_sharpe_ratio(→ pandas.DataFrame)

Calculate the Sharpe ratio for each forecaster.

compute_ranked_brier_score(→ dict)

Compute the ranked forecasters for the given score function.

compute_ranked_average_return(→ dict)

Compute the ranked forecasters for the given score function.

Module Contents

algo.DEFAULT_BOOTSTRAP_CONFIG
algo.add_individualized_market_baselines_to_scores(result_df: pandas.DataFrame) pandas.DataFrame

Add individualized market baseline scores for each forecaster at aggregation time.

This function takes per-forecast scores (e.g., from compute_brier_score or compute_average_return_neutral) and creates “{forecaster}-market-baseline” entries by filtering the market-baseline scores to only the (event_ticker, round) combinations where each forecaster participated.

This is efficient because it reuses the already-computed market-baseline scores rather than creating duplicate prediction rows.

Args:

  • result_df: DataFrame with columns (forecaster, event_ticker, round, weight, <score_col>). Must contain a 'market-baseline' forecaster.

Returns:

DataFrame with added “{forecaster}-market-baseline” rows for each real forecaster.
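The filtering step described above can be sketched in plain pandas. This is an illustrative reimplementation, not the library's code; column names follow the documented schema, and the tiny DataFrame is invented for the example.

```python
import pandas as pd

# Toy per-forecast scores, including the required 'market-baseline' forecaster.
scores = pd.DataFrame({
    "forecaster":   ["alice", "alice", "bob", "market-baseline", "market-baseline"],
    "event_ticker": ["EV1",   "EV2",   "EV1", "EV1",             "EV2"],
    "round":        [1,       1,       1,     1,                 1],
    "weight":       [1.0] * 5,
    "brier_score":  [0.10,    0.20,    0.15,  0.12,              0.18],
})

baseline = scores[scores["forecaster"] == "market-baseline"]
rows = []
for fc, grp in scores[scores["forecaster"] != "market-baseline"].groupby("forecaster"):
    # Keep only baseline scores for the (event_ticker, round) pairs this
    # forecaster actually participated in.
    matched = baseline.merge(grp[["event_ticker", "round"]], on=["event_ticker", "round"])
    rows.append(matched.assign(forecaster=f"{fc}-market-baseline"))

result = pd.concat([scores] + rows, ignore_index=True)
```

Here "alice-market-baseline" gets two rows (EV1 and EV2) while "bob-market-baseline" gets one (EV1), since the baseline is restricted to each forecaster's own coverage.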

algo.rank_forecasters_by_score(result_df: pandas.DataFrame, normalize_by_round: bool = False, score_col: str = None, ascending: bool = None, bootstrap_config: Dict | None = None, add_individualized_baselines: bool = False) pandas.DataFrame

Return a rank_df with columns (forecaster, rank, score).

Args:

  • result_df: DataFrame containing forecaster scores

  • normalize_by_round: If True, downweight by the number of rounds per (forecaster, event_ticker) group (ignored for ECE scores, which are already aggregated)

  • score_col: Name of the score column to rank by. If None, auto-detects from {'brier_score', 'average_return', 'ece_score'}

  • ascending: Whether lower scores are better (True for Brier/ECE, False for returns). If None, auto-detects.

  • bootstrap_config: Optional dict with bootstrap parameters for CI estimation. Only supported for 'brier_score' and 'average_return', not 'ece_score'.

    • num_samples: Number of bootstrap samples (default: 1000)

    • ci_level: Confidence level (default: 0.95)

    • num_se: Number of standard errors for CI bounds (default: None, uses ci_level)

    • random_seed: Random seed for reproducibility (default: 42)

    • show_progress: Whether to show progress bar (default: True)

  • add_individualized_baselines: If True, create "{forecaster}-market-baseline" entries for each forecaster by filtering market-baseline scores to their participated (event_ticker, round) combinations. Only works for Brier score and average return (not ECE/Sharpe). Requires a 'market-baseline' forecaster to be present in result_df.

Returns:

DataFrame with rank as index and columns (forecaster, score). If bootstrap_config is provided, also includes (se, lower, upper) columns.
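The core ranking step (without round normalization, bootstrap CIs, or individualized baselines) amounts to averaging each forecaster's scores and sorting. A minimal sketch, assuming the documented (forecaster, rank, score) output shape:

```python
import pandas as pd

# Toy per-forecast Brier scores; lower is better, so ascending=True.
result_df = pd.DataFrame({
    "forecaster":  ["alice", "alice", "bob", "bob"],
    "brier_score": [0.10,    0.20,    0.05,  0.15],
})

score_col, ascending = "brier_score", True
rank_df = (
    result_df.groupby("forecaster")[score_col].mean()  # average score per forecaster
    .sort_values(ascending=ascending)
    .reset_index()
    .rename(columns={score_col: "score"})
)
rank_df.index = rank_df.index + 1   # rank starts at 1
rank_df.index.name = "rank"
```

With these numbers bob (mean 0.10) ranks above alice (mean 0.15).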

algo.add_market_baseline_predictions(forecasts: pandas.DataFrame, reference_forecaster: str = None, use_both_sides: bool = False) pandas.DataFrame

Turn the forecasts from a given forecaster into market baseline predictions. If use_both_sides is True, market baseline predictions are added for both the YES and NO sides.

Args:

  • forecasts: DataFrame with columns (forecaster, event_ticker, round, prediction, outcome, weight)

  • reference_forecaster: The forecaster to use as the reference for the market baseline predictions

  • use_both_sides: If True, add market baseline predictions for both YES and NO sides
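The idea can be sketched as follows: copy the reference forecaster's rows, relabel them as "market-baseline", and use the market's own implied probabilities (the odds column from the other functions' schemas) as the predictions. This is a hedged illustration of the concept, not the actual implementation, and the handling of use_both_sides is omitted.

```python
import numpy as np
import pandas as pd

# Toy forecasts for one reference forecaster (column names follow the docs).
forecasts = pd.DataFrame({
    "forecaster":   ["alice", "alice"],
    "event_ticker": ["EV1", "EV2"],
    "round":        [1, 1],
    "prediction":   [np.array([0.7, 0.3]), np.array([0.4, 0.6])],
    "odds":         [np.array([0.6, 0.4]), np.array([0.5, 0.5])],
    "outcome":      [np.array([1, 0]), np.array([0, 1])],
    "weight":       [1.0, 1.0],
})

baseline = forecasts[forecasts["forecaster"] == "alice"].copy()
baseline["forecaster"] = "market-baseline"
baseline["prediction"] = baseline["odds"]  # market prices become the predictions
out = pd.concat([forecasts, baseline], ignore_index=True)
```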

algo.compute_brier_score(forecasts: pandas.DataFrame) pandas.DataFrame

Calculate the Brier score for the forecasts using row-by-row processing. Handles predictions with different array lengths via key intersection. Automatically filters out illiquid events (yes_ask + no_ask > 1.03).

The result will be a DataFrame containing (forecaster, event_ticker, round, weight, brier_score)

Args:

  • forecasts: DataFrame with columns (forecaster, event_ticker, round, prediction, outcome, weight, odds, no_odds)
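The per-row computation with key intersection might look like the sketch below. It assumes prediction and outcome are mappings keyed by market ticker (the docs only say arrays of different lengths are reconciled by key intersection, so the dict representation is an assumption for illustration).

```python
import numpy as np

# One row's prediction/outcome, assumed keyed by market ticker.
row = {
    "prediction": {"M1": 0.8, "M2": 0.3, "M3": 0.5},
    "outcome":    {"M1": 1,   "M2": 0},   # M3 unresolved -> dropped by intersection
}

# Key intersection handles mismatched lengths between prediction and outcome.
keys = sorted(set(row["prediction"]) & set(row["outcome"]))
p = np.array([row["prediction"][k] for k in keys])
y = np.array([row["outcome"][k] for k in keys])
brier = float(np.mean((p - y) ** 2))  # mean squared error over shared markets
```

Here M3 is dropped, leaving brier = ((0.8 - 1)² + (0.3 - 0)²) / 2 = 0.065.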

algo.compute_average_return_neutral(forecasts: pandas.DataFrame, num_money_per_round: float = 1.0, spread_market_even: bool = False, max_spread: float = 1.03) pandas.DataFrame

Each forecaster is given a fixed $1000 budget spread across ALL markets they participate in. For each outcome within an event, the forecaster either bets YES or NO depending on their edge.

Betting logic (per outcome): diff = p - yes_ask

  • If diff > 0: bet YES at price yes_ask

  • If diff < 0: bet NO at price no_ask

  • If diff = 0: skip

Budget allocation:

amount_i = BUDGET * weight_i / sum(weight_j for all j across all events)

Liquidity filter:

Entire events are skipped if ANY outcome has yes_ask + no_ask > max_spread. This avoids betting into illiquid markets with excessive vig.

Args:
forecasts: DataFrame with columns:
  • forecaster: str, model/forecaster identifier

  • event_ticker: str, event identifier

  • round: int, forecast round number

  • prediction: np.ndarray, forecaster’s probability for each outcome

  • outcome: np.ndarray, actual binary outcomes (0 or 1) per market

  • odds: np.ndarray, YES ask prices (implied probabilities) per market

  • no_odds: np.ndarray, NO ask prices per market

  • weight: float, external weight (passed through, not used in computation)

  • num_money_per_round: Unused, kept for API compatibility.

  • spread_market_even: Unused, kept for API compatibility.

  • max_spread: Maximum allowed spread (yes_ask + no_ask) for the liquidity filter. Events with any outcome exceeding this threshold are skipped entirely.

Returns:

DataFrame with columns (forecaster, event_ticker, round, weight, average_return) where average_return is the net profit (can be negative) for that forecaster on that event/round.
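The betting rule and liquidity filter can be sketched for a single event as below. This is a simplified illustration: the real budget allocation spans all events, and the payoff model here (a $1 contract bought at the ask, paying $1 if the chosen side resolves true) is an assumption not spelled out in the docs.

```python
import numpy as np

# One event with three outcomes (invented numbers for illustration).
p       = np.array([0.70, 0.20, 0.50])  # forecaster probabilities
yes_ask = np.array([0.60, 0.30, 0.50])  # YES ask prices
no_ask  = np.array([0.42, 0.72, 0.52])  # NO ask prices
outcome = np.array([1, 0, 1])           # realized outcomes

# Liquidity filter: skip the whole event if any outcome is too wide.
max_spread = 1.03
assert np.all(yes_ask + no_ask <= max_spread)

diff = p - yes_ask
stake = 1.0  # fixed per-outcome stake for this sketch
payouts = np.where(
    diff > 0, stake / yes_ask * outcome,                       # YES pays if outcome == 1
    np.where(diff < 0, stake / no_ask * (1 - outcome), 0.0),   # NO pays if outcome == 0
)
num_bets = int(np.count_nonzero(diff))   # diff == 0 outcomes are skipped
net_profit = float(payouts.sum() - stake * num_bets)
```

With these numbers the forecaster bets YES on the first outcome, NO on the second, and skips the third, netting 1/0.6 + 1/0.72 - 2 ≈ 1.06.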

algo.compute_calibration_ece(forecasts: pandas.DataFrame, num_bins: int = 10, strategy: Literal['uniform', 'quantile'] = 'uniform', weight_event: bool = True, return_details: bool = False) pandas.DataFrame

Calculate the Expected Calibration Error (ECE) for each forecaster.

The ECE measures how well-calibrated a forecaster’s probability predictions are. For perfectly calibrated predictions, when a forecaster predicts probability p, the actual outcome should occur with frequency p.

This function combines two types of weights:

  1. Prediction-level weight: from the 'weight' column (assigned by weight_fn in data loading)

  2. Market-level weight: either uniform (1.0) or the inverse of the number of markets per prediction

The final weight for each market probability is: prediction_weight * market_weight

Args:

  • forecasts: DataFrame with columns (forecaster, event_ticker, round, prediction, outcome, weight)

  • num_bins: Number of bins to use for discretization (default: 10)

  • strategy: Strategy for discretization, either "uniform" or "quantile" (default: "uniform")

  • weight_event: If True, weight each market by 1/num_markets within each prediction. If False, all markets are weighted equally (default: True)

  • return_details: If True, return the details of the ECE calculation for each forecaster. Useful for plotting.

Returns:

DataFrame with columns (forecaster, ece_score) containing the ECE for each forecaster
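A minimal uniform-bin ECE for a single forecaster can be sketched as follows (both weight types are omitted for brevity; the real function applies prediction-level and market-level weights as described above):

```python
import numpy as np

# Flattened market-level probabilities and outcomes for one forecaster.
p = np.array([0.1, 0.2, 0.7, 0.9])
y = np.array([0,   0,   1,   1])
num_bins = 10

# Assign each probability to a uniform bin on [0, 1].
bins = np.clip((p * num_bins).astype(int), 0, num_bins - 1)

ece = 0.0
for b in range(num_bins):
    mask = bins == b
    if mask.any():
        # |mean confidence - empirical frequency|, weighted by the bin's share of samples
        ece += mask.mean() * abs(p[mask].mean() - y[mask].mean())
```

Each sample lands in its own bin here, so the ECE reduces to the mean of |p - y| = (0.1 + 0.2 + 0.3 + 0.1) / 4 = 0.175.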

algo.compute_sharpe_ratio(average_return_results: pandas.DataFrame, baseline_return: float = 1.0, normalize_by_round: bool = False) pandas.DataFrame

Calculate the Sharpe ratio for each forecaster.

The Sharpe ratio is defined as: E[R - R_b] / std(R - R_b), where R is the return and R_b is the baseline return (typically 1.0 for break-even).

Args:

  • average_return_results: DataFrame with columns (forecaster, event_ticker, round, weight, average_return)

  • baseline_return: The baseline return to subtract from the average return (default: 1.0 for break-even)

  • normalize_by_round: If True, first average returns within each (forecaster, event_ticker) group, then calculate the Sharpe ratio across events. This prevents events with more rounds from dominating the calculation. (default: False)

Returns:

DataFrame with columns (forecaster, sharpe_ratio, mean_excess_return, std_excess_return) sorted by sharpe_ratio in descending order
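The formula above translates directly into a grouped pandas computation. A sketch under the documented column names, with invented sample returns:

```python
import pandas as pd

# Toy per-event returns for two forecasters.
returns = pd.DataFrame({
    "forecaster":     ["alice", "alice", "alice", "bob", "bob", "bob"],
    "average_return": [1.2,     0.9,     1.5,     1.1,   1.0,   1.2],
})
baseline_return = 1.0  # break-even

# Excess return R - R_b, then E[.] / std(.) per forecaster.
excess = returns["average_return"] - baseline_return
stats = excess.groupby(returns["forecaster"]).agg(["mean", "std"])
stats["sharpe_ratio"] = stats["mean"] / stats["std"]
stats = stats.sort_values("sharpe_ratio", ascending=False)
```

Here bob's smaller but steadier edge (mean 0.1, std 0.1) yields a higher Sharpe ratio (1.0) than alice's more volatile one (0.2 / 0.3 ≈ 0.67).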

algo.compute_ranked_brier_score(forecasts: pandas.DataFrame, by_category: bool = False, stream_every: int = -1, normalize_by_round: bool = False, bootstrap_config: Dict | None = None, add_individualized_baselines: bool = False) dict

Compute the ranked forecasters for the given score function.

Args:

  • forecasts: DataFrame with forecast data

  • by_category: If True, compute rankings per category

  • stream_every: If > 0, compute rankings at time intervals

  • normalize_by_round: If True, downweight by number of rounds per (forecaster, event_ticker)

  • bootstrap_config: Optional config for bootstrap CI estimation

  • add_individualized_baselines: If True, create "{forecaster}-market-baseline" entries for each forecaster by filtering market-baseline scores to their participated (event_ticker, round) combinations. Requires a 'market-baseline' forecaster to be present.

algo.compute_ranked_average_return(forecasts: pandas.DataFrame, by_category: bool = False, stream_every: int = -1, spread_market_even: bool = False, num_money_per_round: float = 1.0, normalize_by_round: bool = False, bootstrap_config: Dict | None = None, add_individualized_baselines: bool = False) dict

Compute the ranked forecasters for the given score function.

Args:

  • forecasts: DataFrame with forecast data

  • by_category: If True, compute rankings per category

  • stream_every: If > 0, compute rankings at time intervals

  • spread_market_even: If True, spread budget evenly across markets

  • num_money_per_round: Amount to bet per round

  • normalize_by_round: If True, downweight by number of rounds per (forecaster, event_ticker)

  • bootstrap_config: Optional config for bootstrap CI estimation

  • add_individualized_baselines: If True, create "{forecaster}-market-baseline" entries for each forecaster by filtering market-baseline scores to their participated (event_ticker, round) combinations. Requires a 'market-baseline' forecaster to be present.