algo
Attributes
- DEFAULT_BOOTSTRAP_CONFIG
- DEFAULT_MAX_SPREAD
Functions
- add_individualized_market_baselines_to_scores: Add individualized market baseline scores for each forecaster at aggregation time.
- rank_forecasters_by_score: Return a rank_df with columns (forecaster, rank, score).
- add_market_baseline_predictions: Turn the forecasts from a reference forecaster into market baseline predictions.
- compute_brier_score: Calculate the Brier score for the forecasts using row-by-row processing.
- compute_global_market_brier: Compute global market baseline Brier score: (yes_brier + no_brier) / 2.
- compute_per_forecaster_market_brier: Compute individualized market baseline Brier score per forecaster.
- compute_avg_market_distance: Compute the average absolute distance |prediction - yes_ask| per forecaster across all markets.
- compute_average_return_neutral: Buy market shares directly from the forecaster's edge.
- compute_calibration_ece: Calculate the Expected Calibration Error (ECE) for each forecaster.
- compute_sharpe_ratio: Calculate the Sharpe ratio for each forecaster using per-dollar returns (ROI).
- compute_ranked_brier_score: Compute the ranked forecasters by Brier score.
- compute_ranked_average_return: Compute the ranked forecasters by average return.
Module Contents
- algo.DEFAULT_BOOTSTRAP_CONFIG
- algo.DEFAULT_MAX_SPREAD = 1.03
- algo.add_individualized_market_baselines_to_scores(result_df: pandas.DataFrame) -> pandas.DataFrame
Add individualized market baseline scores for each forecaster at aggregation time.
This function takes per-forecast scores (e.g., from compute_brier_score or compute_average_return_neutral) and creates “{forecaster}-market-baseline” entries by filtering the market-baseline scores to only the (event_ticker, round) combinations where each forecaster participated.
This is efficient because it reuses the already-computed market-baseline scores rather than creating duplicate prediction rows.
- Args:
- result_df: DataFrame with columns (forecaster, event_ticker, round, weight, <score_col>)
Must contain a ‘market-baseline’ forecaster.
- Returns:
DataFrame with added “{forecaster}-market-baseline” rows for each real forecaster.
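The filtering step described above can be sketched in plain Python. This is an illustration only, not the library's implementation: it works on a list of dicts rather than a DataFrame, and the helper name `add_individualized_baselines` and the sample rows are hypothetical.

```python
from collections import defaultdict

BASELINE = "market-baseline"


def add_individualized_baselines(rows):
    """Hypothetical stand-in that works on a list of dicts, not a DataFrame."""
    # Index the shared market-baseline rows by (event_ticker, round).
    baseline_by_key = {
        (r["event_ticker"], r["round"]): r
        for r in rows
        if r["forecaster"] == BASELINE
    }
    # Collect the (event_ticker, round) pairs each real forecaster took part in.
    keys_by_forecaster = defaultdict(set)
    for r in rows:
        if r["forecaster"] != BASELINE:
            keys_by_forecaster[r["forecaster"]].add((r["event_ticker"], r["round"]))

    # Reuse the already-computed baseline scores; only the forecaster label changes.
    added = []
    for name, keys in sorted(keys_by_forecaster.items()):
        for key in sorted(keys):
            if key in baseline_by_key:
                added.append({**baseline_by_key[key], "forecaster": f"{name}-market-baseline"})
    return rows + added


rows = [
    {"forecaster": BASELINE, "event_ticker": "E1", "round": 1, "weight": 1.0, "brier_score": 0.20},
    {"forecaster": BASELINE, "event_ticker": "E2", "round": 1, "weight": 1.0, "brier_score": 0.10},
    {"forecaster": "alice", "event_ticker": "E1", "round": 1, "weight": 1.0, "brier_score": 0.15},
    {"forecaster": "bob", "event_ticker": "E1", "round": 1, "weight": 1.0, "brier_score": 0.25},
    {"forecaster": "bob", "event_ticker": "E2", "round": 1, "weight": 1.0, "brier_score": 0.05},
]
result = add_individualized_baselines(rows)
```

Here `alice` only forecast event E1, so her individualized baseline contains one row, while `bob` gets both baseline rows.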
- algo.rank_forecasters_by_score(result_df: pandas.DataFrame, normalize_by_round: bool = False, score_col: str = None, ascending: bool = None, bootstrap_config: Dict | None = None, add_individualized_baselines: bool = False, bootstrap_scores_df: pandas.DataFrame = None, aggregation: str = 'mean', analytical_ci: bool | float | None = None) -> pandas.DataFrame
Return a rank_df with columns (forecaster, rank, score).
- Args:
- result_df: DataFrame containing forecaster scores (used for point estimates and ranking).
- normalize_by_round: If True, downweight by the number of rounds per (forecaster, event_ticker) group (ignored for ECE scores, which are already aggregated).
- score_col: Name of the score column to rank by. If None, auto-detects from {‘brier_score’, ‘average_return’, ‘ece_score’}.
- ascending: Whether lower scores are better (True for Brier/ECE, False for returns). If None, auto-detects.
- bootstrap_config: Optional dict with bootstrap parameters for CI estimation. Only supported for ‘brier_score’ and ‘average_return’, not ‘ece_score’. Keys:
num_samples: Number of bootstrap samples (default: 1000)
ci_level: Confidence level (default: 0.95)
num_se: Number of standard errors for CI bounds (default: None, uses ci_level)
random_seed: Random seed for reproducibility (default: 42)
show_progress: Whether to show progress bar (default: True)
- add_individualized_baselines: If True, create “{forecaster}-market-baseline” entries for each forecaster by filtering market-baseline scores to their participated (event_ticker, round) combinations. Requires ‘market-baseline’ forecaster to be present in result_df.
- bootstrap_scores_df: Optional separate DataFrame for bootstrap resampling. When provided, the point estimate is computed from result_df (event-level) while the bootstrap CI is computed from this DataFrame (e.g. market-level). Must contain the same score column and a ‘forecaster’ column. If None, bootstrap uses result_df.
- aggregation: Aggregation mode for score_col.
‘mean’: weighted mean of score_col (default)
‘sum’: sum of score_col
‘roi’: return on investment, computed as sum(score_col) / sum(cost). Requires a ‘cost’ column.
- analytical_ci: Closed-form normal-approximation CI alternative to bootstrap. Mutually exclusive with bootstrap_config. Only supported for aggregation='mean' (proper scoring rules); ROI’s ratio-of-sums needs the delta method.
None / False: no analytical CI
True: 95% CI
float in (0, 1): explicit confidence level (e.g. 0.99)
- Returns:
DataFrame with rank as index and columns (forecaster, score). If bootstrap_config or analytical_ci is set, also includes a “{level}% ci” column formatted as “±X.XXXX”.
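The three aggregation modes and a normal-approximation interval can be illustrated with a small standalone sketch. The helper names `aggregate` and `normal_ci_half_width` are hypothetical. The interval below assumes an unweighted mean with the sample standard error, which may differ from how the library handles weights.

```python
import math
from statistics import NormalDist


def aggregate(scores, weights=None, costs=None, mode="mean"):
    """Hypothetical helper mirroring the documented 'mean', 'sum' and 'roi' modes."""
    if mode == "mean":
        weights = weights if weights is not None else [1.0] * len(scores)
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    if mode == "sum":
        return sum(scores)
    if mode == "roi":
        if costs is None:
            raise ValueError("'roi' aggregation requires costs")
        return sum(scores) / sum(costs)
    raise ValueError(f"unknown aggregation mode: {mode}")


def normal_ci_half_width(scores, level=0.95):
    """Half-width z * SE of an unweighted mean (an assumption of this sketch)."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * math.sqrt(variance / n)


weighted_mean = aggregate([0.1, 0.3], weights=[1.0, 3.0])            # (0.1 + 0.9) / 4
roi = aggregate([0.2, -0.1], costs=[1.0, 1.0], mode="roi")           # 0.1 / 2.0
half_width_95 = normal_ci_half_width([0.1, 0.2, 0.3, 0.4])
half_width_99 = normal_ci_half_width([0.1, 0.2, 0.3, 0.4], level=0.99)
```

A ratio such as ROI is not a plain mean of independent terms, which is why the closed-form interval above is restricted to the 'mean' mode.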
- algo.add_market_baseline_predictions(forecasts: pandas.DataFrame, reference_forecaster: str = None, use_both_sides: bool = False) -> pandas.DataFrame
Turn the forecasts from a reference forecaster into market baseline predictions. If use_both_sides is True, baseline predictions are added for both the YES and NO sides.
- Args:
- forecasts: DataFrame with columns (forecaster, event_ticker, round, prediction, outcome, weight).
- reference_forecaster: The forecaster to use as the reference for the market baseline predictions.
- use_both_sides: If True, add the market baseline predictions for both YES and NO sides.
- algo.compute_brier_score(forecasts: pandas.DataFrame, per_market: bool = False, max_spread: float = DEFAULT_MAX_SPREAD) -> pandas.DataFrame
Calculate the Brier score for the forecasts using row-by-row processing. Handles predictions with different array lengths via key intersection. Automatically filters out illiquid events (yes_ask + no_ask > max_spread).
The result will be a DataFrame containing (forecaster, event_ticker, round, weight, brier_score). If per_market=True, returns one row per individual market (outcome) within each event, with brier_score = (prediction_i - outcome_i)^2 and weight = event_weight / num_markets.
- Args:
- forecasts: DataFrame with columns (forecaster, event_ticker, round, prediction, outcome, weight, odds, no_odds).
- per_market: If True, return one row per market instead of one per event. Each market’s Brier score is the individual squared error, and its weight is the event weight divided by the number of markets. This preserves the same weighted-average point estimate as event-level scoring.
- max_spread: Maximum allowed spread (yes_ask + no_ask) for illiquidity filtering.
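The claim that per-market rows preserve the event-level point estimate can be checked with a short sketch. It assumes the event-level Brier score is the mean squared error over the event's markets, which the text implies but does not state. All helper names and sample numbers are hypothetical.

```python
def event_brier(prediction, outcome):
    # Assumption: event-level score is the mean squared error over its markets.
    return sum((p - o) ** 2 for p, o in zip(prediction, outcome)) / len(prediction)


def per_market_rows(prediction, outcome, event_weight):
    # One (brier_score, weight) pair per market, weight = event_weight / num_markets.
    n = len(prediction)
    return [((p - o) ** 2, event_weight / n) for p, o in zip(prediction, outcome)]


def is_liquid(odds, no_odds, max_spread=1.03):
    # An event is kept only if no market has yes_ask + no_ask > max_spread.
    return all(y + n <= max_spread for y, n in zip(odds, no_odds))


def weighted_mean(pairs):
    return sum(s * w for s, w in pairs) / sum(w for _, w in pairs)


events = [
    {"prediction": [0.7, 0.2, 0.1], "outcome": [1, 0, 0], "weight": 1.0},
    {"prediction": [0.4, 0.6], "outcome": [0, 1], "weight": 2.0},
]
event_level = weighted_mean(
    [(event_brier(e["prediction"], e["outcome"]), e["weight"]) for e in events]
)
market_level = weighted_mean(
    [row for e in events for row in per_market_rows(e["prediction"], e["outcome"], e["weight"])]
)
```

Both `event_level` and `market_level` evaluate to the same weighted average, while the market-level representation yields more rows for resampling.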
- algo.compute_global_market_brier(forecasts: pandas.DataFrame, max_spread: float = DEFAULT_MAX_SPREAD) -> float
Compute global market baseline Brier score: (yes_brier + no_brier) / 2.
- For each market across all events:
yes_brier = (odds - outcome)^2
no_brier = (no_odds - (1 - outcome))^2
market_brier = (yes_brier + no_brier) / 2
Returns a single global average across all markets.
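The per-market formula above translates directly into a few lines of Python. The function names and the sample prices are hypothetical, and the liquidity filter is omitted for brevity.

```python
def market_brier(odds, no_odds, outcome):
    # Score the YES ask against the outcome and the NO ask against its complement.
    yes_brier = (odds - outcome) ** 2
    no_brier = (no_odds - (1 - outcome)) ** 2
    return (yes_brier + no_brier) / 2


def global_market_brier(markets):
    # markets: iterable of (odds, no_odds, outcome) tuples across all events.
    scores = [market_brier(odds, no_odds, outcome) for odds, no_odds, outcome in markets]
    return sum(scores) / len(scores)


markets = [
    (0.60, 0.42, 1),  # yes_brier = 0.16, no_brier = 0.1764
    (0.30, 0.72, 0),  # yes_brier = 0.09, no_brier = 0.0784
]
baseline = global_market_brier(markets)
```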
- algo.compute_per_forecaster_market_brier(forecasts: pandas.DataFrame, max_spread: float = DEFAULT_MAX_SPREAD) -> Dict[str, float]
Compute individualized market baseline Brier score per forecaster.
For each forecaster, computes the market baseline Brier only over the events that forecaster participated in (using the median round per event).
- Returns:
Dict mapping forecaster name -> market baseline Brier score (raw, not 1-brier).
- algo.compute_avg_market_distance(forecasts: pandas.DataFrame, max_spread: float = DEFAULT_MAX_SPREAD) -> dict
Compute the average absolute distance |prediction - yes_ask| per forecaster across all markets. Uses the same dedup filters as compute_brier_score and skips illiquid events where any market has yes_ask + no_ask > max_spread.
- Returns:
Dict mapping forecaster name -> average |prediction - odds| across all markets.
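A minimal sketch of the distance computation follows. It leaves out the dedup and liquidity filters mentioned above, and the helper name and sample rows are hypothetical.

```python
from collections import defaultdict


def avg_market_distance(rows):
    # Accumulate [sum of |prediction - yes_ask|, market count] per forecaster.
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        for p, yes_ask in zip(row["prediction"], row["odds"]):
            totals[row["forecaster"]][0] += abs(p - yes_ask)
            totals[row["forecaster"]][1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


rows = [
    {"forecaster": "alice", "prediction": [0.70, 0.20], "odds": [0.60, 0.30]},
    {"forecaster": "bob", "prediction": [0.65, 0.50], "odds": [0.60, 0.30]},
]
distances = avg_market_distance(rows)
```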
- algo.compute_average_return_neutral(forecasts: pandas.DataFrame, num_money_per_round: float = 1.0, spread_market_even: bool = False, max_spread: float = DEFAULT_MAX_SPREAD, per_market: bool = False) -> pandas.DataFrame
Buy market shares directly from the forecaster’s edge.
- Betting logic (per market):
diff = p - yes_ask
If diff > 0: buy YES at price yes_ask with shares = p - yes_ask
If diff < 0: buy NO at price no_ask with shares = (1 - p) - no_ask
If diff = 0: skip
- Spread handling:
Markets where prediction falls inside the bid-ask spread [1 - no_odds, odds] are treated as no-bet markets.
- Liquidity filter:
Entire events are skipped if ANY market has yes_ask + no_ask > max_spread.
- Args:
- forecasts: DataFrame with columns:
forecaster: str, model/forecaster identifier
event_ticker: str, event identifier
round: int, forecast round number
prediction: np.ndarray, forecaster’s probability for each outcome
outcome: np.ndarray, actual binary outcomes (0 or 1) per market
odds: np.ndarray, YES ask prices (implied probabilities) per market
no_odds: np.ndarray, NO ask prices per market
weight: float, external weight (passed through, not used in computation)
- num_money_per_round: Unused, kept for API compatibility.
- spread_market_even: Unused, kept for API compatibility.
- max_spread: Maximum allowed spread (yes_ask + no_ask) for liquidity filter. Events with any market exceeding this threshold are skipped entirely.
- per_market: If True, return one row per traded market instead of one per event.
Each market row keeps the full event weight so ROI aggregation matches between event-level and market-level representations.
- Returns:
DataFrame with columns (forecaster, event_ticker, round, weight, average_return, cost) where average_return is net profit and cost is total amount spent. If per_market=True, also includes market_index.
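The betting logic, spread handling and liquidity filter can be sketched as follows. The settlement rule is an assumption of this sketch, not something the text states: each share is taken to pay 1 if its side wins and 0 otherwise, cost is shares times the ask price, and net profit is payoff minus cost. The function names are hypothetical.

```python
def trade_market(p, yes_ask, no_ask, outcome):
    """Return (net_profit, cost) for one market under the assumed settlement rule."""
    if p > yes_ask:
        shares = p - yes_ask               # edge on the YES side
        cost = shares * yes_ask
        payoff = shares * outcome          # assumed: YES share pays 1 if outcome == 1
    elif (1 - p) > no_ask:
        shares = (1 - p) - no_ask          # edge on the NO side
        cost = shares * no_ask
        payoff = shares * (1 - outcome)    # assumed: NO share pays 1 if outcome == 0
    else:
        return 0.0, 0.0                    # prediction inside [1 - no_ask, yes_ask]: no bet
    return payoff - cost, cost


def trade_event(prediction, odds, no_odds, outcome, max_spread=1.03):
    # Skip the whole event if ANY market is illiquid.
    if any(y + n > max_spread for y, n in zip(odds, no_odds)):
        return None
    profit = cost = 0.0
    for p, y, n, o in zip(prediction, odds, no_odds, outcome):
        market_profit, market_cost = trade_market(p, y, n, o)
        profit += market_profit
        cost += market_cost
    return profit, cost


event_result = trade_event(
    prediction=[0.70, 0.20], odds=[0.60, 0.30], no_odds=[0.42, 0.72], outcome=[1, 0]
)
```

In this example the forecaster buys YES in the first market and NO in the second, and both bets win.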
- algo.compute_calibration_ece(forecasts: pandas.DataFrame, num_bins: int = 10, strategy: Literal['uniform', 'quantile'] = 'uniform', weight_event: bool = True, return_details: bool = False) -> pandas.DataFrame
Calculate the Expected Calibration Error (ECE) for each forecaster.
The ECE measures how well-calibrated a forecaster’s probability predictions are. For perfectly calibrated predictions, when a forecaster predicts probability p, the actual outcome should occur with frequency p.
This function combines two types of weights:
1. Prediction-level weight: from the ‘weight’ column (assigned by weight_fn in data loading)
2. Market-level weight: either uniform (1.0) or inverse of number of markets per prediction
The final weight for each market probability is: prediction_weight * market_weight
- Args:
- forecasts: DataFrame with columns (forecaster, event_ticker, round, prediction, outcome, weight).
- num_bins: Number of bins to use for discretization (default: 10).
- strategy: Strategy for discretization, either “uniform” or “quantile” (default: “uniform”).
- weight_event: If True, weight each market by 1/num_markets within each prediction. If False, all markets are weighted equally (default: True).
- return_details: If True, return the details of the ECE calculation for each forecaster. Useful for plotting.
- Returns:
DataFrame with columns (forecaster, ece_score) containing the ECE for each forecaster
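A weighted ECE with uniform bins can be sketched as below for a single forecaster. This uses the standard ECE definition (weighted gap between mean confidence and observed frequency per bin) and covers only the “uniform” strategy. The helper names are hypothetical.

```python
def flatten(rows, weight_event=True):
    # Final weight per market probability = prediction_weight * market_weight.
    probs, outcomes, weights = [], [], []
    for row in rows:
        n = len(row["prediction"])
        market_weight = 1.0 / n if weight_event else 1.0
        for p, o in zip(row["prediction"], row["outcome"]):
            probs.append(p)
            outcomes.append(o)
            weights.append(row["weight"] * market_weight)
    return probs, outcomes, weights


def weighted_ece(probs, outcomes, weights, num_bins=10):
    # Per bin: [total weight, weighted sum of predictions, weighted sum of outcomes].
    bins = [[0.0, 0.0, 0.0] for _ in range(num_bins)]
    for p, o, w in zip(probs, outcomes, weights):
        idx = min(int(p * num_bins), num_bins - 1)  # p == 1.0 goes to the last bin
        bins[idx][0] += w
        bins[idx][1] += w * p
        bins[idx][2] += w * o
    total = sum(b[0] for b in bins)
    # sum_b (W_b / W) * |confidence_b - frequency_b| simplifies to the expression below.
    return sum(abs(b[1] - b[2]) for b in bins if b[0] > 0) / total


rows = [
    {"prediction": [0.85, 0.15], "outcome": [1, 0], "weight": 1.0},
    {"prediction": [0.85], "outcome": [0], "weight": 1.0},
]
ece_event_weighted = weighted_ece(*flatten(rows, weight_event=True))
ece_market_weighted = weighted_ece(*flatten(rows, weight_event=False))
```

With `weight_event=True` the single-market prediction carries as much total weight as the two-market prediction, which changes the bin averages and therefore the ECE.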
- algo.compute_sharpe_ratio(average_return_results: pandas.DataFrame, baseline_return: float = 0.0, normalize_by_round: bool = False) -> pandas.DataFrame
Calculate the Sharpe ratio for each forecaster using per-dollar returns (ROI).
- Args:
- average_return_results: DataFrame with columns
(forecaster, event_ticker, round, weight, average_return, cost)
- baseline_return: The baseline ROI to subtract from realized ROI
(default: 0.0 for break-even)
- normalize_by_round: If True, first average returns within each (forecaster, event_ticker) group,
then calculate Sharpe ratio across events. (default: False)
- Returns:
DataFrame with columns (forecaster, sharpe_ratio, mean_roi, std_roi, num_events) sorted by sharpe_ratio in descending order
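A Sharpe ratio on per-dollar returns can be sketched as follows for one forecaster. Several details are assumptions of this sketch rather than documented behavior: ROI per row is taken as average_return / cost, rows with zero cost are dropped, and the sample standard deviation is used. The helper name is hypothetical.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, costs, baseline_return=0.0):
    # Per-row ROI; rows with no money spent carry no ROI and are dropped (assumption).
    rois = [r / c for r, c in zip(returns, costs) if c > 0]
    mean_roi = mean(rois)
    std_roi = stdev(rois)  # sample standard deviation (assumption)
    return (mean_roi - baseline_return) / std_roi, mean_roi, std_roi, len(rois)


sharpe, mean_roi, std_roi, num_events = sharpe_ratio(
    returns=[0.04, -0.02, 0.06], costs=[0.2, 0.2, 0.2]
)
```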
- algo.compute_ranked_brier_score(forecasts: pandas.DataFrame, by_category: bool = False, stream_every: int = -1, normalize_by_round: bool = False, bootstrap_config: Dict | None = None, resample_level: Literal['market', 'event'] = 'market', add_individualized_baselines: bool = False, max_spread: float = DEFAULT_MAX_SPREAD, analytical_ci: bool | float | None = None) -> dict
Compute the ranked forecasters by Brier score.
- Args:
- forecasts: DataFrame with forecast data.
- by_category: If True, compute rankings per category.
- stream_every: If > 0, compute rankings at time intervals.
- normalize_by_round: If True, downweight by number of rounds per (forecaster, event_ticker).
- bootstrap_config: Optional config for bootstrap CI estimation.
- resample_level: Granularity for bootstrap resampling. “market” resamples individual markets (flattened across events), “event” resamples event-level aggregated scores. Default “market”.
- add_individualized_baselines: If True, create “{forecaster}-market-baseline” entries for each forecaster by filtering market-baseline scores to their participated (event_ticker, round). Requires ‘market-baseline’ forecaster to be present.
- max_spread: Liquidity filter passed through to compute_brier_score. Events whose markets have yes_ask + no_ask exceeding this value are skipped. Defaults to DEFAULT_MAX_SPREAD (1.03).
- analytical_ci: Closed-form normal-approximation CI alternative to bootstrap. Mutually exclusive with bootstrap_config. None/False = off, True = 95% CI, float = explicit confidence level (e.g. 0.99). See rank_forecasters_by_score.
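The difference between the two resampling levels comes down to which rows are drawn with replacement. The sketch below is a generic bootstrap of a weighted mean, not the library's procedure. The parameter names mirror the documented bootstrap_config keys, while the function name, the normal-quantile half-width and the sample data are assumptions.

```python
import random
from statistics import NormalDist, pstdev


def bootstrap_half_width(scores, weights, num_samples=1000, ci_level=0.95, random_seed=42):
    """Half-width z * std of the bootstrap distribution of the weighted mean."""
    rng = random.Random(random_seed)
    n = len(scores)
    means = []
    for _ in range(num_samples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        total_weight = sum(weights[i] for i in idx)
        means.append(sum(weights[i] * scores[i] for i in idx) / total_weight)
    z = NormalDist().inv_cdf(0.5 + ci_level / 2)
    return z * pstdev(means)


# "market" level: one row per market, weight = event_weight / num_markets.
market_scores = [0.09, 0.04, 0.01, 0.16, 0.16]
market_weights = [1 / 3, 1 / 3, 1 / 3, 1.0, 1.0]
# "event" level: one aggregated row per event.
event_scores = [0.14 / 3, 0.16]
event_weights = [1.0, 2.0]

market_hw = bootstrap_half_width(market_scores, market_weights, num_samples=200)
event_hw = bootstrap_half_width(event_scores, event_weights, num_samples=200)
```

Fixing `random_seed` makes the interval reproducible across runs, which matches the purpose of the random_seed key described for rank_forecasters_by_score.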
- algo.compute_ranked_average_return(forecasts: pandas.DataFrame, by_category: bool = False, stream_every: int = -1, spread_market_even: bool = False, num_money_per_round: float = 1.0, normalize_by_round: bool = False, bootstrap_config: Dict | None = None, resample_level: Literal['market', 'event'] = 'market', add_individualized_baselines: bool = False, max_spread: float = DEFAULT_MAX_SPREAD) -> dict
Compute the ranked forecasters by average return.
- Args:
- forecasts: DataFrame with forecast data.
- by_category: If True, compute rankings per category.
- stream_every: If > 0, compute rankings at time intervals.
- spread_market_even: If True, spread budget evenly across markets.
- num_money_per_round: Amount to bet per round.
- normalize_by_round: If True, downweight by number of rounds per (forecaster, event_ticker).
- bootstrap_config: Optional config for bootstrap CI estimation.
- resample_level: Granularity for bootstrap resampling. “market” resamples individual markets (flattened across events), “event” resamples event-level aggregated scores. Default “market”.
- add_individualized_baselines: If True, create “{forecaster}-market-baseline” entries for each forecaster by filtering market-baseline scores to their participated (event_ticker, round). Requires ‘market-baseline’ forecaster to be present.
- max_spread: Liquidity filter passed through to compute_average_return_neutral. Events whose markets have yes_ask + no_ask exceeding this value are skipped. Defaults to DEFAULT_MAX_SPREAD (1.03).