src.pm_rank.model

Model subpackage for pm_rank.

Classes

GeneralizedBT

Generalized Bradley-Terry model for ranking forecasters in prediction markets.

BrierScoringRule

Brier scoring rule for evaluating probabilistic forecasts.

SphericalScoringRule

Spherical scoring rule for evaluating probabilistic forecasts.

LogScoringRule

Logarithmic scoring rule for evaluating probabilistic forecasts.

AverageReturn

Average Return Model for ranking forecasters based on their expected market returns.

Functions

spearman_correlation(...) → float

Compute the Spearman correlation between two rankings.

kendall_correlation(...) → float

Compute the Kendall correlation between two rankings.

Package Contents

class src.pm_rank.model.GeneralizedBT(method: Literal['MM', 'Elo'] = 'MM', num_iter: int = 100, threshold: float = 0.001, verbose: bool = False)

Bases: object

Generalized Bradley-Terry model for ranking forecasters in prediction markets.

This class implements a generalization of the traditional Bradley-Terry model to handle prediction market scenarios. Each event outcome is treated as a contest between two “pseudo-teams”: a winning team (the realized outcome) and a losing team (all other outcomes). Each forecaster contributes fractions of their capability proportional to their predicted probabilities.

The model estimates skill parameters for each forecaster using an iterative Majorization-Minimization (MM) algorithm, which provides convergence guarantees and intuitive comparative scores similar to Elo ratings.
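
As a concrete illustration, the per-event contest can be sketched as follows. This is one plausible reading of the description above, not necessarily pm_rank's internal implementation:

    import numpy as np

    def pseudo_team_win_prob(skills: np.ndarray, probs_on_winner: np.ndarray) -> float:
        """Probability that the winning pseudo-team beats the losing one.

        skills          -- skill parameter for each forecaster (assumed form)
        probs_on_winner -- each forecaster's predicted probability of the
                           realized outcome
        """
        # Each team's strength is the skill-weighted probability mass its
        # members place on that side of the contest.
        s_win = float(np.sum(skills * probs_on_winner))
        s_lose = float(np.sum(skills * (1.0 - probs_on_winner)))
        return s_win / (s_win + s_lose)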

Parameters:
  • method – Optimization method to use: “MM” (Majorization-Minimization) or “Elo” (default: “MM”).

  • num_iter – Maximum number of iterations for the MM algorithm (default: 100).

  • threshold – Convergence threshold for parameter updates (default: 1e-3).

  • verbose – Whether to enable verbose logging (default: False).

method = 'MM'
num_iter = 100
threshold = 0.001
verbose = False
logger
fit(problems: List[pm_rank.data.base.ForecastProblem], include_scores: bool = True) → Tuple[Dict[str, Any], Dict[str, int]] | Dict[str, int]

Fit the generalized Bradley-Terry model to the given problems.

This method estimates skill parameters for each forecaster using the MM algorithm and returns rankings based on these parameters. The skill parameters represent the relative predictive ability of each forecaster.

Parameters:
  • problems – List of ForecastProblem instances to evaluate.

  • include_scores – Whether to include scores in the results (default: True).

Returns:

Ranking results, either as a tuple of (scores, rankings) or just rankings.
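
A minimal usage sketch, assuming the installed package is importable as pm_rank and that problems is a list of ForecastProblem instances loaded elsewhere:

    from pm_rank.model import GeneralizedBT

    model = GeneralizedBT(method="MM", num_iter=200, threshold=1e-4)
    # With include_scores=True, fit returns both skill scores and rankings.
    scores, rankings = model.fit(problems, include_scores=True)
    print(rankings)  # e.g. {"forecaster_a": 1, "forecaster_b": 2, ...}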

class src.pm_rank.model.BrierScoringRule(negate: bool = True, verbose: bool = False)

Bases: ScoringRule

Brier scoring rule for evaluating probabilistic forecasts.

The Brier score is a quadratic proper scoring rule that measures the squared difference between predicted probabilities and actual outcomes. It is widely used in prediction markets and provides a good balance between rewarding accuracy and calibration.

Parameters:
  • negate – Whether to negate the scores so that higher values are better (default: True).

  • verbose – Whether to enable verbose logging (default: False).

negate = True
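
For reference, the standard multi-outcome Brier computation reads as below. This is an illustrative sketch; the class's internal vectorization and scaling may differ:

    import numpy as np

    def brier_score(probs: np.ndarray, outcome_idx: int, negate: bool = True) -> float:
        # Squared distance between the forecast vector and the one-hot
        # encoding of the realized outcome.
        outcome = np.zeros_like(probs)
        outcome[outcome_idx] = 1.0
        score = float(np.sum((probs - outcome) ** 2))
        # With negate=True (the class default), higher values are better.
        return -score if negate else score
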
class src.pm_rank.model.SphericalScoringRule(verbose: bool = False)

Bases: ScoringRule

Spherical scoring rule for evaluating probabilistic forecasts.

The spherical scoring rule normalizes probability vectors to unit vectors and measures the cosine similarity with the actual outcome. This rule is less sensitive to extreme probability values compared to the logarithmic rule.

Parameters:

verbose – Whether to enable verbose logging (default: False).

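The rule reduces to the probability assigned to the realized outcome divided by the L2 norm of the forecast vector. A minimal sketch, assuming the standard spherical rule:

    import numpy as np

    def spherical_score(probs: np.ndarray, outcome_idx: int) -> float:
        # Cosine similarity between the normalized forecast vector and the
        # one-hot outcome vector reduces to p[i] / ||p||_2.
        return float(probs[outcome_idx] / np.linalg.norm(probs))
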
class src.pm_rank.model.LogScoringRule(clip_prob: float = 0.01, verbose: bool = False)

Bases: ScoringRule

Logarithmic scoring rule for evaluating probabilistic forecasts.

The logarithmic scoring rule is a proper scoring rule that rewards forecasters based on the logarithm of their predicted probability for the actual outcome. This rule heavily penalizes overconfident predictions and rewards well-calibrated forecasts.

Parameters:
  • clip_prob – Minimum probability value to prevent log(0) (default: 0.01).

  • verbose – Whether to enable verbose logging (default: False).

clip_prob = 0.01
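
A minimal sketch of the clipped logarithmic score, assuming the standard form; the class may clip or aggregate slightly differently:

    import numpy as np

    def log_score(probs: np.ndarray, outcome_idx: int, clip_prob: float = 0.01) -> float:
        # Clip the probability of the realized outcome away from zero so an
        # overconfident wrong forecast incurs a large but finite penalty.
        p = max(float(probs[outcome_idx]), clip_prob)
        return float(np.log(p))  # closer to 0 is better
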
class src.pm_rank.model.AverageReturn(num_money_per_round: int = 1, risk_aversion: float = 0.0, use_approximate: bool = False, verbose: bool = False)

Average Return Model for ranking forecasters based on their expected market returns.

This class implements a ranking algorithm that evaluates forecasters based on how much money they could earn from prediction markets using different risk aversion strategies. The model calculates expected returns for each forecaster and ranks them accordingly.

Parameters:
  • num_money_per_round – Amount of money to bet per round (default: 1).

  • risk_aversion – Risk aversion parameter between 0 and 1 (default: 0.0).

  • use_approximate – Whether to use the approximate CRRA betting strategy (default: False).

  • verbose – Whether to enable verbose logging (default: False).

Raises:

AssertionError – If risk_aversion is not between 0 and 1.
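
The risk_aversion parameter suggests a CRRA (constant relative risk aversion) utility; a common parameterization is sketched below. Whether AverageReturn uses exactly this form is an assumption:

    import math

    def crra_utility(wealth: float, gamma: float) -> float:
        # gamma = 0 is risk-neutral; gamma -> 1 approaches log utility
        # (Kelly-style betting). Assumed parameterization only.
        if gamma == 1.0:
            return math.log(wealth)
        return wealth ** (1.0 - gamma) / (1.0 - gamma)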

num_money_per_round = 1
risk_aversion = 0.0
use_approximate = False
verbose = False
logger
fit(problems: List[pm_rank.data.base.ForecastProblem], include_scores: bool = True, include_per_problem_info: bool = False) → Tuple[Dict[str, Any], Dict[str, int]] | Dict[str, int]

Fit the average return model to the given problems.

This method processes all problems at once and returns the final rankings based on average returns across all problems.

Parameters:
  • problems – List of ForecastProblem instances to process.

  • include_scores – Whether to include scores in the results (default: True).

  • include_per_problem_info – Whether to include per-problem info in the results (default: False).

Returns:

Ranking results, either as a tuple of (scores, rankings) or just rankings. If include_per_problem_info is True, returns a tuple of (scores, rankings, per_problem_info).
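
A usage sketch, assuming problems is a list of ForecastProblem instances loaded elsewhere:

    from pm_rank.model import AverageReturn

    model = AverageReturn(num_money_per_round=1, risk_aversion=0.5)
    scores, rankings = model.fit(problems, include_scores=True)
    # Request the per-problem breakdown as well:
    scores, rankings, per_problem_info = model.fit(
        problems, include_scores=True, include_per_problem_info=True
    )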

fit_stream(problem_iter: Iterator[List[pm_rank.data.base.ForecastProblem]], include_scores: bool = True) → Dict[int, Tuple[Dict[str, Any], Dict[str, int]] | Dict[str, int]]

Fit the model to streaming problems and return incremental results.

This method processes problems as they arrive and returns rankings after each batch, allowing for incremental analysis of forecaster performance.

Parameters:
  • problem_iter – Iterator over batches of ForecastProblem instances.

  • include_scores – Whether to include scores in the results (default: True).

Returns:

Mapping of batch indices to ranking results.
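
A streaming sketch; batch_iter is assumed to yield lists of ForecastProblem instances (for example, from a data loader's streaming interface):

    model = AverageReturn(risk_aversion=0.0)
    results = model.fit_stream(batch_iter, include_scores=False)
    for batch_idx, rankings in results.items():
        # Rankings reflect all problems seen up to and including this batch.
        print(batch_idx, rankings)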

fit_stream_with_timestamp(problem_time_iter: Iterator[Tuple[str, List[pm_rank.data.base.ForecastProblem]]], include_scores: bool = True) → collections.OrderedDict

Fit the model to streaming problems with timestamps and return incremental results.

This method processes problems with associated timestamps and returns rankings after each batch, maintaining chronological order.

Parameters:
  • problem_time_iter – Iterator over (timestamp, problems) tuples.

  • include_scores – Whether to include scores in the results (default: True).

Returns:

Chronologically ordered mapping of timestamps to ranking results.
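
The timestamped variant works the same way; the (timestamp, problems) tuples below are hypothetical:

    results = model.fit_stream_with_timestamp(
        iter([("2024-01-01", batch_1), ("2024-01-02", batch_2)])
    )
    for timestamp, rankings in results.items():  # chronological order
        print(timestamp, rankings)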

fit_by_category(problems: List[pm_rank.data.base.ForecastProblem], include_scores: bool = True, stream_with_timestamp: bool = False, stream_increment_by: Literal['day', 'week', 'month'] = 'day', min_bucket_size: int = 1) → Tuple[Dict[str, Any], Dict[str, int]] | Dict[str, int]

Fit the average return model to the given problems by category.

This method groups the given problems by category and returns rankings based on average returns within each category; results can optionally be streamed over time in day, week, or month increments.

Parameters:
  • problems – List of ForecastProblem instances to process.

  • include_scores – Whether to include scores in the results (default: True).

  • stream_with_timestamp – Whether to stream problems with timestamps (default: False).

  • stream_increment_by – The increment by which to stream problems (default: “day”).

  • min_bucket_size – The minimum number of problems required for a bucket to be included (default: 1).

Returns:

Ranking results, either as a tuple of (scores, rankings) or just rankings.
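
A usage sketch; the keyword values are illustrative:

    results = model.fit_by_category(
        problems,
        include_scores=False,
        stream_with_timestamp=True,
        stream_increment_by="week",
        min_bucket_size=5,
    )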

src.pm_rank.model.spearman_correlation(rank_dict_a: Dict[str, int], rank_dict_b: Dict[str, int]) → float

Compute the Spearman correlation between two rankings. Reference: https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient

src.pm_rank.model.kendall_correlation(rank_dict_a: Dict[str, int], rank_dict_b: Dict[str, int]) → float

Compute the Kendall correlation between two rankings. Reference: https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient
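
A quick sketch comparing two hypothetical rankings (the expected values follow from the standard formulas referenced above):

    from pm_rank.model import spearman_correlation, kendall_correlation

    rank_a = {"alice": 1, "bob": 2, "carol": 3}
    rank_b = {"alice": 2, "bob": 1, "carol": 3}
    print(spearman_correlation(rank_a, rank_b))  # 0.5
    print(kendall_correlation(rank_a, rank_b))   # ~0.33 (1 discordant pair of 3)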