src.pm_rank.model.scoring_rule

Scoring Rules for Ranking Forecasters in Prediction Markets.

This module implements proper scoring rules to evaluate and rank forecasters based on their probabilistic predictions. A scoring rule is proper when a forecaster maximizes their expected score by reporting their true beliefs; forecasters are therefore rewarded for accuracy and calibration rather than for merely assigning the highest probability to the correct outcome.

Reference: https://www.cis.upenn.edu/~aaroth/courses/slides/agt17/lect23.pdf

Key Concepts:

  • Proper Scoring Rules: Mathematical functions that incentivize honest reporting of probabilistic beliefs by rewarding accuracy and calibration.

  • Brier Score: A quadratic scoring rule that measures the sum of squared differences between the predicted probability vector and the one-hot encoding of the actual outcome.

  • Logarithmic Score: A scoring rule based on the logarithm of the predicted probability of the actual outcome.

  • Spherical Score: A scoring rule that divides the probability assigned to the actual outcome by the Euclidean norm of the full prediction vector, i.e. the cosine similarity between the forecast and the one-hot outcome vector (see the worked sketch below).
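
To make these definitions concrete, the sketch below computes each score for a single categorical forecast against a one-hot outcome, following the standard textbook formulas (the module's internals, e.g. probability clipping and problem weighting, may differ):

   import numpy as np

   def brier_score(p: np.ndarray, outcome: int) -> float:
       # Quadratic score: negated sum of squared errors vs. the one-hot outcome.
       y = np.zeros_like(p)
       y[outcome] = 1.0
       return -float(np.sum((p - y) ** 2))

   def log_score(p: np.ndarray, outcome: int) -> float:
       # Logarithmic score: log-probability assigned to the realized outcome.
       return float(np.log(p[outcome]))

   def spherical_score(p: np.ndarray, outcome: int) -> float:
       # Spherical score: realized-outcome probability over the L2 norm of p.
       return float(p[outcome] / np.linalg.norm(p))

   p = np.array([0.7, 0.2, 0.1])   # forecast over three outcomes; outcome 0 occurs
   print(brier_score(p, 0))        # -0.14
   print(log_score(p, 0))          # approx. -0.357
   print(spherical_score(p, 0))    # approx. 0.953

In all three rules, higher scores indicate better forecasts (for the Brier score, after negation).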

Attributes

MAX_PROBLEM_WEIGHT_QUANTILE

MIN_PROBLEM_WEIGHT_QUANTILE

Classes

ScoringRule

Abstract base class for proper scoring rules.

LogScoringRule

Logarithmic scoring rule for evaluating probabilistic forecasts.

BrierScoringRule

Brier scoring rule for evaluating probabilistic forecasts.

SphericalScoringRule

Spherical scoring rule for evaluating probabilistic forecasts.

Module Contents

src.pm_rank.model.scoring_rule.MAX_PROBLEM_WEIGHT_QUANTILE = 0.75
src.pm_rank.model.scoring_rule.MIN_PROBLEM_WEIGHT_QUANTILE = 0.25
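
These quantile bounds plausibly constrain discrimination-based problem weights; the sketch below shows one way such bounds could be applied. This is an assumption about their role, not documented behavior:

   import numpy as np

   MIN_PROBLEM_WEIGHT_QUANTILE = 0.25
   MAX_PROBLEM_WEIGHT_QUANTILE = 0.75

   def clip_problem_weights(discriminations: np.ndarray) -> np.ndarray:
       # Hypothetical: clip raw discrimination values to their interquartile
       # range so a few extreme problems cannot dominate the rankings.
       lo = np.quantile(discriminations, MIN_PROBLEM_WEIGHT_QUANTILE)
       hi = np.quantile(discriminations, MAX_PROBLEM_WEIGHT_QUANTILE)
       return np.clip(discriminations, lo, hi)
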
class src.pm_rank.model.scoring_rule.ScoringRule(verbose: bool = False)

Bases: abc.ABC

Abstract base class for proper scoring rules.

This class provides the foundation for implementing various proper scoring rules used to evaluate probabilistic forecasts. Proper scoring rules ensure that forecasters are incentivized to report their true beliefs by rewarding both accuracy and calibration.

Parameters:

verbose – Whether to enable verbose logging (default: False).

Initialize the scoring rule.

Parameters:

verbose – Whether to enable verbose logging (default: False).

verbose = False
logger
fit(problems: List[pm_rank.data.base.ForecastProblem], problem_discriminations: numpy.ndarray | List[float] | None = None, include_scores: bool = True) → Tuple[Dict[str, Any], Dict[str, int]] | Dict[str, int]

Fit the scoring rule to the given problems and return rankings.

This method processes all problems and calculates scores for each forecaster using the implemented scoring rule. Optionally, problem weights can be applied based on discrimination parameters to give more importance to more informative problems.

Parameters:
  • problems – List of ForecastProblem instances to evaluate.

  • problem_discriminations – Optional array of discrimination parameters for weighting problems. If None, all problems are weighted equally.

  • include_scores – Whether to include scores in the results (default: True).

Returns:

If include_scores is True, a tuple of (scores, rankings); otherwise just the rankings.
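
A minimal usage sketch. It assumes a list of ForecastProblem instances, problems, has already been loaded (the loading step is outside the scope of this page):

   from pm_rank.model.scoring_rule import BrierScoringRule

   rule = BrierScoringRule(negate=True)

   # With include_scores=True, fit returns a (scores, rankings) tuple.
   scores, rankings = rule.fit(problems, include_scores=True)

   # rankings maps forecaster identifiers to integer ranks; we assume here
   # that scores is keyed the same way and that lower rank means better.
   for name in sorted(rankings, key=rankings.get):
       print(rankings[name], name, scores[name])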

fit_stream(problem_iter: Iterator[List[pm_rank.data.base.ForecastProblem]], include_scores: bool = True) → Dict[int, Tuple[Dict[str, Any], Dict[str, int]]]

Fit the scoring rule to streaming problems and return incremental results.

This method processes problems as they arrive and returns rankings after each batch, allowing for incremental analysis of forecaster performance over time.

Parameters:
  • problem_iter – Iterator over batches of ForecastProblem instances.

  • include_scores – Whether to include scores in the results (default: True).

Returns:

Mapping of batch indices to ranking results.
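
A sketch of incremental evaluation, reusing the rule and problems from the previous example; the batching helper is illustrative, not part of the library:

   def in_batches(items, batch_size=25):
       # Illustrative helper: yield the problems in fixed-size batches.
       for i in range(0, len(items), batch_size):
           yield items[i:i + batch_size]

   stream_results = rule.fit_stream(in_batches(problems), include_scores=True)
   for batch_idx, (scores, rankings) in stream_results.items():
       leader = min(rankings, key=rankings.get)  # assumes lower rank = better
       print(f"after batch {batch_idx}: leader = {leader}")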

fit_stream_with_timestamp(problem_time_iter: Iterator[Tuple[str, List[pm_rank.data.base.ForecastProblem]]], include_scores: bool = True) → collections.OrderedDict

Fit the scoring rule to streaming problems with timestamps and return incremental results.

This method processes problems with associated timestamps and returns rankings after each batch, maintaining chronological order.

Parameters:
  • problem_time_iter – Iterator over (timestamp, problems) tuples.

  • include_scores – Whether to include scores in the results (default: True).

Returns:

Chronologically ordered mapping of timestamps to ranking results.
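
A sketch for the timestamped variant. It assumes each ForecastProblem exposes a datetime-valued timestamp attribute (an assumption about the data model, not confirmed by this page); the grouping helper is hypothetical:

   from collections import OrderedDict

   def by_day(items):
       # Hypothetical helper: group problems by calendar day, in time order.
       buckets = OrderedDict()
       for p in sorted(items, key=lambda p: p.timestamp):
           buckets.setdefault(p.timestamp.date().isoformat(), []).append(p)
       yield from buckets.items()

   timed_results = rule.fit_stream_with_timestamp(by_day(problems))
   for ts, (scores, rankings) in timed_results.items():
       print(ts, rankings)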

fit_by_category(problems: List[pm_rank.data.base.ForecastProblem], include_scores: bool = True, stream_with_timestamp: bool = False, stream_increment_by: Literal['day', 'week', 'month'] = 'day', min_bucket_size: int = 1) → Tuple[Dict[str, Any], Dict[str, int]] | Dict[str, int]

Fit the scoring rule to the given problems by category.

This method processes problems grouped by category and returns rankings for each category. Optionally, it can stream problems within each category over time.

Parameters:
  • problems – List of ForecastProblem instances to process.

  • include_scores – Whether to include scores in the results (default: True).

  • stream_with_timestamp – Whether to stream problems with timestamps (default: False).

  • stream_increment_by – The time increment by which to stream problems: 'day', 'week', or 'month' (default: 'day').

  • min_bucket_size – The minimum number of problems to include in a bucket (default: 1).

Returns:

Mapping of categories to ranking results.
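
A per-category sketch, again reusing rule and problems from above; the iteration assumes the returned mapping is keyed by category name, per the description above:

   per_category = rule.fit_by_category(
       problems,
       include_scores=True,
       stream_with_timestamp=False,
       min_bucket_size=1,
   )
   for category, (scores, rankings) in per_category.items():
       print(category, rankings)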

class src.pm_rank.model.scoring_rule.LogScoringRule(clip_prob: float = 0.01, verbose: bool = False)

Bases: ScoringRule

Logarithmic scoring rule for evaluating probabilistic forecasts.

The logarithmic scoring rule is a proper scoring rule that rewards forecasters based on the logarithm of their predicted probability for the actual outcome. This rule heavily penalizes overconfident predictions and rewards well-calibrated forecasts.

Parameters:
  • clip_prob – Minimum probability value to prevent log(0) (default: 0.01).

  • verbose – Whether to enable verbose logging (default: False).

Initialize the logarithmic scoring rule.

Parameters:
  • clip_prob – Minimum probability value to prevent log(0) (default: 0.01).

  • verbose – Whether to enable verbose logging (default: False).

clip_prob = 0.01
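
A worked illustration of why clip_prob matters: without a floor, a forecaster who assigns probability 0 to the realized outcome would receive a log score of negative infinity. Clipping bounds the penalty at log(clip_prob):

   import numpy as np

   clip_prob = 0.01
   for p_true in (0.5, 0.01, 0.0001, 0.0):
       print(p_true, np.log(max(p_true, clip_prob)))
   # 0.5    -> approx. -0.693
   # 0.01   -> approx. -4.605
   # 0.0001 -> approx. -4.605 (floored at log(0.01))
   # 0.0    -> approx. -4.605 (would be -inf without clipping)
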
class src.pm_rank.model.scoring_rule.BrierScoringRule(negate: bool = True, verbose: bool = False)

Bases: ScoringRule

Brier scoring rule for evaluating probabilistic forecasts.

The Brier score is a quadratic proper scoring rule that measures the squared difference between predicted probabilities and actual outcomes. It is widely used in prediction markets and provides a good balance between rewarding accuracy and calibration.

Parameters:
  • negate – Whether to negate the scores so that higher values are better (default: True).

  • verbose – Whether to enable verbose logging (default: False).

Initialize the Brier scoring rule.

Parameters:
  • negate – Whether to negate the scores so that higher values are better (default: True).

  • verbose – Whether to enable verbose logging (default: False).

negate = True
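
An illustration of the negate flag, using the textbook quadratic definition sketched at the top of this page: the raw squared-error score is lower-is-better, so negating it aligns the Brier score with the higher-is-better convention of the other rules:

   import numpy as np

   p = np.array([0.9, 0.1])        # forecast
   y = np.array([1.0, 0.0])        # one-hot realized outcome

   raw = float(np.sum((p - y) ** 2))   # 0.02 -- lower is better
   negated = -raw                      # -0.02 -- higher is better
   print(raw, negated)
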
class src.pm_rank.model.scoring_rule.SphericalScoringRule(verbose: bool = False)

Bases: ScoringRule

Spherical scoring rule for evaluating probabilistic forecasts.

The spherical scoring rule normalizes probability vectors to unit vectors and measures the cosine similarity with the actual outcome. This rule is less sensitive to extreme probability values compared to the logarithmic rule.

Parameters:

verbose – Whether to enable verbose logging (default: False).

Initialize the spherical scoring rule.

Parameters:

verbose – Whether to enable verbose logging (default: False).
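
A small comparison, using the definitions sketched at the top of this page, showing that the spherical score stays bounded for near-zero probabilities where the unclipped log score diverges:

   import numpy as np

   for p_true in (0.5, 0.05, 0.001):
       p = np.array([p_true, 1.0 - p_true])
       log_s = np.log(p[0])                  # unclipped; diverges as p -> 0
       sph_s = p[0] / np.linalg.norm(p)      # bounded in [0, 1]
       print(f"p={p_true:.3f}  log={log_s:.2f}  spherical={sph_s:.4f}")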