src.pm_rank.model.calibration

Calibration Metric for LLM Predictions.

Currently, this module adopts the following definition of a perfectly calibrated probabilistic predictor \(f\):

For all \(p \in [0, 1]\) and any pair of a covariate \(X\) and a binary outcome \(Y\), we have:

\[ \mathbb{P}(Y = 1 | f(X) = p) = p \]

We then define the (theoretical) expected calibration error (ECE) as a measure of deviation from the above property:

\[ \text{ECE}^* = \mathbb{E}_{X, Y}[ | \mathbb{P}(Y = 1 | f(X)) - f(X) | ] \]

In practice, we compute an empirical version of the above ECE by binning (discretizing) the predicted probabilities.

Reference: https://arxiv.org/pdf/2501.19047v2
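
As a rough illustration (not the module's internal implementation), the binned ECE with equal-width bins can be computed as in the following NumPy sketch, where probs holds the predicted probabilities \(f(X)\) and outcomes the realized binary outcomes \(Y\):

    import numpy as np

    def binned_ece(probs, outcomes, num_bins=10):
        """Empirical ECE with equal-width ('uniform') bins -- illustrative sketch only."""
        probs = np.asarray(probs, dtype=float)
        outcomes = np.asarray(outcomes, dtype=float)
        edges = np.linspace(0.0, 1.0, num_bins + 1)
        # Assign each prediction to a bin; clip so that p == 1.0 falls into the last bin.
        bin_ids = np.clip(np.digitize(probs, edges) - 1, 0, num_bins - 1)
        ece = 0.0
        for b in range(num_bins):
            mask = bin_ids == b
            if not mask.any():
                continue
            avg_conf = probs[mask].mean()      # average predicted probability in the bin
            avg_acc = outcomes[mask].mean()    # empirical frequency of Y = 1 in the bin
            ece += mask.mean() * abs(avg_acc - avg_conf)  # weight by the bin's sample mass
        return ece

    # Example: a reasonably calibrated set of predictions should yield a small ECE.
    # binned_ece([0.1, 0.8, 0.65, 0.3], [0, 1, 1, 0], num_bins=5)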

Classes

CalibrationMetric

Compute an empirical expected calibration error (ECE) over forecast problems via binning.

Module Contents

class src.pm_rank.model.calibration.CalibrationMetric(num_bins: int = 10, strategy: Literal['uniform', 'quantile'] = 'uniform', weight_event: bool = True, verbose: bool = False)

Initialize the CalibrationMetric.

Parameters:
  • num_bins – The number of bins to use for discretization.

  • strategy – The binning strategy: 'uniform' uses equal-width bins over [0, 1], while 'quantile' uses equal-frequency bins.

  • weight_event – Whether to weight each event by the number of markets it contains. If False, each market is treated equally.

  • verbose – Whether to enable verbose logging.
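
For illustration, assuming the import path shown above, the metric can be configured with either binning strategy:

    from src.pm_rank.model.calibration import CalibrationMetric

    # Default configuration: 10 equal-width bins, events weighted by their market count.
    metric = CalibrationMetric(num_bins=10, strategy='uniform')

    # Quantile (equal-frequency) bins, with every market treated equally.
    metric_quantile = CalibrationMetric(num_bins=20, strategy='quantile', weight_event=False)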

num_bins = 10
strategy = 'uniform'
weight_event = True
verbose = False
logger
fit(problems: List[pm_rank.data.base.ForecastProblem], include_scores: bool = True)

Fit the calibration metric to the given problems.

Parameters:
  • problems – List of ForecastProblem instances to process.

Returns:
  A dictionary containing the calibration metric.

plot(name: str, title: str = 'Reliability diagram', save_path: str = None, figsize: tuple[float, float] = (4, 4), percent: bool = True)

Plot a reliability diagram of the calibration results.
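
A minimal end-to-end sketch, assuming a list of ForecastProblem instances (named problems below) has already been prepared with the pm_rank.data utilities; the forecaster name 'example-model' passed to plot() is a hypothetical placeholder:

    from src.pm_rank.model.calibration import CalibrationMetric

    # problems: List[ForecastProblem], prepared elsewhere (assumed available).
    metric = CalibrationMetric(num_bins=10, strategy='uniform', verbose=True)
    results = metric.fit(problems, include_scores=True)
    print(results)  # dictionary containing the calibration metric

    # Save a reliability diagram; 'example-model' stands in for a forecaster name.
    metric.plot(name='example-model', save_path='reliability_diagram.png')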