src.pm_rank.model.calibration

Calibration Metric for LLM Predictions.

Currently, this module adopts the following definition of a perfectly calibrated probabilistic predictor \(f\):

For all \(p \in [0, 1]\) and any pair of a covariate \(X\) and a binary outcome \(Y\), we have:

\[ \mathbb{P}(Y = 1 | f(X) = p) = p \]

We then define the (theoretical) expected calibration error (ECE) as a measure of deviation from the above property:

\[ \text{ECE}^* = \mathbb{E}_{X, Y}[ | \mathbb{P}(Y = 1 | f(X)) - f(X) | ] \]

In practice, we compute an empirical version of the above ECE by binning (discretizing) the predicted probabilities.

Reference: https://arxiv.org/pdf/2501.19047v2
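
As a rough illustration (not the module's internal implementation), the binned ECE with equal-width bins can be computed as in the following NumPy sketch, where probs holds the predicted probabilities \(f(X)\) and outcomes the realized binary outcomes \(Y\):

    import numpy as np

    def binned_ece(probs, outcomes, num_bins=10):
        """Empirical ECE with equal-width ('uniform') bins -- illustrative sketch only."""
        probs = np.asarray(probs, dtype=float)
        outcomes = np.asarray(outcomes, dtype=float)
        edges = np.linspace(0.0, 1.0, num_bins + 1)
        # Assign each prediction to a bin; clip so that p == 1.0 falls into the last bin.
        bin_ids = np.clip(np.digitize(probs, edges) - 1, 0, num_bins - 1)
        ece = 0.0
        for b in range(num_bins):
            mask = bin_ids == b
            if not mask.any():
                continue
            avg_conf = probs[mask].mean()      # average predicted probability in the bin
            avg_acc = outcomes[mask].mean()    # empirical frequency of Y = 1 in the bin
            ece += mask.mean() * abs(avg_acc - avg_conf)  # weight by the bin's sample mass
        return ece

    # Example: a reasonably calibrated set of predictions should yield a small ECE.
    # binned_ece([0.1, 0.8, 0.65, 0.3], [0, 1, 1, 0], num_bins=5)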

Classes

CalibrationMetric

Compute an empirical expected calibration error (ECE) over forecast problems via binning.

Module Contents

class src.pm_rank.model.calibration.CalibrationMetric(num_bins: int = 10, strategy: Literal['uniform', 'quantile'] = 'uniform', weight_event: bool = True, verbose: bool = False)

Initialize the CalibrationMetric.

Parameters:
  • num_bins – The number of bins to use for discretization.

  • strategy – The binning strategy: 'uniform' uses equal-width bins over [0, 1], while 'quantile' uses equal-frequency bins.

  • weight_event – Whether to weight each event by the number of markets it contains. If False, each market is treated equally.

  • verbose – Whether to enable verbose logging.
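
For illustration, assuming the import path shown above, the metric can be configured with either binning strategy:

    from src.pm_rank.model.calibration import CalibrationMetric

    # Default configuration: 10 equal-width bins, events weighted by their market count.
    metric = CalibrationMetric(num_bins=10, strategy='uniform')

    # Quantile (equal-frequency) bins, with every market treated equally.
    metric_quantile = CalibrationMetric(num_bins=20, strategy='quantile', weight_event=False)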

num_bins = 10
strategy = 'uniform'
weight_event = True
verbose = False
logger
fit(problems: List[pm_rank.data.base.ForecastProblem], include_scores: bool = True)

Fit the calibration metric to the given problems.

Parameters:
  • problems – List of ForecastProblem instances to process.

Returns:
  A dictionary containing the calibration metric.

plot(name: str, title: str = 'Reliability diagram', save_path: str = None, figsize: tuple[float, float] = (4, 4), percent: bool = True)

Plot a reliability diagram of the calibration results.
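
A minimal end-to-end sketch, assuming a list of ForecastProblem instances (named problems below) has already been prepared with the pm_rank.data utilities; the forecaster name 'example-model' passed to plot() is a hypothetical placeholder:

    from src.pm_rank.model.calibration import CalibrationMetric

    # problems: List[ForecastProblem], prepared elsewhere (assumed available).
    metric = CalibrationMetric(num_bins=10, strategy='uniform', verbose=True)
    results = metric.fit(problems, include_scores=True)
    print(results)  # dictionary containing the calibration metric

    # Save a reliability diagram; 'example-model' stands in for a forecaster name.
    metric.plot(name='example-model', save_path='reliability_diagram.png')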