src.pm_rank.model.calibration¶
Calibration Metric for LLM Predictions.
Currently, this module adopts the following definition of a perfectly calibrated probabilistic predictor \(f\):
For all \(p \in [0, 1]\) and any pair of covariate \(X\) and binary outcome \(Y\), we have:
\[\mathbb{P}(Y = 1 \mid f(X) = p) = p.\]
We then define the (theoretical) expected calibration error (ECE) as a measure of deviation from the above property:
\[\mathrm{ECE}(f) = \mathbb{E}\left[\left|\,\mathbb{P}(Y = 1 \mid f(X)) - f(X)\,\right|\right].\]
In practice, we will calculate an empirical version of the above ECE via binning (discretization).
Reference: https://arxiv.org/pdf/2501.19047v2
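To make the binned computation concrete, here is a minimal, self-contained sketch of an empirically binned ECE. The helper name `empirical_ece`, the array names `probs` and `outcomes`, and the equal-width ("uniform") binning shown are illustrative assumptions, not the module's actual implementation:

```python
import numpy as np

def empirical_ece(probs: np.ndarray, outcomes: np.ndarray, num_bins: int = 10) -> float:
    """Illustrative binned ECE: weighted average, over bins, of
    |empirical frequency of Y = 1 - mean predicted probability|."""
    # Equal-width ("uniform") bin edges over [0, 1]; a "quantile" strategy would
    # instead place edges at quantiles of the predicted probabilities.
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    # Assign each prediction to a bin using the interior edges.
    bin_ids = np.digitize(probs, edges[1:-1])

    ece = 0.0
    n = len(probs)
    for b in range(num_bins):
        mask = bin_ids == b
        if not mask.any():
            continue
        conf = probs[mask].mean()    # average predicted probability in the bin
        acc = outcomes[mask].mean()  # empirical frequency of Y = 1 in the bin
        ece += (mask.sum() / n) * abs(acc - conf)
    return ece

# Example: predictions that are calibrated by construction yield a small ECE.
rng = np.random.default_rng(0)
p = rng.uniform(size=10_000)
y = (rng.uniform(size=10_000) < p).astype(float)
print(round(empirical_ece(p, y), 4))
```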
Classes¶
CalibrationMetric | Initialize the CalibrationMetric.
Module Contents¶
- class src.pm_rank.model.calibration.CalibrationMetric(num_bins: int = 10, strategy: Literal['uniform', 'quantile'] = 'uniform', weight_event: bool = True, verbose: bool = False)¶
Initialize the CalibrationMetric.
- Parameters:
num_bins – The number of bins to use for discretization.
strategy – The binning strategy to use for discretization: 'uniform' produces equal-width bins over \([0, 1]\), while 'quantile' places bin edges at quantiles of the predicted probabilities so that each bin holds roughly the same number of predictions.
weight_event – Whether to weight each event by the number of markets it contains. If False, every market is treated equally.
verbose – Whether to log progress information while fitting.
- num_bins = 10¶
- strategy = 'uniform'¶
- weight_event = True¶
- verbose = False¶
- logger¶
- fit(problems: List[pm_rank.data.base.ForecastProblem], include_scores: bool = True)¶
Fit the calibration metric to the given problems.
- Parameters:
problems – List of ForecastProblem instances to process.
include_scores – Whether to include the per-forecaster calibration scores in the returned result.
- Returns:
A dictionary containing the calibration metric.
- plot(name: str, title: str = 'Reliability diagram', save_path: str = None, figsize: tuple[float, float] = (4, 4), percent: bool = True)¶
Plot a reliability diagram for the fitted calibration results under name, optionally saving the figure to save_path.
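A minimal usage sketch, under a few assumptions: the installed package exposes this module as pm_rank.model.calibration, problems is a pre-built List[ForecastProblem] (e.g. produced by the package's data loaders), and "my-forecaster" is a placeholder for whatever identifier the name argument of plot expects:

```python
from typing import List

from pm_rank.data.base import ForecastProblem
from pm_rank.model.calibration import CalibrationMetric

# Assumed to be loaded elsewhere with the package's data utilities.
problems: List[ForecastProblem] = ...

metric = CalibrationMetric(num_bins=10, strategy="quantile", weight_event=True, verbose=True)

# Compute the empirical ECE over all forecast problems.
results = metric.fit(problems, include_scores=True)
print(results)

# Draw (and optionally save) a reliability diagram; "my-forecaster" is a placeholder name.
metric.plot(name="my-forecaster", save_path="reliability_diagram.png")
```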