src.pm_rank.data.loaders

Concrete implementations of ChallengeLoader for different data sources.

Classes

`GJOChallengeLoader`	Load forecast challenges from GJO (Good Judgment Open) data format.
`ProphetArenaChallengeLoader`	Load forecast challenges from Prophet Arena data format.

Module Contents

class src.pm_rank.data.loaders.GJOChallengeLoader(predictions_df: pandas.DataFrame | None = None, predictions_file: str | None = None, metadata_file: str | None = None, challenge_title: str = '')

Bases: src.pm_rank.data.base.ChallengeLoader

Load forecast challenges from GJO (Good Judgment Open) data format.

Initialize the GJOChallengeLoader. The challenge can be loaded either from a given pd.DataFrame or from a pair of file paths, predictions_file and metadata_file.

Args:

predictions_df (pd.DataFrame) – A pd.DataFrame containing the predictions. If provided, predictions_file and metadata_file will be ignored.
predictions_file (str) – The path to the predictions file.
metadata_file (str) – The path to the metadata file.
challenge_title (str) – The title of the challenge.

challenge_title = ''

logger

load_challenge(forecaster_filter: int = 0, problem_filter: int = 0) → src.pm_rank.data.base.ForecastChallenge

Load challenge data from GJO format files.

Args:

forecaster_filter – Minimum number of events for a forecaster to be included.
problem_filter – Minimum number of events for a problem to be included.

Returns: A ForecastChallenge object containing the forecast problems and events.
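The two thresholds can be pictured as minimum-count filters. The following is a minimal sketch of that idea and not the loader's actual implementation: the helper name `filter_forecasts`, the `(forecaster, problem_id)` row shape, and the order in which the two filters are applied are all assumptions made for illustration.

```python
from collections import Counter


def filter_forecasts(rows, forecaster_filter=0, problem_filter=0):
    """Hypothetical helper: rows is a list of (forecaster, problem_id) pairs,
    one pair per forecast event."""
    # Count events per forecaster and drop forecasters below the threshold.
    per_forecaster = Counter(forecaster for forecaster, _ in rows)
    rows = [(f, p) for f, p in rows if per_forecaster[f] >= forecaster_filter]

    # Count the remaining events per problem and drop problems below the threshold.
    # (Applying the forecaster filter first is an assumption of this sketch.)
    per_problem = Counter(problem for _, problem in rows)
    return [(f, p) for f, p in rows if per_problem[p] >= problem_filter]


rows = [
    ("alice", "q1"), ("alice", "q2"),
    ("bob", "q1"),
    ("carol", "q1"), ("carol", "q2"), ("carol", "q3"),
]
kept = filter_forecasts(rows, forecaster_filter=2, problem_filter=2)
```

With the default value of 0 for both thresholds, every row passes, which is consistent with the defaults in the signature above meaning "no filtering".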

get_challenge_metadata() → Dict[str, Any]

Get basic metadata about the GJO challenge.

class src.pm_rank.data.loaders.ProphetArenaChallengeLoader(predictions_df: pandas.DataFrame | None = None, predictions_file: str | None = None, challenge_title: str = '', use_bid_for_odds: bool = False, use_open_time: bool = False)

Bases: src.pm_rank.data.base.ChallengeLoader

Load forecast challenges from Prophet Arena data format.

Initialize the ProphetArenaChallengeLoader.

The challenge can be loaded either from a given pd.DataFrame or from the path to a predictions file.

Parameters:

predictions_df – A pd.DataFrame containing the predictions. If provided, predictions_file will be ignored.
predictions_file – The path to the predictions file.
challenge_title – The title of the challenge.
use_bid_for_odds – Whether to use the yes_bid field for the implied probability calculation. If True, the implied probability is calculated as (yes_bid + no_bid) / 2. If False, the implied probability is simply yes_ask (normalized to sum to 1).
use_open_time – Whether to use the open_time field for the end_time of the problem. If True, the end_time will be the open_time of the problem. If False, the end_time will be the close_time of the problem.
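The two boolean flags can be illustrated with a short sketch that follows the parameter descriptions above literally. It is not the loader's actual code: the function names `implied_probabilities` and `resolve_end_time`, the dictionary layout of `options` and `problem`, and the assumption that normalization runs across the options of a single problem are all hypothetical.

```python
def implied_probabilities(options, use_bid_for_odds=False):
    """Hypothetical helper: options maps an option name to a dict holding
    the yes_ask, yes_bid and no_bid values for that option."""
    if use_bid_for_odds:
        # Formula as stated in the parameter description: (yes_bid + no_bid) / 2.
        return {
            name: (quote["yes_bid"] + quote["no_bid"]) / 2
            for name, quote in options.items()
        }

    # Otherwise use yes_ask, rescaled so the values sum to 1
    # (normalizing across the options of one problem is an assumption).
    total = sum(quote["yes_ask"] for quote in options.values())
    if total == 0:
        raise ValueError("cannot normalize: all yes_ask values are zero")
    return {name: quote["yes_ask"] / total for name, quote in options.items()}


def resolve_end_time(problem, use_open_time=False):
    """Pick the end_time of a problem from its open_time or close_time field."""
    return problem["open_time"] if use_open_time else problem["close_time"]


options = {
    "A": {"yes_ask": 0.6, "yes_bid": 0.5, "no_bid": 0.3},
    "B": {"yes_ask": 0.2, "yes_bid": 0.1, "no_bid": 0.7},
}
ask_probs = implied_probabilities(options)
bid_probs = implied_probabilities(options, use_bid_for_odds=True)
```

In this sketch only the yes_ask branch is normalized, because the description mentions normalization only for that case.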

challenge_title = ''

use_bid_for_odds = False

use_open_time = False

logger

load_challenge(add_market_baseline: bool = False) → src.pm_rank.data.base.ForecastChallenge

Load challenge data from Prophet Arena data format. Rows are grouped by submission_id; for each group, the list of forecasts is built first, followed by the ForecastProblem.

Parameters:

add_market_baseline – Whether to add the market baseline as a forecaster.

Returns: A ForecastChallenge object containing the forecast problems and events.
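The group-then-build flow described above can be sketched with plain dictionaries. This is only an illustration of the grouping step: the helper name `group_by_submission`, the row fields `forecaster` and `probs`, and the use of dicts in place of the real forecast and ForecastProblem objects are assumptions, and the market-baseline option is left out.

```python
from collections import defaultdict


def group_by_submission(rows):
    """Hypothetical helper: rows is a list of dicts, each carrying a
    submission_id plus the fields of one forecast."""
    groups = defaultdict(list)
    for row in rows:
        # Step 1: collect the forecasts that share a submission_id.
        groups[row["submission_id"]].append(
            {"forecaster": row["forecaster"], "probs": row["probs"]}
        )

    # Step 2: build one problem record per group (a dict stands in for
    # the ForecastProblem object here).
    return [
        {"problem_id": submission_id, "forecasts": forecasts}
        for submission_id, forecasts in groups.items()
    ]


rows = [
    {"submission_id": "s1", "forecaster": "model_a", "probs": [0.7, 0.3]},
    {"submission_id": "s2", "forecaster": "model_a", "probs": [0.4, 0.6]},
    {"submission_id": "s1", "forecaster": "model_b", "probs": [0.5, 0.5]},
]
problems = group_by_submission(rows)
```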

get_challenge_metadata() → Dict[str, Any]

Get basic metadata about the Prophet Arena challenge using pandas groupby (no full parsing).