src.pm_rank.data.loaders

Concrete implementations of ChallengeLoader for different data sources.

Classes

GJOChallengeLoader

Load forecast challenges from GJO (Good Judgment Open) data format.

ProphetArenaChallengeLoader

Load forecast challenges from Prophet Arena data format.

Module Contents

class src.pm_rank.data.loaders.GJOChallengeLoader(predictions_df: pandas.DataFrame | None = None, predictions_file: str | None = None, metadata_file: str | None = None, challenge_title: str = '')

Bases: src.pm_rank.data.base.ChallengeLoader

Load forecast challenges from GJO (Good Judgment Open) data format.

Initialize the GJOChallengeLoader. The challenge can be loaded either from a given pd.DataFrame or from a pair of file paths, predictions_file and metadata_file.

Args:

  • predictions_df (pd.DataFrame): a pd.DataFrame containing the predictions. If provided, predictions_file and metadata_file will be ignored.

  • predictions_file (str): the path to the predictions file

  • metadata_file (str): the path to the metadata file

  • challenge_title (str): the title of the challenge

challenge_title = ''
logger
load_challenge(forecaster_filter: int = 0, problem_filter: int = 0) → src.pm_rank.data.base.ForecastChallenge

Load challenge data from GJO format files.

Args:

  • forecaster_filter: minimum number of events for a forecaster to be included

  • problem_filter: minimum number of events for a problem to be included

Returns:

ForecastChallenge: a ForecastChallenge object containing the forecast problems and events
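The two thresholds above can be pictured as minimum-count filters. The following is a minimal sketch, not the loader's actual code: the `(forecaster_id, problem_id)` pair representation, the helper name `filter_by_min_events`, and the order in which the two filters are applied are all assumptions made for illustration.

```python
from collections import Counter


def filter_by_min_events(events, forecaster_filter=0, problem_filter=0):
    """Keep events whose forecaster and problem each meet a minimum event count.

    `events` is a list of (forecaster_id, problem_id) pairs. This shape is a
    hypothetical stand-in, not the loader's internal representation.
    """
    # Count events per forecaster and drop forecasters below the threshold.
    per_forecaster = Counter(f for f, _ in events)
    kept = [(f, p) for f, p in events if per_forecaster[f] >= forecaster_filter]

    # Count events per problem among the survivors and drop sparse problems.
    per_problem = Counter(p for _, p in kept)
    return [(f, p) for f, p in kept if per_problem[p] >= problem_filter]


events = [("alice", "q1"), ("alice", "q2"), ("bob", "q1"), ("carol", "q3")]
filtered = filter_by_min_events(events, forecaster_filter=2, problem_filter=1)
```

With the default value of 0 for both thresholds, every event passes, which matches the documented defaults of load_challenge.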

get_challenge_metadata() → Dict[str, Any]

Get basic metadata about the GJO challenge.
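A typical call sequence, based only on the signatures documented above, might look like the sketch below. The wrapper name `load_gjo_challenge` and its threshold parameters are hypothetical, and whether get_challenge_metadata() may be called before or after load_challenge() is not stated here, so the ordering is an assumption.

```python
def load_gjo_challenge(predictions_file, metadata_file, title="",
                       min_forecaster_events=0, min_problem_events=0):
    """Hypothetical convenience wrapper around the documented loader API."""
    # Imported lazily so the sketch can be read without the package installed.
    from src.pm_rank.data.loaders import GJOChallengeLoader

    loader = GJOChallengeLoader(
        predictions_file=predictions_file,
        metadata_file=metadata_file,
        challenge_title=title,
    )
    # Build the ForecastChallenge, applying the minimum-event thresholds.
    challenge = loader.load_challenge(
        forecaster_filter=min_forecaster_events,
        problem_filter=min_problem_events,
    )
    # Basic metadata about the challenge, returned as a dict.
    metadata = loader.get_challenge_metadata()
    return challenge, metadata
```

Passing predictions_df instead of the two file paths should work the same way, since the constructor ignores the paths when a DataFrame is supplied.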

class src.pm_rank.data.loaders.ProphetArenaChallengeLoader(predictions_df: pandas.DataFrame | None = None, predictions_file: str | None = None, challenge_title: str = '', use_bid_for_odds: bool = False, use_open_time: bool = False)

Bases: src.pm_rank.data.base.ChallengeLoader

Load forecast challenges from Prophet Arena data format.

Initialize the ProphetArenaChallengeLoader.

The challenge can be loaded either from a given pd.DataFrame or from a path to a predictions file.

Parameters:
  • predictions_df – A pd.DataFrame containing the predictions. If provided, predictions_file will be ignored.

  • predictions_file – The path to the predictions file.

  • challenge_title – The title of the challenge.

  • use_bid_for_odds – Whether to use the yes_bid field for the implied probability calculation. If True, the implied probability is calculated as (yes_bid + no_bid) / 2. If False, the implied probability is simply yes_ask (normalized to sum to 1).

  • use_open_time – Whether to use the open_time field as the end_time of the problem. If True, end_time is set to the problem's open_time. If False, end_time is set to the problem's close_time.
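The effect of the two flags can be illustrated with a small sketch. It follows the parameter descriptions above literally and is not the loader's actual code: the helper names, the dict-based row shape, and the choice not to normalize the bid-based values are assumptions.

```python
def implied_probabilities(options, use_bid_for_odds=False):
    """Return one implied probability per option.

    `options` is a list of dicts with yes_ask, yes_bid and no_bid keys
    (a hypothetical row shape chosen for illustration).
    """
    if use_bid_for_odds:
        # Formula as worded in the parameter description.
        return [(o["yes_bid"] + o["no_bid"]) / 2 for o in options]

    # Default: yes_ask prices, rescaled so they sum to 1.
    total = sum(o["yes_ask"] for o in options)
    if total <= 0:
        raise ValueError("yes_ask values must sum to a positive number")
    return [o["yes_ask"] / total for o in options]


def problem_end_time(row, use_open_time=False):
    """Pick the timestamp used as the problem's end_time."""
    return row["open_time"] if use_open_time else row["close_time"]


options = [
    {"yes_ask": 0.5, "yes_bid": 0.5, "no_bid": 0.25},
    {"yes_ask": 0.75, "yes_bid": 0.25, "no_bid": 0.5},
]
ask_based = implied_probabilities(options)
bid_based = implied_probabilities(options, use_bid_for_odds=True)
```

In this example the yes_ask prices sum to 1.25, so normalization changes them, while the bid-based values are returned as computed.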

challenge_title = ''
use_bid_for_odds = False
use_open_time = False
logger
load_challenge(add_market_baseline: bool = False) → src.pm_rank.data.base.ForecastChallenge

Load challenge data from the Prophet Arena data format. Rows are grouped by submission_id; for each group, the list of forecasts is built first, followed by the ForecastProblem.

Parameters:

add_market_baseline – Whether to add the market baseline as a forecaster

Returns:

A ForecastChallenge object containing the forecast problems and events.
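The grouping step described for this method can be sketched in plain Python. This is an illustration under stated assumptions rather than the loader's implementation: only submission_id comes from the description above, while the `forecaster` and `probs` fields and the plain dicts standing in for ForecastProblem are hypothetical. The market baseline option is not modeled.

```python
from collections import defaultdict


def group_forecasts(rows):
    """Group prediction rows by submission_id and build one problem per group."""
    # Step 1: collect the forecasts that belong to each submission_id.
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["submission_id"]].append(
            {"forecaster": row["forecaster"], "probs": row["probs"]}
        )

    # Step 2: build one problem record (stand-in for ForecastProblem) per group.
    return [
        {"problem_id": submission_id, "forecasts": forecasts}
        for submission_id, forecasts in grouped.items()
    ]


rows = [
    {"submission_id": "s1", "forecaster": "model_a", "probs": [0.7, 0.3]},
    {"submission_id": "s2", "forecaster": "model_a", "probs": [0.2, 0.8]},
    {"submission_id": "s1", "forecaster": "model_b", "probs": [0.6, 0.4]},
]
problems = group_forecasts(rows)
```

Because dicts preserve insertion order, the problems come out in the order their submission_id first appears in the input.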

get_challenge_metadata() → Dict[str, Any]

Get basic metadata about the Prophet Arena challenge using pandas groupby (no full parsing).