src.pm_rank.data.loaders
Concrete implementations of ChallengeLoader for different data sources.
Classes
GJOChallengeLoader | Load forecast challenges from GJO (Good Judgment Open) data format.
ProphetArenaChallengeLoader | Load forecast challenges from Prophet Arena data format.
Module Contents
- class src.pm_rank.data.loaders.GJOChallengeLoader(predictions_df: pandas.DataFrame | None = None, predictions_file: str | None = None, metadata_file: str | None = None, challenge_title: str = '')
Bases: src.pm_rank.data.base.ChallengeLoader
Load forecast challenges from GJO (Good Judgment Open) data format.
Initialize the GJOChallengeLoader. The challenge can be loaded either from a given pd.DataFrame or from the files at predictions_file and metadata_file.
- Args:
predictions_df (pd.DataFrame): A pd.DataFrame containing the predictions. If provided, predictions_file and metadata_file are ignored.
predictions_file (str): The path to the predictions file.
metadata_file (str): The path to the metadata file.
challenge_title (str): The title of the challenge.
- challenge_title = ''
- logger
- load_challenge(forecaster_filter: int = 0, problem_filter: int = 0) -> src.pm_rank.data.base.ForecastChallenge
Load challenge data from GJO format files.
- Args:
forecaster_filter: Minimum number of events for a forecaster to be included.
problem_filter: Minimum number of events for a problem to be included.
- Returns:
ForecastChallenge: a ForecastChallenge object containing the forecast problems and events
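The two thresholds can be read as minimum-count filters. The following is only a minimal sketch of that idea and not the loader's actual code: the `filter_events` helper and the `(forecaster, problem)` tuple shape are hypothetical, and it assumes both counts are taken once on the unfiltered data.

```python
from collections import Counter


def filter_events(events, forecaster_filter=0, problem_filter=0):
    """Keep events whose forecaster and problem both meet a minimum count.

    `events` is a list of (forecaster, problem) pairs. This shape is a
    hypothetical stand-in for the real GJO records.
    """
    # Count how many events each forecaster and each problem has
    # in the unfiltered input (an assumption of this sketch).
    per_forecaster = Counter(forecaster for forecaster, _ in events)
    per_problem = Counter(problem for _, problem in events)
    return [
        (forecaster, problem)
        for forecaster, problem in events
        if per_forecaster[forecaster] >= forecaster_filter
        and per_problem[problem] >= problem_filter
    ]


events = [
    ("alice", "q1"),
    ("alice", "q2"),
    ("bob", "q1"),
    ("carol", "q2"),
    ("alice", "q3"),
]
# Only forecasters with at least two events survive.
print(filter_events(events, forecaster_filter=2))
```

With the default value of 0 for both thresholds, every event passes, which matches the defaults in the signature above.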
- get_challenge_metadata() -> Dict[str, Any]
Get basic metadata about the GJO challenge.
- class src.pm_rank.data.loaders.ProphetArenaChallengeLoader(predictions_df: pandas.DataFrame | None = None, predictions_file: str | None = None, challenge_title: str = '', use_bid_for_odds: bool = False, use_open_time: bool = False)
Bases: src.pm_rank.data.base.ChallengeLoader
Load forecast challenges from Prophet Arena data format.
Initialize the ProphetArenaChallengeLoader.
The challenge can be loaded either from a given pd.DataFrame or from the file at predictions_file.
- Parameters:
predictions_df – A pd.DataFrame containing the predictions. If provided, predictions_file will be ignored.
predictions_file – The path to the predictions file.
challenge_title – The title of the challenge.
use_bid_for_odds – Whether to use the bid fields for the implied probability calculation. If True, the implied probability is calculated as (yes_bid + no_bid) / 2. If False, the implied probability is simply yes_ask (normalized to sum to 1).
use_open_time – Whether to use the open_time field as the end_time of the problem. If True, the end_time is the problem's open_time. If False, the end_time is the problem's close_time.
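The two modes of use_bid_for_odds can be illustrated with a small sketch. It follows the formulas exactly as stated in the parameter description and is not taken from the loader's source. The `implied_probabilities` helper and the list-of-dicts input are hypothetical, prices are assumed to lie in [0, 1], and no normalization is assumed in the bid-based mode because the description does not mention one.

```python
def implied_probabilities(quotes, use_bid_for_odds=False):
    """Return one implied probability per market option.

    `quotes` is a list of dicts with yes_ask, yes_bid and no_bid prices.
    This input shape is a hypothetical stand-in for the real data.
    """
    if use_bid_for_odds:
        # Formula as written in the parameter description.
        return [(q["yes_bid"] + q["no_bid"]) / 2 for q in quotes]
    # Default mode: yes_ask, rescaled so the options sum to 1.
    total = sum(q["yes_ask"] for q in quotes)
    if total == 0:
        raise ValueError("yes_ask prices sum to zero; cannot normalize")
    return [q["yes_ask"] / total for q in quotes]


quotes = [
    {"yes_ask": 0.5, "yes_bid": 0.5, "no_bid": 0.25},
    {"yes_ask": 0.75, "yes_bid": 0.25, "no_bid": 0.5},
]
print(implied_probabilities(quotes))
print(implied_probabilities(quotes, use_bid_for_odds=True))
```

In the default mode the raw yes_ask prices sum to 1.25, so rescaling is what turns them into a probability distribution over the options.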
- challenge_title = ''
- use_bid_for_odds = False
- use_open_time = False
- logger
- load_challenge(add_market_baseline: bool = False) -> src.pm_rank.data.base.ForecastChallenge
Load challenge data from Prophet Arena data format. Rows are grouped by submission_id; for each group, the list of forecasts is built first, then the ForecastProblem.
- Parameters:
add_market_baseline – Whether to add the market baseline as a forecaster
- Returns:
A ForecastChallenge object containing the forecast problems and events.
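The grouping step described for load_challenge can be sketched in plain Python. This is an illustration of the group-then-build pattern only. `SketchProblem` is a hypothetical stand-in for ForecastProblem, and the `forecaster` and `probabilities` row keys are assumed names rather than the actual Prophet Arena columns.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SketchProblem:
    """Hypothetical stand-in for ForecastProblem."""
    submission_id: str
    forecasts: list = field(default_factory=list)


def group_rows(rows):
    """Group rows by submission_id and build one problem per group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["submission_id"]].append(row)

    problems = []
    for submission_id, group in grouped.items():
        # First build the list of forecasts for this group...
        forecasts = [(r["forecaster"], r["probabilities"]) for r in group]
        # ...then wrap them in the problem object.
        problems.append(SketchProblem(submission_id, forecasts))
    return problems


rows = [
    {"submission_id": "s1", "forecaster": "model_a", "probabilities": [0.7, 0.3]},
    {"submission_id": "s1", "forecaster": "model_b", "probabilities": [0.6, 0.4]},
    {"submission_id": "s2", "forecaster": "model_a", "probabilities": [0.2, 0.8]},
]
for problem in group_rows(rows):
    print(problem.submission_id, len(problem.forecasts))
```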
- get_challenge_metadata() -> Dict[str, Any]
Get basic metadata about the Prophet Arena challenge. The metadata is computed with a pandas groupby, without fully parsing the data.