src.pm_rank.data¶

Data subpackage for pm_rank.

Classes¶

`ForecastEvent`	Individual forecast from a user for a specific problem.
`ForecastProblem`	A prediction problem with multiple options and forecasts.
`ForecastChallenge`	A collection of forecast problems with validation and computed properties.
`ChallengeLoader`	Abstract base class for loading forecast challenges from different data sources.
`GJOChallengeLoader`	Load forecast challenges from GJO (Good Judgment Open) data format.

Package Contents¶

class src.pm_rank.data.ForecastEvent¶

Bases: pydantic.BaseModel

Individual forecast from a user for a specific problem.

forecast_id: str¶

problem_id: str¶

username: str¶

timestamp: datetime.datetime¶

probs: List[float]¶

unnormalized_probs: List[float] | None¶

weight: float¶

odds: List[float] | None¶

no_odds: List[float] | None¶

validate_weight(v)¶: Validate that weight is non-negative.

validate_probabilities(v)¶: Validate that probabilities sum to 1 and are non-negative.

set_unnormalized_probs_default()¶: Set unnormalized_probs to probs if not provided.

validate_unnormalized_probabilities(v)¶: Validate the unnormalized probabilities. They are not required to sum to 1: each value only has to lie in [0, 1], and the vector must have one entry per option.

validate_odds(v, info)¶: Validate that odds match the number of probabilities if provided.

smooth_odds()¶: Smooth the odds so that they are not too close to 0 or 1.
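
The validators above can be illustrated with a minimal standalone sketch. It does not use pydantic or the real `ForecastEvent` class; the function names, the tolerance `tol`, and the clipping constant `eps` are assumptions made for illustration, and clipping is only one plausible reading of "smoothing".

```python
import math
from typing import List, Optional


def check_probs(probs: List[float], tol: float = 1e-6) -> List[float]:
    """Reject vectors with negative entries or a sum that is not 1 (within tol)."""
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=tol):
        raise ValueError("probabilities must sum to 1")
    return probs


def default_unnormalized(probs: List[float],
                         unnormalized: Optional[List[float]]) -> List[float]:
    """Fall back to probs when no unnormalized vector was supplied."""
    return list(probs) if unnormalized is None else unnormalized


def clip_away_from_bounds(values: List[float], eps: float = 0.01) -> List[float]:
    """Clip each value into [eps, 1 - eps]; eps is an assumed constant."""
    return [min(max(v, eps), 1.0 - eps) for v in values]
```

In this sketch, `check_probs([0.5, 0.6])` raises `ValueError` because the entries sum to 1.1, while `default_unnormalized([0.2, 0.8], None)` returns a copy of the normalized vector.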

class src.pm_rank.data.ForecastProblem¶

Bases: pydantic.BaseModel

A prediction problem with multiple options and forecasts.

title: str¶

problem_id: str¶

options: List[str]¶

correct_option_idx: List[int]¶

forecasts: List[ForecastEvent]¶

end_time: datetime.datetime¶

num_forecasters: int¶

url: str | None¶

category: str | None¶

validate_correct_option_idx(v, info)¶: Validate that every index in correct_option_idx refers to an entry in the options list.

validate_forecasts(v, info)¶: Validate that every forecast has (1) the correct number of probabilities and (2) a unique forecast_id.

property has_odds: bool¶: Check if the problem has odds data.

property has_no_odds: bool¶: Check if the problem has no_odds data.

property crowd_probs: List[float]¶: Calculate crowd probabilities from the forecasts.

property unique_forecasters: List[str]¶: Get list of unique forecasters for this problem.
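
As a rough illustration of the computed properties above, the sketch below aggregates plain probability vectors. The documentation does not say how `crowd_probs` combines forecasts, so the unweighted per-option mean used here is an assumption, as are the function names.

```python
from typing import List, Sequence


def mean_crowd_probs(forecasts: Sequence[Sequence[float]]) -> List[float]:
    """Average the probability vectors option by option (assumed aggregation)."""
    if not forecasts:
        raise ValueError("need at least one forecast")
    n_options = len(forecasts[0])
    return [sum(f[i] for f in forecasts) / len(forecasts)
            for i in range(n_options)]


def unique_usernames(usernames: Sequence[str]) -> List[str]:
    """Deduplicate usernames while preserving first-seen order."""
    return list(dict.fromkeys(usernames))
```

A weighted mean using each forecast's `weight` field would be an equally plausible variant; the sketch keeps the simplest form.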

class src.pm_rank.data.ForecastChallenge¶

Bases: pydantic.BaseModel

A collection of forecast problems with validation and computed properties.

title: str¶

forecast_problems: List[ForecastProblem]¶

categories: List[str] | None¶

validate_problems(v)¶: Validate that there are problems and they have unique IDs.

validate_categories(v, info)¶: Validate that categories is a list of strings.

property forecaster_map: Dict[str, List[ForecastEvent]]¶: Map from forecaster username to their forecasts across all problems.

property num_forecasters: int¶: Total number of unique forecasters across all problems.

property unique_forecasters: List[str]¶: List of unique forecaster usernames.

get_forecaster_problems(username: str) → List[ForecastProblem]¶: Get all problems that a specific forecaster participated in.

get_problem_by_id(problem_id: str) → ForecastProblem | None¶: Get a specific problem by its ID.
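
The grouping behind `forecaster_map` and `get_forecaster_problems` can be sketched with the standard library alone. Here `(username, problem_id)` tuples stand in for `ForecastEvent` objects, and both helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (username, problem_id) pairs stand in for ForecastEvent objects.
Event = Tuple[str, str]


def build_forecaster_map(events: List[Event]) -> Dict[str, List[Event]]:
    """Group events by username, keeping the original event order."""
    grouped: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        grouped[event[0]].append(event)
    return dict(grouped)


def forecaster_problem_ids(events: List[Event], username: str) -> List[str]:
    """Return the distinct problem IDs a user forecast on, in first-seen order."""
    return list(dict.fromkeys(pid for user, pid in events if user == username))
```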

get_problems(nums: int = -1) → List[ForecastProblem]¶: Get a list of problems. If nums is -1, return all problems.

stream_problems(order: Literal['sequential', 'random', 'time'] = 'sequential', increment: int = 100) → Iterator[List[ForecastProblem]]¶

Stream the problems in the challenge in batches, ordered sequentially, randomly, or by problem end time.

Args:
    order: The order in which to stream the problems.
    increment: The number of problems to stream in each iteration.

Returns:
    An iterator of lists of problems.
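
A minimal sketch of batch streaming follows. It assumes that each iteration yields a disjoint batch of at most `increment` items; the documentation does not state whether batches are disjoint or cumulative. The `shuffle` and `seed` parameters are hypothetical stand-ins for the `'random'` order.

```python
import random
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def stream_in_batches(items: Sequence[T], increment: int = 100,
                      shuffle: bool = False,
                      seed: Optional[int] = None) -> Iterator[List[T]]:
    """Yield consecutive, non-overlapping batches of at most `increment` items."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    ordered = list(items)
    if shuffle:
        # A local Random instance avoids touching the global random state.
        random.Random(seed).shuffle(ordered)
    for start in range(0, len(ordered), increment):
        yield ordered[start:start + increment]
```

Because the function is a generator, nothing is sliced until the caller iterates, which keeps memory use proportional to one batch plus the ordered list.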

stream_problems_over_time(increment_by: Literal['day', 'week', 'month'] = 'day', min_bucket_size: int = 1) → Iterator[Tuple[str, List[ForecastProblem]]]¶: Stream all problems in chronological buckets.
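
Chronological bucketing can be sketched as below, with `(problem_id, end_time)` tuples standing in for `ForecastProblem` objects. The bucket label formats and the treatment of `min_bucket_size` (buckets below the threshold are skipped) are assumptions; the real method may label buckets differently or merge small buckets instead.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterator, List, Tuple


def bucket_key(ts: datetime, increment_by: str) -> str:
    """Build a sortable label for the day, ISO week, or month containing ts."""
    if increment_by == "day":
        return ts.strftime("%Y-%m-%d")
    if increment_by == "week":
        iso = ts.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if increment_by == "month":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unsupported increment_by: {increment_by!r}")


def stream_over_time(problems: List[Tuple[str, datetime]],
                     increment_by: str = "day",
                     min_bucket_size: int = 1) -> Iterator[Tuple[str, List[str]]]:
    """Yield (label, problem IDs) per bucket in chronological order."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for problem_id, end_time in problems:
        buckets[bucket_key(end_time, increment_by)].append(problem_id)
    for key in sorted(buckets):
        # Assumed behavior: buckets below the threshold are skipped.
        if len(buckets[key]) >= min_bucket_size:
            yield key, buckets[key]
```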

class src.pm_rank.data.ChallengeLoader¶

Bases: abc.ABC

Abstract base class for loading forecast challenges from different data sources. This separates the loading logic from the data model.

abstract load_challenge() → ForecastChallenge¶: Load and return a ForecastChallenge from the data source.

abstract get_challenge_metadata() → Dict[str, Any]¶: Get metadata about the challenge without loading all data.
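
The loader pattern described above can be sketched with `abc` alone. The classes below are hypothetical stand-ins, not the package's own: plain dictionaries replace `ForecastChallenge`, and `InMemoryLoader` is an invented example of a concrete data source.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class SketchLoader(ABC):
    """Stand-in for ChallengeLoader: two abstract methods, no data-model logic."""

    @abstractmethod
    def load_challenge(self) -> Dict[str, Any]:
        """Load and return the full challenge."""

    @abstractmethod
    def get_challenge_metadata(self) -> Dict[str, Any]:
        """Return lightweight metadata without loading everything."""


class InMemoryLoader(SketchLoader):
    """Hypothetical concrete loader backed by a list already in memory."""

    def __init__(self, title: str, problems: List[str]) -> None:
        self.title = title
        self.problems = problems

    def load_challenge(self) -> Dict[str, Any]:
        return {"title": self.title, "forecast_problems": list(self.problems)}

    def get_challenge_metadata(self) -> Dict[str, Any]:
        return {"title": self.title, "num_problems": len(self.problems)}
```

Instantiating `SketchLoader` directly raises `TypeError`, which is how the abstract base class forces each data source to supply both methods.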

class src.pm_rank.data.GJOChallengeLoader(predictions_df: pandas.DataFrame | None = None, predictions_file: str | None = None, metadata_file: str | None = None, challenge_title: str = '')¶

Bases: src.pm_rank.data.base.ChallengeLoader

Load forecast challenges from GJO (Good Judgment Open) data format.

Initialize the GJOChallengeLoader. The challenge can be loaded either from a given pd.DataFrame or from a pair of file paths, predictions_file and metadata_file.

Args:
    predictions_df (pd.DataFrame): A pd.DataFrame containing the predictions. If provided, predictions_file and metadata_file are ignored.
    predictions_file (str): The path to the predictions file.
    metadata_file (str): The path to the metadata file.
    challenge_title (str): The title of the challenge.

challenge_title = ''¶

logger¶

load_challenge(forecaster_filter: int = 0, problem_filter: int = 0) → src.pm_rank.data.base.ForecastChallenge¶

Load challenge data from GJO format files.

Args:
    forecaster_filter: Minimum number of events for a forecaster to be included.
    problem_filter: Minimum number of events for a problem to be included.

Returns:
    ForecastChallenge: A ForecastChallenge object containing the forecast problems and events.
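
The two thresholds can be illustrated with a standalone sketch over plain dictionaries. It assumes that both counts are taken over the unfiltered events and applied in a single pass; the actual loader may apply them in a different order or iteratively. The function name and the dictionary keys are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def filter_events(events: List[Dict[str, str]],
                  forecaster_filter: int = 0,
                  problem_filter: int = 0) -> List[Dict[str, str]]:
    """Keep events whose forecaster and problem both meet the minimum counts."""
    per_user = Counter(e["username"] for e in events)
    per_problem = Counter(e["problem_id"] for e in events)
    return [
        e for e in events
        if per_user[e["username"]] >= forecaster_filter
        and per_problem[e["problem_id"]] >= problem_filter
    ]
```

With the default thresholds of 0, every event passes, which matches the defaults in the `load_challenge` signature.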

get_challenge_metadata() → Dict[str, Any]¶: Get basic metadata about the GJO challenge.