src.pm_rank.data

Data subpackage for pm_rank.

Classes

ForecastEvent

Individual forecast from a user for a specific problem.

ForecastProblem

A prediction problem with multiple options and forecasts.

ForecastChallenge

A collection of forecast problems with validation and computed properties.

ChallengeLoader

Abstract base class for loading forecast challenges from different data sources.

GJOChallengeLoader

Load forecast challenges from GJO (Good Judgment Open) data format.

Package Contents

class src.pm_rank.data.ForecastEvent

Bases: pydantic.BaseModel

Individual forecast from a user for a specific problem.

problem_id: str
username: str
timestamp: datetime.datetime
probs: List[float]
unnormalized_probs: List[float] | None
validate_probabilities(v)

Validate that probabilities sum to 1 and are non-negative.

set_unnormalized_probs_default()

Set unnormalized_probs to probs if not provided.

validate_unnormalized_probabilities(v)

Validate the unnormalized probabilities. Each value only needs to lie in [0, 1], and the vector length must equal the number of options.
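The two validation rules described above can be sketched with plain functions. This is only an illustration of the stated checks, not the library's pydantic validators: the function names, the `tol` tolerance and the explicit `num_options` argument are assumptions made for this example.

```python
import math
from typing import List


def check_probs(probs: List[float], tol: float = 1e-6) -> List[float]:
    """Reject negative entries and vectors that do not sum to 1 (within tol)."""
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=tol):
        raise ValueError("probabilities must sum to 1")
    return probs


def check_unnormalized_probs(values: List[float], num_options: int) -> List[float]:
    """Require one value per option, each in [0, 1]; the sum is not constrained."""
    if len(values) != num_options:
        raise ValueError("expected one value per option")
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("each value must lie in [0, 1]")
    return values
```

Under these assumptions, `[0.9, 0.7]` is rejected as `probs` because it sums to 1.6, but it is accepted as `unnormalized_probs` for a two-option problem.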

class src.pm_rank.data.ForecastProblem

Bases: pydantic.BaseModel

A prediction problem with multiple options and forecasts.

title: str
problem_id: str
options: List[str]
correct_option_idx: List[int]
forecasts: List[ForecastEvent]
end_time: datetime.datetime
num_forecasters: int
url: str | None
odds: List[float] | None
category: str | None
validate_correct_option_idx(v, info)

Validate that every index in correct_option_idx refers to an entry in the options list.

validate_forecasts(v, info)

Validate that all forecasts have correct number of probabilities.

validate_odds(v, info)

Validate that odds match the number of options if provided.

smooth_odds()

Smooth the odds to not be too close to 0 or 1.
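One simple way to keep odds away from 0 and 1 is to clamp each value into a narrow interior range. The sketch below assumes that approach. The name `clamp_odds` and the `eps` parameter are hypothetical, and the library's `smooth_odds()` may use a different rule, for example renormalizing afterwards.

```python
from typing import List


def clamp_odds(odds: List[float], eps: float = 0.005) -> List[float]:
    """Clamp each value into [eps, 1 - eps]; eps is an illustrative choice."""
    return [min(max(o, eps), 1.0 - eps) for o in odds]
```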

property has_odds: bool

Check if the problem has odds data.

property crowd_probs: List[float]

Calculate crowd probabilities from the forecasts.
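The reference does not say how the crowd probabilities are aggregated. As a sketch, assuming a plain per-option arithmetic mean over all forecasts (the function name is hypothetical, and the library may weight or aggregate differently):

```python
from typing import List


def mean_crowd_probs(forecast_probs: List[List[float]]) -> List[float]:
    """Average the probability assigned to each option across forecasts."""
    if not forecast_probs:
        raise ValueError("need at least one forecast")
    n = len(forecast_probs)
    # zip(*rows) yields one tuple per option (a column of the matrix).
    return [sum(column) / n for column in zip(*forecast_probs)]
```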

property unique_forecasters: List[str]

Get list of unique forecasters for this problem.

property option_payoffs: List[Tuple[int, float]]

Get a sorted list of (option_idx, payoff) pairs for this problem. The payoff is 1 / odds for a correct option and 0 for an incorrect option.
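The payoff rule can be sketched as follows. The function name is hypothetical, and because the reference does not state the sort key, this example assumes the list is ordered by option index.

```python
from typing import List, Tuple


def sketch_option_payoffs(
    odds: List[float], correct_option_idx: List[int]
) -> List[Tuple[int, float]]:
    """Return (option_idx, payoff) pairs: 1 / odds if correct, else 0."""
    correct = set(correct_option_idx)
    # enumerate() already yields pairs in option-index order (assumed sort key).
    return [(i, 1.0 / o if i in correct else 0.0) for i, o in enumerate(odds)]
```

For odds of `[0.25, 0.5, 0.25]` with option 1 correct, only option 1 receives a non-zero payoff, equal to 1 / 0.5.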

class src.pm_rank.data.ForecastChallenge

Bases: pydantic.BaseModel

A collection of forecast problems with validation and computed properties.

title: str
forecast_problems: List[ForecastProblem]
categories: List[str] | None
validate_problems(v)

Validate that there are problems and they have unique IDs.

validate_categories(v, info)

Validate that categories are a list of strings.

property forecaster_map: Dict[str, List[ForecastEvent]]

Map from forecaster username to their forecasts across all problems.
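A mapping of this shape is typically built by grouping events by username. The sketch below uses a hypothetical `Event` dataclass as a stand-in for ForecastEvent and represents each problem simply as its list of forecasts.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    """Hypothetical stand-in for ForecastEvent (only the fields used here)."""
    problem_id: str
    username: str


def build_forecaster_map(problems: List[List[Event]]) -> Dict[str, List[Event]]:
    """Group every event across all problems under its author's username."""
    grouped: Dict[str, List[Event]] = defaultdict(list)
    for forecasts in problems:
        for event in forecasts:
            grouped[event.username].append(event)
    return dict(grouped)
```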

property num_forecasters: int

Total number of unique forecasters across all problems.

property unique_forecasters: List[str]

List of unique forecaster usernames.

property problem_option_payoffs: Dict[str, List[Tuple[int, float]]]

Map from problem_id to a sorted list of (option_idx, payoff) for this problem.

get_forecaster_problems(username: str) -> List[ForecastProblem]

Get all problems that a specific forecaster participated in.

get_problem_by_id(problem_id: str) -> ForecastProblem | None

Get a specific problem by its ID.

get_problems(nums: int = -1) -> List[ForecastProblem]

Get a list of problems. If nums is -1, return all problems.

stream_problems(order: Literal['sequential', 'random', 'time'] = 'sequential', increment: int = 100) -> Iterator[List[ForecastProblem]]

Stream the problems in the challenge in batches: in their stored order, in random order, or ordered by problem end time.

Args:

order: The order in which to stream the problems.
increment: The number of problems to stream in each iteration.

Returns:

An iterator of lists of problems.
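The batching behaviour described above can be sketched as a generator. This is an assumption-laden illustration: the function name, the `time_key` callable (standing in for access to each problem's end_time) and the `seed` argument are hypothetical, and the sketch assumes disjoint batches of `increment` items rather than growing prefixes.

```python
import random
from typing import Callable, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def stream_in_batches(
    items: Sequence[T],
    order: str = "sequential",
    increment: int = 100,
    time_key: Optional[Callable[[T], object]] = None,
    seed: Optional[int] = None,
) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `increment` items."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    ordered = list(items)  # copy so the caller's sequence is not reordered
    if order == "random":
        random.Random(seed).shuffle(ordered)
    elif order == "time":
        if time_key is None:
            raise ValueError("time ordering needs a time_key")
        ordered.sort(key=time_key)
    elif order != "sequential":
        raise ValueError(f"unknown order: {order!r}")
    for start in range(0, len(ordered), increment):
        yield ordered[start:start + increment]
```

The last batch may be shorter than `increment` when the number of items is not a multiple of it.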

stream_problems_over_time(increment_by: Literal['day', 'week', 'month'] = 'day', min_bucket_size: int = 1) -> Iterator[Tuple[str, List[ForecastProblem]]]

Stream all problems in chronological buckets.
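Chronological bucketing can be sketched by deriving a label from each end time and grouping on it. The label formats and function names below are assumptions for illustration, the sketch operates on bare datetimes rather than ForecastProblem objects, and it deliberately leaves out `min_bucket_size` because the reference does not say how undersized buckets are handled.

```python
from datetime import datetime
from typing import Dict, Iterator, List, Tuple


def bucket_label(ts: datetime, increment_by: str) -> str:
    """Map a timestamp to a day, ISO-week, or month label (assumed formats)."""
    if increment_by == "day":
        return ts.strftime("%Y-%m-%d")
    if increment_by == "week":
        iso = ts.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if increment_by == "month":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown increment_by: {increment_by!r}")


def stream_over_time(
    end_times: List[datetime], increment_by: str = "day"
) -> Iterator[Tuple[str, List[datetime]]]:
    """Yield (label, items) pairs in chronological order."""
    buckets: Dict[str, List[datetime]] = {}
    for ts in sorted(end_times):
        # dicts preserve insertion order, so buckets stay chronological
        buckets.setdefault(bucket_label(ts, increment_by), []).append(ts)
    yield from buckets.items()
```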

fill_problem_with_fair_odds(force: bool = False) -> None

Some challenges do not have odds data, so fair (uniform) odds can be filled in for each problem. If force is True, odds are filled in without checking whether the problem already has odds data.
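A minimal sketch of this behaviour, assuming that "fair" odds means a value of 1 / n for each of the n options. The function name is hypothetical, and unlike the method above, which returns None, this sketch returns a value instead of mutating a problem.

```python
from typing import List, Optional


def fill_fair_odds(
    odds: Optional[List[float]], num_options: int, force: bool = False
) -> List[float]:
    """Keep existing odds unless force is set; otherwise use uniform odds."""
    if odds is not None and not force:
        return odds
    return [1.0 / num_options] * num_options
```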

class src.pm_rank.data.ChallengeLoader

Bases: abc.ABC

Abstract base class for loading forecast challenges from different data sources. This separates the loading logic from the data model.

abstract load_challenge() -> ForecastChallenge

Load and return a ForecastChallenge from the data source.

abstract get_challenge_metadata() -> Dict[str, Any]

Get metadata about the challenge without loading all data.
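The separation between loader and data model follows the standard abstract-base-class pattern. The sketch below mirrors the two abstract methods using stand-in classes so that it runs without the package installed. `LoaderSketch` and `InMemoryLoader` are hypothetical names, and a plain dict takes the place of ForecastChallenge.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class LoaderSketch(ABC):
    """Hypothetical stand-in with the same two abstract methods."""

    @abstractmethod
    def load_challenge(self) -> Dict[str, Any]:
        """Load and return the full challenge."""

    @abstractmethod
    def get_challenge_metadata(self) -> Dict[str, Any]:
        """Return lightweight metadata without loading everything."""


class InMemoryLoader(LoaderSketch):
    """Example concrete loader backed by data already held in memory."""

    def __init__(self, title: str, problems: List[Dict[str, Any]]) -> None:
        self.title = title
        self.problems = problems

    def load_challenge(self) -> Dict[str, Any]:
        return {"title": self.title, "forecast_problems": list(self.problems)}

    def get_challenge_metadata(self) -> Dict[str, Any]:
        return {"title": self.title, "num_problems": len(self.problems)}
```

A subclass that omits either method cannot be instantiated, which is what forces each data source to supply both operations.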

class src.pm_rank.data.GJOChallengeLoader(predictions_df: pandas.DataFrame | None = None, predictions_file: str | None = None, metadata_file: str | None = None, challenge_title: str = '')

Bases: src.pm_rank.data.base.ChallengeLoader

Load forecast challenges from GJO (Good Judgment Open) data format.

Initialize the GJOChallengeLoader. The challenge can be loaded either from a given pd.DataFrame or from the pair of file paths predictions_file and metadata_file.

Args:

predictions_df (pd.DataFrame): a pd.DataFrame containing the predictions. If provided, predictions_file and metadata_file will be ignored.
predictions_file (str): the path to the predictions file
metadata_file (str): the path to the metadata file
challenge_title (str): the title of the challenge

challenge_title = ''
logger
load_challenge(forecaster_filter: int = 0, problem_filter: int = 0) -> src.pm_rank.data.base.ForecastChallenge

Load challenge data from GJO format files.

Args:

forecaster_filter: minimum number of events for a forecaster to be included
problem_filter: minimum number of events for a problem to be included

Returns:

ForecastChallenge: a ForecastChallenge object containing the forecast problems and events
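The effect of the two minimum-count filters can be sketched on bare (username, problem_id) pairs. This is an illustration only: the function name is hypothetical, and the sketch assumes both counts are taken once on the unfiltered events, which may differ from how the loader actually applies them.

```python
from collections import Counter
from typing import List, Tuple


def filter_events(
    events: List[Tuple[str, str]],
    forecaster_filter: int = 0,
    problem_filter: int = 0,
) -> List[Tuple[str, str]]:
    """Keep events whose forecaster and problem each meet their minimum count."""
    per_user = Counter(user for user, _ in events)
    per_problem = Counter(problem for _, problem in events)
    return [
        (user, problem)
        for user, problem in events
        if per_user[user] >= forecaster_filter
        and per_problem[problem] >= problem_filter
    ]
```

With the default value of 0 for both filters, every event is kept.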

get_challenge_metadata() -> Dict[str, Any]

Get basic metadata about the GJO challenge.