src.pm_rank.data.base

Defines the prediction market data structures and the functions that load them from different types of data sources.

Attributes

SMOOTH_ODDS_EPS

Classes

ForecastEvent

Individual forecast from a user for a specific problem.

ForecastProblem

A prediction problem with multiple options and forecasts.

ForecastChallenge

A collection of forecast problems with validation and computed properties.

ChallengeLoader

Abstract base class for loading forecast challenges from different data sources.

Module Contents

src.pm_rank.data.base.SMOOTH_ODDS_EPS = 0.005
class src.pm_rank.data.base.ForecastEvent

Bases: pydantic.BaseModel

Individual forecast from a user for a specific problem.

problem_id: str
username: str
timestamp: datetime.datetime
probs: List[float]
unnormalized_probs: List[float] | None
validate_probabilities(v)

Validate that probabilities sum to 1 and are non-negative.

set_unnormalized_probs_default()

Set unnormalized_probs to probs if not provided.

validate_unnormalized_probabilities(v)

Validate the unnormalized probabilities. We only require that every number is in [0, 1] and that the vector dimension is the same as the number of options.
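The three validators above can be pictured with a small standard-library sketch. It is not the pydantic implementation in pm_rank: the helper name `check_forecast` and the tolerance `1e-6` are assumptions made for illustration, and the length check compares against `probs` because a single event does not carry the option list.

```python
import math
from typing import List, Optional, Tuple


def check_forecast(
    probs: List[float],
    unnormalized_probs: Optional[List[float]] = None,
    tol: float = 1e-6,  # assumed tolerance; the library's value is not documented here
) -> Tuple[List[float], List[float]]:
    """Hypothetical stand-in for the ForecastEvent validators."""
    # probs: non-negative and summing to 1 (within the assumed tolerance).
    if any(p < 0 for p in probs):
        raise ValueError("probs must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=tol):
        raise ValueError("probs must sum to 1")

    # Default: fall back to probs when no unnormalized vector is given.
    if unnormalized_probs is None:
        unnormalized_probs = list(probs)

    # unnormalized_probs: one entry per option, each in [0, 1], no sum constraint.
    if len(unnormalized_probs) != len(probs):
        raise ValueError("unnormalized_probs must have one entry per option")
    if any(not 0.0 <= p <= 1.0 for p in unnormalized_probs):
        raise ValueError("each unnormalized probability must be in [0, 1]")
    return probs, unnormalized_probs
```

Under these assumptions, `check_forecast([0.5, 0.5], [0.9, 0.8])` passes even though the second vector sums to 1.7, while `check_forecast([0.6, 0.6])` raises `ValueError`.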

class src.pm_rank.data.base.ForecastProblem

Bases: pydantic.BaseModel

A prediction problem with multiple options and forecasts.

title: str
problem_id: str
options: List[str]
correct_option_idx: List[int]
forecasts: List[ForecastEvent]
end_time: datetime.datetime
num_forecasters: int
url: str | None
odds: List[float] | None
category: str | None
validate_correct_option_idx(v, info)

Validate that every index in correct_option_idx refers to an entry in the options list.

validate_forecasts(v, info)

Validate that all forecasts have the correct number of probabilities.

validate_odds(v, info)

Validate that odds match the number of options if provided.

smooth_odds()

Smooth the odds so that none of them is too close to 0 or 1.
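One plausible reading of this smoothing is a clip of each odds value into [SMOOTH_ODDS_EPS, 1 - SMOOTH_ODDS_EPS]. The sketch below assumes exactly that. Whether the library clips, renormalizes afterwards, or uses a different scheme is not stated on this page, and the function name `clip_odds` is hypothetical.

```python
from typing import List

SMOOTH_ODDS_EPS = 0.005  # value documented for this module


def clip_odds(odds: List[float], eps: float = SMOOTH_ODDS_EPS) -> List[float]:
    """Assumed behavior: keep every odds value inside [eps, 1 - eps]."""
    return [min(max(o, eps), 1.0 - eps) for o in odds]
```

Keeping odds away from 0 matters for any quantity that divides by them, such as a payoff of 1 / odds.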

property has_odds: bool

Check if the problem has odds data.

property crowd_probs: List[float]

Calculate crowd probabilities from the forecasts.
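The aggregation rule behind this property is not spelled out here. As a minimal sketch, assuming the crowd probability of each option is the arithmetic mean of that option's probability over all forecasts (the name `mean_crowd_probs` is hypothetical):

```python
from typing import List


def mean_crowd_probs(forecast_probs: List[List[float]]) -> List[float]:
    """Assumed aggregation: per-option arithmetic mean over all forecasts."""
    if not forecast_probs:
        raise ValueError("need at least one forecast")
    n = len(forecast_probs)
    # zip(*rows) transposes the list so each tuple holds one option's values.
    return [sum(column) / n for column in zip(*forecast_probs)]
```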

property unique_forecasters: List[str]

Get list of unique forecasters for this problem.

property option_payoffs: List[Tuple[int, float]]

Obtain a sorted list of (option_idx, payoff) pairs for this problem. The payoff is 1 / odds for a correct option and 0 for an incorrect option.
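The payoff rule can be sketched in a few lines. The sort key is an assumption: the page says the list is sorted but not by what, so this sketch orders by payoff, highest first. The function takes plain lists instead of a ForecastProblem so that it runs on its own.

```python
from typing import List, Tuple


def option_payoffs(
    odds: List[float], correct_option_idx: List[int]
) -> List[Tuple[int, float]]:
    """Payoff is 1 / odds for a correct option and 0 otherwise."""
    correct = set(correct_option_idx)
    # Odds are expected to be strictly positive for correct options.
    payoffs = [
        (idx, (1.0 / o) if idx in correct else 0.0) for idx, o in enumerate(odds)
    ]
    # Assumed ordering: largest payoff first (ties keep option order).
    return sorted(payoffs, key=lambda pair: pair[1], reverse=True)
```

For odds `[0.25, 0.5, 0.25]` with option 1 correct, the correct option pays 1 / 0.5 = 2 and the other two pay 0.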

class src.pm_rank.data.base.ForecastChallenge

Bases: pydantic.BaseModel

A collection of forecast problems with validation and computed properties.

title: str
forecast_problems: List[ForecastProblem]
categories: List[str] | None
validate_problems(v)

Validate that there are problems and they have unique IDs.

validate_categories(v, info)

Validate that categories are a list of strings.

property forecaster_map: Dict[str, List[ForecastEvent]]

Map from forecaster username to their forecasts across all problems.
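A mapping like this can be built by grouping every forecast by its username. The sketch below uses a hypothetical `Event` tuple in place of ForecastEvent and a list of per-problem forecast lists in place of the challenge, so it illustrates the grouping only.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    """Hypothetical stand-in for ForecastEvent (only the fields used here)."""
    problem_id: str
    username: str


def build_forecaster_map(problems: List[List[Event]]) -> Dict[str, List[Event]]:
    """Group forecasts from all problems by the forecaster's username."""
    grouped: Dict[str, List[Event]] = defaultdict(list)
    for forecasts in problems:
        for event in forecasts:
            grouped[event.username].append(event)
    return dict(grouped)
```

The keys of the resulting dictionary are the unique forecasters, and its length is the number of unique forecasters.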

property num_forecasters: int

Total number of unique forecasters across all problems.

property unique_forecasters: List[str]

List of unique forecaster usernames.

property problem_option_payoffs: Dict[str, List[Tuple[int, float]]]

Map from problem_id to a sorted list of (option_idx, payoff) for this problem.

get_forecaster_problems(username: str) -> List[ForecastProblem]

Get all problems that a specific forecaster participated in.

get_problem_by_id(problem_id: str) -> ForecastProblem | None

Get a specific problem by its ID.

get_problems(nums: int = -1) -> List[ForecastProblem]

Get a list of problems. If nums is -1, return all problems.

stream_problems(order: Literal['sequential', 'random', 'time'] = 'sequential', increment: int = 100) -> Iterator[List[ForecastProblem]]

Stream the problems in the challenge in sequential order, in random order, or by problem end time.

Args:

order: The order in which to stream the problems.
increment: The number of problems to stream in each iteration.

Returns:

An iterator of lists of problems.
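The streaming behavior can be sketched as ordering followed by fixed-size batching. This is an illustration under assumptions: each batch holds only the next `increment` problems (the library might instead yield a growing prefix), the `seed` parameter is invented for reproducibility, and problems are represented by hypothetical `(problem_id, end_time)` tuples.

```python
import random
from datetime import datetime
from typing import Iterator, List, Sequence, Tuple

Problem = Tuple[str, datetime]  # hypothetical stand-in: (problem_id, end_time)


def stream_in_batches(
    problems: Sequence[Problem],
    order: str = "sequential",
    increment: int = 100,
    seed: int = 0,  # assumption: not part of the documented signature
) -> Iterator[List[Problem]]:
    """Yield the problems in batches of at most `increment` items."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    items = list(problems)
    if order == "random":
        random.Random(seed).shuffle(items)
    elif order == "time":
        items.sort(key=lambda p: p[1])  # earliest end_time first
    elif order != "sequential":
        raise ValueError(f"unknown order: {order!r}")
    for start in range(0, len(items), increment):
        yield items[start:start + increment]
```

With five problems and `increment=2`, the sketch yields batches of sizes 2, 2, and 1.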

stream_problems_over_time(increment_by: Literal['day', 'week', 'month'] = 'day', min_bucket_size: int = 1) -> Iterator[Tuple[str, List[ForecastProblem]]]

Stream all problems in chronological buckets.
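Chronological bucketing can be sketched by deriving a label from each problem's end time and grouping on it. The label formats below (`2024-01-31`, `2024-W05`, `2024-01`) are assumptions, problems are hypothetical `(problem_id, end_time)` tuples, and `min_bucket_size` is left out because its exact semantics are not described on this page.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterator, List, Sequence, Tuple

Problem = Tuple[str, datetime]  # hypothetical stand-in: (problem_id, end_time)


def bucket_label(ts: datetime, increment_by: str) -> str:
    """Return a sortable label for the day, ISO week, or month of `ts`."""
    if increment_by == "day":
        return ts.strftime("%Y-%m-%d")
    if increment_by == "week":
        iso = ts.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if increment_by == "month":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown increment_by: {increment_by!r}")


def stream_over_time(
    problems: Sequence[Problem], increment_by: str = "day"
) -> Iterator[Tuple[str, List[Problem]]]:
    """Yield (label, problems) pairs in chronological label order."""
    buckets: Dict[str, List[Problem]] = defaultdict(list)
    for problem in problems:
        buckets[bucket_label(problem[1], increment_by)].append(problem)
    # Zero-padded labels sort lexicographically in chronological order.
    for label in sorted(buckets):
        yield label, buckets[label]
```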

fill_problem_with_fair_odds(force: bool = False) -> None

Certain challenges do not have odds data, so we can fill in fair (uniform) odds for each problem. If force is True, we do not check whether a problem already has odds data.
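Fair odds for a problem with n options amount to 1 / n per option. The sketch below works on a single odds list instead of a whole challenge and returns the result instead of mutating in place. The function name `fair_odds` is hypothetical, and treating "not checked" as "overwritten" is an assumption about what force does.

```python
from typing import List, Optional


def fair_odds(
    num_options: int,
    existing: Optional[List[float]] = None,
    force: bool = False,
) -> List[float]:
    """Return uniform odds unless odds already exist and force is False."""
    if num_options <= 0:
        raise ValueError("num_options must be positive")
    if existing is not None and not force:
        return existing  # keep the odds data that is already there
    return [1.0 / num_options] * num_options
```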

class src.pm_rank.data.base.ChallengeLoader

Bases: abc.ABC

Abstract base class for loading forecast challenges from different data sources. This separates the loading logic from the data model.

abstract load_challenge() -> ForecastChallenge

Load and return a ForecastChallenge from the data source.

abstract get_challenge_metadata() -> Dict[str, Any]

Get metadata about the challenge without loading all data.
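The split between loader and data model can be illustrated with a standard-library sketch. `LoaderSketch` mirrors the two abstract methods listed above but is not the library class, and it returns a plain dictionary where the real loader returns a ForecastChallenge. `InMemoryLoader` and the metadata keys are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class LoaderSketch(ABC):
    """Stand-in that mirrors the two abstract methods of ChallengeLoader."""

    @abstractmethod
    def load_challenge(self) -> Dict[str, Any]:
        """Load and return the full challenge."""

    @abstractmethod
    def get_challenge_metadata(self) -> Dict[str, Any]:
        """Return cheap metadata without loading all data."""


class InMemoryLoader(LoaderSketch):
    """Hypothetical loader backed by data that is already in memory."""

    def __init__(self, title: str, problems: List[Dict[str, Any]]) -> None:
        self.title = title
        self.problems = problems

    def load_challenge(self) -> Dict[str, Any]:
        return {"title": self.title, "forecast_problems": list(self.problems)}

    def get_challenge_metadata(self) -> Dict[str, Any]:
        return {"title": self.title, "num_problems": len(self.problems)}
```

A subclass that omits either method cannot be instantiated, which is what makes the base class a contract for every data source.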