src.pm_rank.data.base¶
Defines the prediction market data structures and the functions to load them from different types of data sources.
Attributes¶
- SMOOTH_ODDS_EPS
Classes¶
- ForecastEvent: Individual forecast from a user for a specific problem.
- ForecastProblem: A prediction problem with multiple options and forecasts.
- ForecastChallenge: A collection of forecast problems with validation and computed properties.
- ChallengeLoader: Abstract base class for loading forecast challenges from different data sources.
Module Contents¶
- src.pm_rank.data.base.SMOOTH_ODDS_EPS = 0.005¶
- class src.pm_rank.data.base.ForecastEvent¶
Bases: pydantic.BaseModel
Individual forecast from a user for a specific problem.
- problem_id: str¶
- username: str¶
- timestamp: datetime.datetime¶
- probs: List[float]¶
- unnormalized_probs: List[float] | None¶
- validate_probabilities(v)¶
Validate that probabilities sum to 1 and are non-negative.
- set_unnormalized_probs_default()¶
Set unnormalized_probs to probs if not provided.
- validate_unnormalized_probabilities(v)¶
Validate the unnormalized probabilities. They are not required to sum to 1: we only require every number to be in [0, 1] and the vector dimension to equal the number of options.
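The three checks described for ForecastEvent can be sketched with plain functions. This is a minimal standard-library illustration, not the library's pydantic validators: the helper names, the `num_options` parameter, and the `TOLERANCE` value are assumptions made for this example.

```python
import math
from typing import List, Optional

# Assumed tolerance for the "sums to 1" check; the library's value is not documented here.
TOLERANCE = 1e-6


def check_probs(probs: List[float]) -> List[float]:
    """Reject vectors with negative entries or a sum that differs from 1."""
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=TOLERANCE):
        raise ValueError("probabilities must sum to 1")
    return probs


def check_unnormalized(values: List[float], num_options: int) -> List[float]:
    """Require one entry per option, each in [0, 1]; no constraint on the sum."""
    if len(values) != num_options:
        raise ValueError("expected one value per option")
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("each value must be in [0, 1]")
    return values


def default_unnormalized(
    probs: List[float], unnormalized: Optional[List[float]] = None
) -> List[float]:
    """Fall back to a copy of probs when no unnormalized vector is given."""
    return list(probs) if unnormalized is None else unnormalized
```

Under these assumptions, `[0.9, 0.7]` is accepted as an unnormalized vector for a two-option problem but rejected as `probs`, because it does not sum to 1.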
- class src.pm_rank.data.base.ForecastProblem¶
Bases: pydantic.BaseModel
A prediction problem with multiple options and forecasts.
- title: str¶
- problem_id: str¶
- options: List[str]¶
- correct_option_idx: List[int]¶
- forecasts: List[ForecastEvent]¶
- end_time: datetime.datetime¶
- num_forecasters: int¶
- url: str | None¶
- odds: List[float] | None¶
- category: str | None¶
- validate_correct_option_idx(v, info)¶
Validate that every index in correct_option_idx refers to an entry in the options list.
- validate_forecasts(v, info)¶
Validate that all forecasts have correct number of probabilities.
- validate_odds(v, info)¶
Validate that odds match the number of options if provided.
- smooth_odds()¶
Smooth the odds to not be too close to 0 or 1.
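One common way to keep odds away from 0 and 1 is to clamp each value into [eps, 1 - eps]. The sketch below assumes that rule and reuses the SMOOTH_ODDS_EPS value documented for this module; the actual smoothing formula used by smooth_odds() is not stated on this page, so treat the clamping as an assumption.

```python
from typing import List

SMOOTH_ODDS_EPS = 0.005  # value of the module constant documented on this page


def clamp_odds(odds: List[float], eps: float = SMOOTH_ODDS_EPS) -> List[float]:
    """Clamp each entry into [eps, 1 - eps] (assumed smoothing rule)."""
    return [min(max(o, eps), 1.0 - eps) for o in odds]
```

With this rule, an odds value of 0.0 becomes 0.005, and values already inside the interval are left unchanged.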
- property has_odds: bool¶
Check if the problem has odds data.
- property crowd_probs: List[float]¶
Calculate crowd probabilities from the forecasts.
- property unique_forecasters: List[str]¶
Get list of unique forecasters for this problem.
- property option_payoffs: List[Tuple[int, float]]¶
Obtain a sorted list of (option_idx, payoff) pairs for this problem. The payoff is 1 / odds for a correct option and 0 for an incorrect option.
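The payoff rule can be illustrated with a small standalone function. The function name and arguments here are hypothetical, and the sort key is an assumption: this page says the list is sorted but not by what, so the sketch orders by payoff, highest first.

```python
from typing import List, Tuple


def payoffs_from_odds(
    odds: List[float], correct_option_idx: List[int]
) -> List[Tuple[int, float]]:
    """Return (option_idx, payoff) pairs: 1 / odds if correct, else 0."""
    correct = set(correct_option_idx)
    # Assumes odds are strictly positive (for example, already smoothed).
    pairs = [(i, 1.0 / o if i in correct else 0.0) for i, o in enumerate(odds)]
    # Assumption: sort by payoff in descending order; ties keep index order.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

For odds `[0.25, 0.5, 0.25]` with option 1 correct, option 1 pays 1 / 0.5 = 2.0 and the other two options pay 0.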
- class src.pm_rank.data.base.ForecastChallenge¶
Bases: pydantic.BaseModel
A collection of forecast problems with validation and computed properties.
- title: str¶
- forecast_problems: List[ForecastProblem]¶
- categories: List[str] | None¶
- validate_problems(v)¶
Validate that there are problems and they have unique IDs.
- validate_categories(v, info)¶
Validate that categories are a list of strings.
- property forecaster_map: Dict[str, List[ForecastEvent]]¶
Map from forecaster username to their forecasts across all problems.
- property num_forecasters: int¶
Total number of unique forecasters across all problems.
- property unique_forecasters: List[str]¶
List of unique forecaster usernames.
- property problem_option_payoffs: Dict[str, List[Tuple[int, float]]]¶
Map from problem_id to a sorted list of (option_idx, payoff) for this problem.
- get_forecaster_problems(username: str) -> List[ForecastProblem]¶
Get all problems that a specific forecaster participated in.
- get_problem_by_id(problem_id: str) -> ForecastProblem | None¶
Get a specific problem by its ID.
- get_problems(nums: int = -1) -> List[ForecastProblem]¶
Get a list of problems. If nums is -1, return all problems.
- stream_problems(order: Literal['sequential', 'random', 'time'] = 'sequential', increment: int = 100) -> Iterator[List[ForecastProblem]]¶
Stream the problems in the challenge in sequential order, in random order, or by problem end time.
- Args:
order: The order in which to stream the problems.
increment: The number of problems to stream in each iteration.
- Returns:
An iterator of lists of problems.
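A batching generator with the same three ordering modes might look like the sketch below. It is a standalone illustration, not the method's implementation: the `Problem` dataclass is a hypothetical stand-in for ForecastProblem, the `seed` parameter is invented for reproducibility, and the choice to yield disjoint batches (rather than cumulative ones) is an assumption.

```python
import random
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, List, Literal, Optional


@dataclass
class Problem:
    """Hypothetical stand-in for ForecastProblem with only the fields used here."""
    problem_id: str
    end_time: datetime


def stream_in_batches(
    problems: List[Problem],
    order: Literal["sequential", "random", "time"] = "sequential",
    increment: int = 100,
    seed: Optional[int] = None,  # assumption: added only to make shuffling repeatable
) -> Iterator[List[Problem]]:
    """Yield disjoint batches of at most `increment` problems in the chosen order."""
    items = list(problems)  # copy so the caller's list is not reordered
    if order == "random":
        random.Random(seed).shuffle(items)
    elif order == "time":
        items.sort(key=lambda p: p.end_time)
    elif order != "sequential":
        raise ValueError(f"unknown order: {order!r}")
    for start in range(0, len(items), increment):
        yield items[start:start + increment]
```

With three problems and `increment=2`, this yields one batch of two problems followed by one batch of one.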
- stream_problems_over_time(increment_by: Literal['day', 'week', 'month'] = 'day', min_bucket_size: int = 1) -> Iterator[Tuple[str, List[ForecastProblem]]]¶
Stream all problems in chronological buckets.
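Chronological bucketing can be sketched by grouping problems under a calendar key derived from their end time. Several details below are assumptions, since this page does not specify them: the bucket label formats, the use of ISO weeks, and the handling of `min_bucket_size` (here, buckets smaller than the minimum are simply skipped). Problems are represented as hypothetical `(problem_id, end_time)` tuples.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterator, List, Tuple

ProblemStub = Tuple[str, datetime]  # hypothetical (problem_id, end_time) pair


def bucket_key(ts: datetime, increment_by: str) -> str:
    """Build a label for the day, ISO week, or month containing ts (assumed formats)."""
    if increment_by == "day":
        return ts.strftime("%Y-%m-%d")
    if increment_by == "week":
        iso = ts.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if increment_by == "month":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown increment: {increment_by!r}")


def stream_over_time(
    problems: List[ProblemStub], increment_by: str = "day", min_bucket_size: int = 1
) -> Iterator[Tuple[str, List[ProblemStub]]]:
    """Yield (label, problems) buckets in chronological order."""
    buckets: Dict[str, List[ProblemStub]] = defaultdict(list)
    # Sorting first means bucket keys are inserted in chronological order.
    for stub in sorted(problems, key=lambda s: s[1]):
        buckets[bucket_key(stub[1], increment_by)].append(stub)
    for label, members in buckets.items():
        # Assumption: buckets below the minimum size are dropped, not merged.
        if len(members) >= min_bucket_size:
            yield label, members
```

Another plausible reading of `min_bucket_size` is that undersized buckets are merged into a neighbouring bucket; the sketch does not attempt that.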
- fill_problem_with_fair_odds(force: bool = False) -> None¶
Certain challenges do not have odds data. For these, we can fill in fair/uniform odds for each problem. If force is True, we will not check whether the problem already has odds data.
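The filling step can be sketched as follows, reading "fair/uniform odds" as 1 / n for each of a problem's n options. That reading is an assumption, as is the representation of problems as plain dictionaries with `options` and `odds` keys; the function name is hypothetical.

```python
from typing import Any, Dict, List


def fill_fair_odds(problems: List[Dict[str, Any]], force: bool = False) -> None:
    """Assign uniform odds in place; keep existing odds unless force is True."""
    for problem in problems:
        if problem.get("odds") is not None and not force:
            continue  # existing odds are preserved when force is False
        n = len(problem["options"])
        problem["odds"] = [1.0 / n] * n
```

A four-option problem without odds would receive `[0.25, 0.25, 0.25, 0.25]`, while a problem that already has odds is left untouched unless `force=True`.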
- class src.pm_rank.data.base.ChallengeLoader¶
Bases: abc.ABC
Abstract base class for loading forecast challenges from different data sources. This separates the loading logic from the data model.
- abstract load_challenge() -> ForecastChallenge¶
Load and return a ForecastChallenge from the data source.
- abstract get_challenge_metadata() -> Dict[str, Any]¶
Get metadata about the challenge without loading all data.
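A concrete loader only needs to implement the two abstract methods. The sketch below re-declares a stand-in base class with the documented method names so that it runs without the package installed. `InMemoryLoader`, the dictionary it returns in place of a ForecastChallenge, and the metadata keys are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class LoaderBase(ABC):
    """Stand-in mirroring the documented ChallengeLoader interface."""

    @abstractmethod
    def load_challenge(self) -> Dict[str, Any]:
        """Load and return the challenge (a plain dict in this sketch)."""

    @abstractmethod
    def get_challenge_metadata(self) -> Dict[str, Any]:
        """Return metadata without loading all data."""


class InMemoryLoader(LoaderBase):
    """Hypothetical loader that serves problems already held in memory."""

    def __init__(self, title: str, problems: List[Dict[str, Any]]) -> None:
        self._title = title
        self._problems = problems

    def load_challenge(self) -> Dict[str, Any]:
        return {"title": self._title, "forecast_problems": list(self._problems)}

    def get_challenge_metadata(self) -> Dict[str, Any]:
        # Metadata keys are illustrative; the real loaders may report others.
        return {"title": self._title, "num_problems": len(self._problems)}
```

Because both methods are abstract, instantiating the base class directly raises `TypeError`; only subclasses that implement both can be created.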