src.pm_rank.data.base

Defines the prediction market data structures and the interfaces for loading them from different types of data sources.

Attributes

SMOOTH_ODDS_EPS

Classes

`ForecastEvent`	Individual forecast from a user for a specific problem.
`ProphetArenaForecastEvent`	Specialized forecast event for Prophet Arena.
`ForecastProblem`	A prediction problem with multiple options and forecasts.
`ForecastChallenge`	A collection of forecast problems with validation and computed properties.
`ChallengeLoader`	Abstract base class for loading forecast challenges from different data sources.

Module Contents

src.pm_rank.data.base.SMOOTH_ODDS_EPS = 0.005

class src.pm_rank.data.base.ForecastEvent

Bases: pydantic.BaseModel

Individual forecast from a user for a specific problem.

forecast_id: str

problem_id: str

username: str

timestamp: datetime.datetime

probs: List[float]

unnormalized_probs: List[float] | None

weight: float

odds: List[float] | None

no_odds: List[float] | None

validate_weight(v): Validate that weight is non-negative.

validate_probabilities(v): Validate that probabilities are non-negative and sum to 1.

set_unnormalized_probs_default(): Set unnormalized_probs to probs if not provided.

validate_unnormalized_probabilities(v): Validate that unnormalized probabilities are well formed. Every value only needs to lie in [0, 1] (the vector need not sum to 1), and the vector must have one entry per option.

validate_odds(v, info): Validate that odds, if provided, have the same number of entries as probs.
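The validator rules above can be sketched as plain functions, without pydantic, to make each check concrete. This is a minimal sketch, not the module's code: the function names are hypothetical, the sum-to-one tolerance `PROB_SUM_TOL` is an assumption (the page does not state one), and the "one entry per option" check is done against `len(probs)`.

```python
import math
from typing import List, Optional

# Assumed tolerance for the sum-to-one check; not documented for ForecastEvent.
PROB_SUM_TOL = 1e-6


def check_weight(weight: float) -> float:
    # Mirrors validate_weight: negative weights are rejected.
    if weight < 0:
        raise ValueError(f"weight must be non-negative, got {weight}")
    return weight


def check_probs(probs: List[float]) -> List[float]:
    # Mirrors validate_probabilities: non-negative entries that sum to 1.
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=PROB_SUM_TOL):
        raise ValueError(f"probabilities must sum to 1, got {sum(probs)}")
    return probs


def check_unnormalized(unnorm: Optional[List[float]], probs: List[float]) -> List[float]:
    # Mirrors set_unnormalized_probs_default + validate_unnormalized_probabilities:
    # fall back to probs, otherwise require one value in [0, 1] per option.
    if unnorm is None:
        return list(probs)
    if len(unnorm) != len(probs):
        raise ValueError("unnormalized_probs must have one entry per option")
    if any(not 0.0 <= p <= 1.0 for p in unnorm):
        raise ValueError("unnormalized_probs must lie in [0, 1]")
    return unnorm


def check_odds(odds: Optional[List[float]], probs: List[float]) -> Optional[List[float]]:
    # Mirrors validate_odds: optional, but must align with probs when given.
    if odds is not None and len(odds) != len(probs):
        raise ValueError("odds must have the same length as probs")
    return odds
```

For example, `check_probs([0.25, 0.75])` passes, while `check_probs([0.5, 0.6])` raises because the entries sum to 1.1.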

smooth_odds(): Smooth the odds so that they are not too close to 0 or 1.
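The page does not say how smooth_odds uses SMOOTH_ODDS_EPS. One plausible reading, sketched here as an assumption, is to clip each odds value into [SMOOTH_ODDS_EPS, 1 - SMOOTH_ODDS_EPS]; the real method might instead use a different rule or renormalize afterwards.

```python
from typing import List, Optional

SMOOTH_ODDS_EPS = 0.005  # value documented for src.pm_rank.data.base


def smooth_odds(odds: Optional[List[float]], eps: float = SMOOTH_ODDS_EPS) -> Optional[List[float]]:
    """Clip each odds value into [eps, 1 - eps] (assumed smoothing rule)."""
    if odds is None:
        # Odds are optional on ForecastEvent; nothing to smooth.
        return None
    return [min(max(o, eps), 1.0 - eps) for o in odds]
```

Under this assumption, an odds vector `[0.0, 0.5, 1.0]` becomes `[0.005, 0.5, 1 - 0.005]`, which keeps later computations (for example log-odds or payout ratios) finite.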

class src.pm_rank.data.base.ProphetArenaForecastEvent

Bases: ForecastEvent

Specialized forecast event for Prophet Arena.

submission_id: str

class src.pm_rank.data.base.ForecastProblem

Bases: pydantic.BaseModel

A prediction problem with multiple options and forecasts.

title: str

problem_id: str

options: List[str]

correct_option_idx: List[int]

forecasts: List[ForecastEvent]

end_time: datetime.datetime

num_forecasters: int

url: str | None

category: str | None

validate_correct_option_idx(v, info): Validate that every index in correct_option_idx refers to an entry in the options list.

validate_forecasts(v, info): Validate that all forecasts (1) have one probability per option and (2) have unique forecast_id values.
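The two ForecastProblem validators can be illustrated with a small stand-alone sketch. `Forecast` is a hypothetical stand-in for ForecastEvent carrying only the fields these checks read, and rejecting negative (Python-style) indices is an assumption about how "valid index" is interpreted.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Forecast:
    # Hypothetical stand-in for ForecastEvent, with only the fields checked here.
    forecast_id: str
    probs: List[float]


def check_correct_option_idx(idx: List[int], options: Sequence[str]) -> List[int]:
    # Every listed index must point at an existing option (negative indices rejected).
    for i in idx:
        if not 0 <= i < len(options):
            raise ValueError(f"index {i} out of range for {len(options)} options")
    return idx


def check_forecasts(forecasts: List[Forecast], options: Sequence[str]) -> List[Forecast]:
    seen = set()
    for f in forecasts:
        # (1) one probability per option
        if len(f.probs) != len(options):
            raise ValueError(
                f"{f.forecast_id}: expected {len(options)} probabilities, got {len(f.probs)}"
            )
        # (2) forecast_id must not repeat within the problem
        if f.forecast_id in seen:
            raise ValueError(f"duplicate forecast_id {f.forecast_id!r}")
        seen.add(f.forecast_id)
    return forecasts
```

Because correct_option_idx is a list, a problem with several correct options (for example `[0, 2]` over three options) passes the same check.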

property has_odds: bool: Check if the problem has odds data.

property has_no_odds: bool: Check if the problem has no_odds data.

property crowd_probs: List[float]: Calculate crowd probabilities from the forecasts.

property unique_forecasters: List[str]: Get the list of unique forecasters for this problem.
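The page does not specify how crowd_probs aggregates the forecasts. Since each ForecastEvent carries a weight, one natural choice, assumed in this sketch, is a weight-averaged mean per option. `Forecast` is a hypothetical stand-in, and the first-appearance ordering of unique_forecasters is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Forecast:
    # Hypothetical stand-in for ForecastEvent: only the fields this sketch needs.
    username: str
    probs: List[float]
    weight: float = 1.0


def crowd_probs(forecasts: List[Forecast]) -> List[float]:
    # Assumed rule: weight-averaged mean of each option's probability.
    total = sum(f.weight for f in forecasts)
    if total == 0:
        raise ValueError("need at least one forecast with positive weight")
    n = len(forecasts[0].probs)
    return [sum(f.weight * f.probs[k] for f in forecasts) / total for k in range(n)]


def unique_forecasters(forecasts: List[Forecast]) -> List[str]:
    # De-duplicate usernames, keeping first-appearance order (ordering is an assumption).
    return list(dict.fromkeys(f.username for f in forecasts))
```

With forecasts `[0.25, 0.75]` (weight 1) and `[0.75, 0.25]` (weight 3), this rule gives `[0.625, 0.375]`; a zero-weight forecast still counts toward unique_forecasters but not toward the average.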

class src.pm_rank.data.base.ForecastChallenge

Bases: pydantic.BaseModel

A collection of forecast problems with validation and computed properties.

title: str

forecast_problems: List[ForecastProblem]

categories: List[str] | None

validate_problems(v): Validate that there is at least one problem and that problem IDs are unique.

validate_categories(v, info): Validate that categories are a list of strings.

property forecaster_map: Dict[str, List[ForecastEvent]]: Map from forecaster username to their forecasts across all problems.

property num_forecasters: int: Total number of unique forecasters across all problems.

property unique_forecasters: List[str]: List of unique forecaster usernames.

get_forecaster_problems(username: str) → List[ForecastProblem]: Get all problems that a specific forecaster participated in.

get_problem_by_id(problem_id: str) → ForecastProblem | None: Get a specific problem by its ID.

get_problems(nums: int = -1) → List[ForecastProblem]: Get a list of problems. If nums is -1, return all problems.
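The challenge-level lookups can be sketched over simplified stand-ins for ForecastEvent and ForecastProblem (the `Forecast` and `Problem` dataclasses and the free functions are hypothetical). Under this sketch, num_forecasters would simply be `len(forecaster_map(problems))`. Treating `nums` as "the first `nums` problems" is an assumption; the page only defines the `-1` case.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Forecast:  # hypothetical stand-in for ForecastEvent
    username: str
    problem_id: str


@dataclass
class Problem:  # hypothetical stand-in for ForecastProblem
    problem_id: str
    forecasts: List[Forecast] = field(default_factory=list)


def forecaster_map(problems: List[Problem]) -> Dict[str, List[Forecast]]:
    # username -> every forecast that user made, across all problems
    out: Dict[str, List[Forecast]] = defaultdict(list)
    for p in problems:
        for f in p.forecasts:
            out[f.username].append(f)
    return dict(out)


def get_forecaster_problems(problems: List[Problem], username: str) -> List[Problem]:
    # Problems with at least one forecast by `username`.
    return [p for p in problems if any(f.username == username for f in p.forecasts)]


def get_problem_by_id(problems: List[Problem], problem_id: str) -> Optional[Problem]:
    # Linear scan; None when the ID is unknown.
    return next((p for p in problems if p.problem_id == problem_id), None)


def get_problems(problems: List[Problem], nums: int = -1) -> List[Problem]:
    # nums == -1 means "all problems"; otherwise (assumed) the first `nums` problems.
    return list(problems) if nums == -1 else problems[:nums]
```

A real implementation might cache forecaster_map or index problems by ID in a dict; the linear scans here only show the semantics.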

stream_problems(order: Literal['sequential', 'random', 'time'] = 'sequential', increment: int = 100) → Iterator[List[ForecastProblem]]

Stream the problems in the challenge in batches: in their stored order ('sequential'), in random order ('random'), or sorted by problem end time ('time').

Args:
    order: The order in which to stream the problems.
    increment: The number of problems to stream in each iteration.

Returns:
    An iterator of lists of problems.
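A minimal sketch of the batching behavior, assuming each iteration yields the next non-overlapping batch of at most `increment` problems (the page does not say whether batches are disjoint or cumulative). `Problem` is a hypothetical stand-in for ForecastProblem, and the `seed` parameter is a hypothetical addition for reproducible shuffling.

```python
import random
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, List, Literal, Optional


@dataclass
class Problem:  # hypothetical stand-in for ForecastProblem
    problem_id: str
    end_time: datetime


def stream_problems(
    problems: List[Problem],
    order: Literal["sequential", "random", "time"] = "sequential",
    increment: int = 100,
    seed: Optional[int] = None,  # hypothetical parameter, for reproducible shuffles
) -> Iterator[List[Problem]]:
    if increment <= 0:
        raise ValueError("increment must be positive")
    if order == "sequential":
        ordered = list(problems)  # keep the stored order
    elif order == "random":
        ordered = list(problems)
        random.Random(seed).shuffle(ordered)  # shuffle a copy, not the input
    elif order == "time":
        ordered = sorted(problems, key=lambda p: p.end_time)
    else:
        raise ValueError(f"unknown order: {order!r}")
    # Yield consecutive, non-overlapping batches of at most `increment` problems.
    for start in range(0, len(ordered), increment):
        yield ordered[start:start + increment]
```

With five problems and `increment=2`, this yields batches of sizes 2, 2 and 1; only the last batch can be short.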

stream_problems_over_time(increment_by: Literal['day', 'week', 'month'] = 'day', min_bucket_size: int = 1) → Iterator[Tuple[str, List[ForecastProblem]]]: Stream all problems in chronological buckets.
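Chronological bucketing can be sketched by grouping problems on a label derived from end_time. Several details are assumptions rather than documented behavior: the label formats (`2024-03-05`, ISO week `2024-W10`, `2024-03`), and the handling of min_bucket_size, which here simply skips undersized buckets (the library may instead merge them into a neighboring bucket). `Problem` is a hypothetical stand-in for ForecastProblem.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterator, List, Literal, Tuple


@dataclass
class Problem:  # hypothetical stand-in for ForecastProblem
    problem_id: str
    end_time: datetime


def bucket_key(ts: datetime, increment_by: str) -> str:
    # Assumed label formats: 2024-03-05 (day), 2024-W10 (ISO week), 2024-03 (month).
    if increment_by == "day":
        return ts.strftime("%Y-%m-%d")
    if increment_by == "week":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    if increment_by == "month":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown increment_by: {increment_by!r}")


def stream_problems_over_time(
    problems: List[Problem],
    increment_by: Literal["day", "week", "month"] = "day",
    min_bucket_size: int = 1,
) -> Iterator[Tuple[str, List[Problem]]]:
    buckets: Dict[str, List[Problem]] = defaultdict(list)
    for p in sorted(problems, key=lambda p: p.end_time):
        buckets[bucket_key(p.end_time, increment_by)].append(p)
    # Zero-padded labels sort lexicographically in chronological order.
    for key in sorted(buckets):
        if len(buckets[key]) >= min_bucket_size:  # assumed: small buckets are skipped
            yield key, buckets[key]
```

Each yielded pair is `(label, problems_in_bucket)`, and problems inside a bucket stay sorted by end_time.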

class src.pm_rank.data.base.ChallengeLoader

Bases: abc.ABC

Abstract base class for loading forecast challenges from different data sources. This separates the loading logic from the data model.

abstract load_challenge() → ForecastChallenge: Load and return a ForecastChallenge from the data source.

abstract get_challenge_metadata() → Dict[str, Any]: Get metadata about the challenge without loading all data.
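A concrete loader only needs to implement the two abstract methods. The sketch below is hypothetical throughout: `JSONChallengeLoader`, its JSON layout (`title` and `problems` keys), and the simplified `ForecastChallenge` dataclass are illustrations, not part of pm_rank. The ABC is redeclared locally so the example runs on its own.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ForecastChallenge:
    # Simplified stand-in for the pydantic model documented above.
    title: str
    forecast_problems: List[Dict[str, Any]] = field(default_factory=list)


class ChallengeLoader(ABC):
    # Mirrors the two abstract methods documented for ChallengeLoader.
    @abstractmethod
    def load_challenge(self) -> ForecastChallenge: ...

    @abstractmethod
    def get_challenge_metadata(self) -> Dict[str, Any]: ...


class JSONChallengeLoader(ChallengeLoader):
    """Hypothetical loader for a JSON document with 'title' and 'problems' keys."""

    def __init__(self, raw_json: str) -> None:
        self._raw = raw_json

    def get_challenge_metadata(self) -> Dict[str, Any]:
        # This toy version parses the whole string; a file-backed loader would
        # typically read only a header or index to avoid loading all data.
        data = json.loads(self._raw)
        return {"title": data["title"], "num_problems": len(data["problems"])}

    def load_challenge(self) -> ForecastChallenge:
        data = json.loads(self._raw)
        return ForecastChallenge(title=data["title"], forecast_problems=data["problems"])
```

Keeping parsing inside the loader means the data model never needs to know where its data came from, which is the separation the ChallengeLoader docstring describes.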