PM-RANK Documentation
Welcome to the documentation for PM_RANK: an analysis toolkit for prediction markets.
We originally developed the pm_rank package to support our Prophet Arena platform.

But since pm_rank provides a unified, hierarchical interface for defining a prediction market and its events, we expect it to be useful to a broader audience, especially those who want to integrate handy scoring/ranking algorithms into their own projects.
Below we provide a quick overview of the core concepts (e.g. models, dataclass interfaces) in pm_rank. Please refer to:
- src.pm_rank for detailed API documentation.
- Colab Demo for a quick demo that uses the package to load prediction market data and obtain some interesting insights.
- How We Score & Rank LLMs in Prediction Markets for our detailed blogpost walking through the reasoning behind our ranking module design.
Quick Installation Guide
pm_rank is a Python package that can be installed via pip (requires Python 3.8 or higher):
pip install -U pm_rank
The default version uses minimal dependencies (i.e. no pytorch), so some ranking models (e.g. IRT) are not available. To install the full version, install the full extra:
pip install -U pm_rank[full]
Note
For potential developers: if you want to contribute to the documentation, you can install the docs extra:
pip install -U pm_rank[docs]
Then you can build the documentation by running:
cd docs && make html
Core Concepts
We first use a flowchart to illustrate the pipeline of using pm_rank:
[flowchart: the pm_rank pipeline, from data loading to scoring/ranking]
In a nutshell, at the beginning of this pipeline, users can bring their own dataset (e.g. from human prediction market platforms) and implement their own subclass of ChallengeLoader (see src.pm_rank.data.loaders), as in the sketch below.
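Here is a minimal sketch of such a loader. The class name MyMarketLoader, the import paths, and the helper _to_problem are assumptions for illustration only; the actual abstract interface and data-model constructors are documented in src.pm_rank.data.

# A hypothetical custom loader -- a sketch, not the library's actual API.
import json

from pm_rank.data.loaders import ChallengeLoader  # assumed import path
from pm_rank.data.base import ForecastChallenge, ForecastProblem  # assumed import path

class MyMarketLoader(ChallengeLoader):
    """Loads prediction market data from a local JSON export."""

    def __init__(self, path: str):
        self.path = path

    def load_challenge(self) -> ForecastChallenge:
        with open(self.path) as f:
            raw = json.load(f)
        # Map each raw record to a ForecastProblem (with its ForecastEvents).
        problems = [self._to_problem(record) for record in raw]
        return ForecastChallenge(problems=problems)

    def _to_problem(self, record: dict) -> ForecastProblem:
        # Hypothetical helper: build one ForecastProblem from a raw record.
        ...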
Once this is done (i.e. the loader implements the standard .load_challenge() method), the downstream steps are data-source independent. We introduce the core concepts here:
Note
Please refer to src.pm_rank.data.base for the actual data model implementation. We give a high-level and non-comprehensive overview in a bottom-up manner.
ForecastEvent: this is the most atomic unit of prediction market data. It represents a single prediction made by a forecaster for a single forecast problem.
Key Fields in ForecastEvent (an example construction follows the list):
- problem_id: a unique identifier for the problem
- username: a unique identifier for the forecaster
- timestamp: the timestamp of the prediction. Note that this is not optional, as we might want to stream the predictions in time. However, if the original data does not contain this information, we use the current time as a placeholder.
- probs: the probability distribution over the options, given by the forecaster
- unnormalized_probs: the unnormalized probability distribution over the options, given by the forecaster
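For concreteness, constructing a single event might look like the following sketch. The import path and the concrete field values are assumptions; see src.pm_rank.data.base for the actual data model.

from datetime import datetime, timezone

from pm_rank.data.base import ForecastEvent  # assumed import path

event = ForecastEvent(
    problem_id="example-problem-1",        # hypothetical identifiers
    username="forecaster_42",
    timestamp=datetime.now(timezone.utc),  # placeholder when the source has no timestamp
    probs=[0.7, 0.2, 0.1],                 # normalized over the problem's options
    unnormalized_probs=[7.0, 2.0, 1.0],    # raw values before normalization
)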
ForecastProblem: this is a collection of ForecastEvent objects for a single forecast problem. It validates and keeps track of metadata for the problem, such as the options and the correct option. It is also a handy way to organize the dataset, as we treat ForecastProblem as the basic unit of streaming prediction market data. In particular, if a ForecastProblem has the odds field, we can answer questions like “how much money can an individual forecaster make” and use these results to rank the forecasters (a toy calculation follows the field list). See src.pm_rank.model.average_return for more details.
Key Fields in ForecastProblem:
- title: the title of the problem
- problem_id: the id of the problem
- options: the options for the problem
- correct_option_idx: the index of the correct option
- forecasts: the forecasts for the problem
- num_forecasters: the number of forecasters
- url: the URL of the problem
- odds (optional): the market odds for each option
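To make the odds-based ranking idea concrete, here is a toy calculation of one forecaster's realized return on a single problem. This is illustrative only, not the actual src.pm_rank.model.average_return implementation; the decimal-odds convention and the bet-on-favorite strategy are assumptions.

def realized_return(probs, odds, correct_option_idx, stake=1.0):
    """Bet the whole stake on the forecaster's most likely option.

    With decimal odds, a win pays stake * odds[pick]; a loss forfeits the stake.
    """
    pick = max(range(len(probs)), key=lambda i: probs[i])
    if pick == correct_option_idx:
        return stake * odds[pick] - stake
    return -stake

# A forecaster who favors option 0 at decimal odds 1.8, where option 0 resolves correct:
print(realized_return([0.7, 0.3], [1.8, 2.2], correct_option_idx=0))  # 0.8 profit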
ForecastChallenge: this is a collection of ForecastProblem objects. It implements two core functionalities for all scoring/ranking methods to use (see the usage sketch below):
- get_problems -> List[ForecastProblem]: returns all the problems in the challenge. Suitable for the full-analysis setting.
- stream_problems -> Iterator[List[ForecastProblem]]: returns the problems in the challenge in a streaming setting. This setting simulates the real-world scenario where the predictions enter gradually. The scoring/ranking methods can also leverage this function to efficiently calculate the metrics at different time points (batches).
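Putting the pieces together, downstream usage might look like the following sketch. MyMarketLoader carries over from the hypothetical loader above; get_problems and stream_problems are the two methods just described.

# Data-source-independent downstream usage (a sketch).
challenge = MyMarketLoader("markets.json").load_challenge()  # hypothetical loader

# Full-analysis setting: fetch every problem at once.
all_problems = challenge.get_problems()

# Streaming setting: problems arrive batch by batch, as in a live market.
for batch in challenge.stream_problems():
    for problem in batch:
        ...  # update scores/rankings incrementally here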