PM-RANK Documentation
Welcome to the documentation for PM_RANK: an analysis toolkit for prediction markets.
We originally developed the pm_rank package to support our Prophet Arena platform.

But since pm_rank provides a unified, hierarchical interface for defining a prediction market and its events, we expect it to be useful to a broader audience, especially those who want to integrate handy scoring/ranking algorithms into their own projects.
Below we provide a quick overview of the core concepts (e.g. models, dataclass interfaces) in pm_rank. Please refer to:
- src.pm_rank for detailed API documentation.
- Colab Demo for a quick demo that uses the package to load prediction market data and obtain some interesting insights.
- How We Score & Rank LLMs in Prediction Markets for our detailed blogpost walking through the reasoning behind our ranking module design.
Quick Installation Guide
pm_rank is a Python package that can be installed via pip (requires Python 3.8 or higher):
pip install -U pm_rank
The default version uses minimal dependencies (i.e. no pytorch), so some ranking models (e.g. IRT) are not available. To install the full version, install the full extra:
pip install -U pm_rank[full]
Note
For potential developers: if you want to contribute to the documentation, you can install the docs extra:
pip install -U pm_rank[docs]
Then you can build the documentation by running:
cd docs && make html
Core Concepts
We first use a flowchart to illustrate the pipeline of using pm_rank:
[flowchart: the pm_rank pipeline, from data loading to scoring/ranking]
In a nutshell, at the beginning of this pipeline, users can bring their own dataset (e.g. from human prediction market platforms) and implement their own subclass of ChallengeLoader (see src.pm_rank.data.loaders), as in the sketch below.
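Here is a minimal sketch of such a loader. The class name MyMarketLoader, the import paths, and the helper _to_problem are assumptions for illustration only; the actual abstract interface and data-model constructors are documented in src.pm_rank.data.

# A hypothetical custom loader -- a sketch, not the library's actual API.
import json

from pm_rank.data.loaders import ChallengeLoader  # assumed import path
from pm_rank.data.base import ForecastChallenge, ForecastProblem  # assumed import path

class MyMarketLoader(ChallengeLoader):
    """Loads prediction market data from a local JSON export."""

    def __init__(self, path: str):
        self.path = path

    def load_challenge(self) -> ForecastChallenge:
        with open(self.path) as f:
            raw = json.load(f)
        # Map each raw record to a ForecastProblem (with its ForecastEvents).
        problems = [self._to_problem(record) for record in raw]
        return ForecastChallenge(problems=problems)

    def _to_problem(self, record: dict) -> ForecastProblem:
        # Hypothetical helper: build one ForecastProblem from a raw record.
        ...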
Once this is done (i.e. the loader implements the standard .load_challenge() method), the downstream steps are data-source independent. We introduce the core concepts here:
Note
Please refer to src.pm_rank.data.base for the actual data model implementation. We give a high-level and non-comprehensive overview in a bottom-up manner.
ForecastEvent: this is the most atomic unit of prediction market data. It represents a single prediction made by a forecaster for a single forecast problem.
Key Fields in ForecastEvent (an example construction follows the list):
- problem_id: a unique identifier for the problem
- username: a unique identifier for the forecaster
- timestamp: the timestamp of the prediction. Note that this is not optional, as we might want to stream the predictions in time. However, if the original data does not contain this information, we use the current time as a placeholder.
- probs: the probability distribution over the options, given by the forecaster
- unnormalized_probs: the unnormalized probability distribution over the options, given by the forecaster
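For concreteness, constructing a single event might look like the following sketch. The import path and the concrete field values are assumptions; see src.pm_rank.data.base for the actual data model.

from datetime import datetime, timezone

from pm_rank.data.base import ForecastEvent  # assumed import path

event = ForecastEvent(
    problem_id="example-problem-1",        # hypothetical identifiers
    username="forecaster_42",
    timestamp=datetime.now(timezone.utc),  # placeholder when the source has no timestamp
    probs=[0.7, 0.2, 0.1],                 # normalized over the problem's options
    unnormalized_probs=[7.0, 2.0, 1.0],    # raw values before normalization
)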
ForecastProblem: this is a collection of ForecastEvent objects for a single forecast problem. It validates and keeps track of metadata for the problem, such as the options and the correct option. It is also a handy way to organize the dataset, as we treat ForecastProblem as the basic unit of streaming prediction market data. In particular, if a ForecastProblem has the odds field, we can answer questions like “how much money can an individual forecaster make” and use these results to rank the forecasters (a toy calculation follows the field list). See src.pm_rank.model.average_return for more details.
Key Fields in ForecastProblem:
- title: the title of the problem
- problem_id: the id of the problem
- options: the options for the problem
- correct_option_idx: the index of the correct option
- forecasts: the forecasts for the problem
- num_forecasters: the number of forecasters
- url: the URL of the problem
- odds (optional): the market odds for each option
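To make the odds-based ranking idea concrete, here is a toy calculation of one forecaster's realized return on a single problem. This is illustrative only, not the actual src.pm_rank.model.average_return implementation; the decimal-odds convention and the bet-on-favorite strategy are assumptions.

def realized_return(probs, odds, correct_option_idx, stake=1.0):
    """Bet the whole stake on the forecaster's most likely option.

    With decimal odds, a win pays stake * odds[pick]; a loss forfeits the stake.
    """
    pick = max(range(len(probs)), key=lambda i: probs[i])
    if pick == correct_option_idx:
        return stake * odds[pick] - stake
    return -stake

# A forecaster who favors option 0 at decimal odds 1.8, where option 0 resolves correct:
print(realized_return([0.7, 0.3], [1.8, 2.2], correct_option_idx=0))  # 0.8 profit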
ForecastChallenge: this is a collection of ForecastProblem objects. It implements two core functionalities for all scoring/ranking methods to use (see the usage sketch below):
- get_problems -> List[ForecastProblem]: returns all the problems in the challenge. Suitable for the full-analysis setting.
- stream_problems -> Iterator[List[ForecastProblem]]: returns the problems in the challenge in a streaming setting. This setting simulates the real-world scenario where the predictions enter gradually. The scoring/ranking methods can also leverage this function to efficiently calculate the metrics at different time points (batches).
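Putting the pieces together, downstream usage might look like the following sketch. MyMarketLoader carries over from the hypothetical loader above; get_problems and stream_problems are the two methods just described.

# Data-source-independent downstream usage (a sketch).
challenge = MyMarketLoader("markets.json").load_challenge()  # hypothetical loader

# Full-analysis setting: fetch every problem at once.
all_problems = challenge.get_problems()

# Streaming setting: problems arrive batch by batch, as in a live market.
for batch in challenge.stream_problems():
    for problem in batch:
        ...  # update scores/rankings incrementally here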