Probabilistic Inference Evaluation, UAI'08
Call for Participation
Update 5/26: Example problem instances are now available
(Download this document as PDF)
Scope
The evaluation will include both Bayesian and Markov networks and consider three inference tasks:
PR: probability of evidence (partition function),
MPE: most probable explanation (energy minimization), and
MAR: node marginals.
The evaluation will consider both exact and approximate algorithms. Hence, each solver will have to declare the task it solves and whether it is exact or approximate (six different types of solvers). A formal definition of the three tasks is given at the end of this document. A team can submit multiple solvers per task as long as the solvers are based on different algorithmic principles.
Evaluation Format
Our goal is to evaluate efficiency and approximation accuracy. Each solver is given problem instances consisting of:
a Bayesian or Markov network M;
a piece of evidence e.
A solver is expected to provide a solution (an exact solution or an approximation) for each problem instance, given constraints on time and space resources. A solver will also be given a set of time checkpoints ti , for i = 1,..., n. At each time point, the solver will be polled for solutions. If a solver does not produce a solution by the last checkpoint tn , it would be considered as having failed on the problem instance. Producing solutions at other checkpoints ti , i < n, is optional and is meant for evaluating solvers that can generate improved approximations over time.
At completion, the organizers will generate efficiency/accuracy curves which will form a basis of evaluation.
Execution Environment
The evaluations will be run on an Intel machine running Linux with up to 4 GB memory made available (some memory will be reserved for evaluation scripts, etc.). We request that solvers be 32-bit executables, using only a single core (i.e., no parallel processes/threads).
If a solver does not stop gracefully when approaching or exceeding the given time and space limits, the organizers shall kill the offending process and accept the final solution appearing in the output.
Input Format
Each solver will receive as input a model, some evidence, and a seed:
./solver input-model input-evidence seed
We ask randomized algorithms to use an input seed, so that two runs of a solver, given the same input, will yield the same output in a comparable amount of time. Deterministic algorithms may ignore the seed.
The input model will be a Bayesian or a Markov network specified in a simple text format:
FileFormat (now with example instances)
The evidence (observations) will be specified by the accompanying file format (see link above).
A Markov network is simply a set of factors, while a Bayesian network is a set of factors with some additional properties (see the formal definitions at the end of this document). Solvers need to only handle Markov networks. However, they may optionally take advantage of the extra properties of a Bayesian network.
A solver will also have available the following environment variables:
UAI_TIME: total runtime limit (in seconds, e.g., "600");
UAI_MEMORY: total memory limit (in gigabytes, e.g., "2.5");
UAI_CHECKPOINTS: an array of time checkpoints ti when solutions are polled (in seconds, e.g., "30 60 120 240 480").
Solver exit status will be recorded, treating 0 as normal and 1 as an error. Processes exceeding time or memory limits will be killed (via a SIGKILL signal), and the final output solution will be taken as the solver solution.
Output Format
Solver output to stdout and stderr will be logged, for debugging purposes. Output to stdout will be further parsed for solutions.
Solvers will specify a solution on a line, which is composed of a prefix followed by the solution.
Probability of Evidence, PR: Line prefixed by "z " followed by the value of the log10 of the probability of evidence (partition function). For example, an approximation log10 Pr(e) = −0.2008 may have a solution line:
z -0.2008
Most probable explanation, MPE: Line prefixed by "s " followed by:
log10 of the solution value,
the number n of model variables, and
the MPE instantiation, a list of value indices for all n variables.
s -0.2008 3 0 1 0
Marginals, MAR: Line prefixed by "m " followed by:
the number n of model variables, and
a list of marginal approximations for all n variables. Each marginal approximation is specified also by a list, starting with the number of variable values, followed by the approximation Pr(x|e) for each value x.
m 3 2 0.5 0.5 2 0.2 0.8 2 1.0 0.0
Along with the solution lines, which are prefixed by either "z ", "s ", or "m ", we request that each solver output its own measurement of the wall-clock time required to compute a solution, ignoring the time to load and parse the input file: (total time)-(load+parse time). In particular, each solver should output a a line prefixed by "t " followed by the time in seconds. For example, a solver taking a time of 2008s and 7ms would output:
t 2008.007
Note that the organizers will also measure total wall-clock time. No other output should appear on a solution line, and only solution lines should start with prefixes "z ", "s ", "m ", and "t ".
For each time checkpoint ti , our scripts will evaluate the last available output line, taking it as the solver solution for checkpoint ti . Again, a solver is not required to produce a solution for each checkpoint. Solvers may also ignore checkpoints by printing solutions periodically (at reasonable intervals). However, if a solver does not produce a solution by the last checkpoint, it would be considered as having failed on the instance.
Task Definitions
Notation
Every inference task is defined with respect to a set of discrete variables X = X1 ,..., Xn . In what follows, we will use capital letters (X) to denote variables, and small letters (x) to denote their values. We will also use boldface capital letters (X) to denotes sets of variables, and boldface small letters (x) to denote instantiations {X1 = x1 ,..., Xn = xn }. We will say that two variable instantiations y and z are compatible, written y ∼ z , if they agree on the value of every common variable in Y
Z . A factor fi (Xi), where Xi
X , is a function that maps each instantiation xi into a non-negative number, denoted fi (xi ).
Bayesian networks
Let G be a directed acyclic graph (DAG) over variables X = X1 ,..., Xn and let Ui be the parents of variable Xi in the DAG G . A Bayesian network over DAG G is a set of factors f1 (X1 U1) ,..., fn (Xn Un ) (one factor for each variable Xi ). Each factor fi (Xi Ui) satisfies a normalization condition, where for each instantiation ui of parents Ui , we have
.
A Bayesian network defines a joint factor
,
which is a product of the Bayesian network factors. An evidence e is an instantiation of some network variables E
X.
The following inference tasks are defined for a Bayesian network with joint factor f(X) and evidence e:
PR (probability of evidence): compute the probability
.
MPE (most probable explanation): compute the MPE instantiation
and the corresponding MPE probability
. MAR (node marginals): for each variable Xi and each of its values xi , compute the marginal probabilities:
.
Markov networks
A Markov network over variables X = X1 ,..., Xn is a set of m factors f1(X1) ,..., fm(Xm) , where Xa
X for a = 1 ,..., m. Similar to a Bayesian network, a Markov network defines a joint factor:
.
Given a joint factor and some evidence, the PR, MPE, and MAR problems are defined similarly for Markov networks, although the values for PR and MPE may not strictly be probabilities.
