Results for PR Solvers: Approximate Track
The following solvers participated in the Approximate Probability of Evidence evaluation:
- Sample-Search (UC Irvine)
- Accepts all network types
- VEC (UC Irvine)
- Accepts all network types
- The same solver was also evaluated in the Exact PR task.
- ED-BP (UCLA)
- Accepts all network types
- tlsbp (UPF/Radboud University)
- Accepts only networks with binary variables
As noted earlier, some benchmark sets were reduced (or eliminated altogether) due to time constraints in running all solvers.
Solvers were evaluated based on different measures:
- Cumulative Score vs Cumulative Time: compares the quality of the given approximations and the time it takes to compute them
- Instances vs Cumulative Error: illustrates the quality of approximations given over a benchmark
These measures are described in more detail below.
Cumulative Score vs Cumulative Time
These plots generalize the "Instances Solved vs Cumulative Time" plots used for the exact probability of evidence (PR) task. In this approximate PR task, each solver is given a score from zero to one, based on the accuracy of the given solution. If a solver finds the exact solution, it is credited with a score of one; if a solver outputs no solution, it is credited with a score of zero.
Each solver is given, per problem instance, a score 10-error where error can range from zero to infinity, depending on the measure of error used. Let
and
denote the exact and approximate probability of evidence, respectively. The error used to compute the score is the relative error of the log probability of evidence:
Example:
Let us assume that we have five problems in the evaluation and the time-bound is 60s. Let us assume that the scores and times for Solver "A" are: * Scores: [1,1,0.99,0.99,0.8] * Times: [10,10,60,60,60] Note that the maximum time a solver could run is 60s. The plot for cumulative score vs cumulative time for Solver "A" would have the following points. [(1,10),(2,20),(2.99,80),(3.98,140),(4.78,200)].
Plots are given below, in tabular form. Each row corresponds to a set of benchmarks, and each column corresponds to increasing bounds on time. For a given solver and time bound, instances are first sorted by decreasing score, and cumulative score is plotted on the x-axis against the cumulative time for those instances. Note that variance in time occurs because solvers which are both "anytime and complete" may terminate with an exact answer (score of 1) earlier than other solvers which do not terminate at all. Going from left-to-right on the x-axis, a point corresponds to the cumulative time incurred for the corresponding cumulative score of the solver's most accurately solved instances. Roughly, the further right a solver reaches (higher score), the better it is considered to perform.
Instances vs Cumulative Error
These plots illustrate the cumulative error incurred by solvers, over all instances in which the solver provided a solution. For a given solver and error measure, instances are first sorted by increasing error, and cumulative error is plotted against number of instances. Going from left-to-right on the x-axis, a point corresponds to the cumulative error incurred for the corresponding number of the solver's easiest instances. Note that plots use a logarithmic y-axis, so instances with zero error (on the left) and instances with infinite error (on the right) are not plotted. When a solver solves all instances exactly, it is denoted with a dot on the x-axis.
Plots for scores using different error measures are given below, in tabular form. Each row corresponds to a set of benchmarks, and each column corresponds to increasing limits on time. Roughly, the closer to the x-axis a solver is (more accurate), the better it is considered to perform.
