Results for MAR Solvers: Approximate Track
The following solvers participated in the Approximate Marginals evaluation:
- IJGP (UC Irvine)
- Accepts all network types
- Sample-Search (UC Irvine)
- Accepts all network types
- EPIS (UPitt)
- Accepts only Bayesian networks
- ED-BP (UCLA)
- Accepts all network types
- tlsbp (UPF/Radboud University)
- Accepts only networks with binary variables
As noted earlier, some benchmark sets were reduced (or eliminated altogether) due to time constraints in running all solvers.
Solvers were evaluated based on different measures:
- Cumulative Score vs Cumulative Time: compares the quality of the given approximations and the time it takes to compute them
- Instances vs Cumulative Error: illustrates the quality of approximations given over a benchmark
These measures are described in more detail below.
Cumulative Score vs Cumulative Time
These plots generalize the "Instances Solved vs Cumulative Time" plots used for the exact marginals task. In this approximate marginal task, each solver is given a score from zero to one, based on the accuracy of the given solution. If a solver finds the exact solution, it is credited with a score of one; if a solver outputs no solution, it is credited with a score of zero. For a given solver and time bound, instances are first sorted by decreasing score, and cumulative score is plotted on the x-axis against the cumulative time for those instances. Note that variance in time occurs because solvers which are both "anytime and complete" may terminate with an exact answer (score of 1) earlier than other solvers which do not terminate at all. Going from left-to-right on the x-axis, a point corresponds to the cumulative time incurred for the corresponding cumulative score of the solver's most accurately solved instances. Roughly, the further right a solver reaches (higher score), the better it is considered to perform.
Each solver is given, per problem instance, a score 10-error where error can range from zero to infinity, depending on the measure of error used. We considered 5 different ways to compute the error of a single variable's marginal, and considered 2 different ways to accumulate marginal errors over all variables. Let
and
denote the exact and approximate marginals, respectively, for a network variable
. The five measures of marginal errors are:
- mean squared error
- KL-divergence
- Hellinger's distance
- absolute error
- relative error
We computed the error for a given instance in two ways:
- average marginal error, over all variables
- sum marginal error, over all variables
Note that mean squared error, absolute error and Hellinger's distance yield errors in [0,1], and often yield average marginal errors close to 1. Thus, sum marginal error will tend to differentiate solvers more, but it makes large networks more difficult to score highly on.
Example:
Let us assume that we have five problems in the evaluation and the time-bound is 60s. Let us assume that the scores and times for Solver "A" are: * Scores: [1,1,0.99,0.99,0.8] * Times: [10,10,60,60,60] Note that the maximum time a solver could run is 60s. The plot for cumulative score vs cumulative time for Solver "A" would have the following points. [(1,10),(2,20),(2.99,80),(3.98,140),(4.78,200)].
The results using average relative error are given for all networks, only Bayesian networks and only binary networks. (Note that EPIS is Bayesian only and tlsbp is binary only).
|
|
|
For another example, here are the results using sum absolute error.
|
|
|
Plots for scores using different error measures are given below, in tabular form. Each row corresponds to a set of benchmarks, and each column corresponds to increasing limits on time. Roughly, the further right a solver reaches (higher score), and closer to the x-axis a solver is (faster), the better it is considered to perform.
Instances vs Cumulative Error
These plots illustrate the cumulative error incurred by solvers, over all instances in which the solver provided a solution. For a given solver and error measure, instances are first sorted by increasing error, and cumulative error is plotted against number of instances. Going from left-to-right on the x-axis, a point corresponds to the cumulative error incurred for the corresponding number of the solver's easiest instances. Note that plots use a logarithmic y-axis, so instances with zero error (on the left) and instances with infinite error (on the right) are not plotted. When a solver solves all instances exactly, it is denoted with a dot on the x-axis.
For example, the results using (cumulative) average relative error are given for all networks, only Bayesian networks and only binary networks. (Note again that EPIS is Bayesian only and tlsbp is binary only).
|
|
|
For another example, here are the results using (cumulative) sum absolute error.
|
|
|
Plots for scores using different error measures are given below, in tabular form. Each row corresponds to a set of benchmarks, and each column corresponds to increasing limits on time. Roughly, the closer to the x-axis a solver is (more accurate), the better it is considered to perform.
