Results for MAR Solvers: Approximate Track

The following solvers participated in the Approximate Marginals evaluation:

As noted earlier, some benchmark sets were reduced (or eliminated altogether) due to time constraints in running all solvers.

Solvers were evaluated based on different measures:

These measures are described in more detail below.


Cumulative Score vs Cumulative Time

These plots generalize the "Instances Solved vs Cumulative Time" plots used for the exact marginals task. In this approximate marginal task, each solver is given a score from zero to one, based on the accuracy of the given solution. If a solver finds the exact solution, it is credited with a score of one; if a solver outputs no solution, it is credited with a score of zero. For a given solver and time bound, instances are first sorted by decreasing score, and cumulative score is plotted on the x-axis against the cumulative time for those instances. Note that variance in time occurs because solvers which are both "anytime and complete" may terminate with an exact answer (score of 1) earlier than other solvers which do not terminate at all. Going from left-to-right on the x-axis, a point corresponds to the cumulative time incurred for the corresponding cumulative score of the solver's most accurately solved instances. Roughly, the further right a solver reaches (higher score), the better it is considered to perform.

Each solver is given, per problem instance, a score 10-error where error can range from zero to infinity, depending on the measure of error used. We considered 5 different ways to compute the error of a single variable's marginal, and considered 2 different ways to accumulate marginal errors over all variables. Let

latex error! exitcode was 2 (signal 0), transscript follows:

and
latex error! exitcode was 2 (signal 0), transscript follows:

denote the exact and approximate marginals, respectively, for a network variable
latex error! exitcode was 2 (signal 0), transscript follows:

. The five measures of marginal errors are:

We computed the error for a given instance in two ways:

Note that mean squared error, absolute error and Hellinger's distance yield errors in [0,1], and often yield average marginal errors close to 1. Thus, sum marginal error will tend to differentiate solvers more, but it makes large networks more difficult to score highly on.

Example:

Let us assume that we have five problems in the evaluation and the time-bound is 60s. Let us assume that the scores and times for Solver "A" are:

* Scores: [1,1,0.99,0.99,0.8]
* Times: [10,10,60,60,60]

Note that the maximum time a solver could run is 60s.

The plot for cumulative score vs cumulative time for Solver "A" would have the following points.

[(1,10),(2,20),(2.99,80),(3.98,140),(4.78,200)].

The results using average relative error are given for all networks, only Bayesian networks and only binary networks. (Note that EPIS is Bayesian only and tlsbp is binary only).

avg-rel-score-all.png

avg-rel-score-bayes.png

avg-rel-score-bin.png

For another example, here are the results using sum absolute error.

sum-abs-score-all.png

sum-abs-score-bayes.png

sum-abs-score-bin.png

Plots for scores using different error measures are given below, in tabular form. Each row corresponds to a set of benchmarks, and each column corresponds to increasing limits on time. Roughly, the further right a solver reaches (higher score), and closer to the x-axis a solver is (faster), the better it is considered to perform.


Instances vs Cumulative Error

These plots illustrate the cumulative error incurred by solvers, over all instances in which the solver provided a solution. For a given solver and error measure, instances are first sorted by increasing error, and cumulative error is plotted against number of instances. Going from left-to-right on the x-axis, a point corresponds to the cumulative error incurred for the corresponding number of the solver's easiest instances. Note that plots use a logarithmic y-axis, so instances with zero error (on the left) and instances with infinite error (on the right) are not plotted. When a solver solves all instances exactly, it is denoted with a dot on the x-axis.

For example, the results using (cumulative) average relative error are given for all networks, only Bayesian networks and only binary networks. (Note again that EPIS is Bayesian only and tlsbp is binary only).

avg-rel-error-all.png

avg-rel-error-bayes.png

avg-rel-error-bin.png

For another example, here are the results using (cumulative) sum absolute error.

sum-abs-error-all.png

sum-abs-error-bayes.png

sum-abs-error-bin.png

Plots for scores using different error measures are given below, in tabular form. Each row corresponds to a set of benchmarks, and each column corresponds to increasing limits on time. Roughly, the closer to the x-axis a solver is (more accurate), the better it is considered to perform.


Evaluation/Report/ApproxMAR (last edited 2008-09-20 19:53:41 by vibhav)