# UAI'08 Workshop: Evaluating and Disseminating Probabilistic Reasoning Systems

**Date: July 9, 2008 - Co-located with UAI'08**

**Update: the results of the Probabilistic Inference Evaluation, presented at UAI'08, are now available here.**

The workshop will provide a forum for discussing issues arising in empirical evaluation and dissemination of probabilistic reasoning algorithms. It will also provide a framework for a *probabilistic reasoning evaluation* which will take place a month before the workshop and whose results will be part of the workshop discussion.

## MOTIVATION

Over the past two decades, a variety of exact and approximate algorithms have been developed across several communities (e.g., UAI, NIPS, SAT/CSP) for answering optimization and likelihood queries over probabilistic graphical models. Since all these tasks are NP-hard, theoretical guarantees are rare and empirical evaluation becomes a central tool for assessing algorithms. Yet empirical comparison between algorithms requires agreement on representations, benchmarks, and evaluation criteria, which is challenging, especially in the context of approximation algorithms.

Some communities have already addressed similar challenges through yearly empirical evaluations and competitions (e.g., SAT, CSP, and planning), which proved effective, leading to algorithmic advances and to software development and dissemination. We believe that such an effort could benefit probabilistic inference algorithms as well. Probabilistic reasoning presents additional challenges, however, as it tends to be harder, requires heterogeneous knowledge representation frameworks, and must deal with the issue of evaluating approximate inference algorithms.

## GOALS

Our goal is to establish some standards for evaluating probabilistic reasoning systems based on both exact and approximate algorithms that take the following issues into account:

- The set of benchmarks used in driving evaluations.
- The criterion used for evaluating the performance of inference algorithms. For approximate algorithms, this could include general-purpose performance measures that are based on a tradeoff between inference time and accuracy, or task-specific performance measures that are based on the final solutions enabled by inference algorithms.

On the dissemination side, the goal is to reinforce a tradition of building and sharing probabilistic reasoning systems that allows easy access to state-of-the-art inference algorithms by members of the broader scientific and engineering communities. This dissemination is meant to achieve a number of objectives:

- Increase the utilization of probabilistic inference algorithms in real-world applications by reducing the investment needed for building applications based on probabilistic reasoning.
- Allow newer members of the inference community to quickly capitalize on the expertise of more senior members of the community by providing broader access to existing code.
- Foster an environment where reported empirical results are accompanied by the very systems used to obtain them.

## FORMAT

The workshop will consist of paper and poster presentations, invited talks, panels, and system demonstrations. An *inference evaluation* will take place in the month preceding the workshop, with the results presented and discussed during the workshop.

## CALL FOR PAPERS

We welcome abstracts describing contributions, as well as position papers, which will be reviewed and selected for either plenary or poster presentation. Subjects of interest include (but are not limited to):

- Evaluation criteria for probabilistic reasoning algorithms, whether domain specific or domain independent, especially on problems for which exact inference is not feasible.
- Trading off accuracy against computational resources in real-world applications.
- Descriptions of challenging benchmarks, whether real-world or synthetic, and their role in driving empirical evaluations.
- Representations of graphical models (and factors) that are commonplace in certain domains (e.g., speech recognition).
- System descriptions and demonstrations.

Abstract submissions should not exceed 10 pages and must be in PDF format (plain text is acceptable for short abstracts).

## CALL FOR PARTICIPATION IN THE EVALUATION

We encourage participation in the probabilistic inference evaluation, which will include both Bayesian and Markov networks and consider three inference tasks: probability of evidence (partition function), most probable explanation (also called MPE, MAP, or energy minimization), and node marginals. The evaluation will consider both exact and approximate algorithms, especially anytime algorithms that improve their approximations with time. Details of the evaluation can be found at:
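As a rough illustration of the three inference tasks named above (and not a description of the evaluation's actual file format, benchmarks, or scoring), the following sketch computes each of them by brute-force enumeration on a toy Markov network. The network, the variable names, and all function names are assumptions made up for this example; enumeration is only feasible for very small models.

```python
from itertools import product

# Hypothetical toy Markov network over three binary variables A, B, C.
# Each factor is (scope, table); the table maps an assignment of the
# scope to a non-negative weight. These numbers are invented.
VARIABLES = ["A", "B", "C"]
FACTORS = [
    (("A", "B"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}),
    (("B", "C"), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 4.0, (1, 1): 1.0}),
]


def weight(assignment):
    """Unnormalized weight of a full assignment: product of all factors."""
    w = 1.0
    for scope, table in FACTORS:
        w *= table[tuple(assignment[v] for v in scope)]
    return w


def assignments(evidence=None):
    """Yield every full assignment consistent with the evidence."""
    evidence = evidence or {}
    free = [v for v in VARIABLES if v not in evidence]
    for values in product((0, 1), repeat=len(free)):
        full = dict(evidence)
        full.update(zip(free, values))
        yield full


def partition_function(evidence=None):
    """Sum of weights over all assignments consistent with the evidence."""
    return sum(weight(a) for a in assignments(evidence))


def probability_of_evidence(evidence):
    """P(evidence) as a ratio of restricted to unrestricted sums."""
    return partition_function(evidence) / partition_function()


def most_probable_explanation(evidence=None):
    """Highest-weight full assignment consistent with the evidence."""
    return max(assignments(evidence), key=weight)


def node_marginals(evidence=None):
    """Posterior marginal [P(v=0), P(v=1)] for every variable."""
    z = partition_function(evidence)
    result = {v: [0.0, 0.0] for v in VARIABLES}
    for a in assignments(evidence):
        w = weight(a)
        for v in VARIABLES:
            result[v][a[v]] += w / z
    return result
```

For this toy table, `partition_function()` evaluates to 29.0 and `most_probable_explanation()` returns the assignment A=1, B=1, C=0. Exact enumeration grows exponentially with the number of variables, which is why approximate and anytime algorithms matter for realistic benchmarks.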

## CALL FOR BENCHMARKS

We encourage the submission of benchmarks in the form of either Bayesian or Markov networks. The preferred file format is described at:

Other formats may be acceptable, but the evaluation will assume the format above.

## DATES

Submissions should be emailed to pc-chairs@ics.uci.edu by the following deadlines:

- *Abstracts.* Submission deadline is May 9. Notification to authors will be sent out on May 19.
- *Evaluation.* Interest in participating should be declared by May 9. Submission deadline for software systems is May 30.
- *Benchmarks.* Submission deadline is May 9.

## ORGANIZERS

Fahiem Bacchus: http://www.cs.toronto.edu/~fbacchus/

Jeff Bilmes: http://ssli.ee.washington.edu/people/bilmes/

(organizer) Adnan Darwiche: http://www.cs.ucla.edu/~darwiche/

(organizer) Rina Dechter: http://www.ics.uci.edu/~dechter/

Hector Geffner: http://www.tecn.upf.es/~hgeffner/

Alexander Ihler: http://www.ics.uci.edu/~ihler/

Joris Mooij: http://www.jorismooij.nl/

Kevin Murphy: http://www.cs.ubc.ca/~murphyk/