Transcript of: UAI 2006 report from the 1st Evaluation of Probabilistic Inference (July 14, 2006)

Page 1:

UAI 2006
Report from the 1st Evaluation of Probabilistic Inference
July 14th, 2006

[Title-slide figure: graphical-model illustration with node sets B,C,D,F; F,D,G; D,F]

Page 2:

What is this presentation about?

• Goal: The purpose of this evaluation is to compare the performance of a variety of different software systems on a single set of Bayesian network (BN) problems.

• By creating a friendly evaluation (as is often done in other communities such as SAT, and also in speech recognition and machine translation with their DARPA evaluations), we hope to foster new research in fast inference methods for performing a variety of queries in graphical models.

• Over the past few months, the first such evaluation took place at UAI.

• This presentation summarizes the outcome of this evaluation.

Page 3:

Who we are

• Evaluators
  – Jeff Bilmes, University of Washington, Seattle
  – Rina Dechter, University of California, Irvine
• Graduate Student Assistance
  – Chris Bartels, University of Washington, Seattle
  – Radu Marinescu, University of California, Irvine
  – Karim Filali, University of Washington, Seattle
• Advisory Council
  – Dan Geiger, Technion - Israel Institute of Technology
  – Fahiem Bacchus, University of Toronto
  – Kevin Murphy, University of British Columbia

Page 4:

Outline

• Background, goals
• Scope (rationale)
• Final chosen queries
• The UAI 2006 BN benchmark evaluation corpus
• Scoring strategies
• Participants and team members
• Results for PE and MPE
• Team presentations
  – team 1 (UCLA), team 2 (IET), team 3 (UBC), team 4 (U. Pitt/DSL), team 5 (UCI)
• Conclusion/Open discussion

Page 5:

Acknowledgements: Graduate Student Help

Radu Marinescu, University of California, Irvine
Chris Bartels, University of Washington
Karim Filali, University of Washington

Also, thanks to another U. Washington student, Mukund Narasimhan (now at MSR).

Page 6:

Background

• Early 2005: Rina Dechter & Dan Geiger decide there should be some form of UAI inference evaluation (like in the SAT community) and discuss the idea (by email) with Adnan Darwiche, Fahiem Bacchus, Hector Geffner, Nir Friedman, and Thomas Richardson.

• I (Jeff Bilmes) took on the task of running it this first time.
  – Precedent: speech recognition and the DARPA evaluations, which evaluate ASR systems using error rate as a metric.

Page 7:

Scope

• Many "queries" could be evaluated, including:
  – MAP: maximum a posteriori hypothesis
  – MPE: most probable explanation (also called the Viterbi assignment)
  – PE: probability of evidence
  – N-best: compute the N best of the above
• Many algorithmic variants:
  – Exact inference
  – Enforced limited time bounds and/or space bounds
  – Approximate inference, and tradeoffs between time/space/accuracy
• Classes of models:
  – Static BNs with a generic description (a list of CPTs)
  – More complex description languages (e.g., context-specific independence)
  – Static models vs. dynamic models (e.g., Dynamic Bayesian Networks, and DGMs) vs. relational models

Page 8:

Decisions for this first evaluation.

• Emphasis: keep things simple.
• Focus on exact inference:
  – Exact inference can still be useful.
  – "Exact inference is NP-complete, so we perform approximate inference" is often seen in the literature.
  – With smart algorithms, and for fixed (but real-world) problem sizes, exact inference is quite doable and can be better for applications.
• Focus on a small number of queries:
  – Original plan: PE, MPE, and MAP for both static and dynamic models.
  – From the final list of participants, this was narrowed down to: PE and MPE on static Bayesian networks.

Page 9:

Query: Probability of Evidence (PE)
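For reference (the standard definition): given a BN over variables $X = \{X_1,\dots,X_n\}$ with CPTs $P(X_i \mid X_{\mathrm{pa}(i)})$ and an evidence assignment $E = e$,

$$P(e) \;=\; \sum_{x \sim e} \; \prod_{i=1}^{n} P\bigl(x_i \mid x_{\mathrm{pa}(i)}\bigr),$$

where the sum ranges over all full configurations $x$ consistent with $e$.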

Page 10:

Query: Most Probable Explanation (MPE)
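For reference (again the standard definition): the most probable explanation is the most likely full completion of the evidence,

$$\mathrm{MPE}(e) \;=\; \operatorname*{arg\,max}_{x \sim e} \; \prod_{i=1}^{n} P\bigl(x_i \mid x_{\mathrm{pa}(i)}\bigr),$$

i.e., the same expression as in $P(e)$ with the summation replaced by maximization.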

Page 11:

The UAI06 BN Evaluation Corpus

• J=78 BNs used for PE, and J=57 BNs used for MPE queries. The BNs were not exactly the same.

• The BNs were the following (more details will appear on the web page):
  – random mutations of the burglar alarm graph
  – diagnosis network (Druzdzel)
  – DBNs from speech recognition that were unrolled a fixed amount
  – variations on the Water DBN
  – various forms of grids
  – variations on the ISCAS 85 electrical circuits
  – variations on the ISCAS 89 electrical circuits
  – various genetic linkage graphs (Geiger)
  – BNs from a computer-based patient care system (CPCS)
  – various randomly generated graphs (F. Cozman's algorithm)
  – various known-treewidth random k-trees, with determinism (k=24)
  – various known-treewidth random positive k-trees (k=24)
  – various linear block coding graphs

• While some of these have been seen before, BNs were “anonymized” before being distributed.

• BNs distributed in xbif format (basically XML)

Page 12:

Timing Platform and Limits

• Timing machines: dual-CPU 3.8 GHz Pentium Xeons with 8 GB of RAM each, with hyper-threading turned on.
• Single-threaded performance only in this evaluation.
• Each team had 4 days of dedicated machine usage to complete their timings (other than this, there was no upper time bound).
• No one asked for more time than these 4 days. After timing the BNs, teams could use the rest of the 4 days as they wished for further tuning. After final numbers were sent to me, no further adjustment of the timing numbers took place (say, based on seeing others' results).
• Each timing number was the result of running a query 10 times and reporting the fastest (lowest) time, as in the sketch below.
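A minimal sketch of that timing protocol (a hypothetical harness, not the evaluation's actual scripts):

```python
import time

def best_of_10(run_query):
    """Run a query 10 times and report the fastest (lowest) wall time,
    matching the evaluation's best-of-10 protocol."""
    times = []
    for _ in range(10):
        start = time.perf_counter()
        run_query()                       # the inference call being timed
        times.append(time.perf_counter() - start)
    return min(times)
```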

Page 13:

The Teams

• Thanks to every member of every team: each of you was crucial to making this a successful event!

Page 14:

Team 1: UCLA

• David Allen (now at HRL Labs, CA)
• Mark Chavira (graduate student)
• Adnan Darwiche
• Keith Cascio
• Arthur Choi (graduate student)
• Jinbo Huang (now at NICTA, Australia)

Page 15:

Team 2: IET

From right to left in photo:
• Masami Takikawa
• Hans Dettmar
• Francis Fung
• Rick Kissh

Other team members:
• Stephen Cannon
• Chad Bisk
• Brandon Goldfedder

Other key contributors:
• Bruce D'Ambrosio
• Kathy Laskey
• Ed Wright
• Suzanne Mahoney
• Charles Twardy
• Tod Levitt

Page 16:

Team 3: UBC

Jacek Kisynski, University of British Columbia
Michael Chiang, University of British Columbia
David Poole, University of British Columbia

Page 17:

Team 4: U. Pittsburgh, DSL

Tomasz Sowinski, University of Pittsburgh, DSL
Marek J. Druzdzel, University of Pittsburgh, DSL

Page 18:

UAI Software Competition

Team 4: Decision Systems Laboratory

Page 19:

Team 5: UCI

Robert Mateescu, University of California, Irvine
Rina Dechter, University of California, Irvine
Radu Marinescu, University of California, Irvine

Page 20:

The Results

Page 21:

Definition of "Correct Answer"

Page 22:

Definition of “FAIL”

• Each team had 4 days to complete the evaluation.
• No time limit was placed on any particular BN.
• A "FAILED" score meant that either the system failed to complete the query, or the system underflowed its own numeric precision.
  – Some of the networks were designed not to fit within IEEE 64-bit double precision, so either scaling or log-arithmetic needed to be used (which is a speed hit for PE); see the sketch below.
• Teams had the option to submit multiple separate submissions; none did.
• Systems were allowed to "back off" from no-scaling to, say, a log-arithmetic mode (but that was included in the time charge).
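To illustrate the underflow point, a minimal sketch (not any team's code) of the log-arithmetic fallback: keep log-probabilities, so products become sums, and combine alternatives with a numerically stable log-sum-exp.

```python
import math

def log_sum_exp(log_terms):
    """log(sum_i exp(l_i)), computed stably by factoring out the max term."""
    m = max(log_terms)
    if m == float("-inf"):                # all terms are zero probability
        return m
    return m + math.log(sum(math.exp(l - m) for l in log_terms))

# A product of many small probabilities underflows IEEE 64-bit doubles...
print(1e-200 * 1e-150 * 1e-30)            # 0.0: underflow
# ...but its log is perfectly representable.
log_p = sum(math.log(p) for p in [1e-200, 1e-150, 1e-30])
print(log_p)                              # about -874.98
# Summing two branch probabilities held in log space:
print(log_sum_exp([log_p, math.log(1e-30)]))
```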

Page 23:

Definition of Average Speedup
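Judging from the axis labels on the following charts ("speedup of fastest over team j", "lower is better"), a plausible reading (an assumption, not the slide's verbatim definition) is the per-category average

$$S_j \;=\; \frac{1}{|\mathcal{B}|} \sum_{b \in \mathcal{B}} \frac{t_j(b)}{\min_k t_k(b)},$$

where $\mathcal{B}$ is the set of BNs in the category and $t_j(b)$ is team $j$'s time on network $b$. Under this reading $S_j \ge 1$, and $S_j = 1$ means team $j$ was the fastest system on every network in the category.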

Page 24:

Results – PE and MPE

78 PE BNs, and 57 MPE BNs.

1. Failure rates – the number of networks for which each team failed to produce a score, or produced an incorrect score (including underflow).
2. Speedup results on categorized BNs:
   – A BN is categorized by how many teams failed to produce a score (so 0-5 for PE, or 0-3 for MPE).
   – Average speedup of the best performance over a particular team's performance in each category.
3. Rank scores – the number of times a particular team was rank n, for various n.
4. Workload scores – in what workload region (if any) a particular team is optimal.

Page 25:

PE Results

Page 26:

PE Failure Rate Results (low is best)

Failure rate (percent):
Team 1: 0
Team 2: 10.26
Team 3: 19.23
Team 4: 14.1
Team 5: 19.23

Reminder: 78 BNs total for PE.

Page 27:

Avg. Speedups: BNs that ran on all 5 systems

Average speedup of the fastest system over team j:
Team 1: 85.59
Team 2: 52.44
Team 3: 77.33
Team 4: 1.02
Team 5: 17.15

61 out of 78 BNs ran on all systems.

Remember: lower is better!

Page 28:

Avg. Speedups: BNs that ran on all systems

[Chart: per-team speedup statistics (log10) for Teams 1-5: average, std, min, max]

Page 29:

Avg. Speedups: BNs that ran on (3-4)/5 systems

[Chart: average speedup of the fastest system over each team; surviving bar values: 4.93, 12.95, 1.851, 18.42]

8 out of 78 BNs ran on only 3 or 4 systems.

Page 30:

Avg. Speedups: BNs that ran on (3-4)/5 systems

[Chart: per-team speedup statistics for Teams 1-5: average, std, min, max]

Page 31:

Avg. Speedups: BNs that ran on (1-2)/5 systems

Average speedup of the fastest system over team j:
Team 1: 1
Team 2: 11.28

9 out of 78 BNs ran on only 1 or 2 systems. Only 2 teams (Teams 1 and 2) had systems that could run the BNs in this category; these include the genetic linkage BNs.

Page 32:

Rank Proportions (how often was each team a particular rank; rank 1 is best)

[Chart: rank proportions for Teams 1-5, categories Rank 1, Rank 2, Rank 3, Rank 4, Rank 5, Fail]

Page 33:

Another look at PE. Each row: Teams 1-5 wall time (s) | Teams 1-5 inference time (s).

Alarm graph:
BN_0: 0.476 1.680 0.400 0.010 0.020 | 0.17700 0.07000 0.17000 0.00065 0.02000
BN_1: 0.548 2.130 0.520 0.010 0.040 | 0.25500 0.07700 0.29000 0.00133 0.03000
BN_2: 0.507 2.160 0.500 0.010 0.010 | 0.17500 0.06800 0.22000 0.00065 0.01000
BN_3: 0.484 3.020 0.490 0.010 0.020 | 0.19900 0.09500 0.21000 0.00144 0.02000
BN_4: 0.564 2.220 0.690 0.010 0.020 | 0.26700 0.10600 0.36000 0.00097 0.02000
BN_5: 0.623 2.490 0.670 0.010 0.060 | 0.26400 0.12500 0.32000 0.00217 0.05000
BN_6: 0.785 2.970 0.820 0.010 0.070 | 0.36400 0.16000 0.41000 0.00241 0.06000
BN_7: 0.781 3.000 0.810 0.010 0.140 | 0.44300 0.11100 0.53000 0.00392 0.13000
BN_8: 0.675 1.780 0.540 0.010 0.080 | 0.36600 0.09000 0.28000 0.00278 0.08000
BN_9: 0.876 3.200 0.530 0.010 0.170 | 0.56800 0.10000 0.27000 0.00568 0.17000
BN_10: 0.669 1.610 0.540 0.010 0.060 | 0.37400 0.09100 0.29000 0.00200 0.06000
BN_11: 0.916 2.070 0.690 0.010 0.220 | 0.55700 0.14600 0.37000 0.00808 0.21000
BN_12: 0.636 2.540 0.780 0.010 0.200 | 0.30100 0.11000 0.51000 0.00438 0.20000
BN_13: 0.628 2.240 0.790 0.020 0.120 | 0.28800 0.11700 0.47000 0.01415 0.11000
BN_14: 0.979 3.580 1.780 0.100 1.790 | 0.64700 0.19800 1.45000 0.09400 1.79000
BN_15: 0.908 2.370 0.920 0.060 1.270 | 0.55400 0.16300 0.57000 0.05521 1.26000

Diagnosis:
BN_16: 1.581 4.430 1.110 0.070 0.170 | 0.43600 0.32300 0.16000 0.00225 0.02000
BN_18: 1.583 3.590 1.050 0.070 0.170 | 0.45800 0.29600 0.12000 0.00229 0.02000

Speech recognition:
BN_20: 66.103 28.300 56.910 FAIL 467.440 | 60.07600 20.53900 46.81000 FAIL 463.24000
BN_22: 8.070 11.110 11.710 1.960 13.640 | 2.72100 5.40400 4.07000 0.72258 10.44000
BN_24: 7.030 8.290 9.680 1.420 7.340 | 2.44000 2.36000 3.50000 0.34183 4.50000
BN_26: 52.795 16.630 19.850 FAIL 123.120 | 47.56100 8.35000 11.84000 FAIL 119.23000

Water DBN:
BN_28: 3.723 3.620 3.630 0.720 1.620 | 0.22400 0.09400 0.41000 0.02208 0.21000

Grids:
BN_30: 2.744 4.630 FAIL 0.420 FAIL | 1.86700 2.21700 FAIL 0.39248 FAIL
BN_32: 3.192 6.350 FAIL 0.520 FAIL | 2.25000 3.99400 FAIL 0.47857 FAIL
BN_34: 3.158 30.190 FAIL 0.490 FAIL | 2.21000 26.72500 FAIL 0.45130 FAIL
BN_36: 3.124 5.470 FAIL 0.100 FAIL | 2.14500 3.47500 FAIL 0.39153 FAIL
BN_38: 3.042 6.850 FAIL 0.590 FAIL | 2.06600 4.29300 FAIL 0.55135 FAIL
BN_40: 3.074 5.810 FAIL 0.320 FAIL | 2.07200 3.36700 FAIL 0.28734 FAIL

Page 34:

Each row: Teams 1-5 wall time (s) | Teams 1-5 inference time (s).

ISCAS 85:
BN_42: 1.296 2.960 0.950 0.020 0.130 | 0.59800 0.48400 0.34000 0.00738 0.11000
BN_43: 1.212 3.040 0.990 0.030 0.100 | 0.53700 0.48600 0.41000 0.00707 0.08000
BN_44: 1.608 2.890 1.060 0.130 0.300 | 0.91200 0.52700 0.46000 0.10763 0.27000
BN_45: 0.981 3.030 0.680 0.020 0.050 | 0.34400 0.32700 0.14000 0.00217 0.03000
BN_46: 1.192 2.880 1.210 0.160 1.370 | 0.63800 0.43600 0.71000 0.15363 1.36000

ISCAS 89:
BN_47: 1.373 3.480 2.170 0.190 0.750 | 0.70000 0.55200 1.52000 0.17385 0.74000
BN_49: 1.325 3.410 9.120 0.420 0.650 | 0.64200 0.52800 8.36000 0.40277 0.63000
BN_51: 1.056 2.960 1.240 0.030 0.200 | 0.44100 0.50400 0.68000 0.01871 0.18000
BN_53: 0.834 2.740 0.630 0.010 0.040 | 0.21000 0.26000 0.16000 0.00104 0.02000
BN_55: 1.161 2.180 1.450 0.140 0.720 | 0.55700 0.47200 0.88000 0.12664 0.70000
BN_57: 0.817 2.950 0.630 0.020 0.020 | 0.21000 0.25400 0.12000 0.00097 0.01000
BN_59: 0.748 2.470 0.580 0.010 0.020 | 0.17700 0.27100 0.07000 0.00067 0.01000
BN_61: 1.100 2.700 1.390 0.100 0.100 | 0.41600 0.47200 0.78000 0.08385 0.08000
BN_63: 0.974 2.330 1.010 0.030 0.260 | 0.36700 0.42900 0.48000 0.02013 0.25000
BN_65: 0.739 2.490 0.630 0.010 0.030 | 0.24200 0.24500 0.19000 0.00327 0.02000
BN_67: 0.938 2.630 0.910 0.040 0.370 | 0.47600 0.33000 0.46000 0.02433 0.36000

Genetic linkage:
BN_69: 5.103 FAIL FAIL FAIL FAIL | 3.99500 FAIL FAIL FAIL FAIL
BN_70: 8.069 75.300 FAIL FAIL FAIL | 6.38800 72.02900 FAIL FAIL FAIL
BN_71: 8.844 FAIL FAIL FAIL FAIL | 7.27100 FAIL FAIL FAIL FAIL
BN_72: 16.581 FAIL FAIL FAIL FAIL | 14.79000 FAIL FAIL FAIL FAIL
BN_73: 16.634 FAIL FAIL FAIL FAIL | 15.00800 FAIL FAIL FAIL FAIL
BN_74: 10.532 FAIL FAIL FAIL FAIL | 9.58500 FAIL FAIL FAIL FAIL
BN_75: 25.656 FAIL FAIL FAIL FAIL | 24.10700 FAIL FAIL FAIL FAIL
BN_76: 31.216 FAIL FAIL FAIL FAIL | 29.22600 FAIL FAIL FAIL FAIL
BN_77: 59.385 FAIL FAIL FAIL FAIL | 57.87100 FAIL FAIL FAIL FAIL

Page 35:

Each row: Teams 1-5 wall time (s) | Teams 1-5 inference time (s).

CPCS:
BN_78: 0.409 1.520 0.350 0.010 0.010 | 0.14800 0.04900 0.12000 0.00044 0.01000
BN_80: 0.903 1.740 0.970 0.040 0.080 | 0.19200 0.12800 0.22000 0.00112 0.02000
BN_82: 0.910 2.080 1.150 0.050 0.090 | 0.25200 0.12300 0.29000 0.00116 0.03000
BN_84: 1.004 1.940 1.070 0.040 0.090 | 0.29500 0.15700 0.34000 0.00162 0.03000
BN_86: 2.397 4.520 3.220 0.780 1.790 | 0.47400 0.32800 0.45000 0.04229 0.67000
BN_88: 2.590 4.470 3.330 0.760 1.820 | 0.47000 0.31100 0.46000 0.02511 0.74000
BN_90: 2.621 4.370 3.210 0.780 1.800 | 0.49300 0.30800 0.40000 0.04246 0.74000
BN_92: 2.691 4.260 3.300 0.800 1.870 | 0.52900 0.27700 0.54000 0.07392 0.77000

Random graphs:
BN_94: 0.754 1.550 0.730 0.040 0.090 | 0.11500 0.03400 0.05000 0.00040 0.01000
BN_96: 0.603 1.570 0.880 0.020 0.090 | 0.19200 0.08200 0.48000 0.00472 0.07000
BN_98: 0.771 1.590 0.940 0.050 0.140 | 0.19300 0.07800 0.23000 0.00435 0.05000
BN_100: 1.089 1.630 1.040 0.060 0.270 | 0.49200 0.09900 0.32000 0.01411 0.18000
BN_102: 1.748 2.020 3.190 0.330 1.700 | 0.62000 0.15900 1.64000 0.10794 1.27000

k-trees (k=24) with determinism:
BN_104: 15.708 21.630 14.550 4.020 5.930 | 0.34800 0.86300 0.19000 0.00699 0.34000
BN_106: 11.097 18.500 20.170 3.650 6.880 | 0.54900 0.87300 8.53000 0.38662 0.19000
BN_108: 9.527 17.080 13.390 3.300 5.670 | 0.48900 0.87000 3.00000 0.42090 0.87000
BN_110: 7.704 14.930 11.250 2.790 4.120 | 0.40000 0.69100 1.62000 0.17202 0.73000
BN_112: 10.310 16.790 19.390 3.430 6.990 | 0.57100 0.84000 8.69000 0.44628 0.40000

k-trees (k=24) positive:
BN_114: 5.345 7.550 7.620 1.880 6.480 | 1.04400 0.49500 1.96000 0.14516 3.50000
BN_116: 8.592 12.350 10.040 3.140 5.790 | 0.63700 0.60200 0.38000 0.01697 0.58000
BN_118: 4.928 8.790 6.630 1.920 4.290 | 0.41500 0.42700 0.76000 0.04710 1.04000
BN_120: 5.457 8.220 7.380 2.020 5.060 | 0.78500 0.44400 1.06000 0.06795 1.67000
BN_122: 6.891 12.020 8.890 2.580 5.440 | 0.68400 0.52500 0.84000 0.04319 1.22000
BN_124: 4.297 8.020 6.810 1.700 4.340 | 0.43400 0.39900 1.44000 0.07180 1.54000

Page 36:

MPE Results

• Only three teams participated: Team 1, Team 2, and Team 5.
• 57 BNs: not the same ones as for PE, but some are variations of the same original BN.

Page 37:

MPE Failure Rate Results

Failure rate (percent), 57 BNs total:
Team 1: 0
Team 2: 38.6
Team 5: 7.01

Page 38:

MPE Avg. Speedups: BNs that ran on all 3 systems

Average speedup of the fastest system over team j:
Team 1: 5.38
Team 2: 2.75
Team 5: 7.37

31 out of 57 BNs ran on all systems.

Page 39:

MPE Avg. Speedups: BNs that ran on all 3 systems

[Chart: per-team speedup statistics for Teams 1, 2, and 5: average, std, min, max]

31 out of 57 BNs ran on all systems.

Page 40:

MPE Avg. Speedups: BNs that ran on 2/3 systems

Average speedup of the fastest system over team j:
Team 1: 6.6
Team 2: 1.2
Team 5: 47.46

26 out of 57 BNs ran on 2 systems.

Page 41:

MPE Avg. Speedups: BNs that ran on 2/3 systems

[Chart: per-team speedup statistics for Teams 1, 2, and 5: average, std, min, max]

Page 42:

Rank Proportions (how often was each team a particular rank; rank 1 is best)

[Chart: rank proportions for Teams 1, 2, and 5, categories Rank 1, Rank 2, Rank 3, Fail]

Page 43:

Raw MPE Results. Each row: Teams 1, 2, 5 wall time (s) | Teams 1, 2, 5 inference time (s). (u/flow = numeric underflow.)

Diagnosis graph:
BN_17: 46.193 FAIL 0.79 | 44.988 FAIL 0.75
BN_19: 47.436 FAIL 0.72 | 46.259 FAIL 0.68

Speech recognition:
BN_21: 34.689 24.82 u/flow | 28.182 17.66 u/flow
BN_23: 9.514 12.46 u/flow | 4.389 5.857 u/flow
BN_25: 8.284 8.47 u/flow | 3.765 2.951 u/flow
BN_27: 10.963 16.76 u/flow | 5.688 8.843 u/flow

Water DBN:
BN_29: 3.324 4.07 1.54 | 0.4 0.153 0.17

Grids:
BN_31: 2.951 3.31 74.47 | 2.073 0.835 74.46
BN_33: 3.253 21.98 38.76 | 2.261 19.551 38.76
BN_35: 3.192 5.1 41.81 | 2.203 2.243 41.8
BN_37: 3.263 4.73 12.39 | 2.296 2.895 12.38
BN_39: 3.254 7.84 137.24 | 2.265 5.158 137.23
BN_41: 3.08 4.58 21.7 | 2.097 2.29 21.68

ISCAS 85:
BN_48: 1.216 FAIL 0.25 | 0.559 FAIL 0.25
BN_50: 1.36 FAIL 0.7 | 0.673 FAIL 0.69
BN_52: 1.165 FAIL 0.85 | 0.488 FAIL 0.84
BN_54: 0.818 FAIL 8.36 | 0.203 FAIL 8.35
BN_56: 1.133 FAIL 97.05 | 0.513 FAIL 97.04
BN_58: 0.81 FAIL 0.78 | 0.205 FAIL 0.77
BN_60: 0.758 FAIL 118.39 | 0.168 FAIL 118.38
BN_62: 1.107 FAIL 5.01 | 0.454 FAIL 5
BN_64: 0.925 FAIL 16.7 | 0.375 FAIL 16.69
BN_66: 0.693 FAIL 4.81 | 0.233 FAIL 4.81
BN_68: 1.047 FAIL 0.43 | 0.54 FAIL 0.42

CPCS:
BN_79: 0.462 2.15 0.01 | 0.173 0.056 0.01
BN_81: 1.481 3.65 0.11 | 0.814 0.513 0.05
BN_83: 1.687 2.85 0.13 | 1.003 0.544 0.07
BN_85: 2.067 2.91 0.19 | 1.403 0.514 0.13
BN_87: 9.019 5.9 1.42 | 6.415 1.21 0.33
BN_89: 3.523 4.94 1.43 | 1.254 0.644 0.32
BN_91: 8.531 5.61 1.39 | 6.28 1.183 0.32
BN_93: 5.016 5.42 1.67 | 2.958 0.902 0.57

Random:
BN_95: 1.789 2.7 0.27 | 0.966 0.541 0.19
BN_97: 1.896 2.56 0.27 | 1.433 0.396 0.25
BN_99: 6.213 3.06 1.01 | 5.607 1.021 0.93
BN_101: 1.648 2.37 0.19 | 1.087 0.259 0.1
BN_103: 3.257 3.47 0.94 | 2.226 1 0.52

k-trees with determinism:
BN_105: 14.65 22.23 6.88 | 0.473 1.108 0.17
BN_107: 37.053 19.16 6.25 | 0.992 1.929 0.82
BN_109: 9.782 16.33 5.64 | 0.774 1.206 0.88
BN_111: 8.319 14.94 4.96 | 0.606 1.002 0.53
BN_113: 10.093 17.3 5.69 | 0.835 1.516 0.96

k-trees positive:
BN_115: 6.955 7.69 4.5 | 2.438 1.005 1.38
BN_117: 8.57 12.71 5.88 | 0.888 0.738 0.6
BN_119: 5.616 8.49 4.06 | 1.099 0.607 0.88
BN_121: 6.36 8.05 4.2 | 1.168 0.681 0.84
BN_123: 6.958 11.52 5.07 | 0.925 0.674 0.67
BN_125: 4.866 8.29 3.6 | 1.114 0.589 0.81

Coding:
BN_126: 30.904 FAIL 30.12 | 30.276 FAIL 30.11
BN_127: 23.6 FAIL 111.65 | 22.979 FAIL 111.64
BN_128: 8.306 FAIL 2.07 | 7.713 FAIL 2.07
BN_129: 42.683 FAIL 147.32 | 42.106 FAIL 147.32
BN_130: 37.685 FAIL 13.59 | 37.071 FAIL 13.58
BN_131: 10.146 FAIL 16.47 | 9.577 FAIL 16.46
BN_132: 78.846 FAIL 164.41 | 78.243 FAIL 164.4
BN_133: 24.212 FAIL 9.86 | 23.6 FAIL 9.86
BN_134: 21.243 FAIL 130.48 | 20.655 FAIL 130.48

Page 44:

PE and MPE Results

Page 45:

Workload Scores: PE and MPE

Page 46:

Workload Scores and Linear Programming

Page 47:

Workload Scores: PE and MPE

• So each team is a winner; it depends on the workload.
• We could attempt further to rank teams based on the volume of the workload region where a team wins.
• Which measure should we use on the simplex, though: uniform? Why not something else?
• "A Bayesian approach to performance ranking"
  – UAI does system performance measures …

Page 48:

Team technical descriptions

• 5 minutes for each team.
• Current plan: more details to ultimately appear on the inference evaluation web site (see the main UAI page).

Page 49:

Team 1: UCLA Technical Description

• presented by Adnan Darwiche

Page 50:

Team UCLA Performance summary:

• Solved all 78 P(e) networks in 319 s: about 4 s per instance.
• Solved all 57 MPE networks in 466 s: about 8 s per instance.

MPE approach:
1. Prune network.
2. If the network has treewidth 25 or less, run RC.
3. Else if the network has enough local structure, run Ace.
4. Else run BnB/Ace.

P(e) approach:
1. Prune network.
2. If the network has genetic-net characteristics, run RC_Link.
3. Else if the network has treewidth 25 or less, run RC.
4. Else run Ace.

The approach is powerful enough to solve every network in every suite. Yet it incurs a fixed overhead that disadvantages it on easy networks.

Page 51:

RC and RC_Link

Recursive Conditioning (RC):
• A conditioning/search algorithm based on decomposing the network.
• Inference exponential in treewidth.
• VE/Jointree could have been used for this!

RC_Link:
• RC with local-structure exploitation.
• Not necessarily exponential in treewidth.
• Download: http://reasoning.cs.ucla.edu/rc_link

Page 52:

Ace

• Compiles a BN to an arithmetic circuit.
• Reduces to logical inference.
• Strength: local structure (determinism & CSI).
• Strength: online inference.
• Inference not exponential in treewidth.
• Download: http://reasoning.cs.ucla.edu/ace

Page 53:

Branch & Bound

• Approximate the network by deleting edges, to provide an upper bound on MPE.
• Compile the network using Ace and use it to drive the search.
• Use belief propagation to seed a static variable order and, for each variable, an ordering on its values.

Page 54:

Page 55:

Team 2: IET Technical Description

• presented by Masami Takikawa

Page 56:

Basic SPI Algorithm

From the BN, collect factors; then repeat the following steps until all variables are eliminated:
• Collect factors (after d-separation, barren-node removal & evidence propagation).
• Order nodes (minimum-weight heuristic).
• Multiplication.
• Summation.

Was able to solve 59 out of 78 challenges.

#BNs by max weight (summing to the 59 solved):
#BNs  MaxWeight
4     <= 100
2     <= 1,000
20    <= 10,000
16    <= 100,000
13    <= 1,000,000
4     <= 2,000,000

Page 57:

Extensions (aka additional overhead)

Pipeline additions:
• Order nodes: time-slice ordering if the network is a DBN.
• Intra-node factorization (INF).
• Normalization: needed to avoid underflow for BN_20-26.

Effect of intra-node factorization, shown as max weight (max width):
BN_30: without INF 130M (27), with INF 66K (16)
BN_32: without INF 17G (34), with INF 260K (18)
BN_34: without INF 1.1G (30), with INF 3.1M (21)
BN_36: without INF 4.3G (32), with INF 130K (17)
BN_38: without INF 4.3G (32), with INF 520K (19)
BN_40: without INF 1.1G (30), with INF 130K (17)

Effect of time-slice ordering vs. the min-weight heuristic, shown as max weight (max width):
BN_70: min-weight 8.3E19 (56), time-slice 9.0E7 (21)
BN_72: min-weight 2.9E21 (46), time-slice 3.5E11 (38)
BN_73: min-weight 6.4E22 (51), time-slice 4.3E9 (26)
BN_75: min-weight 5.5E18 (43), time-slice 1.7E10 (34)
BN_76: min-weight 1.9E20 (35), time-slice 1.7E11 (24)

Solved an additional 11 challenges.

Page 58:

Team 3: UBC Technical Description

• presented by David Poole

Page 59:

Variable Elimination Code
by David Poole and Jacek Kisyński

• This is an implementation of variable elimination in Java 1.5 (without threads).
• We wanted to test how well the base VE system that we were using compared with other systems.
• We use the min-fill heuristic for the elimination orderings, and 2 GB of memory.

Page 60:

The most interesting part of the implementation is in the representation of factors (see the sketch after this list):

• A factor is essentially a list of variables, and a one-dimensional array of values.

• There is a total order of all variables and a total ordering of all values, which gives a canonical order of the values in a factor.

• We can multiply factors and sum out variables without doing random access to the values, but rather using the canonical ordering to enumerate the values.

• Multiplication is done lazily.

• We can sum out multiple variables at once (but this was not actually done for this competition).
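A minimal Python sketch of this factor representation (hypothetical names; the actual code is in Java 1.5). For clarity, `multiply` and `sum_out` below recompute a row-major index per assignment; the stride-based version described above walks the flat arrays in canonical order without that random access.

```python
from itertools import product

class Factor:
    """A factor = an ordered list of variables plus a flat array of values.

    Variables are sorted under a global total order, so each factor has a
    single canonical (row-major) layout: the last variable varies fastest.
    """
    def __init__(self, variables, card, values):
        self.variables = sorted(variables)   # global order = sorted names
        self.card = card                     # variable -> domain size
        self.values = values                 # length == product of cards

    def index(self, assignment):
        """Row-major position of an assignment dict in the flat array."""
        i = 0
        for v in self.variables:
            i = i * self.card[v] + assignment[v]
        return i

def multiply(f, g):
    """Pointwise product over the union of the two factors' variables."""
    vs = sorted(set(f.variables) | set(g.variables))
    card = {**f.card, **g.card}
    vals = []
    for asg in product(*(range(card[v]) for v in vs)):   # canonical order
        a = dict(zip(vs, asg))
        vals.append(f.values[f.index(a)] * g.values[g.index(a)])
    return Factor(vs, card, vals)

def sum_out(f, var):
    """Sum a single variable out of a factor."""
    vs = [v for v in f.variables if v != var]
    size = 1
    for v in vs:
        size *= f.card[v]
    out = Factor(vs, {v: f.card[v] for v in vs}, [0.0] * size)
    for asg in product(*(range(f.card[v]) for v in f.variables)):
        a = dict(zip(f.variables, asg))
        out.values[out.index(a)] += f.values[f.index(a)]
    return out
```

For example, multiplying a factor for P(A) by one for P(B|A) and summing out A yields a factor for P(B).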

Page 61:

• This code was written for David Poole and Nevin Lianwen Zhang, "Exploiting contextual independence in probabilistic inference", Journal of Artificial Intelligence Research, 18:263-313, 2003. http://www.jair.org/papers/paper1122.html

• This is also the code that is used in the CIspace belief and decision network applet. A new version of the applet will be released in July. See: http://www.cs.ubc.ca/labs/lci/CIspace/

• We plan to release the VE code as open source.

Page 62:

Team 4: U. Pitt/DSL Technical Description

• presented by Jeff Bilmes (Marek Druzdzel was unable to attend).

Page 63:

UAI Software Competition

Decision Systems Laboratory
University of Pittsburgh

[email protected]
http://dsl.sis.pitt.edu/

Page 64:

UAI-06 Software Evaluation

UAI Competition: Sources of speedup

1. Clustering algorithm at the foundation of the program [Lauritzen & Spiegelhalter] (Pr(E) as the normalizing factor).

2. Relevance reasoning, based on conditional independence [Dawid 1979, Geiger 1990], as structured in [Suermondt 1992] and [Druzdzel 1992], summarized in [Druzdzel & Suermondt 1994].

3. For very large models: Relevance-based Decomposition [Lin & Druzdzel 1997] and Relevance-based Incremental Belief Updating [Lin & Druzdzel 1999].

Full references are included in GeNIe on-line help, http://genie.sis.pitt.edu/.

Good theory (in addition to good implementation), and good engineering (Tomek Sowinski):
• Efficient and reliable implementation in C++ (SMILE)
• Tested by over eight years of both academic and industrial use

Relevance steps:

1. In P(E), focusing inference on the evidence nodes

2. Removal of barren nodes (see the sketch after this list)

3. Evidence absorption

4. Removal of nuisance nodes

5. Reuse of valid posteriors
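As a concrete illustration of step 2, a minimal sketch of barren-node removal (hypothetical code, not SMILE's implementation): a node that is neither evidence nor a target, and whose remaining children have all been removed, contributes a factor that sums to 1, so it can be dropped without changing P(E).

```python
def remove_barren(children, evidence, targets):
    """Iteratively drop barren nodes from a DAG.

    children: dict mapping each node to the set of its child nodes.
    evidence, targets: sets of nodes that must be kept.
    Returns the set of surviving (relevant) nodes."""
    alive = set(children)
    changed = True
    while changed:
        changed = False
        for n in list(alive):
            if n in evidence or n in targets:
                continue
            if not (children[n] & alive):    # no surviving children: barren
                alive.discard(n)
                changed = True
    return alive

# A -> B -> C with evidence on B: the leaf C is barren; A and B remain.
print(remove_barren({"A": {"B"}, "B": {"C"}, "C": set()},
                    evidence={"B"}, targets=set()))
```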

Page 65:

UAI-06 Software Evaluation

Where did our program spend the most time?

[Pie charts for bn_18 (diagnosis), bn_22 (speech dbn), bn_82 (cpcs), and bn_94 (random); segments in each: Relevance, Triangulation, Find Hosts, Init Potentials, Collect, Distribute, Other]

Speedup due to relevance (chart annotations): 1x, 38x, and 1,046x, spanning easy to hard problems.

Page 66:

UAI-06 Software Evaluation

Broader context: GeNIe and SMILE

[Architecture diagram: SMILE at the core, surrounded by GeNIe, GeNIeRate, SMiner, ImaGeNIe, Diagnosis, QGeNIe, and the wrappers SMILE.NET, jSMILE, Pocket SMILE]

• Reasoning engine: SMILE (Structural Modeling, Inference, and Learning Engine), a platform-independent library of C++ classes for graphical models.
• Model developer module: GeNIe, a developer's environment for graphical decision models (http://genie.sis.pitt.edu/); implemented in Visual C++ in the Windows environment.
• Wrappers (SMILE.NET, jSMILE, Pocket SMILE): allow SMILE to be accessed from applications other than a C++ compiler.
• Learning and discovery module: SMiner.
• Support for model building: ImaGeNIe.
• Diagnosis module: Diagnosis.
• Qualitative interface: QGeNIe.

Page 67:

UAI Software Competition

UAI Competition: Sources of speedup

1. Clustering algorithm at the foundation of the program [Lauritzen & Spiegelhalter] (Pr(E) as the normalizing factor).

2. Relevance reasoning, based on conditional independence [Dawid 1979; Geiger et al. 1990], structured in [Suermondt 1992] and [Druzdzel 1992], summarized in [Druzdzel & Suermondt 1994].

3. For very large models: Relevance-based Decomposition [Lin & Druzdzel 1997] and Relevance-based Incremental Belief Updating [Lin & Druzdzel 1999].

Full references are included in the GeNIe on-line help, http://genie.sis.pitt.edu/.

Good theory rather than engineering tricks. Top research programmer (Tomek Sowinski). Efficient and reliable implementation in C++ (SMILE).

Page 68:

UAI Software Competition

Broader Context: GeNIe and SMILE

[Architecture diagram: SMILE at the core, surrounded by GeNIe, GeNIeRate, SMiner, ImaGeNIe, Diagnosis, QGeNIe, and the wrappers SMILE.NET, jSMILE, Pocket SMILE]

• Reasoning engine: SMILE (Structural Modeling, Inference, and Learning Engine), a platform-independent library of C++ classes for graphical models.
• Model developer module: GeNIe, a developer's environment for graphical decision models (http://genie.sis.pitt.edu/); implemented in Visual C++ in the Windows environment.
• Wrappers (SMILE.NET, jSMILE, Pocket SMILE): allow SMILE to be accessed from applications other than a C++ compiler.
• Learning and discovery module: SMiner.
• Support for model building: ImaGeNIe.
• Diagnosis module: Diagnosis.
• Qualitative interface: QGeNIe.

Page 69:

Team 5: UCI Technical Description

• presented by Rina Dechter

Page 70:

PE & MPE – AND/OR Search

[Figure: a Bayesian network over variables A, B, C, D, E; its pseudo tree with contexts [ ], [A], [AB], [BC], [AB]; the corresponding AND/OR search tree; and the context-minimal graph]

Page 71:

Adaptive caching

context(X) = [X1 X2 ... X(k-i) X(k-i+1) ... Xk], with i-bound < k

i-context(X) = [X(k-i+1) ... Xk] in the conditioned subproblem

The i-cache for X is purged for every new instantiation of X(k-i) (see the sketch below).

[Figure: pseudo-tree path X1 ... X(k-i), X(k-i+1) ... Xk down to X, shown before and after conditioning]
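A minimal sketch of the purge rule (hypothetical names, not the UCI implementation): cache entries for X are keyed by the values of the i-context variables, and the whole table is flushed whenever the purge variable X(k-i) takes a new value.

```python
class AdaptiveCache:
    """i-bounded cache for one search variable X.

    Entries are keyed by the i-context values; the table is purged on
    every new instantiation of the designated purge variable X_{k-i}."""
    def __init__(self, purge_var, i_context_vars):
        self.purge_var = purge_var
        self.i_context_vars = i_context_vars
        self.last_purge_val = object()       # sentinel: matches nothing
        self.table = {}

    def _key(self, assignment):
        if assignment[self.purge_var] != self.last_purge_val:
            self.table.clear()               # X_{k-i} changed: purge
            self.last_purge_val = assignment[self.purge_var]
        return tuple(assignment[v] for v in self.i_context_vars)

    def lookup(self, assignment):
        return self.table.get(self._key(assignment))

    def store(self, assignment, value):
        self.table[self._key(assignment)] = value
```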

Page 72:

PE solver - implementation

• C++ implementation
• Caching based on context (table caching)
  – Adaptive caching when contexts are too large
• Switch to variable elimination for small and nondeterministic problems
• Constraint propagation
• No nogood learning – just caching of nogoods
• Dynamic range support (for very small probabilities)

Page 73:

MPE solver - AOMB(i,j)

• Node value v(n): the most probable explanation of the sub-problem rooted at n.
• Caching: identical sub-problems rooted at AND nodes (identified by their contexts) are solved once and the results cached.
  – The j-bound (context size) controls the memory used for caching.
• Heuristics: pruning is based on heuristic estimates that are pre-computed by bounded inference (i.e., the mini-bucket approximation).
  – The i-bound (mini-bucket size) controls the accuracy of the heuristic.
• No constraint propagation.

Page 74:

AOMB(i,j) – Mini-Bucket Heuristics

• Each node n has a static heuristic estimate h(n) of v(n):
  – h(n) is an upper bound on the value v(n).
  – h(n) is computed from the augmented bucket structure generated by the mini-bucket approximation MBE(i).
• For every node n in the AND/OR search graph:
  – lb(n): the current best solution cost rooted at n.
  – ub(n): an upper bound on the most probable explanation at n.
  – Prune the search space below the current node t if ub(m) < lb(m), where m is an ancestor of t along the current path from the root (see the formula below).
• During search, merge nodes based on context (caching); maintain cache tables of size O(exp(j)), where j is a bound on the size of the context.
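Restated compactly, the pruning rule above is

$$\text{prune below } t \quad \text{if} \quad \exists\, m \in \mathrm{anc}(t) \text{ on the current path}: \;\; ub(m) < lb(m).$$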

Page 75:

AOMB(i,j) – Implementation

• C++ implementation.
• The B&B procedure is recursive.
  – It could be a bit faster if we simulated the stack.
• Cache tables implemented as hash tables.
• No ASM code or other optimizations.
• Static variable ordering determined by the min-fill ordering (minimizes the context size).
• Choosing the (i,j) parameters:
  – i-bound: choose i such that the augmented bucket structure generated by MBE(i) fits in 2 GB of RAM (i < 22).
  – j-bound: j = i + 0.75*i (j < 30).
• No constraint propagation.

Page 76:

References

• 1. Rina Dechter and Robert Mateescu. AND/OR Search Spaces for Graphical Models. Artificial Intelligence, 2006. To appear.

• 2. Rina Dechter and Robert Mateescu. Mixtures of Deterministic-Probabilistic Networks and their AND/OR Search Space. In Proceedings of UAI-04, Banff, Canada.

• 3. Robert Mateescu and Rina Dechter. AND/OR Cutset Conditioning. In Proceedings of IJCAI-05, Edinburgh, Scotland.

• 4. Radu Marinescu and Rina Dechter. Memory Intensive Branch-and-Bound Search for Graphical Models. In Proceedings of AAAI-06, Boston, USA.

• 5. Radu Marinescu and Rina Dechter. AND/OR Branch-and-Bound for Graphical Models. In Proceedings of IJCAI-05, Edinburgh, Scotland.

Page 77:

Conclusions

Page 78:

Conclusions and Discussion

• Most teams said they had fun and that it was a learning experience; people also became somewhat competitive.
• Teams that used C++ (Teams 4-5) arguably had faster times than those that used Java (Teams 1-3).
• Use harder BNs and/or harder queries next year.
  – It is hard to find real-world BNs that are easily available but also hard. If you have a BN that is hard, please make it available for next year.
  – Regardless of who runs it next year, please send candidate networks directly to me for now <[email protected]>.
• Provide better resolution and a standardized/unified timer (use the IPM package).
• Have a dynamic category (there needs to be more interest).
• Have an approximate inference category; look at time/space/accuracy tradeoffs.