Karras BVH

Post on 20-Jul-2016

254 views 2 download

Transcript of Karras BVH

Fast Parallel Construction of High-Quality Bounding Volume Hierarchies

Tero Karras

Timo Aila

Ray tracing comes in many flavors

2

Interactive apps

1M–100Mrays/frame

Architecture & design

100M–10Grays/frame

Movie production

10G–1Trays/frame

© Activision 2009, Game trailer by Blur Studio

Courtesy of Delta Tracing Lucasfilm Ltd.™, Digital work by ILM

Courtesy of Columbia Pictures

NVIDIA

Courtesy of Dassault Systemes

Effective performance

3

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑟𝑎𝑦 𝑡𝑟𝑎𝑐𝑖𝑛𝑔 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 =𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠

𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒

Effective performance

4

𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 = 𝑡𝑖𝑚𝑒 𝑡𝑜 𝑏𝑢𝑖𝑙𝑑 𝐵𝑉𝐻 +𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠

𝑟𝑎𝑦 𝑡ℎ𝑟𝑜𝑢𝑔ℎ𝑝𝑢𝑡

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑟𝑎𝑦 𝑡𝑟𝑎𝑐𝑖𝑛𝑔 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 =𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠

𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒

“speed” “quality”

Effective performance

5

0

50

100

150

200

250

300

350

400

450

1M 10M 100M 1G 10G 100G 1T

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 = 𝑡𝑖𝑚𝑒 𝑡𝑜 𝑏𝑢𝑖𝑙𝑑 𝐵𝑉𝐻 +𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠

𝑟𝑎𝑦 𝑡ℎ𝑟𝑜𝑢𝑔ℎ𝑝𝑢𝑡

Interactiveapps

Architecture& design

Movieproduction

Both matter

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑟𝑎𝑦 𝑡𝑟𝑎𝑐𝑖𝑛𝑔 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 =𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠

𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒

Mrays/s

Effective performance

6

0

50

100

150

200

250

300

350

400

450

1M 10M 100M 1G 10G 100G 1T

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Mrays/s

SODA (2.2M tris)

NVIDIA GTX Titan

Diffuse rays

Effective performance

7

0

50

100

150

200

250

300

350

400

450

1M 10M 100M 1G 10G 100G 1T

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒

SBVH[Stich et al. 2009]

(CPU, 4 cores)

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Build timedominates

Mrays/s

Effective performance

8

0

50

100

150

200

250

300

350

400

450

1M 10M 100M 1G 10G 100G 1T

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒

HLBVH[Garanzha et al. 2011]

(GPU)

???

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Mrays/s

SBVH[Stich et al. 2009]

(CPU, 4 cores)

Effective performance

9

0

50

100

150

200

250

300

350

400

450

1M 10M 100M 1G 10G 100G 1T

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒

Our method(GPU)

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Mrays/s

HLBVH[Garanzha et al. 2011]

(GPU)

SBVH[Stich et al. 2009]

(CPU, 4 cores)

0

50

100

150

200

250

300

350

400

450

1M 10M 100M 1G 10G 100G 1T

Effective performance

10

𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒

30M–500Grays/frame

97% ofSBVH

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Mrays/s

Best quality–speed tradeoff for wide range of applications

Treelet restructuring

11

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

Treelet restructuring

12

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

R

Treelet restructuring

13

Treelet root

Treelet leaf

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

R

Treelet restructuring

14

Treelet root

Treelet leaf

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Grow by turning leaves into internal nodes

Grow

R

Treelet restructuring

15

Treeletinternal

node

Grow

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Grow by turning leaves into internal nodes

R

Treelet restructuring

16

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Grow by turning leaves into internal nodes

R

Treelet restructuring

17

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Grow by turning leaves into internal nodes

R

Treelet restructuring

18

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Grow by turning leaves into internal nodes

Treelet restructuring

19

R

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Grow by turning leaves into internal nodes

Largest leaves → best results

Treelet restructuring

20

R

C

A B

F

D

G

E

𝑛 = 7treelet leaves

𝑛 − 1 = 6treelet internal

nodesIdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Grow by turning leaves into internal nodes

Largest leaves → best results

Valid binary tree in itself

Treelet restructuring

21

R

C

A B

F

D

G

E

ActualBVH leaf

Arbitrarysubtree

IdeaBuild a low-quality BVH

Optimize its node topology

Look at multiple nodes at once

TreeletSubset of a node’s descendants

Grow by turning leaves into internal nodes

Largest leaves → best results

Valid binary tree in itselfLeaves can represent arbitrary subtrees of the BVH

Treelet restructuring

22

R

C

A B

F

D

G

E

RestructuringConstruct optimal binary tree for the same set of leaves

Replace old treelet

Treelet restructuring

23

C

A B

F

D

G

E

R

RestructuringConstruct optimal binary tree for the same set of leaves

Replace old treelet

Reuse the same nodesUpdate connectivity and AABBs

New AABBs should be smaller

D

E

G

Treelet restructuring

24

A

R

C

F

B

RestructuringConstruct optimal binary tree for the same set of leaves

Replace old treelet

Reuse the same nodesUpdate connectivity and AABBs

New AABBs should be smaller

Treelet restructuring

25

CA BF

D

G E

R

RestructuringConstruct optimal binary tree for the same set of leaves

Replace old treelet

Reuse the same nodesUpdate connectivity and AABBs

New AABBs should be smaller

Treelet restructuring

26

R

CA BF

D

G E

RestructuringConstruct optimal binary tree for the same set of leaves

Replace old treelet

Reuse the same nodesUpdate connectivity and AABBs

New AABBs should be smaller

Perfectly localized operationLeaves and their subtrees are kept intact

No need to look at subtree contents

Processing stages

27

Initial BVH construction

Post-processing

Optimization

Input triangles

One triangleper leaf

Processing stages

28

Initial BVH construction

Post-processing

Optimization

Parallel LBVH[Karras 2012]

60-bit Morton codesfor accurate spatial

partitioning

Processing stages

29

Initial BVH construction

Post-processing

Optimization

Parallel bottom-up traversal[Karras 2012]

Restructure multipletreelets in parallel

Processing stages

30

Initial BVH construction

Post-processing

Optimization

Parallel bottom-up traversal[Karras 2012]

Processing stages

31

Initial BVH construction

Post-processing

Optimization

Parallel bottom-up traversal[Karras 2012]

Processing stages

32

Initial BVH construction

Post-processing

Optimization

Parallel bottom-up traversal[Karras 2012]

Processing stages

33

Initial BVH construction

Post-processing

Optimization

Parallel bottom-up traversal[Karras 2012]

Processing stages

34

Initial BVH construction

Post-processing

Optimization

Parallel bottom-up traversal[Karras 2012]

Processing stages

35

Initial BVH construction

Post-processing

Optimization

Parallel bottom-up traversal[Karras 2012]

Strict bottom-up order→ no overlap between treelets

Processing stages

36

Initial BVH construction

Post-processing

Rinse and repeat(3 times is plenty)

Optimization

Processing stages

37

Initial BVH construction

Post-processing

Optimization

Collapse subtreesinto leaf nodes

Processing stages

38

Initial BVH construction

Post-processing

Optimization

Collect trianglesinto linear lists Prepare them for Woop’s

intersection test[Woop 2004]

Processing stages

39

Initial BVH construction

Post-processing

Optimization

Fast GPU ray traversal[Aila et al. 2012]

Processing stages

40

Initial BVH construction

Post-processing

Optimization

Triangle splitting

Fast GPU ray traversal[Aila et al. 2012]

Processing stages

41

Initial BVH construction

Post-processing

Optimization

Triangle splitting

Fast GPU ray traversal[Aila et al. 2012]

0.4 ms

5.4 ms

6.6 ms

17.0 ms

21.4 ms

1.2 ms

1.6 ms

DRAGON (870K tris)

NVIDIA GTX Titan23.6 ms / 30.0 ms

No splits

30% splits

Cost model

Surface area cost model[Goldsmith and Salmon 1987], [MacDonald and Booth 1990]

𝑆𝐴𝐻 ≔ 𝐶𝑖

𝑛∈𝐼

𝐴(𝑛)

𝐴(root)+ 𝐶𝑡

𝑙∈𝐿

𝐴 𝑙

𝐴 root𝑁(𝑙)

Track cost and triangle count of each subtree

Minimize SAH cost of the final BVH

Make collapsing decisions already during optimization

→ Unified processing of leaves and internal nodes

42

Optimal restructuring

43

Finding the optimal node topology is NP-hard

Naive algorithm → ࣩ 𝑛!

Our approach → ࣩ 3𝑛

But it becomes very powerful as 𝑛 grows

𝑛 = 7 treelet leaves is enough for high-quality results

Use fixed-size treelets

Constant cost per treelet

→ Linear with respect to scene size

Optimal restructuring

44

Treelet size Layouts Quality vs. SBVH *

4 15 78%

5 105 85%

6 945 88%

7 10,395 97%

8 135,135 98%

* SODA (2.2M tris)

Number of unique ways forrestructuring a given treelet

Ray tracing performanceafter 3 rounds of optimization

Optimal restructuring

45

Treelet size Layouts Quality vs. SBVH *

4 15 78%

5 105 85%

6 945 88%

7 10,395 97%

8 135,135 98%

* SODA (2.2M tris)

Almost thesame thing astree rotations

[Kensler 2008]

Limited options during optimization→ easy to get stuck in a local optimum

Varies a lotbetweenscenes

Optimal restructuring

46

Treelet size Layouts Quality vs. SBVH *

4 15 78%

5 105 85%

6 945 88%

7 10,395 97%

8 135,135 98%

* SODA (2.2M tris)

Can still beimplemented

efficiently

Surely one of thesewill take us forward

Consistentacross scenes

Further improvementis marginal

Algorithm

Dynamic programming

Solve small subproblems first

Tabulate their solutions

Build on them to solve larger subproblems

Subproblem:

What’s the best node topology for a subset of the leaves?

47

Algorithm

48

input: set of 𝑛 treelet leavesfor 𝑘 = 2 to 𝑛 do

for each subset of size 𝑘 dofor each way of partitioning the leaves do

look up subtree costscalculate SAH cost

end forrecord the best solution

end forend forreconstruct optimal topology

Process subsets fromsmallest to largest

Record the optimalSAH cost for each

Algorithm

49

input: set of 𝑛 treelet leavesfor 𝑘 = 2 to 𝑛 do

for each subset of size 𝑘 dofor each way of partitioning the leaves do

look up subtree costscalculate SAH cost

end forrecord the best solution

end forend forreconstruct optimal topology

Exhaustive search:assign each leaf toleft/right subtree

We already knowhow much thesubtrees will cost

Backtrack thepartitioning choices

Scalar vs. SIMD

50

Scalar processing

Each thread processes one treelet

Need many treelets in flight

SIMD processing

32 threads collaborate on the same treelet

Need few treelets in flight

✗Spills to off-chip memory

✗Doesn’t scale to small scenes

✓Trivial to implement

✓Data fits in on-chip memory

✓Easy to fill the entire GPU

✗Need to keep all threads busy

Parallelize over subproblems usinga pre-optimized processing schedule

(details in the paper)

Scalar vs. SIMD

51

Scalar processing

Each thread processes one treelet

Need many treelets in flight

SIMD processing

32 threads collaborate on the same treelet

Need few treelets in flight

✓Data fits in on-chip memory

✓Easy to fill the entire GPU

✓Possible to keep threads busy

✗Spills to off-chip memory

✗Doesn’t scale to small scenes

✓Trivial to implement

Quality vs. speed

Spend less effort on bottom-most nodes

Low contribution to SAH cost

Quick convergence

Additional parameter 𝛾

Only process subtrees that are large enough

Trade quality for speed

Double 𝛾 after each round

Significant speedup

Negligible effect on quality

52

Triangle splitting

Early Split Clipping [Ernst and Greiner 2007]Split triangle bounding boxes as a pre-process

53

Bounding box is nota good approximation

Split it!

Large triangle

Triangle splitting

54

Resulting boxesprovide atighter bound

Keep going until theyare small enough

Early Split Clipping [Ernst and Greiner 2007]Split triangle bounding boxes as a pre-process

Triangle splitting

55

Early Split Clipping [Ernst and Greiner 2007]Split triangle bounding boxes as a pre-process

Keep going until theyare small enough

Triangle splitting

56

Early Split Clipping [Ernst and Greiner 2007]Split triangle bounding boxes as a pre-process

Treat each box as aseparate primitive

Triangle itselfremains the same

Triangle splitting

Shortcomings of pre-process splitting

Can hurt ray tracing performance

Unpredictable memory usage

Requires manual tuning

Improve with better heuristics

Select good split planes

Concentrate splits where they matter

Use a fixed split budget

57

Split plane selection

58

Root node partitions thescene at its spatial median

Reduce node overlap in the initial BVH

Split plane selection

59

Left child

Right child

Reduce node overlap in the initial BVH

If a trianglecrosses the plane...

Split plane selection

60

...the bounding boxes will overlap

Use the samespatial medianas a split plane

Reduce node overlap in the initial BVH

Split plane selection

61

No overlap

Use the samespatial medianas a split plane

Reduce node overlap in the initial BVH

Split plane selection

62

Splitting one triangledoes not help much

Need to split them allto get the benefits

Reduce node overlap in the initial BVH

Split plane selection

63

Reduce node overlap in the initial BVH

Split plane selection

64

Same reasoning holds on multiple levels

Reduce node overlap in the initial BVH

Level 0

Level 1

Level 2 Level 2

Level 3

Level 3

Split plane selection

65

Look at all spatialmedian planes thatintersect a triangle

Split it with thedominant one

Reduce node overlap in the initial BVH

Level 1

Level 2 Level 2Level 0

Level 3

Level 3

Algorithm

1. Allocate memory for a fixed split budget

66

Algorithm

1. Allocate memory for a fixed split budget

2. Calculate a priority value for each triangle

67

Algorithm

1. Allocate memory for a fixed split budget

2. Calculate a priority value for each triangle

3. Distribute the split budget among triangles

Proportional to their priority values

68

Algorithm

1. Allocate memory for a fixed split budget

2. Calculate a priority value for each triangle

3. Distribute the split budget among triangles

Proportional to their priority values

4. Split each triangle recursively

Distribute remaining splits according to the size of the resulting AABBs

69

Split priority

70

𝑝𝑟𝑖𝑜𝑟𝑖𝑡𝑦 = 2(−𝑙𝑒𝑣𝑒𝑙) ∙ 𝐴𝑎𝑎𝑏𝑏 − 𝐴𝑖𝑑𝑒𝑎𝑙 1 3

Crosses an importantspatial median plane?

Has large potential forreducing surface area?

Concentrate on triangleswhere both apply

…but leavesomething forthe rest, too

ResultsCompare against 4 CPU and 3 GPU builders

4-core i7 930, NVIDIA GTX Titan

Average of 20 test scenes, multiple viewpoints

71

Ray tracing performance

72

SweepSAH[MacDonald]

SBVH[Stich]

Treerotations[Kensler]

Iterativereinsertion

[Bittner]

0%

20%

40%

60%

80%

100%

120%

140%

SweepSAH = 100%

High-quality CPU builders

SBVH = 131%

0%

20%

40%

60%

80%

100%

120%

140%

Ray tracing performance

73

SweepSAH[MacDonald]

SBVH[Stich]

Treerotations[Kensler]

Iterativereinsertion

[Bittner]

LBVH[Karras]

HLBVH[Garanzha]

GridSAH[Garanzha]

Fast GPU builders

67% – 69%

SBVH = 131%

Almost 2×

Ray tracing performance

74

SweepSAH[MacDonald]

SBVH[Stich]

Treerotations[Kensler]

Iterativereinsertion

[Bittner]

TRBVH TRBVH+30% split

LBVH[Karras]

HLBVH[Garanzha]

GridSAH[Garanzha]

0%

20%

40%

60%

80%

100%

120%

140%

No splits

30% splits

Our method

Ray tracing performance

75

SweepSAH[MacDonald]

SBVH[Stich]

Treerotations[Kensler]

Iterativereinsertion

[Bittner]

TRBVH TRBVH+30% split

LBVH[Karras]

HLBVH[Garanzha]

GridSAH[Garanzha]

0%

20%

40%

60%

80%

100%

120%

140%

96% of SweepSAH

91% of SBVH

Effective performance

76

0%

20%

40%

60%

80%

100%

120%

140%

1M 10M 100M 1G 10G 100G 1T

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Tree rotations[Kensler]

Iterative reinsertion[Bittner]

Not Pareto-optimal

SBVH[Stich]

SweepSAH[MacDonald]

Effective performance

77

0%

20%

40%

60%

80%

100%

120%

140%

1M 10M 100M 1G 10G 100G 1T

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

SBVH[Stich]

SweepSAH[MacDonald]

Effective performance

78

0%

20%

40%

60%

80%

100%

120%

140%

1M 10M 100M 1G 10G 100G 1T

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

HLBVH[Garanzha] GridSAH

[Garanzha]

SBVH[Stich]

SweepSAH[MacDonald]

LBVH[Karras]

Not Pareto-optimal

Effective performance

79

0%

20%

40%

60%

80%

100%

120%

140%

1M 10M 100M 1G 10G 100G 1T

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

SBVH[Stich]

SweepSAH[MacDonald]

LBVH[Karras]

Effective performance

80

0%

20%

40%

60%

80%

100%

120%

140%

1M 10M 100M 1G 10G 100G 1T

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

SBVH[Stich]

Our method(no splits)

SweepSAH[MacDonald]

LBVH[Karras]

Our method(30% splits)

Effective performance

81

0%

20%

40%

60%

80%

100%

120%

140%

1M 10M 100M 1G 10G 100G 1T

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

7M–60G rays/frame→ our method is the best choice

Below 7M→ LBVH

Above 60G→ SBVH

Conclusion

General framework for optimizing trees

Inherently parallel

Approximate restructuring → larger treelets?

Practical GPU-based BVH builder

Best choice in a large class of applications

Adjustable quality–speed tradeoff

Will be integrated into NVIDIA OptiX

82

83

Thank you

Acknowledgements

Samuli Laine

Jaakko Lehtinen

Sami Liedes

David McAllister

Anonymous reviewers

Anat Grynberg and Greg Ward for CONFERENCE

University of Utah for FAIRY

Marko Dabrovic for SIBENIK

Ryan Vance for BUBS

Samuli Laine for HAIRBALL and VEGETATION

Guillermo Leal Laguno for SANMIGUEL

Jonathan Good for ARABIC, BABYLONIAN and ITALIAN

Stanford Computer Graphics Laboratory for ARMADILLO, BUDDHA and DRAGON

Cornell University for BAR

Georgia Institute of Technology for BLADE