The effectiveness of type-based unboxing
Xavier Leroy
Vincent Foley-Bourgon
COMP-763 - Fall 2013
McGill University
September 2013
Plan
1. About the paper
2. The Big Idea
3. Type-directed unboxing
4. Type-directed unboxing overhead
5. Type-directed unboxing GC overhead
6. Untyped unboxing
7. Experimental results
About the paper
- Written by Xavier Leroy, INRIA
- Published in 1997
- Presented at the “Types in Compilation” workshop of ICFP’97
The Big Idea
Why do we need boxing? C, Pascal
- All data types are known at compile-time
- Efficient memory layout
- Efficient calling conventions
Why do we need boxing? ML
- Polymorphism and type abstraction
- Compile-time type ≠ Run-time type

    val triplicate : 'a -> 'a array
    let triplicate x = [| x; x; x |]

Why do we need boxing? ML
- Abandon C-style representation
- Revert to Lisp-style representation
- All data structures fit a common format (e.g. one word)
Boxing and unboxing: Explanation
Boxing: heap-allocating a value and handling it through a pointer
Unboxing: getting at the primitive data through the pointer
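The two operations can be pictured with a small source-level model. This is only a sketch: the `boxed_float` record and the `box`, `unbox`, and `add_boxed` names are illustrative assumptions, not something taken from the talk or from a compiler.

```ocaml
(* Hypothetical model: a one-field record stands in for a heap-allocated box. *)
type boxed_float = { payload : float }

(* Boxing: allocate a cell and hand out a pointer to it. *)
let box (x : float) : boxed_float = { payload = x }

(* Unboxing: follow the pointer to get at the raw float. *)
let unbox (b : boxed_float) : float = b.payload

(* Arithmetic on boxed values unboxes both operands and boxes the result. *)
let add_boxed (a : boxed_float) (b : boxed_float) : boxed_float =
  box (unbox a +. unbox b)

let () =
  let r = add_boxed (box 1.5) (box 2.0) in
  Printf.printf "%g\n" (unbox r)
```

Every call to `add_boxed` allocates a fresh cell, which is the cost the rest of the talk tries to avoid.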
Boxed values: So, what’s the problem?
- In tight loops, the constant boxing and unboxing is a major bottleneck
- Especially true in numerical applications
- Need a strategy to avoid unnecessary boxing/unboxing
- Some strategies rely on type information
- Others rely on program analysis, and apply equally well to dynamically-typed languages
Monomorphisation
- Possible solution: monomorphisation
- Duplicate and specialize all generic functions for each type instantiation
- No major increase in code size
- ☹ Not viable for OCaml
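To make the idea concrete, here is a hand-written sketch of what monomorphising the earlier `triplicate` function could produce. The names `triplicate_int` and `triplicate_float` are assumptions for illustration; a real monomorphising compiler would generate such copies internally.

```ocaml
(* Generic version: one body shared by every instantiation of 'a. *)
let triplicate (x : 'a) : 'a array = [| x; x; x |]

(* Specialized copies, one per type at which the function is used.
   Each copy knows its element type and can pick the best representation. *)
let triplicate_int (x : int) : int array = [| x; x; x |]
let triplicate_float (x : float) : float array = [| x; x; x |]

let () =
  (* The specialized copies compute the same results as the generic one. *)
  assert (triplicate_int 7 = triplicate 7);
  assert (triplicate_float 1.5 = triplicate 1.5)
```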
Type-directed unboxing
Coercions
- Coercions between boxed and unboxed representations inserted at type specialization points
- Generic code always operates on boxed values
- Monomorphic code can take advantage of unboxed representations
- ☺ Efficient register-based calling conventions
- ☹ Does not support deep unboxing (e.g. arrays of unboxed elements)
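The following sketch writes out by hand the coercions that such a compiler would insert automatically. It is a model under stated assumptions: `boxed_float`, `box`, `unbox`, `twice`, `half`, and `quarter` are hypothetical names, and in a real implementation the wrapping happens in the compiler's intermediate code rather than in the source.

```ocaml
(* Hypothetical model of a boxed float. *)
type boxed_float = { payload : float }
let box (x : float) : boxed_float = { payload = x }
let unbox (b : boxed_float) : float = b.payload

(* Generic code: knows nothing about its argument, so it only ever sees
   values in the uniform (boxed) representation. *)
let twice (f : 'a -> 'a) (x : 'a) : 'a = f (f x)

(* Monomorphic code: works directly on raw floats. *)
let half (x : float) : float = x /. 2.0

(* Using [twice] at type float is a specialization point.
   [half] is wrapped so that it accepts and returns boxed floats,
   the argument is boxed on the way in, and the result unboxed on the way out. *)
let quarter (x : float) : float =
  unbox (twice (fun b -> box (half (unbox b))) (box x))

let () = Printf.printf "%g\n" (quarter 8.0)
```

Note that the wrapped function boxes and unboxes on every call made by `twice`, which is where the overhead discussed later in the talk comes from.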
Run-time type inspection
- Run-time representation of static types maintained
  - Extra arguments for polymorphic functions
  - Extra fields for structures
- Generic code inspects those run-time type expressions
- ☺ Supports arbitrary unboxing in data structures
- ☹ Not very good with register-based calling conventions
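A source-level approximation of this scheme can be written with a GADT that plays the role of the run-time type expression. This is an illustrative sketch only: the `ty` type and `to_string` function are assumptions, and the compilers discussed in the talk pass type information internally rather than through a user-visible data type.

```ocaml
(* Hypothetical run-time representation of static types. *)
type _ ty =
  | Int : int ty
  | Float : float ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty

(* A polymorphic function receives the type representation as an extra
   argument and inspects it to decide how to treat the value. *)
let rec to_string : type a. a ty -> a -> string =
  fun ty v ->
    match ty with
    | Int -> string_of_int v
    | Float -> Printf.sprintf "%.1f" v
    | Pair (ta, tb) ->
      let (a, b) = v in
      "(" ^ to_string ta a ^ ", " ^ to_string tb b ^ ")"

let () = print_endline (to_string (Pair (Int, Float)) (3, 2.5))
```

The call site has to build `Pair (Int, Float)` before calling, and the callee has to match on it: these are the type-building and type-inspection costs.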
Tag-based unboxing
- Used in dynamically-typed languages
- Type information is attached to the data structure
- Small set of base types, encoded at the bit level
Tag-based unboxing: OCaml
- 1-bit tagging
- Two kinds of arrays
  - Arrays of tagged ints or pointers
  - Arrays of unboxed floats
- Arrays with a concrete type: generate code for accessing arrays of pointers or floats
- Arrays with statically unknown type: test tag at run-time, and if float array, perform unboxing of floats
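The run-time tag test can be observed from OCaml itself through the standard `Obj` module. The sketch below is for illustration only: `Obj` is unsafe and not meant for regular code, the helper names `is_float_array` and `describe` are assumptions, and the result for float arrays assumes the default compiler configuration, in which float arrays are stored unboxed.

```ocaml
(* Obj.tag reads the tag stored in a block's header;
   Obj.double_array_tag is the tag used for arrays of unboxed floats. *)
let is_float_array (a : 'a array) : bool =
  Obj.tag (Obj.repr a) = Obj.double_array_tag

(* Generic code in the style of the slide: test the tag, then pick the path. *)
let describe (a : 'a array) : string =
  if is_float_array a then "unboxed floats"
  else "tagged ints or pointers"

let () =
  print_endline (describe [| 1.0; 2.0 |]);
  print_endline (describe [| 1; 2 |])
```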
Type-directed unboxing overhead
Coercions
- Often, no overhead (boxing+unboxing would’ve happened anyway)
- Some examples show long sequences of successive unboxing+boxing before the data is actually used
Run-time type inspection
Can anyone guess what the sources of overhead for RTTI are?
- More arguments to pass
- Heap allocations to build tree of type expressions
- Testing the type expressions
- “Several techniques have been proposed to reduce overhead of type building or type inspection, but not both.”
Tag-based unboxing
- Shares some of the costs of RTTI, but not all
- In OCaml, tags are stored with GC information
- No overhead to function calls
- Run-time cost relatively small (one load, one compare)
- Extra conditional branches
- E.g. in OCaml 1.05, polymorphic array copy is 10x slower than int array copy, and 8x slower than float array copy
- In OCaml 4.00, naïve polymorphic array copy is ~2.5x slower than either int or float array copy
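For reference, the kind of code being compared might look like the sketch below. The function names are assumptions and the exact benchmark source is not given in the slides; no timings are claimed here, only the difference in what the compiler knows about the element type.

```ocaml
(* Polymorphic copy: the element type is unknown at compile time, so each
   access has to check whether the array holds unboxed floats. *)
let copy_poly (src : 'a array) (dst : 'a array) : unit =
  for i = 0 to Array.length src - 1 do
    dst.(i) <- src.(i)
  done

(* Monomorphic copies: the element type is known, so the compiler can
   emit the direct access sequence for that representation. *)
let copy_int (src : int array) (dst : int array) : unit =
  for i = 0 to Array.length src - 1 do
    dst.(i) <- src.(i)
  done

let copy_float (src : float array) (dst : float array) : unit =
  for i = 0 to Array.length src - 1 do
    dst.(i) <- src.(i)
  done
```

The three bodies are textually identical; only the type annotations differ, which is what lets the compiler choose between the generic and specialized access paths.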
Type-directed unboxing GC overhead
Overhead in GC
What might be a source of overhead (and headaches) with an unboxing strategy?
Getting the roots in the stack
- Without unboxing, all values on the stack are either tagged ints or pointers
- With unboxing, some values are unboxed ints or floats
- Need to distinguish between boxed and unboxed values
- One possibility (used by OCaml): maintain a table of the pointers in the frame
Mixture of pointers and raw data in blocks
- With some unboxing strategies, heap blocks will contain pointers interleaved with unboxed values
- E.g. heap block containing a string * float * int list value
  - The string and list are boxed
  - The float is unboxed
- Maintain a table of the primitive types (pointer, int, float) in the block header
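As a rough illustration of such a table, the sketch below models a block layout as an array of field kinds and shows how a collector could use it to find the fields it must follow. The `field_kind` type, the `layout` value, and `pointer_fields` are hypothetical, and the model ignores field widths (on some platforms a float is wider than a pointer).

```ocaml
(* Hypothetical per-field descriptor kept with a heap block. *)
type field_kind = Pointer | Int | Float

(* Layout of a [string * float * int list] block with the float unboxed. *)
let layout : field_kind array = [| Pointer; Float; Pointer |]

(* The collector follows only the fields marked as pointers. *)
let pointer_fields (layout : field_kind array) : int list =
  let acc = ref [] in
  Array.iteri (fun i k -> if k = Pointer then acc := i :: !acc) layout;
  List.rev !acc
```

For the layout above, `pointer_fields layout` selects fields 0 and 2 and skips the raw float in field 1.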
Untyped unboxing
Local unboxing
- Boxing and unboxing that cancel each other out in the same function body are eliminated by a dataflow analysis
- How many boxing/unboxing operations in the following example?

    let f (a : float array) (x : float) =
      let y = a.(0) *. x in
      y +. 1.0

- Simple and very effective on numerical code
- Could be extended to inter-procedural analysis
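One way to explore the question is to write the boxing steps out explicitly and count them. The sketch below does that under a deliberately naive model, which is an assumption rather than a description of any real compiler: every float-producing operation boxes its result, every float-consuming operation unboxes its operands, the argument `x` arrives boxed, and the result must be returned boxed. All names (`boxed`, `box`, `unbox`, `f_naive`, `f_local`, `count`) are hypothetical.

```ocaml
(* Hypothetical boxed float, with counters for the two operations. *)
type boxed = { payload : float }
let boxes = ref 0
let unboxes = ref 0
let box (x : float) : boxed = incr boxes; { payload = x }
let unbox (b : boxed) : float = incr unboxes; b.payload

(* Naive translation: every intermediate float goes through a box. *)
let f_naive (a : float array) (x : boxed) : boxed =
  let a0 = box a.(0) in                  (* the array read boxes its result *)
  let y = box (unbox a0 *. unbox x) in   (* multiply: 2 unboxings, 1 boxing *)
  box (unbox y +. 1.0)                   (* add: 1 unboxing, 1 boxing *)

(* After local unboxing: box/unbox pairs inside the body cancel out. *)
let f_local (a : float array) (x : boxed) : boxed =
  box (a.(0) *. unbox x +. 1.0)

(* Run [f] once and report (result, boxings, unboxings). *)
let count (f : float array -> boxed -> boxed) : float * int * int =
  let x = box 3.0 in
  boxes := 0; unboxes := 0;
  let r = f [| 2.0 |] x in
  (r.payload, !boxes, !unboxes)
```

In this model the naive version performs three boxings and three unboxings, while the locally unboxed version performs one of each: the incoming argument is unboxed once and the final result is boxed once.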
Known functions and partial inlining
- Functions in ML are usually curried

    let f a b c = a + b + c
    =>
    let f =
      fun a ->
        fun b ->
          fun c -> a + b + c

- Have two entry points: standard (curried) and quick (all arguments supplied)
- A control-flow analysis can determine if all arguments are supplied, and use the quick entry point
- In the OCaml test suite, 80% to 100% of all function calls use the quick entry point
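The difference between the two kinds of call sites is visible at the source level, even though the choice of entry point is made by the compiler. In the sketch below, the names `full`, `add_three`, and `later` are illustrative assumptions.

```ocaml
let f a b c = a + b + c

(* Equivalent curried form: each application returns a new function. *)
let f_curried = fun a -> fun b -> fun c -> a + b + c

(* Full application of a known function: all three arguments are present,
   so a compiler can jump straight to the body (the "quick" entry point). *)
let full = f 1 2 3

(* Partial application: only two arguments are present, so the call must go
   through the standard curried path and produce a closure. *)
let add_three = f 1 2
let later = add_three 3

let () = Printf.printf "%d %d %d\n" full later (f_curried 1 2 3)
```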
Experimental results
Match-ups
- Gallium 1 vs Gallium 1
  - One version is using coercion-based unboxing
  - The other is using fully boxed, tagged data representations
- Gallium 2 vs OCaml
  - Gallium 2: coercion-based, tag-based unboxing of float arrays
  - OCaml: mostly-tagged data representation, local unboxing of floats, multiple function entry points, tag-based unboxing of float arrays
Gallium 1 vs Gallium 1

| Test | Unboxing | Boxed | Test type |
|---|---|---|---|
| takeushi | 3.00 | 5.09 | fun calls, int arith |
| integral | 0.80 | 2.83 | float arith, loops |
| sumlist | 3.60 | 3.45 | lists, int arith |
| sieve | 1.00 | 0.94 | int arith, lists, polymorphism |
| boyer | 1.80 | 2.76 | fun calls, symbols |
| knuth-bendix | 0.90 | 0.98 | symbols, polymorphism |
| quad quad succ | 6.58 | 2.40 | Church numbers |

- Unboxing strategy yields a noticeable performance boost in many tests
- quad quad succ shows off one of the performance overheads of coercion-based unboxing
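The slides do not show the benchmark source. A plausible reading, offered here as an assumption, is that `quad` is the Church numeral four and that the test applies it to itself and then to the successor function:

```ocaml
(* Assumed definition: the Church numeral 4 applies f four times. *)
let quad (f : 'a -> 'a) (x : 'a) : 'a = f (f (f (f x)))

(* [quad quad] composes [quad] with itself four times, so [succ] ends up
   being applied 4^4 = 256 times. [succ] is the standard integer successor. *)
let result = quad quad succ 0

let () = Printf.printf "%d\n" result
```

Here the polymorphic `quad` is used both at a function type and at `int`, so with coercion-based unboxing each instantiation is a specialization point where wrappers may be inserted around the functions being passed. Stacking those wrappers is one way the long coercion chains mentioned earlier can arise.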
Gallium 2 vs OCaml

| Test | Gallium 2 | OCaml | Description |
|---|---|---|---|
| bdd | 19.0 | 12.3 | term processing, hash tables |
| bdd * | 17.8 | 11.0 | bdd, bounds checking off |
| boyer | 0.52 | 0.62 | term processing, fun calls |
| fft | 3.49 | 2.00 | float arith, float arrays |
| fft * | 2.02 | 1.58 | fft, bounds checking off |
| fib | 0.33 | 0.34 | int arith, fun calls |
| genlex | 0.69 | 0.76 | lexing, parsing, symbols |
| knuth-bendix | 3.00 | 2.47 | term processing, fun calls |
| mandelbrot | 2.52 | 7.31 | float arith, loops |
| nucleic | 0.88 | 0.89 | float arith, trees |
| quad quad succ | 0.53 | 0.12 | Church numerals, polymorphism |
| quicksort | 1.44 | 0.65 | int arrays, loops |
| quicksort * | 0.54 | 0.43 | quicksort, bounds checking off |
| sieve | 1.03 | 1.01 | int arith, lists |
| solitaire | 1.51 | 0.56 | arrays, loops |
| solitaire * | 0.41 | 0.38 | solitaire, bounds checking off |
| takeushi | 0.41 | 0.39 | int arith, fun calls |

- Despite a less sophisticated unboxing strategy, OCaml matches or beats Gallium 2 in most tests
- Floating-point tests (fft, nucleic) show that the local unboxing strategy of OCaml is just as effective as the more general strategy of Gallium 2
- Gallium 2 is significantly faster on only one test (mandelbrot), because it removes 2 levels of indirection while OCaml removes only 1
〈/presentation〉