Vectorization vs. Compilation in Query Execution Juliusz Sompolski Peter Boncz Marcin Zukowski June 13th, 2011 DaMoN 2011, Athens, Greece


  • Vectorization vs. Compilation in Query Execution

    Juliusz Sompolski, Peter Boncz, Marcin Zukowski

    June 13th, 2011, DaMoN 2011, Athens, Greece

  • 2

    Interpreted DBMS

  • 3

    Vectorization vs. Compilation

    Vectorization

    CIDR 2005

    P. Boncz, M. Zukowski, and N. Nes. MonetDB/X100: Hyper-Pipelining Query Execution. In Proc. CIDR, Asilomar, CA, USA, 2005.

  • 4

    Vectorization vs. Compilation

    JIT compilation

    Vectorization

    CIDR 2005

    P. Boncz, M. Zukowski, and N. Nes. MonetDB/X100: Hyper-Pipelining Query Execution. In Proc. CIDR, Asilomar, CA, USA, 2005.

  • 5

    Vectorization vs. Compilation

    • Sure :-). These are orthogonal techniques, and they can be combined.

    • Our study: Is it worth combining them?
      – If you have vectorization (us!), should you do compilation?
      – If you have compilation, should you process data in vectors?

    • Our answer: Yes it is!

  • 6

    Compilation: Single-loop

    • Compilation as proposed so far is “single-loop” compilation.
      – Processing as in a tuple-at-a-time system.

    for each tuple
        if (oid >= 100 && oid …

    … >= 100 AND oid …
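    A minimal C sketch of what single-loop compiled code for such a range selection could look like. The slide's own code is cut off after the first comparison, so the upper bound `hi`, the `val` column, and the `val * 2` projection are assumptions made purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fused query:
 *   SELECT val * 2 FROM t WHERE oid >= lo AND oid <= hi
 * One loop visits one tuple at a time; the filter and the projection
 * are fused into the same loop body, with no intermediate results. */
size_t single_loop_select(const int32_t *oid, const int32_t *val, size_t n,
                          int32_t lo, int32_t hi, int32_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (oid[i] >= lo && oid[i] <= hi) {  /* both predicates in one branch */
            out[k++] = val[i] * 2;           /* projection evaluated inline */
        }
    }
    return k;                                /* number of output tuples */
}
```

    Calling `single_loop_select(oid, val, n, 100, 200, out)` fills `out` with the projected values of the qualifying tuples and returns how many there are.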

  • 7

    Vectorization: Multi-loop

    • Vectorization is “multi-loop” by definition.
      – Basic operations performed vector-at-a-time.
      – Interpretation overhead amortized.
      – Materialization of each step’s result.

    while (tuples)
        Get vector of n tuples;
        for (i = 0, m = 0; i … >= 100) sel[m++] = i;
        for (i = 0, k = 0; i …
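    The multi-loop structure can be sketched in C as a chain of small primitives, each a tight loop over one vector that materializes its result (here a selection vector) for the next step. The primitive names, the vector size of 1024, the second predicate, and the `* 2` projection are assumptions for illustration, not the original slide code.

```c
#include <stddef.h>
#include <stdint.h>

#define VECTOR_SIZE 1024  /* assumed vector size */

/* Primitive 1: positions i with col[i] >= bound go into sel[]. */
size_t sel_ge(const int32_t *col, size_t n, int32_t bound, uint32_t *sel)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (col[i] >= bound) sel[m++] = (uint32_t)i;
    return m;
}

/* Primitive 2: refine an existing selection vector with col[...] <= bound. */
size_t sel_le_sel(const int32_t *col, const uint32_t *in, size_t m,
                  int32_t bound, uint32_t *out)
{
    size_t k = 0;
    for (size_t j = 0; j < m; j++)
        if (col[in[j]] <= bound) out[k++] = in[j];
    return k;
}

/* Primitive 3: projection col * 2, computed only for selected positions. */
void map_mul2_sel(const int32_t *col, const uint32_t *sel, size_t k,
                  int32_t *res)
{
    for (size_t j = 0; j < k; j++)
        res[j] = col[sel[j]] * 2;
}

/* Driver: "while (tuples) get vector of n tuples; run each primitive". */
size_t vectorized_select(const int32_t *oid, const int32_t *val, size_t total,
                         int32_t lo, int32_t hi, int32_t *out)
{
    uint32_t sel1[VECTOR_SIZE], sel2[VECTOR_SIZE];
    size_t produced = 0;
    for (size_t off = 0; off < total; off += VECTOR_SIZE) {
        size_t n = total - off < VECTOR_SIZE ? total - off : VECTOR_SIZE;
        size_t m = sel_ge(oid + off, n, lo, sel1);           /* step 1 */
        size_t k = sel_le_sel(oid + off, sel1, m, hi, sel2); /* step 2 */
        map_mul2_sel(val + off, sel2, k, out + produced);    /* step 3 */
        produced += k;
    }
    return produced;
}
```

    An interpreter would pay its dispatch cost once per primitive call, so that cost is spread over all tuples in the vector. The price is the materialized `sel1` and `sel2` arrays between steps.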

  • 8

    Multi-loop compilation

    • Multi-loop compilation is often best!
      – Compiling small fragments takes less compilation time and is more reusable.
      – Sometimes the benefits of a tight loop are bigger than the materialization cost.

    while (tuples)
        Get vector of n tuples;
        for (i = 0, m = 0; i … >= 100) sel[m++] = i;
        for (i = 0, k = 0; i …
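    One way to picture multi-loop compilation, as a hedged C sketch: a small generated fragment fuses the two comparisons into one per-vector loop with the constants baked in, and a pre-compiled generic primitive does the rest. The function names and the upper bound of 200 are hypothetical; which operations to fuse is a per-query decision that this sketch does not try to model.

```c
#include <stddef.h>
#include <stdint.h>

/* Fragment a JIT compiler might emit for this particular query: both
 * comparisons fused into one tight loop, constants baked in.
 * The upper bound 200 is an assumed value. */
size_t jit_sel_oid_range(const int32_t *oid, size_t n, uint32_t *sel)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (oid[i] >= 100 && oid[i] <= 200) sel[m++] = (uint32_t)i;
    return m;
}

/* Pre-compiled, reusable vectorized primitive: gather selected values. */
void fetch_sel(const int32_t *col, const uint32_t *sel, size_t m, int32_t *res)
{
    for (size_t j = 0; j < m; j++)
        res[j] = col[sel[j]];
}

/* Per-vector pipeline: one compiled loop, then one generic loop.
 * The selection vector is still materialized between the two. */
size_t process_vector(const int32_t *oid, const int32_t *val, size_t n,
                      uint32_t *sel, int32_t *res)
{
    size_t m = jit_sel_oid_range(oid, n, sel);
    fetch_sel(val, sel, m, res);
    return m;
}
```

    Compared with one primitive per comparison, this saves one intermediate selection vector. The processing still stays vector-at-a-time, and the generated fragment is small.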

  • 9

    Case studies

    • Projections

    • Selections

    • Hash lookups

  • 10

    Case studies

    • Projections

    • Selections

    • Hash lookups

    Multi-loop on modern hardware:

    • Easier SIMD

    • Avoids branch mispredictions

    • Improves memory access pattern
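    To illustrate the point about branch mispredictions, here is a small C sketch of a selection primitive written two ways. The function names are hypothetical. Whether the compiler really emits a conditional jump for the first form, or avoids one for the second, is not guaranteed by the C language and depends on the compiler and its flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Branching form: the write to sel[] is control-dependent on the data,
 * so with unpredictable selectivity the CPU may mispredict often. */
size_t sel_ge_branching(const int32_t *col, size_t n, int32_t bound,
                        uint32_t *sel)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (col[i] >= bound) sel[m++] = (uint32_t)i;
    return m;
}

/* Predicated form: always write the candidate position, then advance the
 * output cursor by 0 or 1. The data dependency replaces the branch.
 * sel[] must have room for n entries. */
size_t sel_ge_predicated(const int32_t *col, size_t n, int32_t bound,
                         uint32_t *sel)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        sel[m] = (uint32_t)i;
        m += (col[i] >= bound);   /* comparison yields 0 or 1 */
    }
    return m;
}
```

    Both functions return the same selection vector. A simple single-purpose loop like this is easy to write in the predicated form, which is harder to do once several predicates and a projection share one loop body.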

  • 11

    Hash lookup algorithm

    pos = B[hash_keys(probe_keys)]
    if (pos) {                              // pos == 0 reserved for miss.
        do {
            if (keys_equal(probe_keys, V[pos].keys)) {
                fetch_value_columns(V[pos]);
                break;                      // match
            }
        } while (pos = next in chain);      // collision or miss
    }

  • 12

    Hash lookup algorithm

    Interpretation:

    • Type of keys.

    • Multi-attribute keys.

    • Type of fetched columns.

    • Number of fetched columns.
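    For concreteness, the lookup pseudocode can be made runnable in C once the interpreted parts are fixed. This sketch assumes a single `int32_t` key, a single `int32_t` value column, a power-of-two bucket count, and a multiplicative hash. All of these, along with the type and function names, are illustrative choices and not taken from the slides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int32_t  key;
    int32_t  value;
    uint32_t next;      /* next position in the chain, 0 = end of chain */
} Entry;

typedef struct {
    uint32_t *B;        /* bucket heads: positions into V, 0 = empty */
    uint32_t  nbuckets; /* assumed to be a power of two */
    Entry    *V;        /* V[0] is unused: pos == 0 is reserved for miss */
} HashTable;

static uint32_t hash_key(int32_t key)
{
    return (uint32_t)key * 2654435761u;   /* simple multiplicative hash */
}

/* Insert at a caller-chosen free position pos >= 1, prepending to the chain. */
void ht_insert(HashTable *ht, uint32_t pos, int32_t key, int32_t value)
{
    uint32_t b = hash_key(key) & (ht->nbuckets - 1);
    ht->V[pos] = (Entry){ key, value, ht->B[b] };
    ht->B[b] = pos;
}

/* Probe: walk the chain until the key matches or the chain ends. */
bool ht_lookup(const HashTable *ht, int32_t probe, int32_t *value_out)
{
    uint32_t pos = ht->B[hash_key(probe) & (ht->nbuckets - 1)];
    while (pos) {
        if (ht->V[pos].key == probe) {
            *value_out = ht->V[pos].value;   /* match */
            return true;
        }
        pos = ht->V[pos].next;               /* collision: follow the chain */
    }
    return false;                            /* miss */
}
```

    A generic engine cannot hard-code `int32_t`, one key, and one value column the way this sketch does. Choosing those at run time is the interpretation work listed above.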

  • 13

    Single-loop hash lookup: avoid interpretation

    for (i=0; i …

  • 14

    Single-loop hash lookup: dependencies

    for (i=0; i …

  • 17

    Single-loop hash lookup: branch predictability

    for (i=0; i …

  • 33

    Multi-loop hash lookup

    • Hash vector of k1; rehash vector of k2; fetch vector of pos[] from B

    • Check k1 for pos[]; recheck k2 for pos[]

    • Select miss / match into miss[] and match[]

    • Fetch v1 for match[]; fetch v2 for match[]; fetch v3 for match[]

    • Fetch new pos[] from next in miss[]

    • Loop until pos[] empty

    // base = &V[0].key1;
    for (i=0; i …
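    The pipeline of vectorized primitives for a two-attribute key can be sketched in C as follows. Each primitive is one simple loop over a vector. The row layout, the hash functions, the primitive names, the vector size, and the use of two value columns rather than three are all assumptions made to keep the sketch short.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC 1024  /* assumed vector size; callers must pass n <= VEC */

/* Hash-table row; V[0] is unused because pos == 0 means "miss". */
typedef struct { int32_t k1, k2, v1, v2; uint32_t next; } Row;

/* Hash vector of k1. */
void map_hash(const int32_t *k, size_t n, uint32_t *h) {
    for (size_t i = 0; i < n; i++) h[i] = (uint32_t)k[i] * 2654435761u;
}
/* Rehash vector of k2 into the existing hashes. */
void map_rehash(const int32_t *k, size_t n, uint32_t *h) {
    for (size_t i = 0; i < n; i++) h[i] = (h[i] ^ (uint32_t)k[i]) * 2246822519u;
}
/* Fetch vector of pos[] from B. */
void map_fetch_pos(const uint32_t *B, uint32_t mask, const uint32_t *h,
                   size_t n, uint32_t *pos) {
    for (size_t i = 0; i < n; i++) pos[i] = B[h[i] & mask];
}
/* Check k1 for pos[] (only for active probes). */
void check_k1(const Row *V, const uint32_t *pos, const int32_t *k1,
              const uint32_t *act, size_t na, uint8_t *eq) {
    for (size_t j = 0; j < na; j++) {
        uint32_t i = act[j];
        eq[i] = (V[pos[i]].k1 == k1[i]);
    }
}
/* Recheck k2 for pos[]. */
void recheck_k2(const Row *V, const uint32_t *pos, const int32_t *k2,
                const uint32_t *act, size_t na, uint8_t *eq) {
    for (size_t j = 0; j < na; j++) {
        uint32_t i = act[j];
        eq[i] = (uint8_t)(eq[i] & (V[pos[i]].k2 == k2[i]));
    }
}
/* Select: split active probes into match[] and miss[]; returns match count. */
size_t select_match_miss(const uint8_t *eq, const uint32_t *act, size_t na,
                         uint32_t *match, uint32_t *miss, size_t *n_miss) {
    size_t m = 0, x = 0;
    for (size_t j = 0; j < na; j++) {
        if (eq[act[j]]) match[m++] = act[j]; else miss[x++] = act[j];
    }
    *n_miss = x;
    return m;
}
/* Fetch v1 and v2 for match[]: one loop per fetched column. */
void fetch_v1(const Row *V, const uint32_t *pos, const uint32_t *match,
              size_t nm, int32_t *out) {
    for (size_t j = 0; j < nm; j++) out[match[j]] = V[pos[match[j]]].v1;
}
void fetch_v2(const Row *V, const uint32_t *pos, const uint32_t *match,
              size_t nm, int32_t *out) {
    for (size_t j = 0; j < nm; j++) out[match[j]] = V[pos[match[j]]].v2;
}
/* Fetch new pos[] from next in miss[]; keep probes whose chain continues. */
size_t fetch_next(const Row *V, uint32_t *pos, const uint32_t *miss,
                  size_t nx, uint32_t *act) {
    size_t na = 0;
    for (size_t j = 0; j < nx; j++) {
        uint32_t i = miss[j];
        pos[i] = V[pos[i]].next;
        if (pos[i]) act[na++] = i;
    }
    return na;
}
/* Driver for one vector of probes: loop until no active pos[] remains. */
void probe_vector(const uint32_t *B, uint32_t mask, const Row *V,
                  const int32_t *k1, const int32_t *k2, size_t n,
                  uint8_t *found, int32_t *o1, int32_t *o2) {
    uint32_t h[VEC], pos[VEC], act[VEC], match[VEC], miss[VEC];
    uint8_t eq[VEC];
    size_t na = 0, nm, nx;
    map_hash(k1, n, h);
    map_rehash(k2, n, h);
    map_fetch_pos(B, mask, h, n, pos);
    for (size_t i = 0; i < n; i++) {
        found[i] = 0;
        if (pos[i]) act[na++] = (uint32_t)i;
    }
    while (na > 0) {
        check_k1(V, pos, k1, act, na, eq);
        recheck_k2(V, pos, k2, act, na, eq);
        nm = select_match_miss(eq, act, na, match, miss, &nx);
        fetch_v1(V, pos, match, nm, o1);
        fetch_v2(V, pos, match, nm, o2);
        for (size_t j = 0; j < nm; j++) found[match[j]] = 1;
        na = fetch_next(V, pos, miss, nx, act);
    }
}
```

    In this layout every key check and every column fetch is its own pass over the vector, so a matching row of `V` is touched by several separate loops.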

  • 40

    Single-loop hash lookup

    for (i=0; i …

  • 42

    Multi-loop compiled hash lookup

    Hash/rehash and fetch vector of Pos[] from B

    For each element pos in Pos[]:
        Check keys of V[pos].
        if (match):
            fetch V[pos] val1, val2, val3 into result
        else:
            fetch V[pos] next into new Pos[]

    Repeat until Pos[] empty

  • 43

    Multi-loop compiled hash lookup

    • Independent memory accesses in different loop iterations.

    • Reads tuple once.
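    The multi-loop compiled lookup can be sketched in C as two compiled loops per vector. The first hashes the keys and fetches the bucket heads. The second checks keys and fetches values in a single visit to each row, and is repeated over the shrinking set of unresolved probes. The row layout, hash function, vector size, and names are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC 1024  /* assumed vector size; callers must pass n <= VEC */

/* Hash-table row; V[0] is unused because pos == 0 means "miss". */
typedef struct { int32_t k1, k2, v1, v2, v3; uint32_t next; } Row;

static inline uint32_t hash2(int32_t k1, int32_t k2)
{
    uint32_t h = (uint32_t)k1 * 2654435761u;       /* hash k1 */
    return (h ^ (uint32_t)k2) * 2246822519u;       /* rehash with k2 */
}

void probe_vector_compiled(const uint32_t *B, uint32_t mask, const Row *V,
                           const int32_t *k1, const int32_t *k2, size_t n,
                           uint8_t *found,
                           int32_t *o1, int32_t *o2, int32_t *o3)
{
    uint32_t idx[VEC];  /* probe index of each unresolved lookup */
    uint32_t pos[VEC];  /* its current position in V */
    size_t na = 0;

    /* Loop 1: hash/rehash and fetch Pos[] from B. Iterations are
     * independent, so their loads from B can overlap. */
    for (size_t i = 0; i < n; i++) {
        uint32_t p = B[hash2(k1[i], k2[i]) & mask];
        found[i] = 0;
        if (p) { idx[na] = (uint32_t)i; pos[na] = p; na++; }
    }

    /* Loop 2, repeated until Pos[] is empty: check keys and fetch all
     * values while the row is at hand, or follow the chain. */
    while (na > 0) {
        size_t nn = 0;
        for (size_t j = 0; j < na; j++) {
            uint32_t i = idx[j];
            const Row *r = &V[pos[j]];
            if (r->k1 == k1[i] && r->k2 == k2[i]) {
                o1[i] = r->v1; o2[i] = r->v2; o3[i] = r->v3;
                found[i] = 1;
            } else if (r->next) {
                /* nn <= j, so compacting in place never overwrites
                 * an entry that has not been read yet. */
                idx[nn] = i; pos[nn] = r->next; nn++;
            }
        }
        na = nn;
    }
}
```

    Each pass of the second loop reads a different `V[pos]` per iteration with no dependency between iterations, and all key and value attributes of a row are read in one visit.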

  • 44

    Hash lookup benchmarks

    • Experiment 1: Probing with varying match ratio.

    • Multi-loop compiled is most robust.

  • 45

    Hash lookup benchmarks

    • Experiment 2: Reduced size of B[] array = more hash collisions.

    • Multi-loop compiled is most robust.

  • 46

    Conclusions

    • Multi-loop compilation is often the best solution!
      – Better than vectorization alone.
      – Better than compilation working tuple-at-a-time.

    • More examples and case studies proving this point in the paper.

  • 47

    Thank you!