Page 1:

Database scalability and indexes

Goetz Graefe

Hewlett-Packard Laboratories

Palo Alto, CA – Madison, WI

Page 2:

Dimensions of scalability

• Data size – cost per terabyte ($/TB)

• Information complexity (database schema size)

• Operational scale (data sources & transformations)

• Multi-programming level (many queries)

• Concurrency (updates, roll-in load, roll-out purge)

• Query complexity (tables, operations, parameters)

• Representation (indexing) complexity

• Storage hierarchy (levels, staging)

• Hardware architecture (e.g., parallelism)

Page 3:

Agenda

• Indexing taxonomy

• B-tree technology

Page 7:

Balancing bandwidths

• Disk, network, memory, CPU processing
– Decompression, predicate evaluation, copying

• Table scans
– Row stores, column stores
– NSM versus PAX versus ?

• Index scans
– Range queries, look-ups, MDAM

• Intermediate results
– Sort, hash join, hybrid hash join, etc.

How many disks per CPU core?

Flash devices or traditional disks?
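As a rough illustration of balancing these bandwidths, the "how many disks per CPU core?" question reduces to matching a core's consumption rate against per-device scan bandwidth. Every constant in this sketch is an assumption for illustration, not a number from the talk:

```python
# Back-of-envelope bandwidth balancing; all figures are assumed.
disk_bandwidth_mb_s = 100          # assumed sequential scan bandwidth per disk
cpu_rows_per_s = 2_000_000         # assumed rows one core can decompress + filter
row_size_bytes = 200               # assumed average row size

core_consumption_mb_s = cpu_rows_per_s * row_size_bytes / 1e6
disks_per_core = core_consumption_mb_s / disk_bandwidth_mb_s
print(f"one core consumes ~{core_consumption_mb_s:.0f} MB/s, "
      f"so it keeps ~{disks_per_core:.0f} disks busy")
```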

Page 9:

Hardware support

• CPU caches
– Alignment, data organization
– Prefetch instructions

• Instructions for large data
– Quadwords, etc.

• Native encoding
– Avoid decimal numerics

• GPUs? FPGAs?

Binary search or interpolation search?

Avoid XML?
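The "binary search or interpolation search?" callout can be made concrete. This generic sketch (not from the slides) assumes roughly uniformly distributed numeric keys, which is when interpolation search pays off:

```python
def interpolation_search(keys, target):
    """Search a sorted list of integer keys; assumes a roughly uniform key
    distribution so the first probe lands close to the target."""
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[hi] == keys[lo]:                     # avoid division by zero
            break
        # probe position proportional to the target's place in the key range
        pos = lo + (hi - lo) * (target - keys[lo]) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return pos
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return lo if lo < len(keys) and keys[lo] == target else -1

print(interpolation_search(list(range(0, 1000, 10)), 730))   # -> 73
```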

Page 11:

Read-ahead and write-behind

Buffer pool = latency × bandwidth

• Disk-order scans
– Guided by allocation information

• Index-order scans
– Guided by parent & grandparent levels
– Avoid neighbor pointers in B-tree leaves

• Index-to-index navigation
– Sort references prior to index nested loops join
– Hint references from query execution to storage layer

More I/O requests than devices!
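The rule of thumb "buffer pool = latency × bandwidth" is a bandwidth-delay product. A small sketch with assumed device characteristics shows why each device needs several outstanding read-ahead requests:

```python
# Bandwidth-delay product: buffer space that keeps one device busy.
# The device characteristics below are assumptions for illustration.
latency_s = 0.010                  # assumed ~10 ms access latency
bandwidth_bytes_s = 100e6          # assumed ~100 MB/s transfer rate
page_bytes = 64 * 1024             # assumed 64 KB pages

in_flight_bytes = latency_s * bandwidth_bytes_s      # 1 MB per device
in_flight_pages = in_flight_bytes / page_bytes       # ~15 outstanding pages
print(f"~{in_flight_bytes/1e6:.0f} MB, i.e. ~{in_flight_pages:.0f} "
      "outstanding read-ahead requests per device to hide latency")
```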

Page 13:

“Fail fast” and fault isolation

• Local slow-down produces asymmetry
– Weakest node imposes global slow-down

• Enable asynchrony in I/O and in processing

• Enable incremental load balancing
– Schedule multiple work units per server
– Largest first, assign work as servers free up

25 work units for 8 servers: S, J, etc. first – Q, Z, Y, X last
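The load-balancing rule above (largest work units first, each assigned to whichever server frees up next) is the classic longest-processing-time heuristic. A minimal sketch with made-up work-unit costs:

```python
import heapq

def largest_first(work_units, servers):
    """Assign (name, cost) work units: sort largest first, then give each
    unit to the server that becomes free earliest."""
    heap = [(0.0, s) for s in range(servers)]        # (finish time, server id)
    heapq.heapify(heap)
    schedule = {s: [] for s in range(servers)}
    for name, cost in sorted(work_units, key=lambda u: -u[1]):
        finish, s = heapq.heappop(heap)              # server that frees up first
        schedule[s].append(name)
        heapq.heappush(heap, (finish + cost, s))
    return schedule, max(f for f, _ in heap)         # makespan = slowest server

# hypothetical sizes for 25 work units A..Y on 8 servers
units = [(chr(ord('A') + i), 25 - i) for i in range(25)]
plan, makespan = largest_first(units, 8)
print(makespan, plan[0])
```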

Page 15:

Scheduling in query execution

• Admission control – too much concurrency

• Degree of parallelism – match available cores

• Pipelining of operations – avoid thrashing

• “Slack” between producers and consumers
– Partitioning: output buffer per consumer
– Merging: input buffer per producer
– “Free” packets to enable asynchronous execution
– 512 × 512 × 4 × 64 KB = 2^36 B = 64 GB

Lower memory need with more synchronization?
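The memory figure in the last bullet can be checked directly, interpreting it (as an assumption) as producers × consumers × packets per pair × packet size:

```python
producers, consumers = 512, 512    # example degree of parallelism from the slide
packets_per_pair = 4               # packets per producer/consumer pair (assumed meaning)
packet_bytes = 64 * 1024           # 64 KB packets

slack_bytes = producers * consumers * packets_per_pair * packet_bytes
print(slack_bytes == 2**36, slack_bytes / 2**30)   # True, 64.0 GiB of buffer slack
```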

Page 17:

Synchronization in communication

• “Slack” is a bad place to save memory!

• Demand-driven versus data-driven execution
– Faster producer will starve for free packets
– Faster consumer will starve for full packets
– Slowest step in pipeline determines bandwidth

Page 19:

Bad algorithms in query execution

• Query optimization versus query execution
– Compile-time versus run-time
– Anticipated sizes, memory availability, etc.

• Fast execution with perfect query optimization
– Merge join: sorted indexes, sorted intermediate results
– Hash join

• Robust execution by run-time adaptation
– Index nested loops join
– Requires some innovation …

Page 20:

Query

• Varying predicate selectivity together or separately

• Forced plans – focus on robustness of execution
– Resource management (memory allocation)
– Index use, join algorithm, join order

select count (*) from lineitem where l_partkey >= :lowpart and l_shipdate >= :lowdate

Page 21:

Physical database

• Primary index on order key, line number

• 1-column (non-covering) secondary indexes
– Foreign keys, date columns

• 2-column (covering) secondary indexes
– Part key + ship date, ship date + part key

• Large plan space
– Table scan
– Single index + fetch from table
– Join two indexes to cover the query
– Exploit two-column indexes

Page 22:

Wildly different performance curves

[Chart: "Single-table execution times", time in seconds versus row count, comparing the scan plan, fetch plan, join plan, "Fetch 9115", hash join, merge join, and join + fetch plans.]

Page 23:

Observations

• Table scan is very robust but not efficient
– Materialized views should enable fetching query results

• Traditional fetch is very efficient but not robust
– Perhaps addressed with risk-based cost calculation

• Multi-index plans are efficient and robust
– Independent of join order + method (in this experiment)

• Non-traditional fetch is quite robust
– Asynchronous prefetch or read-ahead
– Sorting record identifiers or keys in primary index
– Sort effect seems limited at high end

Page 25:

Hash join vs index nested loops join

• In-memory is an index!
– Direct address calculation
– Thread-private: memory allocation, concurrency control

• Traditional index nested loops join
– Index search using comparisons and binary search
– Shared pages in the buffer pool

• Improved index nested loops join
– Prefetch & pin the index in the buffer pool
– Replace page identifiers with in-memory pointers
– Replace binary search with interpolation search
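A minimal sketch (not the talk's implementation) of two of the listed ideas: sorting the outer references before probing, so index nested loops join touches the inner index in key order, and probing an index that is pinned in memory as plain arrays instead of going through the buffer pool. The binary search here could be replaced by the interpolation search shown earlier:

```python
import bisect

def index_nested_loops_join(outer_rows, key_of, index_keys, index_rows):
    """Sketch: index nested loops join against an index pinned in memory as
    parallel sorted lists (index_keys / index_rows). Probes are sorted first
    so successive look-ups walk the index in key order."""
    results = []
    for row in sorted(outer_rows, key=key_of):            # sort references first
        k = key_of(row)
        i = bisect.bisect_left(index_keys, k)             # binary search probe
        while i < len(index_keys) and index_keys[i] == k:
            results.append((row, index_rows[i]))
            i += 1
    return results

inner_keys = [1, 3, 3, 7, 9]
inner_rows = ["a", "b1", "b2", "c", "d"]
print(index_nested_loops_join([(7, "x"), (3, "y")], lambda r: r[0],
                              inner_keys, inner_rows))
```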

Page 26:

Index maintenance

• Data warehouse: fact table with 3-9 foreign keys
– Non-clustered index per foreign key
– Plus 1-3 date columns with non-clustered indexes
– Plus materialized and indexed views

• Traditional bulk insertion (load, roll-in)
– Per row: 4-12 index insertions, read-write 1 leaf each
– Per disk: 200 I/Os per second, 10 rows/sec = 1 KB/sec

• Known techniques
– Drop indexes prior to bulk insertion?
– Deferred index & view maintenance?
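The slide's bandwidth estimate follows from a short calculation; the row size used to turn rows/sec into KB/sec is an assumption:

```python
indexes_per_row = 10        # within the slide's range of 4-12 index insertions per row
ios_per_index = 2           # read + write one leaf per index insertion
ios_per_second = 200        # random I/Os per second for one traditional disk
row_bytes = 100             # assumed fact-table row size

rows_per_second = ios_per_second / (indexes_per_row * ios_per_index)   # 10 rows/s
print(rows_per_second, rows_per_second * row_bytes / 1000, "KB/s per disk")
```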

Page 27:

Partitioned B-trees

[Diagrams: a traditional B-tree index covering keys a–z; a partitioned B-tree whose partitions #1–#4 each cover keys a–z; and the same index after merging key range a–j, which then resides in partition #0 while keys k–z remain in partitions #1–#4.]
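Partitioned B-trees keep all partitions within a single B-tree by prefixing every key with an artificial partition number, so partitions appear and disappear through ordinary insertions and deletions. A toy sketch in which a sorted list stands in for the B-tree and whole partitions are merged (the real technique can merge key ranges incrementally):

```python
import bisect

class PartitionedBTree:
    """Toy model: one sorted structure whose keys are (partition_no, user_key)."""
    def __init__(self):
        self.entries = []                            # sorted (partition, key) pairs

    def insert(self, partition, key):
        bisect.insort(self.entries, (partition, key))

    def lookup(self, key, partitions):
        # A query must probe every existing partition for the key.
        return [(p, key) for p in partitions if (p, key) in self.entries]

    def merge(self, sources, target=0):
        # Merging moves records into the target partition, restoring one sorted
        # run for those keys, much like a step of an external merge sort.
        moved = [(target, k) for p, k in self.entries if p in sources]
        self.entries = [(p, k) for p, k in self.entries if p not in sources]
        for e in moved:
            bisect.insort(self.entries, e)

t = PartitionedBTree()
for p, k in [(1, "a"), (1, "m"), (2, "c"), (3, "j")]:
    t.insert(p, k)
t.merge({1, 2, 3})        # afterwards all keys sit, fully sorted, in partition #0
print(t.entries)
```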

Page 28:

Algorithms

• Run generation
– Quicksort or replacement selection (priority queue)
– Exploit all available memory, grow & shrink as needed

• Merging
– Like external merge sort, efficient on block-access
– Exploit all available memory, grow & shrink as needed
– Best case: single merge step
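A compact sketch of run generation by replacement selection with a priority queue; the in-memory capacity is a made-up constant, and with random input this tends to produce runs about twice the size of memory:

```python
import heapq

def replacement_selection(records, memory_slots=4):
    """Generate sorted runs: keep `memory_slots` records in a priority queue and
    emit the smallest; a record smaller than the last one written must wait for
    the next run (tagged with run number + 1)."""
    it = iter(records)
    heap = [(0, r) for r in (next(it, None) for _ in range(memory_slots))
            if r is not None]
    heapq.heapify(heap)
    runs, last = {}, {}
    while heap:
        run, rec = heapq.heappop(heap)
        runs.setdefault(run, []).append(rec)
        last[run] = rec
        nxt = next(it, None)
        if nxt is not None:
            # joins the current run only if it can still be emitted in order
            heapq.heappush(heap, (run if nxt >= last[run] else run + 1, nxt))
    return [runs[r] for r in sorted(runs)]

print(replacement_selection([5, 2, 9, 1, 7, 3, 8, 6, 4], memory_slots=4))
# -> [[1, 2, 3, 5, 6, 7, 8, 9], [4]]
```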

Page 29:

Concurrency control and recovery

“Must reads” for database geeks

Page 30:

Concurrency control and recovery

“Should reads” for database geeks

Page 31:

Tutorial on hierarchical locking

• More generally: multi-granularity locking

• Lock acquisition down a hierarchy
– “Intention” locks IS and IX

• Standard example: file & page
– T1 holds S lock on file
– T2 wants IS lock on file, S locks on some pages
– T3 wants X lock on file
– T4 wants IX lock on file, X locks on some pages

Compatibility matrix with intention locks:

     S    X    IS   IX   SIX
S    ok        ok
X
IS   ok        ok   ok   ok
IX             ok   ok
SIX            ok

Basic S/X compatibility matrix:

     S    X
S    ok
X
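A small sketch of the example: each request is checked against the compatibility matrix above. The matrix is the standard one from the slide; the function and variable names are illustrative scaffolding:

```python
# Standard multi-granularity compatibility matrix (pairs that are compatible).
COMPAT = {
    ("S", "S"), ("S", "IS"), ("IS", "S"),
    ("IS", "IS"), ("IS", "IX"), ("IX", "IS"),
    ("IX", "IX"), ("IS", "SIX"), ("SIX", "IS"),
}

def compatible(held, requested):
    return (held, requested) in COMPAT

def can_grant(holders, requested):
    """Grant a request only if it is compatible with every lock already held
    on that resource by other transactions."""
    return all(compatible(h, requested) for h in holders)

# The slide's example: T1 holds S on the file.
file_holders = ["S"]
print(can_grant(file_holders, "IS"))   # T2: IS on file -> True, then S on pages
print(can_grant(file_holders, "X"))    # T3: X on file  -> False
print(can_grant(file_holders, "IX"))   # T4: IX on file -> False, so no X on pages
```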

Page 32:

Quiz

• Why are all intention locks compatible?

• Conflicts are decided more accurately at a finer granularity of locking.

Page 33:

SQL Server lock modes

Page 34:

Lock manager invocations

• Combine IS+S+Ø into SØ (“key shared, gap free”)
– Cut lock manager invocations by factor 2

• Strict application of standard techniques
– No new semantics

Traditional lock modes:

     S    X    IS   IX
S    ok        ok
X
IS   ok        ok   ok
IX             ok   ok

Automatic derivation of combined modes (key mode, gap mode):

     S    X    SØ   ØS   XØ   ØX   SX   XS
S    ok        ok   ok
X
SØ   ok        ok   ok        ok   ok
ØS   ok        ok   ok   ok             ok
XØ                  ok        ok
ØX             ok        ok
SX             ok
XS                  ok
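The derivation itself is mechanical: a combined mode is a (key mode, gap mode) pair, Ø (no lock) conflicts with nothing, and two combined modes are compatible exactly when both components are. A sketch that reproduces the large matrix above; the spellings "SS" and "XX" for the plain S and X modes are shorthand introduced here:

```python
BASE = {("S", "S")}                    # S/X compatibility for a single resource

def base_compatible(a, b):
    return a == "Ø" or b == "Ø" or (a, b) in BASE

def compatible(m1, m2):
    """Combined modes are (key mode, gap mode) pairs, e.g. 'SØ' = key S, gap free.
    Two modes conflict iff they conflict on the key or on the gap."""
    (k1, g1), (k2, g2) = m1, m2
    return base_compatible(k1, k2) and base_compatible(g1, g2)

modes = ["SS", "XX", "SØ", "ØS", "XØ", "ØX", "SX", "XS"]
for m in modes:
    row = ["ok" if compatible(tuple(m), tuple(n)) else "  " for n in modes]
    print(f"{m:3}", " ".join(row))
```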

Page 35:

Key deletion

• User transaction
– Sets ghost bit in record header
– Lock mode is XØ (“key exclusive, gap free”)

• System transaction
– Verifies absence of locks & lock requests
– Erases ghost record
– No lock required, data structure change only
– Absence of other locks is required

Page 36:

Key insertion after deletion

• Insertion finds ghost record
– Clears ghost bit
– Sets other fields as appropriate
– Lock mode is XØ (“key exclusive, gap free”)

• Insertion reverses deletion

Page 37:

Key insertion

• System transaction creates a ghost record
– Verifies absence of ØS lock on low gap boundary (actually compatibility with ØX)
– No lock acquisition required

• User transaction marks the record valid
– Locking the new key in XØ (“key exclusive, gap free”)
– High concurrency among user insertions

• No need for “creative” lock modes or durations

• Insertion mirrors deletion
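A toy end-to-end sketch of the ghost-record protocol on these slides; the record layout, the lock manager, and the boundary between system and user transactions are all heavily simplified stand-ins:

```python
class LockManager:
    """Stub lock manager for the sketch; real conflict checking is omitted."""
    def __init__(self):
        self.locks = {}
    def lock(self, key, mode):
        self.locks.setdefault(key, []).append(mode)

class Record:
    def __init__(self, key):
        self.key, self.ghost, self.value = key, True, None   # created as a ghost

def insert(index, key, value, lm):
    # System transaction: create the ghost slot (structural change, no user lock;
    # it only needs to be compatible with ØX on the preceding gap).
    rec = index.setdefault(key, Record(key))
    # User transaction: lock the key XØ ("key exclusive, gap free"), then
    # turn the ghost into a valid record.
    lm.lock(key, "XØ")
    rec.value, rec.ghost = value, False
    return rec

def delete(index, key, lm):
    # User transaction: lock XØ and set the ghost bit; contents stay in place
    # until a system transaction erases the ghost once all locks are gone.
    lm.lock(key, "XØ")
    index[key].ghost = True

idx, lm = {}, LockManager()
insert(idx, 4711, "payload", lm)
delete(idx, 4711, lm)
print(idx[4711].ghost, lm.locks[4711])    # True ['XØ', 'XØ']
```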

Page 38:

Logging a deletion

• Traditional design
– Small log record in user transaction
– Full undo log record in system transaction

• Optimization
– Single log record for entire system transaction
– With both old record identifier and transaction commit
– No need for transaction undo
– No need to log record contents
– Big savings in clustered indexes

Transaction …, Page …, erase ghost 2; commit!

Page 39:

Logging an insertion

• 1st design
– Minimal log record for ghost creation – key value only
– Full log record in user transaction for update

• 2nd design
– Full user record created as ghost – full log record
– Small log record in user transaction

• Bulk append
– Use 1st design above
– Run-length encoding of multiple new keys

Transaction …, Page …, create ghosts 4-8, keys 4711 (+1)

Page 40:

Summary: key range locking

• “Radically old” design

• Sound theory – no “creative” lock modes
– Strict application of multi-granularity locking
– Automatic derivation of “macro” lock modes
– Standard lock retention until end-of-transaction

• More concurrency than traditional designs
– Orthogonality avoids missing lock modes

• Key insertion & deletion via ghost records
– Insertion is symmetric to deletion
– Efficient system transactions, including logging

Page 41:

Like scalable database indexing

Page 42:

Summary

• Re-think parallel data & algorithms:
– Partitioning: load balancing
– Pipelining: communication & synchronization
– Local execution: algorithms & data structures!

• Re-think power efficiency
– Algorithms & data structures!

• Database query & update processing
– Re-think indexes & their implementation