BDCC: Exploiting Fine-Grained Persistent Memories for OLAP
Peter Boncz
HardBD 2018, Paris
NVRAM
• System integration:
  – NVMe: block devices on the PCIe bus
  – NVDIMM: persistent RAM, byte-level access
• Low latency:
  – lower than Flash
  – close to DRAM
  – asymmetric (read latency < write latency)
• Fine-grained access:
  – 512-byte blocks for NVMe
  – NVDIMM: cache-line granularity
NVRAM: DB impact
• Back to the 5-minute rule:
  – restoring the old balance of latency and bandwidth?
• Many challenges in OLTP:
  – index structures, (in-page) logging
  – ensure consistency, prevent leakage, control wear
⇒ What about OLAP? Should we rethink warehouse storage for low-latency persistent memories?
VLDB Journal 2016, Volume 25, Issue 3, pp. 291–316
BDCC: Bitwise Dimensional Co-Clustering
BDCC: how tables are stored
_bdcc_ column ordering ⇒ works in column stores
[Figure: table rows stored in _bdcc_ order, split into nested partitions (partition1, partition2, partition3, partition10, partition11, partition12, partition100, partition101)]
Bitwise Interleaving = Z-Ordering
Space-filling curve
Computationally cheaper than e.g. the Hilbert curve
Almost as good locality properties
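A minimal Python sketch (not taken from the BDCC implementation) of why Z-ordering is cheap to compute: two 16-bit coordinates are interleaved with a fixed sequence of shifts and masks (the well-known "magic number" bit-spreading trick), whereas a Hilbert curve needs state-dependent rotations at every bit level. The function names are illustrative.

```python
def spread_bits(v: int) -> int:
    """Move bit i of a 16-bit value to bit 2*i, leaving zeros in between."""
    v &= 0x0000FFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v


def z_order(x: int, y: int) -> int:
    """Z-order (Morton) key: bits of x on even positions, bits of y on odd positions."""
    return spread_bits(x) | (spread_bits(y) << 1)


def z_order_loop(x: int, y: int, bits: int = 16) -> int:
    """Straightforward bit-by-bit interleaving, used only to cross-check z_order."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


if __name__ == "__main__":
    # Z-order keys of a 4x4 grid, one row per y value
    for y in range(4):
        print([z_order(x, y) for x in range(4)])
```

The printed grid is [0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]: every aligned 2x2 quadrant occupies four consecutive keys, so a key prefix identifies a multi-dimensional region.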
BDCC – Data Order
• Any bit interleaving of dimensions is possible:
  – round-robin = Z-order
  – major-minor = classical MD index (e.g. DB2)
  – any bit mix in between
• Our automatic algorithms use:
  – round-robin bit interleaving
  – clustering depth based on column densities, typically 32KB (SSD) and 512KB (HDD) blocks
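The interleaving choices above can be sketched in Python, assuming each dimension value has already been mapped to a bin number. A layout is a list of (dimension, bit) pairs, most significant first: `round_robin` gives Z-order, `major_minor` the classical multi-dimensional index order. All names, and the dimension letters d, s, c (borrowed from the BDCC-scan example: date, supplier nation, customer nation), are assumptions for illustration, not BDCC's data structures.

```python
from typing import Dict, List, Tuple

Layout = List[Tuple[str, int]]  # (dimension, bit number), most significant bit first


def round_robin(dims: List[Tuple[str, int]]) -> Layout:
    """Z-order: take the highest remaining bit of each dimension in turn."""
    remaining = dict(dims)
    layout: Layout = []
    while any(remaining.values()):
        for name, _ in dims:
            if remaining[name] > 0:
                layout.append((name, remaining[name]))
                remaining[name] -= 1
    return layout


def major_minor(dims: List[Tuple[str, int]]) -> Layout:
    """Classical multi-dimensional index order: all bits of one dimension, then the next."""
    return [(name, bit) for name, nbits in dims for bit in range(nbits, 0, -1)]


def compose(bins: Dict[str, int], layout: Layout) -> int:
    """Build a _bdcc_ key from per-dimension bin numbers, following any layout."""
    key = 0
    for name, bit in layout:
        key = (key << 1) | ((bins[name] >> (bit - 1)) & 1)
    return key


def show(layout: Layout) -> str:
    """Render a layout in the slides' notation, e.g. 'd3s3c3...'."""
    return "".join(f"{name}{bit}" for name, bit in layout)


if __name__ == "__main__":
    dims = [("d", 3), ("s", 3), ("c", 3)]
    print(show(round_robin(dims)))                               # d3s3c3d2s2c2d1s1c1
    print(show(major_minor([("d", 3), ("c", 3), ("s", 3)])))     # d3d2d1c3c2c1s3s2s1
    print(compose({"d": 5, "s": 5, "c": 6}, round_robin(dims)))  # 462
```

Passing any other list of (dimension, bit) pairs to `compose` yields one of the mixes in between.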
BDCC – What is it?
• Multi-dimensional indexing
  – table indexing: not multimedia (audio, image) indexing here
  – limited number of dimensions (up to 5..7)
• Multi-table clustering (co-clustering)
  – indexing on dimensions from other tables..
  – ..reachable over foreign-key relationships
  – and exploiting common indexing dimensions among tables in operators
• Grouping into MILLIONS of very small groups
  – scattered access patterns ⇒ Flash I/O friendly!
  – clustering: because millions of groups are not possible with partitioning
• Column-store optimized
BDCC – The Idea
[Figure: regions Q1, Q2, Q3, Q4]
How does this help:
• selection?
• order by?
• aggregation?
• FK join?
What BDCC Gives You
Accelerates:
• most selections → selection push-down, correlations
• most groupings
• all foreign-key joins (no matter if dimensions are involved)
  – even removes joins, turning them into selections
• certain order-bys
Mostly through a strong reduction of memory usage, while:
• no storage overhead: every tuple is stored once
• bulk-update friendly
• easy to parallelize query processing
Two Stages of the Project
• Bitwise Dimensional Co-Clustering (BDCC)
  – I/O-level clustering and indexing
  – query processing via PartitionSplit, PartitionRestart
  – published in VLDBJ 2016
• Deep Dimensional Co-Clustering (DDC)
  – additional clustering inside I/O blocks
  – query processing via DDC-Recluster()
  – not yet published, work in progress
BDCC Structures
• BDCC dimension
  – mapping to consecutive integers
  – balancing through histograms and Hu-Tucker
• BDCC table
  – re-ordered primary copy
  – additional _bdcc_ order attribute
• BDCC count table
  – summary table (_bdcc_, _count_)
  – cluster access index
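A minimal Python sketch of the count table as a cluster access index, assuming the table is stored sorted on _bdcc_: a prefix sum over _count_ turns each _bdcc_ value into a tuple offset, so all tuples whose key starts with a given bit prefix form one contiguous range found by binary search. The function names and list-based representation are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate, groupby
from typing import List, Tuple

CountTable = List[Tuple[int, int]]  # (_bdcc_, _count_), ascending _bdcc_


def build_count_table(bdcc_column: List[int]) -> CountTable:
    """Summarize a _bdcc_-sorted column into (_bdcc_, _count_) pairs."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(bdcc_column)]


def tuple_range(count_table: CountTable, prefix: int, prefix_bits: int,
                key_bits: int) -> Tuple[int, int]:
    """Half-open tuple-offset range of all clusters whose key starts with `prefix`."""
    keys = [key for key, _ in count_table]
    # offsets[i] = position of the first tuple of cluster i (prefix sum of counts)
    offsets = [0] + list(accumulate(count for _, count in count_table))
    shift = key_bits - prefix_bits
    first = bisect_left(keys, prefix << shift)
    last = bisect_left(keys, (prefix + 1) << shift)
    return offsets[first], offsets[last]


if __name__ == "__main__":
    column = [0, 0, 1, 3, 3, 3, 8, 9, 9, 15]  # 4-bit _bdcc_ values, already sorted
    ct = build_count_table(column)
    print(ct)                                                        # [(0, 2), (1, 1), (3, 3), (8, 1), (9, 2), (15, 1)]
    print(tuple_range(ct, prefix=0b1, prefix_bits=1, key_bits=4))   # (6, 10)
    print(tuple_range(ct, prefix=0b00, prefix_bits=2, key_bits=4))  # (0, 6)
```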
[Figure: “Dimension Use” ⇒ “Dimension Use” ⇒ “Dimension Use”]
Example
“count total ordered items from Germany per day and supplier”
SELECT o_orderdate, s_name, count(*)
FROM NATION, SUPPLIER, ORDERS, LINEITEM
WHERE n_nationkey=s_nationkey
AND s_suppkey=l_suppkey
AND l_orderkey=o_orderkey
AND n_name='Germany'
GROUP BY o_orderdate, s_name
Relational Algebra Plan
BDCC-scan: scans a BDCC table
In any desired dimension order. Here: 1: orderdate, 2: customer nation, 3: supplier nation
At a desired granularity, using bitmasks:
3+2+3 bits set ⇒ use 8 bits (256 groups)
Pushes down selections: [0,7] = all, [0,3] = all, [5,5] = Germany
BDCC-scan
• extracts _bdcc_ bits ⇒ _gid_ column: d3s3c3d2s2c2d1s1c1 ⇒ d3d2d1c3c2s3s2s1
• delivers tuples ordered on _gid_
• performs selection pushdown ([lo-hi])
Basic idea:
• BDCC-scan delivers a sorted stream, but sorting is free! As fast as a normal scan
• carefully controlled scatter access pattern
• we clustered on |_bdcc_| bits, but can BDCC-scan with fewer
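The _gid_ extraction above amounts to a bit gather plus permutation. This Python sketch uses the slide's layouts (d = order date, s = supplier nation, c = customer nation; digit = bit position, 3 being most significant): it turns the 9-bit _bdcc_ key d3s3c3d2s2c2d1s1c1 into the 8-bit _gid_ d3d2d1c3c2s3s2s1, dropping c1, and `passes` applies the pushed-down ranges [0,7], [0,3], [5,5] listed above. The string layout format and helper names are assumptions, not BDCC's interface.

```python
import re
from typing import List, Tuple

BDCC_LAYOUT = "d3s3c3d2s2c2d1s1c1"  # stored _bdcc_ key, most significant bit first
GID_LAYOUT = "d3d2d1c3c2s3s2s1"     # desired _gid_: date, customer nation, supplier nation


def parse_layout(layout: str) -> List[Tuple[str, int]]:
    """'d3s3' -> [('d', 3), ('s', 3)], most significant bit first."""
    return [(dim, int(bit)) for dim, bit in re.findall(r"([a-z])(\d+)", layout)]


def extract_gid(bdcc: int, bdcc_layout: str = BDCC_LAYOUT,
                gid_layout: str = GID_LAYOUT) -> int:
    """Gather the _bdcc_ bits named in gid_layout, in that order, into a _gid_."""
    src = parse_layout(bdcc_layout)
    n = len(src)
    bit_at = {tok: (bdcc >> (n - 1 - i)) & 1 for i, tok in enumerate(src)}
    gid = 0
    for tok in parse_layout(gid_layout):
        gid = (gid << 1) | bit_at[tok]
    return gid


def passes(gid: int) -> bool:
    """Pushed-down ranges: date [0,7], customer nation [0,3], supplier nation [5,5]."""
    date, cust, supp = gid >> 5, (gid >> 3) & 0b11, gid & 0b111
    return 0 <= date <= 7 and 0 <= cust <= 3 and 5 <= supp <= 5


if __name__ == "__main__":
    bdcc = 0b111001110  # d=5 (101), s=5 (101), c=6 (110), interleaved
    gid = extract_gid(bdcc)
    print(gid, passes(gid))  # 189 True
```

A real scan would precompute shift/mask pairs once per layout instead of parsing strings per key.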
BDCC: Bitwise Dimensional Co-Clustering – Google Talk, 23 May 2017 – Peter Boncz
BDCC FetchScan
• uses the count table to find the needed _bdcc_ ranges
• fetches tuple ranges in a particular order
• returns an ascending _gid_ column in the tuples
BDCC – Query Processing
• Partition-wise operator execution
  – hash-based join, grouping/aggregation
  – better cache utilization
• Sandwich operators ⇒ PartitionSplit & PartitionRestart
  – sideways information passing: PartitionRestart.cross-partition? (_gid_ change)
  ⇒ HashAggr/Join.flush() & PartitionSplit.next-partition()
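A Python sketch of the sandwich idea for aggregation, assuming the input arrives sorted on _gid_ (as BDCC-scan delivers it) and that every group lies entirely inside one _gid_ partition, e.g. because the _gid_ bits are derived from the GROUP BY attributes. An ordinary hash aggregation is kept; a _gid_ change plays the role of PartitionRestart and triggers the flush, so the hash table only ever holds one partition's groups. `sandwich_count` is a hypothetical name, not BDCC's operator API.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple


def sandwich_count(rows: Iterable[Tuple[int, str]]) -> Iterator[Tuple[int, str, int]]:
    """COUNT(*) GROUP BY key over (_gid_, key) rows that arrive sorted on _gid_."""
    for gid, partition in groupby(rows, key=itemgetter(0)):
        table: Dict[str, int] = {}  # fresh, small hash table per _gid_ partition
        for _, key in partition:
            table[key] = table.get(key, 0) + 1
        # partition boundary (PartitionRestart): flush the finished groups
        for key, count in table.items():
            yield gid, key, count


if __name__ == "__main__":
    rows = [(0, "a"), (0, "b"), (0, "a"), (1, "c"), (1, "c"), (2, "d")]
    print(list(sandwich_count(rows)))
    # [(0, 'a', 2), (0, 'b', 1), (1, 'c', 2), (2, 'd', 1)]
```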
BDCC – Performance: Sandwich Operators
• Micro-benchmarks (TPC-H SF10, LINEITEM-ORDERS)
Relational Algebra Plan
Selection Pushdown + Dimension Join Elimination
Co-Clustering Close-up: Part dimension
Common Path = Co-Clustering: Part dimension
Common Dimension = Accelerated Join
[Figure: Part dimension]
BDCC: All FK Joins Accelerated!
[Figure: Part dimension, Date dimension, Nation dimension]
BDCC – Schema Design
• Semi-automatic
• Input: CREATE INDEX() and FOREIGN KEY()
• Schema traversal along foreign-key paths
  – propagation of "index" dimensions
  – weighted according to FK paths
• Automatic creation of dimensions and tables
  – round-robin bit interleaving
  – clustering depth based on column densities, typically 32KB (SSD) and 512KB (HDD) blocks
BDCC – Optimizer
• IDU: Interesting Dimension Uses
  – all dimensions determined by a join, sort or aggregation attribute
• IDO: Interesting Dimension Orders
  – all dimension-order permutations of each IDU
• MDO: Maximal Dimension Orders
  – pruning of dominated sort orders of IDOs
• MDOs represent "interesting orders" for enumeration
BDCC Performance
• TPC-H SF100 execution time for BDCC, cold buffer pool
Much better power scores with much less memory
BDCC Performance
• TPC-H SF1000 execution time for BDCC, cold buffer pool
Much better power scores with much less memory
BDCC – Updates
• Batch update support
  – in-memory buffer
  – "log-structured merge"
BDCC Updates
• TPC-H SF100 update set
• 60% bulk-append speedup compared to cluster trees (ordered projections, using PDTs)
• for many update sets, BDCC only merges with previous updates, instead of a PDT merge with the full table
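A hedged Python sketch of such a batch update path, assuming insert-only batches and rows represented as (_bdcc_, payload): inserts go to an in-memory buffer; a full buffer is sorted on _bdcc_ and merged only into the delta holding previous updates, never into the base table, and scans merge base, delta and buffer on the fly. The class and method names are hypothetical; deletes, modifications and persistence are left out.

```python
import heapq
from typing import List, Tuple

Row = Tuple[int, str]  # (_bdcc_, payload)


def bdcc_of(row: Row) -> int:
    return row[0]


class DeltaStore:
    """Hypothetical LSM-style insert path for a BDCC table."""

    def __init__(self, buffer_limit: int) -> None:
        self.buffer_limit = buffer_limit
        self.buffer: List[Row] = []  # unsorted in-memory inserts
        self.delta: List[Row] = []   # sorted on _bdcc_; all flushed updates

    def insert(self, row: Row) -> None:
        self.buffer.append(row)
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self) -> None:
        run = sorted(self.buffer, key=bdcc_of)
        # merge the new run with previous updates only, not with the base table
        self.delta = list(heapq.merge(self.delta, run, key=bdcc_of))
        self.buffer.clear()

    def scan(self, base: List[Row]) -> List[Row]:
        """Base table, flushed updates and the unflushed buffer, merged on _bdcc_."""
        pending = sorted(self.buffer, key=bdcc_of)
        return list(heapq.merge(base, self.delta, pending, key=bdcc_of))


if __name__ == "__main__":
    base = [(0, "a"), (4, "b"), (9, "c")]
    store = DeltaStore(buffer_limit=2)
    for row in [(5, "x"), (1, "y"), (9, "z")]:
        store.insert(row)
    print(store.scan(base))
    # [(0, 'a'), (1, 'y'), (4, 'b'), (5, 'x'), (9, 'c'), (9, 'z')]
```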
Deep Dimensional Clustering (DDC)
• Idea:
  – make _bdcc_ have as many bits as possible
  – for I/O (BDCC-scan), only use the major bits (groups of ~32KB)
  – note: inside the 32KB tuple block, there is more clustering
• Inside a cache line, tuples tend to belong to the same group
  – Idea: exploit this locality (these deep bits) in operators
    • for really cheap cache partitioning
    • make joins cache-conscious again
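DDC was still work in progress, so this Python sketch only illustrates the stated idea under our own assumptions: tuples in a block are stored in full _bdcc_ order, so the next `deep_bits` bits below the I/O granularity already group them, and cache partitioning reduces to finding the boundaries where those bits change, without hashing or copying tuples. All names and parameters are hypothetical.

```python
from typing import List, Tuple


def deep_partitions(block_keys: List[int], key_bits: int, io_bits: int,
                    deep_bits: int) -> List[Tuple[int, int, int]]:
    """Split a block of _bdcc_ keys into runs sharing the same deep bits.

    Returns (deep_group, start, end) with half-open offsets. If the block is
    sorted on _bdcc_ and lies within one I/O group, each deep group is one run.
    """
    shift = key_bits - io_bits - deep_bits
    mask = (1 << deep_bits) - 1

    def deep_group(key: int) -> int:
        return (key >> shift) & mask

    runs: List[Tuple[int, int, int]] = []
    start = 0
    for i in range(1, len(block_keys) + 1):
        if i == len(block_keys) or deep_group(block_keys[i]) != deep_group(block_keys[start]):
            runs.append((deep_group(block_keys[start]), start, i))
            start = i
    return runs


if __name__ == "__main__":
    keys = [65, 67, 80, 81, 112]  # 8-bit keys; all share the I/O prefix 01
    print(deep_partitions(keys, key_bits=8, io_bits=2, deep_bits=2))
    # [(0, 0, 2), (1, 2, 4), (3, 4, 5)]
```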
DDC Extensions
DDC Performance
[Figure: BDCC vs. DDC]
Conclusion
• BDCC & DDC
  – clever ordering of tables, and co-ordering of tables
  – millions of tiny groups (NVRAM friendly!)
  – all the goodies in one go:
    • fast selections (even cross-table propagation)
    • fast joins, fast group-bys, fast sorts (little RAM needed)
  – sideways information passing sandwich operators
    • no need for new join/aggregation operators
  – query optimization (QOPT) framework that extends interesting orders
  – updatable using LSM ideas
  – data is stored only once
Dimension Construction
Dimension = set of bins
• Range-binning of a domain
• Histogram-based approach
  – needs frequency information
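For the histogram-based approach, a minimal Python sketch of equi-depth range binning: split points are taken from a sorted sample, and `bisect_right` maps each value to a consecutive bin number. This is a generic technique chosen for illustration, not necessarily how BDCC builds its histograms; the function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def equi_depth_boundaries(sample: Sequence[int], num_bins: int) -> List[int]:
    """Split points chosen so each bin gets about len(sample) / num_bins sample values."""
    s = sorted(sample)
    return [s[(i * len(s)) // num_bins] for i in range(1, num_bins)]


def bin_number(value: int, boundaries: List[int]) -> int:
    """Consecutive bin number: bin i holds values v with boundaries[i-1] <= v < boundaries[i]."""
    return bisect_right(boundaries, value)


if __name__ == "__main__":
    b = equi_depth_boundaries(range(100), 4)
    print(b)                                          # [25, 50, 75]
    print([bin_number(v, b) for v in (0, 24, 25, 99)])  # [0, 0, 1, 3]
    skewed = [0] * 70 + list(range(1, 31))
    print(equi_depth_boundaries(skewed, 4))           # [0, 0, 6]
```

With the skewed sample, the frequent value 0 occupies the first split points, so bins 0 and 1 stay empty and 0 lands in bin 2: the skew problem that single-value bins address.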
Assigning Bin Numbers: Naïve Way
• Skew/frequent values (⇒ single-value bins)
value      frequency  code  c2  c1
(null)     .70        000   00  0
HBO        .15        001   00  0
Bachelor   .08        010   01  0
Master     .06        011   01  0
PhD        .01        100   10  1
(c2, c1: the code truncated to its top 2 bits and top 1 bit)
Hu-Tucker Binning
• Frequency-based bin number assignment
Hu-Tucker = order-respecting Huffman coding
value      frequency  code  c3   c2  c1
(null)     .70        0000  000  00  0
HBO        .15        1000  100  10  1
Bachelor   .08        1100  110  11  1
Master     .06        1110  111  11  1
PhD        .01        1111  111  11  1
(c3, c2, c1: the code truncated to its top 3, 2 and 1 bits)
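The effect in both tables can be checked in Python. Instead of the Hu-Tucker algorithm itself, this sketch uses a simple O(n³) dynamic program that computes an optimal order-preserving (alphabetic) prefix code, the same objective Hu-Tucker optimizes; on these frequencies it reproduces the codes above. Codes are right-padded with zeros to a fixed width, and `bin_frequencies` reports the weight per bin when only the top bits are kept. Frequencies are given in percent to keep the arithmetic exact; the function names are illustrative.

```python
from functools import lru_cache
from typing import Dict, List, Sequence, Tuple


def optimal_alphabetic_codes(weights: Sequence[int]) -> List[str]:
    """Order-preserving prefix codes of minimal weighted length, padded to equal width."""
    n = len(weights)
    prefix = [0]
    for w in weights:
        prefix.append(prefix[-1] + w)

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> Tuple[int, int]:
        """(weighted cost, split point) of an optimal subtree over values i..j."""
        if i == j:
            return 0, i
        cost, split = min((best(i, k)[0] + best(k + 1, j)[0], k) for k in range(i, j))
        return cost + prefix[j + 1] - prefix[i], split

    codes = [""] * n

    def assign(i: int, j: int, code: str) -> None:
        if i == j:
            codes[i] = code
            return
        k = best(i, j)[1]
        assign(i, k, code + "0")      # smaller values go left...
        assign(k + 1, j, code + "1")  # ...larger values go right, so order is kept

    assign(0, n - 1, "")
    width = max(len(c) for c in codes)
    return [c.ljust(width, "0") for c in codes]


def bin_frequencies(codes: List[str], weights: Sequence[int], bits: int) -> Dict[str, int]:
    """Total weight per bin when only the top `bits` bits of each code are used."""
    freq: Dict[str, int] = {}
    for code, w in zip(codes, weights):
        freq[code[:bits]] = freq.get(code[:bits], 0) + w
    return freq


if __name__ == "__main__":
    weights = [70, 15, 8, 6, 1]  # (null), HBO, Bachelor, Master, PhD, in percent
    naive = ["000", "001", "010", "011", "100"]
    codes = optimal_alphabetic_codes(weights)
    print(codes)                                 # ['0000', '1000', '1100', '1110', '1111']
    print(bin_frequencies(naive, weights, 1))    # {'0': 99, '1': 1}
    print(bin_frequencies(codes, weights, 1))    # {'0': 70, '1': 30}
```

With naive codes, a 1-bit granularity puts 99% of the tuples into one bin; with the frequency-based codes the split is 70/30, and the 2-bit bins are 70/15/15 instead of 85/14/1.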
Hu-Tucker Dimension Binning
But why is this relevant?
Variety in Data Density of Columns
• l_linestatus: 0.25 b/tuple
• l_comment: 30 b/tuple
Factor 120 difference
What is the optimal BDCC bin size?
• depends on disk block size
• depends on column density
What to do if a query accesses multiple columns of very different densities?
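A worked example of the bin-size question in Python, assuming the densities above are bytes per tuple, groups are equally sized, and the goal is that one group of a column fills at least one block: the usable number of _bdcc_ bits is the largest b with 2^b ≤ column size / block size. The row count 600,000,000 is a hypothetical value, roughly the size of TPC-H LINEITEM at SF100; `granularity_bits` is an illustrative helper.

```python
def granularity_bits(num_tuples: int, bytes_per_tuple: float, block_bytes: int) -> int:
    """Largest b such that each of 2**b equal groups of this column fills a whole block."""
    blocks = num_tuples * bytes_per_tuple / block_bytes
    bits = 0
    while 2 ** (bits + 1) <= blocks:
        bits += 1
    return bits


if __name__ == "__main__":
    n = 600_000_000  # hypothetical row count
    for column, density in (("l_linestatus", 0.25), ("l_comment", 30)):
        print(column, granularity_bits(n, density, 32 * 1024))
    # l_linestatus 12
    # l_comment 19
```

At 32KB blocks, the factor 120 in density becomes a difference of about 7 bits (log2 120 ≈ 6.9) in the best scan granularity.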
Granularity Tuning in BDCC
1. Is an issue during table creation
  – a dimension is used in multiple tables
  – each table needs a different granularity
2. Is an issue during query execution
  – the table is clustered at some granularity
  – given a set of columns to scan: at what granularity to scan the table?
Z-Ordering for Column Stores
There is also a column-store-specific argument for bit interleaving:
• suppose BDCC-scan(T,C1) is efficient at 8 bits, needing sorted access to supplier (s)
• suppose BDCC-scan(T,C2), which selects other columns C2 that are on average much smaller than those in C1, is efficient only up to 5 bits of granularity
Takeaway: column stores need a variable access granularity
• Major-minor clustering leaves the minor dimension unusable for thin columns (C2)
• Bit interleaving (Z-ordering) allows thin-column scans to profit from all dimensions
[Figure: _bdcc_ key shape under major-minor clustering vs. Z-ordering
  major-minor: d3d2d1c3c2c1s3s2s1
  Z-ordering:  d3s3c3d2s2c2d1s1c1
  fast I/O access until bit 8 for BDCC-scan(T,C1) and until bit 5 for BDCC-scan(T,C2)]
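The takeaway can be checked mechanically. This Python sketch lists which dimensions contribute bits to the top k bits of a layout; with the two layouts shown above, a thin-column scan limited to 5 bits sees only d and c under major-minor clustering, but all three dimensions under Z-ordering. The helper name is hypothetical.

```python
import re
from typing import List


def dims_in_prefix(layout: str, bits: int) -> List[str]:
    """Dimensions contributing at least one bit to the top `bits` bits of a layout."""
    dims = re.findall(r"([a-z])\d+", layout)  # 'd3s3' -> ['d', 's']
    seen: List[str] = []
    for dim in dims[:bits]:
        if dim not in seen:
            seen.append(dim)
    return seen


if __name__ == "__main__":
    for name, layout in (("major-minor", "d3d2d1c3c2c1s3s2s1"),
                         ("Z-ordering", "d3s3c3d2s2c2d1s1c1")):
        print(name, dims_in_prefix(layout, 5), dims_in_prefix(layout, 8))
    # major-minor ['d', 'c'] ['d', 'c', 's']
    # Z-ordering ['d', 's', 'c'] ['d', 's', 'c']
```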