Scalable Distributed Data Structures: State-of-the-art, Part 1 (Witold Litwin, Paris 9)
![Page 1: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/1.jpg)
Scalable Distributed Data Structures
State-of-the-art, Part 1
Witold Litwin
Paris 9
litwin@dauphine.fr
![Page 2: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/2.jpg)
Plan
What are SDDSs?
Why are they needed?
Where are we in 1996?
– Existing SDDSs
– Gaps & on-going work
Conclusion
– Future work
![Page 3: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/3.jpg)
What is an SDDS
A new type of data structure
– Specifically for multicomputers
Designed for high-performance files:
– horizontal scalability to very large sizes
» larger than any single-site file
– parallel and distributed processing
» especially in (distributed) RAM
– access time better than for any disk file
– 200 µs under NT (100 Mb/s net, 1 KB records)
– distributed autonomous clients
![Page 4: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/4.jpg)
Killer apps
Storage servers
– software & hardware scalable & HA servers
– commodity component based
» Do-It-Yourself RAID
Object storage servers
Object-relational databases
Web servers
– like Inktomi
Video servers
Real-time systems
HP scientific data processing
![Page 5: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/5.jpg)
Multicomputers
A collection of loosely coupled computers
– common and/or preexisting hardware
– share-nothing architecture
– message passing through a high-speed net (Mb/s)
Network multicomputers
– use general-purpose nets
» LANs: Ethernet, Token Ring, Fast Ethernet, SCI, FDDI...
» WANs: ATM...
Switched multicomputers
– use a bus or a switch
» e.g., IBM-SP2, Parsytec
![Page 6: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/6.jpg)
[Figure: a network multicomputer with client and server nodes]
![Page 7: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/7.jpg)
Why multicomputers?
Potentially unbeatable price-performance ratio
– Much cheaper and more powerful than supercomputers
» 1500 WSs at HPL with 500+ GB of RAM & TBs of disks
Potential computing power
– file size
– access and processing time
– throughput
For more pros & cons:
– Bill Gates at Microsoft Scalability Day
– NOW project (UC Berkeley)
– Tanenbaum: "Distributed Operating Systems", Prentice Hall, 1995
– www.microsoft.com White Papers from Business Syst. Div.
![Page 8: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/8.jpg)
Why SDDSs
Multicomputers need data structures and file systems
Trivial extensions of traditional structures are not best
– hot-spots
– scalability
– parallel queries
– distributed and autonomous clients
– distributed RAM & distance to data
![Page 9: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/9.jpg)
Distance to data (Jim Gray)
RAM: 100 ns
distant RAM (gigabit net): 1 µs
distant RAM (Ethernet): 100 µs
local disk: 10 ms
![Page 13: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/13.jpg)
Distance to data
RAM: 100 ns (1 min on a human scale)
distant RAM (gigabit net): 1 µs (10 min)
distant RAM (Ethernet): 100 µs (2 hours)
local disk: 10 ms (8 days, the Moon)
![Page 14: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/14.jpg)
Economy etc.
Price of RAM storage dropped in 1996 almost 10 times!
– $10 for 16 MB (production price)
– $30-40 for 16 MB RAM (end user price)
» $47 for 32 MB (Fry's price, Aug. 1997)
– $1000 for 1 GB
RAM storage is eternal (no mech. parts)
RAM storage can grow incrementally
NT plans for 64b addressing for VLM
MS plans for VLM-DBMS
![Page 15: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/15.jpg)
What is an SDDS
A scalable data structure where:
Data are on servers
– always available for access
Queries come from autonomous clients
– available for access only on their initiative
There is no centralized directory
Clients may make addressing errors
» Clients have a more or less adequate image of the actual file structure
Servers are able to forward the queries to the correct address
– perhaps in several messages
Servers may send Image Adjustment Messages (IAMs)
» Clients do not make the same error twice
![Page 16: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/16.jpg)
An SDDS
[Figure: clients and servers; growth through splits under inserts]
![Page 21: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/21.jpg)
An SDDS
[Figure: clients accessing the servers]
![Page 23: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/23.jpg)
An SDDS
[Figure: clients accessing the servers; an IAM is shown]
![Page 26: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/26.jpg)
Performance measures
Storage cost
– load factor
» same definitions as for the traditional DSs
Access cost
messaging
– number of messages (rounds)
» network independent
– access time
![Page 27: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/27.jpg)
Access performance measures
Query cost
– key search
» forwarding cost
– insert
» split cost
– delete
» merge cost
– parallel search, range search, partial match search, bulk insert...
Average & worst-case costs
Client image convergence cost
New or less active client costs
![Page 28: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/28.jpg)
Known SDDSs
[Figure: taxonomy tree of data structures (DS), starting with the branch "Classics"]
![Page 33: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/33.jpg)
Known SDDSs
[Figure: taxonomy tree of data structures (DS), with Classics on one branch and SDDS (1993) on the other]
SDDS (1993)
– Hash: LH*, DDH, Breitbart & al.
– 1-d tree: RP*, Kroll & Widmayer
– m-d trees: k-RP*, dPi-tree
– H-Avail.: LH*m, LH*g
– Security: LH*s
– s-availability: LH*sa
![Page 34: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/34.jpg)
LH* (A classic)
Allows for primary key (OID) based hash files
– generalizes the LH addressing schema
Load factor 70 - 90 %
At most 2 forwarding messages
– regardless of the size of the file
In practice, 1 message per insert and 2 messages per search on average
4 messages in the worst case
Search time of 1 ms (10 Mb/s net), of 150 µs (100 Mb/s net) and of 30 µs (Gb/s net)
![Page 35: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/35.jpg)
Overview of LH
Extensible hash algorithm
– used, e.g., in
» Netscape browser (100M copies)
» LH-Server by AR (700K copies sold)
– taught in most DB and DS classes
– address space expands
» to avoid overflows & access performance deterioration
The file has buckets with capacity b >> 1
Hash by division $h_i : c \to c \bmod 2^i N$ provides the address h(c) of key c.
Buckets split through the replacement of $h_i$ with $h_{i+1}$; i = 0, 1, ...
On average, b/2 keys move to the new bucket
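A minimal Python sketch of this hash family may make the split rule more concrete. The function name `h`, the default N = 1 and the sample keys are illustrative assumptions. The check shows that replacing $h_i$ with $h_{i+1}$ either leaves a key at its address or moves it exactly $2^i N$ buckets further, so one split feeds only one new bucket.

```python
def h(i: int, c: int, N: int = 1) -> int:
    """Hash by division: h_i(c) = c mod (2^i * N)."""
    return c % (2**i * N)


keys = [35, 12, 7, 15, 24]  # sample keys, for illustration only

for c in keys:
    old, new = h(0, c), h(1, c)
    # Under h_{i+1} a key either stays put or moves 2^i * N buckets up (here i = 0, N = 1).
    assert new in (old, old + 2**0 * 1)

stay = [c for c in keys if h(1, c) == 0]  # keys that remain in bucket 0
move = [c for c in keys if h(1, c) == 1]  # keys that go to the new bucket 1
print(stay, move)
```

With these five keys, two stay and three move, which is in line with the "about b/2 keys move" remark.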
![Page 36: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/36.jpg)
Overview of LH
Basically, a split occurs when some bucket m overflows
One splits bucket n, pointed to by pointer n.
– usually m ≠ n
n evolves: 0; 0, 1; 0, 1, 2, 3; 0, ..., 7; ...; 0, ..., $2^i N - 1$; 0, ...
One consequence => no index
– an index being characteristic of other EH schemes
![Page 37: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/37.jpg)
LH File Evolution
N = 1, b = 4, i = 0, n = 0
$h_0 : c \to c \bmod 2^0$
Bucket 0 ($h_0$): 35, 12, 7, 15, 24
![Page 38: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/38.jpg)
LH File Evolution
N = 1, b = 4, i = 0, n = 0
$h_1 : c \to c \bmod 2^1$
Bucket 0 ($h_1$): 35, 12, 7, 15, 24
![Page 39: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/39.jpg)
LH File Evolution
N = 1, b = 4, i = 1, n = 0
$h_1 : c \to c \bmod 2^1$
Bucket 0 ($h_1$): 12, 24
Bucket 1 ($h_1$): 35, 7, 15
![Page 40: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/40.jpg)
LH File Evolution
N = 1, b = 4, i = 1
$h_1 : c \to c \bmod 2^1$
Bucket 0 ($h_1$): 32, 58, 12, 24
Bucket 1 ($h_1$): 21, 11, 35, 7, 15
![Page 41: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/41.jpg)
LH File Evolution
N = 1, b = 4, i = 1
$h_2 : c \to c \bmod 2^2$
Bucket 0 ($h_2$): 32, 12, 24
Bucket 1 ($h_1$): 21, 11, 35, 7, 15
Bucket 2 ($h_2$): 58
![Page 42: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/42.jpg)
LH File Evolution
N = 1, b = 4, i = 1
$h_2 : c \to c \bmod 2^2$
Bucket 0 ($h_2$): 32, 12, 24
Bucket 1 ($h_1$): 33, 21, 11, 35, 7, 15
Bucket 2 ($h_2$): 58
![Page 43: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/43.jpg)
LH File Evolution
N = 1, b = 4, i = 1
$h_2 : c \to c \bmod 2^2$
Bucket 0 ($h_2$): 32, 12, 24
Bucket 1 ($h_2$): 33, 21
Bucket 2 ($h_2$): 58
Bucket 3 ($h_2$): 11, 35, 7, 15
![Page 44: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/44.jpg)
LH File Evolution
N = 1, b = 4, i = 2
$h_2 : c \to c \bmod 2^2$
Bucket 0 ($h_2$): 32, 12, 24
Bucket 1 ($h_2$): 33, 21
Bucket 2 ($h_2$): 58
Bucket 3 ($h_2$): 11, 35, 7, 15
![Page 45: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/45.jpg)
LH File Evolution
Etc.
– One starts $h_3$, then $h_4$ ...
The file can expand as much as needed
– without too many overflows ever
![Page 46: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/46.jpg)
Addressing Algorithm
a <- h (i, c)
if n = 0 then exit
else
    if a < n then a <- h (i+1, c) ;
end
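The addressing algorithm and the split rule can be sketched together as a toy in-memory LH file. This is only an illustration under stated assumptions: the class name `LinearHashFile` is hypothetical, a split is triggered by any bucket overflow (uncontrolled splitting), and overflowing keys simply stay in their bucket until it is split.

```python
class LinearHashFile:
    """Toy in-memory LH file; names and structure are illustrative only."""

    def __init__(self, N=1, b=4):
        self.N, self.b = N, b          # initial bucket count, bucket capacity
        self.i, self.n = 0, 0          # file level and split pointer
        self.buckets = [[] for _ in range(N)]

    def h(self, level, c):
        return c % (2**level * self.N)

    def address(self, c):
        # The addressing algorithm: use h_i, or h_{i+1} if bucket a already split.
        a = self.h(self.i, c)
        if a < self.n:
            a = self.h(self.i + 1, c)
        return a

    def insert(self, c):
        a = self.address(c)
        self.buckets[a].append(c)
        if len(self.buckets[a]) > self.b:   # some bucket m overflows...
            self._split()                   # ...but it is bucket n that splits

    def _split(self):
        n, i = self.n, self.i
        old = self.buckets[n]
        self.buckets.append([])        # the new bucket n + 2^i N is always the next one
        self.buckets[n] = [c for c in old if self.h(i + 1, c) == n]
        self.buckets[-1] = [c for c in old if self.h(i + 1, c) != n]
        self.n += 1
        if self.n == 2**i * self.N:    # round finished: all buckets now use h_{i+1}
            self.n, self.i = 0, i + 1


f = LinearHashFile(N=1, b=4)
for key in [35, 12, 7, 15, 24, 32, 58, 21, 11, 33]:
    f.insert(key)
print(f.i, f.n, f.buckets)
```

Inserting these ten keys with N = 1 and b = 4 ends with four buckets at level i = 2 and the pointer back at n = 0. Note that the overflowing bucket and the split bucket differ in two of the three splits.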
![Page 47: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/47.jpg)
LH*
Property of LH:
– Given j = i or j = i + 1, key c is in bucket m iff
$h_j(c) = m$ ; j = i or j = i + 1
» Verify yourself
Ideas for LH*:
– LH addressing rule = global rule for LH* file
– every bucket at a server
– bucket level j in the header
– Check the LH property when the key comes from a client
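The "verify yourself" exercise can also be done by brute force. The sketch below assumes N = 1 by default and assumes that the level of bucket m is j = i + 1 if the bucket has already split in the current round or was created by such a split (m < n or m ≥ 2^i N), and j = i otherwise. All function names are illustrative.

```python
def h(level, c, N=1):
    return c % (2**level * N)


def lh_address(c, i, n, N=1):
    # The LH addressing algorithm for file state (i, n).
    a = h(i, c, N)
    if a < n:
        a = h(i + 1, c, N)
    return a


def bucket_level(m, i, n, N=1):
    # Assumed rule: split buckets and the buckets they created are at level i + 1.
    return i + 1 if (m < n or m >= 2**i * N) else i


def property_holds(i, n, N=1, keys=range(2000)):
    """Check: key c is in bucket m  iff  h_j(c) = m, with j the level of m."""
    n_buckets = 2**i * N + n
    return all(
        (lh_address(c, i, n, N) == m) == (h(bucket_level(m, i, n, N), c, N) == m)
        for c in keys
        for m in range(n_buckets)
    )


print(property_holds(i=3, n=2))
```

The point for LH* is that a server needs only its own bucket level j, not the global (i, n), to decide whether a key belongs to it.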
![Page 48: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/48.jpg)
LH* : file structure
[Figure: servers holding buckets 0 and 1 (j = 4), 2 and 7 (j = 3), 8 and 9 (j = 4); coordinator with n = 2, i = 3; two clients with images (n' = 0, i' = 0) and (n' = 3, i' = 2)]
![Page 50: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/50.jpg)
LH* : split
[Figure: the file before the split: buckets 0 and 1 (j = 4), 2 and 7 (j = 3), 8 and 9 (j = 4); coordinator with n = 2, i = 3; two clients with images (n' = 0, i' = 0) and (n' = 3, i' = 2)]
![Page 52: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/52.jpg)
LH* : split
[Figure: the file after the split: buckets 0, 1 and 2 (j = 4), 7 (j = 3), 8, 9 and the new bucket 10 (j = 4); coordinator with n = 3, i = 3; the two client images are still (n' = 0, i' = 0) and (n' = 3, i' = 2)]
![Page 53: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/53.jpg)
LH* Addressing Schema
Client
– computes the LH address m of c using its image,
– sends c to bucket m
Server
– Server a getting key c, a = m in particular, computes:
a' := h_j (c) ;
if a' = a then accept c ;
else a'' := h_{j-1} (c) ;
    if a'' > a and a'' < a' then a' := a'' ;
    send c to bucket a' ;
![Page 54: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/54.jpg)
LH* Addressing Schema
Simple?
See [LNS93] for the (long) proof
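The server side of the LH* addressing schema can be traced with a short Python sketch. The function names `server_step` and `route` are hypothetical, N = 1 is assumed, and the file state (n = 3, i = 3, buckets 0 to 10, with the undrawn buckets 3 to 6 assumed to be at level 3) is taken from the addressing figures. Keys 15 and 9 are the two keys that appear in those figures.

```python
def h(level, c, N=1):
    return c % (2**level * N)


def server_step(a, j, c):
    """Server algorithm at bucket a (level j) for key c.

    Returns None if the key is accepted, else the address to forward to."""
    a1 = h(j, c)
    if a1 == a:
        return None
    a2 = h(j - 1, c)
    if a < a2 < a1:          # a'' > a and a'' < a'
        a1 = a2
    return a1


def route(first, c, levels):
    """Follow forwardings starting at bucket `first`; return the visited buckets."""
    path = [first]
    while (nxt := server_step(path[-1], levels[path[-1]], c)) is not None:
        path.append(nxt)
    return path


# Assumed file state: n = 3, i = 3, so buckets 0-2 and 8-10 are at level 4.
levels = {m: (4 if m < 3 or m >= 8 else 3) for m in range(11)}

print(route(0, 15, levels))  # a new client (i' = n' = 0) always addresses bucket 0
print(route(0, 9, levels))
```

Key 15 needs one forwarding and key 9 needs two, the stated maximum. The a'' test is what keeps key 15 from being sent to bucket 15, which does not exist yet.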
![Page 55: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/55.jpg)
Client Image Adjustment
The IAM consists of address a where the client sent c and of j(a)
– i' is the presumed i in the client's image.
– n' is the presumed value of pointer n in the client's image.
– initially, i' = n' = 0.
if j > i' then i' := j - 1, n' := a + 1 ;
if n' ≥ 2^i' then n' := 0, i' := i' + 1 ;
The algorithm guarantees that the client image is within the file [LNS93]
– if there are no file contractions (merges)
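As a small illustration, the two adjustment rules translate directly into Python. The function name `adjust_image` and the argument order are assumptions of this sketch, and N = 1 is assumed as in the rule itself.

```python
def adjust_image(i_img, n_img, a, j):
    """Update a client image (i', n') from an IAM carrying (a, j)."""
    if j > i_img:
        i_img, n_img = j - 1, a + 1
    if n_img >= 2**i_img:            # pointer ran past the round: start the next one
        n_img, i_img = 0, i_img + 1
    return i_img, n_img


# A new client (i' = n' = 0) that addressed bucket 0, whose level is j = 4.
print(adjust_image(0, 0, a=0, j=4))   # (3, 1)

# A client with i' = 2, n' = 3 that addressed bucket 3, whose level is j = 3.
print(adjust_image(2, 3, a=3, j=3))   # (3, 0)
```

The second call shows the wrap-around case: n' would become 4 = 2^2, so the image moves on to the next level with n' = 0.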
![Page 56: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/56.jpg)
LH* : addressing
[Figure: key 15 being addressed; buckets 0, 1 and 2 (j = 4), 7 (j = 3), 8, 9 and 10 (j = 4); coordinator with n = 3, i = 3; two clients with images (n' = 0, i' = 0) and (n' = 3, i' = 2)]
![Page 58: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/58.jpg)
LH* : addressing
[Figure: key 15 addressed; buckets 0, 1 and 2 (j = 4), 7 (j = 3), 8, 9 and 10 (j = 4); coordinator with n = 3, i = 3; an IAM labelled j = 3; client images now (n' = 0, i' = 3) and (n' = 3, i' = 2)]
![Page 59: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/59.jpg)
LH* : addressing
[Figure: key 9 being addressed; buckets 0, 1 and 2 (j = 4), 7 (j = 3), 8, 9 and 10 (j = 4); coordinator with n = 3, i = 3; two clients with images (n' = 0, i' = 0) and (n' = 3, i' = 2)]
![Page 62: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/62.jpg)
62
LH* : addressing
[Figure: servers 0, 1, 2 (j = 4), 7 (j = 3), 8, 9, 10 (j = 4); coordinator with n = 3, i = 3; two clients with images n' = 1, i' = 3 and n' = 3, i' = 2; value 9 and level j = 4 shown]
![Page 63: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/63.jpg)
63
Result
The distributed file can grow to span even the whole Internet so that:
– every insert and search is done in at most four messages (IAM included)
– in general, an insert is done in one message and a search in two messages
– proof in [LNS 93]
![Page 64: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/64.jpg)
64
[Figure: 10,000 inserts; labels: Global cost, Client's cost]
![Page 65: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/65.jpg)
65
![Page 66: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/66.jpg)
66
![Page 67: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/67.jpg)
67
Inserts by two clients
![Page 68: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/68.jpg)
68
Parallel Queries
A query Q for all buckets of file F, with independent local executions
– every bucket should get Q exactly once
The basis for function shipping
– fundamental for high-performance DBMS applications
Send mode:
– multicast
» not always possible or convenient
– unicast
» client may not know all the servers
» servers have to forward the query
– how?
[Figure: client image, file, query Q]
![Page 69: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/69.jpg)
69
LH* Algorithm for Parallel Queries (unicast)
Client sends Q to every bucket a in the image
The message with Q has the message level j':
– initially j' = i' if n' ≤ a < 2^i', else j' = i' + 1
– bucket a (of level j) copies Q to all its children using the algorithm:
while j' < j do
j' := j' + 1
forward (Q, j') to bucket a + 2^(j' - 1) ;
endwhile
Prove it!
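The unicast propagation can be checked on small files with a short simulation. The sketch below is illustrative only: the names `bucket_level` and `parallel_query` are hypothetical, and it assumes the usual LH* file state (n, i), where buckets below n or at 2^i and above have level i + 1 and all others have level i. It counts how many copies of Q each bucket receives.

```python
from collections import Counter, deque


def bucket_level(a, n, i):
    """Level j of bucket a in a file with state (n, i) (assumed LH* convention)."""
    return i + 1 if a < n or a >= 2 ** i else i


def parallel_query(n, i, n_img, i_img):
    """Simulate unicast propagation of Q; return the copies received per bucket.

    (n, i) is the actual file state, (n_img, i_img) the client image (n', i').
    """
    received = Counter()
    pending = deque()
    # The client sends Q to every bucket a of its image, with message level j'.
    for a in range(2 ** i_img + n_img):
        j_msg = i_img if n_img <= a < 2 ** i_img else i_img + 1
        pending.append((a, j_msg))
    # Each bucket forwards Q to the children the client does not know about.
    while pending:
        a, j_msg = pending.popleft()
        received[a] += 1
        j = bucket_level(a, n, i)
        while j_msg < j:
            j_msg += 1
            pending.append((a + 2 ** (j_msg - 1), j_msg))
    return received


if __name__ == "__main__":
    # File with n = 3, i = 3 (11 buckets) and a client image n' = 0, i' = 0.
    print(sorted(parallel_query(3, 3, 0, 0).items()))
```

With the file state n = 3, i = 3, both client images n' = 0, i' = 0 and n' = 3, i' = 2 lead to every bucket 0..10 receiving Q exactly once. This is a check on an example, not the requested proof.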
![Page 70: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/70.jpg)
70
Termination of Parallel Query (multicast or unicast)
How does client C know that the last reply has arrived?
Deterministic solution (expensive)
– every bucket sends its j, m and the selected records, if any
» m is its (logical) address
– the client terminates when it has every m fulfilling the condition:
» m = 0, 1, ..., 2^i + n - 1, where
– i = min (j) and n = min (m) where j = i
[Figure: bucket levels across the file: i+1, i, i+1, with n marking the first bucket of level i]
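The deterministic test can be written as a small predicate over the replies received so far. This is a minimal sketch under the assumption that the client keeps a mapping from each replying bucket address m to its level j; the function name `all_replies_in` is hypothetical.

```python
def all_replies_in(levels):
    """Return True when replies from every bucket of the file have arrived.

    levels maps the logical address m of each replying bucket to its level j.
    """
    if not levels:
        return False
    # i is the lowest level seen; n is the lowest address having that level.
    i = min(levels.values())
    n = min(m for m, j in levels.items() if j == i)
    # A file with state (n, i) has exactly the buckets 0 .. 2^i + n - 1.
    return set(levels) == set(range(2 ** i + n))


if __name__ == "__main__":
    # Example file: n = 3, i = 3, so buckets 0..10; 3..7 have level 3.
    replies = {m: (4 if m < 3 or m >= 8 else 3) for m in range(11)}
    print(all_replies_in(replies))
```

While replies are still missing, the estimated i and n can only be too large, so the expected address set is never smaller than the real file and the predicate stays false until the last reply arrives.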
![Page 71: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/71.jpg)
71
Termination of Parallel Query (multicast or unicast)
Probabilistic termination (may need less messaging)
– all and only buckets with selected records reply
– after each reply C reinitialises a time-out T
– C terminates when T expires
Practical choice of T is network and query dependent
– e.g. 5 times the Ethernet average retry time
» 1-2 msec ?
– experiments needed
Which termination is finally more useful in practice?
– an open problem
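The time-out rule can be sketched with a standard queue standing in for the network. This is an illustration under assumptions, not the protocol itself: replies are assumed to arrive on a `queue.Queue`, and the name `collect_replies` and the time-out value are hypothetical.

```python
import queue


def collect_replies(replies, timeout_s):
    """Collect replies until none arrives within timeout_s seconds.

    The time-out T restarts after each reply, as in the probabilistic scheme.
    """
    collected = []
    while True:
        try:
            # Blocks for at most timeout_s; each new reply restarts the wait.
            collected.append(replies.get(timeout=timeout_s))
        except queue.Empty:
            # T expired with no further reply: the client terminates.
            return collected


if __name__ == "__main__":
    inbox = queue.Queue()
    for record in ("r1", "r2", "r3"):
        inbox.put(record)
    print(collect_replies(inbox, timeout_s=0.01))
```

A reply that arrives after T has expired is lost to the client, which is why the choice of T matters.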
![Page 72: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/72.jpg)
72
LH* variants
With/without load (factor) control
With/without the (split) coordinator
– the former was discussed
– the latter is a token-passing scheme
» the bucket with the token is next to split
– if an insert occurs, and file overload is guessed
– several algorithms for the decision
– use cascading splits
![Page 73: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/73.jpg)
73
Load factor for uncontrolled splitting
![Page 74: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/74.jpg)
74
Load factor for different load control strategies and threshold t = 0.8
![Page 75: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/75.jpg)
75
![Page 76: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/76.jpg)
76
LH* for switched multicomputers
LH*LH
– implemented on Parsytec machine
» 32 Power PCs
» 2 GB of RAM (128 GB / CPU)
– uses
» LH for the bucket management
» concurrent LH* splitting (described later on)
– access times: < 1 ms
Presented at EDBT-96
![Page 77: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/77.jpg)
77
LH* with presplitting
(Pre)splits are done "internally", immediately when an overflow occurs
They become visible to clients only when the LH* split would normally be performed
Advantages
– fewer overflows on sites
– parallel splits
Drawbacks
– load factor
– possibly longer forwardings
Analysis remains to be done
![Page 78: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/78.jpg)
78
LH* with concurrent splitting
Inserts and searches can be done concurrently with the splitting in progress
– used by LH*LH
Advantages
– obvious
– and see EDBT-96
Drawbacks
– + alg. complexity
![Page 79: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/79.jpg)
79
Research Frontier
Actual implementation
– the SDDS protocols
» reuse the MS CIFS protocol
» + record types, forwarding, splitting, IAMs...
– system architecture
» client, server, sockets, UDP, TCP/IP, NT, Unix...
» threads
Actual performance
» 250 µs per search
– 1 KB records, 100 Mb/s AnyLan Ethernet
– 40 times faster than a disk
– e.g. response time of a join improves from 1 min to 1.5 s
![Page 80: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/80.jpg)
80
Research Frontier
Use within a DBMS
» scalable AMOS, DB2 Parallel, Access
– replace the traditional disk access methods
» DBMS is the single SDDS client
– LH* and perhaps other SDDSs
– use function shipping
– use from multiple distributed SDDS clients
» concurrency, transactions, recovery...
Other applications
– a scalable Web server (like INKTOMI)
![Page 81: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/81.jpg)
81
Traditional
[Figure: DBMS, FMS]
![Page 82: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/82.jpg)
82
SDDS 1st stage
[Figure: DBMS; client; servers S; memory-mapped files]
40 - 80 times faster record access
![Page 83: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/83.jpg)
83
SDDS 2nd stage
[Figure: DBMS; client; servers S]
40 - 80 times faster record access
n times faster non-key search
![Page 84: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/84.jpg)
84
SDDS 3rd stage
[Figure: two DBMSs, each with its client; servers S]
40 - 80 times faster record access
n times faster non-key search
larger files, higher throughput
![Page 85: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/85.jpg)
85
Conclusion
Since their inception in 1993, SDDSs have been the subject of an important research effort
In a few years, several schemes appeared
– with the basic functions of the traditional files
» hash, primary key ordered, multi-attribute k-d access
– providing for much faster and larger files
– confirming initial expectations
![Page 86: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/86.jpg)
86
Future work
Deeper analysis
– formal methods, simulations & experiments
Prototype implementation
– SDDS protocol (on-going in Paris 9)
New schemes
– High-Availability & Security
– R* - trees ?
Killer apps
– large storage server & object servers
– object-relational databases
» Schneider, D & al (COMAD-94)
– video servers
– real-time
– HP scientific data processing
![Page 87: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/87.jpg)
87
END (Part 1)
Thank you for your attention
Witold [email protected]@cs.berkeley.edu
![Page 88: 1 Scalable Distributed Data Structures S tate-of-the-art Part 1 Witold Litwin Paris 9 litwin@dauphine.fr.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649f455503460f94c6728e/html5/thumbnails/88.jpg)
88