Query processing on raw files
Vítor Uwe Reus
TU Kaiserslautern
Outline
1. Introduction
2. Adaptive Indexing
3. Hybrid MapReduce
4. NoDB
5. Summary
1. Introduction
Raw Files
Information storing
Sometimes human-readable, open format
Not physically optimized for querying
Might be useful in some cases
Big Data
Traditional DBMS may not be a good option
Internet-scale business
Scientific data
The fourth paradigm
For scientific discovery
Experimental
Theoretical
Computational (simulations)
Data driven
Interoperability
Information interoperability
Application interoperability
Human-sourced Information
How to query raw files?
State of the art
Raw file as storage
A-priori loading
Raw file parsing
AWK
Oracle external tables
MySQL CSV engine
MapReduce
Read the entire data every time
No indexing features
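To make that cost concrete, here is a minimal Python sketch of in-situ querying in the AWK / external-table style. It assumes a small semicolon-delimited file held in a string; the names `RAW` and `scan_query` are hypothetical. Every query tokenizes and parses every row again, and nothing one query learns helps the next.

```python
import csv
import io

# Hypothetical raw file: semicolon-delimited, header in the first line.
RAW = "Year;Make;Model;Liters\n1997;BMW;E89;2,34\n2011;Mercedes;SLS;2\n"


def scan_query(raw_text, column, predicate):
    """Return (matching rows, number of rows parsed) for one query.

    Like AWK or an external table, every call re-reads and re-parses the
    whole file: there is no index, so no row can be skipped.
    """
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=";")
    rows_parsed = 0
    matches = []
    for row in reader:
        rows_parsed += 1
        if predicate(row[column]):
            matches.append(row)
    return matches, rows_parsed


recent, parsed = scan_query(RAW, "Year", lambda year: int(year) > 2000)
```

A second query with a different predicate parses both data rows again, which is exactly the "read the entire data every time" cost.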
A-priori loading
Load into a DBMS and then query
Benefit from indexes
Time
Labor intensive
Loading scripts, schemas
Data duplication
Big data
Versioning
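For contrast, a minimal sketch of a-priori loading using Python's built-in `sqlite3` as the DBMS. The table name `cars` and its schema are assumptions made for illustration. It shows where the up-front costs listed above come from: a hand-written schema, a loading script that copies every row, and index creation, all before the first query can run.

```python
import csv
import io
import sqlite3

RAW = "Year;Make;Model;Liters\n1997;BMW;E89;2,34\n2011;Mercedes;SLS;2\n"

# Up-front cost 1: someone has to write a schema by hand (hypothetical one).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (year INTEGER, make TEXT, model TEXT, liters REAL)")

# Up-front cost 2: a loading script parses and copies every row
# (the data now exists twice: raw file + DBMS).
reader = csv.reader(io.StringIO(RAW), delimiter=";")
next(reader)  # skip the header line
rows = [(int(year), make, model, float(liters.replace(",", ".")))
        for year, make, model, liters in reader]
conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?)", rows)

# Up-front cost 3: build indexes before queries can use them.
conn.execute("CREATE INDEX cars_year ON cars (year)")

# Only now can queries run, and they can benefit from the index.
models = [model for (model,) in conn.execute("SELECT model FROM cars WHERE year > 2000")]
```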
Workload behavior
Load time vs Query time
Hybrid querying techniques
2. Adaptive Indexing
Adaptive indexing
Automatic tuning based on workload
Keep an auxiliary structure
Can benefit raw file parsing
Database Cracking
Adaptive Merging
Database cracking
Physical reorganization of columns
Implemented in MonetDB, a column store
Can be generalized (e.g., to raw files)
Database Cracking
Cracking a column
Database Cracking
Column A → copy to cracker column A_CRK
AVL tree indexes the cracked pieces
Refinement: each query cracks the pieces further
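A minimal Python sketch of cracking a column under range queries, assuming integer values and a cracker column of (value, row_id) pairs. Two simplifications to keep in mind: a sorted list searched with `bisect` stands in for MonetDB's AVL tree, and a range query is handled as two crack-in-two steps rather than one crack-in-three. The class and method names are hypothetical.

```python
import bisect


class CrackerColumn:
    """Column A copied into a cracker column A_CRK of (value, row_id) pairs.

    Invariant: for every i, all entries before position cuts[i] have
    value < pivots[i], and all entries from cuts[i] on have value >= pivots[i].
    The sorted pivots/cuts lists stand in for the AVL tree.
    """

    def __init__(self, column):
        self.data = [(value, row_id) for row_id, value in enumerate(column)]
        self.pivots = []  # sorted pivot values seen so far
        self.cuts = []    # cuts[i]: first position whose value >= pivots[i]

    def _crack(self, pivot):
        """Partition only the piece that contains `pivot`; return the cut."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.cuts[i]  # already cracked on this value
        lo = self.cuts[i - 1] if i > 0 else 0
        hi = self.cuts[i] if i < len(self.cuts) else len(self.data)
        # Quicksort-style two-pointer partition of data[lo:hi] around pivot.
        left, right = lo, hi - 1
        while left <= right:
            if self.data[left][0] < pivot:
                left += 1
            else:
                self.data[left], self.data[right] = self.data[right], self.data[left]
                right -= 1
        self.pivots.insert(i, pivot)
        self.cuts.insert(i, left)
        return left

    def range_select(self, low, high):
        """Return (value, row_id) pairs with low <= value < high."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


col = CrackerColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6])
first = col.range_select(5, 10)   # cracks the whole column
second = col.range_select(3, 12)  # only touches the two outer pieces
```

The second query illustrates refinement: it partitions only the pieces below 5 and above 10, so the column becomes more finely indexed exactly where queries look.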
Tuple reconstruction
Fast if columns are in the same order
Cracking compromises original positions
Cracker columns: Value selection
Original columns: Tuple reconstruction
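A small sketch of the split between the two kinds of columns, assuming the cracker column carries each value's original row id (one common way to do it; MonetDB's exact layout may differ). The column contents are made up for illustration.

```python
# Original columns A and B, both kept in insertion order,
# so row i of A belongs with row i of B.
column_a = [13, 16, 4, 9, 2, 12, 7]
column_b = ["m", "p", "d", "i", "b", "l", "g"]

# A piece of the cracker copy of A after cracking on [5, 10):
# (value, row_id) pairs whose order no longer matches column B.
cracked_piece = [(9, 3), (7, 6)]

# Value selection happened on the cracker column; tuple reconstruction
# follows the carried row ids into the still position-aligned column B.
result = [(a, column_b[row_id]) for a, row_id in cracked_piece]
```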
Adaptive merging
Incremental index creation as in cracking
Partitioned B-trees
Focus on merging instead of partitioning
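A rough sketch of adaptive merging, assuming sorted Python lists stand in for the partitions of a partitioned B-tree and that the final partition is rebuilt with a full merge on each query (a real B-tree would insert in place). The class name and `run_size` parameter are hypothetical. The initial runs come from run generation, as in the first phase of a merge sort; each range query then moves its qualifying keys out of every run and merges them into the final partition.

```python
import bisect
import heapq


class AdaptiveMerging:
    """Runs are sorted lists standing in for B-tree partitions."""

    def __init__(self, keys, run_size):
        # Run generation: sort fixed-size chunks of the input.
        self.runs = [sorted(keys[i:i + run_size]) for i in range(0, len(keys), run_size)]
        self.final = []  # the merged, fully sorted partition

    def range_query(self, low, high):
        """Return all keys with low <= key < high, in sorted order."""
        moved = []
        for run in self.runs:
            lo = bisect.bisect_left(run, low)
            hi = bisect.bisect_left(run, high)
            moved.append(run[lo:hi])
            del run[lo:hi]  # these keys now live only in the final partition
        self.runs = [run for run in self.runs if run]
        # Merge step: k-way merge of the extracted slices into the final partition.
        self.final = list(heapq.merge(self.final, *moved))
        lo = bisect.bisect_left(self.final, low)
        hi = bisect.bisect_left(self.final, high)
        return self.final[lo:hi]


index = AdaptiveMerging([13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6], run_size=4)
```

After one query on a key range, repeating that query finds nothing left to move in the runs and is answered from the final partition alone, which is why merging converges in fewer queries than cracking.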
Merging vs cracking
Typical result of merging compared to cracking*
*In this case, all queries focus on the same 106 keys in the center of the domain
Merging vs cracking
               Cracking       Merging
Convergence    Stable         Faster
Storage        AVL tree       B-tree
Data is        Partitioned    ...and sorted
As in          Quicksort      Merge sort
3. Hybrid MapReduce
Hybrid MapReduce
What is needed
HadoopDB
MapReduce using a DBMS instead of HDFS
SMS Planner
SQL → MapReduce → SQL
Hive query processor
1. Convert HiveQL query to AST
2. Get schema from catalog
3. Create a Query Plan
4. Optimize
5. Convert the plan into one or more MR jobs
SMS Planner
1. Convert HiveQL query to AST
Update Catalog with DB information
2. Get schema from catalog
3. Create a Query Plan
4. Optimize
Reconstruct some SQL to push it into the DBMS
5. Convert the plan into one or more MR jobs
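A toy Python sketch of the pushdown decision the SMS planner makes for an aggregation, simplified to a single `SUM ... GROUP BY` query. The dataclass, function names, and the `sales` / `region` / `amount` schema are all hypothetical. The rule it encodes follows the HadoopDB idea: if the table is partitioned on the grouping column, each group lives on one node and the whole aggregation runs inside the node DBMSs (map-only job); otherwise the nodes return partial sums that a reduce phase combines.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroupByQuery:
    """A SUM ... GROUP BY query over one (hypothetical) table."""
    table: str
    group_col: str
    sum_col: str
    where: Optional[str] = None


def sms_plan(query, partition_col):
    """Return (SQL pushed to every node's DBMS, reduce step or None)."""
    where = f" WHERE {query.where}" if query.where else ""
    # Selection, projection and aggregation are pushed into the DBMS.
    node_sql = (f"SELECT {query.group_col}, SUM({query.sum_col}) "
                f"FROM {query.table}{where} GROUP BY {query.group_col}")
    if partition_col == query.group_col:
        # Every group lives on exactly one node: a map-only job suffices.
        return node_sql, None
    # Groups are spread over nodes: node results are only partial sums.
    return node_sql, reduce_partial_sums


def reduce_partial_sums(node_results):
    """Reduce phase: add up the per-node partial sums of each group."""
    totals = defaultdict(int)
    for rows in node_results:
        for key, partial in rows:
            totals[key] += partial
    return dict(totals)


query = GroupByQuery("sales", "region", "amount", "year = 2012")
```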
SMS Planner
HadoopDB Performance
Group By: 2,500,000 unique groups over 20 GB of data
Join: 134,000 joined records over 20 GB of data
HadoopDB loading times
HadoopDB
☺ Good performance
☺ Scalable
☺ Fault tolerant
☺ Heterogeneous node compatible
☺ Make any DBMS a distributed system
☹ Data loader: inherits all the a-priori loading problems
Invisible loading
Load DBMS with data from Hadoop at run-time
Invisibility objective
Minimal human effort
Minimal increase in response time
Use a DBMS as a cache for the raw data
Invisible loading
Reuse the job's own tuple parsing and extraction code to invisibly load the parsed tuples into a DBMS
Invisible loading
On the next access, the data can be read from the DBMS
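A minimal Python sketch of the idea, with `sqlite3` standing in for the DBMS cache. It is a simplification under stated assumptions: each attribute is loaded completely the first time any job touches it, whereas invisible loading proper loads incrementally (only part of the file per job). The class, method, and `attr_<n>` table names are hypothetical.

```python
import sqlite3


class InvisibleLoader:
    """The raw file stays the primary copy; the DBMS acts as a cache.

    The first job that touches an attribute parses it from the raw lines
    (which the MapReduce job does anyway) and, as a side effect, writes
    the parsed values into the DBMS. Later jobs read them from the DBMS.
    """

    def __init__(self, raw_lines, delim=";"):
        self.raw_lines = raw_lines
        self.delim = delim
        self.db = sqlite3.connect(":memory:")
        self.loaded = set()  # attribute indexes already in the DBMS
        self.raw_scans = 0   # how often we had to go back to the raw file

    def _table(self, attr):
        return f"attr_{int(attr)}"  # int() keeps the table name safe to format

    def read_attribute(self, attr):
        """Return the values of attribute `attr` (0-based) in row order."""
        if attr in self.loaded:
            cur = self.db.execute(f"SELECT value FROM {self._table(attr)} ORDER BY row_id")
            return [value for (value,) in cur]
        # Not loaded yet: read and parse the raw file ...
        self.raw_scans += 1
        values = [line.split(self.delim)[attr] for line in self.raw_lines]
        # ... and invisibly write the parsed values into the DBMS.
        self.db.execute(f"CREATE TABLE {self._table(attr)} (row_id INTEGER PRIMARY KEY, value TEXT)")
        self.db.executemany(f"INSERT INTO {self._table(attr)} VALUES (?, ?)", enumerate(values))
        self.loaded.add(attr)
        return values


loader = InvisibleLoader(["1997;BMW;E89", "2011;Mercedes;SLS"])
```

The `raw_scans` counter makes the effect visible: only the first job on an attribute pays the parsing cost.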
Invisible loading: Parser
Parser extends InputFormat
getAttribute(int index)
Code for tuple parsing and extraction
Map takes a Parser as input
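The real interface is Java (a Hadoop `InputFormat` subclass); below is a Python analogue of its shape, meant only to show the division of labor. `get_attribute` mirrors the slide's `getAttribute(int index)`; `set_record`, `DelimitedParser`, and `map_job` are hypothetical names. The user writes only the parsing and extraction logic, and the map function receives a parser instead of tokenizing records itself, which is what lets the framework reuse the same parser for invisible loading.

```python
from abc import ABC, abstractmethod


class Parser(ABC):
    """Python stand-in for the slide's Parser (an InputFormat subclass)."""

    @abstractmethod
    def set_record(self, record):
        """Point the parser at the next raw record (hypothetical hook)."""

    @abstractmethod
    def get_attribute(self, index):
        """Return attribute `index` (0-based) of the current record."""


class DelimitedParser(Parser):
    """User-supplied parsing and extraction code for delimited text."""

    def __init__(self, delim=";"):
        self.delim = delim
        self.fields = []

    def set_record(self, record):
        self.fields = record.split(self.delim)

    def get_attribute(self, index):
        return self.fields[index]


def map_job(records, parser, key_attr, value_attr):
    """A map function that takes a Parser as input instead of parsing itself."""
    for record in records:
        parser.set_record(record)
        yield parser.get_attribute(key_attr), parser.get_attribute(value_attr)


pairs = list(map_job(["1997;BMW;E89", "2011;Mercedes;SLS"], DelimitedParser(), 1, 0))
```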
Invisible loading
☺ Incremental data reorganization
☺ Almost no overhead on MR Jobs
☺ Optimizes future access speeds
☹ Data duplication (No GC)
4. NoDB
NoDB
New DBMS paradigm
Does not require data loading
Maintains the feature set of a modern DBMS
Replaces physical storage with raw files
PostgresRaw
NoDB Implementation
Replaces TableScan Operator
CSV Files
Optimizations
PostgresRaw Optimizations
Selective...
a. Tokenizing
b. Parsing
c. Tuple formation
Indexing
Auto Tuning
Caching
Statistics
a. Selective tokenizing
111;222;"third";garbage;...
Supposing we want attributes 1 and 3
We can stop tokenizing after the third attribute
Saves CPU time
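A minimal sketch of selective tokenizing in Python, assuming `;` delimiters with no quoted semicolons and the slide's 1-based attribute numbers; the function name is hypothetical. The scan stops at the last wanted attribute, so the trailing `garbage;...` part of the line is never examined.

```python
def selective_tokenize(line, wanted, delim=";"):
    """Return {attribute number: raw token} for the 1-based numbers in `wanted`.

    Delimiters are searched only up to the last wanted attribute;
    everything after it on the line is left untouched.
    """
    last = max(wanted)
    tokens = {}
    start = 0
    for attr in range(1, last + 1):
        end = line.find(delim, start)
        if end == -1:  # last field on the line
            end = len(line)
        if attr in wanted:
            tokens[attr] = line[start:end]
        start = end + 1
    return tokens


tokens = selective_tokenize('111;222;"third";garbage;more', {1, 3})
```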
b. Selective parsing
111;222;"third";garbage;...
In memory:
111 → 6F (parsed to int)
222 → 32 32 32 (kept as string)
"third" → 74 68 69 72 64
Also: delayed parsing
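A small Python sketch of selective and delayed parsing on the same tokens, assuming only attribute 1 takes part in a numeric predicate. The `DelayedInt` class is hypothetical. Attribute 1 is converted to an integer (111 is 0x6F); the other tokens keep their raw bytes, and the delayed value is converted only when something actually reads it.

```python
class DelayedInt:
    """Keeps a field's raw bytes; converts them to an int on first use only."""

    def __init__(self, raw):
        self.raw = raw
        self.conversions = 0  # how many times the bytes were actually parsed
        self._value = None

    @property
    def value(self):
        if self._value is None:
            self.conversions += 1
            self._value = int(self.raw)
        return self._value


tokens = [b"111", b"222", b'"third"']

# Selective parsing: attribute 1 is needed as a number, so it is converted.
attr1 = int(tokens[0])
# Attribute 3 is only projected, so its raw bytes are kept as they are.
attr3 = tokens[2]
# Delayed parsing: attribute 2 stays "32 32 32" until someone reads it.
attr2 = DelayedInt(tokens[1])
```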
c. Selective tuple formation
111;222;"third";garbage;...
(111, "third")
Final tuple containing only attributes 1 and 3
Raw data access is CPU-bound, so skipping work pays off
Indexing
Year; Make; Model; Liters
1997; BMW; E89; 2,34
2011; Mercedes; SLS; 2
Looks nice :)
Indexing
NOT :(
Year;Make;Model;Liters¶1997;BMW;E89;2,34¶2011;Mercedes;SLS;2
Sequentially reading each time is not an option
Solution: keep an index of the positions of already-accessed attributes
Skip file reading directly to these positions
Indexing
Positional map: dynamically created according to queries
Year;Make;Model;Liters¶1997;BMW;E89;2,34¶2011;Mercedes;SLS;2
Tuple      1    1    2    2    3    3
Attribute  1    3    1    3    1    3
Position   0    10   23   32   41   55
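A minimal Python sketch of building and using a positional map, assuming `;` as the field delimiter and `\n` as the row separator (one character, like the slide's `¶`, so the offsets match). The function names are hypothetical. The map records where each requested attribute starts, keyed by (tuple, attribute) with 1-based numbers as on the slide; later reads jump straight to those offsets.

```python
def build_positional_map(data, attrs, row_sep="\n", delim=";"):
    """Return {(tuple_no, attr): offset} for the attributes a query touched."""
    pmap = {}
    last = max(attrs)
    row_start, tuple_no = 0, 1
    while row_start < len(data):
        row_end = data.find(row_sep, row_start)
        if row_end == -1:
            row_end = len(data)
        pos = row_start
        for attr in range(1, last + 1):  # stop after the last needed attribute
            if attr in attrs:
                pmap[(tuple_no, attr)] = pos
            nxt = data.find(delim, pos, row_end)
            if nxt == -1:
                break
            pos = nxt + 1
        row_start, tuple_no = row_end + 1, tuple_no + 1
    return pmap


def read_attribute(data, pmap, tuple_no, attr, row_sep="\n", delim=";"):
    """Jump to a known offset and read one value, without re-tokenizing the row."""
    start = end = pmap[(tuple_no, attr)]
    while end < len(data) and data[end] not in (delim, row_sep):
        end += 1
    return data[start:end]


DATA = "Year;Make;Model;Liters\n1997;BMW;E89;2,34\n2011;Mercedes;SLS;2"
pmap = build_positional_map(DATA, {1, 3})
```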
Updates
First case: no positions change
Year;Make;Model;Liters¶1989;BBB;CCC;4,44¶2011;Mercedes;SLS;2
Tuple      1    1    2    2    3    3
Attribute  1    3    1    3    1    3
Position   0    10   23   32   41   55
Updates
Second case: positions change. First option: update the index.
Year;Make;Model;Liters¶1989;B;C;4,44¶2011;Mercedes;SLS;2
Tuple      1    1    2    2              3              3
Attribute  1    3    1    3              1              3
Position   0    10   23   32 → 30 (-2)   41 → 37 (-4)   55 → 51 (-4)
Updates
Second case: positions change. Second option: throw the index away, partially or fully.
Year;Make;Model;Liters¶1989;B;C;4,44¶2011;Mercedes;SLS;2
The index will automatically reconstruct itself
Tuple      1    1    2
Attribute  1    3    1
Position   0    10   23
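Both update options can be sketched in a few lines of Python, assuming each update replaces one attribute value in place (old length `old_len`, new length `new_len`, starting at offset `start`); the function names are hypothetical. Option 1 shifts every later offset by the length difference; option 2 keeps only the offsets before the change and lets later queries rebuild the rest.

```python
def shift_positional_map(pmap, start, old_len, new_len):
    """Option 1: repair the map after data[start:start + old_len] is
    replaced by new_len characters.

    Offsets at or before `start` are unaffected, offsets at or after the
    end of the old value move by the length difference, and offsets
    strictly inside the replaced value are dropped as no longer valid.
    """
    end = start + old_len
    delta = new_len - old_len
    updated = {}
    for key, pos in pmap.items():
        if pos <= start:
            updated[key] = pos
        elif pos >= end:
            updated[key] = pos + delta
    return updated


def truncate_positional_map(pmap, start):
    """Option 2: throw away every offset after the change; queries will
    recreate the missing entries as they parse the file again."""
    return {key: pos for key, pos in pmap.items() if pos <= start}


PMAP = {(1, 1): 0, (1, 3): 10, (2, 1): 23, (2, 3): 32, (3, 1): 41, (3, 3): 55}
# "BMW" (offset 28) becomes "B", then "E89" (now at offset 30) becomes "C".
step1 = shift_positional_map(PMAP, 28, 3, 1)
step2 = shift_positional_map(step1, 30, 3, 1)
```

Applied to the slide's example, option 1 reproduces the shifted offsets 30, 37 and 51, and option 2 leaves only the entries 0, 10 and 23.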
Traditional optimizations
Caching
Statistics
NoDB Performance Compared
NoDB
☺ Great DBMS + Raw hybrid
☺ Competitive performance with traditional DBs
☺ Eliminates loading times
☺ Queries get faster with time
☹ Updates
5. Summary
Summary
Mature solutions: high load or query time
No index → high query time
Load all data → high delay (load time)
Hybrid solutions: bring indexes to in-situ processing
Adaptive indexing
HadoopDB
NoDB
Remember...
Conclusions
References
1. Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, and Alexander Rasin. HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. Proceedings of the VLDB Endowment,
2. Azza Abouzied, Daniel J. Abadi, and Avi Silberschatz. Invisible loading: access-driven data transfer from raw files into database systems. Proceedings of the 16th International Conference on Extending Database Technology, pages 1–10, 2013.
3. Renata Borovica, Stratos Idreos, and Anastasia Ailamaki. NoDB: Efficient Query Execution on Raw Data Files. pages 241–252.
4. Goetz Graefe and Harumi Kuno. Adaptive indexing for relational keys. 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), pages 69–74, 2010.
5. Felix Halim, S Idreos, P Karras, and RHC Yap. Stochastic database cracking: Towards robust adaptive indexing in main-memory column-stores. Proceedings of the VLDB Endowment (PVLDB),
6. Tony Hey, Stewart Tansley, and Kristin Tolle, editors. The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research, Redmond, Washington, 2009.
7. Stratos Idreos, Ioannis Alagiannis, Ryan Johnson, and Anastasia Ailamaki. Here are my data files. Here are my queries. Where are my results? Proceedings of 5th Biennial Conference on Innovative Data Systems Research, pages 57–68, 2011.
8. Christopher Olston, Benjamin Reed, Ravi Kumar, and Andrew Tomkins. Pig Latin: A Not-So-Foreign Language for Data Processing.
9. Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, and Raghotham Murthy. Hive - A Warehousing Solution Over a Map-Reduce Framework. PVLDB
Questions?
Thank you!
MapReduce
Can be classified as distributed raw file parsing
Adaptive merging
Database Cracking