Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
Ryan Blue, Strata + Hadoop World NY 2016
Contents.
● Big data at Netflix
● Parquet format background
● Optimization basics
● Stats and dictionary filtering
● Format 2 and compression
● Future work
Big data at Netflix.
● 40+ PB data warehouse
● Read 3 PB
● Write 300 TB
● 600B events
Strata San Jose results.
Metrics dataset.
Based on Atlas, Netflix’s telemetry platform.
● Performance monitoring backend and UI
● http://techblog.netflix.com/2014/12/introducing-atlas-netflixs-primary.html
Example metrics data.
● Partitioned by day and cluster
● Columns include metric time, name, value, and host
● Measurements for each minute are stored in a Parquet table
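To make the later examples concrete, the metrics table can be pictured with a schema along these lines (a hypothetical sketch; the talk doesn't show the real Atlas DDL, so the column types are assumptions):

sqlContext.sql("""
  CREATE TABLE metrics (
    time BIGINT,    -- measurement minute
    name STRING,    -- metric name, e.g. 'system.cpu.utilization'
    value DOUBLE,   -- measured value
    host STRING     -- reporting host
  )
  PARTITIONED BY (day INT, cluster STRING)
  STORED AS PARQUET
""")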
Parquet format background.
Parquet data layout.
ROW GROUPS.
● Data needed for a group of rows to be reassembled
● Smallest task or input split size
● Made of COLUMN CHUNKS
COLUMN CHUNKS.
● Contiguous data for a single column
● Made of DATA PAGES and an optional DICTIONARY PAGE
DATA PAGES.
● Encoded and compressed runs of values
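This layout is recorded in each file's footer, which can be inspected directly; a minimal sketch against the parquet-hadoop 1.8.x API (the file path is a placeholder):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import scala.collection.JavaConverters._

val footer = ParquetFileReader.readFooter(
  new Configuration(), new Path("/path/to/file.parquet"))
for (rowGroup <- footer.getBlocks.asScala) {
  println(s"row group: ${rowGroup.getRowCount} rows")
  for (chunk <- rowGroup.getColumns.asScala) {
    // one entry per column: contiguous pages plus the stats used for filtering
    println(s"  ${chunk.getPath}: ${chunk.getTotalSize} bytes, ${chunk.getEncodings}")
  }
}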
Row groups.
[Diagram: a Parquet file laid out within an HDFS block; each row group holds column chunks for columns A, B, C, D containing rows a1..aN, b1..bN, and so on]
Column chunks and pages.
[Diagram: within a row group, each column chunk is a series of data pages preceded by an optional dictionary page (dict)]
Read less data.
Columnar organization.
● Encoding: make the data smaller
● Column projection: read only the columns you need
Row group filtering.
● Use footer stats to eliminate row groups
● Use dictionary pages to eliminate row groups
Page filtering.
● Use page stats to eliminate pages
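All three mechanisms are driven by ordinary query shape. A sketch in Spark, following the metrics example (the path is a placeholder):

// projection: only name and value are read from disk;
// the predicates can be pushed down to row group stats and dictionaries
val lowCpu = sqlContext.read
  .parquet("s3://bucket/metrics/day=20160929/cluster=emr_adhoc")
  .select("name", "value")
  .filter("name = 'system.cpu.utilization'")
  .filter("value < 0.8")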
Basics.
Setup.
Parquet writes:
● Version 1.8.1 or later – includes fix for incorrect statistics, PARQUET-251
● 1.9.0 due in October
Reads:
● Presto: Used 0.139
● Spark: Used version 1.6.1 reading from Hive
● Pig: Used parquet-pig 1.9.0 for predicate push-down
Pig configuration.
-- enable pushdown/filtering
set parquet.pig.predicate.pushdown.enable true;

-- enables stats and dictionary filtering
set parquet.filter.statistics.enabled true;
set parquet.filter.dictionary.enabled true;
Spark configuration.
// turn on Parquet push-down, stats filtering, and dictionary filtering
sqlContext.setConf("parquet.filter.statistics.enabled", "true")
sqlContext.setConf("parquet.filter.dictionary.enabled", "true")
sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")

// use the non-Hive read path
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")

// turn off schema merging, which turns off push-down
sqlContext.setConf("spark.sql.parquet.mergeSchema", "false")
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet.mergeSchema", "false")
Writing the data.
Spark:
sqlContext
.table("raw_metrics")
.write.insertInto("metrics")
Pig:
metricsData = LOAD 'raw_metrics'
USING SomeLoader;
STORE metricsData INTO 'metrics'
USING ParquetStorer;
Either of these writes can fail with an OutOfMemoryError or a ParquetRuntimeException.
Writing too many files.
Data doesn’t match partitioning.
● Tasks write a file per partition
Symptoms:
● OutOfMemoryError
● ParquetRuntimeException: New Memory allocation 1047284 bytes is smaller than the minimum allocation size of 1048576 bytes.
● Job succeeds but writes lots of small files, making split planning slow
[Diagram: with input that doesn’t match the partitioning, every task writes a file into every partition directory: part=1/, part=2/, part=3/, ...]
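The arithmetic behind the failure: each open Parquet file buffers a full row group before flushing, so a task writing into P partitions holds roughly P × parquet.block.size at once. With the default 128 MB row group, 100 open partition files is on the order of 12.8 GB in a single task; as the writer's memory manager scales allocations down to fit, they eventually fall below the 1 MB minimum and the exception above is thrown.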
Account for partitioning.
Spark.
sqlContext
.table("raw_metrics")
.sort("day", "cluster")
.write.insertInto("metrics")
Pig.
metrics = LOAD 'raw_metrics'
USING SomeLoader;
metricsSorted = ORDER metrics
BY day, cluster;
STORE metricsSorted INTO 'metrics'
USING ParquetStorer;
Filter to select partitions.
Spark.
val partition = sqlContext
.table("metrics")
.filter("day = 20160929")
.filter("cluster = 'emr_adhoc'")
Pig.
metricsData = LOAD 'metrics'
USING ParquetLoader;
partition = FILTER metricsData BY
day == 20160929 AND
cluster == 'emr_adhoc';
Stats filters.
Sample query.
Spark.
val low_cpu_count = partition
.filter("name = 'system.cpu.utilization'")
.filter("value < 0.8")
.count()
Pig.
low_cpu = FILTER partition BY
name == 'system.cpu.utilization' AND
value < 0.8;
low_cpu_count = FOREACH
(GROUP low_cpu ALL) GENERATE
COUNT(name);
My job was 5 minutes faster!
Did it work?
● Success metrics: S3 bytes read, CPU time spent
S3N: Number of bytes read: 1,366,228,942,336
CPU time spent (ms): 280,218,780
● Filter didn’t work. Bytes read shows the entire partition was read.
● What happened?
Inspect the file.
● Stats show what happened:
Row group 0: count: 84756 845.42 B records
type encodings count avg size nulls min / max
name BINARY G _ 84756 61.52 B 0 "A..." / "z..."
...
Row group 1: count: 85579 845.42 B records
type encodings count avg size nulls min / max
name BINARY G _ 85579 61.52 B 0 "A..." / "z..."
● Every row group matched the query
Add query columns to the sort.
Spark.
sqlContext
.table("raw_metrics")
.sort("day", "cluster", "name")
.write.insertInto("metrics")
Pig.
metrics = LOAD 'raw_metrics'
USING SomeLoader;
metricsSorted = ORDER metrics
BY day, cluster, name;
STORE metricsSorted INTO 'metrics'
USING ParquetStorer;
Inspect the file, again.
● Stats are fixed:
Row group 0: count: 84756 845.42 B records
type encodings count avg size nulls min / max
name BINARY G _ 84756 61.52 B 0 "A..." / "F..."
...
Row group 1: count: 85579 845.42 B records
type encodings count avg size nulls min / max
name BINARY G _ 85579 61.52 B 0 "F..." / "N..."
...
Row group 2: count: 86712 845.42 B records
type encodings count avg size nulls min / max
name BINARY G _ 86712 61.52 B 0 "N..." / "b..."
Dictionary filters.
Dictionary filtering.
The dictionary is a compact list of all the distinct values in a column chunk.
● Search term missing? Skip the row group
● Like a Bloom filter without false positives
When dictionary filtering helps:
● When a column is sorted in each file, not globally sorted – one row group matches
● When filtering an unsorted column
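Conceptually the check is tiny; an illustrative sketch (not the parquet-mr implementation):

// a row group whose dictionary lacks the search term cannot contain it
def canSkipRowGroup(dictionary: Set[String], term: String): Boolean =
  !dictionary.contains(term)

canSkipRowGroup(Set("system.cpu.utilization", "system.load.1"),
  "jvm.gc.pause") // true: skip the whole row group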
Dictionary filtering overhead.
Read overhead.
● Extra seeks
● Extra page reads
Not a problem in practice.
● Reading both dictionary and row group resulted in < 1% penalty
● Stats filtering prevents unnecessary dictionary reads
Works out of the box, right?
Nope.
● Only works when columns are completely dictionary-encoded
● Plain-encoded pages can contain any value, dictionary is no help
● All pages in a chunk must use the dictionary
Dictionary fallback rules:
● If dictionary + references > plain encoding, fall back
● If dictionary size is too large, fall back (default threshold: 1 MB)
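A worked example of the first rule, with made-up numbers: one million 8-byte timestamps are 8 MB plain-encoded. If 900K of them are distinct, the dictionary alone is 7.2 MB, plus roughly 2.5 MB of 20-bit references, so dictionary encoding loses and the writer falls back. (That 7.2 MB dictionary would also trip the second rule's 1 MB default.)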
![Page 32: Parquet performance tuning: the missing guide](https://reader034.fdocuments.net/reader034/viewer/2022052117/589b9f4b1a28abd63e8b5d7d/html5/thumbnails/32.jpg)
Fallback to plain encoding.
parquet-tools dump -d utc_timestamp_ms
TV=142990 RL=0 DL=1 DS: 833491 DE:PLAIN_DICTIONARY
----------------------------------------------------------------------------
page 0: DLE:RLE RLE:BIT_PACKED V:RLE SZ:72912
page 1: DLE:RLE RLE:BIT_PACKED V:RLE SZ:135022
page 2: DLE:RLE RLE:BIT_PACKED V:PLAIN SZ:1048607
page 3: DLE:RLE RLE:BIT_PACKED V:PLAIN SZ:1048607
page 4: DLE:RLE RLE:BIT_PACKED V:PLAIN SZ:714941
What’s happening:
● Values repeat, but change over time
● Dictionary gets too large, falls back to plain encoding
● Dictionary encoding is a size win!
Avoid encoding fallback.
Increase max dictionary size.
● 2-3 MB usually worked
● parquet.dictionary.page.size
Decrease row group size.
● 24, 32, or 64 MB
● parquet.block.size
● New dictionary for each row group
● Also lowers memory consumption!
Run several tests to find the right configuration (per table).
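In Spark, both knobs can be set on the Hadoop configuration before writing; a sketch using sizes from the ranges above (not universal defaults):

// allow larger dictionaries before the writer falls back to plain encoding
sc.hadoopConfiguration.setInt("parquet.dictionary.page.size", 3 * 1024 * 1024)
// smaller row groups: each one starts a fresh dictionary
sc.hadoopConfiguration.setInt("parquet.block.size", 32 * 1024 * 1024)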
Row group size.
Other reasons to decrease row group size:
● Reduce memory consumption – but not to avoid write-side OOM
● Increase number of tasks / parallelism
Results!
Results (from Pig).
CPU and wall time dropped.
● Initial: CPU Time: 280,218,780 ms Wall Time: 15m 27s
● Filtered: CPU Time: 120,275,590 ms Wall Time: 9m 51s
● Final: CPU Time: 9,593,700 ms Wall Time: 6m 47s
Bytes read is much better.
● Initial: S3 bytes read: 1,366,228,942,336 (1.24 TB)
● Filtered: S3 bytes read: 49,195,996,736 (45.82 GB)
Filtered vs. final time.
Row group filtering is parallel.
● Split planning is independent of stats (or else is a bottleneck)
● Lots of very small tasks: read footer, read dictionary, stop processing
Combine splits in Pig/MR for better time.
● 1 GB splits tend to work well
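(In Pig on MapReduce, the combine target is normally pig.maxCombinedSplitSize, set in bytes, so a 1 GB target is 1073741824.)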
Other work.
Format version 2.
What’s included:
● New encodings: delta-integer, prefix-binary
● New page format to enable page-level filtering
New encodings didn’t help with Netflix data.
● Delta-integer didn’t help significantly, even with timestamps (high overhead?)
● Prefixes in URL and JSON data were not long enough to benefit
Page filtering isn’t implemented (yet).
Brotli compression.
● New compression library from Google
● Based on LZ77, with compatible license
Faster compression, smaller files, or both.
● brotli-5: 19.7% smaller, 2.7% slower – 1 day of data from Kafka
● brotli-4: 14.8% smaller, 12.5% faster – 1 hour, 4 largest Parquet tables
● brotli-1: 8.1% smaller, 28.3% faster – JSON-heavy dataset
Future work.
Short term:
● Release Parquet 1.9.0
● Test Zstd compression
● Convert embedded JSON to Avro – good preliminary results
Long term:
● New encodings: Zig-zag RLE, patching, and floating point decomposition
● Page-level filtering