On random sampling over Joins
![Page 1: On random sampling over Joins](https://reader035.fdocuments.net/reader035/viewer/2022062520/5681626a550346895dd2dc3b/html5/thumbnails/1.jpg)
SURAJIT CHAUDHURI, RAJEEV MOTWANI, VIVEK NARASAYYA
On random sampling over Joins
Presented by: Srikantha Nema
![Page 2: On random sampling over Joins](https://reader035.fdocuments.net/reader035/viewer/2022062520/5681626a550346895dd2dc3b/html5/thumbnails/2.jpg)
Outline
- Semantics of Sample
- Difficulty of Join Sampling
- Algorithms for Sampling
- Sampling Strategies
- New Strategies for Join Sampling
- Experimental Evaluation
- Conclusions
![Page 3: On random sampling over Joins](https://reader035.fdocuments.net/reader035/viewer/2022062520/5681626a550346895dd2dc3b/html5/thumbnails/3.jpg)
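Terminology
- SAMPLE(R, f) is an SQL operator that returns a random sample of a relation R.
- R is the relation obtained by evaluating a query Q.
- f, with 0 ≤ f ≤ 1, is the sampling fraction: the sample is meant to contain a fraction f of the tuples of R.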
![Page 4: On random sampling over Joins](https://reader035.fdocuments.net/reader035/viewer/2022062520/5681626a550346895dd2dc3b/html5/thumbnails/4.jpg)
Semantics of Sample
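- Sampling with Replacement (WR): f·|R| tuples are drawn independently and uniformly at random from R, so a tuple may appear more than once.
- Sampling without Replacement (WoR): f·|R| distinct tuples are drawn uniformly at random, so every subset of that size is equally likely.
- Independent Coin Flips (CF): each tuple of R is included independently with probability f, so the sample size is random with expectation f·|R|.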
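The three semantics can be illustrated with a minimal Python sketch over an in-memory list standing in for R. The function names are hypothetical, and converting the fraction f into a fixed sample size with `round(f * len(R))` for WR and WoR is an assumption of this sketch, not something the slides specify.

```python
import random

def sample_wr(relation, f, rng=random):
    """WR: round(f*|R|) independent uniform draws, so duplicates are possible."""
    size = round(f * len(relation))  # assumption: fraction -> fixed sample size
    return [rng.choice(relation) for _ in range(size)]

def sample_wor(relation, f, rng=random):
    """WoR: round(f*|R|) distinct positions chosen uniformly at random."""
    size = round(f * len(relation))
    return rng.sample(relation, size)

def sample_cf(relation, f, rng=random):
    """CF: keep each tuple independently with probability f (random size)."""
    return [t for t in relation if rng.random() < f]

if __name__ == "__main__":
    rng = random.Random(42)
    R = list(range(100))
    print(len(sample_wr(R, 0.1, rng)), len(sample_wor(R, 0.1, rng)))  # 10 10
    print(len(sample_cf(R, 0.1, rng)))  # size varies around 10
```

Only CF leaves the sample size random; WR and WoR fix it in advance, which is why a fraction has to be turned into a count for them.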
![Page 5: On random sampling over Joins](https://reader035.fdocuments.net/reader035/viewer/2022062520/5681626a550346895dd2dc3b/html5/thumbnails/5.jpg)
Difficulty of Join Sampling
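$$R_1(A, B) = \{(a_1, b_0), (a_2, b_1), (a_2, b_2), \ldots, (a_2, b_k)\}$$

$$R_2(A, C) = \{(a_2, c_0), (a_1, c_1), (a_1, c_2), \ldots, (a_1, c_k)\}$$

Can $\mathrm{SAMPLE}(R_1 \bowtie R_2, f)$ be computed as $\mathrm{SAMPLE}(R_1, f_1) \bowtie \mathrm{SAMPLE}(R_2, f_2)$?

In general, no. The join $R_1 \bowtie R_2$ on $A$ has $2k$ tuples: $(a_1, b_0, c_i)$ and $(a_2, b_i, c_0)$ for $i = 1, \ldots, k$. For small $f_1$ and $f_2$, a sample of $R_1$ will most likely miss $(a_1, b_0)$ and a sample of $R_2$ will most likely miss $(a_2, c_0)$, so the join of the two samples is usually empty even though the join itself is large.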
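The failure can be checked numerically. The sketch below encodes the example relations $R_1(A, B)$ and $R_2(A, C)$ as Python lists of (A, payload) pairs (an illustrative encoding, not from the slides), applies coin-flip sampling to each input, and joins the two samples.

```python
import random

def build_relations(k):
    """Example relations R1(A, B) and R2(A, C) as lists of (A, payload) pairs."""
    r1 = [("a1", "b0")] + [("a2", f"b{i}") for i in range(1, k + 1)]
    r2 = [("a2", "c0")] + [("a1", f"c{i}") for i in range(1, k + 1)]
    return r1, r2

def equi_join(left, right):
    """Nested-loop equi-join on attribute A, producing (A, B, C) triples."""
    return [(a, b, c) for a, b in left for a2, c in right if a == a2]

def coin_flip(relation, f, rng):
    """CF semantics: keep each tuple independently with probability f."""
    return [t for t in relation if rng.random() < f]

k, f = 200, 0.1
r1, r2 = build_relations(k)
full_join = equi_join(r1, r2)
print(len(full_join))  # 400, i.e. 2k

rng = random.Random(7)
joined_samples = equi_join(coin_flip(r1, f, rng), coin_flip(r2, f, rng))
# Any a1 tuple here must carry b0 and any a2 tuple must carry c0:
# the survivors are clustered, not independent draws from the join.
print(len(joined_samples))
```

Under CF semantics a join tuple survives only if both of its components survive, which happens with probability $f^2$. The expected size of the joined samples is therefore $2f^2k$ (4 here) instead of the $2fk$ (40 here) expected from $\mathrm{SAMPLE}(R_1 \bowtie R_2, f)$, and whenever it is non-empty its tuples share $b_0$ or $c_0$.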
![Page 6: On random sampling over Joins](https://reader035.fdocuments.net/reader035/viewer/2022062520/5681626a550346895dd2dc3b/html5/thumbnails/6.jpg)
Classification of the Join Sampling Problem
- Case A: no information is available for either $R_1$ or $R_2$.
- Case B: no information is available for $R_1$, but indexes and/or statistics are available for $R_2$.
- Case C: indexes/statistics are available for both $R_1$ and $R_2$.
![Page 7: On random sampling over Joins](https://reader035.fdocuments.net/reader035/viewer/2022062520/5681626a550346895dd2dc3b/html5/thumbnails/7.jpg)
Algorithms for Sampling
- Unweighted Sequential WR Sampling: Black-Box U1, Black-Box U2
- Weighted Sequential WR Sampling: Black-Box WR1, Black-Box WR2
![Page 8: On random sampling over Joins](https://reader035.fdocuments.net/reader035/viewer/2022062520/5681626a550346895dd2dc3b/html5/thumbnails/8.jpg)
Unweighted Sequential WR Sampling
Black-Box U2
Black-Box U1
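The slides name the black boxes without spelling them out. A minimal sketch, assuming U1 is a one-pass WR sampler for a relation whose size n is known in advance and U2 is a reservoir-style WR sampler for a stream of unknown length; both return r tuples, each an independent uniform draw from R. The helper `binomial` and the function names are hypothetical, and the helper sums Bernoulli trials so the code runs on any Python 3 version.

```python
import random

def binomial(n, p, rng):
    """Binomial(n, p) as a sum of n Bernoulli(p) trials (fine for modest n)."""
    return sum(rng.random() < p for _ in range(n))

def u1_sample(stream, n, r, rng=random):
    """One-pass WR sample of size r from a stream whose length n is known."""
    out, remaining = [], r
    for i, t in enumerate(stream):
        if remaining == 0:
            break
        # n - i tuples (including t) are still unscanned; t receives
        # Binomial(remaining, 1 / (n - i)) of the outstanding draws, which
        # reproduces the multinomial counts of r uniform WR draws.
        x = binomial(remaining, 1.0 / (n - i), rng)
        out.extend([t] * x)
        remaining -= x
    return out

def u2_sample(stream, r, rng=random):
    """Reservoir WR sample of size r from a stream of unknown length."""
    reservoir = [None] * r
    for count, t in enumerate(stream, start=1):
        for j in range(r):
            # Each slot independently switches to the count-th tuple with
            # probability 1/count, so it ends up uniform over the whole stream.
            if rng.random() < 1.0 / count:
                reservoir[j] = t
    return reservoir
```

For example, `u1_sample(range(50), 50, 10, random.Random(1))` always returns exactly 10 values between 0 and 49, because the last tuple receives every draw still outstanding. U1 emits copies in scan order, which is harmless for a WR sample viewed as a multiset; U2 needs no size information but touches all r slots for every tuple.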
![Page 9: On random sampling over Joins](https://reader035.fdocuments.net/reader035/viewer/2022062520/5681626a550346895dd2dc3b/html5/thumbnails/9.jpg)
Weighted Sequential WR Sampling
Black-Box WR1
Black-Box WR2
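The weighted versions follow the same pattern, except that each draw picks a tuple with probability proportional to its weight. A minimal sketch, assuming WR1 knows the total weight W up front while WR2 does not, and assuming non-negative integer weights (such as join-attribute frequencies) so the running totals stay exact; the function names are hypothetical.

```python
import random

def binomial(n, p, rng):
    """Binomial(n, p) as a sum of n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))

def wr1_sample(stream, total_weight, r, rng=random):
    """Weighted WR sample of size r in one pass when the total weight is known.

    `stream` yields (tuple, weight) pairs with integer weights summing to
    total_weight > 0.
    """
    out, remaining, weight_left = [], r, total_weight
    for t, w in stream:
        if remaining == 0:
            break
        # t receives Binomial(remaining, w / weight_left) of the outstanding draws.
        x = binomial(remaining, w / weight_left, rng)
        out.extend([t] * x)
        remaining -= x
        weight_left -= w
    return out

def wr2_sample(stream, r, rng=random):
    """Weighted reservoir WR sample of size r when the total weight is unknown."""
    reservoir, seen = [None] * r, 0
    for t, w in stream:
        if w <= 0:
            continue  # zero-weight tuples can never be selected
        seen += w
        for j in range(r):
            # Slot j switches to t with probability w / (weight seen so far),
            # which leaves t in the slot with final probability w / W.
            if rng.random() < w / seen:
                reservoir[j] = t
    return reservoir
```

The WR2 update telescopes: a tuple with weight $w_i$ survives all later replacements with probability $W_i / W$, so it ends in a slot with probability $(w_i / W_i)(W_i / W) = w_i / W$.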
![Page 10: On random sampling over Joins](https://reader035.fdocuments.net/reader035/viewer/2022062520/5681626a550346895dd2dc3b/html5/thumbnails/10.jpg)
Existing Sampling Strategies
Strategy Naïve-Sample
Strategy Olken-Sample
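Both baseline strategies can be sketched for an equi-join $R_1(A, B) \bowtie R_2(A, C)$ on $A$. The Naïve-Sample sketch materializes the full join and draws r tuples with replacement; a streaming implementation would instead pipe the join through a sequential sampler such as U2. The Olken-Sample sketch uses the acceptance/rejection idea attributed to Olken: pick a uniform tuple $t_1$ of $R_1$, accept it with probability $m_2(t_1.A)/M$, where $m_2(v)$ counts the $R_2$ tuples with $A = v$ and $M$ is the largest such count, then pair it with a uniform matching $R_2$ tuple. It assumes an index on $R_2.A$ (modelled here as a dictionary) and knowledge of $M$; all names are hypothetical.

```python
import random
from collections import defaultdict

def build_index(r2):
    """Stand-in for an index on R2.A: join value -> list of C values."""
    index = defaultdict(list)
    for a, c in r2:
        index[a].append(c)
    return index

def naive_sample(r1, r2, r, rng=random):
    """Naive-Sample: materialize R1 join R2, then draw r tuples with replacement."""
    index = build_index(r2)
    joined = [(a, b, c) for a, b in r1 for c in index.get(a, [])]
    return [rng.choice(joined) for _ in range(r)] if joined else []

def olken_sample(r1, r2, r, rng=random):
    """Acceptance/rejection: each join tuple is produced with prob. 1/(|R1| * M) per trial."""
    index = build_index(r2)
    if not any(a in index for a, _ in r1):
        return []  # empty join: rejection sampling would never terminate
    max_freq = max(len(v) for v in index.values())  # M
    out = []
    while len(out) < r:
        a, b = rng.choice(r1)                        # uniform tuple of R1
        matches = index.get(a, [])
        if rng.random() < len(matches) / max_freq:   # accept with prob. m2(a) / M
            out.append((a, b, rng.choice(matches)))  # uniform partner in R2
    return out
```

Each trial outputs a specific join tuple with probability $(1/|R_1|)\,(m_2/M)\,(1/m_2) = 1/(|R_1| M)$, so accepted tuples are uniform over the join; the price is wasted trials whenever $m_2(t_1.A)$ is much smaller than $M$.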
![Page 11: On random sampling over Joins](https://reader035.fdocuments.net/reader035/viewer/2022062520/5681626a550346895dd2dc3b/html5/thumbnails/11.jpg)
New Strategies for Join Sampling
Strategy Stream-Sample
Strategy Group-Sample
Strategy Frequency-Partition-Sample
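A minimal sketch of the weighted-sample-then-match idea, assuming this is how Stream-Sample proceeds and that statistics giving $m_2(v)$ plus an index on $R_2.A$ are available (Case C). First draw a weighted WR sample of $R_1$ in which $t_1$ has weight $m_2(t_1.A)$, using a WR2-style reservoir; then pair each sampled $t_1$ with a uniformly random matching $R_2$ tuple. Since the weights sum to $|J| = |R_1 \bowtie R_2|$, each draw yields a given join tuple with probability $(m_2(t_1.A)/|J|)\cdot(1/m_2(t_1.A)) = 1/|J|$. The function name and data layout are hypothetical, and Group-Sample and Frequency-Partition-Sample are not shown.

```python
import random
from collections import Counter, defaultdict

def stream_sample(r1, r2, r, rng=random):
    """Uniform WR sample of size r of R1 join R2 on A via weighted sampling of R1."""
    freq = Counter(a for a, _ in r2)  # assumed statistics: m2(v) per join value
    index = defaultdict(list)         # assumed index on R2.A
    for a, c in r2:
        index[a].append(c)

    # Step 1: weighted WR reservoir over R1 with weight m2(t1.A) (WR2 style).
    reservoir, seen = [None] * r, 0
    for t in r1:
        w = freq.get(t[0], 0)
        if w == 0:
            continue  # t joins with nothing, so it can never be sampled
        seen += w
        for j in range(r):
            if rng.random() < w / seen:
                reservoir[j] = t
    if seen == 0:
        return []  # the join is empty

    # Step 2: pair each sampled R1 tuple with a uniformly chosen matching R2 tuple.
    return [(a, b, rng.choice(index[a])) for a, b in reservoir]
```

Unlike the Olken-style sketch, this never rejects a trial: $R_1$ is scanned once, and each of the r outputs costs one index lookup into $R_2$.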
![Page 12: On random sampling over Joins](https://reader035.fdocuments.net/reader035/viewer/2022062520/5681626a550346895dd2dc3b/html5/thumbnails/12.jpg)
Strategy Frequency-Partition-Sample
![Page 13: On random sampling over Joins](https://reader035.fdocuments.net/reader035/viewer/2022062520/5681626a550346895dd2dc3b/html5/thumbnails/13.jpg)
Experimental Evaluation 1
![Page 14: On random sampling over Joins](https://reader035.fdocuments.net/reader035/viewer/2022062520/5681626a550346895dd2dc3b/html5/thumbnails/14.jpg)
Experimental Evaluation 2
![Page 15: On random sampling over Joins](https://reader035.fdocuments.net/reader035/viewer/2022062520/5681626a550346895dd2dc3b/html5/thumbnails/15.jpg)
Experimental Evaluation 3
![Page 16: On random sampling over Joins](https://reader035.fdocuments.net/reader035/viewer/2022062520/5681626a550346895dd2dc3b/html5/thumbnails/16.jpg)
Conclusions
- Difficulty of join sampling
- Classification of the problem into 3 cases
- Strategies for join sampling
- New schemes for sequential random sampling, both uniform and weighted
- More efficient strategies can be developed for the single-join case
- More work is needed to understand the problem of sampling the result of join trees
![Page 17: On random sampling over Joins](https://reader035.fdocuments.net/reader035/viewer/2022062520/5681626a550346895dd2dc3b/html5/thumbnails/17.jpg)
Thank You