Fast GiST Index Build
Alexander Korotkov, PostgreSQL Conference Europe 2011, Amsterdam
-
Ordinary GiST index build
We have to insert index tuples one by one.
[Figure: tree with internal pages I1-I6 above the leaf pages; the queued tuples L9-L16 are inserted one at a time and end up scattered across the leaves.]
-
When does something go wrong with the ordinary GiST index build algorithm?
-
Index too large to fit into cache
A significant fraction of index tuple inserts cause random IO.
[Figure: the same tree with only part of its leaf pages in the cache; inserting tuples from L9-L16 into uncached leaves causes random reads.]
-
High concurrency
Even if the whole index could fit into the cache, other backends also use the cache. So even a not-too-large index can end up out of cache.
[Figure: the cache is shared between GiST index pages and other useful stuff.]
-
What helps the ordinary GiST index build algorithm?
-
High overlap
gistchoose selects the first path with zero penalty (if any).
[Figure: heavily overlapping internal pages I1-I6 with leaf entries L1-L16.]
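The rule above can be sketched in a few lines. This is only an illustration, not PostgreSQL's actual gistchoose code: it assumes one-dimensional interval keys and uses interval enlargement as a stand-in penalty, and the names `choose_subtree` and `enlargement` are hypothetical.

```python
def enlargement(interval, key):
    """Stand-in penalty: how much `interval` must grow to cover `key`."""
    low, high = interval
    key_low, key_high = key
    grown = max(high, key_high) - min(low, key_low)
    return grown - (high - low)


def choose_subtree(entries, key, penalty=enlargement):
    """Return the index of the entry with minimal penalty.

    The scan stops at the first zero-penalty entry, so with heavily
    overlapping entries the same early path keeps being chosen.
    """
    best_index, best_penalty = None, None
    for index, entry in enumerate(entries):
        current = penalty(entry, key)
        if best_penalty is None or current < best_penalty:
            best_index, best_penalty = index, current
            if current == 0:
                break  # first zero-penalty path wins
    return best_index


entries = [(0, 10), (5, 15), (20, 30)]
print(choose_subtree(entries, (6, 8)))    # two entries cost 0; the first is taken
print(choose_subtree(entries, (22, 40)))  # no zero-penalty entry; the cheapest is taken
```

With high overlap many entries have zero penalty, so inserts tend to follow the same few paths, which plausibly keeps those pages in the cache.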
-
Ordered datasets
An ordered dataset causes inserts to go to the recently accessed part of the index. That's very good for caching.
[Figure: the same tree with tuples L9-L16 inserted in order.]
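The caching effect of ordered versus shuffled input can be illustrated with a toy model. The sketch below assumes an LRU cache of leaf pages and arbitrary sizes (1000 leaves, 100 cached pages); neither the replacement policy nor the numbers come from the talk.

```python
import random
from collections import OrderedDict


def count_page_reads(leaf_sequence, cache_pages):
    """Count cache misses for a sequence of leaf-page accesses under LRU."""
    cache = OrderedDict()
    misses = 0
    for page in leaf_sequence:
        if page in cache:
            cache.move_to_end(page)        # mark as most recently used
        else:
            misses += 1                    # page must be read from disk
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict least recently used page
    return misses


NUM_LEAVES = 1000       # hypothetical number of leaf pages
TUPLES_PER_LEAF = 100   # hypothetical tuples landing on each leaf
CACHE_PAGES = 100       # hypothetical cache size, in pages

# Ordered input: all tuples for one leaf arrive before the next leaf's tuples.
ordered = [leaf for leaf in range(NUM_LEAVES) for _ in range(TUPLES_PER_LEAF)]
shuffled = ordered[:]
random.Random(0).shuffle(shuffled)

ordered_misses = count_page_reads(ordered, CACHE_PAGES)
shuffled_misses = count_page_reads(shuffled, CACHE_PAGES)
print(ordered_misses, shuffled_misses)
```

In this model the ordered sequence reads each leaf once, while the shuffled sequence misses the cache on most inserts.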
-
The buffering GiST index build technique
-
General idea
Remember the index tuples that are going to a specific part of the tree, and then process them later as a pack.
[Figure: tuples L9, L11, L14, L15 are collected for one subtree and L10, L12, L13, L16 for the other.]
-
Buffering algorithm
The buffering algorithm is a recursive application of that idea.
[Figure: tree of pages in which buffers are attached to pages at intervals of one "level step".]
-
Buffer emptying
An overflowed buffer on the lowest level can be emptied to leaf pages.
[Figure: a buffer holding L9-L16 is emptied into the leaf pages below it.]
-
Buffer emptying
Overflowed buffers on higher levels can be emptied to lower-level buffers.
[Figure: a higher-level buffer holding L9-L16 is emptied into the buffers of the pages below it.]
-
Page splitting
When a page is split, the attached buffer is split too.
[Figure: a page with buffer L9-L16 splits in two; one new buffer receives L9, L14, L11, L15 and the other receives L12, L13, L10, L16.]
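A buffer split can be sketched as redistributing the buffered tuples between the two new pages. The version below is an assumption-laden illustration: keys are plain numbers, each new page is described by a one-dimensional interval, and each tuple goes to the half that needs less enlargement. The names `split_buffer` and `enlargement` are hypothetical.

```python
def enlargement(interval, key):
    """How much `interval` must grow to cover the point `key`."""
    low, high = interval
    return (max(high, key) - min(low, key)) - (high - low)


def split_buffer(buffered_keys, left_interval, right_interval):
    """Distribute buffered keys between the two halves of a split page."""
    left, right = [], []
    for key in buffered_keys:
        # Assumed rule: send the key where it costs less; ties go left.
        if enlargement(left_interval, key) <= enlargement(right_interval, key):
            left.append(key)
        else:
            right.append(key)
    return left, right


left, right = split_buffer([1, 9, 3, 12, 6], (0, 5), (8, 15))
print(left, right)  # [1, 3, 6] [9, 12]
```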
-
Final buffers emptying
When all the tuples have been inserted, the final buffer emptying stage starts. All buffers, even non-overflowed ones, are emptied in a top-down manner.
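The buffering, emptying, and final-emptying steps can be put together in a small sketch. It makes several simplifying assumptions that are not in the talk: the tree is fixed (no page splits), every internal page has a buffer (a level step of 1), keys are integers routed by range instead of by penalty, and the overflow threshold is arbitrary. All names are hypothetical.

```python
import random

BUFFER_CAPACITY = 4  # arbitrary overflow threshold for this sketch


class Node:
    def __init__(self, low, high, children=()):
        self.low, self.high = low, high   # keys in [low, high) belong here
        self.children = list(children)    # no children means a leaf page
        self.buffer = []                  # pending keys (internal pages)
        self.tuples = []                  # stored keys (leaf pages)

    def is_leaf(self):
        return not self.children


def route(node, key):
    """Pick the child whose range covers the key (stand-in for gistchoose)."""
    for child in node.children:
        if child.low <= key < child.high:
            return child
    raise ValueError(f"key {key} is outside the tree")


def empty_buffer(node):
    """Push all buffered keys one level down, cascading on overflow."""
    pending, node.buffer = node.buffer, []
    for key in pending:
        child = route(node, key)
        if child.is_leaf():
            child.tuples.append(key)   # lowest level: keys reach leaf pages
        else:
            child.buffer.append(key)   # higher level: keys move to lower buffers
    for child in node.children:
        if not child.is_leaf() and len(child.buffer) > BUFFER_CAPACITY:
            empty_buffer(child)


def insert(root, key):
    """Remember the key in the root buffer; process the pack on overflow."""
    root.buffer.append(key)
    if len(root.buffer) > BUFFER_CAPACITY:
        empty_buffer(root)


def final_emptying(node):
    """After the last insert, empty every buffer from the top down."""
    if node.is_leaf():
        return
    empty_buffer(node)
    for child in node.children:
        final_emptying(child)


def leaves_of(node):
    if node.is_leaf():
        return [node]
    return [leaf for child in node.children for leaf in leaves_of(child)]


leaves = [Node(low, low + 4) for low in range(0, 16, 4)]
root = Node(0, 16, [Node(0, 8, leaves[:2]), Node(8, 16, leaves[2:])])

keys = list(range(16))
random.Random(1).shuffle(keys)
for key in keys:
    insert(root, key)
final_emptying(root)

for leaf in leaves_of(root):
    print((leaf.low, leaf.high), sorted(leaf.tuples))
```

Each leaf is touched only when a whole pack of keys arrives for its subtree, which is the source of the IO savings the technique aims for.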
-
Buffer size and level step selection
A subtree of level-step height should fit into the cache. Therefore, operations inside the subtree are IO efficient.
The buffer size should be comparable with the size of the subtree. Thus, IO would be comparable with the size of the inserted data.
-
Varlena data
The level step and buffer size parameters depend on the index tuple size.
The level step is determined for the worst case: the subtree should fit into the cache even if all varlena tuples are of minimal size.
The buffer size is tuned at run time based on the average size of inserted index tuples.
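The level step selection can be sketched as a small calculation. This is a rough model under stated assumptions, not the formula used by the patch: the fanout is uniform, a subtree is counted as one top page plus `level_step` levels below it, and the worst-case fanout is estimated as page size divided by minimal tuple size, ignoring page overhead. The function names are hypothetical.

```python
def subtree_pages(fanout, level_step):
    """Pages in a subtree: one top page plus `level_step` levels below it."""
    return sum(fanout ** depth for depth in range(level_step + 1))


def choose_level_step(max_fanout, cache_pages):
    """Largest level step whose subtree fits in `cache_pages` (0 if none)."""
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")
    level_step = 0
    while subtree_pages(max_fanout, level_step + 1) <= cache_pages:
        level_step += 1
    return level_step


PAGE_SIZE = 8192        # bytes per page
MIN_TUPLE_SIZE = 64     # hypothetical minimal varlena tuple size, in bytes
CACHE_PAGES = 32768     # hypothetical cache of 256 MB in 8 kB pages

worst_case_fanout = PAGE_SIZE // MIN_TUPLE_SIZE   # smallest tuples, largest fanout
level_step = choose_level_step(worst_case_fanout, CACHE_PAGES)
buffer_pages = subtree_pages(worst_case_fanout, level_step)  # "comparable" to the subtree
print(level_step, buffer_pages)
```

Using the minimal tuple size gives the largest possible fanout, so a level step chosen this way still fits the cache when tuples are larger.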
-
Analysis
-
When does buffering help?
Somebody may be disappointed, but buffering helps only if the bottleneck of the index build is IO (in which case it can be a rather dramatic help).
GiST is also CPU expensive in comparison with, for example, B-tree, because of the many penalty and consistent calls it has to make (those calls are made for each index tuple in a page which is in use). Buffering doesn't help with that.
-
Node splitting algorithm tradeoff
With buffering build we can now be sure that even a large index tree with low overlap doesn't require enormous time to construct.
With buffering build, it's a good time to decrease overlap with a new node splitting algorithm.
-
Double sorting node splitting
It is based on a more comprehensive consideration of splits along the axes.
It has complexity O(n*log(n)), where n is the tuple count.
On large datasets it shows much better index quality: because of less overlap, the number of page accesses can be several times or even dozens of times lower!
-
IO vs CPU tradeoff
The buffering technique optimizes IO, but it costs additional CPU load.
When the index fits into the cache, buffering is just a waste of CPU.
-
IO vs CPU tradeoff
"buffering" parameter of GiST index:
on - try to use buffering anyway (if there is enough RAM)
off - don't use buffering anyway
auto (default) - try to switch to buffering when the index size exceeds effective_cache_size
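The three settings can be summarized as a decision rule. The sketch below is a simplification under assumptions: it ignores the "enough RAM" check for `on`, treats `auto` as a single comparison rather than a switch made during the build, and uses a hypothetical function name. The SQL string shows how the parameter is passed to CREATE INDEX; the table and column names in it are made up.

```python
def should_use_buffering(mode, index_size_bytes, effective_cache_size_bytes):
    """Decide whether a GiST build would use buffering under each setting."""
    if mode == "on":
        return True    # the RAM availability check is omitted in this sketch
    if mode == "off":
        return False
    if mode == "auto":
        # Switch to buffering once the index no longer fits the assumed cache.
        return index_size_bytes > effective_cache_size_bytes
    raise ValueError(f"unknown buffering mode: {mode!r}")


# Hypothetical table and column names; `buffering` is the storage parameter.
EXAMPLE_SQL = "CREATE INDEX points_idx ON points USING gist (p) WITH (buffering = auto);"

GB = 1024 ** 3
print(should_use_buffering("auto", 10 * GB, 4 * GB))  # True: index exceeds the cache
print(should_use_buffering("auto", 1 * GB, 4 * GB))   # False: index fits the cache
```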
-
Testing
Dataset           Split method    Build method  Actual time  Search time
Uniform           New linear      regular       17 h 39 m    1
Uniform           New linear      buffering     3 h 23 m     0.90
Uniform           Double sorting  regular       9 d 10 h     0.089
Uniform           Double sorting  buffering     4 h 11 m     0.089
USN!2 (ordered)   New linear      regular       $% m         1
USN!2 (ordered)   New linear      buffering     1 h 27 m     0.4%
USN!2 (ordered)   Double sorting  regular       4$ m         0.37
USN!2 (ordered)   Double sorting  buffering     1 h 29 m     0.3%
USN!2 (shuffled)  New linear      regular       10 h 12 m    1
USN!2 (shuffled)  New linear      buffering     3 h 1% m     0.89
USN!2 (shuffled)  Double sorting  regular       8 d 20 h     0.072
USN!2 (shuffled)  Double sorting  buffering     4 h 20 m     0.0%7
Test setup with 7b of RAM; the index is built on 8995 of rows.
-
Analysis
Buffering technique:
Accelerates the build of non-overlapping trees on shuffled data by dozens of times!
Might accelerate the build of even high-overlapping trees (depending on the OS cache strategy)
Adds some slowdown to index build on well-ordered datasets.
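The "dozens of times" claim can be checked against the build times in the testing table. The sketch below uses only rows whose times are fully legible in the transcript; the helper names are hypothetical.

```python
def minutes(days=0, hours=0, mins=0):
    """Convert a build time to minutes."""
    return (days * 24 + hours) * 60 + mins


def speedup(regular, buffering):
    """How many times faster the buffering build is than the regular build."""
    return regular / buffering


# Uniform dataset, double sorting split: 9 d 10 h regular vs 4 h 11 m buffering.
uniform_double = speedup(minutes(days=9, hours=10), minutes(hours=4, mins=11))
# Shuffled dataset, double sorting split: 8 d 20 h regular vs 4 h 20 m buffering.
shuffled_double = speedup(minutes(days=8, hours=20), minutes(hours=4, mins=20))
# Uniform dataset, new linear split: 17 h 39 m regular vs 3 h 23 m buffering.
uniform_linear = speedup(minutes(hours=17, mins=39), minutes(hours=3, mins=23))

print(round(uniform_double, 1), round(shuffled_double, 1), round(uniform_linear, 1))
# prints: 54.0 48.9 5.2
```

The low-overlap double sorting trees gain roughly 50x, while the higher-overlap new linear tree on uniform data gains about 5x.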
-
Future work
Improve automatic switching to buffering build mode (detect concurrent load and ordered datasets)
Decrease CPU usage
-
Acknowledgement
Oleg and Teodor for giving direction and advice
Heikki Linnakangas for mentoring and his work on this patch
GSoC for funding of this project
-
Thank you for your attention