Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf ·...

53
BadgerDB::Btree Goal: Build key components of a RDBMS First hand experience building the internals of a simple database system And have some fun doing so! Two parts Buffer manager [ ] B+tree (Due Date : Mar 27 by 2PM) First class day after the Spring break All projects are individual assignments 2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 1

Transcript of Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf ·...

Page 1: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

BadgerDB::Btree• Goal:BuildkeycomponentsofaRDBMS

– Firsthandexperiencebuildingtheinternalsofasimpledatabasesystem

– Andhavesomefundoingso!

• Twoparts– Buffermanager[✔]– B+tree(DueDate:Mar27by2PM)

• FirstclassdayaftertheSpringbreak

Allprojectsareindividualassignments

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 1

Page 2: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

StructureofDatabase

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 2

Queryoptimizerandexecution

Relationaloperators

Fileandaccessmethods

Buffermanager

I/Omanager

Page 3: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Planfortoday

• ReviewofC++templatesandhelpfulfunctions– memset,memcpy andreinterpret_cast

• B+tree:insertion• <break>• BadgerDB::Btree

– Projectspecifications– Code

• Q&A

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 3

Page 4: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

C++templates

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 4

Page 5: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Humanmiseryofduplicatecodes

• SupposeyouwriteafunctionprintData:

• lateryouwanttoprintdoubleandstd::string

void printData(int value) {std::cout << "The value is "<< value;

}

void printData(double value) {std::cout << "The value is "<< value;

}

void printData(std::string value) { std::cout << "The value is "<< value;

}2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 5

Page 6: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

AndStroustrup said- lettherebetemplates

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 6

template<typename T>void printData(T value) {

std::cout << "The value is ” << value;}

Page 7: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

AndStroustrup said- lettherebetemplates

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 7

template<typename T>void printData(T value) {

std::cout << "The value is ” << value;}

Page 8: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Templatesemantics

• Thesyntaxissimple:template< typename name ó class name >

• Functiontemplates• Classtemplates

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 8

Page 9: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Functiontemplates

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 9

template<typename T>void func() {}

int main() { func<int>();func<double>();

}

Page 10: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Functiontemplates

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 10

template<typename T>void func() {}

int main() { func<int>();func<double>();

}

template<typename T>void func(T value) {} template<typename T, typename U>T func2(U value) {

return T(value); }

int main() { // T=intfunc(3);// T=doublefunc(3.5);

// T=int, U=double func2(3.5);// T=std::vector, U=intfunc2<std::vector> (5); // specify both T and U // T=std::vector, U=intfunc2<std::vector, int>(5.7);

}

Page 11: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Classtemplates• Alsoworksonstructs

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 11

template<typename T, int i>struct FixedArray {

T data[i]; };

FixedArray<int, 3> a; // array of 3 integers

Page 12: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Classtemplates• Alsoworksonstructs

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 12

template<typename T, int i>struct FixedArray {

T data[i]; };

FixedArray<int, 3> a; // array of 3 integers

template<typename T>class MyClass { };

template<typename T1, typename T2=int> class MyClass{};

// specify all parameters MyClass<double, std::string> mc1;

// default value for T2MyClass<int> mc4;

Page 13: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Templaterequirements• Templatesimplicitlyimposerequirementsontheirparameters

• TypeThastobe:– Copy-Constructible if

T a(b);– Assignablei.e. definesoperator=() if:

a=b;– etc

• Forthisproject:operationssuchasa < b couldmeandifferentforint andstd::string

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 13

Page 14: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Pointersandarrays

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 14

Page 15: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Pointersandarrays

• arraysworkverymuchlikepointerstotheirfirstelementsint myarray [20];int * mypointer;

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 15

Page 16: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Pointersandarrays

• arraysworkverymuchlikepointerstotheirfirstelementsint myarray [20];int * mypointer;

Canyoudo?mypointer = myarray;

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 16

Page 17: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Pointersandarrays• arraysworkverymuchlikepointerstotheirfirstelementsint myarray [20];int * mypointer;

Canyoudo?mypointer = myarray;

Can you do?myarray = mypointer;

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 17

Page 18: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Pointersandarrays• arraysworkverymuchlikepointerstotheirfirstelementsint myarray [20];int * mypointer;

Canyoudo?mypointer = myarray; ç Yes

Can you do?myarray = mypointer; ç No

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 18

Page 19: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Example// more pointers#include <iostream>using namespace std;int main () {

int numbers[5];int * p;p = numbers; *p = 10;p++; *p = 20;p = &numbers[2]; *p = 30;p = numbers + 3; *p = 40; p = numbers; *(p+4) = 50;for (int n=0; n<5; n++)

cout << numbers[n] << ", ";return 0;

}2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 19

Page 20: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Example// more pointers#include <iostream>using namespace std;int main () {

int numbers[5];int * p;p = numbers; *p = 10;p++; *p = 20;p = &numbers[2]; *p = 30;p = numbers + 3; *p = 40; p = numbers; *(p+4) = 50;for (int n=0; n<5; n++)

cout << numbers[n] << ", ";return 0;

}

Prints: 10, 20, 30, 40, 50,

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 20

Page 21: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Pointersandstringliteral

• const char*foo="hello";

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 21

Page 22: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Pointersandstringliteral

• const char*foo="hello";

• Foocontainsthevalue1702,andnot'h',nor"hello“• Whatistheoutputof?

*(foo+4)foo[4]

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 22

Page 23: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

C/C++helpfulfuctions

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 23

Page 24: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

memset• void * memset ( void * ptr, int value, size_t num );• Fillblockofmemory

– Setsthefirst num bytesoftheblockofmemorypointedby ptr tothespecified value (interpretedasan unsignedchar)

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 24

int main () { char str[] = "almost every programmer should know memset!"; memset (str,'-',6); print(str);return 0;

}

Page 25: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

memcpy• void*memcpy (void*dest,const void*source,size_t num );• Copyblockofmemory

– Copiesthevaluesof num bytesfromthelocationpointedtoby source directlytothememoryblockpointedtoby destination.

• std::memcpy ismeanttobethefastestlibraryroutineformemory-to-memorycopy(usuallymoreefficientthan std::strcpy)

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 25

struct { char name[40]; int age; } person, person_copy;

int main () { char myname[] = ”Bucky Badger"; /* using memcpy to copy string: */memcpy ( person.name, myname, strlen(myname)+1 ); /* using memcpy to copy structure: */ memcpy ( &person_copy, &person, sizeof(person) ); return 0;

}

Page 26: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

reinterpret_cast<new_type>(expr)• reinterpret_cast convertsanypointertypetoanyotherpointer

type,evenofunrelatedclasses.• Allpointerconversionsareallowed:neitherthecontentpointed

northepointertypeitselfischecked.

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 26

int main () { int i = 7;int* p1 = reinterpret_cast<int*>(&i);assert(p1 == &i);// type aliasing through pointerchar* p2 = reinterpret_cast<char*>(&i);// type aliasing through referencereinterpret_cast<unsigned int&>(i) = 42; std::cout << i << '\n';

}

Page 27: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

B+tree

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 27

Page 28: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

B+tree

• Occupancy(d)– Minimum50%occupancy(exceptforroot)– Eachnodecontainsd<=m<=2dentries.

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 281/29/17 CS 564: Database Management Systems, Jignesh M. Patel 5

(Ubiquitous)B+Tree• Height-balanced(dynamic)treestructure• Insert/deleteatlogF Ncost(F=fanout,N=#leafpages)• Minimum50%occupancy(exceptforroot).

Eachnodecontainsd <=m <=2d entries.Theparameterd iscalledtheorder ofthetree.

• Supportsequalityandrange-searchesefficiently.

Index Entries(Direct search)

Data Entries

Data EntriesEntries in the leaf pages:

(search key value, recordid)

Index EntriesEntries in the index (i.e. non-leaf) pages:

(search key value, pageid)

1/29/17 CS 564: Database Management Systems, Jignesh M. Patel 5

(Ubiquitous)B+Tree• Height-balanced(dynamic)treestructure• Insert/deleteatlogF Ncost(F=fanout,N=#leafpages)• Minimum50%occupancy(exceptforroot).

Eachnodecontainsd <=m <=2d entries.Theparameterd iscalledtheorder ofthetree.

• Supportsequalityandrange-searchesefficiently.

Index Entries(Direct search)

Data Entries

Data EntriesEntries in the leaf pages:

(search key value, recordid)

Index EntriesEntries in the index (i.e. non-leaf) pages:

(search key value, pageid)

1/29/17 CS 564: Database Management Systems, Jignesh M. Patel 5

(Ubiquitous)B+Tree• Height-balanced(dynamic)treestructure• Insert/deleteatlogF Ncost(F=fanout,N=#leafpages)• Minimum50%occupancy(exceptforroot).

Eachnodecontainsd <=m <=2d entries.Theparameterd iscalledtheorder ofthetree.

• Supportsequalityandrange-searchesefficiently.

Index Entries(Direct search)

Data Entries

Data EntriesEntries in the leaf pages:

(search key value, recordid)

Index EntriesEntries in the index (i.e. non-leaf) pages:

(search key value, pageid)

Page 29: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 291/29/17 CS 564: Database Management Systems, Jignesh M. Patel 9

B+-Tree:InsertingaDataEntry• FindcorrectleafL.• PutdataentryontoL.

– IfLhasenoughspace,done!– Else,mustsplit L(intoLandanewnodeL2)

• Redistributeentriesevenly,copyup middlekey.• InsertindexentrypointingtoL2intoparentofL.

• Thiscanhappenrecursively– Tosplitnon-leafnode,redistributeentriesevenly,but

pushingup themiddlekey.(Contrastwithleafsplits.)• Splits“grow”tree;rootsplitincreasesheight.

– Treegrowth:getswider oroneleveltallerattop.

Page 30: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 301/29/17 CS 564: Database Management Systems, Jignesh M. Patel 10

Inserting8*intoB+TreeRoot

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13

Entry to be inserted in parent nodeCopied up (and continues to appear in the leaf)

2* 3* 5* 7* 8*

5

Page 31: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 311/29/17 CS 564: Database Management Systems, Jignesh M. Patel 10

Inserting8*intoB+TreeRoot

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13

Entry to be inserted in parent nodeCopied up (and continues to appear in the leaf)

2* 3* 5* 7* 8*

5

Page 32: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 32

1/29/17 CS 564: Database Management Systems, Jignesh M. Patel 11

Inserting8*intoB+Tree

Insert in parent node.Pushed up (and only appears once in the index)

5 24 30

17

13

Page 33: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 331/29/17 CS 564: Database Management Systems, Jignesh M. Patel 12

2* 3*

Root17

24 30

14*16* 19*20*22* 24*27*29* 33*34*38*39*

135

7*5* 8*

Inserting8*intoB+Tree

• Rootwassplit:heightincreasesby1• Couldavoidsplitbyre-distributingentrieswithasibling

– Sibling:immediatelytoleftorright,andsameparent

Page 34: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

5minsbreak

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 34

https://www.youtube.com/watch?v=AxSdWhkMB_A

Page 35: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

BadgerDB:Btree

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 35

Page 36: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Afteryouuntar theproject:• btree.h:Addyourownmethodsandstructuresasyouseefitbut

don’tmodifythepublicmethodsthatwehavespecified.• btree.cpp:Implementthemethodswespecifiedandanyothers

youchoosetoadd.• file.h(cpp):ImplementsthePageFile andBlobFile classes.• main.cpp:Usetotestyourimplementation.Addyourowntests

hereorinaseparatefile.ThisfilehascodetoshowhowtousetheFileScan andBTreeIdnex classes.

• page.h(cpp):ImplementsthePageclass.• buffer.h(cpp),bufHashTbl.h(cpp):Implementationofthebuffer

manager.• Exceptions/*:Implementationofexceptionclassesthatyou

mightneed.• Makefile – makefile forthisproject.

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 36

Page 37: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 371/29/17 CS 564: Database Management Systems, Jignesh M. Patel 7

B+-treePageFormatLe

af P

age

R1 K 1 R2 K 2 K n P n+1

data entries

record 1 record 2

Next PagePointer

Rn

record n

P0

Prev PagePointer

Non

-leaf

Pa

geP1 K 1 P 2 K 2 P 3 K m P m+1

index entries

Pointer to apage with Values < K1

Pointer to a pagewith values s.t.K1≤ Values < K2

Pointer to apage with values ≥Km

Pointer to a pagewith values s.t., K2≤ Values < K3

Pm

Page 38: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Index• theindex willstoredataentriesintheform<key, rid> pair• storedinafilethatisseparatefromthedatafile• i.e.theindexfile“pointsto”thedatafilewheretheactualrecords

arestored

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 38

Page 39: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

• storesalltherelations(actualdata)aswedidinthebuffermanagerassignment

• Youdon’tactuallyusethisoneforthisproject

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 39

File

PageFile BlobFile• pagesinthefilearenotlinkedby

prevPage/nextPage links• treatsthepagesasblobsof8KB

sizei.e doesnotrequirethesepagestobevalidobjectsofthePageclass

• usetheBlobFile tostoretheB+indexfile

• everypageinthefileisanodefromtheB+tree

• wecanmodifythesepagestosuittheparticularneedsoftheB+treeindex

Page 40: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

FileScan class• TheFileScan classisusedtoscanrecordsinafile.• FileScan(const std::string&relationName,BufMgr *bufMgr)

– TheconstructortakestherelationName andbuffermanagerinstance• ~FileScan()

– Shutsdown thescanandunpinsanypinnedpages.• void scanNext(RecordId& outRid)

– Returns(viatheoutRid parameter)theRecordId ofthenextrecordfromtherelationbeingscanned.ItthrowsEndOfFileException()whentheendofrelationisreached.

• std::string getRecord() – Returnsapointertothe“current”record.Therecordistheoneina

precedingscanNext()call.• void markDirty()

– Youdon’tneedthisforthisassignment

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 40

Page 41: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

BadgerDB:B+Tree IndexSimplifications:• assumethatallrecordsinafilehavethesamelength(soforagivenattributeitsoffsetintherecordisalwaysthesame).

• onlyneedstosupportsingle-attributeindexing• theindexedattributemaybeoneofthreedatatypes:integer,double,orstring

• inthecaseofastring,youcanusethefirst10charactersasthekeyintheB+-tree

• wewillneverinserttwodataentriesintotheindexwiththesamekeyvalue2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 41

Page 42: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

B+Tree Index:Constructor

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 42

const string&relationName

Thenameoftherelationonwhichtobuildtheindex.Theconstructorshouldscanthisrelation(usingFileScan)andinsertentriesforallthetuplesinthisrelationintotheindex

String&outIndexName

Thenameoftheindexfile;determinethisnameintheconstructorasshownabove,andreturnthename.

BufMgr *bufMgrIn Theinstanceoftheglobalbuffermanager.

const intattrByteOffset

Thebyteoffsetoftheattributeinthetupleonwhichtobuildtheindex.Forinstance,ifwearestoringthefollowingstructureasarecordintheoriginalrelation:

And,wearebuildingtheindexoverthedoubled,thentheattrByteOffset valueis0+offsetof(RECORD,i),whereoffsetof istheoffsetpositionprovidedbythestandardC++library“offsetoff”.

const DatatypeattrType

Thedatatypeoftheattributeweareindexing.NotethattheDatatypeenumeration{INTEGER,DOUBLE,STRING}isdefinedinbtree.h

If the index file already exists, open the file. Else, create a new index fileParameters:

Page 43: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

B+Tree Index:insertEntry

const void*key Apointertothevalue(integer/double/string)wewanttoinsert.

const RecordId &rid Thecorrespondingrecordidofthetupleinthebaserelation.

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 43

insertEntryinserts a new entry into the index using the pair <key, rid>.Input to this function:

Page 44: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

B+Tree Index:insertEntry

const void*key Apointertothevalue(integer/double/string)wewanttoinsert.

const RecordId &rid Thecorrespondingrecordidofthetupleinthebaserelation.

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 44

insertEntryinserts a new entry into the index using the pair <key, rid>.Input to this function:

You will be spending bulk of your time /code in this method

Page 45: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

B+Tree Index:startScan

const void*lowValue

Thelowvaluetobetested.

const OperatorlowOp

Theoperationtobeusedintestingthelowrange.YoushouldonlysupportGTandGTEhere;anythingelseshouldthrowBadOpcodesException.

const void*highValue

Thehighvaluetobetested.

const OperatorhighOp

Theoperationtobeusedintestingthehighrange.YoushouldonlysupportLTandLTEhere;anythingelseshouldthrowBadOpcodesException.

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 45

startScanThis method is used to begin a “filtered scan” of the index. For e.g. if the method is called using arguments (“a”,GT,”d”,LTE), the scan should seek all entries greater than “a” and less than or equal to “d”.Input to this function:

Page 46: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

B+Tree Index:scanNext

• scanNext• fetchestherecordidofthenexttuplethatmatchesthescan

criteria.Ifthescanhasreachedtheend,thenitshouldthrowtheexceptionIndexScanCompletedException

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 46

RecordId &outRid

outputvaluethisistherecordidofthenextentrythatmatchesthescanfiltersetinstartScan.

Page 47: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

B+Tree Index:endScan

• endScan• terminatesthecurrentscanandunpins allthepagesthathavebeenpinnedforthepurposeofthescan

• throwsScanNotInitializedException ifcalledbeforeasuccessfulstartScan call.

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 47

Page 48: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Implementationnotes• callthebuffermanagertoread/writepages• don’tkeepthepagespinnedinthebufferpoolunlessyouneedto

• Forthescanmethods,youwillneedtorememberthe“state”ofthescanspecifiedduringthestartScan

• insert doesnot needtoredistributeentries• Attheleaflevel,youdonotneedtostorepointerstobothsiblings.Theleafnodesonlypointtothe“next”(theright)sibling

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 48

Page 49: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

FAQs• HowdoIgetstarted?

iftheindexfiledoesnotexist:createnewBlobFile.allocate newmetapageallocate newrootpagepopulate'IndexMetaInfo'withtherootpage numscanrecordsandinsertintotheBTree

elsereadthefirstpagefromthefile- whichisthemetanodegettherootpagenum fromthemetanodereadtherootpage(bufManager->readPage(file,

rootpageNum,out_root_page)onceyouhavetherootnode,youcantraversedownthetree

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 49

Page 50: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

FAQs

• howtocheckwhetheranindexfileexists?– Seefile.h:staticboolexists(const std::string&filename)

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 50

Page 51: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

FAQs

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 51

• How do I write a node to disk as a Page? e.g. how to write IndexMetaInfonode to the file?

Þ You first need to allocate a Page using the bufferManager.Page* metaPage;bufManager->allocatePage(..., metaPage);

Then you can either cast it as IndexMetaInfo* and update it's parameter. Another way is to first create and populate MetaIndexInfo node.Then you can allocate a new Page* using bufferManager as above. Then use 'memcpy' to copy tothe new Page:

memcpy(metaPage, &metaInfo,sizeof(IndexMetaInfo));

But you do not need to write it back as page to disk explicitly. Buffer Manager does it for you.Remember project 2?

Page 52: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

FAQs

• howtoconvertPage - thatyoureadfromthefiletoNode?– youcancaste.g.usingreinterpret_cast

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 52

Page 53: Btree project4 discussionpages.cs.wisc.edu/~jignesh/cs564/notes/BadgerDB_Btree_project.pdf · 2/23/171/29/17 CS 564: Database Management Systems, Udip Pant and Jignesh PatelCS 564:

Suggestions

• Startearly– 1000+linesofcodes

• Trytofinishbeforethespringbreak– NoTAhoursduringthebreak

• Makeincrementalprogress– Testaggressively

2/23/17 CS 564: Database Management Systems, Udip Pant and Jignesh Patel 53