C11 Hashing

Homework 3 (due Thursday, Sept 30): CLRS 8-3 (sorting variable-length items), CLRS 9-2 (weighted median), CLRS 11-1 (longest-probe bound for hashing)

Chapter 11: Hashing

We use a table of size $m \ll n$, where the universe $U$ of keys has $n$ elements, and select a function $h : U \to \{0, 1, \dots, m-1\}$, which we call a hash function. We put an element with key $k$ into slot $h(k)$. Collisions are resolved by chaining together the elements that have the same hash value.

Load Factor

To analyze the efficiency of hashing we use the load factor $\alpha$, which is the average number of elements in a slot. This quantity changes over time as the table gains or loses elements. What is the load factor of an $m$-slot hash table holding $q$ objects?

Fundamental Operations in Hashing

Insertion. Insert the given item with key $k$ somewhere in the list at slot $h(k)$. Where in the list should the item be inserted, and how does that choice influence the running time?

It should go at the beginning of the list. Then the time for insertion is constant, excluding the time for evaluating the hash function. If instead the item is appended at the end of the list, the whole list must be traversed. If all the elements happen to have the same hash value, the time for insertion is then proportional to the number of elements in the table, again excluding the time for evaluating the hash function.

Deletion and Searching. To find or delete an element with key $k$, we scan the list at slot $h(k)$ to find it. The worst case for searching and deletion is when the item is at the very end of the list (or is not in the list at all).

Selection of the Hash Function

The performance of the dynamic-table operations depends on the choice of $h$. Suppose that, for each of the three operations, the target element is selected according to a probability distribution $P$. That is, for each key $x$, $0 \le x \le n-1$, the probability that key $x$ is selected for an operation is $P(x)$. Ideal hashing is achieved when the hash function has the property that, for all $y$ with $0 \le y \le m-1$,

$$\sum_{x : h(x) = y} P(x) = \frac{1}{m}.$$

Such a situation is called simple uniform hashing. What is the expected number of elements in a slot under simple uniform hashing?

Under simple uniform hashing, for each slot, the probability that the target element is assigned to that slot is $1/m$. If there are $q$ elements in the table, then for every slot the expected number of elements in the slot is $q/m$, which is the load factor. The expected time for searching a list of length $L$ is roughly $L/2$ for a successful search and $L$ for an unsuccessful search.
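The chaining scheme and the head-of-list insertion strategy described above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code. The names `Node`, `ChainedHashTable`, and `load_factor` are hypothetical. The sketch assumes integer keys with a default hash $h(k) = k \bmod m$, and it assumes no key is inserted twice, so insertion never scans the chain.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Node:
    key: Any
    value: Any
    next: Optional["Node"] = None


class ChainedHashTable:
    """m slots, each holding a singly linked chain of the keys that hash there."""

    def __init__(self, m: int, h: Optional[Callable[[Any], int]] = None):
        self.m = m
        self.slots: List[Optional[Node]] = [None] * m
        self.q = 0  # number of stored elements
        # Assumption: integer keys and h(k) = k mod m unless a hash function is supplied.
        self.h = h if h is not None else (lambda k: k % m)

    def load_factor(self) -> float:
        return self.q / self.m  # alpha = q / m

    def insert(self, key: Any, value: Any) -> None:
        """Prepend to the chain at slot h(key): O(1) besides evaluating h.

        Assumes key is not already present, so no scan of the chain is needed.
        """
        s = self.h(key)
        self.slots[s] = Node(key, value, self.slots[s])
        self.q += 1

    def search(self, key: Any) -> Optional[Any]:
        """Scan the chain at slot h(key); the cost grows with the chain length."""
        node = self.slots[self.h(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None

    def delete(self, key: Any) -> bool:
        """Unlink the node holding key, if any; this is also a scan of one chain."""
        s = self.h(key)
        prev, node = None, self.slots[s]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.slots[s] = node.next
                else:
                    prev.next = node.next
                self.q -= 1
                return True
            prev, node = node, node.next
        return False


table = ChainedHashTable(7)
for k in (3, 10, 17, 5):      # 3, 10 and 17 all hash to slot 3
    table.insert(k, str(k))
print(table.slots[3].key)     # 17: the newest key sits at the head of the chain
print(table.search(10))       # 10
print(table.load_factor())    # 4/7, about 0.571
```

Prepending keeps insertion constant-time even when every key lands in the same slot. Searching and deletion still pay for the length of one chain, whose expected value is the load factor under simple uniform hashing.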
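From this expected chain length, we obtain the following theorem.

Theorem A. If $h$ is computable in constant time, then searching under simple uniform hashing takes $\Theta(1 + \alpha)$ time on average.

Unfortunately, designing a simple uniform hash function is usually impossible because $P$ is not known.

Heuristics for Hash Functions

1. The division method. For all $k$, $h(k) = k \bmod m$. It often happens that the keys are character strings interpreted in radix $2^p$. Then $m = 2^p$ maps two keys with the same last character to the same hash value. Likewise, $m = 2^p - 1$ maps two keys that are permutations of the same characters to the same hash value. A heuristic choice for $m$ is a prime far from any power of 2, e.g., the prime closest to $2^p/3$.

2. The multiplication method. For all $k$, $h(k) = \lfloor m (kA - \lfloor kA \rfloor) \rfloor$, where $A \in (0, 1)$ is a constant. It is known that the value of $m$ is not critical for this method.

Universal Hashing

Suppose that an application that employs hashing is executed repeatedly and that, at each execution, the hash function is selected from a pool of hash functions. Let $\mathcal{H}$ be the pool of hash functions. We say that $\mathcal{H}$ is universal if, for all keys $x$ and $y$ with $x \ne y$,

(*) $$|\{h \in \mathcal{H} : h(x) = h(y)\}| \le \frac{|\mathcal{H}|}{m}.$$

Suppose that at each execution the hash function $h$ is chosen from $\mathcal{H}$ uniformly at random. Then, for every pair $(x, y)$ with $x \ne y$, the probability that $h(x) = h(y)$ is at most $1/m$.

Usefulness of Universal Hashing

Theorem B. Let $\mathcal{H}$ be a universal family of hash functions. Let $S$ be a nonempty set of keys of cardinality at most $m$, and let $x$ be any key in $S$. For $h \in \mathcal{H}$ chosen uniformly at random, the expected number of collisions in $S$ with $x$ is less than 1.

Proof. Let $E$ be the expected number in question. Then

$$E = \sum_{h \in \mathcal{H}} \sum_{y \in S,\, y \ne x} \frac{\delta(h, x, y)}{|\mathcal{H}|},$$

where $\delta(h, x, y) = 1$ if $h(x) = h(y)$ and $0$ otherwise. Exchanging the order of summation, this quantity equals

$$\sum_{y \in S,\, y \ne x} \; \sum_{h \in \mathcal{H}} \frac{\delta(h, x, y)}{|\mathcal{H}|}.$$

By (*), each inner sum is at most $1/m$. Since $S$ contains at most $m - 1$ keys $y \ne x$, this is

$$\le \sum_{y \in S,\, y \ne x} \frac{1}{m} \le \frac{m-1}{m} < 1.$$

Designing Universal Hash Functions

Choose a prime $p$ greater than all keys $k$. Choose $a \in \{1, \dots, p-1\}$ and $b \in \{0, \dots, p-1\}$, and define

$$h_{a,b}(k) = ((ak + b) \bmod p) \bmod m.$$

Lemma C. The class $\mathcal{H}_{p,m} = \{h_{a,b} : 1 \le a \le p-1,\ 0 \le b \le p-1\}$ is universal.

Universality of the Family

Proof of Lemma C. For two distinct keys $k \ne l$, let

$$r = (ak + b) \bmod p, \qquad s = (al + b) \bmod p.$$

Then $r - s \equiv a(k - l) \pmod p$. Since $p$ is prime, $a \not\equiv 0 \pmod p$, and $k - l \not\equiv 0 \pmod p$ (both keys are less than $p$), we have $r \ne s$. Furthermore, we can solve for $a$ and $b$:

$$a = \big((r - s)\,((k - l)^{-1} \bmod p)\big) \bmod p, \qquad b = (r - ak) \bmod p.$$

So there is a one-to-one correspondence between the pairs $(a, b)$ with $a \ne 0$ and the pairs $(r, s)$ with $r \ne s$. If we choose $(a, b)$ uniformly at random, then $(r, s)$ is uniformly distributed over the $p(p-1)$ pairs with $r \ne s$.

A collision occurs when $r \equiv s \pmod m$. Given $r$, the number of values $s \ne r$ in $\{0, \dots, p-1\}$ with $s \equiv r \pmod m$ is at most

$$\lceil p/m \rceil - 1 \le \frac{p + m - 1}{m} - 1 = \frac{p - 1}{m}.$$

Therefore

$$\Pr[r \equiv s \pmod m] \le \frac{(p-1)/m}{p - 1} = \frac{1}{m}.$$

Open Addressing

Open addressing is an alternative to chaining in which a collision is resolved by putting the element into an open slot. To do this, we assign to each key a sequence of addresses to search for an open slot. Formally, we extend the hash function to take two inputs, giving a mapping $h : U \times \{0, \dots, m-1\} \to \{0, \dots, m-1\}$. For each $k \in U$, the slots $h(k, 0), \dots, h(k, m-1)$ are examined in this order, and the first open one is used to store $k$. The sequence $\langle h(k, 0), \dots, h(k, m-1) \rangle$ is called the probe sequence for $k$. We design $h$ so that each probe sequence is a permutation of $\{0, \dots, m-1\}$.

Deletion with Open Addressing

We cannot simply delete an element by emptying its slot. A later search for another key whose probe sequence passed through that slot would stop there too early. Instead, when deleting an element we store in its slot a special value DELETED to signify that a key has been deleted. This means that the time for deletion depends not on the load factor in the original sense but on a load factor that also counts the slots carrying the DELETED flag. Can we store an item in a slot with the DELETED label?

Insertion with Open Addressing

To insert an element with key $k$, we put it into the first open slot (either completely empty or marked DELETED) in the probe sequence for $k$.

Searching with Open Addressing

Searching follows the probe sequence of the key. It continues until the key is found, a completely empty slot is encountered, or all $m$ slots have been examined.

Three Probe-Sequence Schemes

1. Linear probing. Define $h(k, i) = (h'(k) + i) \bmod m$, where $h'$ is an ordinary hash function from $U$ to $\{0, \dots, m-1\}$.
2. Quadratic probing. Define $h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$, where $h'$ is an ordinary hash function and $c_1, c_2 \not\equiv 0 \pmod m$.
3. Double hashing. Pick two ordinary hash functions $h_1, h_2$ from $U$ to $\{0, \dots, m-1\}$. Define $h(k, i) = (h_1(k) + i\,h_2(k)) \bmod m$.

Primary Clustering

Primary clustering is a situation in which there is a long run of occupied slots. It is typically observed in linear probing. In linear probing, if every other slot is occupied, then an average unsuccessful search takes 1.5 probes. Half of the starting slots are empty (1 probe), and from each of the others the next slot is empty (2 probes). On the other hand, suppose one half of the slots form a single cluster. A search that starts at an occupied slot with $i$ occupied slots from it to the end of the cluster ($i = 1, \dots, m/2$) makes $i$ probes into occupied slots plus one into the empty slot that follows. The average number of probes is therefore

$$\frac{1}{m}\left(\frac{m}{2} + \sum_{i=1}^{m/2} (i + 1)\right) = \frac{1}{m}\left(m + \frac{1}{2}\cdot\frac{m}{2}\left(\frac{m}{2} + 1\right)\right) = \frac{m}{8} + \frac{5}{4}.$$

Analysis of Open-Address Hashing

Let $m$ be the number of slots and let $n$ be the number of occupied slots, including those that hold the DELETED label. Let $\alpha = n/m$.

Theorem D. Suppose that all the probe sequences (all $m!$ permutations) are equally likely to occur and that $\alpha < 1$. Then, in open-address hashing, the expected number of probes in an unsuccessful search is at most $1/(1 - \alpha)$.

Proof. For each $i \ge 0$, define $p_i$ (respectively, $q_i$) to be the probability that exactly (respectively, at least) $i$ probes hit occupied slots before an open slot is found. The expected number of probes is

$$1 + \sum_{i=1}^{n} i\,p_i.$$

For all $i$ with $1 \le i \le n$, $q_i = \sum_{j=i}^{n} p_j$. So

$$\sum_{i=1}^{n} i\,p_i = \sum_{i=1}^{n} q_i.$$

Since $(n-j)/(m-j) \le n/m$ for $0 \le j < n \le m$, we have

$$q_i = \frac{n}{m} \cdot \frac{n-1}{m-1} \cdots \frac{n-i+1}{m-i+1} \le \left(\frac{n}{m}\right)^i = \alpha^i.$$

So the expected number of probes is at most

$$1 + \sum_{i=1}^{n} \alpha^i \le \sum_{i=0}^{\infty} \alpha^i = \frac{1}{1 - \alpha}.$$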
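The two heuristics under Heuristics for Hash Functions can be tried out in a short Python sketch, along with the radix-$2^p$ pitfall of the division method. The helper names are hypothetical, and so is the choice of an 8-bit radix ($p = 8$) for string keys. The constant $A = (\sqrt{5} - 1)/2$ is a commonly suggested value for the multiplication method, not one given in these slides. Floating-point arithmetic is used for simplicity and is adequate only for moderately sized integer keys.

```python
import math


def string_to_key(s: str, p: int = 8) -> int:
    """Interpret s as an integer in radix 2**p (assumes every character code < 2**p)."""
    k = 0
    for ch in s:
        k = (k << p) | ord(ch)  # shift in one radix-2**p digit per character
    return k


def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


def multiplication_hash(k: int, m: int, A: float = (math.sqrt(5) - 1) / 2) -> int:
    """Multiplication method: h(k) = floor(m * (kA - floor(kA))) for a constant 0 < A < 1.

    Floating point is accurate enough only for moderately sized keys.
    """
    frac = k * A - math.floor(k * A)  # fractional part of kA
    return math.floor(m * frac)


abc, cab, xbc = (string_to_key(s) for s in ("abc", "cab", "xbc"))
# m = 2**8 keeps only the last character, so "abc" and "xbc" collide.
print(division_hash(abc, 256), division_hash(xbc, 256))  # 99 99
# m = 2**8 - 1: since 256 ≡ 1 (mod 255), h(k) is the character sum mod 255,
# so the permutations "abc" and "cab" collide.
print(division_hash(abc, 255), division_hash(cab, 255))  # 39 39
# 83 is the prime closest to 2**8 / 3 (about 85.3); here the two keys separate.
print(division_hash(abc, 83), division_hash(cab, 83))    # 60 67
print(multiplication_hash(123456, 2**14))                # 67
```

The collision with $m = 255$ follows from $2^8 \equiv 1 \pmod{255}$. Every radix-256 digit then contributes its value unchanged, so any reordering of the characters gives the same remainder.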
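Lemma C can be checked by brute force for a small prime. The sketch below enumerates every $h_{a,b} \in \mathcal{H}_{p,m}$ and every pair of distinct keys below $p$. It reports the largest fraction of the family under which some pair collides, which by the universality condition (*) should never exceed $1/m$. The function names and the sample values $p = 17$, $m = 5$ are illustrative choices, not from the slides. Exact fractions avoid any rounding in the comparison.

```python
from fractions import Fraction
from itertools import combinations


def h_ab(a: int, b: int, p: int, m: int, k: int) -> int:
    """h_{a,b}(k) = ((a*k + b) mod p) mod m, a member of H_{p,m}."""
    return ((a * k + b) % p) % m


def worst_collision_fraction(p: int, m: int) -> Fraction:
    """Max over distinct keys k, l < p of |{h in H_{p,m} : h(k) = h(l)}| / |H_{p,m}|."""
    family = [(a, b) for a in range(1, p) for b in range(p)]  # |H_{p,m}| = p(p-1)
    worst = Fraction(0)
    for k, l in combinations(range(p), 2):
        hits = sum(1 for a, b in family if h_ab(a, b, p, m, k) == h_ab(a, b, p, m, l))
        worst = max(worst, Fraction(hits, len(family)))
    return worst


frac = worst_collision_fraction(17, 5)
print(frac, frac <= Fraction(1, 5))  # 21/136 True
```

Every pair of keys gives the same count here. That matches the proof: for fixed $k \ne l$, the map $(a, b) \mapsto (r, s)$ is a bijection onto the pairs with $r \ne s$. The count therefore depends only on how many such pairs satisfy $r \equiv s \pmod m$.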
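The open-addressing rules in this chapter can be sketched with linear probing. Insertion uses the first EMPTY or DELETED slot, and a search passes over DELETED slots until it reaches an EMPTY one. This is a minimal illustration under stated assumptions: integer keys, distinct insertions (there is no duplicate check), and $h'(k) = k \bmod m$ as the default ordinary hash function. The class and helper names are hypothetical. `avg_unsuccessful_probes` averages, over all starting slots, the probes that an unsuccessful linear-probing search makes for a given occupancy pattern. It counts the final empty slot as one probe, the same convention that gives 1.5 probes when every other slot is occupied.

```python
from typing import Callable, List, Optional

EMPTY = None
DELETED = object()  # sentinel: the slot held a key that was later deleted


class LinearProbingTable:
    """Open addressing with linear probing: h(k, i) = (h'(k) + i) mod m."""

    def __init__(self, m: int, h_prime: Optional[Callable[[int], int]] = None):
        self.m = m
        self.slots: List[object] = [EMPTY] * m
        # Assumption: the ordinary hash function h' defaults to k mod m.
        self.h_prime = h_prime if h_prime is not None else (lambda k: k % m)

    def probe(self, k: int, i: int) -> int:
        return (self.h_prime(k) + i) % self.m

    def insert(self, k: int) -> int:
        """Store k in the first EMPTY or DELETED slot of its probe sequence."""
        for i in range(self.m):
            j = self.probe(k, i)
            if self.slots[j] is EMPTY or self.slots[j] is DELETED:
                self.slots[j] = k
                return j
        raise OverflowError("hash table overflow")

    def search(self, k: int) -> Optional[int]:
        """Return k's slot or None; DELETED slots are passed over, EMPTY ones stop the scan."""
        for i in range(self.m):
            j = self.probe(k, i)
            if self.slots[j] is EMPTY:
                return None
            if self.slots[j] is not DELETED and self.slots[j] == k:
                return j
        return None

    def delete(self, k: int) -> bool:
        """Replace k by the DELETED marker so later probe sequences are not cut short."""
        j = self.search(k)
        if j is None:
            return False
        self.slots[j] = DELETED
        return True


def avg_unsuccessful_probes(occupied: List[bool]) -> float:
    """Average probes of an unsuccessful linear-probing search over all m start slots.

    A search from slot t examines t, t+1, ... (mod m) and stops at the first
    unoccupied slot; that final slot is counted as a probe.
    """
    if all(occupied):
        raise ValueError("need at least one unoccupied slot")
    m = len(occupied)
    total = 0
    for t in range(m):
        probes = 1
        while occupied[(t + probes - 1) % m]:
            probes += 1
        total += probes
    return total / m


t = LinearProbingTable(7)
for key in (3, 10, 17):      # all hash to slot 3, so they land in slots 3, 4, 5
    t.insert(key)
t.delete(10)                  # slot 4 now holds DELETED
print(t.search(17))           # 5: the search passes over the DELETED slot
print(t.insert(24))           # 4: 24 mod 7 = 3, and the DELETED slot is reused
print(avg_unsuccessful_probes([True, False] * 4))         # 1.5
print(avg_unsuccessful_probes([True] * 4 + [False] * 4))  # 2.25
```

The same table with a single run of 4 occupied slots out of 8 already needs 2.25 probes on average, versus 1.5 for the alternating layout with the same load. The gap grows linearly with $m$ as the cluster grows.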