Lucene Search Essentials: Scorers, Collectors and Custom Queries
lucenerevolution -
Scorers, Collectors and Custom Queries
Mikhail Khludnev
Custom Queries
Match Spotting
http://nlp.stanford.edu/IR-book/
Custom Queries... hm, what for?
qf=STYLE TYPE
denim dress

DisjunctionMaxQuery((
  (STYLE:denim OR TYPE:denim) |
  (STYLE:dress OR TYPE:dress)
))

(DisjunctionMaxQuery((STYLE:denim | TYPE:denim)))
OR
(DisjunctionMaxQuery((STYLE:dress | TYPE:dress)))
Inverted Index
T[0] = "it is what it is"
T[1] = "what is it"
T[2] = "it is a banana"

term dictionary    postings list
"a"                {2}
"banana"           {2}
"is"               {0, 1, 2}
"it"               {0, 1, 2}
"what"             {0, 1}
index/_1.tis
index/_1.frq
http://www.lib.rochester.edu/index.cfm?PAGE=489
What is a Scorer?
"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

while ((doc = nextDoc()) != NO_MORE_DOCS) {
  println("found " + doc + " with score " + score());
}
2783 issues
Note: Weight is omitted for the sake of compactness
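The loop above can be tried out on a toy scorer. This is a minimal sketch, not Lucene's Scorer: the class ToyTermScorer, its constant score and the drain helper are made up for illustration; only nextDoc(), score() and NO_MORE_DOCS mirror the names on the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a term scorer: it walks one postings list in docID order.
// Hypothetical class for illustration, not part of Lucene.
class ToyTermScorer {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] postings; // sorted docIDs that contain the term
    private int pos = -1;         // index of the current posting

    ToyTermScorer(int... postings) {
        this.postings = postings;
    }

    // Moves to the next matching doc, or returns NO_MORE_DOCS when the list is exhausted.
    int nextDoc() {
        pos++;
        return pos < postings.length ? postings[pos] : NO_MORE_DOCS;
    }

    // Constant score for the sketch; a real scorer would use term statistics.
    float score() {
        return 1.0f;
    }

    // Runs the loop from the slide and returns the lines it would print.
    static List<String> drain(ToyTermScorer scorer) {
        List<String> lines = new ArrayList<>();
        int doc;
        while ((doc = scorer.nextDoc()) != NO_MORE_DOCS) {
            lines.add("found " + doc + " with score " + scorer.score());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Postings of "what" in the example index: {0, 1}
        for (String line : drain(new ToyTermScorer(0, 1))) {
            System.out.println(line);
        }
    }
}
```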
Doc-at-time search
what OR is OR a OR banana

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

Collector holds docID×score:
collect(0), score(): 2    Collector: 0×2
collect(1), score(): 2    Collector: 0×2 1×2
collect(2), score(): 3    Collector: 0×2 1×2 2×3
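The walk-through above can be sketched in a few lines. Assumptions: the score is simply the number of matching terms (as in the slides), postings are plain sorted int arrays, and the names DocAtATimeSketch and Cursor are invented for this example. A heap ordered by current docID stands in for the scorer queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Doc-at-a-time disjunction over sorted postings lists (illustrative sketch).
class DocAtATimeSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Cursor over one sorted postings list.
    static final class Cursor {
        final int[] postings;
        int pos = 0;
        Cursor(int... postings) { this.postings = postings; }
        int docID() { return pos < postings.length ? postings[pos] : NO_MORE_DOCS; }
        void next() { pos++; }
    }

    // Returns "docIDxScore" strings in the order documents are collected.
    static List<String> search(Cursor... terms) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.docID(), b.docID()));
        for (Cursor c : terms) {
            if (c.docID() != NO_MORE_DOCS) heap.add(c);
        }
        List<String> collected = new ArrayList<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek().docID();
            int score = 0;
            // Pop every cursor sitting on this doc, count it, advance it, re-insert it.
            while (!heap.isEmpty() && heap.peek().docID() == doc) {
                Cursor c = heap.poll();
                score++;
                c.next();
                if (c.docID() != NO_MORE_DOCS) heap.add(c);
            }
            collected.add(doc + "x" + score); // the collect(doc) step
        }
        return collected;
    }

    public static void main(String[] args) {
        // what OR is OR a OR banana
        System.out.println(search(
            new Cursor(0, 1), new Cursor(0, 1, 2), new Cursor(2), new Cursor(2)));
    }
}
```

Documents come out in docID order, and at any moment only one cursor per term is held in memory.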
Term-at-time search
what OR is OR a OR banana

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

Accumulator after "what":    ... 0×1 ... 1×1 ...
Accumulator after "is":      ... 0×2 ... 1×2 ... 2×1 ...
Accumulator after "a":       ... 0×2 ... 1×2 ... 2×2 ...
Accumulator after "banana":  ... 0×2 ... 1×2 ... 2×3 ...

Collector: 2×3 0×2 1×2
http://nlp.stanford.edu/IR-book/
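A minimal sketch of the accumulator idea, assuming the score is just a count of matching terms and the accumulator is a plain array with one slot per document (the class name TermAtATimeSketch and the maxDoc parameter are invented here). Each postings list is consumed completely before the next one is touched.

```java
import java.util.Arrays;

// Term-at-a-time disjunction with a per-document accumulator (illustrative sketch).
class TermAtATimeSketch {
    // Adds 1 to the slot of every document in each postings list, one list at a time.
    static int[] accumulate(int maxDoc, int[]... postingsPerTerm) {
        int[] acc = new int[maxDoc]; // memory grows with the number of documents
        for (int[] postings : postingsPerTerm) {
            for (int doc : postings) {
                acc[doc]++;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] acc = accumulate(3,
            new int[] {0, 1},     // what
            new int[] {0, 1, 2},  // is
            new int[] {2},        // a
            new int[] {2});       // banana
        System.out.println(Arrays.toString(acc));
    }
}
```

Scores are only final once the last term has been processed, so nothing can be handed to the collector before that point.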
"lorem" "ipsum" "dolor""sit" "amet" "consectetur"
[Figure: top-k selection over hits 1×9 7×9 2×7 2×5 9×5 6×4 ... ≤4 ...; labels O(n), k, n]
http://en.wikipedia.org/wiki/Binary_heap
[Figure: binary heap 6×4 / 9×5 2×4 / 2×7 7×9 1×9, log k per update, ... ≤4 ...; labels n, q, p]
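The binary heap in the figure is what keeps the best k hits out of n without sorting all of them. A rough sketch, assuming hits arrive as {docID, score} pairs; the class TopKSketch and the sample hits in main are made up and are not the numbers from the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Keeps the k best hits with a size-bounded min-heap (illustrative sketch).
class TopKSketch {
    // Each hit is {docID, score}. Returns the k highest-scoring hits, best first.
    static List<int[]> topK(int k, int[][] hits) {
        // Lowest score on top, so the weakest kept hit is cheap to find and evict.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        for (int[] hit : hits) {
            if (heap.size() < k) {
                heap.add(hit);                 // O(log k)
            } else if (hit[1] > heap.peek()[1]) {
                heap.poll();                   // evict the current weakest hit
                heap.add(hit);                 // O(log k)
            }
            // otherwise the hit is not competitive and is dropped in O(1)
        }
        List<int[]> result = new ArrayList<>(heap);
        result.sort((a, b) -> Integer.compare(b[1], a[1]));
        return result;
    }

    public static void main(String[] args) {
        int[][] hits = {{1, 9}, {3, 2}, {2, 7}, {4, 1}, {9, 5}, {6, 4}, {8, 3}};
        for (int[] hit : topK(3, hits)) {
            System.out.println(hit[0] + "x" + hit[1]);
        }
    }
}
```

Each of the n hits costs at most O(log k), which is where an n log k term comes from, and memory stays at k entries.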
what OR is OR a OR banana

              doc at time             term at time
complexity    O(p log q + n log k)    O(p + n log k)
memory        q + k                   n
BooleanScorer
org.apache.lucene.search.BooleanScorer

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

Hashtable[2], filled one chunk of docIDs at a time

chunk 0 1: buckets ×1 ×1, then ×2 ×2
Collector: 0×2 1×2
next chunk: bucket ×1, then ×2, then ×3
Collector: 2×3 0×2 1×2

Linked Open Hash [2K]
[Figure: buckets 0 1 2 3 4 5 6 7 holding counts ×1 ×1 ×5 ×2 ×2 ×3]
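The chunked accumulation can be sketched as follows. This is only an approximation of the idea, not the code of org.apache.lucene.search.BooleanScorer: the class WindowedAccumulatorSketch and its plain int array of buckets are assumptions made for the example, and the score is again a simple match count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Term-at-a-time scoring inside fixed-size windows of docIDs (illustrative sketch).
class WindowedAccumulatorSketch {
    // Each window covers windowSize consecutive docIDs; returns "docIDxScore" strings.
    static List<String> search(int windowSize, int[]... postingsPerTerm) {
        int[] pos = new int[postingsPerTerm.length]; // read position per term
        int[] bucket = new int[windowSize];          // match counts for the current window
        List<String> collected = new ArrayList<>();
        boolean more = true;
        for (int base = 0; more; base += windowSize) {
            more = false;
            Arrays.fill(bucket, 0);
            // Drain every term up to the end of the window, one term after another.
            for (int t = 0; t < postingsPerTerm.length; t++) {
                int[] postings = postingsPerTerm[t];
                while (pos[t] < postings.length && postings[pos[t]] < base + windowSize) {
                    bucket[postings[pos[t]] - base]++;
                    pos[t]++;
                }
                if (pos[t] < postings.length) more = true;
            }
            // Flush the finished window to the collector.
            for (int i = 0; i < windowSize; i++) {
                if (bucket[i] > 0) collected.add((base + i) + "x" + bucket[i]);
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        // what OR is OR a OR banana, with a window of 2 docs as on the slide
        System.out.println(search(2,
            new int[] {0, 1}, new int[] {0, 1, 2}, new int[] {2}, new int[] {2}));
    }
}
```

The window bounds memory to a fixed number of buckets regardless of index size, while keeping the sequential per-term reads of term-at-time search.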
if (collector.acceptsDocsOutOfOrder() && topScorer &&
    required.size() == 0 && minNrShouldMatch == 1) {
  new BooleanScorer    // term-at-time
} else {
  new BooleanScorer2   // doc-at-time
}
q=village operations years disaster visit
q=village operations years disaster visit etc map seventieth peneplains tussock sir memory character campaign author public wonder forker middy vocalize enable race object signal symptom deputy where typhous rectifiable polygamous originally look generation ultimately reasonably ratio numb apposing enroll manhood problem suddenly definitely corp event material affair diploma would dimout speech notion engine artist hotel text field hashed rottener impeding i cricket virtually valley sunday rock come observes gallnuts vibrantly prize involve
q=+village +operations +years +disaster +visit
Conjunction (+, MUST)
what AND is AND a AND it

"a": {2, 3}
"banana": {2, 3}
"is": {0, 1, 2, 3}
"it": {0, 1, 3}
"what": {0, 1, 3}

Collector: 3×4
http://www.flickr.com/photos/fatniu/184615348/
Ω(n q + n log k)
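Leapfrogging through a conjunction can be sketched like this. Assumptions: postings are sorted int arrays, advance is a linear scan (a real index would use skip data), and the names LeapfrogSketch and Cursor are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Leapfrog intersection of sorted postings lists (illustrative sketch).
class LeapfrogSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static final class Cursor {
        final int[] postings;
        int pos = 0;
        Cursor(int... postings) { this.postings = postings; }
        int docID() { return pos < postings.length ? postings[pos] : NO_MORE_DOCS; }
        // Moves to the first posting >= target and returns it.
        int advance(int target) {
            while (pos < postings.length && postings[pos] < target) pos++;
            return docID();
        }
    }

    // Returns the docIDs that occur in every list.
    static List<Integer> intersect(Cursor... cursors) {
        List<Integer> matches = new ArrayList<>();
        if (cursors.length == 0) return matches;
        int candidate = cursors[0].docID();
        outer:
        while (candidate != NO_MORE_DOCS) {
            for (Cursor c : cursors) {
                int doc = c.advance(candidate);
                if (doc != candidate) {
                    // c jumped past the candidate: its position is the new candidate
                    candidate = doc;
                    continue outer;
                }
            }
            matches.add(candidate); // every cursor landed on the candidate
            candidate = cursors[0].advance(candidate + 1);
        }
        return matches;
    }

    public static void main(String[] args) {
        // what AND is AND a AND it
        System.out.println(intersect(
            new Cursor(0, 1, 3), new Cursor(0, 1, 2, 3), new Cursor(2, 3), new Cursor(0, 1, 3)));
    }
}
```

On the example lists the cursors skip docs 0, 1 and 2 without scoring them and meet only on doc 3.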
Wrap-up
● doc-at-time vs term-at-time
● conjunction and leapfrog
complexity O(n)
memory O(const)
Custom Queries
● Sample Coverage Query
● Deeply Branched vs Flat
● minShouldMatch
● Filtering
● Performance Problem

silver jeans dress

"silver" "jeans" "dress"
"silver jeans dress"
"silver jeans" "dress"
"silver" "jeans dress"

"silver" "dress"
"silver jeans" "jeans"
"silver jeans"
"jeans" "dress"

Note: "foo bar" is not a phrase query, just a string
boolean verifyMatch() {
  int sumLength = 0;
  // add up the term lengths of the child scorers positioned on the current doc
  for (Scorer child : getChildren()) {
    if (child.docID() == docID()) {
      TermQuery tq = child.weight.query;
      sumLength += tq.term.text.length;
    }
  }
  return sumLength >= expectedLength;
}
Deeply Branched vs Flat
(+"silver jeans" +"dress")ORmax
(+"silver jeans dress")ORmax
(+"silver" +((+"jeans" +"dress")
ORmax +"jeans dress"
) )
ORmax is DisjunctionMaxQuery
("silver jeans" "dress")ORmax
("silver jeans dress")ORmax
("silver" (("jeans" "dress")
ORmax "jeans dress"
) )
ORmax is DisjunctionMaxQuery
B:"silver jeans dress" ORmax T:"silver jeans dress" ORmax S:"silver jeans dress"
B:"silver" ORmax T:"silver" ORmax S:"silver"
+B:"jeans dress" ORmax T:"jeans dress" ORmax S:"jeans dress"
+
ORmax
ORmax
ORmax
B:"silver jeans" ORmax T:"silver jeans" ORmax S:"silver jeans"
+B:"dress" ORmax T:"dress" ORmax S:"dress"
+
B:"jeans" ORmax T:"jeans" ORmax S:"jeans"
+B:"dress" ORmax T:"dress" ORmax S:"dress"
+
B - BRAND, T - TYPE, S - STYLE
B:"silver" T:"silver" S:"silver"
B:"jeans" T:"jeans" S:"jeans"
B:"dress" T:"dress" S:"dress"
B:"silver jeans" T:"silver jeans" S:"silver jeans"
B:"silver jeans dress" T:"silver jeans dress"
S:"silver jeans dress"
B:"jeans dress" T:"jeans dress" S:"jeans dress"
Steadiness problem
AFAIK 3.x only.
{1, 3, 7, 10, 27, 30, ..}
{3, 5, 10, 27, 32, ..}
{2, 3, 27, 31, ..}
{..., 30, 37, ..}
{..., 30, 31, 32, ..}
{..., 20, 27, 32, ..}
[Figure: heap of sub-scorer positions, first 3 / 3 20 / 3 30 30, then 5 / 7 20 / 27 30 30, while docID=3 (3.x)]
minShouldMatch
straight jeans
silver jeans
silver jeans straight
jeans
silver
minShouldMatch=2
straight silver jeans
int nextDoc() {
  while (true) {
    while (subScorers[0].docID() == doc) {
      if (subScorers[0].nextDoc() != NO_MORE_DOCS) {
        heapAdjust(0);
      } else {
        ....
      }
    }
    ...
    if (nrMatchers >= minimumNrMatchers) {
      break;
    }
  }
  return doc;
}
org.apache.lucene.search.DisjunctionSumScorer
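The counting behind minShouldMatch can be sketched on plain arrays. This is not the DisjunctionSumScorer code: the class MinShouldMatchSketch and the {listIndex, position} heap entries are assumptions for illustration. The postings in main are the mm=3 example lists from later in the deck, cut off where the slides elide them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Disjunction that only accepts docs matched by enough clauses (illustrative sketch).
class MinShouldMatchSketch {
    // Returns docIDs that occur in at least minShouldMatch of the sorted lists.
    static List<Integer> search(int minShouldMatch, int[]... lists) {
        // Heap entries are {listIndex, position}, ordered by the docID they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> Integer.compare(lists[a[0]][a[1]], lists[b[0]][b[1]]));
        for (int i = 0; i < lists.length; i++) {
            if (lists[i].length > 0) heap.add(new int[] {i, 0});
        }
        List<Integer> matches = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.peek();
            int doc = lists[top[0]][top[1]];
            int nrMatchers = 0;
            // Count and advance every list currently positioned on doc.
            while (!heap.isEmpty() && lists[heap.peek()[0]][heap.peek()[1]] == doc) {
                int[] e = heap.poll();
                nrMatchers++;
                if (e[1] + 1 < lists[e[0]].length) heap.add(new int[] {e[0], e[1] + 1});
            }
            if (nrMatchers >= minShouldMatch) matches.add(doc);
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(search(3,
            new int[] {1, 3, 7, 10, 27, 30},
            new int[] {3, 5, 10, 27, 32},
            new int[] {20, 27, 31},
            new int[] {30, 37}));
    }
}
```

Note that every posting of every clause is still visited, even for docs that end up rejected; the constraint only filters what reaches the collector.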
Let's filter! Btw, what is it?
RANDOM_ACCESS_FILTER_STRATEGY
LEAP_FROG_FILTER_FIRST_STRATEGY
LEAP_FROG_QUERY_FIRST_STRATEGY
QUERY_FIRST_FILTER_STRATEGY
http://localhost:8983/solr/collection1/select
?q=village operations years disaster visit etc map
seventieth peneplains tussock sir memory character
campaign author public wonder forker middy vocalize
enable race object signal symptom deputy where
generation ultimately reasonably ratio numb apposing
enroll manhood problem suddenly definitely corp event
gallnuts vibrantly prize involve explanation module&
qf=text_all&defType=edismax&
fq= id:yes_49912894 id:nurse_30134968&
mm=32&
mm=3
{1, 3, 7, 10, 27, 30, ..}
{3, 5, 10, 27, 32, ..}
{20, 27, 31, ..}
{30, 37, ..}
Match Spotting
BRAND:"silver jeans" TYPE:"dress" STYLE:"white"
BRAND:"alfani" TYPE:"dress" STYLE:"silver","jeans"
BRAND:"chaloree" TYPE:"dress" STYLE:"silver"
BRAND:"style&co" TYPE:"jeans dress" STYLE:"silver"
BRAND:"silver jeans" TYPE:"dress" STYLE:"black"
BRAND:"silver jeans" TYPE:"dress" STYLE:"white"
BRAND:"silver jeans" TYPE:"jacket" STYLE: "black"
BRAND:"angie" TYPE:"dress" STYLE:"silver","jeans"
BRAND:"chaloree" TYPE:"jeans dress" STYLE:"silver"
BRAND:"silver jeans" TYPE:"dress" STYLE:"blue"
BRAND:"dotty" TYPE:"dress" STYLE:"silver","jeans"
BRAND:"chaloree" STYLE:"jeans" "dress"
silver jeans dress
BRAND:"silver jeans" TYPE:"dress" (4)
TYPE:"dress" STYLE:"silver","jeans" (3)
TYPE:"jeans dress" STYLE:"silver" (2)
silver jeans dress
Scorers, Collectors and Custom Queries
http://google.com/+MikhailKhludnev
http://goo.gl/7LJFi
Appendixes
● Drill Sideways Facets
● Collectors
Appendix D
Drill Sideways Facets
+CATEGORY: Denim +FIT: Straight +WASH: Dark&B
+CATEGORY: Denim +WASH: Dark&B
+CATEGORY: Denim +FIT: Straight
+CATEGORY: Denim FIT: Straight WASH: Dark&Black ... /minShouldMatch=Ndrilldowns-1
+CAT: Denim
FIT: Straight
WASH: Dark
totalHits 3
near miss 2
near miss 2
Doc at time
base query is highly selective
+CAT:D.. {1, 7, 9, 15}
FIT:S.. {2, 7, 8, 9, 10, 12}
WASH:D.. {2, 7, 11, 13, 15}
...
TopDocsCollector
Term at time
drilldown queries are highly selective
+CAT:D.. {1, 7, 9, 15}
FIT:S.. {2, 7, 8, 9, 10, 12}
WASH:D.. {2, 7, 11, 13, 15}
...
[Figure: one bucket per docID (1 2 7 8 9 10 11 12 13 15 ...), each tracking "hits" and "miss" while the WASH, FIT and CAT postings are applied in turn]
doc 7: hits 3, miss no
doc 9: hits 2, miss Wash
doc 15: hits 2, miss Fit
TopDocsCollector
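The bookkeeping for total hits and near misses can be sketched as below. This is a simplified illustration, not Lucene's drill sideways implementation: the class DrillSidewaysSketch is invented, and drilldown membership is checked by binary search instead of the doc-at-time or term-at-time strategies shown on the slides.

```java
import java.util.Arrays;

// Counts full hits and per-dimension near misses (illustrative sketch).
class DrillSidewaysSketch {
    // result[0]     = docs matching the base query and every drilldown
    // result[1 + d] = docs matching the base query and every drilldown except d
    static int[] count(int[] base, int[][] drilldowns) {
        int[] result = new int[1 + drilldowns.length];
        for (int doc : base) {
            int misses = 0;
            int missedDim = -1;
            for (int d = 0; d < drilldowns.length; d++) {
                // drilldown postings are sorted, so binary search is valid
                if (Arrays.binarySearch(drilldowns[d], doc) < 0) {
                    misses++;
                    missedDim = d;
                }
            }
            if (misses == 0) {
                result[0]++;
            } else if (misses == 1) {
                result[1 + missedDim]++;
            }
            // two or more misses fail minShouldMatch = Ndrilldowns - 1 and are ignored
        }
        return result;
    }

    public static void main(String[] args) {
        int[] cat = {1, 7, 9, 15};
        int[] fit = {2, 7, 8, 9, 10, 12};
        int[] wash = {2, 7, 11, 13, 15};
        // prints total hits, then near misses for FIT, then near misses for WASH
        System.out.println(Arrays.toString(count(cat, new int[][] {fit, wash})));
    }
}
```

With the example postings, doc 7 is a full hit, doc 15 is a near miss for FIT, doc 9 is a near miss for WASH, and doc 1 misses both drilldowns.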
Collector
DocSetCollector TopDocsCollector
TopFieldCollector
TopScoreDocsCollector
long [952045] = { 0, 0, 0, 0, 2050, 0, 0, 8, 0, 0, 0,... }
int [2079] = {4, 12, 45, 67, 103, 673, 5890, 34103,...}
int [100] = {8947, 7498,1, 230, 2356, 9812, 167,....}
DocSet or DocList?
                           DocList/TopDoc          DocSet
Size                       k (numHits or rows)     N (maxDocs)
Ordered by                 score or field          docID
Out-of-order collecting    allows*                 almost could allow (No)
[Figure: binary heap ?×4 6×4 / 9×5 2×4 / 2×7 7×9 1×9]
http://www.flickr.com/photos/jbagley/4303976811/sizes/o/
class OutOfOrderTopScoreDocCollector
  boolean acceptsDocsOutOfOrder() { return true; }
  ..
  void collect(int doc) {
    float score = scorer.score();
    ...
    if (score == pqTop.score && doc > pqTop.doc) {
      ...
    }
UML
http://www.flickr.com/photos/kristykay/2922670979/lightbox/