Writing custom queries scorers’ diversity and traps (lucene internals)(2)

WRITING CUSTOM QUERIES: SCORERS' DIVERSITY AND TRAPS
Mikhail Khludnev, Principal Engineer, eCommerce Search Team
[email protected]
http://goo.gl/7LJFi

description

Presented by Mikhail Khludnev, Grid Dynamics. Lucene has a number of built-in queries, but sometimes a developer needs to write custom queries, which can be challenging. We'll start from the basics: learn how Lucene searches, look into a few built-in query implementations, and learn two basic approaches to query evaluation. Then I'll share the experience my team gained while building an eCommerce search platform: we'll look at one or more sample custom queries and talk about potential problems and caveats along the way.

Transcript of Writing custom queries scorers’ diversity and traps (lucene internals)(2)

Page 1: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

WRITING CUSTOM QUERIES: SCORERS' DIVERSITY AND TRAPS

Mikhail Khludnev
Principal Engineer, eCommerce Search Team
[email protected]

http://goo.gl/7LJFi

Page 2: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Custom Queries

Page 4: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Custom Queries

http://nlp.stanford.edu/IR-book/

Page 6: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Custom Queries

Match Spotting

http://nlp.stanford.edu/IR-book/

Page 8: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Inverted Index

Page 9: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

T[0] = "it is what it is"
T[1] = "what is it"
T[2] = "it is a banana"

Page 10: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

T[0] = "it is what it is"
T[1] = "what is it"
T[2] = "it is a banana"
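
A minimal sketch of how the term-to-docIDs mapping on this slide could be built from the three sample texts. The class and method names are hypothetical, and this is not how Lucene's indexer is implemented; it only illustrates what an inverted index holds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of Lucene.
class InvertedIndexSketch {

    // Maps each term to the sorted list of IDs of the documents that contain it.
    static Map<String, List<Integer>> build(String[] docs) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            for (String term : docs[docId].split("\\s+")) {
                List<Integer> postings = index.computeIfAbsent(term, t -> new ArrayList<>());
                // docIDs arrive in increasing order, so checking the tail is enough to skip repeats.
                if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                    postings.add(docId);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] docs = {"it is what it is", "what is it", "it is a banana"};
        build(docs).forEach((term, postings) ->
                System.out.println("\"" + term + "\": " + postings));
    }
}
```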

Page 11: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

postings list

Page 12: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

What is a Scorer?

Page 13: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

Page 16: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 17: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 18: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

while ((doc = nextDoc()) != NO_MORE_DOCS) {
    println("found " + doc + " with score " + score());
}
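
The loop above can be tried against a toy scorer. The sketch below is a hypothetical stand-in that walks one in-memory postings list; the real Lucene Scorer API is richer, and the constant score is an assumption made only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a term scorer over one postings list.
class PostingsScorerSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] postings; // sorted docIDs for one term
    private int pos = -1;         // -1 means "not positioned yet"

    PostingsScorerSketch(int... postings) {
        this.postings = postings;
    }

    // Moves to the next posting and returns its docID, or NO_MORE_DOCS when exhausted.
    int nextDoc() {
        pos++;
        return docID();
    }

    int docID() {
        if (pos < 0) {
            return -1;
        }
        return pos < postings.length ? postings[pos] : NO_MORE_DOCS;
    }

    // Assumption: every match scores 1.0; a real scorer would compute a relevance score.
    float score() {
        return 1.0f;
    }

    // Same iteration pattern as on the slide, collecting the messages instead of printing.
    static List<String> drain(PostingsScorerSketch scorer) {
        List<String> out = new ArrayList<>();
        int doc;
        while ((doc = scorer.nextDoc()) != NO_MORE_DOCS) {
            out.add("found " + doc + " with score " + scorer.score());
        }
        return out;
    }

    public static void main(String[] args) {
        drain(new PostingsScorerSketch(0, 1)).forEach(System.out::println);
    }
}
```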

Page 19: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 20: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Note: Weight is omitted for the sake of compactness

Page 21: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 22: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Custom Queries

http://nlp.stanford.edu/IR-book/

Page 23: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Doc-at-time search

Page 24: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

what OR is OR a OR banana

Page 26: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"is": {0, 1, 2}

"what": {0, 1}

"a": {2}

"banana": {2}

"it": {0, 1, 2}

Page 27: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"is": {0, 1, 2}

"what": {0, 1}

"a": {2}

"banana": {2}

collect(0), score(): 2

Collector

Page 28: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"is": {0, 1, 2}

"what": {0, 1}

"a": {2}

"banana": {2}

docID×score
0×2

Page 29: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"is": {0, 1, 2}

"what": {0, 1}

"a": {2}

"banana": {2}

collect(1), score(): 2

Collector: 0×2

Page 30: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"is": {0, 1, 2}

"what": {0, 1}

"a": {2}

"banana": {2}

Collector: 0×2, 1×2

Page 31: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"is": {0, 1, 2}

"a": {2}

"banana": {2}

"what": {0, 1}

collect(2), score(): 3

Collector: 0×2, 1×2

Page 32: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"is": {0, 1, 2}

"a": {2}

"banana": {2}

"what": {0, 1}

Collector: 2×3, 0×2, 1×2
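
The walkthrough on the last few slides can be sketched as follows. This is an illustrative simplification, not Lucene code: the class names are hypothetical, the score is assumed to be the number of matching terms (as on the slides), and results are returned in docID order rather than pushed into a top-k heap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of doc-at-time evaluation of a disjunction (OR query).
class DocAtTimeSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Cursor over one sorted postings list.
    static final class Cursor {
        final int[] postings;
        int pos = 0;

        Cursor(int... postings) {
            this.postings = postings;
        }

        int docID() {
            return pos < postings.length ? postings[pos] : NO_MORE_DOCS;
        }

        void next() {
            pos++;
        }
    }

    // Returns {docID, score} pairs in docID order; score = number of matching terms.
    static List<int[]> search(int[][] postingsLists) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>(Comparator.comparingInt(Cursor::docID));
        for (int[] p : postingsLists) {
            if (p.length > 0) {
                heap.add(new Cursor(p));
            }
        }
        List<int[]> collected = new ArrayList<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek().docID();
            int score = 0;
            // Pop every cursor positioned on the current doc, count it, advance it, re-insert it.
            while (!heap.isEmpty() && heap.peek().docID() == doc) {
                Cursor c = heap.poll();
                score++;
                c.next();
                if (c.docID() != NO_MORE_DOCS) {
                    heap.add(c);
                }
            }
            collected.add(new int[] {doc, score});
        }
        return collected;
    }

    public static void main(String[] args) {
        // what OR is OR a OR banana
        int[][] lists = {{0, 1}, {0, 1, 2}, {2}, {2}};
        for (int[] hit : search(lists)) {
            System.out.println("collect(" + hit[0] + ") score(): " + hit[1]);
        }
    }
}
```

Each posting costs one heap operation here, which is where the log q factor in the doc-at-time complexity comes from.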

Page 33: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Term-at-time search
see Appendix

Page 34: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 35: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

             doc-at-time             term-at-time
complexity   O(p log q + n log k)    O(p + n log k)
memory       q + k                   n

Page 36: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

q=village operations years disaster visit etc map seventieth peneplains tussock sir memory character campaign author public wonder forker middy vocalize enable race object signal symptom deputy where typhous rectifiable polygamous originally look generation ultimately reasonably ratio numb apposing enroll manhood problem suddenly definitely corp event material affair diploma would dimout speech notion engine artist hotel text field hashed rottener impeding i cricket virtually valley sunday rock come observes gallnuts vibrantly prize involve

Page 37: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

q=village operations years disaster visit

Page 38: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

q=+village +operations +years +disaster +visit

Page 39: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Conjunction(+, MUST)

Page 40: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"a": {2,3}

"banana": {2,3}

"is": {0, 1, 2, 3}

"it": {0, 1, 3}

"what": {0, 1, 3}

what AND is AND a AND it

Page 45: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"a": {2,3}

"banana": {2,3}

"is": {0, 1, 2, 3}

"it": {0, 1, 3}

"what": {0, 1, 3}

Collector: 3×4
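
A minimal sketch of the leapfrog idea behind this conjunction walkthrough, using the postings lists from the slide. The names are hypothetical and advance() is a linear scan here; an actual postings implementation would typically skip ahead more cleverly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of leapfrog intersection for a conjunction (AND query).
class LeapfrogSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static final class Cursor {
        final int[] postings;
        int pos = 0;

        Cursor(int[] postings) {
            this.postings = postings;
        }

        int docID() {
            return pos < postings.length ? postings[pos] : NO_MORE_DOCS;
        }

        // Moves to the first docID >= target and returns it.
        int advance(int target) {
            while (docID() < target) {
                pos++;
            }
            return docID();
        }
    }

    static List<Integer> intersect(int[][] lists) {
        List<Integer> hits = new ArrayList<>();
        if (lists.length == 0) {
            return hits;
        }
        Cursor[] cursors = new Cursor[lists.length];
        for (int i = 0; i < lists.length; i++) {
            cursors[i] = new Cursor(lists[i]);
        }
        int candidate = cursors[0].docID();
        int agreed = 1;               // cursors known to sit on the candidate
        int i = 1 % cursors.length;   // next cursor to consult
        while (candidate != NO_MORE_DOCS) {
            if (agreed == cursors.length) {
                hits.add(candidate);  // every cursor agrees: a match
                candidate = cursors[i].advance(candidate + 1);
                agreed = 1;
            } else {
                int doc = cursors[i].advance(candidate);
                if (doc == candidate) {
                    agreed++;
                } else {
                    candidate = doc;  // this cursor leaps ahead and sets the new candidate
                    agreed = 1;
                }
            }
            i = (i + 1) % cursors.length;
        }
        return hits;
    }

    public static void main(String[] args) {
        // what AND is AND a AND it
        int[][] lists = {{0, 1, 3}, {0, 1, 2, 3}, {2, 3}, {0, 1, 3}};
        System.out.println(intersect(lists));
    }
}
```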

Page 46: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

http://www.flickr.com/photos/fatniu/184615348/

Page 47: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Ω(n q + n log k)

Page 48: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Wrap-up
● doc-at-time vs term-at-time
● leapfrog

Page 49: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Custom Queries

http://nlp.stanford.edu/IR-book/

Page 50: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Custom Queries

● HelloWorld

● Deeply Branched vs Flat

● Steadiness Problem

● minShouldMatch Performance Problem

● Filtering Performance Problem

Page 51: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"silver" "jeans" "dress"

silver jeans dress

Note: "foo bar" is not a phrase query, just a string

Page 52: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"silver" "jeans" "dress"
"silver jeans dress"

silver jeans dress

Page 53: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"silver" "jeans" "dress"
"silver jeans dress"
"silver jeans" "dress"
"silver" "jeans dress"

silver jeans dress

Page 54: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"silver" "jeans" "dress"
"silver jeans dress"
"silver jeans" "dress"
"silver" "jeans dress"

"silver" "dress"
"silver jeans" "jeans"
"silver jeans"
"jeans" "dress"

silver jeans dress

Note: "foo bar" is not a phrase query, just a string
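
The first four combinations on this slide are the ways to cut the query into contiguous strings that cover every word exactly once. A small sketch of enumerating them is shown below; the class and method names are hypothetical and are not taken from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: enumerate every split of a query into contiguous strings.
class SegmentationSketch {

    static List<List<String>> segmentations(String query) {
        String[] words = query.trim().split("\\s+");
        List<List<String>> result = new ArrayList<>();
        collect(words, 0, new ArrayList<>(), result);
        return result;
    }

    // Extends the current prefix with every possible next string starting at 'start'.
    private static void collect(String[] words, int start,
                                List<String> prefix, List<List<String>> out) {
        if (start == words.length) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        for (int end = start + 1; end <= words.length; end++) {
            prefix.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            collect(words, end, prefix, out);
            prefix.remove(prefix.size() - 1);
        }
    }

    public static void main(String[] args) {
        segmentations("silver jeans dress").forEach(System.out::println);
    }
}
```

A query of w words has 2^(w-1) such splits, which is one reason the query tree grows quickly.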

Page 55: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 56: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 57: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 58: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

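boolean verifyMatch() {
    int sumLength = 0;
    // add up the text length of every child term that matches the current doc
    for (Scorer child : getChildren()) {
        if (child.docID() == docID()) {
            TermQuery tq = child.weight.query;
            sumLength += tq.term.text.length;
        }
    }
    // accept the doc only if the matched terms are long enough in total
    return sumLength >= expectedLength;
}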

Page 59: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Deeply Branched vs Flat

Page 60: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

(+"silver jeans" +"dress")
ORmax
(+"silver jeans dress")
ORmax
(+"silver" +((+"jeans" +"dress") ORmax +"jeans dress"))

ORmax is DisjunctionMaxQuery

Page 63: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

("silver jeans" "dress")
ORmax
("silver jeans dress")
ORmax
("silver" (("jeans" "dress") ORmax "jeans dress"))

ORmax is DisjunctionMaxQuery

Page 64: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

(B:"silver jeans dress" ORmax T:"silver jeans dress" ORmax S:"silver jeans dress")
ORmax
(
  +(B:"silver jeans" ORmax T:"silver jeans" ORmax S:"silver jeans")
  +(B:"dress" ORmax T:"dress" ORmax S:"dress")
)
ORmax
(
  +(B:"silver" ORmax T:"silver" ORmax S:"silver")
  +(
    (
      +(B:"jeans" ORmax T:"jeans" ORmax S:"jeans")
      +(B:"dress" ORmax T:"dress" ORmax S:"dress")
    )
    ORmax
    +(B:"jeans dress" ORmax T:"jeans dress" ORmax S:"jeans dress")
  )
)

B - BRAND, T - TYPE, S - STYLE

Page 65: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

B:"silver" T:"silver" S:"silver"

B:"jeans" T:"jeans" S:"jeans"

B:"dress" T:"dress" S:"dress"

B:"silver jeans" T:"silver jeans" S:"silver jeans"

B:"silver jeans dress" T:"silver jeans dress" S:"silver jeans dress"

B:"jeans dress" T:"jeans dress" S:"jeans dress"

Page 66: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 67: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Steadiness problem
AFAIK 3.x only.

Page 68: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

{1, 3, 7, 10, 27,30,..}

{3, 5, 10, 27,32,..}

{2,3, 27,31,..}

{..., 30,37,..}

3

3 20

3 30 30

{..., 30, 31,32,..}
{..., 20, 27,32,..}

Page 69: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

{1, 3, 7, 10, 27,30,..}

{3, 5, 10, 27,32,..}

{2,3, 27,31,..}

{..., 30,37,..}

5

7 20

27 30 30

{..., 30, 31,32,..}
{..., 20, 27,32,..}

docID=3

3.x

Page 70: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 71: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

straight jeans

silver jeans

silver jeans straight

jeans

silver

minShouldMatch=2

straight silver jeans

Page 72: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

int nextDoc() {
    while (true) {
        while (subScorers[0].docID() == doc) {
            if (subScorers[0].nextDoc() != NO_MORE_DOCS) {
                heapAdjust(0);
            } else {
                ....
            }
        }
        ...
        if (nrMatchers >= minimumNrMatchers) {
            break;
        }
    }
    return doc;
}

org.apache.lucene.search.DisjunctionSumScorer

Page 73: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

{1, 3, 7, 10, 27, 30, ..}
{3, 5, 10, 27, 32, ..}
{20, 27, 31, ..}
{30, 37, ..}

mm=3
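
A rough sketch of why minShouldMatch can be slow with this kind of scorer, using the four postings lists above with the trailing ".." dropped (an assumption made so the example is finite). The names are hypothetical and this is not the DisjunctionSumScorer source; it only mirrors the idea that every posting is visited even though few documents reach the threshold.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a heap-based disjunction with a minShouldMatch threshold.
class MinShouldMatchSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static final class Cursor {
        final int[] postings;
        int pos = 0;

        Cursor(int[] postings) {
            this.postings = postings;
        }

        int docID() {
            return pos < postings.length ? postings[pos] : NO_MORE_DOCS;
        }
    }

    static int visitedPostings; // counts how many postings were touched by the last search

    static List<Integer> search(int[][] lists, int minShouldMatch) {
        visitedPostings = 0;
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>(Comparator.comparingInt(Cursor::docID));
        for (int[] p : lists) {
            if (p.length > 0) {
                heap.add(new Cursor(p));
            }
        }
        List<Integer> hits = new ArrayList<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek().docID();
            int nrMatchers = 0;
            while (!heap.isEmpty() && heap.peek().docID() == doc) {
                Cursor c = heap.poll();
                nrMatchers++;
                visitedPostings++;
                c.pos++;
                if (c.docID() != NO_MORE_DOCS) {
                    heap.add(c);
                }
            }
            // The threshold is only checked after all sub-scorers on this doc were advanced.
            if (nrMatchers >= minShouldMatch) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[][] lists = {{1, 3, 7, 10, 27, 30}, {3, 5, 10, 27, 32}, {20, 27, 31}, {30, 37}};
        System.out.println(search(lists, 3) + " after visiting " + visitedPostings + " postings");
    }
}
```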

Page 78: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 79: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Filtering

Page 80: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 81: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 82: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

RANDOM_ACCESS_FILTER_STRATEGY

LEAP_FROG_FILTER_FIRST_STRATEGY

LEAP_FROG_QUERY_FIRST_STRATEGY

QUERY_FIRST_FILTER_STRATEGY

Page 83: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

minShouldMatch meets Filters

Page 84: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

http://localhost:8983/solr/collection1/select

?q={!cache=false}village AND village operations years disaster visit etc

map seventieth peneplains tussock sir memory character campaign author

public wonder forker middy vocalize enable race object signal symptom

deputy where typhous rectifiable polygamous originally look generation

ultimately reasonably ratio numb apposing enroll manhood problem

suddenly definitely corp event material affair diploma would dimout speech

notion engine artist hotel text field hashed rottener impeding i cricket

virtually valley sunday rock come observes gallnuts vibrantly prize involve

explanation module&

qf=text_all&

defType=edismax&

mm=32&

fq= id:yes_49912894 id:nurse_30134968

Page 85: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 86: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

CONFERENCE PARTY
The Tipsy Crow: 770 5th Ave
Starts after Stump The Chump
Your conference badge gets you in the door

TOMORROW
Breakfast starts at 7:30
Keynotes start at 8:30

CONTACT
Mikhail Khludnev
[email protected]

http://goo.gl/7LJFi

Page 87: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Appendixes
● Term-at-time search in Lucene/Solr
● Derivation of the search complexity
● Match Spotting
● Drill Sideways Facets

Page 88: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Appendix A

Term-at-time Search in Lucene

Page 89: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

what OR is OR a OR banana

Page 90: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Accumulator... 0×1 ... 1×1 ...

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Page 91: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Accumulator... 0×2 ... 1×2 ... 2×1 ...

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Page 92: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Accumulator... 0×2 ... 1×2 ... 2×2 ...

Page 93: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Accumulator... 0×2 ... 1×2 ... 2×3 ...
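
The accumulator frames can be reproduced with a few lines of code. This is a hypothetical sketch, not the BooleanScorer implementation: it assumes a plain sorted map as the accumulator and a score equal to the number of matching terms.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of term-at-time evaluation with a per-document accumulator.
class TermAtTimeSketch {

    // Processes one postings list after another, bumping the score of each docID seen.
    static Map<Integer, Integer> accumulate(int[][] postingsLists) {
        Map<Integer, Integer> accumulator = new TreeMap<>();
        for (int[] postings : postingsLists) {
            for (int doc : postings) {
                accumulator.merge(doc, 1, Integer::sum);
            }
        }
        return accumulator;
    }

    public static void main(String[] args) {
        // what OR is OR a OR banana, in that order
        int[][] lists = {{0, 1}, {0, 1, 2}, {2}, {2}};
        System.out.println(accumulate(lists));
    }
}
```

Each posting costs a constant-time update, but the accumulator has to hold every matching document before anything is handed to the collector.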

Page 94: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Accumulator... 0×2 ... 1×2 ... 2×3 ...

Collector: 2×3, 0×2, 1×2

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Page 95: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

BooleanScorer2

Page 96: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

×1

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Hashtable[2]

org.apache.lucene.search.BooleanScorer

×1 0 1

chunk

Page 97: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

×2

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

org.apache.lucene.search.BooleanScorer

×2 0 1

chunk

Page 98: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

org.apache.lucene.search

Collector: 0×2, 1×2

×2 ×2

0 1

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Page 99: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

org.apache.lucene.search

Collector: 0×2, 1×2

×1

0 1

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Page 100: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

org.apache.lucene.search

Collector: 0×2, 1×2

×2

0 1

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Page 101: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

org.apache.lucene.search

Collector: 0×2, 1×2

×3

0 1

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Page 102: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

org.apache.lucene.search

Collector: 2×3, 0×2, 1×2

×3

0 1

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Page 103: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

if (collector.acceptsDocsOutOfOrder() && topScorer &&
    required.size() == 0 && minNrShouldMatch == 1) {
    new BooleanScorer     // term-at-time
} else {
    new BooleanScorer2    // doc-at-time
}

Page 104: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Linked Open Hash [2K]

×1 ×1 ×5 ×2 ×2

0 1 2 3 4 5 6 7

×3

Page 105: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Collector

DocSetCollector TopDocsCollector

TopFieldCollector

TopScoreDocCollector

Page 106: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

long [952045] = { 0, 0, 0, 0, 2050, 0, 0, 8, 0, 0, 0,... }

int [2079] = {4, 12, 45, 67, 103, 673, 5890, 34103,...}

int [100] = {8947, 7498,1, 230, 2356, 9812, 167,....}

DocSet or DocList?

Page 107: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

                          DocList/TopDocs       DocSet
Size                      k (numHits/rows)      N (maxDocs)
Ordered by                score or field        docID
Out-of-order collecting   allows*               almost could allow (No)

Page 108: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

?×4 6×4

9×5 2×4

2×7 7×9 1×9

Page 109: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

http://www.flickr.com/photos/jbagley/4303976811/sizes/o/

Page 110: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

class OutOfOrderTopScoreDocCollector {
    boolean acceptsDocsOutOfOrder() { return true; }
    ..
    void collect(int doc) {
        float score = scorer.score();
        ...
        if (score == pqTop.score && doc > pqTop.doc) {
            ...
        }
    }
}

Page 111: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 112: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Appendix B

Derivation of the Search Complexity

Page 113: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

1×9 7×9 2×7 2×5 9×5 6×4

...

...≤4......

k

n

Page 115: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

6×4

log k 9×5 2×4

2×7 7×9 1×9

...

...≤4......

n

Page 116: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

q

p

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

what OR is OR a OR banana

Page 117: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

             doc-at-time             term-at-time
complexity                           O(p + n log k)
memory

Page 118: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

q

p

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

what OR is OR a OR banana

1

1 2

2

Page 119: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

             doc-at-time             term-at-time
complexity   O(p log q + n log k)    O(p + n log k)
memory
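
One way to read the two complexity entries, written out term by term. The meaning of the symbols is an assumption inferred from the surrounding slides (they are not defined in the transcript): q is the number of query terms, p the total number of postings in their lists, n the number of matching documents, and k the number of top hits the collector keeps.

```latex
% Assumed notation (inferred, not stated on the slides):
%   q = number of query terms (postings lists)
%   p = total number of postings in those lists
%   n = number of matching documents
%   k = number of top hits kept by the collector
\begin{align*}
T_{\text{doc-at-time}}  &= \underbrace{O(p \log q)}_{\text{heap of } q \text{ scorers, adjusted per posting}}
                         + \underbrace{O(n \log k)}_{\text{top-}k \text{ heap, one offer per match}} \\
T_{\text{term-at-time}} &= \underbrace{O(p)}_{\text{one accumulator update per posting}}
                         + \underbrace{O(n \log k)}_{\text{top-}k \text{ heap, one offer per match}}
\end{align*}
```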

Page 120: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Custom Queries

Match Spotting

http://nlp.stanford.edu/IR-book/

Appendix C

Page 122: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

BRAND:"silver jeans" TYPE:"dress" STYLE:"white"

BRAND:"alfani" TYPE:"dress" STYLE:"silver","jeans"

BRAND:"chaloree" TYPE:"dress" STYLE:"silver"

BRAND:"style&co" TYPE:"jeans dress" STYLE:"silver"

BRAND:"silver jeans" TYPE:"dress" STYLE:"black"

BRAND:"silver jeans" TYPE:"dress" STYLE:"white"

BRAND:"silver jeans" TYPE:"jacket" STYLE: "black"

BRAND:"angie" TYPE:"dress" STYLE:"silver","jeans"

BRAND:"chaloree" TYPE:"jeans dress" STYLE:"silver"

BRAND:"silver jeans" TYPE:"dress" STYLE:"blue"

BRAND:"dotty" TYPE:"dress" STYLE:"silver","jeans"

BRAND:"chaloree" STYLE:"jeans" "dress"

silver jeans dress

Page 124: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

BRAND:"silver jeans" TYPE:"dress"
TYPE:"dress" STYLE:"silver","jeans"
TYPE:"jeans dress" STYLE:"silver"
BRAND:"silver jeans" TYPE:"dress"
BRAND:"silver jeans" TYPE:"dress"
TYPE:"dress" STYLE:"silver","jeans"
TYPE:"jeans dress" STYLE:"silver"
BRAND:"silver jeans" TYPE:"dress"
TYPE:"dress" STYLE:"silver","jeans"

Page 126: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

BRAND:"silver jeans" TYPE:"dress" (4)
TYPE:"dress" STYLE:"silver","jeans"
TYPE:"jeans dress" STYLE:"silver"
TYPE:"dress" STYLE:"silver","jeans"
TYPE:"jeans dress" STYLE:"silver"
TYPE:"dress" STYLE:"silver","jeans"

Page 128: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

BRAND:"silver jeans" TYPE:"dress" (4)
TYPE:"dress" STYLE:"silver","jeans" (3)

TYPE:"jeans dress" STYLE:"silver"

TYPE:"jeans dress" STYLE:"silver"

Page 129: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

BRAND:"silver jeans" TYPE:"dress" (4)
TYPE:"dress" STYLE:"silver","jeans" (3)

TYPE:"jeans dress" STYLE:"silver" (2)

Page 130: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

BRAND:"silver jeans" TYPE:"dress" (4)
TYPE:"dress" STYLE:"silver","jeans" (3)
TYPE:"jeans dress" STYLE:"silver" (2)

silver jeans dress

Page 132: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Appendix D

Drill Sideways Facets

Page 133: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 134: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 135: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 136: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 137: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 138: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)
Page 139: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

+CATEGORY: Denim +FIT: Straight +WASH: Dark&B

Page 140: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

+CATEGORY: Denim +FIT: Straight +WASH: Dark&B

+CATEGORY: Denim +WASH: Dark&B

Page 141: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

+CATEGORY: Denim +FIT: Straight +WASH: Dark&B

+CATEGORY: Denim +WASH: Dark&B

+CATEGORY: Denim +FIT: Straight

Page 142: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

+CATEGORY: Denim FIT: Straight WASH: Dark&Black ... /minShouldMatch=Ndrilldowns-1

Page 143: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

+CAT: Denim

FIT: Straight

WASH: Dark

Page 144: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

+CAT: Denim

FIT: Straight

WASH: Dark

totalHits 3

near miss 2

near miss 2

Page 147: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Doc at time
base query is highly selective

Page 148: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

+CAT:D..  {1, 7, 9, 15}
FIT:S..   {2, 7, 8, 9, 10, 12}
WASH:D..  {2, 7, 11, 13, 15}
...
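
A small sketch of the counting that drill sideways needs, driven by the base query as in the doc-at-time variant and using the three postings lists on this slide with the trailing "..." dropped. The class name and the result layout are hypothetical; a document counts as a hit when it matches every drill-down, and as a near miss for one dimension when that is the only drill-down it fails.

```java
import java.util.Arrays;

// Hypothetical sketch: count hits and per-dimension near misses for drill sideways.
class DrillSidewaysSketch {

    // result[0] = total hits; result[d + 1] = near misses for drill-down dimension d.
    // All arrays must be sorted because membership is tested with binary search.
    static int[] count(int[] base, int[][] drillDowns) {
        int[] counts = new int[drillDowns.length + 1];
        for (int doc : base) {
            int matched = 0;
            int missed = -1;
            for (int d = 0; d < drillDowns.length; d++) {
                if (Arrays.binarySearch(drillDowns[d], doc) >= 0) {
                    matched++;
                } else {
                    missed = d;
                }
            }
            if (matched == drillDowns.length) {
                counts[0]++;              // matches the base query and every drill-down
            } else if (matched == drillDowns.length - 1) {
                counts[missed + 1]++;     // fails exactly one drill-down: a near miss
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] cat = {1, 7, 9, 15};
        int[][] drillDowns = {{2, 7, 8, 9, 10, 12}, {2, 7, 11, 13, 15}}; // FIT, WASH
        System.out.println(Arrays.toString(count(cat, drillDowns)));
    }
}
```

With this data, doc 7 is the only hit, doc 15 is a near miss for FIT, and doc 9 is a near miss for WASH.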

Page 151: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

+CAT:D..  {1, 7, 9, 15}
FIT:S..   {2, 7, 8, 9, 10, 12}
WASH:D..  {2, 7, 11, 13, 15}
...

TopDocsCollector

Page 157: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Term at time
drilldown queries are highly selective

Page 158: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

+CAT:D..  {1, 7, 9, 15}
FIT:S..   {2, 7, 8, 9, 10, 12}
WASH:D..  {2, 7, 11, 13, 15}
...

hits 1

miss Fit

hits 1

miss Fit

hits 1

miss Fit

hits 1

miss Fit

hits 1

miss Fit

1 2 7 11 12 13 1510

8 9...

Page 159: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

+CAT:D..  {1, 7, 9, 15}
FIT:S..   {2, 7, 8, 9, 10, 12}
WASH:D..  {2, 7, 11, 13, 15}
...

hits 1

miss Fit

hits 1

miss Fit

hits 1

miss Fit

hits 2

miss no

1 2 7 11 12 13 1510

hits 1

miss Wash

hits 1

missWash

8 9...

hits 1

miss Wash

hits 2

miss no

hits 1

miss Wash

Page 160: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

+CAT:D..  {1, 7, 9, 15}
FIT:S..   {2, 7, 8, 9, 10, 12}
WASH:D..  {2, 7, 11, 13, 15}
...

hits 1

miss Wash Cat

hits 1

miss FitCat

hits 1

miss Wash Cat

hits 1

miss Fit Cat

hits 2

miss Fit

hits 2

miss Cat

1 2 7 11 12 13 1510

hits 1

missWash Cat

8 9...

hits 3

miss

hits 2

miss Wash

Page 161: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

hits 1

miss Wash Cat

hits 1

miss FitCat

hits 1

miss Wash Cat

hits 1

miss Fit Cat

hits 2

miss Fit

hits 2

miss Cat

1 2 7 11 12 13 1510

hits 1

missWash Cat

8 9...

hits 3

miss no

hits 2

miss Wash

Page 162: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

hits 2

miss Fit

1 2 7 11 12 13 15108 9...

hits 3

miss no

hits 2

miss Wash

TopDocsCollector

Page 165: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Overflow

Page 166: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

UML

http://www.flickr.com/photos/kristykay/2922670979/lightbox/

Page 167: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

Custom Queries ..hm what for ?

Page 168: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

qf=STYLE TYPE

denim dress

Page 169: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

qf=STYLE TYPE

denim dress

DisjunctionMaxQuery((

(STYLE:denim OR TYPE:denim) |

(STYLE:dress OR TYPE:dress)

))

Page 170: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)

qf=STYLE TYPE

denim dress

(DisjunctionMaxQuery((STYLE:denim | TYPE:denim)))
OR
(DisjunctionMaxQuery((STYLE:dress | TYPE:dress)))

Page 171: Writing custom queries  scorers’ diversity and traps (lucene internals)(2)