CouchDB Day NYC 2017: Full Text Search

CouchDB Developer Day

Full-Text Search Lab

Create a Cloudant account
• Go to https://cloudant.com/sign-up/
• Sign up!

Setup
curl https://$account.cloudant.com/foo -X PUT
curl https://$account.cloudant.com/foo/_design/bar -X PUT -d '{"indexes":{"baz":{"index":"function(doc){index(\"color\", doc.color); index(\"size\", doc.size);}"}}}'
curl https://$account.cloudant.com/foo/doc1 -X PUT -d '{"size": "small", "color": "green"}'
curl https://$account.cloudant.com/foo/doc2 -X PUT -d '{"size": "large", "color": "green"}'
curl https://$account.cloudant.com/foo/doc3 -X PUT -d '{"size": "small", "color": "red"}'

Searching
curl "https://$account.cloudant.com/foo/_design/bar/_search/baz?q=size:small"

curl "https://$account.cloudant.com/foo/_design/bar/_search/baz?q=size:large"

curl "https://$account.cloudant.com/foo/_design/bar/_search/baz?q=color:red"

curl "https://$account.cloudant.com/foo/_design/bar/_search/baz?q=size:small%20AND%20color:red"
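The last query above encodes its spaces by hand as %20. A minimal sketch of letting curl do the encoding instead: -G sends the data as a URL query string and --data-urlencode percent-encodes the value. The search function and the CURL override (used only to preview the arguments without making a request) are hypothetical names, not part of Cloudant or curl.

```shell
#!/bin/sh
# Hypothetical helper: query the "baz" index from the Setup slide.
# Assumes $account is set, as in the commands above.
search() {
  # -G appends the --data-urlencode pairs to the URL as a query string;
  # the text after "q=" is percent-encoded by curl itself.
  # CURL is an assumed override hook; it defaults to the real curl.
  ${CURL:-curl} -sS -G \
    "https://$account.cloudant.com/foo/_design/bar/_search/baz" \
    --data-urlencode "q=$1"
}

# Usage (performs a real request):
#   search 'size:small AND color:red'
```

The query can then be written with literal spaces, which is easier to read once it grows beyond two clauses.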

Pagination
Every search response includes a "bookmark" attribute. Pass it back to Cloudant, along with the same query, to get the next "page" of results.

curl "https://$account.cloudant.com/foo/_design/bar/_search/baz?q=*:*&limit=1"

curl "https://$account.cloudant.com/foo/_design/bar/_search/baz?q=*:*&limit=1&bookmark=g2wAAAABaANkAB9kYmNvcmVAZGI1LmplbmV2ZXIuY2xvdWRhbnQubmV0bAAAAAJhAGI_____amgCRj_wAAAAAAAAYQBq"
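The two requests above can be turned into a loop that follows bookmarks until a page comes back empty. This is a rough sketch, not a robust client: bookmark_of and all_pages are hypothetical names, CURL is an assumed override hook for dry runs, and the sed expression assumes the compact single-line JSON shown in the responses on these slides (a JSON-aware tool such as jq would be sturdier).

```shell
#!/bin/sh
# Print the "bookmark" value of a search response read from stdin.
# Assumes compact JSON on a single line.
bookmark_of() {
  sed -n 's/.*"bookmark":"\([^"]*\)".*/\1/p'
}

# Print every page of q=*:* (one row per page), one response per line.
# Assumes $account is set, as in the commands above.
all_pages() {
  bookmark=
  while :; do
    set -- --data-urlencode 'q=*:*' --data-urlencode 'limit=1'
    if [ -n "$bookmark" ]; then
      set -- "$@" --data-urlencode "bookmark=$bookmark"
    fi
    page=$(${CURL:-curl} -sS -G \
      "https://$account.cloudant.com/foo/_design/bar/_search/baz" "$@") || return 1
    # An empty "rows" array means the previous page was the last one.
    case $page in
      *'"rows":[]'*) break ;;
    esac
    printf '%s\n' "$page"
    bookmark=$(printf '%s\n' "$page" | bookmark_of)
    # Stop if the response carried no bookmark (for example, an error body).
    [ -n "$bookmark" ] || break
  done
}

# Usage (performs real requests):
#   all_pages
```

The query and limit stay the same on every request; only the bookmark changes between pages.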

Sorting
The "sort" parameter lets you sort results on any indexed field or combination of indexed fields.

curl "https://$account.cloudant.com/foo/_design/bar/_search/baz?q=*:*&sort=\"size<string>\""

curl "https://$account.cloudant.com/foo/_design/bar/_search/baz?q=*:*&sort=\"color<string>\""
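The sort value is JSON, so its double quotes have to reach the server intact. A small sketch that passes it through --data-urlencode to avoid shell-quoting trouble, and that also shows two variations: a leading "-" for descending order and a JSON array for a combination of fields. sorted_search and the CURL override are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run q=*:* with the given JSON "sort" value.
# Assumes $account is set, as in the commands above.
sorted_search() {
  # Single-quote the JSON at the call site; curl encodes it for the URL.
  # CURL is an assumed override hook; it defaults to the real curl.
  ${CURL:-curl} -sS -G \
    "https://$account.cloudant.com/foo/_design/bar/_search/baz" \
    --data-urlencode 'q=*:*' \
    --data-urlencode "sort=$1"
}

# Usage (each line performs a real request):
#   sorted_search '"size<string>"'                      # ascending by size
#   sorted_search '"-size<string>"'                     # descending by size
#   sorted_search '["color<string>","-size<string>"]'   # by color, then size descending
```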

Tokenization (https://docs.cloudant.com/search.html)

• Tokenizers break down textual input into tokens for efficient and flexible searching
• Using an appropriate tokenizer is often critical
• Generic analyzers: standard, email, keyword, whitespace
• Language-specific analyzers: english, french, german, spanish, chinese, dutch...
• You can configure different analyzers for different fields
• Some tokenizers omit common words
• Some tokenizers omit common prefixes or suffixes
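As a sketch of the "different analyzers for different fields" point, the design document below uses Cloudant's "perfield" analyzer: "standard" by default, with "keyword" for the color field so each color value is kept as a single token. The design document name perfield_demo, the file location, the analyzer choices, and the put_perfield_index helper are illustrative assumptions, not part of the lab. A new design document name is used because overwriting the existing _design/bar would also require its current _rev.

```shell
#!/bin/sh
# Where to write the design document (hypothetical location).
ddoc_file="${TMPDIR:-/tmp}/perfield_ddoc.json"

# The quoted 'EOF' keeps the backslashes and quotes in the JSON untouched.
cat > "$ddoc_file" <<'EOF'
{
  "indexes": {
    "baz": {
      "analyzer": {
        "name": "perfield",
        "default": "standard",
        "fields": { "color": "keyword" }
      },
      "index": "function(doc){ if (doc.color) { index(\"color\", doc.color); } if (doc.size) { index(\"size\", doc.size); } }"
    }
  }
}
EOF

# Hypothetical helper: upload the design document to database "foo".
# Assumes $account is set; CURL is an assumed override hook for dry runs.
put_perfield_index() {
  ${CURL:-curl} -sS -X PUT \
    "https://$account.cloudant.com/foo/_design/perfield_demo" \
    -H 'Content-Type: application/json' \
    -d "@$ddoc_file"
}

# Usage (performs a real request):
#   put_perfield_index
```

The index function here also skips documents that lack a field, which avoids indexing undefined values for documents such as the design document itself.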

Tokenization Examples
> curl https://$account.cloudant.com/_search_analyze -H 'Content-Type: application/json' -d '{"analyzer":"standard", "text": "rnewson@apache.org"}'
{"tokens":["rnewson","apache.org"]}

> curl https://$account.cloudant.com/_search_analyze -H 'Content-Type: application/json' -d '{"analyzer":"email", "text": "rnewson@apache.org"}'
{"tokens":["rnewson@apache.org"]}

> curl https://$account.cloudant.com/_search_analyze -H 'Content-Type: application/json' -d '{"analyzer":"english", "text": "running"}'
{"tokens":["run"]}