Oslo Solr MeetUp March 2012 - Solr4 alpha

Short talk highlighting what we can expect in the soon-to-be-released Solr 4.0 alpha/beta.

Page 1: Oslo Solr MeetUp March 2012 - Solr4 alpha

Sponsors:

What is new in Solr 4.0ß

Jan Høydahl

March 20th 2012

Oslo Solr Community

Page 2: Oslo Solr MeetUp March 2012 - Solr4 alpha

Agenda

– Solr/Lucene 4 ß, what, when?
– Near-Realtime-Search
– SolrCloud
– Better Spellchecker
– Flex – smaller index
– Pluggable Ranking
– Sort by Function
– Result field aliasing and pseudo fields
– Pivot facets
– Join query
– New Admin GUI

– And what about Solr 3.6 ?

Page 3: Oslo Solr MeetUp March 2012 - Solr4 alpha

4.0 beta?

– Never released a public beta before
– So many changes, it makes sense
– Time frame??
– Stability

Page 4: Oslo Solr MeetUp March 2012 - Solr4 alpha

Near-Realtime-Search

– Before:
  • Add, add, add, add (not searchable)
  • Commit (new segment written → searchable)
– 4.0:
  • In-memory index
  • Add
  • Soft commit (within/auto)
  • Real-time GET:

<!-- realtime get handler, guaranteed to return the latest stored fields of
     any document, without the need to commit or open a new searcher. The
     current implementation relies on the updateLog feature being enabled. -->
<requestHandler name="/get" class="solr.RealTimeGetHandler">
  <lst name="defaults">
    <str name="omitHeader">true</str>
  </lst>
</requestHandler>
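To make the handler above concrete, here is a minimal Python sketch that only builds the request URLs (it does not contact a server). The base URL `http://localhost:8983/solr` and the helper names are assumptions for illustration; `id` on `/get` and `softCommit=true` on `/update` are the request parameters as I understand them, so check them against your Solr version.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port and core name to your setup.
SOLR_BASE = "http://localhost:8983/solr"


def realtime_get_url(doc_id: str, base: str = SOLR_BASE) -> str:
    """Build a URL for the /get handler, which looks a document up by its id."""
    return f"{base}/get?{urlencode({'id': doc_id})}"


def soft_commit_url(base: str = SOLR_BASE) -> str:
    """Build a URL that asks the update handler for a soft commit."""
    return f"{base}/update?{urlencode({'softCommit': 'true'})}"


if __name__ == "__main__":
    print(realtime_get_url("doc 1"))
    print(soft_commit_url())
```

The point of the split is that a document can be fetched through `/get` right after it is added, while a soft commit is what makes it visible to ordinary searches.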

Page 5: Oslo Solr MeetUp March 2012 - Solr4 alpha

Solr Cloud

– Solr Cloud is the popular name for an initiative to make Solr more easily scalable and manageable in a distributed world

– Enables centralized configuration and cluster status monitoring

– Solr 4.0ß contains the first features
  • Apache ZooKeeper support, including built-in ZK
  • Support for auto distributed/LB query (by means of ZK)
  • Fault tolerant indexing and recovery
  • Add a new node and let it discover its role and sync up
– Expected features to come
  • Tools to manage the config in ZK
  • Re-balancing of shards

http://wiki.apache.org/solr/SolrCloud

Page 6: Oslo Solr MeetUp March 2012 - Solr4 alpha

Solr Cloud...

– New concepts:
  • Collection: Cores making up one data set
  • ZooKeeper: Central coordination server
– Easier distributed search:
  • /solr/web/select?q=*:*&distrib=true
  • This queries all cores in the same "collection"
– Easier distributed indexing:
  • http://<any.server>/solr/web/update...
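As a small illustration of the distributed search request on this slide, the following Python sketch assembles the select URL for a collection. The host `localhost:8983` and the function name are assumptions; the collection name `web` and the `distrib=true` parameter come from the slide.

```python
from urllib.parse import urlencode


def distributed_select_url(host: str, collection: str, query: str) -> str:
    """Build a select URL that asks Solr to query every core in the collection."""
    params = {"q": query, "distrib": "true"}
    # urlencode percent-encodes '*' and ':' so the query survives transport.
    return f"http://{host}/solr/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(distributed_select_url("localhost:8983", "web", "*:*"))
```

Because any node can receive the request, the client does not need to list the shards itself.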

Page 7: Oslo Solr MeetUp March 2012 - Solr4 alpha

Solr Cloud on the index side...

http://wiki.apache.org/solr/SolrCloud

Page 8: Oslo Solr MeetUp March 2012 - Solr4 alpha

Better spellchecker

– Direct SpellChecker
– Automaton based (no extra Lucene index)
– No long build times
– Better performance
– Better accuracy (?)
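The idea of suggesting directly from the terms already in the main index can be sketched in Python as below. This is a brute-force stand-in, not the automaton-based approach the slide refers to: it computes plain Levenshtein distance against every term, which is exactly the cost an automaton avoids. The term list, frequencies and function names are invented for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def suggest(word: str, term_freqs: dict, max_edits: int = 2) -> list:
    """Return index terms within max_edits, closest first, then most frequent."""
    candidates = [(edit_distance(word, term), -freq, term)
                  for term, freq in term_freqs.items() if term != word]
    return [term for dist, _, term in sorted(candidates) if dist <= max_edits]


if __name__ == "__main__":
    # Hypothetical term dictionary: term -> document frequency.
    terms = {"solr": 50, "solar": 5, "lucene": 40, "sold": 2}
    print(suggest("slor", terms))
```

Since the candidates come straight from the live term dictionary, there is no separate spelling index to build or keep in sync.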

Page 9: Oslo Solr MeetUp March 2012 - Solr4 alpha

Flex – smaller index

– Lucene's Flex APIs
– Lets you plug in your own codecs
– Greater flexibility in how you can represent the binary index
– Opens up for many new features
  • DocValues
  • Pluggable ranking
  • TEXT index
  • Store as UTF-8[]
  • Or other encoding for space saving for Chinese

Page 10: Oslo Solr MeetUp March 2012 - Solr4 alpha

Pluggable Ranking

– Lucene uses TF/IDF and VSM
– Now support for BM25
– Plug your own!
– Hopefully attracts researchers
– Also, pluggable Similarity class per field
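To give a feel for what a BM25-style ranking function computes, here is a Python sketch of the textbook Okapi BM25 score for a single term. It is not taken from Lucene's source and may differ from its implementation in details; the parameter values `k1=1.2` and `b=0.75` are commonly quoted defaults and are assumptions here.

```python
import math


def bm25_term_score(tf: float, doc_len: float, avg_doc_len: float,
                    doc_count: int, doc_freq: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25 contribution of one query term to one document."""
    # Rare terms get a higher inverse document frequency.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalisation: long documents need more occurrences to score equally.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)


if __name__ == "__main__":
    once = bm25_term_score(tf=1, doc_len=100, avg_doc_len=100,
                           doc_count=1000, doc_freq=10)
    often = bm25_term_score(tf=10, doc_len=100, avg_doc_len=100,
                            doc_count=1000, doc_freq=10)
    # Term frequency saturates: ten occurrences score far less than ten times one.
    print(once, often)
```

The saturation of term frequency is the main behavioural difference from a plain TF/IDF weighting, where repeated occurrences keep adding to the score.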

Page 11: Oslo Solr MeetUp March 2012 - Solr4 alpha

Sort by Function

– q=foo&sort=sub(price,discount) desc
– q=foo&sort=dist(2, x, y, 0, 0) asc
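What these two sort clauses ask for can be mimicked client-side in Python. The sketch below sorts a few invented documents by `price - discount` descending and by Euclidean distance from the origin ascending, assuming that `dist(2, x, y, 0, 0)` means the 2-norm distance between (x, y) and (0, 0). The sample documents are hypothetical.

```python
import math

# Hypothetical documents carrying the fields used on the slide.
DOCS = [
    {"id": "a", "price": 100.0, "discount": 10.0, "x": 3.0, "y": 4.0},
    {"id": "b", "price": 80.0, "discount": 0.0, "x": 1.0, "y": 1.0},
    {"id": "c", "price": 120.0, "discount": 50.0, "x": 6.0, "y": 8.0},
]


def net_price(doc: dict) -> float:
    """Equivalent of sub(price,discount)."""
    return doc["price"] - doc["discount"]


def distance_from_origin(doc: dict) -> float:
    """Equivalent of dist(2, x, y, 0, 0): Euclidean distance to (0, 0)."""
    return math.hypot(doc["x"] - 0.0, doc["y"] - 0.0)


by_net_price_desc = sorted(DOCS, key=net_price, reverse=True)
by_distance_asc = sorted(DOCS, key=distance_from_origin)

if __name__ == "__main__":
    print([d["id"] for d in by_net_price_desc])
    print([d["id"] for d in by_distance_asc])
```

In Solr the function is evaluated per document on the server, so no computed field needs to be stored in the index.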

Page 12: Oslo Solr MeetUp March 2012 - Solr4 alpha

Result field aliasing and pseudo fields

– Aliasing:
  • q=foo&fl=score,tittel:title,rabattpris:sub(price,discount)
– Field name globbing:
  • q=foo&fl=score,t*
– Pseudo fields:
  • q=foo&fl=score,[explain],[docid],[shard],[value v=42 t=int]
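The `fl` syntax shown on this slide is easy to assemble programmatically. The helper below is a hypothetical convenience function, not part of any Solr client library; it only concatenates plain fields, `alias:expression` pairs and bracketed pseudo fields into one `fl` value.

```python
def build_fl(plain=(), aliases=None, pseudo=()) -> str:
    """Compose an fl value from plain fields, aliases and [pseudo] fields."""
    parts = list(plain)
    # alias:source, where source is a field name or a function expression.
    parts += [f"{alias}:{expr}" for alias, expr in (aliases or {}).items()]
    # Pseudo fields are written in square brackets.
    parts += [f"[{name}]" for name in pseudo]
    return ",".join(parts)


if __name__ == "__main__":
    print(build_fl(
        plain=["score"],
        aliases={"tittel": "title", "rabattpris": "sub(price,discount)"},
        pseudo=["explain", "docid"],
    ))
```

The resulting string still has to be URL-encoded before it is sent as a request parameter.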

Page 13: Oslo Solr MeetUp March 2012 - Solr4 alpha

Pivot facets

– Multi dimensional facets
  • &facet.pivot=cat,popularity
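A pivot facet comes back as a tree: each `cat` value carries its own breakdown by `popularity`. The Python sketch below flattens such a tree into rows. The sample data is invented, and the `field` / `value` / `count` / `pivot` key names reflect my understanding of the JSON response shape, so treat them as an assumption to verify against a real response.

```python
def flatten_pivot(nodes, path=()):
    """Yield (path, count) for every node of a pivot facet tree, depth first."""
    for node in nodes:
        current = path + ((node["field"], node["value"]),)
        yield current, node["count"]
        # Leaf nodes have no nested "pivot" list.
        yield from flatten_pivot(node.get("pivot", []), current)


# Hypothetical response fragment for facet.pivot=cat,popularity.
SAMPLE = [
    {"field": "cat", "value": "electronics", "count": 3, "pivot": [
        {"field": "popularity", "value": 6, "count": 2},
        {"field": "popularity", "value": 7, "count": 1},
    ]},
    {"field": "cat", "value": "memory", "count": 1, "pivot": [
        {"field": "popularity", "value": 5, "count": 1},
    ]},
]

if __name__ == "__main__":
    for path, count in flatten_pivot(SAMPLE):
        print(" > ".join(f"{f}={v}" for f, v in path), count)
```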

Page 14: Oslo Solr MeetUp March 2012 - Solr4 alpha

Join query

– Simple Join feature (inner join)
– &q={!join from=manu_id to=id}ipod
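The semantics of the join above can be sketched in plain Python: find the documents matching `ipod`, collect their `manu_id` values, then return the documents whose `id` is one of those values. The documents and the substring matcher are invented for illustration; a real query would use Solr's own analysis and matching.

```python
def join(docs, matches, from_field, to_field):
    """Return docs whose to_field equals the from_field of any matching doc."""
    keys = {d[from_field] for d in docs if matches(d) and from_field in d}
    return [d for d in docs if d.get(to_field) in keys]


# Hypothetical index holding both products and manufacturers.
DOCS = [
    {"id": "apple", "name": "Apple Inc."},
    {"id": "belkin", "name": "Belkin"},
    {"id": "p1", "name": "ipod video", "manu_id": "apple"},
    {"id": "p2", "name": "ipod charger", "manu_id": "belkin"},
    {"id": "p3", "name": "hard drive", "manu_id": "maxtor"},
]

result = join(DOCS, lambda d: "ipod" in d["name"].lower(), "manu_id", "id")

if __name__ == "__main__":
    print([d["id"] for d in result])
```

Note that the result contains the manufacturer documents, not the products that matched `ipod`; the matching side is only used to collect the join keys.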

Page 15: Oslo Solr MeetUp March 2012 - Solr4 alpha

New Admin GUI

Page 16: Oslo Solr MeetUp March 2012 - Solr4 alpha

Solr 3.6

– SOLR-2764*: NorwegianLightStemmer, NorwegianMinimalStemmer
– SOLR-2202*: Money/Currency FieldType
– SOLR-2826*: URLClassify Update Processor
– SOLR-3056: Japanese field type in schema.xml
– SOLR-3026*: eDismax user fields
– SOLR-3140*: omitNorms default for all numeric field types
– SOLR-2901*: Upgrade Solr to Tika 1.0
– SOLR-1709: Distributed Date and Range Faceting
– SOLR-2487*: Do not include slf4j-jdk14 jar in WAR
– SOLR-2509*: spellcheck StringIndexOutOfBoundsException

* Committed by Jan Høydahl