Oslo Solr MeetUp March 2012 - Solr4 alpha
What is new in Solr 4.0 beta
Jan Høydahl
March 20th 2012
Oslo Solr Community
Agenda
– Solr/Lucene 4 beta: what, when?
– Near-Realtime-Search
– SolrCloud
– Better Spellchecker
– Flex – smaller index
– Pluggable Ranking
– Sort by Function
– Result field aliasing and pseudo fields
– Pivot facets
– Join query
– New Admin GUI
– And what about Solr 3.6?
4.0 beta?
– Never released a public beta before
– So many changes, it makes sense
– Time frame??
– Stability
Near-Realtime-Search
– Before:
  • Add, add, add, add (not searchable)
  • Commit (new segment written → searchable)
– 4.0:
  • In-memory index
  • Add
  • Soft commit (within/auto)
  • Real-time GET:

<!-- realtime get handler, guaranteed to return the latest stored fields of
     any document, without the need to commit or open a new searcher. The
     current implementation relies on the updateLog feature being enabled. -->
<requestHandler name="/get" class="solr.RealTimeGetHandler">
  <lst name="defaults">
    <str name="omitHeader">true</str>
  </lst>
</requestHandler>
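As a rough illustration of the flow above, the sketch below builds the two request URLs a client might send: an update request that asks for a soft commit, and a real-time GET by document id against the `/get` handler. The base URL and the document id `doc1` are hypothetical; `softCommit` and `id` are the request parameter names used by Solr 4, but check them against the version you run.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port and core name to your setup.
BASE = "http://localhost:8983/solr"


def soft_commit_url(base=BASE):
    """URL asking the update handler to perform a soft commit."""
    return base + "/update?" + urlencode({"softCommit": "true"})


def realtime_get_url(doc_id, base=BASE):
    """URL for the /get handler; it returns the latest stored fields of doc_id."""
    return base + "/get?" + urlencode({"id": doc_id})


print(soft_commit_url())
print(realtime_get_url("doc1"))
```

The real-time GET does not need the soft commit to have happened; the soft commit only matters for making the document visible to ordinary searches.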
Solr Cloud
– Solr Cloud is the popular name for an initiative to make Solr more easily scalable and manageable in a distributed world
– Enables centralized configuration and cluster status monitoring
– Solr 4.0 beta contains the first features
  • Apache ZooKeeper support, including built-in ZK
  • Support for automatic distributed/load-balanced queries (by means of ZK)
  • Fault-tolerant indexing and recovery
  • Add a new node and let it discover its role and sync up
– Expected features to come
  • Tools to manage the config in ZK
  • Re-balancing of shards
http://wiki.apache.org/solr/SolrCloud
Solr Cloud...
– New concepts:
  • Collection: Cores making up one data set
  • ZooKeeper: Central coordination server
– Easier distributed search:
  • /solr/web/select?q=*:*&distrib=true
  • This queries all cores in the same "collection"
– Easier distributed indexing:
  • http://<any.server>/solr/web/update...
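A minimal sketch of the distributed search request, assuming the collection is called `web` as on the slide. The server names are hypothetical; the point is only that the same request can be sent to any node in the cluster.

```python
from urllib.parse import urlencode


def distributed_select_url(server, collection, query="*:*"):
    """Build a select URL that asks Solr to query the whole collection."""
    params = urlencode({"q": query, "distrib": "true"})
    return f"http://{server}/solr/{collection}/select?{params}"


# Hypothetical nodes; either one should be able to serve the request.
for server in ("host1:8983", "host2:8983"):
    print(distributed_select_url(server, "web"))
```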
Solr Cloud on the index side...
http://wiki.apache.org/solr/SolrCloud
Better spellchecker
– Direct SpellChecker
– Automaton based (no extra Lucene index)
– No long build times
– Better performance
– Better accuracy (?)
Flex – smaller index
– Lucene's Flex APIs
– Lets you plug in your own codecs
– Greater flexibility in how you can represent the binary index
– Opens up for many new features
  • DocValues
  • Pluggable ranking
  • TEXT index
  • Store as UTF-8[]
  • Or other encoding for space saving for Chinese
Pluggable Ranking
– Lucene uses TF/IDF and VSM (the vector space model)
– Now support for BM25
– Plug your own!
– Hopefully attracts researchers
– Also, pluggable Similarity class per field
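To give a feel for what a BM25-style ranking computes, here is a sketch of the textbook Okapi BM25 score for a single query term. It is not Lucene's Similarity code; the function name is made up for this example, and the defaults `k1=1.2` and `b=0.75` are commonly cited values assumed here.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one query term against one document (textbook Okapi BM25)."""
    # IDF variant with +1 inside the log so the value never goes negative.
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalisation: longer-than-average documents are penalised when b > 0.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)


# Same term frequency, but the shorter document scores higher.
short_doc = bm25_term_score(tf=2, doc_len=50, avg_doc_len=100, n_docs=1000, doc_freq=10)
long_doc = bm25_term_score(tf=2, doc_len=400, avg_doc_len=100, n_docs=1000, doc_freq=10)
print(short_doc > long_doc)
```

Unlike plain TF/IDF, the term-frequency contribution saturates: each extra occurrence of the term adds less than the previous one.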
Sort by Function
– q=foo&sort=sub(price,discount) desc
– q=foo&sort=dist(2, x, y, 0, 0) asc
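The two sorts above can be emulated in memory to show what ends up being ordered. This is only an illustration with hypothetical documents: `sub(price,discount)` is taken as price minus discount, and `dist(2, x, y, 0, 0)` as the Euclidean distance from the point (x, y) to (0, 0).

```python
import math

# Hypothetical documents; the field names mirror the slide's examples.
docs = [
    {"id": "a", "price": 100.0, "discount": 10.0, "x": 3.0, "y": 4.0},
    {"id": "b", "price": 80.0, "discount": 30.0, "x": 1.0, "y": 1.0},
    {"id": "c", "price": 120.0, "discount": 5.0, "x": 6.0, "y": 8.0},
]


def sub(doc):
    """Emulates sub(price,discount): the price after discount."""
    return doc["price"] - doc["discount"]


def dist2(doc, px=0.0, py=0.0):
    """Emulates dist(2, x, y, px, py): Euclidean distance to (px, py)."""
    return math.hypot(doc["x"] - px, doc["y"] - py)


by_net_price_desc = sorted(docs, key=sub, reverse=True)
by_distance_asc = sorted(docs, key=dist2)

print([d["id"] for d in by_net_price_desc])
print([d["id"] for d in by_distance_asc])
```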
Result field aliasing and pseudo fields
– Aliasing:
  • q=foo&fl=score,tittel:title,rabattpris:sub(price,discount)
– Field name globbing:
  • q=foo&fl=score,t*
– Pseudo fields:
  • q=foo&fl=score,[explain],[docid],[shard],[value v=42 t=int]
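Because an `fl` value mixing aliases, functions and pseudo fields contains characters that need URL encoding, a client will usually assemble it programmatically. A minimal sketch follows; the helper `build_fl` is hypothetical and not part of any Solr client library.

```python
from urllib.parse import urlencode


def build_fl(plain=(), aliases=None, pseudo=()):
    """Assemble an fl value from plain fields, alias:expression pairs and [pseudo] fields."""
    parts = list(plain)
    parts += [f"{alias}:{expr}" for alias, expr in (aliases or {}).items()]
    parts += [f"[{name}]" for name in pseudo]
    return ",".join(parts)


fl = build_fl(
    plain=["score"],
    aliases={"tittel": "title", "rabattpris": "sub(price,discount)"},
    pseudo=["explain", "docid"],
)
query_string = urlencode({"q": "foo", "fl": fl})

print(fl)
print(query_string)
```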
Pivot facets
– Multi-dimensional facets
  • &facet.pivot=cat,popularity
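The idea behind `facet.pivot=cat,popularity` can be sketched in memory: count documents per `cat` value and, within each, per `popularity` value. The documents are hypothetical and the returned structure is a simplification, not Solr's actual response format.

```python
from collections import Counter, defaultdict

# Hypothetical documents with the two fields from the slide's example.
docs = [
    {"cat": "electronics", "popularity": 6},
    {"cat": "electronics", "popularity": 6},
    {"cat": "electronics", "popularity": 1},
    {"cat": "music", "popularity": 6},
]


def pivot(docs, outer, inner):
    """Count docs per outer value, and per inner value within each outer value."""
    nested = defaultdict(Counter)
    for doc in docs:
        nested[doc[outer]][doc[inner]] += 1
    return {
        value: {"count": sum(counts.values()), "pivot": dict(counts)}
        for value, counts in nested.items()
    }


result = pivot(docs, "cat", "popularity")
print(result)
```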
Join query
– Simple Join feature (inner join)
– &q={!join from=manu_id to=id}ipod
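The semantics of `{!join from=manu_id to=id}ipod` can be sketched as a two-step lookup: find the documents matching `ipod`, collect their `manu_id` values, then return the documents whose `id` is among them. The documents, the `name` field and the substring match standing in for the real query are all assumptions made for illustration.

```python
# Hypothetical index: manufacturers and products live in the same collection.
docs = [
    {"id": "apple", "name": "Apple Inc."},
    {"id": "belkin", "name": "Belkin"},
    {"id": "p1", "name": "ipod video", "manu_id": "apple"},
    {"id": "p2", "name": "ipod cable", "manu_id": "belkin"},
    {"id": "p3", "name": "hard drive", "manu_id": "maxtor"},
]


def join(docs, from_field, to_field, matches):
    """Return docs whose to_field equals the from_field of some matching doc."""
    from_values = {d[from_field] for d in docs if from_field in d and matches(d)}
    return [d for d in docs if d.get(to_field) in from_values]


manufacturers = join(docs, "manu_id", "id", lambda d: "ipod" in d["name"])
print([d["id"] for d in manufacturers])
```

Note that the result contains the "to" side only (the manufacturers), not the products that matched the inner query.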
New Admin GUI
Solr 3.6
– SOLR-2764*: NorwegianLightStemmer, NorwegianMinimalStemmer
– SOLR-2202*: Money/Currency FieldType
– SOLR-2826*: URLClassify Update Processor
– SOLR-3056: Japanese field type in schema.xml
– SOLR-3026*: eDismax user fields
– SOLR-3140*: omitNorms default for all numeric field types
– SOLR-2901*: Upgrade Solr to Tika 1.0
– SOLR-1709: Distributed Date and Range Faceting
– SOLR-2487*: Do not include slf4j-jdk14 jar in WAR
– SOLR-2509*: spellcheck StringIndexOutOfBoundsException
* Committed by Jan Høydahl