Intro to Apache Solr
Uploaded by shalin-shekhar-mangar
Apache Solr: Introduction & Demo
• What is Apache Solr?
• Start/stop Solr
• Indexing data to Solr
• Searching data
• Running a SolrCloud cluster
• Hacking Solr
Agenda
• Lucene-based search server + other features
• Access Lucene over HTTP:
• Java, Python, Ruby, .NET, PHP over XML/JSON and other formats
• Faceting (guided navigation), suggestions, highlighting etc.
• Replication and distributed search
• Lucene best practices
What is Apache Solr?
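As a rough illustration of "Lucene over HTTP", the sketch below only builds a search URL with a facet on one field; it does not contact a server. The host, port, the collection name `gettingstarted`, and the field names are assumptions made for the example, and `build_select_url` is a hypothetical helper, not part of any Solr client.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, facet_field=None, rows=10):
    """Build a Solr /select URL. Nothing is sent over the network."""
    params = [("q", query), ("wt", "json"), ("rows", rows)]
    if facet_field is not None:
        # Ask Solr to return value counts for one field (faceting).
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Assumed local Solr on the default port; collection and fields are hypothetical.
url = build_select_url(
    "http://localhost:8983/solr",
    "gettingstarted",
    "title:samsung",
    facet_field="category",
)
print(url)
```

Any HTTP client (a browser, curl, or `urllib.request.urlopen`) can then fetch this URL from a running Solr and parse the JSON response.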
• Extract:
• tar xvf solr-5.1.0.tgz (linux/mac)
• unzip solr-5.1.0.zip or click+extract (windows)
• Run:
• ./bin/solr start -e schemaless
• ./bin/solr start -e schemaless -p 8983
• ./bin/solr -help
• ./bin/solr start -help
• Stop:
• ./bin/solr stop
Running Solr
• ./bin/post script
• Using curl directly
• Using the Admin UI
• SolrJ and other indexing clients
Indexing data
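For the "using curl directly" route, the same request can be sketched in Python: a JSON array of documents is POSTed to the collection's `/update` handler. The collection name `gettingstarted`, the document fields, and the helper `build_update_request` are assumptions for illustration; the request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_update_request(base_url, collection, docs):
    """Prepare a POST that would index `docs` (a list of dicts) as JSON."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        # commit=true makes the documents searchable right after the update.
        f"{base_url}/{collection}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical documents; field names are made up for the example.
docs = [{"id": "1", "title": "Intro to Apache Solr", "author": "shalin"}]
req = build_update_request("http://localhost:8983/solr", "gettingstarted", docs)
print(req.get_method(), req.full_url)
```

With Solr running locally, passing `req` to `urllib.request.urlopen` would perform the indexing call.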
Demo time
Inverted index
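A toy version of the idea, assuming whitespace tokenization and lowercasing only (Lucene's real analysis chain and index format are far more elaborate): each term maps to the set of document ids that contain it, so boolean queries become set operations.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


# Made-up sample documents.
docs = {1: "red shoes", 2: "blue shoes", 3: "red hat"}
index = build_inverted_index(docs)

red_and_shoes = index["red"] & index["shoes"]  # like: +red +shoes
shoes_not_red = index["shoes"] - index["red"]  # like: +shoes -red
print(red_and_shoes, shoes_not_red)
```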
• +red +shoes = red AND shoes
• +shoes -red = shoes NOT red
• "android phone"
• "android phone" -samsung = "android phone" NOT samsung
• "android samsung"~4
• merced*
• createDate:[201301 TO 201401]
• author:shalin
• author:"shalin mangar"
• author:"shalin mangar" AND project:(lucene OR solr)
• title:samsung^5 category:phone
Lucene/Solr query syntax
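When query strings are assembled from user input, the syntax characters above need care. The sketch below uses two hypothetical helpers: `escape_term` backslash-escapes the characters the Lucene query parser treats as special, and `phrase` builds a fielded phrase query with optional slop and boost. It assumes that inside a quoted phrase only the quote and backslash need escaping.

```python
import re

# Characters (and the && / || operators) with special meaning in Lucene queries.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_term(text):
    """Backslash-escape query syntax so the text is matched literally."""
    return _SPECIAL.sub(r"\\\1", text)


def phrase(field, text, slop=None, boost=None):
    """Build field:"text", optionally followed by ~slop and ^boost."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    query = f'{field}:"{escaped}"'
    if slop is not None:
        query += f"~{slop}"
    if boost is not None:
        query += f"^{boost}"
    return query


print(escape_term("1+1"))
print(phrase("author", "shalin mangar"))
print(phrase("title", "android samsung", slop=4))
```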
• DataImportHandler: index databases, email, RSS, XML etc.
• Rich document support: PDF, MS Office, Images etc.
• Faceting, stats, analytics
• Replication for high query volume
• Production systems with billions of documents
• Very extensible and customizable
• Embedded in commercial search products from Lucidworks, DataStax, Cloudera, Hortonworks, Pivotal, Amazon CloudSearch, Riak etc.
Other features of Solr
• Subset of optional features in Solr that enable and simplify horizontally scaling a search index using sharding and replication
• Goals: scalability, performance, high-availability, simplicity, and elasticity
What is SolrCloud?
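To make sharding and replication concrete, here is a toy sketch, not Solr's implementation: documents are assigned to a shard by hashing their id, and each shard is served by several replicas. Solr's default router hashes ids with MurmurHash3 into per-shard hash ranges; CRC32 modulo the shard count is used here purely as a stand-in, and the shard and replica names are illustrative.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard index for a document id (stand-in hash, not Solr's)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def replica_layout(num_shards, replication_factor):
    """List illustrative replica names for each shard."""
    return {
        f"shard{s + 1}": [
            f"shard{s + 1}_replica{r + 1}" for r in range(replication_factor)
        ]
        for s in range(num_shards)
    }


layout = replica_layout(num_shards=2, replication_factor=2)
for doc_id in ["doc-1", "doc-2", "doc-3"]:
    shard = f"shard{shard_for(doc_id, 2) + 1}"
    # Every replica of the chosen shard holds a copy of the document.
    print(doc_id, "->", shard, layout[shard])
```

Adding shards spreads the index over more machines (scalability); adding replicas per shard spreads query load and tolerates node failures (high availability).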
• ./bin/solr -e cloud
• Yeah, it’s that simple!
Running SolrCloud
SolrCloud demo
• http://wiki.apache.org/solr/HowToContribute
• Pre-requisites:
• git: git clone http://git-wip-us.apache.org/repos/asf/lucene-solr.git
• github: fork and clone apache/lucene-solr
• ant 1.8.x or above
• Eclipse or IntelliJ IDEA (I recommend IDEA)
• Put svn/git and ant in your $PATH or %PATH%
Hacking Solr
• ant ivy-bootstrap (required only once)
• ant idea or ant eclipse (generates a complete project for you which you can open in your favourite IDE)
• Find an existing Jira issue or open a new one at http://issues.apache.org/jira/browse/SOLR
• Make changes and write tests; once finished:
• run ‘cd solr; ant server’ to build Solr, then start it via the bin/solr script
• run ‘ant test’ (it can take a while) and ensure all tests pass
• run ‘ant precommit’ from the checkout root and ensure it passes
• Generate a patch with ‘svn diff’ or ‘git diff’ and attach to Jira
Hacking Solr
• http://lucene.apache.org/solr
• https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide
• https://issues.apache.org/jira/browse/SOLR
• Ask me: solr-help.slack.com
• Ask other users: [email protected]
• Ask developers: [email protected] (use sparingly)
Resources
Thank you
Shalin Shekhar Mangar, [email protected]