First Oslo Solr Community Meetup lightning talk (janhoy)

Description: Lightning talk by Jan Høydahl on SolrCloud

Transcript of First Oslo Solr Community Meetup lightning talk (janhoy)

Page 1

Sponsors:

The program is starting...

Page 2

MeetUp May 8th 2011

– Welcome
– Background for the MeetUp
– (Commercial break)
– Round of introductions
– Wishes for the MeetUp group (discussion)
– Lightning talks, 10 min each (approx. 18:30-19:00)
  • Sture Svensson: "Querying Solr in various ways"
  • Jan Høydahl: "What can I do with SolrCloud today"
  • NN?
– Formal close (approx. 19:15)
– Mingling...

Page 3

Scaling & HA (redundancy)

– Index up to 25-100 million documents on a single server*
  • Scale linearly by adding servers (shards)
– Query up to 50-1000 QPS on a single server
  • Scale linearly by adding servers (replicas)
– Add redundancy or backup through extra replicas
– Built-in software load balancer, auto failover
– Indexing redundancy not out of the box
  • But possible to have every row do index+search
– High availability for config/admin using Apache ZooKeeper (TRUNK)

Page 5

Replication

– Goals:
  • Increase QPS capacity
  • High availability of search
– Replication adds another "search row"
– Done as a PULL from the slave
– The ReplicationHandler is configured in solrconfig.xml (see the sketch below)

http://wiki.apache.org/solr/SolrReplication
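
The slide only names the handler; a minimal sketch of the two sides of this configuration, based on the SolrReplication wiki page, could look like the following (the master hostname, replicateAfter trigger, confFiles list and poll interval are placeholder values, not taken from the talk):

    <!-- Master side of solrconfig.xml: publish the index after each commit -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

    <!-- Slave side of solrconfig.xml: poll the master and PULL the index -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master-host:8983/solr/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>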

Page 6

Sharding

– Goals:
  • Split an index too large for one box into smaller chunks
  • Lower HW footprint by smart partitioning of data
    – News search: one shard for the last month, one shard per year
  • Lower latency by having a smaller index per node
– A shard is a core which participates in a collection
  • Shards A and B may thus be on different or the same host
  • Shards A and B should, but do not need to, share a schema
– Shard distribution must be done by the client application, adding documents to the correct shard based on some policy (see the sketch below)
  • The most common policy is hash-based distribution
  • May also be date based or whatever the client chooses
– Work is under way to add shard distribution natively to Solr, see SOLR-2358
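
As a rough sketch of such a client-side policy (the yp core name and the ports are borrowed from the 2x2 example later in the deck, a simple modulo on the document id stands in for a real hash function, and the listing document is made up):

    # Pick a shard for the document based on its id (stand-in for a hash-based policy)
    SHARDS=("localhost:8983/solr/yp" "localhost:7973/solr/yp")
    DOC_ID=12345
    IDX=$(( DOC_ID % ${#SHARDS[@]} ))

    # Index the document to the shard selected by the policy
    curl "http://${SHARDS[$IDX]}/update?commit=true" \
         -H "Content-Type: text/xml" \
         --data-binary "<add><doc><field name='id'>$DOC_ID</field></doc></add>"

    # Query across both shards with the classic shards= parameter
    curl "http://localhost:8983/solr/yp/select?q=foo&shards=localhost:8983/solr/yp,localhost:7973/solr/yp"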

Page 7

Solr Cloud

– Solr Cloud is the popular name for an initiative to make Solr more easily scalable and manageable in a distributed world

– Enables centralized configuration and cluster status monitoring

– Solr TRUNK contains the first features
  • Apache ZooKeeper support, including built-in ZK
  • Support for easy distrib=true queries (by means of ZK)
  • NOTE: Still experimental, work in progress
– Expected features to come
  • Auto index shard distribution using ZK
  • Tools to manage the config in ZK
  • Easy addition of a row/shard through an API

– NOTE: We do not know when SolrCloud will be included in a released version of Solr. If you need it, use TRUNK

http://wiki.apache.org/solr/SolrCloud

Page 8

Solr Cloud...

– Setting up SolrCloud for our YP example (a full command sketch follows below)
  • We'll set up a 4-node cluster on our laptops using four instances of Jetty, on different ports
  • We'll have 2 shards, each with one replica
  • We'll index 5000 listings to each shard
  • And finally do distributed queries
  • For convenience, we'll use the ZK shipping with Solr
– Bootstrapping ZooKeeper to create a config "yp-conf"
  • java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=yp-conf -DzkRun -jar start.jar
– Starting the other Jetty nodes
  • java -Djetty.port=<port> -DhostPort=<port> -DzkHost=localhost:9983 -jar start.jar
– ZooKeeper admin
  • http://localhost:8983/solr/yp/admin/zookeeper.jsp

http://wiki.apache.org/solr/SolrCloud
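
Putting these commands together, a complete run on one laptop might look roughly like the sketch below. The ports follow the 2x2 diagram on the last slide; the per-node directories and the listings XML files are hypothetical placeholders.

    # Node 1 (localhost:8983): shard A master, runs embedded ZK on 9983 and
    # bootstraps the "yp-conf" config into ZooKeeper
    (cd node1/example && java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=yp-conf -DzkRun -jar start.jar &)

    # Nodes 2-4: shard B master plus the two replicas, all pointing at the embedded ZK.
    # Which shard each core joins is set per core in that node's solr.xml (see Page 9).
    (cd node2/example && java -Djetty.port=7973 -DhostPort=7973 -DzkHost=localhost:9983 -jar start.jar &)
    (cd node3/example && java -Djetty.port=6963 -DhostPort=6963 -DzkHost=localhost:9983 -jar start.jar &)
    (cd node4/example && java -Djetty.port=5953 -DhostPort=5953 -DzkHost=localhost:9983 -jar start.jar &)

    # Index 5000 listings to each shard (the two XML files are made-up placeholders)
    curl "http://localhost:8983/solr/yp/update?commit=true" -H "Content-Type: text/xml" --data-binary @listings-shardA.xml
    curl "http://localhost:7973/solr/yp/update?commit=true" -H "Content-Type: text/xml" --data-binary @listings-shardB.xml

    # Distributed query across the whole collection, resolved via ZooKeeper
    curl "http://localhost:8983/solr/yp/select?q=*:*&distrib=true"

    # ZooKeeper admin page: browse to http://localhost:8983/solr/yp/admin/zookeeper.jsp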

Page 9

Solr Cloud...

– Solr Cloud will resolve all shards and replicas in a collection based on what is configured in solr.xml (see the sketch below)
– Querying /solr/yp/select?q=foo&distrib=true on this core will cause SolrCloud to resolve the core name to "yp-cloud" and then distribute the request to each of the shards which are members of the same collection
– Often, the core name and collection name will be the same
– SolrCloud will load balance between replicas within the same shard

http://wiki.apache.org/solr/SolrCloud
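
The slide does not show the file itself, but assuming the legacy solr.xml format of this era, the entry on the localhost:6963 node (the shard A replica in the 2x2 diagram on the next page) might look roughly like this; the attribute names are an assumption and should be checked against the SolrCloud wiki page:

    <!-- Sketch of solr.xml on the localhost:6963 node: one core named "yp",
         registered as a member of shard A in yp-collection (and thus acting
         as the replica, since the master already holds that shard) -->
    <solr persistent="true">
      <cores adminPath="/admin/cores" host="localhost" hostPort="6963">
        <core name="yp" instanceDir="." shard="A" collection="yp-collection"/>
      </cores>
    </solr>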

Page 10

Solr Cloud, 2x2 setup

localhost:8983
  Run ZK: localhost:9983
  Core: yp
  Shard: A (master)
  Collection: yp-collection

localhost:7973
  Run ZK: no, -DzkHost=localhost:9983
  Core: yp
  Shard: B (master)
  Collection: yp-collection

localhost:6963
  Run ZK: no, -DzkHost=localhost:9983
  Core: yp
  Shard: A (replica)
  Collection: yp-collection

localhost:5953
  Run ZK: N/A, -DzkHost=localhost:9983
  Core: yp
  Shard: B (replica)
  Collection: yp-collection