First Oslo Solr Community Meetup – Lightning Talk (janhoy)
Sponsors:
The program is starting...
Slide 2: MeetUp May 8th 2011
– Welcome
– Background for the MeetUp
– (Commercial break)
– Round of introductions
– Wishes for the MeetUp group (discussion)
– Lightning talks, 10 min each (approx. 18:30-19:00)
  • Sture Svensson: "Querying Solr in various ways"
  • Jan Høydahl: "What can I do with SolrCloud today"
  • NN?
– Formal end (approx. 19:15)
– Mingling...
Slide 3: Scaling & HA (redundancy)
– Index up to 25-100 million documents on a single server*
  • Scale linearly by adding servers (shards)
– Query up to 50-1000 QPS on a single server
  • Scale linearly by adding servers (replicas)
– Add redundancy or backup through extra replicas
– Built-in software load balancer, auto failover
– Indexing redundancy not out of the box
  • But possible to have every row do index+search
– High availability for config/admin using Apache ZooKeeper (TRUNK)
Slide 4: Solr scaling example
Slide 5: Replication
– Goals:
  • Increase QPS capacity
  • High availability of search
– Replication adds another "search row"
– Done as a PULL from the slave
– ReplicationHandler is configured in solrconfig.xml
http://wiki.apache.org/solr/SolrReplication
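As a sketch of what the solrconfig.xml wiring might look like: the ReplicationHandler is declared on both sides, with a "master" section on the indexing node and a "slave" section on each search node that pulls from it. Hostnames, the poll interval, and the confFiles list below are placeholder values, not taken from this talk:

```xml
<!-- On the master: replicate after each commit and also ship config files -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On each slave: poll the master's replication handler for new index versions -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Because the slave pulls, adding another search row is just a matter of starting a new node with the slave section pointing at the same master.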
Slide 6: Sharding
– Goals:
  • Split an index too large for one box into smaller chunks
  • Lower HW footprint by smart partitioning of data
    – News search: one shard for the last month, one shard per year
  • Lower latency by having a smaller index per node
– A shard is a core which participates in a collection
  • Shards A and B may thus be on different or the same host
  • Shards A and B should, but do not need to, share a schema
– Shard distribution must be done by the client application, adding documents to the correct shard based on some policy
  • The most common policy is hash-based distribution
  • It may also be date-based, or whatever the client chooses
– Work is under way to add shard distribution natively to Solr, see SOLR-2358
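The client-side hash-based policy mentioned above can be sketched in a few lines. This is an illustration of the idea only, not Solr's own routing code; the shard URLs and the choice of md5 are assumptions for the example:

```python
import hashlib

# Hypothetical shard endpoints for illustration (two shards, as in the
# YP example later in this talk).
SHARDS = [
    "http://localhost:8983/solr/yp",  # shard A
    "http://localhost:7973/solr/yp",  # shard B
]

def shard_for(doc_id: str) -> str:
    """Pick the shard a document should be indexed to, by hashing its unique key.

    A stable hash (md5 here) is used rather than Python's built-in hash(),
    so the id-to-shard mapping stays the same across processes and restarts.
    """
    h = int(hashlib.md5(doc_id.encode("utf-8")).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]
```

The key property is determinism: the same document id always routes to the same shard, so updates and deletes reach the copy that holds the document. A date-based policy would replace the hash with a lookup on the document's timestamp.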
Slide 7: Solr Cloud
– SolrCloud is the popular name for an initiative to make Solr more easily scalable and manageable in a distributed world
– Enables centralized configuration and cluster status monitoring
– Solr TRUNK contains the first features
  • Apache ZooKeeper support, including built-in ZK
  • Support for easy distrib=true queries (by means of ZK)
  • NOTE: Still experimental, work in progress
– Expected features to come
  • Automatic index shard distribution using ZK
  • Tools to manage the config in ZK
  • Easy addition of a row/shard through an API
– NOTE: We do not know when SolrCloud will be included in a released version of Solr. If you need it, use TRUNK
http://wiki.apache.org/solr/SolrCloud
Slide 8: Solr Cloud...
– Setting up SolrCloud for our YP example
  • We'll set up a 4-node cluster on our laptops, using four instances of Jetty on different ports
  • We'll have 2 shards, each with one replica
  • We'll index 5000 listings to each shard
  • And finally do distributed queries
  • For convenience, we'll use the ZK shipping with Solr
– Bootstrapping ZooKeeper to create a config "yp-conf"
  • java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=yp-conf -DzkRun -jar start.jar
– Starting the other Jetty nodes
  • java -Djetty.port=<port> -DhostPort=<port> -DzkHost=localhost:9983 -jar start.jar
– ZooKeeper admin
  • http://localhost:8983/solr/yp/admin/zookeeper.jsp
http://wiki.apache.org/solr/SolrCloud
Slide 9: Solr Cloud...
– SolrCloud will resolve all shards and replicas in a collection based on what is configured in solr.xml
– Querying /solr/yp/select?q=foo&distrib=true on this core will cause SolrCloud to resolve the core name to "yp-cloud" and then distribute the request to each of the shards that are members of the same collection
– Often, the core name and the collection name will be the same
– SolrCloud will load balance between replicas within the same shard
http://wiki.apache.org/solr/SolrCloud
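For contrast with the ZK-based resolution above: without SolrCloud, a distributed query has to list the participating shards explicitly in the shards request parameter. A sketch of building such a URL by hand, using the hosts and ports from the 2x2 example (the query itself is just an assumed placeholder):

```python
from urllib.parse import urlencode

# One entry per shard; Solr's shards parameter takes comma-separated
# host:port/path values without the http:// scheme.
shards = [
    "localhost:8983/solr/yp",  # shard A
    "localhost:7973/solr/yp",  # shard B
]

params = urlencode({"q": "foo", "shards": ",".join(shards)})
url = "http://localhost:8983/solr/yp/select?" + params
print(url)
```

The node receiving this request fans the query out to every listed shard and merges the results. With distrib=true and ZooKeeper, this shard list is maintained by the cluster instead of being hard-coded in every client.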
Slide 10: Solr Cloud, 2x2 setup
– localhost:8983
  • Runs ZK: localhost:9983
  • Core: yp, Shard: A (master), Collection: yp-collection
– localhost:7973
  • Runs ZK: no, -DzkHost=localhost:9983
  • Core: yp, Shard: B (master), Collection: yp-collection
– localhost:6963
  • Runs ZK: no, -DzkHost=localhost:9983
  • Core: yp, Shard: A (replica), Collection: yp-collection
– localhost:5953
  • Runs ZK: no, -DzkHost=localhost:9983
  • Core: yp, Shard: B (replica), Collection: yp-collection