Solr and Elasticsearch, a performance study
Elasticsearch and SolrCloud: a performance comparison
Tom Mortimer, Technical Director, 27th November 2014
[email protected]/blog
+44 (0) 8700 118334
Twitter: @FlaxSearch
Who are Flax?
We design, build and support open source powered search applications
Based in Cambridge U.K., technology-agnostic and independent, but open source exponents & committers
UK Authorized Partner of
Customers include Reed Specialist Recruitment, Mydeco, NLA, Gorkana, Financial Times, News UK, EMBL-EBI, Accenture, University of Cambridge, UK Government...
Customers in recruitment, government, e-commerce, news & media, bioinformatics, consulting, law...
Solr
- Open source search server based on Lucene
- Created in 2004 by Yonik Seeley
- Became an Apache project in 2006
- Merged with Lucene in 2010 (first joint Lucene/Solr release, 3.1, in 2011)
- Web API
- XML config, XML/JSON data formats
- SolrCloud features added in 2012
- Uses Apache ZooKeeper for cluster management
Elasticsearch
- Open source search server based on Lucene
- Created in 2010 by Shay Banon
- RESTful Web API
- Everything is JSON
- Distributed and NRT by design
- Own Zen Discovery module for cluster management
Solr vs. Elasticsearch
- Both have large, dynamic communities
- Well-funded commercial backing
- Widely used in many diverse projects
- Elasticsearch is easier to set up and configure
- Elasticsearch query DSL
- But: is Elasticsearch as tolerant of network faults? (Jepsen tests by Kyle Kingsbury)
- How does performance compare?
Note that we don't have a preference... we use both!
Why does performance matter?
- Won't it be the same, as they both use Lucene?
- Can't you just throw hardware at it? Hardware is cheaper than developers.
Well, no.
Why does performance matter?
There's a lot more to them than just a web API on top of Lucene.
- Several of our customers have fixed hardware budgets
- They may have to use limited internal resources
- With large indexes or complex queries, they need to squeeze every last bit of performance out of the hardware
What performance studies are out there?
Not many found by a Google search.
http://blog.socialcast.com/realtime-search-solr-vs-elasticsearch/
Solr was much faster than Elasticsearch, except for NRT searches with concurrent indexing (where the situation was reversed).
But: this was over 3 years ago, before SolrCloud.
Our experience
- Client with complex filtering requirements for content licensing, tens of millions of documents, limited hardware budget, no NRT requirement.
- Performed tests 18 months ago on EC2: Solr was approximately 20 times faster!
- More recently, Solr was 4 times faster for a project requiring geospatial filtering.
What about now?
This study
- Recent versions of Elasticsearch (1.4.0) and Solr (4.10.2)
- Concentrated on indexing performance, query times with and without concurrent indexing, QPS, filters and facets.
- Hardware kindly provided by BigStep.com Full Metal Cloud (real instances, not VMs)
  - Optimised for high performance
  - Can be faster than your own dedicated hardware!
The results?
Not really very interesting:
- SolrCloud and Elasticsearch were both very fast
- Similar performance with or without concurrent indexing
- Solr could handle higher QPS
Cluster configuration
- Two machines, each with 96GB RAM
- Two instances of SolrCloud or Elasticsearch on each
- Each instance has 24GB JVM heap
- Four shards
- No replicas
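The slides don't show how the cluster was set up. As a sketch under that assumption, the snippet below builds (but does not send) the index-creation requests for this layout, so it runs without a cluster. The collection/index name `test` and the config set name `test_conf` are hypothetical. The Elasticsearch 1.x body sets `number_of_shards` and `number_of_replicas`; the Solr Collections API `CREATE` call uses `replicationFactor=1`, which means a single copy of each shard, i.e. no replicas.

```python
import json
from urllib.parse import urlencode

NUM_SHARDS = 4   # from the slide: four shards
NUM_NODES = 4    # two machines x two instances each


def solr_create_collection_url(base_url, name, config_name):
    """Build a Solr Collections API CREATE request URL (not sent here)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": NUM_SHARDS,
        # replicationFactor counts copies of each shard, so 1 means no extra replicas
        "replicationFactor": 1,
        # one shard per Solr instance
        "maxShardsPerNode": NUM_SHARDS // NUM_NODES,
        "collection.configName": config_name,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def es_create_index_body():
    """Settings body for `PUT /<index>` on Elasticsearch 1.x (not sent here)."""
    return {
        "settings": {
            "number_of_shards": NUM_SHARDS,
            "number_of_replicas": 0,  # no replicas
        }
    }


if __name__ == "__main__":
    print(solr_create_collection_url("http://localhost:8983/solr", "test", "test_conf"))
    print(json.dumps(es_create_index_body(), indent=2))
```

The Elasticsearch body would be sent with `PUT /test`; the Solr URL is a plain GET.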
Cluster configuration in BigStep
Data
- 40M documents created by using a Markov chain on a seed document (on Stoicism) from gutenberg.org:
  “Below planets. this Below lay this the lay infinite the void infinite without void beginning, without middle, beginning, or middle, end, or this end occupied...”
- Small (5-20 word) and larger (200-1000 word) docs
- Randomly assigned ints for “source” and “level”, to simulate licensing filters and for facets.
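The generator itself isn't shown. A minimal sketch of the idea, assuming a first-order word Markov chain (each word maps to the words observed after it in the seed text) and a random walk over it. The `id` and `text` field names and the ranges for `source` (1 to 100) and `level` (1 to 10) are hypothetical; only the document lengths come from the slide.

```python
import random
from collections import defaultdict


def build_chain(text):
    """First-order word Markov chain: word -> list of observed next words."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, length, rng):
    """Random walk over the chain, restarting at a random word on dead ends."""
    keys = list(chain)
    word = rng.choice(keys)
    out = [word]
    while len(out) < length:
        followers = chain.get(word)  # .get() does not add keys to a defaultdict
        word = rng.choice(followers) if followers else rng.choice(keys)
        out.append(word)
    return " ".join(out)


def make_doc(doc_id, chain, rng, small=True, num_sources=100, num_levels=10):
    """One synthetic document; the source/level ranges are hypothetical."""
    length = rng.randint(5, 20) if small else rng.randint(200, 1000)
    return {
        "id": str(doc_id),
        "text": generate(chain, length, rng),
        "source": rng.randint(1, num_sources),
        "level": rng.randint(1, num_levels),
    }


if __name__ == "__main__":
    seed = "below lay the infinite void without beginning middle or end"
    rng = random.Random(1)  # seeded so runs are reproducible
    print(make_doc(1, build_chain(seed), rng))
```

Because the output only ever follows observed word pairs, it has a realistic vocabulary and term distribution while being unlimited in quantity, which is presumably why the approach suits load testing.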
Indexing
- Python script and requests library
- Single process for small index, four processes for larger index
- Single process for indexing concurrent with search
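The indexing script isn't shown either. A sketch of the payloads it might have built, assuming batched updates (the batch size is not stated). Elasticsearch's `_bulk` endpoint takes newline-delimited action/source pairs (1.x also needs a `_type`; `test` and `doc` here are hypothetical), and Solr's `/update` handler accepts a JSON array of documents. `partition` splits the documents round-robin, mirroring the four processes used for the larger index.

```python
import json


def es_bulk_body(docs, index="test", doc_type="doc"):
    """Newline-delimited JSON for Elasticsearch's _bulk endpoint (1.x needs a _type)."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["id"]}}
        source = {k: v for k, v in doc.items() if k != "id"}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline


def solr_update_body(docs):
    """JSON array of documents for POST /solr/<collection>/update."""
    return json.dumps(docs)


def batches(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def partition(items, workers):
    """Round-robin split of the documents across `workers` indexing processes."""
    return [items[w::workers] for w in range(workers)]
```

Each body could then be POSTed with `requests.post(url, data=body, headers={"Content-Type": "application/json"})`, to `/_bulk` or `/solr/<collection>/update` respectively.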
Searching
- Python and requests
- Each query time logged for analysis
- Single process for query time testing
- Multiple processes to test QPS
- All tests performed warm
- Queries consisted of three randomly chosen terms combined with OR
- Filters randomly generated
- Facets / Elasticsearch aggregations
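A sketch of how one randomly generated query might be expressed for each engine. The field names `text`, `source` and `level`, and the shape of the source filter, are assumptions. Solr gets a `q`/`fq`/`facet.field` parameter set for `/select`. Elasticsearch 1.x gets a `filtered` query (the 1.x way to combine a query with a filter) plus a `terms` aggregation in place of a facet. Both request ids only, in line with the searches not fetching document data.

```python
import random


def random_query_terms(vocab, rng, n=3):
    """n distinct randomly chosen terms (three in the study)."""
    return rng.sample(vocab, n)


def solr_params(terms, sources, facet_field="level"):
    """Query parameters for Solr's /select handler."""
    return {
        "q": " OR ".join(f"text:{t}" for t in terms),
        # fq results are cached in Solr's filterCache, independently of q
        "fq": "source:(" + " OR ".join(str(s) for s in sources) + ")",
        "facet": "true",
        "facet.field": facet_field,
        "fl": "id",  # return ids only, no stored document data
        "rows": 10,
        "wt": "json",
    }


def es_body(terms, sources, agg_field="level"):
    """Elasticsearch 1.x query DSL: filtered query plus a terms aggregation."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {"text": {"query": " ".join(terms), "operator": "or"}}},
                "filter": {"terms": {"source": sources}},
            }
        },
        "aggs": {agg_field: {"terms": {"field": agg_field}}},
        "_source": False,  # return ids only, no document data
        "size": 10,
    }
```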
40M Small documents
- Elasticsearch indexed them in 30 minutes; total index size was 8.8 GB (easily cacheable)
- Solr indexed them in 43 minutes; total index size was 7.6 GB
40M Small documents (concurrent indexing)
Query times:
- Elasticsearch: 0.01s mean, 99% < 0.06s
- Solr: 0.01s mean, 99% < 0.10s
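The "99% < x" figures read as 99th-percentile query times. The slides don't give the method. This sketch assumes the nearest-rank percentile, one of several common definitions, applied to the logged per-query times.

```python
import math
import statistics


def summarise(times):
    """Return (mean, 99th percentile) of logged query times in seconds.

    Nearest-rank percentile: the smallest logged value with at least 99%
    of the samples at or below it.
    """
    ordered = sorted(times)
    rank = max(1, math.ceil(len(ordered) * 99 / 100))  # 1-based rank
    return statistics.mean(ordered), ordered[rank - 1]


if __name__ == "__main__":
    mean, p99 = summarise([0.01] * 99 + [0.5])
    print(f"{mean:.4f}s mean, 99% <= {p99:.2f}s")
```

A single slow outlier in 100 queries moves the mean but not the 99th percentile, which is why the slides report both figures.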
40M Large documents
- Elasticsearch indexed them in 179 minutes; total index size was 363 GB (not completely cacheable)
- Solr indexed them in 119 minutes; total index size was 226 GB
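Converting the indexing figures into throughput and per-document index size makes the runs easier to compare. The arithmetic below uses only the numbers on these slides. It assumes decimal gigabytes (10^9 bytes), because the slides don't say which unit was meant.

```python
DOCS = 40_000_000

# (engine, corpus) -> (indexing time in minutes, index size in GB), from the slides
RESULTS = {
    ("Elasticsearch", "small"): (30, 8.8),
    ("Solr", "small"): (43, 7.6),
    ("Elasticsearch", "large"): (179, 363),
    ("Solr", "large"): (119, 226),
}


def docs_per_second(minutes, docs=DOCS):
    return docs / (minutes * 60)


def bytes_per_doc(size_gb, docs=DOCS):
    # assumes decimal gigabytes (10**9 bytes)
    return size_gb * 1e9 / docs


if __name__ == "__main__":
    for (engine, corpus), (minutes, size_gb) in RESULTS.items():
        print(f"{engine:13} {corpus:5} {docs_per_second(minutes):8.0f} docs/s "
              f"{bytes_per_doc(size_gb):7.0f} bytes/doc")
```

On these numbers, Elasticsearch indexed the small documents about 1.4 times as fast as Solr (roughly 22,000 vs 15,500 docs/s). Solr indexed the large documents about 1.5 times as fast (roughly 5,600 vs 3,700 docs/s) and produced an index about 38% smaller.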
40M Large documents (search with facets)
- Elasticsearch: 0.21s mean, 99% < 0.75s
- Solr: 0.25s mean, 99% < 0.84s
40M Large documents (with 10 filters)
- Elasticsearch: 0.21s mean, 99% < 0.72s
- Solr: 0.09s mean, 99% < 0.50s
40M Large documents (concurrent indexing)
- Elasticsearch: 0.16s mean, 99% < 0.86s
- Solr: 0.09s mean, 99% < 0.46s
40M Large documents (QPS)
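The slides describe QPS testing with multiple client processes but don't show the harness. A minimal sketch: it swaps in a thread pool for processes (reasonable for I/O-bound HTTP clients, but a substitution), and `run_query` is a hypothetical callable that would issue one search. QPS is the number of completed queries divided by wall-clock time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_qps(run_query, queries, workers):
    """Run every query across `workers` concurrent clients.

    Returns (queries per second, per-query latencies in seconds).
    """
    def timed(query):
        start = time.perf_counter()
        run_query(query)
        return time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(timed, queries))
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed, latencies


if __name__ == "__main__":
    # Stand-in for a real search call: sleep 10 ms per "query".
    qps, latencies = measure_qps(lambda q: time.sleep(0.01), list(range(100)), workers=8)
    print(f"{qps:.0f} QPS over {len(latencies)} queries")
```

Raising `workers` until QPS stops increasing gives the maximum sustainable throughput for a given cluster.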
Conclusions
SolrCloud seems to be slightly faster. However, performance was acceptable in all cases.
SolrCloud can apparently support a significantly higher number of queries per second (tested without concurrent indexing, however).
Limitations and problems
- Validity of generated documents?
- Validity of random queries?
- Searches did not fetch any document data
- Did not test highlighting, range facets, geolocation, etc.
- Only tested one type of cluster configuration (Elasticsearch is very flexible about node role)
- Did not tune JVM parameters
- Did not perform profiling to identify reasons for differences
What's next
- Would also have liked to compare BigStep with Amazon EC2.
- If there is any interest, I hope to address some of these problems in the near future.
- We'll open source the code (next week?) on www.github.com/flaxsearch
What to take away from this?
- Elasticsearch and Solr are both awesome
- They currently seem very close in terms of performance (according to this limited study)
- However, all search applications are different
- Solr and Elasticsearch may have quite different performance characteristics in certain cases. Hard to predict.
If performance is important to you, it will pay to try both.