Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Ubiquitous Solr - A Database's not-so-evil Twin
Ubiquitous Solr - A Database’s not-so-evil Twin
Ayon Sinha
Data Foundation @WalmartLabs
October 13-16, 2016 • Austin, TX
2
Search Engine… Lucene… Solr
[Figure labels: Text Search, Search Suggestions]
• Internet and Intranet Search
• Relevance
• Search Suggestions
• Faceting
• Recommendations
• Time series
• Log search
• Geo-spatial search
• Analytics
• Graph search
[Figure labels: Recommendations, Relevance, Facets]
3
Overview
• Scale any data infrastructure with the help of a search engine like Apache Solr
• Build a high performance and highly available data platform for internal and external users alike
• Walmart’s commitment to open source
4
About me
• Team lead on the Data Foundation team at the largest retailer and largest private employer in the world
• Prior to Walmart, worked at startups building recommendation and analytics systems
• Before that, spent 6 years building search applications, recommendation systems and Hadoop-based analytics systems at the largest online auction company, eBay
• Manuscript reviewer for Manning Publications; helped shape the contents of “Hadoop in Practice” and “Big Data”, among others.
5
About Walmart
• 11,000+ Stores in 27 countries
• 11 eCommerce sites
• 250M customers weekly in stores and online
• Millions of database transactions per day
• Sales, Holidays and massive volume shifts
6
It starts out so simple
An idea implemented on the LAMP stack
7
Turns out to be a great idea!
Users seem to like the new product
8
Users REALLY like this..
Higher volume, more use cases. Quick-fix scaling alternatives add some headroom … and complexity
9
We need more Business Intelligence
Business is looking good, but the source-of-truth data store, not so much …
10
Scale up (in a hurry) with hardware
Least risk. Diminishing returns. What next?
11
Relieve The Pressure
• Offload queries to a Search Engine
• Offload recurring reads to Cache
• Offload analytics to OLAP datastores
• Shard the database
… and do something to hide the complexity. It is worth it.
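One way to picture "hiding the complexity" is a thin read router that decides which backend serves each read. The sketch below is illustrative only: `ReadRouter`, its backends, and the routing rules are hypothetical stand-ins (plain dicts and a toy search function), not the implementation described in this talk.

```python
from typing import Callable, Dict, List, Optional


class ReadRouter:
    """Hypothetical facade: callers ask for data, the router picks the backend."""

    def __init__(self, db: Dict[str, dict], search: Callable[[str], List[str]]):
        self.db = db                        # stand-in for the source-of-truth database
        self.search = search                # stand-in for a search engine query function
        self.cache: Dict[str, dict] = {}    # stand-in for a cache tier
        self.db_reads = 0                   # counts how often the database is hit

    def get_by_id(self, key: str) -> Optional[dict]:
        """Recurring key lookups: serve from cache, fall back to the database."""
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        row = self.db.get(key)
        if row is not None:
            self.cache[key] = row
        return row

    def find(self, text: str) -> List[str]:
        """Free-text queries go to the search engine, not to the database."""
        return self.search(text)


db = {"1": {"name": "red bike"}, "2": {"name": "blue bike"}, "3": {"name": "red hat"}}


def toy_search(text: str) -> List[str]:
    # Toy substring match standing in for a real search index over the same data.
    return sorted(k for k, row in db.items() if text in row["name"])


router = ReadRouter(db, toy_search)
router.get_by_id("1")
router.get_by_id("1")          # second read is served from the cache
matches = router.find("red")   # answered without a database read
```

The point of the facade is that callers never choose a backend themselves, so backends can be added or swapped without touching application code.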
12
The Inspiration
Integration tools for Lucene-based search engines are abundant
13
The “not-so-evil” Twin to protect your Source of Truth DB
• What if a copy of your source-of-truth data were available … just about anywhere you want it?
• Redirect queries to a search engine to protect your database?
• Helps scale by reducing demand for
  – database indexing
  – database connections
  – scarce database resources like memory, storage, interconnects
Adding near real-time search adds complexity … and it comes at a cost; but done right, the benefits far outweigh the costs
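As a rough illustration of redirecting a query, a filtered lookup that might otherwise be SQL along the lines of `SELECT ... WHERE category = 'bikes' AND price <= 500 LIMIT 20` can be expressed as a request to Solr's `/select` handler using the standard `q`, `fq`, `rows` and `wt` parameters. The host, the core name `products`, and the field names below are assumptions made up for this sketch; the code only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, q="*:*", filters=(), rows=10):
    """Build a Solr /select URL; each filter becomes its own fq parameter."""
    params = [("q", q), ("rows", str(rows)), ("wt", "json")]
    params += [("fq", f) for f in filters]
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Hypothetical core and fields, used only for illustration.
url = solr_select_url(
    "http://localhost:8983",
    "products",
    filters=["category:bikes", "price:[* TO 500]"],
    rows=20,
)
print(url)
```

Keeping each condition in its own `fq` lets Solr cache the filters independently, which suits repeated queries that share the same category or price band.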
14
Our Approach
• Abstract the complexity of managing
  – source-of-truth database
  – cache coherence
  – search queries
  – message bus
• Abstract Connection pool management
• Provide a scalable way to query across shards with full control of Solr schema
• Analyze big data without affecting real-time systems, while isolating individual data domains
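The write path behind such an abstraction could look roughly like the sketch below: a single `save` call updates the source of truth, invalidates the cached copy, and publishes a change event, while a separate indexer drains the bus into the search index. Everything here is a hypothetical in-memory stand-in (`DataPlatform`, the dicts, the `deque` used as a message bus); the talk does not describe the actual interfaces.

```python
from collections import deque


class DataPlatform:
    """Hypothetical facade over a database, cache, message bus and search index."""

    def __init__(self):
        self.db = {}        # stand-in for the source-of-truth database
        self.cache = {}     # stand-in for the cache tier
        self.bus = deque()  # stand-in for a message bus
        self.index = {}     # stand-in for the search index

    def save(self, key, doc):
        """Write to the source of truth, then keep the other tiers coherent."""
        self.db[key] = doc
        self.cache.pop(key, None)          # drop any stale cached copy
        self.bus.append(("upsert", key))   # announce the change to consumers

    def run_indexer(self):
        """Drain pending change events into the search index."""
        processed = 0
        while self.bus:
            op, key = self.bus.popleft()
            if op == "upsert":
                self.index[key] = dict(self.db[key])
            processed += 1
        return processed


platform = DataPlatform()
platform.cache["sku-1"] = {"name": "old"}      # pretend a stale copy was cached
platform.save("sku-1", {"name": "red bike"})
visible_before = "sku-1" in platform.index     # not searchable until the indexer runs
processed = platform.run_indexer()
visible_after = "sku-1" in platform.index
```

The gap between `save` and `run_indexer` is the "near" in near real-time: the database is always current, and the search index catches up shortly after.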
15
From a situation like..
16
DB, Solr and Hadoop
17
Sharded DB with Solr
18
The Eco-system
Separation of concerns
19
The Result
Scatter-gather vs Powered by Apache Solr
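Scatter-gather, the approach being compared against here, generally means fanning a query out to every database shard and merging the partial results in the application. The sketch below shows that pattern with in-memory lists standing in for shards; the function names and row layout are assumptions for illustration, not taken from the talk.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor


def query_shard(shard, min_price, limit):
    """Run the filter on one shard and return its own top rows by price."""
    rows = [r for r in shard if r["price"] >= min_price]
    return sorted(rows, key=lambda r: r["price"], reverse=True)[:limit]


def scatter_gather(shards, min_price, limit):
    """Scatter the query to all shards in parallel, then merge the sorted partials."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda s: query_shard(s, min_price, limit), shards))
    merged = heapq.merge(*partials, key=lambda r: r["price"], reverse=True)
    return list(itertools.islice(merged, limit))


shards = [
    [{"sku": "a", "price": 10}, {"sku": "b", "price": 50}],
    [{"sku": "c", "price": 30}, {"sku": "d", "price": 70}],
]
top = scatter_gather(shards, min_price=20, limit=2)
top_skus = [r["sku"] for r in top]
```

Every query touches every shard, so cost grows with the shard count; a search index that already holds the combined data can answer the same question with a single request.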
20
Lessons learned
A Search engine like Apache Solr is…
• not limited to search-based business applications.
• a first class citizen in your persistence technology stack; it complements the SoT database.
• easy to adopt and has all of us as a community for support.
21
The Future
• Symbiotic existence of Solr/Lucene with RDBMS, NoSQL and Big Data systems
• Walmart is committed to being part of the community building it
22
Questions? Reach us at:
• You can reach me, Ayon Sinha, at:
  – [email protected]
  – https://www.linkedin.com/in/ayonsinha
• Jason Sardina, our Lead Persistence Architect
  – [email protected]
• @WalmartLabs is always hiring the best