Ubiquitous Solr - A Database's not-so-evil Twin

Ubiquitous Solr - A Database’s not-so-evil Twin. Ayon Sinha, Data Foundation @WalmartLabs. October 13-16, 2016, Austin, TX

Page 1: Ubiquitous Solr - A Database's not-so-evil Twin

Ubiquitous Solr - A Database’s not-so-evil Twin

Ayon Sinha

Data Foundation @WalmartLabs

OCTOBER 13-16, 2016 • AUSTIN, TX

Page 2: Ubiquitous Solr - A Database's not-so-evil Twin


Text Search


Search Suggestions

Search Engine… Lucene… Solr

• Internet and Intranet Search

• Relevance

• Search Suggestions

• Faceting

• Recommendations

• Time series

• Log search

• Geo-spatial search

• Analytics

• Graph search



Page 3: Ubiquitous Solr - A Database's not-so-evil Twin



• Scale any data infrastructure with the help of a search engine like Apache Solr

• Build a high-performance, highly available data platform for internal and external users alike

• Walmart’s commitment to open source

Page 4: Ubiquitous Solr - A Database's not-so-evil Twin


About me

• Team lead on the Data Foundation team at the largest retailer and the largest private employer in the world

• Prior to Walmart, worked at startups building recommendation and analytics systems

• Before that, spent 6 years building search applications, recommendation systems and Hadoop-based analytics systems for the largest online auction company, eBay

• Manuscript reviewer for Manning Publications; helped shape the contents of “Hadoop in Practice” and “Big Data”, among others

Page 5: Ubiquitous Solr - A Database's not-so-evil Twin


About Walmart

• 11,000+ Stores in 27 countries

• 11 eCommerce sites

• 250M customers weekly in stores and online

• Millions of database transactions per day

• Sales, Holidays and massive volume shifts

Page 6: Ubiquitous Solr - A Database's not-so-evil Twin


It starts-up so simple

An idea implemented on the LAMP stack

Page 7: Ubiquitous Solr - A Database's not-so-evil Twin


Turns out to be a great idea!

Users seem to like the new product

Page 8: Ubiquitous Solr - A Database's not-so-evil Twin


Users REALLY like this..

Higher volume, more use cases. Quick-fix scaling alternatives add some headroom … and complexity

Page 9: Ubiquitous Solr - A Database's not-so-evil Twin


We need more Business Intelligence

Business is looking good, but the source-of-truth data store, not so much …

Page 10: Ubiquitous Solr - A Database's not-so-evil Twin


Scale up (in a hurry) with hardware

Least risk. Diminishing returns. What next?

Page 11: Ubiquitous Solr - A Database's not-so-evil Twin


Relieve The Pressure

• Offload queries to a Search Engine

• Offload recurring reads to Cache

• Offload analytics to OLAP datastores

• Shard the database

… and do something to hide the complexity. It is worth it.
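
One way to picture "hiding the complexity" is a single facade that decides which store answers each read. The sketch below is a minimal illustration, not the system described in the talk: `ReadRouter`, its method names, and the in-memory dicts standing in for the database, cache, and search engine are all hypothetical.

```python
class ReadRouter:
    """Hypothetical facade: callers never see which store answers a read."""

    def __init__(self, database, cache, search_index):
        self.database = database          # id -> record (stand-in for the source-of-truth DB)
        self.cache = cache                # id -> record (stand-in for a cache)
        self.search_index = search_index  # term -> set of ids (stand-in for a search engine)
        self.db_reads = 0                 # counts reads that actually reach the database

    def get(self, record_id):
        # Recurring reads are served from the cache; only a miss touches the database.
        if record_id in self.cache:
            return self.cache[record_id]
        self.db_reads += 1
        record = self.database[record_id]
        self.cache[record_id] = record
        return record

    def search(self, term):
        # Term queries are answered by the index alone and never reach the database.
        return sorted(self.search_index.get(term.lower(), set()))


database = {1: {"name": "red bike"}, 2: {"name": "blue bike"}}

# Build a toy inverted index from the records (a real search engine does far more).
search_index = {}
for record_id, record in database.items():
    for term in record["name"].split():
        search_index.setdefault(term, set()).add(record_id)

router = ReadRouter(database, cache={}, search_index=search_index)
first = router.get(1)              # cache miss: one database read
second = router.get(1)             # cache hit: no database read
bike_ids = router.search("bike")   # answered by the index only
```

In this toy run the database is read once, even though three requests were served.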

Page 12: Ubiquitous Solr - A Database's not-so-evil Twin


The Inspiration

Integration tools for Lucene-based search engines are abundant

Page 13: Ubiquitous Solr - A Database's not-so-evil Twin


The “not-so-evil” Twin to protect your Source of Truth DB

• What if a copy of your source-of-truth data is available … Just about anywhere you want it?

• Redirect queries to a search engine to protect your database?

• Helps scale by reducing demand for
  – database indexing
  – database connections
  – scarce database resources like memory, storage, interconnects

Adding near real-time search adds complexity … and it comes at a cost; but done right, the benefits far outweigh the costs
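
The "twin" idea can be sketched as a write path that records every change on a message bus, with a search-side consumer applying those changes a moment later. This is a minimal illustration under assumptions: `SourceOfTruth`, `SearchTwin`, and the `deque` standing in for the bus are hypothetical names, and a real deployment would use an actual database, message bus, and Solr.

```python
from collections import deque


class SourceOfTruth:
    """Stand-in for the source-of-truth database; every write is also published."""

    def __init__(self, bus):
        self.rows = {}
        self.bus = bus

    def upsert(self, row_id, row):
        self.rows[row_id] = dict(row)
        self.bus.append(("upsert", row_id, dict(row)))

    def delete(self, row_id):
        self.rows.pop(row_id, None)
        self.bus.append(("delete", row_id, None))


class SearchTwin:
    """Stand-in for the search engine copy; it only changes by consuming events."""

    def __init__(self):
        self.docs = {}

    def apply_pending(self, bus):
        # Drain the bus in order so the twin converges on the database state.
        applied = 0
        while bus:
            op, row_id, row = bus.popleft()
            if op == "upsert":
                self.docs[row_id] = row
            else:
                self.docs.pop(row_id, None)
            applied += 1
        return applied

    def query(self, field, value):
        return sorted(i for i, doc in self.docs.items() if doc.get(field) == value)


bus = deque()
db = SourceOfTruth(bus)
twin = SearchTwin()

db.upsert(1, {"sku": "A1", "dept": "toys"})
db.upsert(2, {"sku": "B2", "dept": "toys"})
db.delete(1)

before_sync = twin.query("dept", "toys")  # twin lags until events are applied
applied = twin.apply_pending(bus)
after_sync = twin.query("dept", "toys")   # now matches the database
```

The gap between `before_sync` and `after_sync` is the "near real-time" cost mentioned above: the twin is briefly stale, and in exchange queries stop landing on the database.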

Page 14: Ubiquitous Solr - A Database's not-so-evil Twin


Our Approach

• Abstract the complexity of managing
  – source-of-truth database
  – cache coherence
  – search queries
  – message bus

• Abstract connection pool management

• Provide a scalable way to query across shards with full control of Solr schema

• Analyze big data without affecting real-time systems, while keeping individual data domains isolated

Page 15: Ubiquitous Solr - A Database's not-so-evil Twin


From a situation like..

Page 16: Ubiquitous Solr - A Database's not-so-evil Twin


DB, Solr and Hadoop

Page 17: Ubiquitous Solr - A Database's not-so-evil Twin


Sharded DB with Solr

Page 18: Ubiquitous Solr - A Database's not-so-evil Twin


The Eco-system

Separation of concerns

Page 19: Ubiquitous Solr - A Database's not-so-evil Twin


The Result

Scatter-gather vs Powered by Apache Solr
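
The slide gives only the two labels, so the following reading is an assumption: on a sharded database, a query on a non-shard-key field has to fan out to every shard (scatter-gather), whereas a search index can name the matching ids first so that only the owning shards are touched. The sketch uses hypothetical helpers (`shard_for`, `scatter_gather`, `index_then_fetch`) and plain dicts in place of real shards and Solr.

```python
def shard_for(order_id, shard_count):
    # Hypothetical sharding rule: orders are placed by order_id modulo shard count.
    return order_id % shard_count


def scatter_gather(shards, predicate):
    # Without an index on the field, every shard must be scanned and results merged.
    touched = 0
    hits = []
    for shard in shards:
        touched += 1
        hits.extend(row for row in shard.values() if predicate(row))
    return sorted(hits, key=lambda row: row["order_id"]), touched


def index_then_fetch(shards, index, field, value):
    # The index resolves the matching ids; only the shards owning them are read.
    touched = set()
    hits = []
    for order_id in sorted(index.get((field, value), [])):
        shard_no = shard_for(order_id, len(shards))
        touched.add(shard_no)
        hits.append(shards[shard_no][order_id])
    return hits, len(touched)


orders = [{"order_id": i, "store": "austin" if i == 5 else "dallas"} for i in range(8)]

shards = [{} for _ in range(4)]
index = {}
for order in orders:
    shards[shard_for(order["order_id"], 4)][order["order_id"]] = order
    index.setdefault(("store", order["store"]), []).append(order["order_id"])

sg_hits, sg_touched = scatter_gather(shards, lambda row: row["store"] == "austin")
ix_hits, ix_touched = index_then_fetch(shards, index, "store", "austin")
```

Both paths return the same order, but the scatter-gather path reads all four shards while the indexed path reads one.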

Page 20: Ubiquitous Solr - A Database's not-so-evil Twin


Lessons learned

A search engine like Apache Solr is…

• not limited to search-based business applications.

• a first-class citizen in your persistence technology stack; it complements the source-of-truth (SoT) database.

• easy to adopt and has all of us as a community for support.

Page 21: Ubiquitous Solr - A Database's not-so-evil Twin


The Future

• Symbiotic existence of Solr/Lucene with RDBMS, NoSQL and Big Data systems

• Walmart is committed to being part of the community building it

Page 22: Ubiquitous Solr - A Database's not-so-evil Twin


Questions? Reach us at:

• You can reach me, Ayon Sinha, at:
  – [email protected]
  – https://www.linkedin.com/in/ayonsinha

• Jason Sardina, our Lead Persistence Architect
  – [email protected]

• @WalmartLabs is always hiring the best