This Ain't Your Parent's Search Engine
![Page 1: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/1.jpg)
Confidential and Proprietary © Copyright 2013
This ain’t your Parent’s Search Engine
Grant Ingersoll, CTO, LucidWorks
Twitter: @gsingers
![Page 2: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/2.jpg)
Search is dead.
![Page 3: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/3.jpg)
Long live search
![Page 4: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/4.jpg)
Search is good for…
• Traditional: Fast, fuzzy text matching across a large document collection
• De-normalized data
  - “Light” relational
• Top N problems
  - Key-value (n=1)
  - Recommendations
  - “Good enough” classification, clustering
• Faceting, aggregations, analytical slicing and dicing of data
• Spatial, record/event linkage, alerting
http://cheezburger.com/5243950080
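To make the "Top N" and faceting bullets concrete, here is a minimal sketch of a toy inverted index. It is not how Lucene stores or scores documents: scoring is plain term-frequency overlap (a stand-in for Lucene's similarity), and the documents, the `category` facet, and all function names are hypothetical. Key-value lookup is simply the n=1 case.

```python
import heapq
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy collection: doc id -> (text, value of a "category" facet field)
DOCS: Dict[str, Tuple[str, str]] = {
    "d1": ("fast fuzzy search engine", "search"),
    "d2": ("search engine for hadoop data", "hadoop"),
    "d3": ("hadoop map reduce jobs", "hadoop"),
    "d4": ("key value store", "storage"),
}

def build_index(docs: Dict[str, Tuple[str, str]]) -> Dict[str, Counter]:
    """Inverted index: term -> Counter{doc_id: term frequency}."""
    index: Dict[str, Counter] = defaultdict(Counter)
    for doc_id, (text, _category) in docs.items():
        for term in text.split():
            index[term][doc_id] += 1
    return index

def score(index: Dict[str, Counter], query: str) -> Counter:
    """Score every matching doc by the summed frequency of the query terms (toy scoring)."""
    scores: Counter = Counter()
    for term in query.split():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return scores

def top_n(scores: Counter, n: int) -> List[Tuple[str, int]]:
    """Best n (doc_id, score) pairs via a heap; ties go to the lower doc id."""
    return heapq.nsmallest(n, scores.items(), key=lambda kv: (-kv[1], kv[0]))

def facet_counts(docs: Dict[str, Tuple[str, str]], scores: Counter) -> Counter:
    """Facet counts over all matching docs, not just the top n."""
    return Counter(docs[doc_id][1] for doc_id in scores)
```

For example, `top_n(score(build_index(DOCS), "key"), 1)` is the key-value case. Facet counts cover every match while `top_n` keeps only n, which is why one engine can serve both retrieval and slicing and dicing.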
![Page 5: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/5.jpg)
Foundational Changes in Lucene/Solr 4
• Reduced memory usage
• Pluggable codecs/similarity
• FS(A|T): finite state automata/transducers
• Doc values (column oriented)
• Spatial upgrade
• New facets and functions
• Cursors (deep paging)
• Distributed capabilities
• Joins/grouping
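Solr's cursor-based deep paging (the `cursorMark` parameter) replaces large `start` offsets with "resume after the last sort values seen", and requires the sort to include the uniqueKey field as a tiebreaker. The sketch below models only that idea over an in-memory list. Solr's real cursor is an opaque encoded token; here it is just the last `(-score, id)` tuple, an assumption for illustration, and the function names are hypothetical.

```python
import heapq
from typing import List, Optional, Tuple

Hit = Tuple[float, str]       # (score, unique id)
SortKey = Tuple[float, str]   # (-score, unique id)

def sort_key(hit: Hit) -> SortKey:
    # Descending score, then ascending unique id as tiebreaker: a total order,
    # so paging never skips or repeats hits that share a score.
    return (-hit[0], hit[1])

def next_page(hits: List[Hit], rows: int,
              cursor: Optional[SortKey] = None) -> Tuple[List[Hit], Optional[SortKey]]:
    """Return (page, next_cursor); the cursor is the sort key of the last hit returned."""
    # Keep only hits strictly after the cursor (a real engine skips these inside the index).
    remaining = [h for h in hits if cursor is None or sort_key(h) > cursor]
    page = heapq.nsmallest(rows, remaining, key=sort_key)
    # An unchanged cursor signals that the result set is exhausted.
    next_cursor = sort_key(page[-1]) if page else cursor
    return page, next_cursor
```

A client keeps calling `next_page` with the returned cursor until the cursor stops changing, which mirrors Solr returning the same `nextCursorMark` it was sent once there are no more results.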
![Page 6: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/6.jpg)
Search + Hadoop
• What’s old is new again
• “Traditional” use cases
  - Build/store indexes
  - https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
• Enrichment and signal processing
  - PageRank, statistically interesting phrases, etc.
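As an example of enrichment, PageRank can be computed offline over a link graph and stored as a document field for boosting. This minimal sketch is single-machine power iteration, not a Hadoop job and not LucidWorks' implementation; the graph and function name are hypothetical.

```python
from typing import Dict, List

def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over an adjacency list (node -> outbound links)."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the "random jump" share (1 - d) / n.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly so total rank stays 1.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank
```

In a MapReduce setting, each iteration's inner loop becomes a map step emitting `(target, share)` pairs and a reduce step summing them per node.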
![Page 7: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/7.jpg)
LucidWorks + Hadoop
• Ingestion help
  - Flexible MapReduce content ingestion supporting:
    » Directory of files
    » CSV, Writable, etc.
    » LogStash
    » Build your own
• Pig Load/Store and UDFs
• Hive 2-way support
• http://www.lucidworks.com/search-for-hadoop/
  - Open source this summer
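A "build your own" ingest path boils down to a map step that turns each input record into a Solr document, plus batching for update requests. This plain-Python sketch does not use the LucidWorks Hadoop connector classes; the column names and function names are hypothetical. It relies only on the fact that Solr's JSON update handler accepts a JSON array of documents.

```python
import csv
import io
import json
from typing import Dict, Iterable, Iterator, List

def map_rows(csv_text: str, id_field: str = "id") -> Iterator[Dict[str, str]]:
    """Map step: one CSV row -> one Solr-style document dict (rows without an id are skipped)."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get(id_field):
            # Drop empty columns so they are not indexed as empty strings.
            yield {k: v for k, v in row.items() if v}

def batches(docs: Iterable[Dict[str, str]], size: int) -> Iterator[str]:
    """Group documents into JSON arrays of at most `size` docs, one update request body each."""
    batch: List[Dict[str, str]] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield json.dumps(batch)
            batch = []
    if batch:
        yield json.dumps(batch)
```

Each yielded body could be POSTed to a collection's `/update` endpoint with `Content-Type: application/json`; batching keeps the number of HTTP round trips down.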
![Page 8: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/8.jpg)
LucidWorks SiLK
[Architecture diagram. Layers: Data Sources, Connectors, Servers. Components: LucidWorks Search, JDBC Connector, Web/File System Crawl, Hadoop Connectors, Data Warehouse, Clickstream, Networking]
![Page 9: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/9.jpg)
Search Analytics: Data Ingestion & Visualization
[Architecture diagram. Components: Solr/Solr Cloud; Gateway (reverse proxy); Search Logs; Solr Output Writer for LogStash (HTTP); Hadoop Connector (GrokIngestMapper, LogStash); Visualization (configurable dashboards)]
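The log-ingestion path on this slide parses search logs with Grok, whose patterns are named, reusable regular expressions. This sketch uses a plain Python regex with named groups in place of Grok; the log format, field names, and function name are hypothetical, not the GrokIngestMapper configuration.

```python
import re
from typing import Dict, Optional

# Hypothetical search-log line: "<ISO timestamp> <client ip> q=<query> hits=<n> qtime=<ms>".
# In Grok this might start with %{TIMESTAMP_ISO8601:timestamp} %{IP:client_ip}.
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) "
    r"(?P<client_ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"q=(?P<query>\S+) hits=(?P<hits>\d+) qtime=(?P<qtime_ms>\d+)"
)

def parse_line(line: str) -> Optional[Dict[str, object]]:
    """Turn one log line into a document dict, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    doc: Dict[str, object] = dict(m.groupdict())
    # Convert numeric captures so they can be indexed as numeric fields.
    doc["hits"] = int(m.group("hits"))
    doc["qtime_ms"] = int(m.group("qtime_ms"))
    return doc
```

Once indexed, analytics such as "queries with zero hits" become ordinary filter queries on the `hits` field.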
![Page 10: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/10.jpg)
LucidWorks Open Source
• Logstash for Solr: https://github.com/LucidWorks/solrlogmanager
• Banana (Kibana for Solr): https://github.com/LucidWorks/banana
• Effortless AWS deployment and monitoring: http://www.github.com/lucidworks/solr-scale-tk
• Data Quality Toolkit: https://github.com/LucidWorks/data-quality
![Page 11: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/11.jpg)
Demos
![Page 12: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/12.jpg)
Fly the friendly skies
http://www.ibm.com/developerworks/library/j-solr-lucene/index.html
![Page 13: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/13.jpg)
Make $$$
• Leverage time series data and visualization using LucidWorks SiLK
• Monitor social media
• Traditional research
https://github.com/lucidworks/lws-financial-demo
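Time-series dashboards like those in SiLK typically plot counts per time bucket, which is what Solr's range faceting (`facet.range` with `facet.range.start`, `facet.range.end`, `facet.range.gap`) returns. This sketch computes the same kind of buckets client-side, using numeric seconds instead of Solr date-math gaps such as `+1DAY`; the data and function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

def range_facet(values: Iterable[float], start: float, end: float,
                gap: float) -> Dict[float, int]:
    """Count values into [start, start+gap), [start+gap, start+2*gap), ... up to end.

    Returns {bucket_lower_bound: count}, including empty buckets.
    Values outside [start, end) are ignored.
    """
    counts: Counter = Counter()
    for v in values:
        if start <= v < end:
            counts[int((v - start) // gap)] += 1
    n_buckets = int(-(-(end - start) // gap))  # ceiling division
    return {start + i * gap: counts[i] for i in range(n_buckets)}
```

For instance, hourly buckets over the first three hours of a day are `range_facet(timestamps, 0, 3 * 3600, 3600)`. Reporting empty buckets explicitly keeps gaps visible on a time-series chart.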
![Page 14: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/14.jpg)
Cure what ails you
![Page 15: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/15.jpg)
Space-Time Continuum
• Leverage Solr’s spatial capabilities to index non-spatial data, such as time ranges
  - Useful for open hours, shifts, etc.
• Query using rectangle intersections
  - q=shift:"Intersects(0 19 23 365)"
https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/
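One common encoding indexes a range `[start, end]` as the 2-D point `(x=start, y=end)`. A range overlaps a query range `[qs, qe]` exactly when `start <= qe` and `end >= qs`, which is a rectangle `(minX, minY, maxX, maxY) = (lo, qs, qe, hi)` over the domain `[lo, hi]`. Reading the slide's `Intersects(0 19 23 365)` as "shifts overlapping days 19 to 23 of a 0 to 365 domain" is our assumption; the function names are hypothetical, and this sketch checks the geometry in Python rather than calling Solr.

```python
from typing import Tuple

Interval = Tuple[int, int]           # (start, end), inclusive
Rect = Tuple[int, int, int, int]     # (minX, minY, maxX, maxY)

def as_point(interval: Interval) -> Tuple[int, int]:
    """Index an interval as the 2-D point x=start, y=end."""
    start, end = interval
    return (start, end)

def overlap_query(q_start: int, q_end: int, lo: int, hi: int) -> Rect:
    """Rectangle containing every point whose interval overlaps [q_start, q_end]."""
    # start in [lo, q_end] and end in [q_start, hi]
    return (lo, q_start, q_end, hi)

def intersects(point: Tuple[int, int], rect: Rect) -> bool:
    x, y = point
    min_x, min_y, max_x, max_y = rect
    return min_x <= x <= max_x and min_y <= y <= max_y

def to_solr_query(field: str, rect: Rect) -> str:
    """Format the rectangle as in the slide's query string."""
    return f'{field}:"Intersects({rect[0]} {rect[1]} {rect[2]} {rect[3]})"'
```

With this encoding, `to_solr_query("shift", overlap_query(19, 23, 0, 365))` reproduces the slide's query, and a single point per document replaces two separate range clauses on start and end fields.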
![Page 16: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/16.jpg)
Signal Processing for Search and Discovery
• Signals power modern relevance
  - Clicks, conversions, sharing, history, signatures
• LucidWorks 5 makes it easy to capture and leverage signals
  - Recommendations, analytics, discovery
• Simplify your data workflow
• Simplify your operational footprint
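A simple way signals feed relevance is to aggregate click events per (query, document) pair and turn the counts into boosts. This is a minimal sketch of that idea, not LucidWorks 5's signal aggregation: the boost formula `1 + log(1 + clicks)`, the assumption that queries are already normalized, and the function names are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Click = Tuple[str, str]  # (normalized query, clicked doc id)

def aggregate_clicks(clicks: Iterable[Click]) -> Dict[str, Dict[str, float]]:
    """Turn raw click events into query -> {doc_id: boost}, boost = 1 + log(1 + clicks)."""
    counts = Counter(clicks)
    boosts: Dict[str, Dict[str, float]] = {}
    for (query, doc_id), n in counts.items():
        # log dampens runaway popularity so one hot item cannot swamp text relevance.
        boosts.setdefault(query, {})[doc_id] = 1.0 + math.log1p(n)
    return boosts

def rerank(query: str, results: Iterable[Tuple[str, float]],
           boosts: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Multiply each (doc_id, text_score) by its signal boost (1.0 if none) and re-sort."""
    per_doc = boosts.get(query, {})
    scored = [(doc_id, score * per_doc.get(doc_id, 1.0)) for doc_id, score in results]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In Solr the same effect could be achieved by storing the aggregated boosts in an index and applying them at query time, rather than re-sorting on the client as done here.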
![Page 17: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/17.jpg)
Solr Powered Signal Processing
• Use Case: eCommerce
• Data:
  - Product catalog (~1.2M items)
  - Click data (~3.9M clicks)
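One classic use of click data in eCommerce is "people who clicked this also clicked" recommendations from item co-occurrence. This in-memory sketch assumes clicks are already grouped by session (a hypothetical grouping) and is not the demo's actual algorithm; at millions of clicks, the counting step would run as a batch job.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

def co_occurrence(sessions: Iterable[Iterable[str]]) -> Dict[str, Counter]:
    """Count how often each pair of items is clicked in the same session."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for items in sessions:
        unique = sorted(set(items))  # ignore repeat clicks within one session
        for a, b in combinations(unique, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(counts: Dict[str, Counter], item: str, n: int) -> List[Tuple[str, int]]:
    """Top-n items most often co-clicked with `item` (ties broken by item id)."""
    return sorted(counts.get(item, Counter()).items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Raw co-occurrence favors globally popular items; production systems often normalize the counts (for example, by item popularity) before ranking.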
![Page 18: This Ain't Your Parent's Search Engine](https://reader033.fdocuments.net/reader033/viewer/2022061105/53fe1dd88d7f72db2d8b45b7/html5/thumbnails/18.jpg)
Meta
• http://www.lucidworks.com
  - @gsingers
• Lucene/Solr Revolution: Washington DC, Nov 11-14
  - http://www.lucenerevolution.org