eGrove Systems - "SOLR" An Apache Product
SOLR - An Apache Product
777 Washington Road #5, Parlin, NJ 08859
Phone: 732 307 2655
Email: [email protected]
CONTENTS
INTRODUCTION
FEATURES
FUNCTIONS
ARCHITECTURE
PERFORMANCE
PROs & CONs
FUTURE TRENDS
WEBSITES USING SOLR
INTRODUCTION
• A full-text search server based on Lucene
• XML/HTTP interfaces
• Loose schema to define types and fields
• Web administration interface
• Extensive caching
• Index replication
• Extensible open architecture
• Written in Java 5, deployable as a WAR
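The XML/HTTP interface can be illustrated with a minimal sketch. It builds the XML `<add>` message that Solr's update handler accepts, using only the Python standard library. The helper name `build_add_message` and the document fields are hypothetical, chosen for illustration. Sending the message (an HTTP POST to the update handler, commonly `/update`) is left out because the URL depends on the deployment.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> update message from a list of dicts.

    Hypothetical helper for illustration; not part of any Solr client.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list value becomes one <field> element per entry,
            # which is how multi-valued fields are expressed.
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Made-up document; the field names must match the schema of a real index.
message = build_add_message(
    [{"id": "1", "title": "Intro to Solr", "tag": ["search", "lucene"]}]
)
print(message)
```

Escaping of special characters such as `&` and `<` is handled by `ElementTree`, which is the main reason to build the message this way rather than by string concatenation.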
FEATURES
• Advanced full-text search
• Optimized for high traffic volume
• Standards-based open interfaces: XML, JSON & HTTP
• Comprehensive administration interfaces
• Near real-time indexing
• Extensible plugin architecture
• Multiple search indices
• Apache UIMA
• Rich document parsing
• Advanced storage options
• Performance optimization
FUNCTIONS
• XML/HTTP and JSON APIs
• Hit highlighting
• Faceted search and filtering
• Geospatial search
• Fast incremental updates and index replication
• Caching
• Replication
• Web administration interface
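Several of these functions are driven by HTTP query parameters, which a short sketch can make concrete. The code below assembles a search URL that combines a filter query (`fq`), faceting (`facet`, `facet.field`), hit highlighting (`hl`, `hl.fl`) and JSON output (`wt=json`). The helper name `build_select_url`, the host, the core name `products` and the field names are assumptions made for illustration only.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, filters=(), facet_fields=(), highlight_fields=()):
    """Assemble a Solr /select URL (hypothetical helper, illustration only)."""
    params = [("q", query), ("wt", "json")]
    # Each filter query narrows the result set without affecting scoring.
    params += [("fq", f) for f in filters]
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", f) for f in facet_fields]
    if highlight_fields:
        params.append(("hl", "true"))
        params.append(("hl.fl", ",".join(highlight_fields)))
    return base_url.rstrip("/") + "/select?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr/products",  # assumed host and core name
    "title:laptop",
    filters=["in_stock:true"],
    facet_fields=["brand"],
    highlight_fields=["title"],
)
print(url)
```

Parameters are kept as a list of pairs rather than a dict because `fq` and `facet.field` may legitimately appear more than once in a single request.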
ARCHITECTURE
Source: www.xaviermorera.com
PERFORMANCE
Performance Factors
• Schema design
  - Number of indexed fields
  - omitNorms
  - Term vectors
  - DocValues
• Configuration
  - mergeFactor
  - Caches
• Indexing
  - Bulk updates
  - Commit strategy
  - Optimize
• Querying
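The indexing factors (bulk updates and commit strategy) can be sketched as follows. Instead of sending one document per request and committing each time, documents are grouped into batches, and each request carries the `commitWithin` parameter so that Solr schedules the commit itself. The helper name `batch_update_requests`, the batch size and the 10-second window are illustrative assumptions, not tuned recommendations. Each body would be POSTed as JSON to the update handler of a real deployment.

```python
import json


def batch_update_requests(docs, batch_size=1000, commit_within_ms=10000):
    """Yield (path, json_body) pairs, one per batch of documents.

    Hypothetical helper for illustration; it only prepares requests.
    """
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        # commitWithin asks Solr to commit within this many milliseconds,
        # avoiding an explicit commit after every request.
        path = f"/update?commitWithin={commit_within_ms}"
        yield path, json.dumps(batch)


# Made-up documents standing in for a real data set.
docs = [{"id": str(i), "title": f"doc {i}"} for i in range(2500)]
update_requests = list(batch_update_requests(docs))
print(len(update_requests), "requests prepared")
```

Larger batches reduce HTTP overhead but raise memory use per request, so the batch size is best chosen by measuring against the actual index.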
1. Memory testing: SOLR response time for a 1-million-volume index on 8 GB and 32 GB instances.
Source: www.hathitrust.org
2. SOLR index size analysis for a Twitter dataset.
Source: www.dzone.com
PROs & CONs
PROS
• Easy monitoring
• Highly scalable
• Fault tolerant
• Flexible and adaptable, with easy configuration
• Performance optimization
• Highly configurable and user-extensible caching
• Freely available
• Multilingual support
• Easy implementation and setup
• Low resource utilization
CONS
• A general lack of commitment towards SOLR
• Little attention paid to JVM settings and garbage collection
• Increased latency
• Occasional large I/O load to replicate large merges
• Complicated load balancing and management
• Reconfiguration required if the master is lost
FUTURE TRENDS
• OOTB simple faceted browsing
• Automatic database indexing
• Federated search
  - HA with failover
• Alternate output formats (JSON, Ruby)
• Highlighter integration
• Spellchecker
• Alternate APIs (Google Data, OpenSearch)
WEBSITES USING SOLR
• Whitehouse.gov
• Buy.com
• Cnet
• Netflix
• Apple
• Disney
• eTrade
• NASA
• MTV
• Zappos
• AOL
• Digg
Thank You