Develop open source search engine
DEVELOP OPEN SOURCE SEARCH ENGINE
Ritesh Ambastha – CEO, iWillStudy.com
26th Feb 2012
Open Source Search Engines
Sphinx Lucene DataparkSearch
Zettair YaCy Xapian
SWISH-E Seeks Recoll
OpenFTS Nutch Namazu
We are going to talk about
Sphinx & Apache-Solr
Sphinx
Sphinx is an open source full text search server.
It's written in C++ and works on Linux (RedHat, Ubuntu, etc.), Windows, MacOS, Solaris, FreeBSD, and a few other systems.
Sphinx lets you batch index and search data stored in an SQL database, NoSQL storage, or plain files, quickly and easily.
Sphinx
Text processing features
Searching via SphinxAPI is as simple as 3 lines of code, and querying via SphinxQL is even simpler.
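To give a feel for how small a SphinxQL query is, here is a minimal sketch that only builds the statement text. SphinxQL is spoken over the MySQL wire protocol, so the resulting string would be sent through any MySQL client library; that step is left out here. The index name `documents` and the helper `build_sphinxql_query` are hypothetical, and the escaping covers only the string literal, not Sphinx's extended query operators.

```python
def build_sphinxql_query(index, text, limit=10):
    """Return a SphinxQL SELECT statement for a full-text match.

    `index` is a hypothetical index name; it is inserted as-is, so it must
    come from trusted configuration, never from user input.
    """
    # Escape backslashes first, then single quotes, so the user text
    # stays inside the quoted string literal.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"SELECT id FROM {index} WHERE MATCH('{escaped}') LIMIT {int(limit)}"


statement = build_sphinxql_query("documents", "open source search")
print(statement)
```

The statement is plain SQL-like text, which is why existing MySQL tooling can be reused to talk to Sphinx.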
Sphinx clusters scale up to billions of documents and tens of millions of search queries per day, powering top websites such as Craigslist, DailyMotion, NetLog, etc.
Performance and scalability
Indexing performance: Sphinx indexes up to 10-15 MB of text per second per single CPU core.
Searching performance: Searching through the 1,000,000-document, 1.2 GB text collection that the Sphinx developers use for everyday development and testing runs at 500+ queries/sec on a 2-core desktop machine with 2 GB of RAM.
Scalability: The biggest known Sphinx cluster indexes almost 5 billion documents, resulting in over 6 TB of data.
The busiest known one is, unsurprisingly, Craigslist, a top-10 website in the US that serves 50+ million search queries/day.
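The figures above are easier to compare once converted to common units. The sketch below is plain arithmetic on the quoted numbers: it turns 50 million queries/day into an average per-second rate, and estimates how long one core would take to index the 1.2 GB test collection at 10-15 MB/sec. Treating 1.2 GB as 1200 MB is an assumption, and the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(queries_per_day):
    """Average queries per second over a full day (ignores peak traffic)."""
    return queries_per_day / SECONDS_PER_DAY


def indexing_seconds(corpus_mb, mb_per_second):
    """Time for one CPU core to index a corpus at a steady rate."""
    return corpus_mb / mb_per_second


craigslist_qps = average_qps(50_000_000)
slowest = indexing_seconds(1200, 10)  # assumes 1.2 GB = 1200 MB
fastest = indexing_seconds(1200, 15)

print(f"Average load: about {craigslist_qps:.0f} queries/sec")
print(f"Indexing 1.2 GB on one core: {fastest:.0f} to {slowest:.0f} seconds")
```

So 50+ million queries/day averages out to roughly 579 queries/sec, and the test collection indexes in about 80 to 120 seconds on a single core. Real traffic is bursty, so peak load will sit well above the daily average.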
Key Features
Batch and Real-Time full-text indexes
Non-text attributes support
SQL database indexing
Non-SQL storage indexing
Easy application integration
Advanced full-text searching syntax
Rich database-like querying features
Better relevance ranking
Flexible text processing
Distributed searching
http://lucene.apache.org/solr/
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project.
Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search.
Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat.
Solr Features
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML, JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Scalability - Efficient Replication to other Solr Search Servers
Flexible and Adaptable with XML configuration
Extensible Plugin Architecture
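Because Solr's interface is plain HTTP with XML or JSON responses, a search is just a URL. The sketch below only builds that URL, using the standard `q`, `rows` and `wt` request parameters. The base URL `http://localhost:8983/solr`, the field name `title` and the helper `build_solr_select_url` are assumptions for illustration; newer Solr versions also put a core or collection name in the path.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, query, rows=10, response_format="json"):
    """Build a Solr select URL; nothing is sent over the network here."""
    params = {
        "q": query,             # the query string
        "rows": rows,           # maximum number of documents to return
        "wt": response_format,  # response writer, e.g. "json" or "xml"
    }
    # urlencode percent-escapes reserved characters such as ":".
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local server and field name.
url = build_solr_select_url("http://localhost:8983/solr", "title:sphinx")
print(url)
```

Any HTTP client, or a browser, can fetch the resulting URL, which is what makes Solr easy to call from almost any language.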
What is it all about?
Solr is based on Lucene
More about Lucene