Post on 15-Jan-2015
Search@Hyves
Anuj Ahuja | anuj@hyves.nl | anujahuja.hyves.nl | #anujmca
female single amsterdam 20
Old Search
• MySQL full-text indexes
• Hash Match - Combinations of the search terms are stored
Limitations
• Indexing is very slow - a full index run takes ~5h
• Fragile state management - coordinated by daemons
• Scalability - not transparent to the application
• No support for indexing data from distributed databases
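The slide only says that combinations of the search terms were stored. As a rough sketch of what such a hash match could look like, the snippet below hashes every combination of a record's terms so that a query hits regardless of term order. The function names, the normalization, and the choice of SHA-256 are assumptions for illustration, not the original Hyves implementation.

```python
import hashlib
from itertools import combinations


def _digest(key):
    # SHA-256 is an arbitrary choice here; any stable hash would do.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def term_combination_hashes(terms):
    """Return {combination: hash} for every non-empty combination of the terms."""
    normalized = sorted({t.strip().lower() for t in terms if t.strip()})
    hashes = {}
    for size in range(1, len(normalized) + 1):
        for combo in combinations(normalized, size):
            key = " ".join(combo)
            hashes[key] = _digest(key)
    return hashes


def lookup_hash(query):
    """Hash a query the same way, so the order of the typed terms does not matter."""
    key = " ".join(sorted({t.lower() for t in query.split()}))
    return _digest(key)


stored = term_combination_hashes(["female", "amsterdam", "20"])
print(len(stored))  # 3 terms give 2**3 - 1 = 7 combinations
print(lookup_hash("amsterdam female") in stored.values())
```

The number of stored combinations grows as 2^n - 1 with the number of terms per record, which may help explain why indexing this way was slow.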
Scale@Hyves?
• MySQL Master-Slave architecture
o 40 Masters, 284 Slaves
• Store
o ~64 Clusters, ~256 MySQL Hosts
• How big is the dataset [Jan 2011]?
o ~400G of indexable data
o Includes reactions, photo titles, WWW, etc.
Search Architecture - Ideas
Function
• Enable search for everything on Hyves
• Apply social relevance/weight to content
• Make new data available for search within an hour
Tech
• Combine data from multiple data sources
• Attribute-based filtering - for example geo location
• Abstract state management from data import jobs
• Scaling should be transparent to the application layer
Search Architecture - Decisions
• Pure data jobs vs leveraging the Hyves application stack (PHP)
• Listeners vs Iterator
• Handling deletes - Realtime updates vs Ignore on select
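The two delete-handling options can be contrasted with a toy inverted index. This is a minimal sketch under assumed data structures (a dict of posting lists and a set of deleted ids). It does not show how Sphinx or the Hyves importers actually handle deletes.

```python
def delete_realtime(index, doc_id):
    """Realtime update: remove the id from every posting list immediately."""
    for postings in index.values():
        if doc_id in postings:
            postings.remove(doc_id)


def search(index, term, deleted_ids=frozenset()):
    """Ignore on select: leave the index untouched and filter deleted ids at query time."""
    return [doc_id for doc_id in index.get(term, []) if doc_id not in deleted_ids]


index = {"amsterdam": [1, 2, 3]}

# Ignore on select: document 2 stays in the index until the next rebuild.
print(search(index, "amsterdam", deleted_ids={2}))

# Realtime update: the index itself is modified.
delete_realtime(index, 2)
print(search(index, "amsterdam"))
```

Ignore on select keeps the index read-only between rebuilds at the cost of a filter on every query. Realtime updates keep queries simple but require write access to the live index.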
Search Architecture - Technology
• Search backend - Sphinx
• Data Importers - PHP and Hadoop jobs
• Pre-Indexing database - MySQL on temp fs
• State Management - MySQL (InnoDB)
• Job Orchestration - Jenkins
• Deploy - Puppet, Hyves Deploy Script
• Monitoring - Ganglia, Realtime stats, Google Analytics
Search Architecture - SearchTube
Sphinx?
• Sphinx is a full-text search server written in C++
• Easy distribution
• Attribute-based filtering
• Supports querying multiple indexes
• Ranking - (BM25 + Phrase Proximity) + Social Relevance
• Utilizes multi-core machines through distributed indexes
• Benchmarking results
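The ranking bullet combines a text score (BM25 + phrase proximity) with social relevance. How the social part was weighted is not stated, so the sketch below simply adds a log-damped boost based on friend count. `social_weight`, the log damping, and the sample scores are all assumptions chosen for illustration.

```python
import math


def combined_score(text_score, friend_count, social_weight=100.0):
    """Add a log-damped social boost to a text relevance score (hypothetical formula)."""
    return text_score + social_weight * math.log1p(friend_count)


def rank(results):
    """Sort results by combined score, highest first."""
    return sorted(
        results,
        key=lambda r: combined_score(r["text_score"], r["friends"]),
        reverse=True,
    )


results = [
    {"id": "a", "text_score": 1500, "friends": 10},
    {"id": "b", "text_score": 1450, "friends": 500},
]
print([r["id"] for r in rank(results)])
```

With these sample numbers the slightly weaker text match with many friends ranks first. The log damping keeps very large friend counts from swamping the text score entirely.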
Search Tube - Job Orchestration
• Responsible for executing and synchronizing various jobs
• Jenkins plugins
o Join Plugin - Job synchronization
o Plot Plugin - Reporting
o Dependency Graph View Plugin - Visualization
• Other servers are added as labeled nodes
o slow slaves, hadoop node, search slaves, etc.
• Puppetized and Jenkins API
• https://github.com/salimfadhley/jenkinsapi
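Job synchronization of the kind described above boils down to running each job only after everything it depends on has finished. The sketch below shows that ordering in plain Python. It does not call Jenkins or the jenkinsapi library, and the job names are hypothetical.

```python
def run_in_dependency_order(jobs, run):
    """Run each job after all of its dependencies.

    Raises ValueError on a dependency cycle or an unknown dependency.
    """
    remaining = {name: set(deps) for name, deps in jobs.items()}
    done = []
    while remaining:
        finished = set(done)
        ready = sorted(name for name, deps in remaining.items() if deps <= finished)
        if not ready:
            raise ValueError("unresolvable dependencies: " + ", ".join(sorted(remaining)))
        for name in ready:
            run(name)
            done.append(name)
            del remaining[name]
    return done


# Hypothetical job graph: two importers must finish before the index is built.
jobs = {
    "import_members": [],
    "import_hubs": [],
    "build_index": ["import_members", "import_hubs"],
    "rotate_index": ["build_index"],
}
order = run_in_dependency_order(jobs, run=lambda name: print("running", name))
```

Jobs that become ready in the same round have no dependency on each other, so an orchestrator could dispatch them to different labeled nodes in parallel.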
Search Tube - Jenkins
Search Tube - Reporting
Search - Failover Scenarios
Failed 1 to Failed 8 [failover scenario diagrams]
Search - What is new?
• Simplified user interface - Single search field
o “ivo utrecht 26” [first name + city + age]
o “amsterdam female 20” [city + gender + age]
o “ram* van alte* ams” [partial search]
o “milea marius” [last name + first name]
o “coumans amst” [last name + city]
o “hyves hq” [hub name]
• Improved ranking
o Member results are influenced by number of friends
o Hub results are influenced by number of hub members
• Snappy search
o Takes ~20ms server side
o Enabled search on every keystroke
• Refining results
o Results can be further refined by type, for example member, hub, etc.
• New content is indexed every hour
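The single-field examples above mix names, cities, genders, ages and wildcards in one string. The slides do not say whether tokens were classified before querying or simply matched across all indexed fields, so the snippet below is only one possible way to interpret such a query. The lookup tables and labels are hypothetical.

```python
KNOWN_CITIES = {"amsterdam", "utrecht"}  # hypothetical lookup table
KNOWN_GENDERS = {"female", "male"}       # hypothetical lookup table


def classify_tokens(query):
    """Label each token of a free-text query as age, gender, city, prefix or name."""
    labeled = []
    for token in query.lower().split():
        if token.isdigit():
            labeled.append(("age", int(token)))
        elif token in KNOWN_GENDERS:
            labeled.append(("gender", token))
        elif token in KNOWN_CITIES:
            labeled.append(("city", token))
        elif token.endswith("*"):
            labeled.append(("prefix", token.rstrip("*")))
        else:
            labeled.append(("name", token))
    return labeled


print(classify_tokens("ivo utrecht 26"))
print(classify_tokens("amsterdam female 20"))
```

A partial city such as “ams” falls through to the name label here, which is one reason a real system might prefer matching every token against all fields and letting ranking sort it out.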
Search Result [December]
• Page views - 8,599,572
• Ajax search queries - 28,742,425
• Search slaves (2x3 slaves, 2 search masters)
o During peak hours 120 searches/sec
o Average query ~20ms
• Google Analytics shows click-through and relevance
* Only 1% of traffic is measured by Google Analytics
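As a quick sanity check on the December figures, the monthly Ajax query count can be turned into an average rate and compared with the stated peak. This assumes the 120 searches/sec peak refers to the same Ajax queries and that the month covers all 31 days. Both are assumptions, not stated on the slide.

```python
AJAX_QUERIES_DECEMBER = 28_742_425
SECONDS_IN_DECEMBER = 31 * 24 * 60 * 60  # 2,678,400 seconds
PEAK_QPS = 120

average_qps = AJAX_QUERIES_DECEMBER / SECONDS_IN_DECEMBER
peak_to_average = PEAK_QPS / average_qps

print(f"average: {average_qps:.1f} queries/s, peak is {peak_to_average:.1f}x average")
```

Under those assumptions the average works out to roughly 11 queries per second, so peak load is about an order of magnitude above the monthly average.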
Questions?