SF ElasticSearch Meetup 2012.10.03

Post on 26-Jun-2015


Some thoughts on scaling ElasticSearch, especially related to index building and optimizing for query performance.


Scaling ElasticSearch

SF Meetup 2012.10.03

Sushant Shankar
sushant.shankar@33across.com

Agenda

• Why we need a search engine
• Monitoring
• Index Building
• Query Performance

Who is 33Across

>600,000 Publishers

Machine Learning and Graph algorithms to:
- Build advertising segments
- Extract insights out of social and interest data
- Target via high-performance distributed systems that integrate with our advertising partners


Why we really need a search engine

… …

Batch! Good for complicated tasks (Machine Learning, Graph Algorithms, etc.)

INDEX BUILDING

1 WEEK → 3 HOURS

Mappers to build index

Build index using MR job and Bulk API

Cluster:
• 6 nodes
• 24GB RAM (16GB for ES service)
• 4 cores
• 3x 1.5TB drive

Index:
• >1TB / index (replicated)
• ~300M documents
• ~5KB / document
• ~3 hours
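The figures on this slide imply a rough indexing throughput. The arithmetic below is a back-of-envelope sketch, not a measurement from the talk: it assumes decimal units (1KB = 1,000 bytes), treats the ~3 hours as pure indexing time, and spreads the load evenly across the 6 nodes.

```python
# Inputs are the approximate figures from the slide; everything derived
# from them is an estimate under the assumptions stated above.
NODES = 6
DOCUMENTS = 300_000_000       # ~300M documents
DOC_SIZE_BYTES = 5 * 1000     # ~5KB per document (decimal KB assumed)
BUILD_HOURS = 3               # ~3 hours per index build

raw_bytes = DOCUMENTS * DOC_SIZE_BYTES
raw_terabytes = raw_bytes / 1e12

build_seconds = BUILD_HOURS * 3600
docs_per_second = DOCUMENTS / build_seconds
docs_per_second_per_node = docs_per_second / NODES
megabytes_per_second = raw_bytes / build_seconds / 1e6

print(f"raw source data:       {raw_terabytes:.1f} TB")
print(f"cluster throughput:    {docs_per_second:,.0f} docs/s")
print(f"per-node throughput:   {docs_per_second_per_node:,.0f} docs/s")
print(f"source data ingested:  {megabytes_per_second:,.0f} MB/s")
```

Under these assumptions the cluster has to sustain on the order of 28,000 documents per second, which is why the bulk API and the build-time settings later in the talk matter.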

Monitoring: Zabbix

Monitoring: SPM

Parameter Optimization

• Amount bulk indexed
• # Shards
• Time taken
• CPU util.
• Mem util.
• Disk I/O
• Network

Index Building: Learnings

• Bulk API
• No replicas
• 2 shards / CPU
• 10,000 documents (users) per indexing request
• Refresh off (index.refresh_interval = -1)
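The learnings above can be sketched as a pair of helpers: one that produces build-time index settings, and one that splits documents into 10,000-document bulk request bodies. This is a minimal illustration, not the code from the talk. The function names, the `users` index name, and the `user_id` field are hypothetical, and reading "2 shards / CPU" as two shards per core across the whole cluster is an assumption.

```python
import json
from typing import Dict, Iterable, Iterator, List

BATCH_SIZE = 10_000  # documents per bulk request, as on the slide


def build_time_settings(nodes: int, cores_per_node: int) -> Dict:
    """Index-creation body for the build phase: no replicas, refresh off."""
    return {
        "settings": {
            "index": {
                # Assumption: "2 shards / CPU" means 2 shards per core,
                # summed over every node in the cluster.
                "number_of_shards": 2 * nodes * cores_per_node,
                "number_of_replicas": 0,
                "refresh_interval": "-1",
            }
        }
    }


def batches(docs: Iterable[Dict], size: int = BATCH_SIZE) -> Iterator[List[Dict]]:
    """Yield consecutive lists of at most `size` documents."""
    batch: List[Dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly short, batch
        yield batch


def bulk_body(index: str, docs: List[Dict], id_field: str = "user_id") -> str:
    """Build a newline-delimited bulk body: one action line, then one source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc[id_field]}}))
        lines.append(json.dumps(doc))
    # The bulk endpoint expects the body to end with a newline.
    return "\n".join(lines) + "\n"
```

Each string returned by `bulk_body` would be sent as the body of a POST to the `_bulk` endpoint, for example from inside a mapper. Older Elasticsearch releases, including those current at the time of the talk, also expect a `_type` in the action line or in the URL.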

QUERY PERFORMANCE

5 MINUTES → 10 SECONDS

Query Performance: Learnings

• 1-2 Replicas (and for reliability)
• Turn refresh on again (5s default)
• Warm up effect (Index Warm up API 0.20+)
• Optimize API
• Simulate multiple users
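Two of these points lend themselves to a short sketch: switching the index back to serving settings after the build, and simulating multiple users to measure latency under concurrency. The code below is an illustration under assumptions, not the harness used in the talk. `simulate_users`, `summarize`, and `fake_query` are hypothetical names, and `fake_query` stands in for a real search call so the example runs without a cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Settings to apply once the build finishes, using the slide's values
# (1 replica chosen from the "1-2" range; 5s refresh as stated there).
SERVING_SETTINGS = {"index": {"number_of_replicas": 1, "refresh_interval": "5s"}}


def simulate_users(run_query: Callable[[int], object],
                   users: int, queries_per_user: int) -> List[float]:
    """Run queries from `users` concurrent threads; return all latencies in seconds."""

    def one_user(user_id: int) -> List[float]:
        latencies = []
        for _ in range(queries_per_user):
            start = time.perf_counter()
            run_query(user_id)
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    return [latency for user in per_user for latency in user]


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Count, mean, approximate 95th percentile, and max of the latencies."""
    ordered = sorted(latencies)
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return {
        "count": len(ordered),
        "mean": sum(ordered) / len(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


def fake_query(user_id: int) -> None:
    """Hypothetical stand-in for a real search request."""
    time.sleep(0.001)


if __name__ == "__main__":
    print(summarize(simulate_users(fake_query, users=8, queries_per_user=20)))
```

`SERVING_SETTINGS` is the kind of body one would send to the index's `_settings` endpoint. Running the simulation twice against a real index, once cold and once after warm-up queries, is one way to observe the warm-up effect the slide mentions.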

Warm Up: load into memory and cache

Other cool features

• Custom Scoring functions
• Scripts – MVEL, Python
• Facets

Exploring:
• Real-time indexing
• Indexing images, files, etc.
• Parent-child relationships

QUERIES?

Index Building over time