Building a CRM on top of ElasticSearch


description

How EverTrue is building a donor CRM on top of ElasticSearch. We cover some of the issues around scaling ElasticSearch and which aspects of ElasticSearch we are using to deliver value to our customers.

Transcript of Building a CRM on top of ElasticSearch

Page 1: Building a CRM on top of ElasticSearch


How we’re building a CRM on top of ElasticSearch

Page 2: Building a CRM on top of ElasticSearch

About me (quickly)

Director of Engineering @ EverTrue

Love distributed data stores, love them!

Using ElasticSearch for ~1 year

Mark Greene / @markjgreene

Page 3: Building a CRM on top of ElasticSearch

What does EverTrue do?

We help nonprofits raise more money

by allowing them to identify and build relationships with potential donors

Page 4: Building a CRM on top of ElasticSearch

How do we do that?

Obligatory database tube

Resolving identities across third party data sources

Page 5: Building a CRM on top of ElasticSearch

Cluster Setup

•3 masters, 2 data nodes, AZ aware

•~40m documents, ~25GB

•1 index, 7 types

•5 shards, 1 replica

•Peak workloads reach 4-5k ops/s

•Using mostly default settings
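A rough sketch of what those numbers imply per shard. The helper name `shard_layout` is made up for illustration, and the arithmetic assumes the ~25GB figure counts primary data only, which the slide does not say.

```python
# Index settings matching the slide: 5 primary shards, 1 replica of each.
index_settings = {
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1,
    }
}


def shard_layout(docs, size_bytes, primaries, replicas, data_nodes):
    """Back-of-the-envelope shard arithmetic (hypothetical helper)."""
    total_copies = primaries * (1 + replicas)
    return {
        "total_shard_copies": total_copies,
        "shard_copies_per_data_node": total_copies / data_nodes,
        "docs_per_primary": docs / primaries,
        "bytes_per_primary": size_bytes / primaries,
        "avg_doc_bytes": size_bytes / docs,
    }


layout = shard_layout(
    docs=40_000_000,
    size_bytes=25 * 10**9,  # assumption: "~25GB" means primary data only
    primaries=index_settings["settings"]["number_of_shards"],
    replicas=index_settings["settings"]["number_of_replicas"],
    data_nodes=2,
)
print(layout)
```

Under those assumptions each primary shard holds roughly 8m documents and 5GB, which is small by ElasticSearch standards.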

Page 6: Building a CRM on top of ElasticSearch

Data Model

•Mapping contains ~50 default fields.

•Most fields are indexed both analyzed and not analyzed

•Leverage dynamic templates for custom fields created by our customers

•Each custom field is indexed both analyzed and not analyzed
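A minimal sketch of a dynamic template that does this, written as an ES 1.x-style mapping body. The type name `contact`, the `custom.*` path, the template name, and the `raw` sub-field are all assumptions made for illustration; the slides do not show the real mapping.

```python
import json

# ES 1.x-style mapping: any new string field under "custom." is indexed
# twice, once analyzed (full-text search) and once not_analyzed (exact
# filtering, sorting, aggregations).
mapping = {
    "contact": {  # hypothetical type name
        "dynamic_templates": [
            {
                "custom_strings": {  # hypothetical template name
                    "path_match": "custom.*",  # hypothetical field prefix
                    "match_mapping_type": "string",
                    "mapping": {
                        "type": "string",
                        "index": "analyzed",
                        "fields": {
                            # hypothetical sub-field name
                            "raw": {"type": "string", "index": "not_analyzed"}
                        },
                    },
                }
            }
        ]
    }
}

print(json.dumps(mapping, indent=2))
```

With a template like this, a customer-defined field such as `custom.major` would be searchable as text, while `custom.major.raw` would serve exact-match filters and aggregations.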

Page 7: Building a CRM on top of ElasticSearch

Write Path

[Diagram: SQS queues and background jobs]

Page 8: Building a CRM on top of ElasticSearch

Read Path

[Diagram: Search API, Contacts API, offline streaming jobs]

1. Submit EverTrue DSL query

2. Translate to ES query, returns contact IDs

3. Load full contact objects w/ meta
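The three read-path steps can be sketched as below. The EverTrue DSL itself is not shown in the slides, so the clause format (`field`, `op`, `value`), the field names, the `.raw` sub-field, and every function here are hypothetical stand-ins. The ES side uses the 1.x-era `filtered` query.

```python
def translate(dsl):
    """Step 2a: turn a made-up filter DSL into an ES 1.x-style query body."""
    must = []
    for clause in dsl["must"]:
        field, op, value = clause["field"], clause["op"], clause["value"]
        if op == "eq":
            # Exact match against the not_analyzed sub-field (assumed name).
            must.append({"term": {field + ".raw": value}})
        elif op == "gte":
            must.append({"range": {field: {"gte": value}}})
        else:
            raise ValueError("unsupported operator: %r" % op)
    return {
        "_source": False,  # only the IDs are needed from the search
        "query": {"filtered": {"filter": {"bool": {"must": must}}}},
    }


def contact_ids(search_response):
    """Step 2b: pull contact IDs out of an ES search response."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


def load_contacts(ids, contacts_by_id):
    """Step 3: load full contact objects (a dict stands in for the Contacts API)."""
    return [contacts_by_id[i] for i in ids]


# Step 1: a query in the made-up DSL.
dsl = {"must": [
    {"field": "city", "op": "eq", "value": "Boston"},
    {"field": "lifetime_giving", "op": "gte", "value": 1000},
]}
es_query = translate(dsl)

# Invented response and contact store, for illustration only.
fake_response = {"hits": {"hits": [{"_id": "42"}, {"_id": "7"}]}}
store = {"42": {"id": "42", "name": "Ada"}, "7": {"id": "7", "name": "Grace"}}
contacts = load_contacts(contact_ids(fake_response), store)
print(es_query)
print(contacts)
```

Keeping the search response down to IDs means ElasticSearch only has to answer "which contacts match", while the full objects come from the contacts service.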

Page 9: Building a CRM on top of ElasticSearch

Arbitrary field filtering

Aggregations

ES Hadoop Plugin

Page 10: Building a CRM on top of ElasticSearch

Fielddata Cache: Our first scaling issue

Turns out the fielddata cache is unbounded by default...

Page 11: Building a CRM on top of ElasticSearch

First Solution

• We set indices.fielddata.cache.size to 50%

• No more OOME Crashes

• Then something else happened: really slow queries (problem sign #1)
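One way to watch for this kind of trouble is to track fielddata memory and evictions per node. The sketch below summarizes the `indices.fielddata` section of a node stats response; the sample payload is invented for illustration, and the slides do not say that evictions caused the slow queries.

```python
def fielddata_report(node_stats):
    """Return (node name, fielddata bytes, evictions) for each node."""
    report = []
    for node_id, node in sorted(node_stats["nodes"].items()):
        fielddata = node["indices"]["fielddata"]
        report.append((
            node.get("name", node_id),
            fielddata["memory_size_in_bytes"],
            fielddata["evictions"],
        ))
    return report


# Invented sample shaped like a node stats response; not real data.
sample_stats = {
    "nodes": {
        "node-a": {
            "name": "data-1",
            "indices": {"fielddata": {"memory_size_in_bytes": 4_000_000_000,
                                      "evictions": 0}},
        },
        "node-b": {
            "name": "data-2",
            "indices": {"fielddata": {"memory_size_in_bytes": 7_500_000_000,
                                      "evictions": 1200}},
        },
    }
}

for name, size, evictions in fielddata_report(sample_stats):
    print("%s fielddata=%d bytes evictions=%d" % (name, size, evictions))
```

A steadily climbing eviction count after capping the cache would suggest the cap is smaller than the working set, so fielddata keeps being reloaded.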

Page 12: Building a CRM on top of ElasticSearch
Page 13: Building a CRM on top of ElasticSearch

Slow Query?... More Hardware Right?!

Type                  m1.xlarge          r3.2xlarge  r3.2xlarge
Hardware              4 CPU              8 CPU       8 CPU
                      15GB RAM           60GB RAM    60GB RAM
                      Round disk thingy  SSDs        SSDs
ES Version            v1.1.2             v1.1.2      v1.3.2
has_child query time  12-15s             6-8s        ~100ms
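For readers unfamiliar with it, a `has_child` query returns parent documents that have at least one matching child. The sketch below assumes a parent contact with child `gift` documents carrying an `amount` field; those names are hypothetical, since the slides do not say which of the 7 types are in a parent/child relationship.

```python
# Find parents that have at least one child of type "gift" (assumed name)
# with amount >= 1000 (assumed field).
has_child_query = {
    "query": {
        "has_child": {
            "type": "gift",
            "query": {"range": {"amount": {"gte": 1000}}},
        }
    }
}

print(has_child_query)
```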

Page 14: Building a CRM on top of ElasticSearch

Lessons Learned

•Watch the release notes & GH issues like a hawk

•Don’t fall too far behind w/r/t versions

•We waited too long (6 months)

•Keep ES fed with plenty of memory

•Need monitoring to have any hope of understanding operational issues

Page 15: Building a CRM on top of ElasticSearch

Settings We Tweaked

• indices.store.throttle.max_bytes_per_sec

• Default 20mb -> 60mb (SSDs can handle it)

• indices.fielddata.cache.size

• Set to 70% of heap
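As a sketch, the two tweaks can be rendered as `elasticsearch.yml` lines, along with what 70% of heap means in bytes. The 30GB heap is an assumption for illustration; the slides give RAM per node but not the heap size.

```python
tweaks = {
    "indices.store.throttle.max_bytes_per_sec": "60mb",  # default was 20mb
    "indices.fielddata.cache.size": "70%",               # share of JVM heap
}


def render_yaml(settings):
    """Render flat settings as elasticsearch.yml lines, sorted by key."""
    return "\n".join("%s: %s" % (key, value)
                     for key, value in sorted(settings.items()))


def fielddata_cache_bytes(heap_bytes, percent):
    """Bytes available to the fielddata cache at a given heap percentage."""
    return heap_bytes * percent // 100


heap_bytes = 30 * 2**30  # assumption: 30GB heap on a 60GB node
print(render_yaml(tweaks))
print(fielddata_cache_bytes(heap_bytes, 70))
```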

Page 16: Building a CRM on top of ElasticSearch

ES Hadoop Integration

•We use it for a lot of our offline jobs

•One map task per shard

•Deployments with few shards may underutilize your Hadoop cluster

•Mapper inputs do not contain meta fields like _version

•Forces another read for write back scenarios
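The "one map task per shard" point can be made concrete with a little arithmetic. The 20 map slots below are a hypothetical cluster size, and `map_utilization` is an illustrative helper, not part of ES Hadoop.

```python
def map_utilization(shards, map_slots):
    """Fraction of map slots kept busy when each shard yields one map task."""
    tasks = shards  # one input split, hence one map task, per shard
    busy = min(tasks, map_slots)
    return busy / map_slots


# 5 shards (from the cluster setup) against an assumed 20 map slots.
print(map_utilization(shards=5, map_slots=20))
```

With 5 shards only 5 mappers run at once, so on this assumed cluster three quarters of the map slots would sit idle during the read phase.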

Page 17: Building a CRM on top of ElasticSearch

tail -f ~/questions