BlaBlaCar Elastic Search Feedback

Transcript of BlaBlaCar Elastic Search Feedback

Page 1: BlaBlaCar Elastic Search Feedback

ElasticSearch feedback

Page 2: BlaBlaCar Elastic Search Feedback

Introduction

Page 3: BlaBlaCar Elastic Search Feedback

Nicolas Blanc - BlaBlArchitect (@thewhitegeek)

Career timeline (company logos not recoverable): Sinfomic (1999), 2001, 2005, 2008, 2012

Page 4: BlaBlaCar Elastic Search Feedback

What is BlaBlaCar?

Page 5: BlaBlaCar Elastic Search Feedback

3 000 000 members in Europe

Page 6: BlaBlaCar Elastic Search Feedback

10 countries (previously 9)

● France
● Spain
● Italy
● UK
● Poland
● Portugal
● Netherlands
● Belgium
● Luxembourg
● NEW: Germany

Page 7: BlaBlaCar Elastic Search Feedback

Growth

(Chart: January 2008 to January 2013, y-axis marks at 25 million and 50 million)

Page 8: BlaBlaCar Elastic Search Feedback

Infrastructure

● 2 front web servers
● 2 MySQL masters (+4 SSD slaves)
● 1 private cloud (KVM + Open vSwitch)
  ● Redis
  ● Memcache
  ● RabbitMQ/workers
● 1 ElasticSearch cluster

Page 9: BlaBlaCar Elastic Search Feedback

Changing the Search Engine

Page 10: BlaBlaCar Elastic Search Feedback

What exists today? Why change?

MySQL database
● Relational DB (lots of joins needed)
● Plain SQL queries
● Home-made geographical search

Recent problems
● New features mean more complex queries
● Scalability: performance depends on DB load

Page 11: BlaBlaCar Elastic Search Feedback

Initial requirements

Scalability
● Trip search must take less than 200 ms
● The system side of the solution must be easy to maintain
● It must be clusterable (also to avoid a single point of failure, SPOF)

Low code impact on the existing application
● Same features as today (geographical search)
● Minimize the developers' work
● Add one missing feature: facets

Page 12: BlaBlaCar Elastic Search Feedback

Initial Competitors

SenseiDB

Page 13: BlaBlaCar Elastic Search Feedback

Why ElasticSearch

✔ Easiest clustering
✔ Good indexing performance
✔ Little code to write to use it
✔ Schemaless
✔ Based on Lucene
✔ Written in Java (useful, since we needed to code a grouping feature)

Page 14: BlaBlaCar Elastic Search Feedback

ElasticSearch has won, now let's migrate our search!

Page 15: BlaBlaCar Elastic Search Feedback

Changing our mindset

Object in a relational database
● Can be split across multiple tables
● Lots of information reachable through JOINs

Object in a document-oriented database
● Only one big index for these objects
● All the information must be in the object, not spread over multiple tables
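To make the shift concrete, here is a minimal Java sketch of denormalization: three rows that would live in separate relational tables are copied into one flat trip document, so no JOIN is needed at search time. The row types (TripRow, UserRow, VehicleRow) and the builder class are hypothetical; only the field names follow the trip mapping of this deck.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical rows, each standing in for a row of a separate MySQL table.
record TripRow(long id, long userId, String tripDate, double price, long seatsLeft) {}
record UserRow(long id, long ratings, boolean blocked) {}
record VehicleRow(long userId, String model) {}

final class TripDocumentBuilder {
    // Copies every searchable field into one flat document so that the
    // search engine never needs a JOIN. The trip id is meant to become the
    // document id, so it is not stored as a field here.
    static Map<String, Object> build(TripRow trip, UserRow user, VehicleRow vehicle) {
        if (trip.userId() != user.id() || vehicle.userId() != user.id()) {
            throw new IllegalArgumentException("rows do not belong to the same trip owner");
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("trip_date", trip.tripDate());
        doc.put("price", trip.price());
        doc.put("seats_left", trip.seatsLeft());
        doc.put("userid", Long.toString(user.id())); // mapped as a not_analyzed string
        doc.put("user_ratings", user.ratings());
        doc.put("is_user_blocked", user.blocked());
        doc.put("vehicle", vehicle.model());
        return doc;
    }
}
```

In a real system each row would come from its own SQL query, and keeping the document in sync whenever one of those rows changes becomes the indexer's job.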

Page 17: BlaBlaCar Elastic Search Feedback

Defining our objects well

We need to know what we want to search
● Searching trips (front office usage)
● Searching members (back office usage)
● Searching the FAQ (front office usage)

Think of all the fields needed
● Those used for queries
● Those used for filters
● Those used for facets

Page 18: BlaBlaCar Elastic Search Feedback

Defining the index well

System point of view
● Number of nodes in the cluster
● Number of shards
● Number of replicas

Application point of view
● Define the type and attributes of every field (mapping)
● Use parent/child or nested documents to improve indexing
● How do we push documents from the DB?
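Why the number of shards has to be chosen up front: a document's shard is derived from its routing value (the document id by default), roughly shard = hash(routing) mod number_of_primary_shards. The Java sketch below models that rule; String.hashCode() is only a stand-in for the hash function ElasticSearch actually uses. Replicas are full copies of each primary shard and are never placed on the same node as their primary, so on a 2-node cluster one replica gives each node a complete copy of the data.

```java
final class ShardRouting {
    // Simplified routing rule: shard = hash(routing) mod number_of_primary_shards.
    // String.hashCode() stands in for ElasticSearch's own hash function.
    static int shardFor(String routing, int numberOfPrimaryShards) {
        if (numberOfPrimaryShards <= 0) {
            throw new IllegalArgumentException("an index needs at least one primary shard");
        }
        // floorMod keeps the result in [0, numberOfPrimaryShards) even for negative hashes
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }
}
```

Changing the shard count changes the modulus, so existing documents would map to different shards; this is why the number of primary shards is fixed when the index is created and changing it means reindexing.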

Page 19: BlaBlaCar Elastic Search Feedback

Indexing: use a river or not?

River advantages
● Plugs directly into our source backend
● An ElasticSearch API exists to write a new one

River problems
● Hard to add business logic for some fields
● Really hard when your DB is unconventional
● Full reindex of all the documents

Page 20: BlaBlaCar Elastic Search Feedback

Indexing: our manual way

We wrote an asynchronous indexer
● Written in Java
● Applies business logic when fetching from the DB
● Fetches from multiple DBs/sources
● Uses the Java ES library
● Easy interface: send {"trip":1234567} and the server answers {"OK"}
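A minimal sketch of the asynchronous part, assuming a single worker thread: the request handler only queues the trip id and replies at once, and the worker does the slow work later. The AsyncIndexer class and the indexTrip callback (fetch from the DB(s), apply business logic, send the document through the Java ES library) are hypothetical; the reply string simply mirrors the slide.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.LongConsumer;

final class AsyncIndexer {
    private final BlockingQueue<Long> pending = new LinkedBlockingQueue<>();
    private final Thread worker;

    // indexTrip is a hypothetical callback: fetch the trip from the DB(s),
    // apply business logic, then send the document to ElasticSearch.
    AsyncIndexer(LongConsumer indexTrip) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    indexTrip.accept(pending.take()); // blocks until an id is queued
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // interrupted (e.g. by shutdown()): stop
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Handles a request such as {"trip":1234567}: queue the id and acknowledge
    // immediately, before the trip is actually indexed.
    String submit(long tripId) {
        pending.add(tripId);
        return "{\"OK\"}";
    }

    void shutdown() {
        worker.interrupt();
    }
}
```

Retries, batching and error handling are left out; in this sketch an exception thrown by the callback would end the worker thread.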

Page 21: BlaBlaCar Elastic Search Feedback

One index example: Trip

Page 22: BlaBlaCar Elastic Search Feedback

Defining our Trip object well

Think of all the fields needed
● Those used for queries: trip departure date, from where, to where, user id
● Those used for filters: user ratings, price, vehicle, seats left, is user blocked (a blocked user is a user who performed a forbidden action on the website)
● Those used for facets: user ratings, price, vehicle
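The facets themselves are computed by ElasticSearch, but the idea is easy to sketch: for each value of a field, count the matching documents. This Java sketch does that for a field such as vehicle over an in-memory list of hits. It only illustrates what a terms facet returns, not how ElasticSearch computes it; a real terms facet also orders values by count and keeps only the top N.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TermsFacet {
    // Counts how many hits carry each value of `field`; hits without the
    // field are skipped. Keys are sorted alphabetically by the TreeMap.
    static Map<String, Integer> count(List<Map<String, Object>> hits, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, Object> doc : hits) {
            Object value = doc.get(field);
            if (value != null) {
                counts.merge(value.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```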

Page 23: BlaBlaCar Elastic Search Feedback

Defining our Trip index well

Think of all system requirements
● The cluster has 2 nodes
● We keep the default configuration for shards/replicas

Think of the object mapping. For each field:
● Define the type (string, long, geo_point, date, float, boolean)
● Define the scope (include_in_all)
● Define the analyzer (for type string)

Page 24: BlaBlaCar Elastic Search Feedback

Trip Mapping

{
  "trip": {
    "properties": {
      "is_user_blocked": { "type": "boolean", "include_in_all": false },
      "user_ratings": { "type": "long", "include_in_all": false },
      "from": { "type": "geo_point", "include_in_all": false },
      "price": { "type": "float", "include_in_all": false },
      "price_euro": { "type": "float", "include_in_all": false },
      "seats_left": { "type": "long", "include_in_all": false },
      "seats_offered": { "type": "long", "include_in_all": false },
      "to": { "type": "geo_point", "include_in_all": false },
      "trip_date": { "type": "date", "format": "dateOptionalTime", "include_in_all": false },
      "vehicle": { "type": "string", "include_in_all": false },
      "userid": { "type": "string", "index": "not_analyzed", "include_in_all": false }
    }
  }
}

Page 25: BlaBlaCar Elastic Search Feedback

Indexing events well

Which modifications send a change event
● Any trip creation/deletion/modification
● Member modifications (blocked or not)
● New ratings from other members
● A seat has been reserved
● A member changes their vehicle

A change event is a call to the internal indexer
● Send {"trip":123456} to the indexer (create/update)
● Send {"tripd":123456} to the indexer (delete)
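The indexer's side of this protocol can be sketched as a small parser that turns each message into an index or delete command. The sketch assumes exactly one key per message and a numeric id; the class, enum and record names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IndexerMessage {
    enum Action { INDEX, DELETE }

    record Command(Action action, long tripId) {}

    // Accepts a single key, "trip" (create/update) or "tripd" (delete), with a numeric id.
    private static final Pattern MESSAGE =
        Pattern.compile("\\{\\s*\"(trip|tripd)\"\\s*:\\s*(\\d+)\\s*\\}");

    static Command parse(String message) {
        Matcher m = MESSAGE.matcher(message.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("unsupported message: " + message);
        }
        Action action = m.group(1).equals("tripd") ? Action.DELETE : Action.INDEX;
        return new Command(action, Long.parseLong(m.group(2)));
    }
}
```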

Page 26: BlaBlaCar Elastic Search Feedback

Sample trip index query

{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "and": [
          { "geo_distance": { "distance": "40.14937866995km", "from": { "lat": 48.856614, "lon": 2.3522219 } } },
          { "geo_distance": { "distance": "40.14937866995km", "to": { "lat": 45.764043, "lon": 4.835659 } } },
          { "range": { "price": { "from": 0, "include_lower": false } } }
        ]
      }
    }
  },
  "sort": [{ "trip_date": { "order": "asc" } }],
  "filter": { "term": { "is_user_blocked": false } },
  "from": 0,
  "size": 10
}
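Read as plain filters, this query keeps trips whose from point is within about 40 km of Paris (48.856614, 2.3522219) and whose to point is within about 40 km of Lyon (45.764043, 4.835659), with a price above 0, excluding blocked users, sorted by trip date. The Java sketch below checks one geo_distance clause with the haversine formula on a spherical Earth; it approximates what the filter tests, and ElasticSearch's own distance computation may differ slightly depending on version and settings.

```java
final class GeoDistance {
    private static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    // Great-circle distance between two lat/lon points (haversine formula).
    static double km(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Mirrors one geo_distance clause: keep the point if it lies within maxKm of the centre.
    static boolean within(double lat, double lon, double centreLat, double centreLon, double maxKm) {
        return km(lat, lon, centreLat, centreLon) <= maxKm;
    }
}
```

Paris to Lyon comes out at roughly 390 km with this formula, so a trip starting in Lyon fails the 40 km from clause centred on Paris.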

Page 27: BlaBlaCar Elastic Search Feedback

The Real World

A trip now has more than 30 fields
● (the FAQ has around 25 fields)
● (members even more...)

To build a trip document we need 3 different SQL queries
● (FAQ: 2 different SQL queries)
● (Member: 10 different SQL queries)

The trip index has only 1 shard (because of grouping)

Page 28: BlaBlaCar Elastic Search Feedback

And now the caveats

Page 29: BlaBlaCar Elastic Search Feedback

Preloaded Scripts

We use MVEL scripts to improve scoring
● They are not replicated across the cluster
● Each node needs its own copy of the scripts
● Adding or modifying a script requires a node restart

Solution: Chef (tool from Opscode). All node configurations are centralized in a Chef repository

Page 30: BlaBlaCar Elastic Search Feedback

Grouping documents

Home-made patches to ElasticSearch (based on work by Martijn Van Groningen for lusini.de)

Soon in ElasticSearch (I really hope so)
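What grouping (field collapsing) does can be sketched in a few lines: given hits already sorted by score, keep only the best hit of each group. The group key is hypothetical here (for example, several documents belonging to the same trip). Done per shard, this is only correct if all documents of a group live on the same shard, which is consistent with the single-shard trip index and the wait for a distributed grouping feature.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class Grouping {
    // groupKey is a hypothetical field shared by all documents of one group.
    record Hit(String groupKey, String docId, double score) {}

    // Expects hits sorted by descending score: the first hit seen for a group
    // is its best one. Groups keep the order in which they first appear.
    static List<Hit> collapse(List<Hit> hitsSortedByScore) {
        Map<String, Hit> best = new LinkedHashMap<>();
        for (Hit hit : hitsSortedByScore) {
            best.putIfAbsent(hit.groupKey(), hit);
        }
        return new ArrayList<>(best.values());
    }
}
```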

Page 31: BlaBlaCar Elastic Search Feedback

Mapping modification

On a running index:
● Changing a field's type is not allowed
● Changing an analyzer is not allowed

Solution: index alias
1) Mapping change → create a new index
2) When the new index is up to date → switch the alias
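The switch in step 2 can be done atomically with the _aliases endpoint, which accepts a list of remove/add actions in a single request. This Java sketch only builds that request body; the alias and index names (trips, trips_v1, trips_v2) are hypothetical, and sending it (for example as POST /_aliases) is left to whatever HTTP or ES client is in use.

```java
final class AliasSwitch {
    // Builds the body of a POST /_aliases request that moves `alias` from
    // oldIndex to newIndex; both actions are applied together.
    static String swapBody(String alias, String oldIndex, String newIndex) {
        return "{\"actions\":["
             + "{\"remove\":{\"index\":\"" + oldIndex + "\",\"alias\":\"" + alias + "\"}},"
             + "{\"add\":{\"index\":\"" + newIndex + "\",\"alias\":\"" + alias + "\"}}"
             + "]}";
    }
}
```

Because both actions go in one request, searches against the alias never hit a moment where it points to no index.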

Page 32: BlaBlaCar Elastic Search Feedback

I/O limits

We have only 2 nodes
● The trip index is around 2 GB
● But there is only 1 shard for the trip index
● Up to 100 trips/second to index on a busy evening

Solution: we installed Intel SSDs (while waiting for a distributed grouping feature)

Page 33: BlaBlaCar Elastic Search Feedback

Choosing the analyzer

Some fields must not be analyzed
● For example ISO country codes: IT for Italy or DE for Germany are ignored in some cases

A global analyzer has limits
● Accented characters from languages such as French, German or Spanish are not always handled correctly
● One analyzer per country is difficult to implement in some cases
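A toy Java sketch of both problems: an analyzer that lowercases and drops stop words turns IT into "it" and DE into "de", which English, French or Spanish stop lists may remove, whereas a not_analyzed field keeps the exact term. Accent handling is sketched with Unicode decomposition, roughly what an ASCII-folding filter does. The stop list here is hypothetical and much shorter than real ones.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class ToyAnalyzer {
    // Hypothetical stop list mixing English ("it", "the") and French/Spanish ("de", "la", "le") words.
    private static final Set<String> STOP_WORDS = Set.of("it", "the", "de", "la", "le");

    // Lowercases, splits on whitespace and drops stop words, like a typical
    // analyzed string field: the country codes IT and DE disappear.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Decomposes accented letters into base letter + combining mark, then
    // removes the marks, so an accented e becomes a plain e.
    static String foldAccents(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }
}
```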

Page 34: BlaBlaCar Elastic Search Feedback

OK, sweet. What's next?

Page 35: BlaBlaCar Elastic Search Feedback

Using ElasticSearch to ease log analysis

Page 36: BlaBlaCar Elastic Search Feedback

By the way…

We're hiring!!! Devs, HTML ninjas, leaders, …

Come & see me right now… or send me your friends

(And we have beer, foosball and an arcade cabinet)

Page 37: BlaBlaCar Elastic Search Feedback

Thank you!

Follow us!

@covoiturage

Apply now:

[email protected]