BlaBlaCar Elastic Search Feedback

Introduction

Nicolas Blanc - BlaBlArchitect

Sinfomic (1999)

@thewhitegeek

(2001)

(2005)

(2008)

(2012)

What is BlaBlaCar?

3,000,000 members in Europe

10 countries (up from 9)

● France
● Spain
● Italy
● UK
● Poland
● Portugal
● Netherlands
● Belgium
● Luxembourg
● Germany (new)

Growth

[Chart: growth curve; vertical axis marked at 25 million and 50 million, horizontal axis in years from January 2008]

Infrastructure

● 2 front web servers
● 2 MySQL masters (+4 slaves on SSD)
● 1 private cloud (KVM + Open vSwitch)
  ● Redis
  ● Memcache
  ● RabbitMQ/workers
● 1 ElasticSearch cluster

Changing the Search Engine

What exists? Why change?

MySQL database
● Relational DB (lots of joins needed)
● Plain SQL queries
● Home-made geographical search

Recent problems
● New features mean more complex queries
● Scalability: performance depends on DB load

Initial requirements

Scalability
● A trip search must complete in less than 200 ms
● The system side of the solution must be easy to maintain
● It must be possible to cluster it (also to avoid a SPOF)

Low code impact on the existing application
● Same features as today (geographical search)
● Minimize the developers' work
● Add one missing feature: facets

Initial Competitors

SenseiDB

Why ElasticSearch

✔ Easiest clustering
✔ Good indexing performance
✔ Little code to write to use it
✔ Schemaless
✔ Based on Lucene
✔ Written in Java (we needed to code a grouping feature)

ElasticSearch has won; now let's migrate our search!

Changing our mindset

Objects in a relational database
● Can be split across multiple tables
● Lots of information reachable through JOINs

Objects in a document-oriented database
● Only one big index for these objects
● All the information needs to be in the object itself, not spread over multiple tables
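The change of mindset can be illustrated with a minimal sketch that flattens rows from separate relational tables into one self-contained document. The table layouts, the sample values and the build_trip_document helper are hypothetical and only serve the illustration; they are not taken from the BlaBlaCar code base.

```python
def build_trip_document(trip_row, user_row, vehicle_row):
    """Merge rows from three hypothetical tables into one flat document.

    In SQL these values would be reached with JOINs at query time; in a
    document store they are copied into the document at indexing time.
    """
    return {
        "trip_id": trip_row["id"],
        "trip_date": trip_row["departure"],
        "price": trip_row["price"],
        "userid": str(user_row["id"]),
        "user_ratings": user_row["ratings"],
        "is_user_blocked": user_row["blocked"],
        "vehicle": vehicle_row["model"],
    }


# Hypothetical rows, as they might come back from three SQL queries.
trip = {"id": 1234567, "departure": "2013-06-01T08:00:00", "price": 25.0, "user_id": 42}
user = {"id": 42, "ratings": 5, "blocked": False}
vehicle = {"owner_id": 42, "model": "Clio"}

doc = build_trip_document(trip, user, vehicle)
```

The price of this design is that any change to the user or vehicle rows has to trigger a rebuild of every document that embeds them.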

Defining our objects well

Need to know what we want to search
● Searching trips (front-office usage)
● Searching members (back-office usage)
● Searching the FAQ (front-office usage)

Think of all the fields needed
● The ones used for queries
● The ones used for filters
● The ones used for facets

Defining the index well

System point of view
● Number of nodes in the cluster
● Number of shards
● Number of replicas

Application point of view
● Define the type and attributes of every field (mapping)
● Use parent/child or nested documents to improve indexing
● How to push documents from the DB?

Indexing: using a river or not?

River advantages
● Plugs directly into our source backend
● An ElasticSearch API exists for coding a new one

River problems
● Not easy to add business logic on some fields
● Really hard when your DB is unconventional
● Does a full reindex of all the documents

Indexing: our manual way

We wrote an asynchronous indexer
● Written in Java
● Applies business logic when fetching from the DB
● Fetches from multiple DBs/sources
● Uses the Java ES library
● Easy interface
  ● Send {"trip":1234567} and the server answers {"OK"}
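The interface described above can be sketched as follows. This is a rough illustration in Python rather than the Java indexer itself: the handle_message and worker functions, the {"KO"} error reply and the callbacks are all hypothetical. Only the {"trip":...} message and the {"OK"} acknowledgement come from the slide.

```python
import json
import queue
import threading


def handle_message(raw, work_queue):
    """Validate an indexing request, queue it, and acknowledge at once."""
    try:
        trip_id = int(json.loads(raw)["trip"])
    except (ValueError, KeyError, TypeError):
        return '{"KO"}'  # hypothetical error reply, not from the slides
    work_queue.put(trip_id)
    return '{"OK"}'


def worker(work_queue, fetch_document, index_document):
    """Build and index documents in the background until a None sentinel."""
    while True:
        trip_id = work_queue.get()
        if trip_id is None:
            break
        # The business logic (multiple DB queries) would live in fetch_document.
        index_document(trip_id, fetch_document(trip_id))


indexed = {}  # stands in for the search cluster
jobs = queue.Queue()
thread = threading.Thread(
    target=worker,
    args=(jobs, lambda trip_id: {"trip_id": trip_id}, indexed.__setitem__),
)
thread.start()

reply = handle_message('{"trip":1234567}', jobs)  # returns before indexing is done
jobs.put(None)  # stop the worker once the queue is drained
thread.join()
```

The caller gets its acknowledgement as soon as the request is queued, so the web application never waits for the SQL queries or the indexing call.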

One index sample: Trip

Defining our Trip object well

Think of all the fields needed
● The ones used for queries
  ● Trip departure date, from where, to where, user id
● The ones used for filters
  ● User ratings, price, vehicle, seats left, is user blocked (a blocked user is a user who performed a forbidden action on the website)
● The ones used for facets
  ● User ratings, price, vehicle

Defining our Trip index well

Think of all the system requirements
● The cluster has 2 nodes
● We keep the default configuration for shards/replicas

Think of the object mapping
● For each field:
  ● Define the type (string, long, geo_point, date, float, boolean)
  ● Define the scope (include_in_all)
  ● Define the analyzer (for type string)

Trip Mapping

"trip": {
  "properties": {
    "is_user_blocked": { "type": "boolean", "include_in_all": false },
    "user_ratings": { "type": "long", "include_in_all": false },
    "from": { "type": "geo_point", "include_in_all": false },
    "price": { "include_in_all": false, "type": "float" },
    "price_euro": { "type": "float", "include_in_all": false },
    "seats_left": { "include_in_all": false, "type": "long" },
    "seats_offered": { "include_in_all": false, "type": "long" },
    "to": { "include_in_all": false, "type": "geo_point" },
    "trip_date": { "format": "dateOptionalTime", "include_in_all": false, "type": "date" },
    "vehicle": { "include_in_all": false, "type": "string" },
    "userid": { "include_in_all": false, "index": "not_analyzed", "type": "string" }
  }
}

Indexing events well

Which modifications send a change event
● Every trip creation/deletion/modification
● Member modifications (blocked or not)
● New ratings from other members
● A seat has been reserved
● A member changes his vehicle

A change event is a call to the internal indexer
● Send '{"trip":123456}' to the indexer (create/update)
● Send '{"tripd":123456}' to the indexer (delete)
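One way to picture the event flow is a small function that turns an application event into the messages sent to the indexer. The event names, the messages_for_event helper and the trips_of_member lookup are hypothetical. The fan-out of member-level events to every trip of that member is an assumption that follows from the member data being embedded in each trip document; the slides do not spell it out.

```python
import json

# Hypothetical event names; only the two message shapes come from the slides.
TRIP_EVENTS = {"trip_created", "trip_modified", "seat_reserved"}
MEMBER_EVENTS = {"member_blocked", "member_unblocked", "rating_added", "vehicle_changed"}


def encode(key, trip_id):
    """Serialize a message compactly, e.g. {"trip":123456}."""
    return json.dumps({key: trip_id}, separators=(",", ":"))


def messages_for_event(event, trips_of_member):
    """Return the indexer messages that an application event should trigger."""
    kind = event["type"]
    if kind == "trip_deleted":
        return [encode("tripd", event["trip_id"])]
    if kind in TRIP_EVENTS:
        return [encode("trip", event["trip_id"])]
    if kind in MEMBER_EVENTS:
        # Assumption: member data is embedded in each trip document,
        # so every trip of that member has to be re-indexed.
        return [encode("trip", t) for t in trips_of_member(event["member_id"])]
    return []


def lookup(member_id):
    """Hypothetical lookup of the trips published by a member."""
    return [111, 222] if member_id == 42 else []


deleted = messages_for_event({"type": "trip_deleted", "trip_id": 123456}, lookup)
blocked = messages_for_event({"type": "member_blocked", "member_id": 42}, lookup)
```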

Sample trip index query

{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "and": [
          { "geo_distance": { "distance": "40.14937866995km", "from": { "lat": 48.856614, "lon": 2.3522219 } } },
          { "geo_distance": { "distance": "40.14937866995km", "to": { "lat": 45.764043, "lon": 4.835659 } } },
          { "range": { "price": { "from": 0, "include_lower": false } } }
        ]
      }
    }
  },
  "sort": [ { "trip_date": { "order": "asc" } } ],
  "filter": { "term": { "is_user_blocked": false } },
  "from": 0,
  "size": 10
}
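As a sketch of how an application might assemble such a request, the function below builds the same body from two coordinates and a radius. The build_trip_search name and its parameters are hypothetical. The "filtered" query and "and" filter mirror the syntax on the slide, which belongs to the ElasticSearch releases of that period and was removed in later versions.

```python
import json


def build_trip_search(from_point, to_point, radius_km, page=0, page_size=10):
    """Build the search body for trips between two (lat, lon) points."""

    def near(field, point):
        # One geo_distance filter per end of the trip.
        return {
            "geo_distance": {
                "distance": f"{radius_km}km",
                field: {"lat": point[0], "lon": point[1]},
            }
        }

    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "and": [
                        near("from", from_point),
                        near("to", to_point),
                        {"range": {"price": {"from": 0, "include_lower": False}}},
                    ]
                },
            }
        },
        "sort": [{"trip_date": {"order": "asc"}}],
        "filter": {"term": {"is_user_blocked": False}},
        "from": page * page_size,
        "size": page_size,
    }


# Paris to Lyon, using the coordinates from the sample query.
body = build_trip_search((48.856614, 2.3522219), (45.764043, 4.835659), 40)
payload = json.dumps(body)  # what would be sent to the search endpoint
```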

The Real World

A trip now has more than 30 fields
● (FAQ is around 25 fields)
● (members even more...)

To build a trip document we need 3 different SQL queries
● (FAQ: 2 different SQL queries)
● (Member: 10 different SQL queries)

The trip index has only 1 shard (because of grouping)

And now the caveats

Preloaded Scripts

We use MVEL scripts to improve scoring
● They are not clustered
● Each node needs to have the scripts
● A node restart is needed to add or modify them

Solution: Chef (a tool from Opscode)
● All node configurations are centralized in a Chef repository

Grouping documents

Home-made patches to ElasticSearch (based on work by Martijn van Groningen for lusini.de)

Soon in ElasticSearch (I hope so much)

Mapping modification

On a running index:
● Changing a type is not allowed
● Changing an analyzer is not allowed

Solution: index alias
1) Changing the mapping → create a new index
2) When the new index is up to date → switch the alias
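The alias switch in step 2 can be done in a single request to the ElasticSearch aliases endpoint, so searches never see a moment without an index behind the alias. The sketch below only builds the request body. The alias_swap_actions helper and the index names trip_v1 and trip_v2 are hypothetical.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves an alias in one step."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# The application always searches "trip"; only the alias target changes.
swap = alias_swap_actions("trip", "trip_v1", "trip_v2")
swap_payload = json.dumps(swap)
```

Because the application only ever refers to the alias, no code change or redeploy is needed when the mapping changes.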

IOs limits

We have only 2 nodes
● The Trip index is around 2 GB
● But only 1 shard for the Trip index
● Indexing can reach 100 trips per second on a busy evening

Solution: we installed Intel SSDs (while waiting for the distributed grouping feature)

Choosing the analyzer

Some fields must not be analyzed
● For example, if you use ISO codes for countries (IT for Italy or DE for Germany are ignored in some cases)

Global analyzer has limits
● Accented characters from countries like France, Germany or Spain are not always parsed correctly
● One analyzer per country is difficult to implement in some cases
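One plausible reason for ISO codes being ignored, offered here as an assumption rather than something the slides state, is stop-word removal: once lowercased, "IT" and "DE" collide with common stop words ("it" in English, "de" in French or Spanish). The toy analyzer below is a simplified stand-in for a real analysis chain, with a deliberately tiny stop list.

```python
# Tiny illustrative stop list; real analyzers ship much longer ones.
STOPWORDS = {"it", "de", "the", "a"}


def analyze(text, stopwords=STOPWORDS):
    """Toy analyzer: lowercase, split on whitespace, drop stop words."""
    return [token for token in text.lower().split() if token not in stopwords]


def keep_as_is(text):
    """Equivalent of a not_analyzed field: the value is indexed untouched."""
    return [text]


analyzed_italy = analyze("IT")      # the country code disappears
analyzed_france = analyze("FR")     # survives, but lowercased
exact_italy = keep_as_is("IT")      # the exact value is preserved
```

With the field left unanalyzed, a term filter on the exact code matches every country the same way.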

OK, sweet. What's next?

Using ElasticSearch to ease log analysis

By the way…

We're hiring! Devs, HTML ninjas, leaders, ...

Come and see me right now, or send me your friends

(And we have beer, table football and an arcade cabinet)

Thank you !

Follow us !

@covoiturage

Apply now :

join@BlaBlaCar.com
