New Metrics Engine to Help Drive UBER
t.uber.com/scala2016
Sasha Ovsankin, Uber
New Metrics Engine to Help Drive Uber
November 2016
“Transportation as reliable as running water, everywhere, for everyone”
About Me
Data & Analytics Engineer
Mathematical Physics, Lomonosov Moscow University
Contact
● [email protected]
● linkedin.com/in/sashao
● http://t.uber.com/scala2016
What Do We Work On
● Experimentation Platform: fuel Uber’s innovation, make the software release cycle more robust and data-driven
● Uber Data Platform: cutting-edge data platforms powering Uber’s intelligence
What This Talk Is About
Building a company-wide Metrics Platform is possible and practical,
and you should do it
Agenda
● Why Metrics Platform
● Technology
● Process
● Conclusion
How Do You Want Your Metrics?
Aligned
Reliable
Trusted
Uber Situation
● Over 450 cities in over 70 countries
● Lots of growth:
  ○ 1B rides by Dec 2015, 2B rides by June 2016
● Teams have a high level of independence
How do you make data-driven decisions in a business like that?
Metrics Platform = Technology + Process
Our Metrics Platform: Architecture and Process
[Diagram; labels: Engines, Registry, Council, Web-UI, Spark / Hive / Real Time, BI Tool UI, DS / Ops / Product, Definition DSL, Query DSL]
Our Metrics Platform
Easy & Powerful
Integrated
Lightweight Process
Metric Walkthrough
Metric: hours active
English description: hours spent by drivers logged in and online in the driver app
SQL:
    select sum(ac.minutes_active / 60) / count(*)
    from derived.driver_activity ac
    right join dim.driver dr on dr.id = ac.driver_id
Metric Walkthrough, Continued
Add date:
    select sum(ac.minutes_active / 60) / count(*)
    from derived.driver_activity ac
    right join dim.driver dr on dr.id = ac.driver_id
    where ac.timestamp >= '2016-10-01' and ac.timestamp < '2016-11-01'
In San Francisco:
    select sum(ac.minutes_active / 60) / count(*)
    from derived.driver_activity ac
    right join dim.driver dr on dr.id = ac.driver_id
    join dim.city c on c.id = ac.city_id
    where ac.timestamp >= '2016-10-01' and ac.timestamp < '2016-11-01'
      and c.name = 'San Francisco'
Metric Walkthrough, Continued #2
Group by experiment treatment:
    select sum(ac.minutes_active / 60) / count(*)
    from derived.driver_activity ac
    right join dim.driver dr on dr.id = ac.driver_id
    join dim.city c on c.id = ac.city_id
    join xp.user_experiment xp on xp.user_id = dr.id
    where ac.timestamp >= '2016-10-01' and ac.timestamp < '2016-11-01'
      and c.name = 'San Francisco'
      and xp.experiment_key = 'crm_driveronboarding_wcdrip'
    group by xp.treatment
Group by driver type:
    select sum(ac.minutes_active / 60) / count(*)
    from derived.driver_activity ac
    right join dim.driver dr on dr.id = ac.driver_id
    join dim.city c on c.id = ac.city_id
    join xp.user_experiment xp on xp.user_id = dr.id
    join model.driver dm on dm.id = dr.id
    where ac.timestamp >= '2016-10-01' and ac.timestamp < '2016-11-01'
      and c.name = 'San Francisco'
      and xp.experiment_key = 'crm_driveronboarding_wcdrip'
    group by xp.treatment, dm.type
Complicated
Unmanageable
Fragile
Anatomy of a Metric
[Diagram; labels: Input (×3), Pre-aggregation transformations (×2), Aggregation (×2), Post-aggregation formulas, Final Join, Results, Dimensions, Metric definitions, Filters; table columns dim1, dim2, …, m1, m2, ...]
    select sum(ac.minutes_active / 60) / count(*)
    from derived.driver_activity ac
    right join dim.driver dr on dr.id = ac.driver_id
    where ac.timestamp >= '2016-10-01' and ac.timestamp < '2016-11-01'
    group by dr.city
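The stages named in the diagram can be illustrated with plain Scala collections. This is only a minimal sketch under stated assumptions: the case classes `Activity` and `Driver`, the object name, and the function `avgHoursActiveByCity` are hypothetical and are not part of the engine described in the talk. It divides total hours by the number of drivers in each city, which is one reading of the intent behind the SQL above.

```scala
object MetricAnatomySketch {
  // Hypothetical input rows; field names loosely follow the SQL above.
  final case class Activity(driverId: Int, minutesActive: Double)
  final case class Driver(id: Int, city: String)

  // Average hours active per driver, grouped by the city dimension.
  def avgHoursActiveByCity(
      activity: Seq[Activity],
      drivers: Seq[Driver]
  ): Map[String, Double] = {
    val cityOf: Map[Int, String] = drivers.map(d => d.id -> d.city).toMap

    // Pre-aggregation transformation: minutes -> hours, tagged with the dimension.
    val hoursWithCity: Seq[(String, Double)] =
      activity.flatMap(a => cityOf.get(a.driverId).map(c => c -> a.minutesActive / 60.0))

    // Aggregation 1: total hours per city.
    val hoursByCity: Map[String, Double] =
      hoursWithCity.groupBy(_._1).map { case (c, rows) => c -> rows.map(_._2).sum }

    // Aggregation 2: number of drivers per city (drivers without activity included).
    val driversByCity: Map[String, Int] =
      drivers.groupBy(_.city).map { case (c, ds) => c -> ds.size }

    // Join the two aggregates on the dimension and apply the post-aggregation formula.
    driversByCity.map { case (city, n) =>
      city -> hoursByCity.getOrElse(city, 0.0) / n
    }
  }
}
```

Keeping the two aggregations separate until the last step is what lets a city with drivers but no recorded activity still appear in the result, with a value of zero.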
Metric = Formula + Query
SQL:
    select sum(ac.minutes_active / 60) / count(*)
    from derived.driver_activity ac
    right join dim.driver dr on dr.id = ac.driver_id
    join dim.city c on c.id = ac.city_id
    where ac.timestamp >= '2016-10-01' and ac.timestamp < '2016-11-01'
      and c.name = 'San Francisco'
    ...
Formula:
    avg_driver_hours_active = sum(agg.driver_activity.minutes_active / 60) / count(dim.driver)
Query:
    select(core.avg_driver_hours_active)
      .where(dim_city.name === "San Francisco")
      .over(days(7) upto today)
      .groupBy(driver_model.category, user_experiment.treatment)
The Metrics DSL: Formula
    val hours_online = driver_activity.minutes_active / 60
    val all_drivers = count(dim_driver)
    val avg_driver_hours_online = sum(hours_online) / all_drivers
[Expression tree: the root "/" divides sum(driver_activity.minutes_active / 60) by count(dim_driver)]
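A formula like the one on this slide can be represented as a small expression tree. The sketch below is an illustration only, not the engine's implementation: the types `Expr`, `Column`, `Table`, `Const`, `Div`, `Sum`, `Count` and the function `render` are all hypothetical names. It builds the same tree as the formula and renders it as a SQL-like string.

```scala
object FormulaSketch {
  // A tiny expression tree mirroring the shape of the formula on the slide.
  sealed trait Expr {
    // Lets two expressions be combined with the ordinary division operator.
    def /(that: Expr): Expr = Div(this, that)
  }
  final case class Column(table: String, name: String) extends Expr
  final case class Table(name: String) extends Expr
  final case class Const(value: Int) extends Expr
  final case class Div(left: Expr, right: Expr) extends Expr
  final case class Sum(arg: Expr) extends Expr
  final case class Count(arg: Expr) extends Expr

  def sum(e: Expr): Expr = Sum(e)
  def count(e: Expr): Expr = Count(e)

  // Render the tree as a SQL-like string (for illustration only).
  def render(e: Expr): String = e match {
    case Column(t, n) => s"$t.$n"
    case Table(n)     => n
    case Const(v)     => v.toString
    case Div(l, r)    => s"(${render(l)} / ${render(r)})"
    case Sum(a)       => s"sum(${render(a)})"
    case Count(a)     => s"count(${render(a)})"
  }

  // The formula from the slide, written against the hypothetical types above.
  val hoursOnline: Expr = Column("driver_activity", "minutes_active") / Const(60)
  val allDrivers: Expr = count(Table("dim_driver"))
  val avgDriverHoursOnline: Expr = sum(hoursOnline) / allDrivers
}
```

Because the formula is data rather than a string, the same definition could in principle be inspected, validated, or translated for different back ends.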
The Metrics DSL: Query
    val query = select(avg_driver_hours_online)
      .where(dimDriver.partner_city_id === "San Francisco")
      .over(days * 7 towards today)
      .groupBy(dimDriver.partner_city_id)
    val df = engine.toDF(query)
[Diagram; labels: DSL, DataFrame, Output]
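A query DSL of this shape can be built from an immutable case class whose methods each return an updated copy. The following is a minimal sketch, assuming hypothetical types `Metric`, `Dimension`, `Filter`, `Window`, and `Query`; the time window is simplified to `days(7)` and does not reproduce the `towards today` syntax from the slide.

```scala
object QueryDslSketch {
  // Hypothetical building blocks; the engine's real types are not shown in the slides.
  final case class Metric(name: String)
  final case class Dimension(name: String) {
    // Builds an equality filter, in the style of the === operator on the slide.
    def ===(value: String): Filter = Filter(this, value)
  }
  final case class Filter(dim: Dimension, value: String)
  final case class Window(days: Int)

  final case class Query(
      metric: Metric,
      filters: List[Filter] = Nil,
      window: Option[Window] = None,
      groupings: List[Dimension] = Nil
  ) {
    // Each step returns a new immutable Query, so calls can be chained.
    def where(f: Filter): Query = copy(filters = filters :+ f)
    def over(w: Window): Query = copy(window = Some(w))
    def groupBy(dims: Dimension*): Query = copy(groupings = groupings ++ dims)
  }

  def select(m: Metric): Query = Query(m)
  def days(n: Int): Window = Window(n)

  val city: Dimension = Dimension("dim_city.name")

  // Example query: a metric filtered to one city over 7 days, grouped by city.
  val example: Query =
    select(Metric("avg_driver_hours_online"))
      .where(city === "San Francisco")
      .over(days(7))
      .groupBy(city)
}
```

The resulting `Query` value only describes what is wanted; turning it into a DataFrame would be the job of an engine, which this sketch does not attempt.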
✔ Easy & Powerful
Integrated
Lightweight Process
The Engine Core
Company Schema Repository
Table Schemas
Foreign key Relationships
Engine Configuration
Engine Core
Query
Execution Plan
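The slides do not say how the engine uses the foreign key relationships. One plausible use, shown here purely as an assumption, is to work out which joins connect the tables a query mentions. The sketch below uses a breadth-first search over a hypothetical `ForeignKey` list; the table and column names are borrowed from the SQL earlier in the deck, and `joinPath` is an invented name.

```scala
object JoinPathSketch {
  // A foreign key: fromTable.fromColumn references toTable.toColumn.
  final case class ForeignKey(
      fromTable: String,
      fromColumn: String,
      toTable: String,
      toColumn: String
  )

  // Breadth-first search over the foreign-key graph, treated as undirected.
  // Returns the keys to join along, in order, or None if the tables are not connected.
  def joinPath(keys: Seq[ForeignKey], start: String, goal: String): Option[List[ForeignKey]] = {
    @annotation.tailrec
    def loop(
        frontier: List[(String, List[ForeignKey])],
        seen: Set[String]
    ): Option[List[ForeignKey]] =
      frontier match {
        case Nil => None
        case (table, path) :: _ if table == goal => Some(path.reverse)
        case (table, path) :: rest =>
          // Follow every key that touches the current table and leads somewhere new.
          val next = keys.toList.collect {
            case k if k.fromTable == table && !seen(k.toTable) => (k.toTable, k :: path)
            case k if k.toTable == table && !seen(k.fromTable) => (k.fromTable, k :: path)
          }
          loop(rest ++ next, seen ++ next.map(_._1))
      }
    loop(List((start, Nil)), Set(start))
  }

  // Hypothetical relationships, named after the tables in the walkthrough SQL.
  val keys: Seq[ForeignKey] = Seq(
    ForeignKey("derived.driver_activity", "driver_id", "dim.driver", "id"),
    ForeignKey("derived.driver_activity", "city_id", "dim.city", "id")
  )
}
```

With these keys, asking for a path from `dim.driver` to `dim.city` yields the two joins through `derived.driver_activity`, which matches the joins written by hand in the walkthrough.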
✔ Easy & Powerful
✔ Integrated
Lightweight Process
Metric Creation Process
Metric Management Web UI
Video link: https://youtu.be/we3q6O4eZIg
Our Metrics Platform: Technology
✔ Easy & Powerful
✔ Integrated
✔ Lightweight Process
Users
● Experimentation
● Product groups
● Financial reporting
● Real time decision making
● Fraud detection
Future Direction
● Further adoption within Uber
● Further work on DSL
● More Engines
● Real Time
● Open Source?
Interested?
http://t.uber.com/[email protected]
What This Talk Was About
Building a company-wide Metrics Platform is possible and practical,
and you should do it
The Metrics Platform Team
Contact us:
● http://t.uber.com/scala2016
● [email protected]
We are hiring!
Questions?
Thank you
Proprietary and confidential © 2016 Uber Technologies, Inc. All rights reserved. No part of this document may be
reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or
by any information storage or retrieval systems, without permission in writing from Uber. This document is intended
only for the use of the individual or entity to whom it is addressed and contains information that is privileged,
confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified
that the information contained herein includes proprietary and confidential information of Uber, and recipient may not
make use of, disseminate, or in any way disclose this document or any of the enclosed information to any person
other than employees of addressee to the extent necessary for consultations with authorized personnel of Uber.
Image credits:
● Erik bij de Vaate
● Bernard Spragg