EXPERIMENTS IN DATA SCIENCE - Project Jupyter...

Date: Tuesday, March 19, 2019

EXPERIMENTS IN DATA SCIENCE

Examples of the need for experimentation:
1. Economic indicators based on a country's poverty, employment rate, happiness, etc.
2. Medical treatments: will treatment A help control ailment B?

Focus: assessing cause and effect, a.k.a. causal analysis, through purposeful data collection.

Notation and Nomenclature

Dependent variable $y$: measures the outcome that we want to optimize over. Ex: CTR (click-through rate), session duration, bounce rate, etc.

Explanatory variables $x_1, x_2, \ldots, x_p$: variables that we expect to influence our dependent variable $y$. In an experiment, explanatory variables are referred to as factors, and the values they can take on (i.e. their domain) are called levels.

Primary aim: understand which combinations of explanatory variables have a causal relationship with $y$. This inference tells us which action to take in future design/engineering.

Experimental conditions: unique combinations of the levels of one or more factors.

Experimental units: the units applied to each condition; a response value is recorded for each unit.

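Example 1: Button message

$y_i = I(\text{individual } i \text{ clicks button } i)$, i.e. 1 if they click and 0 otherwise

$x_{i1}$ = message on button $i$, with levels "submit", "go", "let's go"
$x_{i2}$ = color of button $i$, with levels red (R) and blue (B)

Conditions (3 messages x 2 colors = 6): {submit, R}, {submit, B}, {go, R}, {go, B}, {let's go, R}, {let's go, B}

Experimental units: individuals that we've assigned to each condition above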
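The conditions in Example 1 are the Cartesian product of the factor levels, so they can be enumerated rather than listed by hand. Below is a minimal Python sketch that assumes the levels exactly as written above. The function `ctr_by_condition` and the `(condition, y)` records it reads are hypothetical, for illustration only. The sketch takes the CTR of a condition to be the mean of the 0/1 responses $y_i$ observed in that condition.

```python
from itertools import product

# Factors and their levels, as in Example 1 (button message and button color).
factors = {
    "message": ["submit", "go", "let's go"],
    "color": ["R", "B"],  # R = red, B = blue
}

# Each experimental condition is one unique combination of factor levels.
conditions = list(product(*factors.values()))


def ctr_by_condition(records):
    """Hypothetical helper: click-through rate (mean of the 0/1 response y_i)
    per condition.

    `records` is an iterable of (condition, y) pairs, where y is 1 if the
    individual clicked the button and 0 otherwise.
    """
    clicks = {c: 0 for c in conditions}
    shown = {c: 0 for c in conditions}
    for condition, y in records:
        clicks[condition] += y
        shown[condition] += 1
    # A condition with no assigned units has no defined CTR, so it is skipped.
    return {c: clicks[c] / shown[c] for c in conditions if shown[c] > 0}


if __name__ == "__main__":
    print(len(conditions))  # 3 messages x 2 colors = 6 conditions
    # Made-up responses, for illustration only (not real data).
    demo = [(("go", "R"), 1), (("go", "R"), 0), (("submit", "B"), 1)]
    print(ctr_by_condition(demo))
```

Adding a factor or a level only requires changing `factors`. The number of conditions is the product of the level counts, which is why full factorial designs grow quickly.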

Experiments vs. Observational Studies: in an experiment, we control and know how units are assigned to a condition, so we can assess causal relationships between the conditions and the response.

In an observational study, we have no control over the assignment of units to conditions; instead, the data is observed passively. It is difficult to test for causality here, though methods do exist. Ex: DAGs (directed acyclic graphs), propensity score matching, Granger causality.

Example: A/B testing of user activity (in seconds) on version A or B of a website.
Conditions: version A, version B (2 conditions).
Dependent variable: $y_i$ = time in seconds that user $i$ stays on the site.
Experimental unit: the users.

Note: assignment of units to conditions is done using various forms of randomization. The choice of randomization scheme is typically referred to as the design.
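One simple randomization scheme is a completely randomized design, and it is sketched below for the A/B example. The notes do not commit to a particular design, so treat this as one possible choice. The function name and the user IDs are hypothetical. The sketch shuffles the units with a seeded `random.Random` and deals them out to the conditions in turn, which keeps group sizes within one of each other.

```python
import random


def assign_completely_at_random(units, conditions=("A", "B"), seed=None):
    """Hypothetical helper for a completely randomized design.

    Shuffle the units, then deal them to the conditions in round-robin order,
    so every unit gets exactly one condition and group sizes differ by at most 1.
    """
    rng = random.Random(seed)  # seeded generator makes the assignment reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: conditions[k % len(conditions)] for k, unit in enumerate(shuffled)}


if __name__ == "__main__":
    users = [f"user{i}" for i in range(10)]  # hypothetical user IDs
    assignment = assign_completely_at_random(users, seed=2019)
    print(assignment)
```

Each user receives exactly one version, so for every user the response under the other version goes unobserved.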

Usually we cannot, or do not want to, assign units to multiple conditions (e.g. a drug treatment, or versions of a webpage, where seeing too many could confuse or frustrate a user). Because of this, we do not measure the dependent variable for a user under at least one condition. The unobserved response for that user/condition pair is called a counterfactual.

The primary aim of design is to ensure that the only differences we see in the response are due to differences in conditions; thus, we need to control for other intrinsic features of the units.
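The idea of a counterfactual can be made concrete with a sketch in the potential-outcomes framing, which the notes do not name explicitly. Each unit has one response per condition, but only the response under its assigned condition is observed. Everything below is an illustrative assumption: the helper names `observe` and `difference_in_means`, the simulated session times, and the +5 s effect of version B. The difference in group means is one common estimator under randomization, not a method the notes prescribe.

```python
import random
from statistics import mean


def observe(potential_outcomes, assignment):
    """Split each unit's potential outcomes into the observed response and the
    counterfactual(s): responses under conditions the unit did not receive.

    `potential_outcomes` maps unit -> {condition: response}. In a real
    experiment only the assigned entry for each unit could ever be known.
    """
    observed, counterfactual = {}, {}
    for unit, outcomes in potential_outcomes.items():
        cond = assignment[unit]
        observed[unit] = (cond, outcomes[cond])
        counterfactual[unit] = {c: y for c, y in outcomes.items() if c != cond}
    return observed, counterfactual


def difference_in_means(observed, treat="B", control="A"):
    """Estimate the average effect of `treat` vs `control` from observed data only."""
    t = [y for c, y in observed.values() if c == treat]
    k = [y for c, y in observed.values() if c == control]
    return mean(t) - mean(k)


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated (not real) session times in seconds; version B adds 5 s by construction.
    po = {}
    for i in range(1000):
        base = rng.gauss(60, 10)
        po[i] = {"A": base, "B": base + 5}
    # Completely randomized assignment: shuffle, then alternate A/B.
    units = list(po)
    rng.shuffle(units)
    assignment = {u: ("A" if k % 2 == 0 else "B") for k, u in enumerate(units)}
    observed, counterfactual = observe(po, assignment)
    print(round(difference_in_means(observed), 1))  # should land near 5
```

The estimate uses only `observed`. Randomization makes the A and B groups comparable on intrinsic features such as each user's baseline time, so the difference in means is a reasonable stand-in for the unobservable per-user comparison.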