Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016
Transcript of Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016
![Page 1: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/1.jpg)
![Page 2: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/2.jpg)
Iterative supervised clustering
A dance between data science and machine learning
Dr June Andrews — September 2016
![Page 3: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/3.jpg)
Agenda
1. Explore Pinterest’s content
2. Question our understanding
3. Inspire the future
Design system
![Page 5: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/5.jpg)
Clothing Cooking Decorating Beauty Teaching Carpentry Cars Animated GIFs
Electronics Stereos Fashion Sewing Articles Painting Photography Nature
Cute cats Tattoos Hair Microscopy TV shows Apps Self help Motorcycles
![Page 6: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/6.jpg)
![Page 7: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/7.jpg)
Chairs
![Page 8: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/8.jpg)
Fashion
Travel
Garden
Chairs
Food
![Page 9: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/9.jpg)
Links are behind every Pin
How are users engaging with link domains?
![Page 10: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/10.jpg)
| Tool | Pros | Cons |
| --- | --- | --- |
| Cluster algorithms (SVM, K-Means, Spectral) | Considers all users; accurate | Tough to communicate; definitions change over time |
| User experience studies | Deep knowledge; captures the immeasurable | Costly; considers few users |
| Domain expert hypothesis | Human interpretable | Inaccurate |
![Page 12: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/12.jpg)
Current cluster analysis
1. Clean and load data into favorite clustering algorithm
2. Build visualizations on top of clusters
3. Fiddle with parameters in clustering algorithm
4. Add human labels to each cluster
5. Share human interpretation of clusters
![Page 13: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/13.jpg)
Current cluster analysis
1. Clean and load data into favorite clustering algorithm
2. Build visualizations on top of clusters
3. Fiddle with parameters in clustering algorithm
4. Add human labels to each cluster
5. Share human interpretation of clusters
Fatal flaw
![Page 14: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/14.jpg)
Human in the loop computing
Community membership identification from small seed sets (Kloumann & Kleinberg)
[Diagram: Domain Expert, Favorite Clustering Algorithm]
![Page 15: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/15.jpg)
Human in the loop computing
When machine confidence dips, engage with domain expert
[Diagram: Domain Expert, Favorite Clustering Algorithm. Labels: Unsure, Confident]
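The idea of engaging the expert only when the machine is unsure can be sketched in a few lines. This is a minimal illustration, not the talk's implementation: the slides do not say how confidence is measured, so the confidence scores, the `route_by_confidence` name, and the 0.8 threshold are all hypothetical.

```python
from typing import Dict, List, Tuple


def route_by_confidence(
    confidences: Dict[str, float], threshold: float = 0.8
) -> Tuple[List[str], List[str]]:
    """Split items into (confident, unsure) by a hypothetical confidence score.

    Items at or above the threshold keep the machine's assignment;
    the rest are queued for the domain expert to label.
    """
    confident: List[str] = []
    unsure: List[str] = []
    for item, confidence in sorted(confidences.items()):
        if confidence >= threshold:
            confident.append(item)
        else:
            unsure.append(item)
    return confident, unsure


# Hypothetical scores for three link domains.
confident, unsure = route_by_confidence(
    {"a.example": 0.95, "b.example": 0.40, "c.example": 0.80}
)
```

Only the `unsure` list would be shown to the expert, which keeps the expert's time focused on the cases the algorithm cannot settle on its own.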
![Page 17: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/17.jpg)
Human in the loop computing
Domain expert determines when labeling is done
[Diagram: Domain Expert, Favorite Clustering Algorithm]
That's all!
![Page 18: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/18.jpg)
Current analysis methodology
1. Clean and load data into favorite clustering algorithm
2. Build visualizations on top of clusters
3. Fiddle with parameters in clustering algorithm
4. Add human labels to each cluster
5. Share human interpretation of clusters
![Page 19: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/19.jpg)
Human in the loop computing
Stage 1: Machine clusters data
Favorite Clustering Algorithm
![Page 20: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/20.jpg)
Human in the loop computing
Stage 2: Domain expert creates 1 human interpretable cluster
Domain Expert
![Page 21: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/21.jpg)
Human in the loop computing
Stage 3: Remove human labeled clusters and iterate
Favorite Clustering Algorithm
Domain Expert
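The three stages can be read as a loop. The sketch below is one possible shape for that loop under stated assumptions: `machine_cluster` is a toy stand-in for the "favorite clustering algorithm" (it just sorts by a score and chunks the result), and the expert is modelled as a callable that returns a cluster name plus an interpretable rule, or `None` when labeling is done. None of these names come from the talk.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Item = Tuple[str, float]        # (domain name, toy engagement score)
Rule = Callable[[Item], bool]   # human-written, interpretable predicate
Expert = Callable[[List[List[Item]]], Optional[Tuple[str, Rule]]]


def machine_cluster(items: Sequence[Item], k: int = 3) -> List[List[Item]]:
    """Toy stand-in for a clustering algorithm: sort by score, split into chunks."""
    ordered = sorted(items, key=lambda item: item[1])
    size = max(1, -(-len(ordered) // k))  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def iterative_supervised_clustering(
    items: Sequence[Item], expert: Expert
) -> Tuple[Dict[str, List[Item]], List[Item]]:
    labeled: Dict[str, List[Item]] = {}
    remaining = list(items)
    while remaining:
        clusters = machine_cluster(remaining)   # Stage 1: machine clusters data
        decision = expert(clusters)             # Stage 2: expert defines one cluster
        if decision is None:                    # expert decides labeling is done
            break
        name, rule = decision
        members = [item for item in remaining if rule(item)]
        if not members:                         # rule matched nothing; stop to avoid looping forever
            break
        labeled[name] = members
        # Stage 3: remove the human-labeled cluster and iterate on the rest
        remaining = [item for item in remaining if not rule(item)]
    return labeled, remaining


def scripted_expert(decisions: List[Tuple[str, Rule]]) -> Expert:
    """Replays a fixed list of decisions; a real expert would inspect the clusters."""
    queue = list(decisions)
    return lambda clusters: queue.pop(0) if queue else None


data = [("a.example", 1.0), ("b.example", 2.0), ("c.example", 50.0), ("d.example", 60.0)]
expert = scripted_expert([
    ("low", lambda item: item[1] < 10),
    ("high", lambda item: item[1] >= 10),
])
labeled, remaining = iterative_supervised_clustering(data, expert)
```

The design point the sketch tries to capture is that the human writes the rule and the machine only proposes groupings, so every final cluster has an interpretable definition.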
![Page 22: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/22.jpg)
Links are behind every Pin
How are users engaging with link domains?
• For a sample set of link domains we’re interested in:
  • All Pin creates in their first year on Pinterest
  • All repins in their first year on Pinterest
• 100k link domains sampled total
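The per-domain data described above can be pictured as a small record. This is a minimal sketch: the field names `n_pin_creates`, `n_repins`, `monthly_pin_creates`, and `monthly_repins` are the ones the calculation code on the slides reads, while the `DomainEngagement` class name, the `domain` field, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DomainEngagement:
    """First-year engagement for one link domain (hypothetical container)."""

    domain: str
    monthly_pin_creates: List[int]  # Pin creates per month
    monthly_repins: List[int]       # repins per month

    @property
    def n_pin_creates(self) -> int:
        """Total Pin creates over the observed months."""
        return sum(self.monthly_pin_creates)

    @property
    def n_repins(self) -> int:
        """Total repins over the observed months."""
        return sum(self.monthly_repins)


# Made-up example: 3 Pin creates and 10 repins every month for a year.
example = DomainEngagement("example.com", [3] * 12, [10] * 12)
```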
![Page 23: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/23.jpg)
Python Notebook
![Page 24: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/24.jpg)
Provides guided iteration
Python Notebook
![Page 25: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/25.jpg)
Sample visualization for each cluster
Python Notebook
[Plot: Pin creates and Repins. Axis labels: Few, Many]
![Page 26: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/26.jpg)
Iteration 1: Dark content
Description: Fewer than 2 Pins a week on average
Examples: Noisy low quality content
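The slide gives no calculation for this cluster, so the following is only a sketch of how the description could be turned into a rule. It assumes "Pins" means Pin creates and that the first year is counted as 52 weeks; the function name and parameters are hypothetical.

```python
def detect_dark_content(
    n_pin_creates: int, weeks: int = 52, max_pins_per_week: float = 2.0
) -> bool:
    """True when the average number of Pin creates per week is below the cutoff."""
    return n_pin_creates / weeks < max_pins_per_week


# 50 Pin creates in a year is under 1 per week; 200 is nearly 4 per week.
is_dark = detect_dark_content(50)
is_not_dark = detect_dark_content(200)
```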
![Page 27: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/27.jpg)
Iteration 2: 42% of domains left
[Plots: Cluster 1, Cluster 2, Cluster 3. Series in each: Pin creates, Repins. Axis labels: Few, Some, Many, 0]
![Page 28: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/28.jpg)
Iteration 2: Pinterest specials
Description: Domains with few Pins, but these Pins thrive in the Pinterest ecosystem
Calculation:

    def detect_pinterest_specials(domain_engagement):
        # Repins per Pin create; max() guards against division by zero.
        ratio = domain_engagement.n_repins / max(1.0, float(domain_engagement.n_pin_creates))
        # X and Y are thresholds whose values are not given on the slide.
        return domain_engagement.n_pin_creates <= X and ratio >= Y

Examples: Fashion and impulse sites
[Plot: Pinterest specials. Series: Pin creates, Repins. Axis labels: Few, Many, 0]
![Page 29: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/29.jpg)
Iteration 3: 33% of domains left
[Plots: Cluster 1, Cluster 2, Cluster 3. Series in each: Pin creates, Repins. Axis labels: Few, Some, Many, 0]
![Page 30: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/30.jpg)
Iteration 3: Steady growth
Description: Active Pin creates and steady growth throughout the year
Calculation:

    import numpy as np

    def detect_steady_growth(domain_engagement):
        # Slope of a straight-line fit to the monthly repin counts.
        (growth_rate, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins)),
            domain_engagement.monthly_repins, 1)
        # X, Y, and months_pins_created are not defined on the slide.
        return months_pins_created >= X and growth_rate >= Y

Examples: Recipe and DIY sites
[Plot: Steady growth. Series: Pin creates, Repins. Axis labels: Some, Many, 0]
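The growth rate in this calculation is the slope of a least-squares line through the monthly counts, which is what the first value returned by `np.polyfit(x, y, 1)` represents. The sketch below computes the same slope with the standard library only, to show what the number means; the function name and the sample series are hypothetical.

```python
from typing import Sequence, Tuple


def linear_trend(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (slope, intercept) of values against month index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up monthly repins that grow by 2 each month, and a series that shrinks by 1.
growing_slope, growing_intercept = linear_trend([10, 12, 14, 16])
fading_slope, _ = linear_trend([5, 4, 3])
```

A positive slope corresponds to growth and a negative slope to a fade, which is the sign test the later churning calculation relies on.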
![Page 31: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/31.jpg)
Iteration 4: 25% of domains left
[Plots: Cluster 1, Cluster 2, Cluster 3. Series in each: Pin creates, Repins. Axis labels: Few, Some, Many, 0]
![Page 32: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/32.jpg)
Iteration 4: Slow growth
Description: Similar to steady growth, but not as fast
Calculation:

    import numpy as np

    def detect_steady_growth(domain_engagement):
        # Slope of a straight-line fit to the monthly repin counts.
        (growth_rate, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins)),
            domain_engagement.monthly_repins, 1)
        # X, Y, and months_pins_created are not defined on the slide.
        return months_pins_created >= X and growth_rate >= Y

Examples: Slightly lower quality recipe and DIY sites
[Plot: Slow growth. Series: Pin creates, Repins. Axis labels: Few, Many, 0]
![Page 33: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/33.jpg)
Iteration 5: Churning
Description: Slowly fade through the year
Calculation:

    import numpy as np

    def detect_churning(domain_engagement):
        # Fit straight lines to both series, skipping the first two months.
        (repin_growth, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins) - 2),
            domain_engagement.monthly_repins[2:], 1)
        (pin_create_growth, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins) - 2),
            domain_engagement.monthly_pin_creates[2:], 1)
        # Churning: both repins and Pin creates trend downward.
        return repin_growth < 0 and pin_create_growth < 0

Examples: Fashion sale and click bait sites
[Plot: Churning. Series: Pin creates, Repins. Axis labels: Few, Many, 0]
![Page 34: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/34.jpg)
Iteration 6: Yearly
Description: Slowly fade through the year
Calculation:

    import numpy as np

    def detect_churning(domain_engagement):
        # Fit straight lines to both series, skipping the first two months.
        (repin_growth, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins) - 2),
            domain_engagement.monthly_repins[2:], 1)
        (pin_create_growth, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins) - 2),
            domain_engagement.monthly_pin_creates[2:], 1)
        return repin_growth < 0 and pin_create_growth < 0

Examples: Seasonal fashion, such as snow boots
[Plot: Yearly. Series: Pin creates, Repins. Axis labels: Few, Many, 0]
![Page 35: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/35.jpg)
Iteration 7: Late bloomer
Description: Peak mid year
Calculation:

    import numpy as np

    def detect_late_bloomer(domain_engagement):
        # Fit a quadratic to combined monthly activity, skipping the first two months.
        (concavity, pin_growth, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins) - 2),
            [r + p for (r, p) in zip(domain_engagement.monthly_repins[2:],
                                     domain_engagement.monthly_pin_creates[2:])], 2)
        # A negative leading coefficient means the fitted curve opens downward.
        return concavity < 0

Examples: Blogs that get off to a slow start
[Plot: Pinterest late bloomer. Series: Pin creates, Repins. Axis labels: Few, Many, 0]
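As a reading aid for the calculation above, the quadratic fit can be written out. `np.polyfit` with degree 2 returns coefficients from the highest power down, so the three unpacked values correspond to $a$, $b$, and $c$ below; the symbols themselves are introduced here for illustration and do not appear on the slide.

```latex
\[
  y(t) \approx a\,t^{2} + b\,t + c,
  \qquad
  \frac{d^{2}y}{dt^{2}} = 2a .
\]
\[
  a < 0 \;\Longrightarrow\; \text{the fitted curve is concave, with its maximum at }
  t^{*} = -\frac{b}{2a} .
\]
```

The slide's rule tests only the sign of $a$. Whether the peak $t^{*}$ actually falls inside the observed months is not checked by that rule, so a "peak mid year" reading assumes it does.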
![Page 36: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/36.jpg)
Clusters
• Dark content
• Pinterest specials
• Steady growth
• Slow growth
• Churning
• Yearly
• Late bloomer
![Page 37: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/37.jpg)
Agenda
1. Explore Pinterest’s content
2. Question our understanding
3. Inspire the future
Design system
![Page 38: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/38.jpg)
Does asking twice yield the same answer?
Should we cluster again?
![Page 39: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/39.jpg)
Data science is expensive
Cost of replicating analysis is leaving other business opportunities on the table
![Page 40: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/40.jpg)
Would it make a difference?
Unknown
![Page 41: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/41.jpg)
Replication Crisis in Psychology
Silberzahn & Uhlmann; Crowdsourced research: Many hands make tight work
Nature, October 2015
![Page 42: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/42.jpg)
Crowdsourced study on red cards in soccer
Silberzahn & Uhlmann; Crowdsourced research: Many hands make tight work
Nature, October 2015
![Page 43: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/43.jpg)
The New York Times on predicting the presidency
September 2016
Cohn; We Gave Four Good Pollsters the Same Raw Data. They Had Four Different Results.
![Page 44: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/44.jpg)
Data science is expensive
… but we’ve lowered the cost!
![Page 45: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/45.jpg)
… 9 data scientists and machine learning engineers. Same data, same UI, same day. Everyone finished in ~1 hour.
…so we did it again
![Page 46: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/46.jpg)
Models a real world situation with limited resources
9 is huge!
![Page 47: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/47.jpg)
Were the results the same?
Everything was the same
![Page 48: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/48.jpg)
Existing clusters as our baseline
Compared against: Results e, Results l, Results d, Results m, Results z, Results b, Results k
Baseline clusters:
• Dark content
• Pinterest specials
• Steady growth
• Slow growth
• Churning
• Yearly
• Late bloomer
![Page 49: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/49.jpg)
90% Matches
Baseline clusters compared against Results e, l, d, m, z, b, k:
• Dark content: Unpopular (95%), Trailing (90%)
• Pinterest specials: Trailing (100%), Viral on Pinterest (98%), Pin creates drop off (97%)
• Steady growth: Increasing repins (94%), Continuous growth (94%)
• Slow growth: (no entries)
• Churning: (no entries)
• Yearly: (no entries)
• Late bloomer: (no entries)
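The slides do not define how the match percentages are computed. One plausible reading, offered only as an assumption, is the share of a replicate cluster's domains that also fall in the baseline cluster. The sketch below implements that reading; the function names and the sample sets are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def match_share(candidate: Set[str], baseline: Set[str]) -> float:
    """Fraction of the candidate cluster's domains that are in the baseline cluster."""
    if not candidate:
        return 0.0
    return len(candidate & baseline) / len(candidate)


def find_matches(
    baseline_clusters: Dict[str, Set[str]],
    replicate_clusters: Dict[str, Set[str]],
    threshold: float = 0.9,
) -> Dict[str, List[Tuple[str, float]]]:
    """For each baseline cluster, list replicate clusters whose share meets the threshold."""
    matches: Dict[str, List[Tuple[str, float]]] = {}
    for base_name, base_members in baseline_clusters.items():
        hits = []
        for name, members in sorted(replicate_clusters.items()):
            share = match_share(members, base_members)
            if share >= threshold:
                hits.append((name, share))
        matches[base_name] = hits
    return matches


# Made-up domains: "Unpopular" sits entirely inside the baseline, "Viral" only half.
baseline = {"Dark content": {"a", "b", "c", "d"}}
replicate = {"Unpopular": {"a", "b", "c"}, "Viral": {"a", "x"}}
strict = find_matches(baseline, replicate, threshold=0.9)
loose = find_matches(baseline, replicate, threshold=0.5)
```

Lowering the threshold from 0.9 to 0.5 admits more, weaker matches under this reading.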
![Page 50: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/50.jpg)
50% Matches
Baseline clusters compared against Results e, l, d, m, z, b, k:
• Dark content: Unpopular (95%), Trailing (90%), Original Pinny (84%)
• Pinterest specials: Trailing (100%), Minimal original Pins (66%), Viral on Pinterest (98%), Pin creates drop off (97%)
• Steady growth: Pinterest viral content (62%), Other (53%), Original Pinny (51%), Viral on the internet (69%), Increasing repins (94%), Continuous growth (94%), Suspected Save button high Pin creates (73%)
• Slow growth: Pinterest viral content (55%), Original Pinny (82%), Viral on the internet (65%), Increasing repins (65%), Continuous growth (86%), Suspected Save button high Pin creates (51%)
• Churning: Original Pinny (68%), Viral on the internet (53%)
• Yearly: Original Pinny (71%)
• Late bloomer: Original Pinny (71%), Continuous growth (55%), Suspected Save button high Pin creates (59%)
![Page 54: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/54.jpg)
Conceptually similar clusters
But not related in implementation
Baseline clusters and similar clusters from Results e, l, d, m, z, b, k:
• Yearly: Seasonal, Throwback, Seasonal, Annual
• Steady growth: Gaining popularity, Increasing repins, Continuous growth, High engagement
• Pinterest specials: Initial flurry, Minimal original Pins, Viral on Pinterest, Pin create drop off, Unpopular domains with good content
![Page 55: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/55.jpg)
Two roots of variations
• Good vs. bad
• Differences in perspective
![Page 56: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/56.jpg)
Signs of suboptimal clustering
• Leading with biases
• Cherry-picking: responding to a limited subset of the data
[Plot: Seasonal. Series: Pin creates, Repins. Axis labels: Few, 0]
![Page 57: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/57.jpg)
Differences of perspective
• Results m: Viral growth centric
  • Viral on Pinterest
  • Viral on the internet
  • Lame
• Results d: Original content centric
  • Persistent original Pins
  • Minimal original Pins
  • Original Pinny
• Results l: Return on investment centric
  • Underserved
  • Draught
  • Trailing
![Page 58: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/58.jpg)
Impact implications
9 data scientists, 9 answers
• Products depending on cluster used
  • Viral mechanisms
  • Speeding Pin demotion
  • Promoting underserved Pins
• For same product, domains impacted differ for
  • Seasonality
  • Steady growth
  • Pinterest specials
![Page 59: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/59.jpg)
Bottom line
It matters which data scientist does an analysis
![Page 60: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/60.jpg)
Agenda
1. Explore Pinterest’s content
2. Question our understanding
3. Inspire the future
Design system
![Page 61: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/61.jpg)
Let’s ask the hard question and brave the answer together
When is data science a house of cards?
![Page 62: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/62.jpg)
Avalanche of Resources
Measuring data science impact
• Experimental systems are now standard
• Data scientists are more available
• Reproducible analysis
• [Now] Fast replicable analysis
![Page 63: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/63.jpg)
Utilize Resources
Experiment
• Record end to end from analysis to impact
• Innovate on processes
• Borrow ideas on replication from science
• Tailor our techniques for replication
![Page 64: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/64.jpg)
Concrete experiments
Break down the problem and build up
• Narrow Difference in Perception through Priming analysts
• Develop a rubric of excellence
• Train analysts on generated data
• Add process stabilizers
![Page 65: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/65.jpg)
Pinterest is interested
pin.it/Data
Reach out!
Dr June Andrews [email protected] / DrAndrews/ DrJuneAndrews
![Page 66: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/66.jpg)
Let’s data science, data science!
Let’s crack the code to systematic innovation
![Page 68: Replication in Data Science - A Dance Between Data Science & Machine Learning Strata 2016](https://reader031.fdocuments.net/reader031/viewer/2022022412/58f9a90d760da3da068b6a68/html5/thumbnails/68.jpg)