Surviving the AB Testing Hype Cycle - Reaktor Breakpoint 2015
Surviving the Hype Cycle: Shortcuts to Split Testing Success
@OptimiseOrDie, Yo Helsinki!
@OptimiseOrDie
• Split Testing, Analytics, UX
• Lean, Agile, Growth Optimisation
• 100M+ visitors tested
• 200+ sites, 26 languages
• Starting testing? Bad ROI? Want to scale volume & quality? Get in touch!
@OptimiseOrDie
colinski.tumblr.com
The Gartner Hype Cycle ™ (as applied to A/B testing):
1. Tool Installed
2. Stupid testing
3. Scaled-up Stupidity
4. Peak of Stupidity
5. ROI questioned
6. Statistics debunked
7. Faith crisis
8. The Trough of Testing
9. Where, How, Why
10. Data science
11. Testing to learn
12. Innovation Testing – Hacking Business Futures
(Curve labels: the Slope of Stupidity climbs to the peak; the Slide of Moodiness drops into the trough.)
[Image slides: viral #fail videos – 26.6M views; 28.4M views ("Oppan Gangnam Style!"); 6.9M views ("You been naughty again?")]
10 Shortcuts to Testing Success
1. Get Analytics Health Checked
2. Test in the right place
3. Understand Cross-Device
4. Do your Research
5. Prioritise your testing
6. Perform Pre-Flight Checks
7. Know how long to test
8. Have a good reason to test
9. Learn from your tests
10. Burn down the silos
1. Your Analytics Setup is Broken
• Nearly 100 sites in 3 years
• 95% were broken, often badly
• Trust in the data was missing
• Management made bad calls
• Nobody checked the tills
• Calibrate from the basics up!

1. What about MY clients?
• What sales do we capture?
• What categories?
• What about refunds, lunch money, gift certificates?
• How do we monitor fraud?
• Do we check it adds up?
• Where does this data go?
1. Bulls***t flows upwards!
[Diagram: BS metrics and BS collection at the TILLS feed BS reports at DEPT and STORE level, which feed a cool BS dashboard at DIVISION level.]
1. Get an Analytics Health Check
• Review takes 1-3 days
• Prioritise the issues
• Fix directly with developers
• Integrate with the testing tool
2. You Test in the Wrong Place
2. Let's do Random Testing
"Let's try the homepage!" · "I've got targets to hit!" · "I hate this job" · "Let's test button colours!"
• Has lots of opinions but no data
• Spends too much time on …
• Driven by Ego and Competitors
• Wishes he cared about testing

"STOP copying your competitors. They may not know what the f*** they are doing either." – Peep Laja, ConversionXL
2. Best Practice Testing?
• Your customers are not the same
• Your site is not the same
• Your advertising and traffic are not the same
• Your UX is not the same
• Your X-Device mix is not the same
• You have no idea of the data behind them
• Use them to inform or suggest approaches
• Use them for ideas
• Do not use them as a playbook
• It will make you very unhappy
2. Modelling – Intent
[Diagram: all traffic splits by visitor intent (Hearing, Sight, Store, Other), each flowing through Step 1 → Step 2 → Step 3 → Goal across the site's pages.]

2. Modelling – Multiple Endings
[Diagram: all traffic splits by intent and influence, with influence pages feeding the Step 1 → Step 2 → Step 3 → Goal flow from the entry page.]

2. Modelling – Ring Model

2. Modelling – Horizontal Funnels
[Diagram: Entry Page → Search or Category → Product Page → Add to Basket → View Basket → Checkout, with bounces and drop-offs at each step.]
2. Test in the Right Places
• Do some analytics modelling
• Understand the shedding of layers
• Narrow your focus and scope
• Bank better gains earlier in time
3. Responsive solves everything, right?
1. Motorola Hardware Menu Button
2. MS Word Bullet Button
3. Android Holo Composition Icon
4. Android Context Action Bar Overflow (top right on Android devices)
Increase in revenue of > $200,000 per annum!
bit.ly/hamburgertest
Mystery Meats of Mobile
BURGER SHISH DONER
3. Our customers use iPhones, right?
• Do you really know your mix? Most people undercount Android!
• What iPhone models visit?
• How big is tablet traffic?
• What screen sizes do they have?
• Find out BEFORE you design tests
• Check BEFORE you launch tests
• Use Google Analytics to find out – 3 reports to rule them all:
https://www.google.com/analytics/web/template?uid=lpVf8LveSqyd3mdsHjdfzQ
https://www.google.com/analytics/web/template?uid=fmUzp_gzRIy7LnvZJjCDOQ
https://www.google.com/analytics/web/template?uid=y7sYIXDhQrmswHAiNo8iLA
3. What iPhone Models do we see?
Screen resolution → model:
320 x 480 = iPhone 4/4S
320 x 568 = iPhone 5/5S
375 x 667 = iPhone 6
414 x 736 = iPhone 6+
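The resolution-to-model mapping above is easy to automate when you export a screen-resolution report from Google Analytics. A minimal sketch, assuming rows of (resolution string, session count); the function and variable names are mine, not from the talk:

```python
# Map the GA screen-resolution strings above to iPhone models.
IPHONE_BY_RESOLUTION = {
    "320x480": "iPhone 4/4S",
    "320x568": "iPhone 5/5S",
    "375x667": "iPhone 6",
    "414x736": "iPhone 6+",
}

def classify(resolution: str) -> str:
    """Return the likely iPhone model for a screen-resolution string."""
    return IPHONE_BY_RESOLUTION.get(resolution.replace(" ", ""), "unknown")

def model_mix(rows):
    """Aggregate (resolution, sessions) rows into per-model session counts."""
    mix = {}
    for resolution, sessions in rows:
        model = classify(resolution)
        mix[model] = mix.get(model, 0) + sessions
    return mix
```

Run this over the exported report before designing a test, so you know which devices actually matter.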
3. Figure Out the Device Mix for Testing
• Desktop browsers & versions
• Tablet models
• Mobile device models
• Screen resolutions
4. You don't do any Research before testing?
Is anything holding you back from doing conversion research?
1. Time
2. Client/Company buy-in
3. Budget
4. Don't know where to start
4. If you have 4 hours PLUS
• Snap interviews (Sales, Customer Services, Tech Support)
• Run a quick poll or survey (see my tools slides)
Less Bullshit!
4. 1 Hour Page Analytics
Page or process: influence pages, entry points, landing pages, device mix, customer mix, traffic mix, flow, intent, marketing → site flow, next steps, abandonment, exits, mix of abandonment.
4. 1 Hour Landing Page Analytics
• How old are the visitors? https://www.google.com/analytics/web/template?uid=hab8Ta93SCCffUpjefjtNQ
• What are the key metrics like (e.g. bounce rate, conversion)? https://www.google.com/analytics/web/template?uid=hab8Ta93SCCffUpjefjtNQ
• What is the goal or ecommerce conversion through this page? https://www.google.com/analytics/web/template?uid=hab8Ta93SCCffUpjefjtNQ
• What channel traffic comes to the page? https://www.google.com/analytics/web/template?uid=Kjb9q8M4QN-fsPe8dOGaig
• What is the mix of tablet / mobile / desktop to the page? https://www.google.com/analytics/web/template?uid=wLMUWs8eTIa3_mmQHOtPkw
• What are the resolutions of devices? https://www.google.com/analytics/web/template?uid=wLMUWs8eTIa3_mmQHOtPkw
• How slow are the landing pages? https://www.google.com/analytics/web/template?uid=AavFsgMoRkucYYKnxlB76Q
• What are the pages right after the landing page? (Use a landing page report and choose 'Entrance Paths' to show next pages.)
• What is the flow like from this page? (Use the Behaviour Flow report.)
• What does it look like on the top devices? (Use real devices plus Appthwack.com, Crossbrowsertesting.com or Deviceanywhere.com.)
[Diagram: channels (PPC, Organic, Display, Email, Social) via ad / SERP / banner / email / campaign / affiliate / referrer → landing page template → interaction, step or layer → goal reached; across desktop, mobile and tablet.]
4. If you have 2 hours
• Form analytics data
• Scroll or click maps
• Session recording videos (Hotjar, Decibel Insight, Yandex)
• Make a horizontal funnel from the landing page
• Check the marketing creatives / SERP fully
• Look at Landing Page ZERO!
4. If you have 4 hours
• Set up a poll or survey (see my tools slides)
• Set up an exit (bail) survey
• Friends, family, new-employee user testing
• Guerrilla user testing
• Snap interviews (5-10 minutes): Customer Services, Sales team (if applicable), then customers
• 5 Second Test
• Article is here: bit.ly/conversiondeadline
4. No Excuses – Do Your Research
• Lean analytics
• UX research
• Interviewing
• Surveys and polls
5. You don't Prioritise your Tests
Scoring can cover cost, time to market, resource, risk and political complexity:
• Cost: 1-10, higher is cheaper
• Time: 1-10, higher is shorter
• Opportunity: 1-10, higher is greater
Score = Cost * Time * Opportunity
• For financial institutions, risk should be a factor
• Want to build your own? Ask me!
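The scoring scheme is simple enough to sketch in a few lines. The candidate test names and their ratings below are hypothetical, invented for illustration; the formula is the one above:

```python
# Score = Cost * Time * Opportunity, each rated 1-10
# (higher cost score = cheaper, higher time score = shorter,
#  higher opportunity score = bigger expected win).

def score(cost: int, time: int, opportunity: int) -> int:
    for value in (cost, time, opportunity):
        if not 1 <= value <= 10:
            raise ValueError("each rating must be between 1 and 10")
    return cost * time * opportunity

# Hypothetical candidates: (name, cost, time, opportunity)
candidates = [
    ("Checkout address form", 8, 9, 6),   # cheap, quick, decent upside
    ("Homepage hero image",   9, 9, 2),   # cheap and quick, small upside
    ("Product page layout",   4, 5, 9),   # expensive but big opportunity
]

# Highest score first = what to test next.
ranked = sorted(candidates, key=lambda c: score(*c[1:]), reverse=True)
```

Sorting by the product rather than any single factor is what keeps cheap-but-pointless tests from floating to the top.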
5. Opportunity vs. Cost
[Chart: Opportunity (0-10) plotted against Cheapness (0-10, high is better); the cheap, high-opportunity corner is marked MONEY!]
5. Make a Money Model

Test               | Description          | Metric                    | 2% lift | 5% lift   | 10% lift  | Estimate
Product page       | Simplification       | Basket adds               | 200,000 | 500,000   | 1,000,000 | 500,000
Register new       | Improve onboarding   | New register funnel ratio | 25,000  | 62,500    | 125,000   | 250,000
IE8 bugs in cart   | Fix key broken stuff | IE8 conversion            | 80,000  | 200,000   | 400,000   | 200,000
Category list page | Get product higher   | User Category -> Product  | 500,000 | 1,250,000 | 2,500,000 | 1,250,000
Payment page       | New card handling    | User Payment -> Thank you | 60,000  | 150,000   | 300,000   | 300,000
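The lift columns in a money model like this are straight multiplication: revenue flowing through the tested step times each assumed lift. A minimal sketch (the 10M baseline is inferred from the product-page row, where 2% / 5% / 10% lifts give 200k / 500k / 1M):

```python
# Estimate annual revenue impact of a test at several assumed lifts.

def money_model(annual_revenue_through_step: float,
                lifts=(0.02, 0.05, 0.10)) -> dict:
    """Map each assumed relative lift to its estimated annual revenue gain."""
    return {f"{lift:.0%}": annual_revenue_through_step * lift
            for lift in lifts}

# Product-page row: ~10M flows through basket adds annually.
estimates = money_model(10_000_000)
```

Building the whole table this way makes the assumptions explicit, so a stakeholder can challenge the baseline figure rather than the arithmetic.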
5. Prioritise your Testing Targets
• Score all test targets
• Use Cost vs. Opportunity as a minimum
• Check it works!
• Make a Money Model
6. You Don't Test Before Launch
• Dirty secret of AB testing? People break their tests all the time!
• Most people don't notice. Why? Because developers can break them very easily.
• What if your AB test was broken on iPhones? If you didn't know, would your results be valid?
• About 40% of my tests fail basic QA

Browser checks: www.crossbrowsertesting.com, www.browserstack.com, www.spoon.net, www.saucelabs.com
Mobile & tablet: www.appthwack.com, www.deviceanywhere.com, www.opendevicelab.com
Article & info: bit.ly/devicetesting

6. Here is my £80M testing rig!
6. Always Use Real Devices

6. Perform Pre-Flight Checks
• Check every test works
• Browsers and devices
• Check analytics records correctly
7. You Stop When you Hit 95% Confidence

The 95% Stopping Problem
• Many people use 95% or 99% 'confidence' to stop
• This value is unreliable and moves around
• Nearly all my tests reach significance before they are actually ready
• You can hit 95% early in a test (18 minutes!)
• If you stop, it could be a false result
• Read this Nature article: bit.ly/1dwk0if
• Optimizely and VWO have updated their tools
• This 95% thingy must be LAST on your stop list

                       | Scenario 1    | Scenario 2    | Scenario 3    | Scenario 4
After 200 observations | Insignificant | Insignificant | Significant!  | Significant!
After 500 observations | Insignificant | Significant!  | Insignificant | Significant!
End of experiment      | Insignificant | Significant!  | Insignificant | Significant!

"You should know that stopping a test once it's significant is deadly sin number 1 in A/B testing land. 77% of A/A tests (testing the same thing as A and B) will reach significance at a certain point." – Ton Wesseling, Online Dialogue
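You can demonstrate the peeking problem yourself. A rough stdlib-only simulation, not from the talk: run A/A tests where both arms share the same true conversion rate, check a two-proportion z-test at every interim peek, and count how often "significance" is hit at least once. The parameter values (5% conversion, 5,000 visitors per arm, a peek every 100) are illustrative; the exact inflated rate depends on them, but it lands well above the nominal 5%:

```python
import math
import random

def z_stat(ca, na, cb, nb):
    """Two-proportion z-statistic for conversions ca/na vs cb/nb."""
    p = (ca + cb) / (na + nb)
    se = math.sqrt(p * (1 - p) * (1 / na + 1 / nb))
    return 0.0 if se == 0 else (ca / na - cb / nb) / se

def aa_test_ever_significant(rate=0.05, n=5000, peek_every=100,
                             z_crit=1.96, rng=random):
    """Simulate one A/A test; True if any interim peek shows |z| > z_crit."""
    ca = cb = 0
    for i in range(1, n + 1):
        ca += rng.random() < rate
        cb += rng.random() < rate
        if i % peek_every == 0 and abs(z_stat(ca, i, cb, i)) > z_crit:
            return True   # we "stopped the test" on a false positive
    return False

rng = random.Random(42)
runs = 200
false_positives = sum(aa_test_ever_significant(rng=rng) for _ in range(runs))
rate_with_peeking = false_positives / runs  # far above the nominal 5%
```

Even though A and B are identical, repeated checking turns a 5% error rate into a much larger one, which is exactly why "stop at 95%" is a trap.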
“Statistical Significance does not equal Validity”http://bit.ly/1wMfmY2
“Why every Internet Marketer should be a Statistician”http://bit.ly/1wMfs1G
“Understanding the Cycles in your site”http://mklnd.com/1pGSOUP
7. Know How Long to Test for…
• TWO BUSINESS CYCLES minimum (week/month)
• 1 PURCHASE CYCLE minimum (or most of one)
• 250 CONVERSIONS minimum per creative (350, 500 or more if creative response is similar)
• FULL WEEKS/CYCLES, never part of one
• KNOW what marketing, competitors and cycles are doing
• RUN a test length calculator: bit.ly/XqCxuu
• SET your test run time, RUN IT, STOP IT, ANALYSE IT
• ONLY RUN LONGER if you need more data
• DON'T RUN LONGER just because the test isn't giving the result you want!
7. Know How Long to Test for
• Most critical mistake
• Use a test calculator
• Full business cycles, 2 minimum
• Don't waste time hoping
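A back-of-envelope version of the calculator linked above can be written in a few lines. This is my sketch of the standard two-proportion sample-size formula (5% alpha, 80% power), not the bit.ly tool itself; the two-week floor encodes the "two full business cycles" rule:

```python
import math

def sample_size_per_variant(baseline, min_lift, z_alpha=1.96, z_beta=0.84):
    """Visitors needed per variant to detect a relative lift of min_lift."""
    p1 = baseline
    p2 = baseline * (1 + min_lift)
    pbar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * pbar * (1 - pbar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

def run_time_weeks(baseline, min_lift, daily_visitors_per_variant):
    """Estimated run time, rounded up to full weeks, never under two."""
    days = sample_size_per_variant(baseline, min_lift) / daily_visitors_per_variant
    return max(2, math.ceil(days / 7))
```

For example, detecting a 10% relative lift on a 5% baseline needs roughly 31,000 visitors per variant, so decide the run time up front, run it, stop it, analyse it.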
8. Insight Inputs – #FAIL
Competitor copying · Guessing · Dice rolling · An article the CEO read · Competitor change · Panic · Ego · Opinion · Cherished notions · Marketing whims · Cosmic rays · Not 'on brand' enough · IT inflexibility · Internal company needs · Some dumbass consultant · Shiny feature blindness · Knee-jerk reactions
8. So you think you have a Hypothesis?
Insight inputs: segmentation, surveys, sales and call centre, session replay, social analytics, customer contact, eye tracking, usability testing, forms analytics, search analytics, voice of customer, market research, A/B and MVT testing, big & unstructured data, web analytics, competitor evals, customer services.
1. Because we saw (data/feedback)
2. We expect that (change) will cause (impact)
3. We'll measure this using (data metric)
bit.ly/hyp_kit
8. Let's try a real one – use this to deflect stupid testing!
1. Because we saw (an angry email from the CEO)
2. We expect that (changing button colours) will cause (the office to cool down for a day)
3. We'll measure this using (some metric we pluck out of the air – whatever, man)
bit.ly/hyp_kit
8. Get a Proper Hypothesis Going
• Don't do ego-driven testing
• Use the Hypothesis Kit! bit.ly/hyp_kit
9. Our Testing teaches us Nothing!
• Either your research or hypothesis is weak
• Work back from the outcome!
  – What if A won – what would that tell us?
  – What if A failed – what would that tell us?
• What is the value to the business in finding out the answer?
• Is the finding actionable widely and deeply?
• Testing isn't about lifts – it's about learning

"You are trying to run a bundle of tests, whose expected additional information will give you the highest return." – Matt Gershoff, CEO, Conductrics.com

9. Design Tests for Maximum Learning
• Do your research
• Form a solid hypothesis
• Work back from the outcomes
• Learning useful stuff = huge lifts
10. Burn Down the Silos
• Non-agile, non-iterative design
• Silos work on product separately
• No 'One Team' per product/theme
• Large teams, unwieldy coordination
• Pass the product around
• More PMs and BAs than a conference
• Endless sucking signoff
• AB testing done the same way!
10. FT Example
• Small teams (6-15) with direct access to publish
• Ability to set and get metrics data directly
• Tools, autonomy, lack of interference
• No Project Managers or Business Analysts
• Business defines 'outcomes' – teams deliver
• No long signoff chain
• No pesky meddling fools
• 18-month projects over budget?
• 100s of releases a day!
• MVP approach
• Launch as alpha, beta, pilot, phased rollout
• Like getting in a shower
• Read more at labs.ft.com
10. Positive Attributes
• Rapid, iterative, user-centred & agile design. No silos.
• Small empowered autonomous teams
• Polymaths and overlap
• Toolkit & analytics investment
• Persuasive copywriting & psychology
• Great testing & optimisation tools

10. Burn Down the Silos!
• Agile, lean, iterative cross-silo teams
• Ability to get and set metrics
• Autonomy, control, velocity
• Iterative MVP approach
• Work on outcomes, not features
“If you think of technology as something that’s spreading like a sort of fractal stain, almost every point on the edge represents an interesting problem.”Paul Graham
Rumsfeldian Space
[Chart: ROI vs. Time]
• What if we changed our prices?
• What if we gave away less for free?
• What if we took this away?
• What about 3 packages, not 5?
• What are these potential futures I can take?
• How can I know before I spend money?
• McDonald's Hipster Test Store: bit.ly/1TiURi7
Congratulations!
Today you're the lucky winner of our random awards programme. You get all these extra features for free, on us. Enjoy!

Innovation Testing
[Slides: 2004 Headspace – "What I thought I knew in 2004" vs. Reality; 2015 Headspace – "What I KNOW I know" vs. "Me, on a good day". WE'RE ALL WINGING IT. Guessaholics Anonymous.]
[The Hype Cycle slide, reprised]
Thank You!
Email me: [email protected]
Slides: bit.ly/reaktorOOD
LinkedIn: linkd.in/pvrg14