Netflix Prize and Heritage Health Prize Philip Chan.
Cash Prizes to Stimulate Research
Ansari X Prize for Private Spaceflight (2004) [$10M] 100 km above earth twice within 2 weeks
DARPA Grand Challenge (2005) [$2M] autonomous vehicle: 131 miles in 10 hours
Archon X Prize for Genomics (2006) [$10M] map 100 human genomes in 10 days
Cash Prizes to Stimulate Research
Netflix Prize (2006) [$1M] Recommend movies with 10% improvement
Heritage Health Prize (2011) [$3M] Predict days in hospital next year with at most 0.4 error
Netflix Prize
Task: Given customer ratings on some movies, predict customer ratings on other movies
If John rates “Mission Impossible” a 5, “Over the Hedge” a 3, and “Back to the Future” a 4, how would he rate “Harry Potter”, … ?
Performance: Error rate (accuracy)
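To make the task concrete, here is a minimal sketch of a naive baseline: predict a customer's rating for an unseen movie as the mean of the ratings that customer has already given. This is only an illustration of the input and output of the task, not the method used by any contestant; the function name `predict_user_mean` and the fallback value are assumptions.

```python
def predict_user_mean(known_ratings, default=3.0):
    """Predict a rating for an unseen movie as the user's mean known rating.

    known_ratings maps movie title -> rating (1 to 5).
    default is a hypothetical fallback for users with no ratings yet.
    """
    if not known_ratings:
        return default
    return sum(known_ratings.values()) / len(known_ratings)


# John's ratings from the example above.
john = {"Mission Impossible": 5, "Over the Hedge": 3, "Back to the Future": 4}

prediction = predict_user_mean(john)
print(prediction)  # 4.0
```

A baseline like this ignores everything about the movie itself, which is why it serves only as a reference point for better algorithms.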
Cash Award
Grand Prize $1M 10% improvement by 2011 (in 5 years)
Progress Prize $50K per year 1% improvement
Intellectual Property
Netflix has a non-exclusive license to the algorithm
Authors tell the world what the algorithm is
Leader Board
Started on Oct 2, 2006
Improvement by the top algorithm:
after a week: ~0.9%
after two weeks: ~4.5%
after a month: ~5%
after a year: ~8.4%
after two years: ~9.4%
July 26, 2009 (less than 3 years): 10%
Winner
BellKor’s Pragmatic Chaos 7 members Merger of 3 teams
BellKor AT&T Labs, USA & Yahoo! Research, Israel
PragmaticTheory telecommunications, Canada
BigChaos started a company, Austria
A combination of different algorithms
Runner-up
The Ensemble ~30 members “last-minute” merger
teams had 30 days to beat the first team that crossed the 10% threshold
same accuracy as the winner, but behind by 20 minutes!
Heritage Provider Network
Has a network of doctors in California
Can we identify earlier those most at risk and ensure they get the treatment they need?
Can we reduce unnecessary hospitalizations?
Heritage Health Prize
Launch http://www.youtube.com/watch?v=GuZ8nkpygAs
Given patient data, predict how many days a patient will spend in a hospital in the next year
The prediction helps develop strategies to reduce emergencies and hence hospitalizations
Grand Prize
$3M if the error is at most 0.4 (~0.5 day) by Apr 4, 2013 [2 years]
$500K Consolation Prize if no team brings the error down to 0.4
Milestone Prizes
top 2 performers at each milestone
Aug 31, 2011 $30K, $20K
Feb 13, 2012 $50K, $30K http://www.youtube.com/watch?v=pkmkNnGyihY
Sep 4, 2012 $60K, $40K
Performance of Algorithms
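Prediction Error Rate (RMSLE)
$$\mathrm{RMSLE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\mathrm{prediction}_i - \mathrm{real}_i)^2}$$
where real_i = log ( actual # of days + 1 ) and prediction_i = log ( predicted # of days + 1 ) for patient i, and n is the number of patients
Prediction error threshold = 0.4 (~0.5 day)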
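The RMSLE definition can be sketched in a few lines of Python. This is a minimal illustration, assuming the logarithm is the natural log (which is consistent with the "~0.5 day" reading of the 0.4 threshold); the function name `rmsle` is an assumption, not part of the competition materials.

```python
import math


def rmsle(actual_days, predicted_days):
    """Root mean squared logarithmic error between two equal-length lists."""
    if len(actual_days) != len(predicted_days) or not actual_days:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for actual, predicted in zip(actual_days, predicted_days):
        real = math.log1p(actual)           # log(actual # of days + 1)
        prediction = math.log1p(predicted)  # log(predicted # of days + 1)
        total += (prediction - real) ** 2
    return math.sqrt(total / len(actual_days))


# For a patient with 0 actual days, an error of 0.4 corresponds to
# predicting exp(0.4) - 1 days, which is roughly half a day.
half_day = math.expm1(0.4)
print(round(half_day, 2))  # 0.49
threshold_error = rmsle([0], [half_day])
```

Because the error is measured on log(days + 1), being off by one day matters much more for a patient who stays 0 days than for one who stays 14.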
Intellectual Property
Exclusive license to Sponsor and participant’s own use
Algorithms not previously published
Use of data sets is for the competition only; written consent is required for other purposes
Data Sets
Training and validation data sets For participants to design algorithms
Feedback data set For calculating standings on Leaderboard
Scoring data set For determining winners for prizes
http://www.heritagehealthprize.com/c/hhp/Data
Data (in CSV format)
Members Data (113K members)
Claims Data (2.7M claims)
Drug Count Data (818K prescriptions)
Lab Count Data (361K labs)
Outcome Data (76K in Y2, 71K in Y3)
Target (71K in Y4 for prediction)
Total ~264 MB (including other files)
Claims Data
MemberID
ProviderID
Vendor ID
PCP (Primary care physician) ID
Year
Specialty (of physician/vendor?)
PlaceSvc (place of service): office, outpatient hospital, inpatient hospital, …
PayDelay (between service and payment)
Claims Data [continued]
LengthOfStay (in hospital)
DSFS (days since first claim)
PrimaryConditionGroup (diagnostic categories)
CharlsonIndex (effect of diseases on illness)
ProcedureGroup (intervention categories)
SupLOS (supplement to LengthOfStay): 1 if LengthOfStay is NULL because of de-identification
Lab Count Data
MemberID
Year
DSFS (days since first service)
LabCount (unique lab or pathology tests)
Outcome Data
MemberID
DaysInHospital_Y2 (claims in Y1), i.e., predict Y2 based on Y1
DaysInHospital_Y3 (claims in Y2)
ClaimsTruncated: 1 for members with “truncated” claims
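The pairing of one year's claims with the next year's outcome can be sketched as follows. This is a minimal illustration on made-up rows, not the real files: the column names MemberID, Year, PlaceSvc and DaysInHospital_Y2 come from the slides, while the "Y1"/"Y2" year labels, the row values, and the single feature (number of claims in Y1) are assumptions.

```python
import csv
import io
from collections import Counter

# Toy stand-ins for the claims and outcome CSV files (made-up rows).
claims_csv = """MemberID,Year,PlaceSvc
101,Y1,Office
101,Y1,Inpatient Hospital
102,Y1,Office
101,Y2,Office
"""
outcome_csv = """MemberID,DaysInHospital_Y2
101,3
102,0
"""

# Feature: number of claims each member filed in Y1.
claims_in_y1 = Counter(
    row["MemberID"]
    for row in csv.DictReader(io.StringIO(claims_csv))
    if row["Year"] == "Y1"
)

# Training rows: (MemberID, Y1 claim count, days in hospital in Y2).
training_rows = [
    (row["MemberID"], claims_in_y1.get(row["MemberID"], 0), int(row["DaysInHospital_Y2"]))
    for row in csv.DictReader(io.StringIO(outcome_csv))
]
print(training_rows)  # [('101', 2, 3), ('102', 1, 0)]
```

The same pattern would apply to Y2 claims with DaysInHospital_Y3, and finally to Y3 claims to predict the Y4 target.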
Using Other Data?
Yes, if the data is freely available to anyone (public source)
The URL needs to be published to the forum
Except for demographic, socioeconomic or clinical information about the members
Naive Algorithms
For predicting the number of Days in Hospital in the next year
Posted as “benchmarks” on the Leaderboard
Always Predict 15 (max)
i.e., assume everyone goes to the hospital for at least 15 days
RMSLE = 2.628062, 550+% over threshold
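A small sketch shows why the "always predict 15" benchmark scores so poorly. The actual values below are made up (mostly zeros, as an assumption about what such data tends to look like) and do not reproduce the 2.628062 figure; the helper names are hypothetical. Note that if every patient actually spent 0 days in hospital, this benchmark would score log(16) ≈ 2.77, so the reported score is close to the worst case.

```python
import math


def rmsle(actual_days, predicted_days):
    """Root mean squared error between log(days + 1) values."""
    squared = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual_days, predicted_days)
    ]
    return math.sqrt(sum(squared) / len(squared))


def constant_benchmark(actual_days, constant):
    """Score a naive algorithm that predicts the same value for everyone."""
    return rmsle(actual_days, [constant] * len(actual_days))


# Made-up outcomes: most members spend no days in hospital.
toy_actual = [0, 0, 0, 0, 0, 0, 0, 1, 2, 15]

score_15 = constant_benchmark(toy_actual, 15)
score_0 = constant_benchmark(toy_actual, 0)
print(f"always 15: {score_15:.3f}, always 0: {score_0:.3f}")
```

On data like this, always predicting 0 beats always predicting 15 by a wide margin, which is why the maximum-value benchmark is useful only as a lower bar.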
Leader Board
Competition started on Apr 4, 2011 with partial data
All data were released on June 4, 2011
Sep 9, 2011 RMSLE: 0.456384 ~14.1% over threshold
Aug 29, 2012 RMSLE: 0.450426 ~12.6% over threshold
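The "% over threshold" figures follow from simple arithmetic on the 0.4 threshold: (RMSLE - 0.4) / 0.4. A minimal sketch, using only scores quoted in these slides (the function name is an assumption):

```python
THRESHOLD = 0.4  # Grand Prize error threshold


def percent_over_threshold(score, threshold=THRESHOLD):
    """How far a leaderboard RMSLE is above the threshold, in percent."""
    return (score - threshold) / threshold * 100.0


scores = {
    "Always predict 15": 2.628062,
    "Sep 9, 2011 leader": 0.456384,
    "Aug 29, 2012 leader": 0.450426,
}

for label, score in scores.items():
    print(f"{label}: {percent_over_threshold(score):.1f}% over threshold")
```

In almost a year, the leading score moved only from about 14.1% to about 12.6% over the threshold.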
Teams
Form your own teams: www.heritagehealthprize.com
Join my team: CSE 4403 Independent Study or CSE 5801 Independent Research