Measurement of Personally Identifiable Information Exposures on the Web

Xiao Sophia Wang¹, Sam Burnett², Ben Greenstein³, David Wetherall¹,³
¹University of Washington, ²Georgia Tech, ³Intel Labs Seattle
http://www.cs.washington.edu/research/networking/privacy/

Motivation

[Figure: Browser and target website, with eavesdroppers, attackers, and ads/trackers]

- Personally identifiable information (PII) use is prevalent on the Web, but little is known about PII handling
- Risk is exposure to network eavesdroppers and 3rd-party websites
- Attacks include identity theft, stalking, spamming
- We measured the top 100 US websites to understand the prevalence and mechanisms of PII exposures in practice

System & Methodology

[Figure: Testing tool, browser running Piigeon, target website (HTTP/HTTPS), PII input, PII matches]

- Track PII exposure with a Firefox extension, Piigeon
  - Understands encryption, cookies, and destinations of PII (e.g., to 3rd parties)
- Test websites via automatic navigation through sites and forms, with manual entry

Measurement Results

Websites often and unnecessarily expose PII
- 97% sent PII in the clear. Why? Lowers server overhead?
- 35% sent passwords in the clear
- 26% sent PII to 3rd-party sites. Why? More convenient than opaque identifiers?
- 54% stored PII in cookies. Why? Simplifies client-side access to PII?

3rd party:   Ads   Trackers   Others
Full name      0          4        1
Username       5          6        1
City           2          2        2
Zip code       9          3        2

3rd-party domains: google-analytics.com, doubleclick.net, scorecardresearch.com, atdmt.com, yieldmanager.com, quantserve.com, bluelithium.com, clear-request.com, adnxs.com, 2mdn.net, advertising.com, doubleverity.com, edgesuite.net, scrapblog.com, specialclick.net, toptvbytes.com, turn.com, widgetbucks.com, dmtracker.com, revsci.net, mmismm.com, opt-intelligence.com, yieldmanager.net, googleadservices.com, media6degrees.com, simplyhired.com, imwx.com, dw.com.com, liveperson.net

Websites that expose PII

1. Send password in the clear
Bit.ly, Cnet.com, Cnn.com, Conduit.com, Digg.com, Ehow.com, Ezinearticles.com, Facebook.com, Foxnews.com, Espn.go.com, Huffingtonpost.com, Myspace.com, Nytimes.com, Photobucket.com, Pogo.com, Tumblr.com, Twitter.com, Washingtonpost.com, Wikipedia.org, Wordpress.com, 5 other adult sites

2. Send PII to third-party websites
photobucket.com, ehow.com, hulu.com, yelp.com, pandora.com, yahoo.com, foxnews.com, etsy.com, livejournal.com, washingtonpost.com, weather.com, bit.ly, linkedin.com, ezinearticles.com, typepad.com, cnet.com, nytimes.com, adobe.com

3. Send PII in cookies
Website        PII
Espn.go.com    Password
Hulu.com       Password; Zip code or City
Intuit.com     Credit card number; Username or Email; Names

Measurements were taken in May 2010.

Future Work

A tool to advise users…
[Figure: example warning, "Your password could be eavesdropped"]
- Predicts exposures as users hover over Web elements
- Analyzes website code, and browser state and history
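The poster describes Piigeon's tracking (encryption, cookies, and destinations of PII) only at a high level. The following is a minimal sketch of that kind of per-request classification, not Piigeon's actual code. The `Request` record, `site_of`, and `find_exposures` are hypothetical names. It assumes header names arrive in canonical case, and it approximates a "site" as the last two DNS labels instead of consulting the Public Suffix List.

```python
from dataclasses import dataclass, field
from urllib.parse import unquote_plus, urlsplit


@dataclass
class Request:
    """One outgoing browser request seen while testing a site (hypothetical model)."""
    page_url: str                                # top-level page the user is on
    url: str                                     # URL being requested
    headers: dict = field(default_factory=dict)  # assumed canonical names, e.g. "Cookie"
    body: str = ""


def site_of(host: str) -> str:
    """Naive registrable domain: the last two DNS labels.

    Assumption: a real tool would use the Public Suffix List, since this
    mislabels hosts such as "example.co.uk".
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def find_exposures(req: Request, pii: dict) -> list:
    """Report which known PII values appear in a request, and how they are exposed."""
    target = urlsplit(req.url)
    page = urlsplit(req.page_url)
    cleartext = target.scheme == "http"  # sent without TLS
    third_party = site_of(target.hostname or "") != site_of(page.hostname or "")

    # Places a PII value can leak; URL-decoded and lowercased before matching.
    fields = {
        "url": req.url,
        "cookie": req.headers.get("Cookie", ""),
        "referer": req.headers.get("Referer", ""),
        "body": req.body,
    }
    decoded = {name: unquote_plus(text).lower() for name, text in fields.items()}

    findings = []
    for kind, value in pii.items():
        needle = value.lower()
        where = [name for name, text in decoded.items() if needle and needle in text]
        if where:
            findings.append({
                "pii": kind,
                "where": where,
                "cleartext": cleartext,
                "third_party": third_party,
            })
    return findings
```

Plain substring matching on short values such as zip codes can produce false positives, so a real tool would likely need more context (for example, form field names); that refinement is left out of this sketch.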

Source: research.cs.washington.edu/networking/privacy/doc/osdi10... (2010-10-02)