Private Communications Issues in the Social Web era
CS315 – Web Search and Data Mining
The AOL search data release
By the way, …
… search companies log your searches …
Privacy concerns
• Data is often collected silently
  • The Web allows large quantities of data to be collected inexpensively and unobtrusively
• Data from multiple sources may be merged
  • Non-identifiable information can become identifiable when merged
• Data collected for business purposes may be used in civil and criminal proceedings
• Users are given no meaningful choice
  • Few sites offer alternatives
Browser Chatter
Browsers chatter about
• IP address, domain name, organization
• Referring page
• Platform: O/S, browser
• What information is requested: URLs and search terms
• Cookies
To anyone who might be listening
• End servers
• System administrators
• Internet Service Providers
• Other third parties: advertising networks
• Anyone who might subpoena log files later
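As a rough illustration of how much a single referring-page URL can disclose, the sketch below pulls the search terms out of a Referer value. The function name `search_terms_from_referer` is hypothetical, and the assumption that the terms sit in a `q` query parameter holds for the Google URL that appears in the HTTP request example later in this transcript, not for every search engine.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms_from_referer(referer):
    """Return (host, search terms) that a Referer header reveals.

    Assumes the search engine puts the query in a parameter named "q".
    """
    parts = urlsplit(referer)
    query = parse_qs(parts.query)
    return parts.hostname, query.get("q", [])

# Referer value taken from the request header shown later in the transcript.
host, terms = search_terms_from_referer("http://www.google.com/search?hl=en&q=s2000")
print(host, terms)
```

Any server that receives this header, and anyone who later reads its logs, can recover the same information.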
Cookies 101
• Cookies can be useful
  • Used like a staple to attach multiple parts of a form together
  • Used to identify you when you return to a web site so you don’t have to remember a password
  • Used to help web sites understand how people use them
• Cookies can do unexpected things
  • Used to profile users and track their activities, especially across web sites
How cookies work – the basics
• A cookie stores a small string of characters
• A web site asks your browser to “set” a cookie
• Whenever you return to that site your browser sends the cookie back automatically
First visit to site: site → browser: “Please store cookie xyzzy”
Later visits: browser → site: “Here is cookie xyzzy”
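The set-then-replay exchange can be sketched in a few lines of Python. This is a toy model, not how any real browser is implemented: the `ToyBrowser` class, the site name `example.test`, and the cookie name `id` are made up for illustration; only the value `xyzzy` comes from the slide.

```python
from http.cookies import SimpleCookie

class ToyBrowser:
    """Keeps one small cookie store per site (hypothetical model)."""

    def __init__(self):
        self.jar = {}  # site name -> {cookie name: value}

    def receive(self, site, set_cookie_header):
        # The site asks the browser to "set" a cookie.
        parsed = SimpleCookie()
        parsed.load(set_cookie_header)
        store = self.jar.setdefault(site, {})
        for name, morsel in parsed.items():
            store[name] = morsel.value

    def cookie_header_for(self, site):
        # On every later request the browser sends back what that site stored.
        store = self.jar.get(site, {})
        return "; ".join(f"{name}={value}" for name, value in store.items())

browser = ToyBrowser()
first_visit = browser.cookie_header_for("example.test")   # nothing stored yet
browser.receive("example.test", "id=xyzzy")               # "Please store cookie xyzzy"
later_visit = browser.cookie_header_for("example.test")   # "Here is cookie xyzzy"
print(repr(first_visit), repr(later_visit))
```

Note that the replay is automatic: the user takes no action on later visits, which is what makes cookies convenient and also what makes them usable for tracking.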
How cookies work – advanced
• Cookies are only sent back to the “site” that set them, but this may be any host in the domain
• Sites setting cookies indicate path, domain, and expiration for cookies
• Cookies can store user info or a database key that is used to look up user info
Example cookie instructions:
• “Send me with any request to x.com until 2008”
• “Send me with requests for index.html on y.x.com for this session only”
Example cookie contents: Visits=13 User=4576904309
Database: Users … Email … Visits …
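The domain, path, and expiration rules can be sketched as a single matching function. This is a simplified model under stated assumptions: the `Cookie` dataclass and `should_send` are hypothetical names, path matching is a plain prefix test, "until 2008" is taken to mean 1 January 2008, and the pairing of the slide's two values with the two instructions is a guess made only for the example. Real browsers apply more detailed rules.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Cookie:
    name: str
    value: str
    domain: str
    path: str = "/"
    expires: Optional[datetime] = None  # None means "this session only"

def should_send(cookie, host, path, now, same_session=True):
    """Decide whether the browser replays `cookie` on a request (simplified)."""
    # Domain rule: the exact host, or any host inside the cookie's domain.
    domain_ok = host == cookie.domain or host.endswith("." + cookie.domain)
    # Path rule: simplified to a prefix match.
    path_ok = path.startswith(cookie.path)
    # Expiration rule: session cookies die with the session.
    alive = same_session if cookie.expires is None else now < cookie.expires
    return domain_ok and path_ok and alive

# "Send me with any request to x.com until 2008" (exact date is an assumption)
wide = Cookie("Visits", "13", domain="x.com", path="/", expires=datetime(2008, 1, 1))
# "Send me with requests for index.html on y.x.com for this session only"
narrow = Cookie("User", "4576904309", domain="y.x.com", path="/index.html")

now = datetime(2007, 6, 1)
print(should_send(wide, "y.x.com", "/anything", now))      # any host in the domain
print(should_send(narrow, "x.com", "/index.html", now))    # wrong host
```

The first cookie goes to every host under x.com, which is why "the site that set them" can be broader than the single server the user visited.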
Cookie terminology
• Cookie replay – sending a cookie back to a site
• Session cookie – cookie replayed only during current browsing session
• Persistent cookie – cookie replayed until expiration date
• First-party cookie – cookie associated with the site the user requested
• Third-party cookie – cookie associated with an image, ad, frame, or other content from a site with a different domain name that is embedded in the site the user requested
  • Browser interprets third-party cookie based on domain name, even if both domains are owned by the same company
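A minimal sketch of the first-party versus third-party distinction follows. It compares only the last two labels of each host name, which is a deliberate simplification (real browsers consult a public suffix list), and the function names and `.test` URLs are invented for the example.

```python
from urllib.parse import urlsplit

def site_of(url):
    """Crude 'domain name' of a URL: its last two host labels (simplification)."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])

def cookie_party(page_url, resource_url):
    """Classify a cookie set by `resource_url` when embedded in `page_url`."""
    if site_of(page_url) == site_of(resource_url):
        return "first-party"
    return "third-party"

page = "http://www.cdstore.test/cart"
print(cookie_party(page, "http://img.cdstore.test/logo.gif"))
print(cookie_party(page, "http://ads.adnetwork.test/banner.gif"))
```

Because the decision rests on the domain name alone, content from a sister site with a different name is treated as third-party even when one company owns both.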
“Web bugs”
• Invisible “images” (1-by-1 pixels, transparent) that are embedded in web pages and cause referrer info and cookies to be transferred
• Also called web beacons, clear gifs, tracker gifs, etc.
• Work just like banner ads from ad networks, but you can’t see them unless you look at the page source
• Also embedded in HTML-formatted email messages, MS Word documents, etc.
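Since web bugs are only visible in the page source, one way to look for them is to scan the HTML for images declared as 1-by-1 pixels. The sketch below uses Python's standard `html.parser`; the `WebBugFinder` class and the sample page are hypothetical, and a size check alone is only a heuristic (it will miss beacons sized through CSS and may flag harmless spacer images).

```python
from html.parser import HTMLParser

class WebBugFinder(HTMLParser):
    """Collects the src of every <img> declared as 1 pixel by 1 pixel."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if attributes.get("width") == "1" and attributes.get("height") == "1":
            self.suspects.append(attributes.get("src"))

# Made-up page: one ordinary image and one candidate web bug.
page = """
<html><body>
<img src="http://www.example.test/logo.gif" width="120" height="60">
<img src="http://tracker.example.test/clear.gif?page=home" width="1" height="1">
</body></html>
"""

finder = WebBugFinder()
finder.feed(page)
print(finder.suspects)
```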
Ad networks
Search Service: search for medical information → Random Ad → ad company sets cookie
CD Store: buy CD → cookie replayed to ad company → Medical Ad
Ad company can get your name & address from CD order and link them to your search
(This is NOT how Google Ads work)
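The linking step in this scenario can be sketched as a toy simulation. Everything here is hypothetical: the `AdNetwork` class, the `.test` site names, and the idea that each embedding page passes some `details` string along with the ad request. It only shows why one shared cookie ID is enough to join two otherwise unrelated visits.

```python
import itertools

class AdNetwork:
    """Toy ad company that keys everything it sees on its own cookie ID."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.profiles = {}  # cookie ID -> list of (site, details) observations

    def serve_ad(self, cookie_id, site, details):
        if cookie_id is None:
            cookie_id = next(self._next_id)  # first contact: "set cookie"
        # Later contacts arrive with the same ID: "replay cookie".
        self.profiles.setdefault(cookie_id, []).append((site, details))
        return cookie_id

network = AdNetwork()
# Visit 1: the ad embedded in the search results sets the cookie.
cookie = network.serve_ad(None, "search-service.test", "search: medical information")
# Visit 2: the ad embedded in the store page receives the same cookie back.
cookie = network.serve_ad(cookie, "cd-store.test", "CD order: name and address")

print(network.profiles[cookie])
```

Neither site needs to share anything with the other; the join happens entirely at the third party that both pages embed.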
What ad networks may know…
Personal data:
• Email address
• Full name
• Mailing address (street, city, state, and Zip code)
• Phone number
Transactional data:
• Details of plane trips
• Search phrases used at search engines
• Health conditions
“It was not necessary for me to click on the banner ads for information to be sent to DoubleClick servers.”
– Richard M. Smith
Offline data goes online…
• The Stop and Shop grocery store began posting purchase information for customers who had frequent shopper cards
• The Cranor family’s 25 most frequent grocery purchases (sorted by nutritional value)!
Spyware
Spyware: Software that employs a user's Internet connection, without their knowledge or explicit permission, to collect info
Most products use a pseudonymous but unique ID
Over 800 known freeware and shareware products contain spyware, for example:
• Beeline Search Utility
• GoZilla Download Manager
• Comet Cursor
Often difficult to uninstall!
Anti-spyware sites:
• http://grc.com/oo/spyware.htm
• http://www.adcop.org/smallfish
• http://www.spychecker.com
• http://cexx.org/adware.htm
Devices that monitor you
• Creative Labs Nomad JukeBox: Music transfer software reports all uploads to Creative Labs.
  http://www.nomadworld.com
• Sportbrain: Monitors daily workout. Custom phone cradle uploads data to company Web site for analysis.
  http://www.sportbrain.com/
• Sony eMarker: Lets you figure out the artist and title of songs you hear on the radio, and keeps a personal log of all the music you like on the eMarker Web site.
  http://www.emarker.com
• :CueCat: Keeps a personal log of advertisements you’re interested in.
  http://www.crq.com/cuecat.html
See http://www.privacyfoundation.org/
HTTP request that sets a Cookie
Web Site which uses Cookies
Web Page Request Header

GET /models/model_overview.asp?ModelName=S2000 HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
Referer: http://www.google.com/search?hl=en&q=s2000
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Host: automobiles.honda.com
Proxy-Connection: Keep-Alive

Web Page Response Header

HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Wed, 05 Oct 2005 19:51:07 GMT
X-Powered-By: ASP.NET
P3P: policyref="http://www.honda.com/w3c/privacy.xml", CP="IDC DSP COR ADMi DEVi TAIa PSAi PSDi IVAi CONi OUR SAMi IND PHY ONL COM NAV STA"
pragma: no-cache
cache-control: no-store
Content-Length: 1435
Content-Type: text/html
Expires: Sat, 18 Jan 1997 17:36:16 GMT
Set-Cookie: bhCookieSaveSess=1; path=/
Set-Cookie: bhCookieSess=1; path=/
Set-Cookie: bhCookiePerm=1; expires=Fri, 07-Oct-2005 19:51:06 GMT; path=/
Set-Cookie: BrowserInfo=VBScript=True&BrowserOS=Win&Crawler=False&BrowserVer=6&BrowserName=IE; path=/
Cache-control: private
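To read the Set-Cookie lines in this response, it helps to split each one into its name, value, and attributes. The sketch below does that by hand; `parse_set_cookie` is a hypothetical helper written for this example, and labelling a cookie "persistent" whenever it carries an `expires` attribute follows the terminology slide rather than any browser's exact logic.

```python
def parse_set_cookie(header_value):
    """Split one Set-Cookie value into (name, value, attributes)."""
    parts = [part.strip() for part in header_value.split(";")]
    # Only the first "=" separates name from value; the value may contain more.
    name, _, value = parts[0].partition("=")
    attrs = {}
    for part in parts[1:]:
        key, _, val = part.partition("=")
        attrs[key.lower()] = val
    return name, value, attrs

# The four Set-Cookie values from the response header above.
set_cookie_lines = [
    "bhCookieSaveSess=1; path=/",
    "bhCookieSess=1; path=/",
    "bhCookiePerm=1; expires=Fri, 07-Oct-2005 19:51:06 GMT; path=/",
    "BrowserInfo=VBScript=True&BrowserOS=Win&Crawler=False&BrowserVer=6&BrowserName=IE; path=/",
]

for line in set_cookie_lines:
    name, value, attrs = parse_set_cookie(line)
    kind = "persistent" if "expires" in attrs else "session"
    print(f"{name}: {kind}, path={attrs.get('path')}")
```

Three of the four cookies carry no expiration and so last only for the session; `bhCookiePerm` is the one persistent cookie, set to expire two days after the response date.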
HTTP request after the cookie is set
Web Site which uses Cookies
Request Header after cookie is set

GET /models/model_overview.asp?ModelName=S2000&bhcp=1 HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Host: automobiles.honda.com
Proxy-Connection: Keep-Alive
Cookie: bhCookieSaveSess=1; bhCookieSess=1; bhCookiePerm=1; BrowserInfo=VBScript=True&BrowserOS=Win&Crawler=False&BrowserVer=6&BrowserName=IE; bhResults=bhjs=1; bhPrevResults=bhjs=1
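The Cookie line in this follow-up request can be compared against the cookies the server set earlier. The sketch below is a hand-written parser (`parse_cookie_header` is a hypothetical name); the list of previously set names is copied from the Set-Cookie lines in the response header shown earlier in the transcript.

```python
def parse_cookie_header(value):
    """Turn a Cookie request header value into a {name: value} dict."""
    pairs = {}
    for part in value.split(";"):
        # Split on the first "=" only, since values may contain "=" themselves.
        name, _, val = part.strip().partition("=")
        pairs[name] = val
    return pairs

cookie_header = (
    "bhCookieSaveSess=1; bhCookieSess=1; bhCookiePerm=1; "
    "BrowserInfo=VBScript=True&BrowserOS=Win&Crawler=False&BrowserVer=6&BrowserName=IE; "
    "bhResults=bhjs=1; bhPrevResults=bhjs=1"
)

replayed = parse_cookie_header(cookie_header)
set_earlier = {"bhCookieSaveSess", "bhCookieSess", "bhCookiePerm", "BrowserInfo"}

missing = sorted(set_earlier - set(replayed))   # set by the server but not replayed
extra = sorted(set(replayed) - set_earlier)     # replayed but not in the response shown
print("missing:", missing)
print("extra:", extra)
```

All four cookies from the response come back unchanged. The two additional names, `bhResults` and `bhPrevResults`, do not appear among the Set-Cookie lines shown, so they must have been set some other way that the transcript does not record.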