Private Communications Issues in the Social Web era

CS315 – Web Search and Data Mining

The AOL search data release

By the way, …

… search companies log your searches …

Privacy concerns

• Data is often collected silently
  - The Web allows large quantities of data to be collected inexpensively and unobtrusively
• Data from multiple sources may be merged
  - Non-identifiable information can become identifiable when merged
• Data collected for business purposes may be used in civil and criminal proceedings
• Users are given no meaningful choice
  - Few sites offer alternatives
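The point that non-identifiable information can become identifiable when merged can be made concrete with a toy linkage sketch in Python. All records, names, and field names below are invented for illustration; using ZIP code, birth date, and sex as the join key is an assumption, chosen because such quasi-identifiers often appear in otherwise separate datasets.

```python
# Toy linkage sketch: all data below is hypothetical.

# "De-identified" records: no names, but quasi-identifiers are kept.
medical_records = [
    {"zip": "15213", "birth_date": "1970-01-02", "sex": "F", "diagnosis": "asthma"},
    {"zip": "15217", "birth_date": "1982-07-15", "sex": "M", "diagnosis": "diabetes"},
]

# A separate, named dataset (e.g. a public list) with the same quasi-identifiers.
named_list = [
    {"name": "Alice Example", "zip": "15213", "birth_date": "1970-01-02", "sex": "F"},
    {"name": "Bob Example", "zip": "15217", "birth_date": "1982-07-15", "sex": "M"},
    {"name": "Carol Example", "zip": "15217", "birth_date": "1990-03-30", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link(records, people, keys=QUASI_IDENTIFIERS):
    """Join two datasets on shared quasi-identifiers; return (name, diagnosis) pairs."""
    index = {}
    for person in people:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    linked = []
    for record in records:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # exactly one candidate: the record is re-identified
            linked.append((names[0], record["diagnosis"]))
    return linked


print(link(medical_records, named_list))
# [('Alice Example', 'asthma'), ('Bob Example', 'diabetes')]
```

Neither dataset alone ties a name to a diagnosis; the merge does.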

Browser Chatter

• Browsers chatter about:
  - IP address, domain name, organization
  - Referring page
  - Platform: O/S, browser
  - What information is requested: URLs and search terms
  - Cookies
• To anyone who might be listening:
  - End servers
  - System administrators
  - Internet Service Providers
  - Other third parties, such as advertising networks
  - Anyone who might subpoena log files later
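To make the list concrete, here is a minimal Python sketch of what a single request reveals to the server that receives it (or to any proxy or ISP that can read it, if it travels unencrypted). The request text, the `q` parameter name for search terms, and the client IP address (from the 192.0.2.0/24 documentation range) are hypothetical. A real server sees the IP address at the network layer rather than in a header, and could map it to a domain name or organization with a reverse DNS or WHOIS lookup, which this sketch does not do.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical request, as the receiving server (or anyone on the path) sees it.
RAW_REQUEST = (
    "GET /catalog/item?id=42 HTTP/1.1\r\n"
    "Host: shop.example\r\n"
    "Referer: http://search.example/search?q=back+pain+remedies\r\n"
    "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\r\n"
    "Cookie: uid=4576904309\r\n"
    "\r\n"
)


def chatter(raw_request, client_ip):
    """Summarize what a single HTTP request discloses about the user."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    _method, target, _version = request_line.split(" ", 2)
    referer = headers.get("referer", "")
    return {
        "client_ip": client_ip,  # known from the connection, not from a header
        "requested": headers.get("host", "") + target,
        "referring_page": referer,
        # Assumes the referring search engine puts the query in a "q" parameter.
        "search_terms": parse_qs(urlsplit(referer).query).get("q", []),
        "platform": headers.get("user-agent", ""),
        "cookies": headers.get("cookie", ""),
    }


for field, value in chatter(RAW_REQUEST, "192.0.2.10").items():
    print(f"{field:15} {value}")
```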

Cookies 101

• Cookies can be useful
  - Used like a staple to attach multiple parts of a form together
  - Used to identify you when you return to a web site so you don’t have to remember a password
  - Used to help web sites understand how people use them
• Cookies can do unexpected things
  - Used to profile users and track their activities, especially across web sites

How cookies work – the basics

• A cookie stores a small string of characters
• A web site asks your browser to “set” a cookie
• Whenever you return to that site, your browser sends the cookie back automatically

[Diagram: on the first visit to a site, the site tells the browser “Please store cookie xyzzy”; on later visits, the browser tells the site “Here is cookie xyzzy”.]
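The basic set-and-replay loop can be sketched in Python as a toy browser-side cookie store. `ToyCookieJar` and the host names are hypothetical; the sketch keys cookies by exact host name and deliberately ignores the domain, path, and expiration attributes that real browsers honor.

```python
class ToyCookieJar:
    """Hypothetical, minimal browser cookie store keyed by exact host name."""

    def __init__(self):
        self._store = {}  # host -> {cookie name: cookie value}

    def receive(self, host, set_cookie_value):
        """Handle a Set-Cookie header: keep only the leading name=value pair."""
        pair = set_cookie_value.split(";", 1)[0]  # drop attributes such as path=/
        name, _, value = pair.partition("=")
        self._store.setdefault(host, {})[name.strip()] = value.strip()

    def cookie_header(self, host):
        """Return the Cookie header value the browser would send to this host."""
        cookies = self._store.get(host, {})
        return "; ".join(f"{name}={value}" for name, value in cookies.items())


jar = ToyCookieJar()
# First visit: the site says "Please store cookie xyzzy".
jar.receive("site.example", "id=xyzzy; path=/")
# Later visits: the browser automatically says "Here is cookie xyzzy".
print("Cookie:", jar.cookie_header("site.example"))  # Cookie: id=xyzzy
# A different site gets nothing back.
print(repr(jar.cookie_header("other.example")))
```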

How cookies work – advanced

• Cookies are only sent back to the “site” that set them, but this may be any host in the domain
• Sites setting cookies indicate the path, domain, and expiration for each cookie
• Cookies can store user info or a database key that is used to look up user info

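[Diagram: one cookie is labeled “Send me with any request to x.com until 2008”; another is labeled “Send me with requests for index.html on y.x.com for this session only”. The cookie values shown include an email address and “Visits=13 User=4576904309”, where the User value is a key into the site’s database of users, email addresses, and visits.]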
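Which requests receive a cookie can be sketched in Python following the domain-matching and path-matching rules of RFC 6265 (sections 5.1.3 and 5.1.4). The dictionary representation of a cookie and the `should_send` helper are hypothetical simplifications; expiration, the Secure flag, and public-suffix checks are left out. The host names reuse the slide’s x.com and y.x.com.

```python
import ipaddress


def _is_ip(host):
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def domain_match(host, cookie_domain):
    """RFC 6265, 5.1.3: host equals the domain, or is a subdomain of it (not an IP)."""
    host = host.lower()
    domain = cookie_domain.lower().lstrip(".")  # leading dot ignored (RFC 6265, 5.2.3)
    if host == domain:
        return True
    return host.endswith("." + domain) and not _is_ip(host)


def path_match(request_path, cookie_path):
    """RFC 6265, 5.1.4: cookie path equals the request path or is a '/'-bounded prefix."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


def should_send(cookie, host, path):
    """Hypothetical helper: would a browser attach this cookie to the request?"""
    if cookie["host_only"]:  # cookie set without a Domain attribute: exact host only
        host_ok = host.lower() == cookie["domain"].lower()
    else:
        host_ok = domain_match(host, cookie["domain"])
    return host_ok and path_match(path, cookie["path"])


# Modeled on the slide's two cookies (expiration is not modeled).
any_x = {"domain": "x.com", "host_only": False, "path": "/"}
index_only = {"domain": "y.x.com", "host_only": True, "path": "/index.html"}

print(should_send(any_x, "y.x.com", "/index.html"))       # True
print(should_send(any_x, "notx.com", "/"))                # False
print(should_send(index_only, "y.x.com", "/index.html"))  # True
print(should_send(index_only, "z.x.com", "/index.html"))  # False
```

The `notx.com` case shows why the match requires a dot boundary: a plain suffix check on `x.com` would wrongly accept it.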

Cookie terminology

• Cookie replay: sending a cookie back to a site
• Session cookie: cookie replayed only during the current browsing session
• Persistent cookie: cookie replayed until its expiration date
• First-party cookie: cookie associated with the site the user requested
• Third-party cookie: cookie associated with an image, ad, frame, or other content from a site with a different domain name that is embedded in the site the user requested

The browser classifies a cookie as third-party based on domain name alone, even if both domains are owned by the same company
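The first-party/third-party distinction can be sketched in Python by comparing the “site” of the page the user requested with the site of the embedded content that sets the cookie. Taking the last two DNS labels as the site is an assumption made for brevity: real browsers consult the Public Suffix List so that, for example, unrelated hosts under `co.uk` are not treated as one site. All URLs are hypothetical.

```python
from urllib.parse import urlsplit


def site_of(host):
    """Naive 'site' key: the last two DNS labels (ads.adnet.example -> adnet.example).

    Simplifying assumption: real browsers use the Public Suffix List instead.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def cookie_party(page_url, resource_url):
    """Classify a cookie set by resource_url while the user is viewing page_url."""
    page_site = site_of(urlsplit(page_url).hostname or "")
    resource_site = site_of(urlsplit(resource_url).hostname or "")
    return "first-party" if page_site == resource_site else "third-party"


page = "http://www.search.example/results?q=flu"
print(cookie_party(page, "http://images.search.example/logo.gif"))     # first-party
print(cookie_party(page, "http://ads.adnet.example/banner.gif"))       # third-party
# Even if the same company owns both names, a different domain is third-party.
print(cookie_party(page, "http://stats.search-partner.example/p.gif"))  # third-party
```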

“Web bugs”

• Invisible “images” (transparent, 1-by-1 pixel) embedded in web pages that cause referrer info and cookies to be transferred
• Also called web beacons, clear GIFs, tracker GIFs, etc.
• Work just like banner ads from ad networks, but you can’t see them unless you look at the page source
• Also embedded in HTML-formatted email messages, MS Word documents, etc.
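The server side of a web bug can be sketched in a few lines of Python. A page would embed something like `<img src="http://tracker.example/pixel.gif" width="1" height="1">` (a hypothetical tracker URL); when the browser fetches it, the request typically carries the embedding page in `Referer` plus the tracker’s own cookie, which is all the tracker needs to log the visit. The 43-byte GIF below is a commonly used 1-by-1 transparent image; `handle_pixel_request` and the log format are hypothetical.

```python
import struct

# A commonly used 43-byte GIF89a image: 1x1 pixel, fully transparent.
TRANSPARENT_GIF = (
    b"GIF89a"                                  # signature and version
    b"\x01\x00\x01\x00\x80\x00\x00"            # 1x1 screen, 2-entry color table
    b"\x00\x00\x00\xff\xff\xff"                # color table: black, white
    b"!\xf9\x04\x01\x00\x00\x00\x00"           # graphic control: index 0 transparent
    b",\x00\x00\x00\x00\x01\x00\x01\x00\x00"   # image descriptor: 1x1 at (0, 0)
    b"\x02\x02D\x01\x00"                       # LZW data for the single pixel
    b";"                                       # trailer
)


def handle_pixel_request(headers, client_ip, log):
    """Hypothetical tracker endpoint: log the visit, then return the invisible image."""
    log.append({
        "ip": client_ip,
        "page": headers.get("Referer", ""),   # the page that embedded the bug
        "cookie": headers.get("Cookie", ""),  # the tracker's own cookie, if any
    })
    return TRANSPARENT_GIF


visits = []
pixel = handle_pixel_request(
    {"Referer": "http://news.example/article-17", "Cookie": "tid=abc123"},
    "192.0.2.10",
    visits,
)
width, height = struct.unpack("<HH", pixel[6:10])
print(len(pixel), width, height)  # 43 1 1
print(visits[0])
```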

Ad networks

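[Diagram: a Search Service and a CD Store both display ads (a random ad, a medical ad) from the same ad network. When the user searches for medical information, the ad network sets a cookie; when the user later buys a CD, the browser replays that cookie.]

Ad company can get your name & address from the CD order and link them to your search

(This is NOT how Google Ads work)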
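The cookie-linking step in this diagram can be simulated in Python. The `AdNetwork` class, the URLs, and the order details are hypothetical, and how the ad company obtains the name and address from the CD order is an assumption here (modeled as the store passing order details along with the ad request). The point is only that one replayed cookie ties together visits to unrelated sites.

```python
import secrets


class AdNetwork:
    """Hypothetical ad network whose ads appear on many unrelated sites."""

    def __init__(self):
        self.profiles = {}  # tracking-cookie value -> list of (page, details)

    def serve_ad(self, cookie_id, referring_page, details=None):
        """Record one ad request; return the cookie value to set or keep."""
        if cookie_id is None:  # no cookie replayed: first contact, so set one
            cookie_id = secrets.token_hex(8)
        self.profiles.setdefault(cookie_id, []).append((referring_page, details))
        return cookie_id


network = AdNetwork()

# 1. Search for medical information: the ad network sets a cookie.
uid = network.serve_ad(None, "http://search.example/?q=diabetes+symptoms")

# 2. Buy a CD later: the browser replays the same cookie. In this sketch the
#    store's page passes order details along (an assumption, for illustration).
network.serve_ad(uid, "http://cdstore.example/checkout",
                 details={"name": "Pat Example", "address": "1 Main St"})

for page, details in network.profiles[uid]:
    print(page, details)
```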

What ad networks may know…

• Personal data:
  - Email address
  - Full name
  - Mailing address (street, city, state, and Zip code)
  - Phone number
• Transactional data:
  - Details of plane trips
  - Search phrases used at search engines
  - Health conditions

“It was not necessary for me to click on the banner ads for information to be sent to DoubleClick servers.”

– Richard M. Smith

Offline data goes online…

• The Stop and Shop grocery store began posting purchase information for customers who had frequent-shopper cards

• The Cranor family’s 25 most frequent grocery purchases (sorted by nutritional value)!

Spyware

• Spyware: software that employs a user’s Internet connection, without their knowledge or explicit permission, to collect information
• Most products use a pseudonymous but unique ID
• Over 800 known freeware and shareware products contain spyware, for example:
  - Beeline Search Utility
  - GoZilla Download Manager
  - Comet Cursor
• Often difficult to uninstall!
• Anti-spyware sites:
  - http://grc.com/oo/spyware.htm
  - http://www.adcop.org/smallfish
  - http://www.spychecker.com
  - http://cexx.org/adware.htm
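What “pseudonymous but unique” means in practice can be sketched in Python: a random ID that names no one is generated once, stored, and attached to every report, so all reports from one machine can be linked. The state file name, `install_id`, `build_report`, and the JSON payload are hypothetical and not taken from any of the products listed.

```python
import json
import tempfile
import uuid
from pathlib import Path


def install_id(state_file):
    """Return this installation's ID, generating it on first use and reusing it after."""
    path = Path(state_file)
    if path.exists():
        return path.read_text().strip()
    new_id = str(uuid.uuid4())  # random, so it names no one, but it never changes
    path.write_text(new_id)
    return new_id


def build_report(state_file, event):
    """Hypothetical 'phone home' payload: every report carries the same ID."""
    return json.dumps({"install_id": install_id(state_file), "event": event})


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "hypothetical_app_id.txt"
    first = json.loads(build_report(state, "searched: cheap flights"))
    second = json.loads(build_report(state, "visited: bank.example"))
    # Different events, same pseudonym: the reports can be linked into one profile.
    print(first["install_id"] == second["install_id"])  # True
```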

Devices that monitor you

• Creative Labs Nomad JukeBox: music transfer software reports all uploads to Creative Labs. http://www.nomadworld.com

• Sportbrain: monitors your daily workout. A custom phone cradle uploads the data to the company Web site for analysis. http://www.sportbrain.com/

• Sony eMarker: lets you figure out the artist and title of songs you hear on the radio, and keeps a personal log of all the music you like on the eMarker Web site. http://www.emarker.com

• :CueCat: keeps a personal log of advertisements you’re interested in. http://www.crq.com/cuecat.html

See http://www.privacyfoundation.org/

HTTP request that sets a Cookie

Web Site which uses Cookies

Web Page Request Header:
GET /models/model_overview.asp?ModelName=S2000 HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
Referer: http://www.google.com/search?hl=en&q=s2000
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Host: automobiles.honda.com
Proxy-Connection: Keep-Alive

Web Page Response Header:
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Wed, 05 Oct 2005 19:51:07 GMT
X-Powered-By: ASP.NET
P3P: policyref="http://www.honda.com/w3c/privacy.xml", CP="IDC DSP COR ADMi DEVi TAIa PSAi PSDi IVAi CONi OUR SAMi IND PHY ONL COM NAV STA"
pragma: no-cache
cache-control: no-store
Content-Length: 1435
Content-Type: text/html
Expires: Sat, 18 Jan 1997 17:36:16 GMT
Set-Cookie: bhCookieSaveSess=1; path=/
Set-Cookie: bhCookieSess=1; path=/
Set-Cookie: bhCookiePerm=1; expires=Fri, 07-Oct-2005 19:51:06 GMT; path=/
Set-Cookie: BrowserInfo=VBScript=True&BrowserOS=Win&Crawler=False&BrowserVer=6&BrowserName=IE; path=/
Cache-control: private
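The four `Set-Cookie` headers in this response can be examined with Python’s standard `http.cookies.SimpleCookie` parser. Treating a cookie with neither `expires` nor `max-age` as a session cookie follows the cookie terminology given earlier; decoding the `BrowserInfo` value with `urllib.parse.parse_qsl` assumes it uses URL-query syntax, which its `&`-separated `key=value` pairs suggest. Only `bhCookiePerm` comes out persistent, expiring roughly two days after the response’s `Date`.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl

# Values of the four Set-Cookie headers in the response above.
SET_COOKIE_VALUES = [
    "bhCookieSaveSess=1; path=/",
    "bhCookieSess=1; path=/",
    "bhCookiePerm=1; expires=Fri, 07-Oct-2005 19:51:06 GMT; path=/",
    "BrowserInfo=VBScript=True&BrowserOS=Win&Crawler=False"
    "&BrowserVer=6&BrowserName=IE; path=/",
]


def parse_set_cookies(values):
    """Parse each Set-Cookie value separately; return {name: Morsel}."""
    morsels = {}
    for value in values:
        jar = SimpleCookie()
        jar.load(value)
        morsels.update(jar)
    return morsels


morsels = parse_set_cookies(SET_COOKIE_VALUES)
for name, morsel in morsels.items():
    # No expires/max-age attribute means the cookie lasts for this session only.
    kind = "persistent" if (morsel["expires"] or morsel["max-age"]) else "session"
    print(f"{name:17} {kind:10} expires={morsel['expires'] or '-'}")

# BrowserInfo packs a small browser profile into a single cookie value.
print(dict(parse_qsl(morsels["BrowserInfo"].value)))
```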

HTTP request after the cookie is set

Web Site which uses Cookies

Request Header after cookie is set:
GET /models/model_overview.asp?ModelName=S2000&bhcp=1 HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Host: automobiles.honda.com
Proxy-Connection: Keep-Alive
Cookie: bhCookieSaveSess=1; bhCookieSess=1; bhCookiePerm=1; BrowserInfo=VBScript=True&BrowserOS=Win&Crawler=False&BrowserVer=6&BrowserName=IE; bhResults=bhjs=1; bhPrevResults=bhjs=1
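The replayed `Cookie` header can be parsed the same way and compared with the names set in the earlier response. This Python sketch uses `http.cookies.SimpleCookie`; note that the request carries only `name=value` pairs, with none of the `path` or `expires` attributes. Two cookies, `bhResults` and `bhPrevResults`, do not appear in the `Set-Cookie` headers shown earlier, so they were presumably set by some other response or by script on the page; the transcript does not say which.

```python
from http.cookies import SimpleCookie

# Cookie header value from the request above.
COOKIE_HEADER = (
    "bhCookieSaveSess=1; bhCookieSess=1; bhCookiePerm=1; "
    "BrowserInfo=VBScript=True&BrowserOS=Win&Crawler=False&BrowserVer=6&BrowserName=IE; "
    "bhResults=bhjs=1; bhPrevResults=bhjs=1"
)
# Cookie names from the Set-Cookie headers of the earlier response.
SET_IN_RESPONSE = {"bhCookieSaveSess", "bhCookieSess", "bhCookiePerm", "BrowserInfo"}


def replayed_cookies(cookie_header):
    """Return {name: value} for every cookie the browser sent back."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    return {name: morsel.value for name, morsel in jar.items()}


replayed = replayed_cookies(COOKIE_HEADER)
print(len(replayed))                            # 6
print(sorted(set(replayed) - SET_IN_RESPONSE))  # ['bhPrevResults', 'bhResults']
```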