What you want is not what you get:
Predicting sharing policies for text-based content on Facebook
Arunesh Sinha*, Yan Li †, Lujo Bauer*
*Carnegie Mellon University
†Singapore Management University
Motivation
Problem for Social Networks
o Report in dailymail.co.uk†
† http://www.dailymail.co.uk/sciencetech/article-2423713/Facebook-users-committing-virtual-identity-suicide-quitting-site-droves-privacy-addiction-fears.html
More User Control ⇏ Better Privacy
o Users fail to comprehend controls
o Users fail to comprehend consequences
o Though concerned, users often make no effort to use the controls better
More user control → Smarter user control
Our goal: Help users pick the correct policy for new Facebook posts
Facebook’s Strategy
[Diagram: a Facebook Wall with posts n-2 (Friends), n-1 (Public), and n (Public); the default policy offered for the new post n+1 is Public, the policy of the most recent post.]
Our Goal and Approach
[Diagram: the same Facebook Wall, but the default policy for the new post n+1 is left open (Default: ?) and is instead suggested by a machine learning (ML) model.]
Outline
o Data collection methodology
o Survey results
o Machine learning approach
o Results and analysis
o Limitations / Conclusion
Data Collection Method
Survey Methodology
o Created an online survey
o Advertised on Craigslist and at CMU
Participate in a Carnegie Mellon research study on Facebook sharing. Earn $5 for participating in a ~20 minute online study.
We’re looking for English-speaking adults who have used Facebook for at least 4 months, update their Facebook status or post on Facebook at least every other day, and have used more than one privacy setting for their posts. Please click on the following link to start the online study: http://greyw1.ece.cmu.edu/survey/survey.php Upon completion of the study, you will receive a $5 Amazon gift card.
Filtering Users
Survey Questions
o Collected demographic data
  – Age, gender, country, level of education
o Degree of agreement with the statements:
  – I have a strong set of privacy rules.
  – I find Facebook's privacy controls confusing.
o Have you ever posted something on a social network and then regretted doing it? If so, what happened?
Facebook App
o Fetched 4 months of users’ posts
[Screenshot: app view listing the text in each post and its policy]
Survey Results
Survey Results: Demographics
o 42 participants (avg. 146 posts and 4.6 policies per participant)
o Age: 18 to 65, with an average of 29.1
o 35 female, 7 male
o 39 from USA
[Chart: level of education (High School, College, Advanced)]
Survey Results: Sentiment
[Chart: percentage of participants (0% to 100%) who disagree, are neutral, or agree with "Have a strong set of privacy rules" and "Find privacy control confusing", and who answered No or Yes to "Regretted posting ever?"]
ML Usage Plan
[Diagram: as in "Our Goal and Approach", an ML model suggests the default policy for the new post n+1 on the Facebook Wall.]
Machine Learning Approach
Machine Learning
o We use MaxEnt as the ML tool
  – Used the Stanford NLP software
o MaxEnt provides good generalization
  – I.e., it helps prevent overfitting
  – Learns a probabilistic hypothesis h that outputs a probability distribution over labels given data x
  – Chooses the hypothesis h that maximizes entropy
    • Subject to a form of agreement with the training data
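A conditional MaxEnt classifier is equivalent to multinomial logistic regression. Maximizing the likelihood of a log-linear model is the dual of maximizing entropy subject to matching expected feature counts. The following is a minimal, dependency-free sketch under that assumption. It is not the Stanford NLP implementation used in the study. The class name `MaxEnt`, the plain gradient-ascent loop, the learning rate, and the L2 penalty are all illustrative choices.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Turn a dict of label -> score into a dict of label -> probability."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


class MaxEnt:
    """Hypothetical conditional MaxEnt (multinomial logistic regression) classifier.

    p(y | x) is proportional to exp(sum_f w[y][f] * x[f]). Without the L2 penalty,
    the maximum-likelihood weights make expected feature counts equal observed
    counts (the MaxEnt "agreement with training data"); the penalty relaxes this
    slightly to reduce overfitting.
    """

    def __init__(self, labels, lr=0.5, l2=0.01, epochs=200):
        self.labels = list(labels)
        self.lr, self.l2, self.epochs = lr, l2, epochs
        self.w = {y: defaultdict(float) for y in self.labels}

    def predict_proba(self, x):
        scores = {y: sum(self.w[y].get(f, 0.0) * v for f, v in x.items())
                  for y in self.labels}
        return softmax(scores)

    def predict(self, x):
        probs = self.predict_proba(x)
        return max(probs, key=probs.get)

    def fit(self, xs, ys):
        n = len(xs)
        for _ in range(self.epochs):
            grad = {y: defaultdict(float) for y in self.labels}
            for x, y_true in zip(xs, ys):
                probs = self.predict_proba(x)
                for y in self.labels:
                    # observed indicator minus model probability
                    coef = (1.0 if y == y_true else 0.0) - probs[y]
                    for f, v in x.items():
                        grad[y][f] += coef * v
            for y in self.labels:
                for f in set(grad[y]) | set(self.w[y]):
                    # mean log-likelihood gradient minus the L2 penalty gradient
                    self.w[y][f] += self.lr * (grad[y][f] / n - self.l2 * self.w[y][f])
        return self


# Toy usage with made-up posts (sparse binary word features).
xs = [{"w=party": 1.0}, {"w=party": 1.0, "w=tonight": 1.0},
      {"w=family": 1.0}, {"w=family": 1.0, "w=dinner": 1.0}]
ys = ["Public", "Public", "Friends", "Friends"]
model = MaxEnt(labels=["Public", "Friends"]).fit(xs, ys)
print(model.predict({"w=party": 1.0}), model.predict_proba({"w=dinner": 1.0}))
```

Because the output is a full distribution over policies, a tool built on such a model could also report how sure it is, not just the top label.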
Features Considered
o Words and 2-grams in the Facebook post
o Presence of multimedia
o Time of day: morning, evening, night
o Previous post’s policy
o Model (feature set) chosen using cross-validation
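The four feature families above can be turned into sparse binary features roughly as follows. This is a sketch only. The tokenizer, the feature-name prefixes, the group names in `ALL_GROUPS`, and especially the hour boundaries for morning/evening/night are hypothetical, since the slides do not specify them. The `groups` argument lets a model-selection step switch feature families on and off.

```python
import re
from datetime import datetime

ALL_GROUPS = ("words", "bigrams", "media", "time", "prev")  # hypothetical group names


def time_of_day(ts):
    """Bucket a timestamp into the three periods; the hour boundaries are assumptions."""
    if 5 <= ts.hour < 12:
        return "morning"
    if 12 <= ts.hour < 21:
        return "evening"
    return "night"


def extract_features(text, has_media, ts, prev_policy, groups=ALL_GROUPS):
    """Map one post to sparse binary features, keeping only the requested groups."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    feats = {}
    if "words" in groups:
        for tok in tokens:
            feats["w=" + tok] = 1.0
    if "bigrams" in groups:
        for a, b in zip(tokens, tokens[1:]):
            feats["bi=" + a + "_" + b] = 1.0
    if "media" in groups:
        feats["media=" + str(bool(has_media))] = 1.0
    if "time" in groups:
        feats["tod=" + time_of_day(ts)] = 1.0
    if "prev" in groups:
        # the first post of a user has no previous policy
        feats["prev=" + (prev_policy or "NONE")] = 1.0
    return feats


print(extract_features("Party tonight!", True, datetime(2013, 5, 1, 22, 30), "Friends"))
```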
Temporal Testing
o The data is temporal
o Picked 10 posts randomly as test data
o We simulate a real-world scenario
[Diagram: along a time axis, each test post is predicted by a model trained on the posts that precede it]
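The temporal protocol can be sketched as a split generator. Each randomly chosen test post is paired with a training set made only of the posts that came before it. The function name, the `"time"` field, the fixed seed, and the choice to keep earlier test posts in later training sets are assumptions, not details from the slides.

```python
import random


def temporal_splits(posts, n_test=10, seed=0):
    """Yield (train, test_post) pairs that respect time order.

    Each test post is predicted from the posts that precede it in time,
    mimicking what a live system would have seen when the post was written.
    """
    ordered = sorted(posts, key=lambda p: p["time"])
    rng = random.Random(seed)
    # Assumption: the very first post has no history, so it is never a test point.
    candidates = range(1, len(ordered))
    for i in sorted(rng.sample(candidates, min(n_test, len(candidates)))):
        yield ordered[:i], ordered[i]


# Made-up posts with integer timestamps.
posts = [{"time": t, "policy": "Public" if t % 3 else "Friends"} for t in range(40)]
for train, test in temporal_splits(posts):
    print(len(train), test["time"])
```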
Training
o Cross-validation to choose features
o May have a different model for each test point
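Choosing the feature set by cross-validation on each test point's training data might look like the following sketch. It exhaustively tries every non-empty combination of feature groups. To keep the example self-contained, the learner `toy_fit` is a trivial majority-vote stand-in rather than MaxEnt. The interleaved folds, `k=5`, and all function names are assumptions. The slides do not say how folds were formed.

```python
from collections import Counter
from itertools import combinations


def cv_accuracy(data, groups, fit, k=5):
    """k-fold cross-validated accuracy of fit(train, groups); needs at least 2 posts."""
    k = min(k, len(data))
    correct = 0
    for start in range(k):
        held_out = range(start, len(data), k)  # interleaved folds (an assumption)
        held = set(held_out)
        train = [d for i, d in enumerate(data) if i not in held]
        predict = fit(train, groups)
        correct += sum(predict(data[i]) == data[i]["policy"] for i in held_out)
    return correct / len(data)


def select_feature_groups(data, all_groups, fit, k=5):
    """Try every non-empty subset of feature groups; keep the best CV accuracy."""
    best, best_acc = None, -1.0
    for r in range(1, len(all_groups) + 1):
        for groups in combinations(all_groups, r):
            acc = cv_accuracy(data, groups, fit, k)
            if acc > best_acc:
                best, best_acc = groups, acc
    return best, best_acc


def toy_fit(train, groups):
    """Stand-in learner: majority policy per combination of selected feature values."""
    table = {}
    for d in train:
        table.setdefault(tuple(d[g] for g in groups), Counter())[d["policy"]] += 1
    fallback = Counter(d["policy"] for d in train).most_common(1)[0][0]

    def predict(d):
        counts = table.get(tuple(d[g] for g in groups))
        return counts.most_common(1)[0][0] if counts else fallback

    return predict


# Made-up training posts in which the policy always copies the previous policy.
data = [{"prev": "Public" if i % 2 else "Friends",
         "word": "a" if i % 3 == 0 else "b",
         "policy": "Public" if i % 2 else "Friends"} for i in range(20)]
print(select_feature_groups(data, ("word", "prev"), toy_fit))
```

Because selection is rerun on each test point's own training data, different test points can end up with different feature sets, which is how each test point may get a different model.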
Results and analysis
Baseline Approach
o Previous policy (Facebook’s approach)
  – Use the policy of the last post as the prediction
o Surprisingly good accuracy
  – 0.85 on average
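The previous-policy baseline needs no training: each post's policy is predicted to be the policy of the post before it. Below is a minimal sketch with hypothetical function names. For simplicity it scores every post that has a predecessor, whereas the study scored randomly picked test posts. It then averages the per-user accuracies.

```python
def baseline_accuracy(policies):
    """Accuracy of predicting each post's policy as the previous post's policy.

    `policies` is one user's implemented policies in time order.
    """
    pairs = list(zip(policies, policies[1:]))  # (previous, actual)
    if not pairs:
        raise ValueError("need at least two posts")
    return sum(prev == actual for prev, actual in pairs) / len(pairs)


def average_over_users(policies_by_user):
    """Mean of the per-user baseline accuracies."""
    scores = [baseline_accuracy(p) for p in policies_by_user.values()]
    return sum(scores) / len(scores)


# Two of the four predictions match, so this prints 0.5.
print(baseline_accuracy(["Friends", "Public", "Public", "Public", "Friends"]))
```

The baseline is hard to beat on implemented policies because users rarely change the setting between consecutive posts. A miss happens only at the points where the policy switches.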
MaxEnt Accuracy
| Technique | Accuracy |
| --- | --- |
| Baseline (previous policy) | 0.85 |
| MaxEnt | 0.86 |
Prediction Mismatch
o Problem: we are not predicting the intended policy
  – Instead, we are predicting the implemented policy
o Conjecture:
  – The implemented policy is often incorrect
  – Users just use Facebook’s default policy
Ground Truth Collection
o Feedback on 20 randomly chosen posts
  – Provides ground truth (intended policy)
[Screenshot: feedback form showing the text of the post and all policies ever used]
Datasets
[Diagram: Original data (implemented policy) → correct 20 posts based on feedback → Clean data → remove 80% → Pruned clean data]
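The two derived datasets can be sketched as follows. The clean data replaces the implemented policy with the intended one wherever feedback exists. The pruned clean data then randomly drops a fraction (80% on the slide) of posts judged "bad". The slides do not define which posts count as "bad", so `is_bad` below is a hypothetical caller-supplied rule. Protecting feedback-verified posts from removal, the `verified` field, and all function names are likewise assumptions.

```python
import random


def make_clean(posts, feedback):
    """Copy posts, swapping in the intended policy where feedback exists.

    `feedback` maps post id -> intended policy; corrected posts are marked verified.
    """
    clean = []
    for post in posts:
        copy = dict(post)  # leave the original data untouched
        copy["verified"] = post["id"] in feedback
        if copy["verified"]:
            copy["policy"] = feedback[post["id"]]
        clean.append(copy)
    return clean


def make_pruned(clean, is_bad, fraction=0.8, seed=0):
    """Randomly drop `fraction` of the unverified posts that `is_bad` flags.

    `is_bad` is a hypothetical rule; the slides do not say what "bad" means.
    """
    bad = [i for i, p in enumerate(clean) if not p["verified"] and is_bad(p)]
    rng = random.Random(seed)
    dropped = set(rng.sample(bad, round(fraction * len(bad))))
    return [p for i, p in enumerate(clean) if i not in dropped]


# Made-up data: ten posts, feedback corrects two of them.
posts = [{"id": i, "policy": "Public"} for i in range(10)]
clean = make_clean(posts, {0: "Friends", 1: "Friends"})
pruned = make_pruned(clean, is_bad=lambda p: p["policy"] == "Public")
print(len(clean), len(pruned))
```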
Temporal Testing
o Intended policy known for 20 posts
o Picked 8 of these randomly as test data
o We simulate a real-world scenario
Baseline
o Same previous-policy approach as before
o Measure intended accuracy
  – Predict only for posts with known intended policy
  – Better measure of performance
o Baseline intended accuracy: 0.67
  – 0.85 obtained previously on implemented policies
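The gap between implemented and intended accuracy comes from scoring the same predictions against two different label sets. Below is a small sketch with a hypothetical `accuracy_on` helper and made-up data. It scores only posts that have a label in the chosen ground truth. Predictions can look perfect against implemented policies yet poor against intended ones.

```python
def accuracy_on(predictions, truth):
    """Accuracy over the posts that have a label in `truth`.

    Pass implemented policies as `truth` for implemented accuracy, or the
    feedback-derived intended policies for intended accuracy.
    """
    scored = [pid for pid in predictions if pid in truth]
    if not scored:
        raise ValueError("no overlapping posts to score")
    return sum(predictions[pid] == truth[pid] for pid in scored) / len(scored)


# Made-up example: previous-policy predictions for four posts.
predictions = {1: "Public", 2: "Public", 3: "Public", 4: "Friends"}
implemented = {1: "Public", 2: "Public", 3: "Public", 4: "Friends"}
intended = {1: "Public", 2: "Friends", 3: "Friends"}  # post 4 got no feedback
print(accuracy_on(predictions, implemented), accuracy_on(predictions, intended))
```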
MaxEnt Intended Accuracy
| Technique | Intended accuracy |
| --- | --- |
| Baseline | 67% |
| MaxEnt (clean) | 71% |
| MaxEnt (pruned clean) | 81% |
Confidence About Policy
Confidence Factor (CF): fraction of posts for which the intended policy matched the implemented policy
Users binned by confidence factor:
| CF | 0.00-0.25 | 0.26-0.50 | 0.51-0.75 | 0.76-1.00 |
| --- | --- | --- | --- | --- |
| Users | 7 | 7 | 12 | 16 |
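Computing a user's confidence factor and assigning it to one of the four bins can be sketched as below. The function names are hypothetical. Treating each bin label's upper value as an inclusive bound is an assumption. If CF is computed over 20 feedback posts, it is always a multiple of 0.05, so the gaps between labels (for example 0.25 versus 0.26) never matter. The bin counts 7 + 7 + 12 + 16 sum to the 42 participants.

```python
def confidence_factor(intended, implemented):
    """Fraction of a user's feedback posts whose intended policy matches the implemented one."""
    ids = [pid for pid in intended if pid in implemented]
    if not ids:
        raise ValueError("no posts with both policies")
    return sum(intended[pid] == implemented[pid] for pid in ids) / len(ids)


# Inclusive upper bound of each bin, paired with the bin label from the slide.
BINS = [(0.25, "0.00-0.25"), (0.50, "0.26-0.50"), (0.75, "0.51-0.75"), (1.00, "0.76-1.00")]


def cf_bin(cf):
    for upper, label in BINS:
        if cf <= upper:
            return label
    raise ValueError("confidence factor must lie in [0, 1]")


def bin_users(cfs):
    """Count how many users fall into each confidence-factor bin."""
    counts = {label: 0 for _, label in BINS}
    for cf in cfs:
        counts[cf_bin(cf)] += 1
    return counts


# Made-up user: intended matches implemented on 2 of 4 feedback posts.
cf = confidence_factor({1: "Public", 2: "Friends", 3: "Public", 4: "Public"},
                       {1: "Public", 2: "Public", 3: "Public", 4: "Friends"})
print(cf, cf_bin(cf))
```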
Analysis of Improvement
[Chart: intended accuracy (0 to 1) of Baseline, MaxEnt (Clean), and MaxEnt (Pruned Clean) for each confidence-factor bin: 0.00-0.25 (7 users), 0.26-0.50 (7 users), 0.51-0.75 (12 users), 0.76-1.00 (16 users)]
Limitations
o Intended policy available for only 20 posts
o 42 participants is not a large number
  – Other studies have used similar numbers
o A richer feature space is possible
  – By processing the attachments of the post
o Could use more sophisticated ML techniques
Conclusion
o Accuracy: 67% → 81%
o Accuracy for CF > 0.5: 78% → 94%
An approach demonstrating the feasibility of learning the intended privacy policy of Facebook posts
Discarding “Bad” Data Helps
[Chart: accuracy (0 to 1) versus percentage of “bad” data discarded (0% to 80%)]
Improvement and Number of Participants
[Chart: for each confidence-factor bin (0-0.25, 0.26-0.50, 0.51-0.75, 0.76-1.00), the number of participants and the number of participants showing improvement with Clean and with Pruned Clean data (y-axis 0 to 18)]