Privacy and Security on Online Social Media: Workshop on Data Analytics & Its Security Issues

Privacy and Security in Online Social Media
Workshop on Data Analytics & Its Security Issues
Jaypee Institute of Information Technology, Sector 62
Dec 4, 2015
Ponnurangam Kumaraguru ("PK")
Associate Professor
J/ponnurangam.kumaraguru, @ponguru


Extremely glad to be here!


Who am I?

• Associate Professor, IIIT-Delhi
• Ph.D. from School of Computer Science, Carnegie Mellon University (CMU)
• Research interests
  - Privacy, e-crime, online social media, and usable security
• Founding Head, CERC@IIITD, cerc.iiitd.ac.in
• Co-ordinate and manage Precog, precog.iiitd.edu.in
• ACM India Eminent Speaker

CERCs


What we dabble with!

http://precog.iiitd.edu.in/


Non-trustworthy Content

[Figure labels: FAKE, RUMORS]

Methodology


Training Data

• 500 tweets per event
• Used CrowdFlower

Event                                    Tweets        Users
Boston Marathon Blasts (2013)            7,888,374     3,677,531
Typhoon Haiyan / Yolanda (2013)          671,918       368,269
Cyclone Phailin (2013)                   76,136        34,776
Washington Navy Yard shootings (2013)    484,609       257,682
Polar vortex cold wave (2014)            143,959       116,141
Oklahoma Tornadoes (2013)                809,154       542,049
Total                                    10,074,150    4,996,448

Credibility Modeling


Feature sets (45 features in total)

Tweet meta-data: Number of seconds since the tweet; Source of tweet (mobile / web / etc.); Tweet contains geo-coordinates

Tweet content (simple): Number of characters; Number of words; Number of URLs; Number of hashtags; Number of unique characters; Presence of stock symbol; Presence of happy smiley; Presence of sad smiley; Tweet contains `via'; Presence of colon symbol

Tweet content (linguistic): Presence of swear words; Presence of negative emotion words; Presence of positive emotion words; Presence of pronouns; Mention of self words in tweet (I; my; mine)

Tweet author: Number of followers; friends; time since the user is on Twitter; etc.

Tweet network: Number of retweets; Number of mentions; Tweet is a reply; Tweet is a retweet

Tweet links: WOT score for the URL; Ratio of likes / dislikes for a YouTube video
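To make the "Tweet content (simple)" feature set concrete, here is a minimal Python sketch that computes those ten features from raw tweet text. It is an illustration only, not the TweetCred implementation: the function name, the regular expressions, the smiley lists, and the choice to ignore colons inside URLs are all assumptions.

```python
import re

# All patterns below are illustrative assumptions, not TweetCred's own rules.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
STOCK_RE = re.compile(r"\$[A-Za-z]{1,6}\b")   # e.g. $AAPL
VIA_RE = re.compile(r"\bvia\b", re.IGNORECASE)
HAPPY_SMILEYS = (":)", ":-)", ":D")
SAD_SMILEYS = (":(", ":-(")


def simple_content_features(text):
    """Return the simple content features of one tweet as a dict."""
    # Assumption: a colon that only appears inside a URL does not count.
    text_without_urls = URL_RE.sub("", text)
    return {
        "num_chars": len(text),
        "num_words": len(text.split()),
        "num_urls": len(URL_RE.findall(text)),
        "num_hashtags": len(HASHTAG_RE.findall(text)),
        "num_unique_chars": len(set(text)),
        "has_stock_symbol": bool(STOCK_RE.search(text)),
        "has_happy_smiley": any(s in text for s in HAPPY_SMILEYS),
        "has_sad_smiley": any(s in text for s in SAD_SMILEYS),
        "has_via": bool(VIA_RE.search(text)),
        "has_colon": ":" in text_without_urls,
    }


if __name__ == "__main__":
    sample = "Breaking: blast near finish line #boston http://t.co/abc via @user :("
    for name, value in simple_content_features(sample).items():
        print(name, value)
```

A feature vector like this would be only one part of the 45 features; the author, network, and link features need metadata that is not present in the tweet text itself.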

TweetCred Demo


Implementation

Feedback by Users


http://twitdigest.iiitd.edu.in/TweetCred/


How many of you have posted mobile numbers on Online Social Networks?

How many of you have seen mobile numbers being posted on Online Social Networks?

Sample posts


Data statistics

• Twitter: 12th October 2012 – 20th October 2013
• Facebook: 16th November 2012 – 20th April 2013

                  Category +91          Category 0            Category void         Total
Numbers           Twitter   Facebook    Twitter   Facebook    Twitter   Facebook    Twitter   Facebook
Mobile Numbers    885       2,191       14,909    8,873       25,566    25,294      41,360    36,358
User profiles     1,074     2,663       17,913    9,028       31,149    25,406      49,817    36,588
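The three categories in the table can be illustrated with a small Python sketch that finds mobile numbers in post text and labels each by its prefix. This rests on assumptions the slides do not spell out: that "+91", "0", and "void" refer to how the number is prefixed (with "void" meaning no prefix), and that a mobile number is ten digits starting with 6 to 9. The pattern and function name are hypothetical, not the ones used in the study.

```python
import re

# Assumption: an Indian mobile number is 10 digits starting with 6-9,
# optionally preceded by "+91" or "0" and an optional space or hyphen.
MOBILE_RE = re.compile(r"(?<!\d)(\+91|0)?[\s-]?([6-9]\d{9})(?!\d)")

# Assumption: "void" means the number was posted without any prefix.
CATEGORY_BY_PREFIX = {"+91": "+91", "0": "0", None: "void"}


def categorize_numbers(text):
    """Return (number, category) pairs for mobile numbers found in text."""
    results = []
    for match in MOBILE_RE.finditer(text):
        prefix, number = match.group(1), match.group(2)
        results.append((number, CATEGORY_BY_PREFIX[prefix]))
    return results


if __name__ == "__main__":
    # Synthetic numbers, made up for this example.
    post = "Call me on +91 9812345678 or 09912345678, else 9712345678"
    print(categorize_numbers(post))
```

A pattern this loose will also pick up other ten-digit strings, so a real pipeline would need additional filtering before counting.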

SocialCaller App

https://play.google.com/store/apps/details?id=com.ayush.socialcaller&hl=en

Data Extraction

• Data was collected from various open government data sources using PHP scripts and stored in MySQL databases.

[Diagram: open govt. websites]
Collection strategies: alphabets a-z for name, across 70 constituencies; name and DOB from DL; random 5 seeds, 'incremental attack'
Records collected: PAN [53,419]; DRIVING LICENCE [2,24,982]; VOTER [81,95,053]

Data Extraction

• Public data from various online social networking sites was collected using public API calls.
• OAuth tokens were used for authentication and authorization.

[Diagram: unique name, API calls]
Records collected: GOOGLEPLUS [28,900]; LINKEDIN [1,86,798]; FOURSQUARE [29,393]; TWITTER [15,57,715]; FACEBOOK [33,77,102]

OCEAN Demo


Risk of Collation


Details             User 1                       User 2
Mobile Number       +9199xxxx2708                +9198xxxx5485
Full Name           x Gambhir                    xxxxxx Jeswani
Age                 23                           53
Gender              Male                         Male
Father's Name       xx Gambhir                   x x Jeswani
Address             ***, xxxx Bagh, Delhi        ***, Mig Flats, *-block, xxxxx Vihar Phase-I
ID                  Voter ID: NLNxxx5696         Driving License: DL/04/xxx/222668
Shared by Owner?    No                           Yes

8 Delhi users identified uniquely
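The collation risk can be sketched in a few lines of Python: records that are individually harmless are merged on a shared key until one profile holds a mobile number, an address, and a date of birth together. Everything here is hypothetical, including the function names, the field names, the choice of a normalized name as the join key, and the synthetic records; it is not the method used to build OCEAN.

```python
from collections import defaultdict


def normalize(name):
    """Lower-case a name and collapse whitespace so spelling variants match."""
    return " ".join(name.lower().split())


def collate(sources):
    """Merge records from several sources into one profile per normalized name.

    sources maps a source label to a list of dicts, each with a 'name' key.
    """
    profiles = defaultdict(dict)
    for records in sources.values():
        for record in records:
            key = normalize(record["name"])
            for field, value in record.items():
                if field != "name":
                    profiles[key][field] = value
    return dict(profiles)


def fully_collated(profiles, required_fields):
    """Return the names whose merged profile contains every required field."""
    return [name for name, attrs in profiles.items()
            if set(required_fields).issubset(attrs)]


if __name__ == "__main__":
    # Synthetic records, made up for this example.
    sources = {
        "voter_roll": [{"name": "Asha  Example", "address": "1 Sample Street"}],
        "driving_licence": [{"name": "asha example", "dob": "1990-01-01"}],
        "social_media": [
            {"name": "Asha Example", "mobile": "+91 9000000000"},
            {"name": "Ravi Sample", "mobile": "+91 9000000001"},
        ],
    }
    profiles = collate(sources)
    print(fully_collated(profiles, {"mobile", "address", "dob"}))
```

Matching on name alone would produce many false matches on real data, since names collide; the point of the sketch is only that each extra source adds fields to the same profile.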

OCEAN: Open Government Data Repository

Takeaways

• Online social media is a different beast in terms of privacy, identity, and credibility
  - Research / technologies should be developed
• Many interesting research, engineering, and innovation problems are waiting to be tackled in India

Thank you
[email protected]
cerc.iiitd.ac.in

J/ponnurangam.kumaraguru