Like a Pack of Wolves: Community Structure of Web Trackers
V. Kalavri (KTH Royal Institute of Technology)
J. Blackburn, M. Varvello, K. Papagiannaki (Telefonica Research)
Passive and Active Measurements Conference
31 March - 1 April 2016, Heraklion, Crete, Greece
Browsing the Web
[Diagram labels: Ads, Recommendations]
Tracking
[Diagram labels: Tracker, Tracker, Ad Server; cookie exchange, profiling, display relevant ads]
The study's authors defined "creepiness" by the feeling consumers get when they sense an ad is too personal because it uses data the consumer did not agree to provide, such as online-search and browsing history. Consumers are even more creeped out by this because they don't know how and where that information will be used.
Can’t we block them?
[Diagram labels: proxy, Tracker, Tracker, Ad Server, Legitimate site]
Available Solutions
AdBlock, DoNotTrack, EasyPrivacy: crowd-sourced “black lists” of tracker URLs
● not frequently updated
● unclear who blacklists URLs and based on what criteria
● miss “hidden” trackers or dual-role nodes
● blocking requires manual matching against the list
● can you buy your way into the whitelist?
Towards Automatic Tracker Detection
Exploit fundamental properties of web tracker operation to automate tracker detection
● Structural attributes: network positions, connections
● Operational aspects: data exchanged, communication patterns
Dataset
6 months (Nov 2014 - April 2015) of augmented Apache logs from a web proxy
● 80m requests
● 2m distinct URLs
● 3k users
Log contents:
● User identification
● URL requested
● Headers
● Performance information, i.e. latency, bytes
● Tagged as Trackers or non-Trackers with EasyPrivacy
Web Tracking as a Graph Problem
Referer-Hosts Graph
U: referers (URLs visited by the user)
V: hosts (embedded URLs)
[Example vertices: facebook.com, youtube.com, google-analytics.com, b.scorecardresearch.com]
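A minimal sketch of how such a bipartite graph could be built from proxy log records. It assumes each record has already been reduced to a (referer URL, requested URL) pair and that both sides are collapsed to hostnames; the function names and the sample URLs are hypothetical and not taken from the talk.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def host_of(url):
    """Return the lowercase hostname of a URL (a bare host is accepted too)."""
    parts = urlsplit(url if "//" in url else "//" + url)
    return (parts.hostname or "").lower()

def build_referer_hosts_graph(requests):
    """Build both sides of the bipartite graph from (referer, requested) pairs.

    Returns two adjacency maps: referer -> hosts it embeds,
    and host -> referers it is embedded within.
    """
    referer_to_hosts = defaultdict(set)
    host_to_referers = defaultdict(set)
    for referer_url, requested_url in requests:
        referer, host = host_of(referer_url), host_of(requested_url)
        if not referer or not host:
            continue  # skip records without a usable referer or host
        referer_to_hosts[referer].add(host)
        host_to_referers[host].add(referer)
    return referer_to_hosts, host_to_referers

# Hypothetical sample records, for illustration only.
requests = [
    ("http://news.example/story", "http://google-analytics.com/ga.js"),
    ("http://news.example/story", "http://cdn.example/img.png"),
    ("http://shop.example/cart", "http://google-analytics.com/ga.js"),
]
r2h, h2r = build_referer_hosts_graph(requests)
```

The size of `h2r[host]` is the number of unique referers a host is embedded within.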
Referer-Hosts Graph: Connected Components
94% of all trackers belong to the same connected component!
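A figure like this one could be computed with a plain breadth-first search over the graph, as in the sketch below. The adjacency map, the vertex names and the tracker set are toy assumptions; the numbers it produces are not the study's.

```python
from collections import deque

def connected_components(adjacency):
    """Return the connected components of an undirected graph as a list of sets."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            vertex = queue.popleft()
            component.add(vertex)
            for neighbor in adjacency[vertex]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

def tracker_share_of_largest(adjacency, trackers):
    """Fraction of all trackers that fall inside the largest component."""
    largest = max(connected_components(adjacency), key=len)
    return len(largest & trackers) / len(trackers)

# Toy undirected graph: referers r*, tracker hosts t*, one other host h1.
adjacency = {
    "r1": {"t1", "h1"}, "r2": {"t1", "t2"}, "r3": {"t3"},
    "t1": {"r1", "r2"}, "t2": {"r2"}, "h1": {"r1"}, "t3": {"r3"},
}
trackers = {"t1", "t2", "t3"}
components = connected_components(adjacency)
share = tracker_share_of_largest(adjacency, trackers)
```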
Communities in Graphs
Vertices in the same community are likely to be similar with respect to network position and connectivity
Do trackers form communities?
Densely connected internally
Sparsely connected with each other
The Hosts-Projection Graph
[Figure: an example referer-hosts graph (referers r1-r7, hosts h1-h8) next to its hosts-projection graph. Legend: referer, non-tracker host (NT), tracker host (T), unlabeled host (?).]
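The projection step can be sketched as below, assuming the usual bipartite projection: two hosts are linked whenever at least one referer embeds both. The function name and the toy input are hypothetical.

```python
from itertools import combinations

def project_onto_hosts(referer_to_hosts):
    """Project a referer -> hosts map onto the hosts.

    Two hosts become neighbors if some referer embeds both of them.
    Hosts that never share a referer stay in the result with no neighbors.
    """
    projection = {}
    for hosts in referer_to_hosts.values():
        for host in hosts:
            projection.setdefault(host, set())
        for a, b in combinations(sorted(hosts), 2):
            projection[a].add(b)
            projection[b].add(a)
    return projection

# Toy input: r1 embeds h1 and h2, r2 embeds h2 and h3, r3 embeds only h4.
referer_to_hosts = {"r1": {"h1", "h2"}, "r2": {"h2", "h3"}, "r3": {"h4"}}
projection = project_onto_hosts(referer_to_hosts)
```

Here `h2` ends up linked to both `h1` and `h3`, while `h4` stays isolated.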
Hosts-Projection Graph: Degrees
[Plot: number of unique referers within which tracker hosts and other hosts are embedded]
Hosts-Projection Graph: Tracker Neighbors
Trackers are mainly connected to other Trackers
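One way to quantify this observation is to compute, for every labeled host, the share of its labeled neighbors that are trackers, and then average per class. The sketch below does that on a toy hosts-projection graph; the graph, the labels ("T" / "NT") and the function names are assumptions for illustration only.

```python
def tracker_neighbor_share(projection, labels):
    """Map each labeled host to the share of its labeled neighbors that are trackers."""
    shares = {}
    for host, neighbors in projection.items():
        labeled = [n for n in neighbors if n in labels]
        if host in labels and labeled:
            shares[host] = sum(labels[n] == "T" for n in labeled) / len(labeled)
    return shares

def mean_share_by_class(shares, labels):
    """Average the per-host shares separately for trackers and non-trackers."""
    by_class = {"T": [], "NT": []}
    for host, share in shares.items():
        by_class[labels[host]].append(share)
    return {cls: sum(vals) / len(vals) for cls, vals in by_class.items() if vals}

# Toy graph: three trackers form a triangle, t3 also touches non-tracker n1.
projection = {
    "t1": {"t2", "t3"}, "t2": {"t1", "t3"}, "t3": {"t1", "t2", "n1"},
    "n1": {"t3", "n2"}, "n2": {"n1"},
}
labels = {"t1": "T", "t2": "T", "t3": "T", "n1": "NT", "n2": "NT"}
means = mean_share_by_class(tracker_neighbor_share(projection, labels), labels)
```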
Web Tracker Communities
[Figure: detected communities, annotated as: popular trackers (e.g. google-analytics), smaller trackers, ad servers, normal webpages]
Data Pipeline
1: logs pre-processing (raw logs → cleaned logs)
2: bipartite graph creation
3: largest connected component extraction
4: hosts-projection graph creation
5: community detection
6: results, e.g.
google-analytics.com: T
bscored-research.com: T
facebook.com: NT
github.com: NT
cdn.cxense.com: NT
...
Classification via Neighborhood Analysis
[Figure: an unlabeled host and its labeled neighbors. Legend: non-tracker host, tracker host, unlabeled host.]
Example: ⅖ non-tracker neighbors, ⅗ tracker neighbors
If % of tracker neighbors > threshold => classify as tracker
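The rule above fits in a few lines. This is a sketch only: the default threshold of 0.5, the label strings and the host names are assumptions, and the slides do not state which threshold was actually used.

```python
def classify_by_neighbors(host, projection, labels, threshold=0.5):
    """Classify a host as tracker ("T") if the share of tracker neighbors
    among its labeled neighbors exceeds the threshold, else "NT".

    Returns None when the host has no labeled neighbors to vote.
    """
    votes = [labels[n] for n in projection.get(host, ()) if n in labels]
    if not votes:
        return None
    share = votes.count("T") / len(votes)
    return "T" if share > threshold else "NT"

# An unlabeled host with 3 tracker and 2 non-tracker neighbors (share 3/5).
projection = {"unknown.example": {"a", "b", "c", "d", "e"}}
labels = {"a": "T", "b": "T", "c": "T", "d": "NT", "e": "NT"}

default_verdict = classify_by_neighbors("unknown.example", projection, labels)
strict_verdict = classify_by_neighbors("unknown.example", projection, labels, threshold=0.7)
```

With the assumed default threshold the host is classified as a tracker; raising the threshold to 0.7 flips the verdict, which shows how the threshold trades false positives against missed trackers.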
Results
Classification via Label Propagation
[Legend: non-tracker, tracker, unlabeled]
Iterative Algorithm for Community Detection
● Vertices propagate their labels to their neighbors and adopt the most popular label in their neighborhood.
● Upon convergence, vertices with the same label belong to the same community.
● If an unlabeled node ends up in a trackers community, it is classified as a tracker.
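The three steps above can be sketched as follows. Several details are assumptions rather than the talk's exact method: updates are synchronous, ties are broken by taking the largest label, iterations are capped, and a "trackers community" is taken to be one whose known members are mostly trackers. The toy graph (two triangles joined by one edge) and all names are hypothetical.

```python
from collections import Counter

def label_propagation(graph, max_iters=100):
    """Synchronous label propagation; every vertex starts with its own name as label."""
    labels = {v: v for v in graph}
    for _ in range(max_iters):
        new_labels = {}
        for vertex, neighbors in graph.items():
            if not neighbors:
                new_labels[vertex] = labels[vertex]
                continue
            counts = Counter(labels[n] for n in neighbors)
            best = max(counts.values())
            # Assumed tie-break: among the most popular labels, take the largest.
            new_labels[vertex] = max(l for l, c in counts.items() if c == best)
        if new_labels == labels:
            break  # converged: no vertex changed its label
        labels = new_labels
    return labels

def classify_unlabeled(communities, known):
    """Give each unlabeled host the majority known label of its community."""
    votes = {}
    for host, community in communities.items():
        if host in known:
            votes.setdefault(community, Counter())[known[host]] += 1
    return {
        host: votes[community].most_common(1)[0][0]
        for host, community in communities.items()
        if host not in known and community in votes
    }

# Two triangles (h1-h2-h3 and h4-h5-h6) joined by the edge h3-h4.
graph = {
    "h1": {"h2", "h3"}, "h2": {"h1", "h3"}, "h3": {"h1", "h2", "h4"},
    "h4": {"h3", "h5", "h6"}, "h5": {"h4", "h6"}, "h6": {"h4", "h5"},
}
known = {"h1": "T", "h2": "T", "h4": "NT", "h5": "NT"}
communities = label_propagation(graph)
classified = classify_unlabeled(communities, known)
```

On this toy graph each triangle becomes its own community, so the unlabeled `h3` inherits the tracker label and `h6` the non-tracker label. Synchronous updates can oscillate on some graphs, which is why the sketch caps the number of iterations.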
Classification via Label Propagation: Example
[Figure sequence, i=0 to i=4: label propagation on an example graph of eight vertices initially labeled 1-8. In each iteration, every vertex collects the labels of its neighbors and adopts one of them. After convergence, six vertices carry label 7 and two carry label 8, forming two communities.]
Results
Conclusions
● Web trackers are well-connected with each other
  ○ 94% of web trackers are in the same connected component
● Web trackers are mainly connected to other trackers
  ○ High clustering, tight communities
● 97% classification accuracy and < 2% FPR with simple methods
  ○ Can be used to build robust and fully automated privacy preservation systems
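For reference, accuracy and false positive rate (FPR) can be computed from ground-truth and predicted labels as in the sketch below. The hosts and labels are toy values, so the resulting numbers have nothing to do with the 97% and < 2% reported above.

```python
def accuracy_and_fpr(truth, predicted):
    """Return (accuracy, false positive rate), treating "T" (tracker) as positive."""
    tp = fp = tn = fn = 0
    for host, actual in truth.items():
        guess = predicted[host]
        if actual == "T" and guess == "T":
            tp += 1
        elif actual == "T":
            fn += 1
        elif guess == "T":
            fp += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # FPR: share of true non-trackers that were wrongly flagged as trackers.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr

# Toy labels: one tracker is missed (b) and one non-tracker is flagged (e).
truth = {"a": "T", "b": "T", "c": "NT", "d": "NT", "e": "NT"}
predicted = {"a": "T", "b": "NT", "c": "NT", "d": "NT", "e": "T"}
accuracy, fpr = accuracy_and_fpr(truth, predicted)
```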
Extra Slides
Referer-Hosts Graph: Degrees
[Plot: number of unique referers within which tracker hosts and other hosts are embedded]