Peeking into massive Online Social Networks (aka “Walking on Facebook”)
Maciej Kurant
Miniprojects
LastFM API
http://ws.audioscrobbler.com/2.0/?method=user.getfriends&user=rj&limit=10&page=1&api_key=1b4218629b50c1159e15a6b8285b90ba
import re
from urllib.request import urlopen  # Python 3 replacement for urllib2

api_key = '1b4218629b50c1159e15a6b8285b90ba'
user = "rj"
command = ("http://ws.audioscrobbler.com/2.0/?method=user.getfriends&user=" + user
           + "&limit=10&page=1&api_key=" + api_key)
data = urlopen(command).read().decode("utf-8")  # XML format
degree = int(re.search(r'total="(\d+)"', data).group(1))
friends = re.findall(r"<name>(.*?)</name>", data)  # non-greedy: one match per name

print(degree)   # number of friends of "rj"
print(friends)  # first 10 friends (because page=1 and limit=10)
For BFS, you need all friends: set “limit=500” and pull multiple pages if necessary. For random walks, you need only the degree and one neighbor: set “limit=1” and 1) learn the degree, 2) select the index i of the neighbor uniformly at random, 3) get the name by setting “page=i”.
In Python
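The two access patterns described above (all friends for BFS, degree plus one neighbor for random walks) can be sketched as follows. This is only an illustration under assumptions: the function names, the `fetch(user, limit, page)` callback convention, and the 30-second timeout are not from the slides, and the XML is parsed with the same regular expressions as the snippet above rather than with a real XML parser.

```python
import random
import re
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "http://ws.audioscrobbler.com/2.0/"


def fetch_friends_xml(user, limit, page, api_key):
    """Download one page of user.getfriends as XML text (network call)."""
    query = urlencode({"method": "user.getfriends", "user": user,
                       "limit": limit, "page": page, "api_key": api_key})
    with urlopen(API_URL + "?" + query, timeout=30) as response:
        return response.read().decode("utf-8")


def parse_friends(xml_text):
    """Return (degree, names); degree is 0 if no total attribute is found."""
    match = re.search(r'total="(\d+)"', xml_text)
    degree = int(match.group(1)) if match else 0
    return degree, re.findall(r"<name>(.*?)</name>", xml_text)


def all_friends(user, fetch, limit=500):
    """BFS case: pull pages of `limit` friends until all are collected."""
    degree, names = parse_friends(fetch(user, limit, 1))
    page = 1
    while len(names) < degree:
        page += 1
        _, more = parse_friends(fetch(user, limit, page))
        if not more:  # server returned fewer names than announced
            break
        names.extend(more)
    return names


def random_neighbor(user, fetch, rng=random):
    """Random-walk case: return (degree, one uniformly chosen friend)."""
    degree, _ = parse_friends(fetch(user, 1, 1))  # limit=1: learn the degree
    if degree == 0:
        return 0, None
    i = rng.randint(1, degree)                    # index of the neighbor
    _, names = parse_friends(fetch(user, 1, i))   # page=i holds friend i
    return degree, (names[0] if names else None)
```

With a real key, a callback such as `fetch = lambda u, l, p: fetch_friends_xml(u, l, p, my_key)` (where `my_key` is your own API key) could be passed to `all_friends` or `random_neighbor`. Passing the downloader in as a parameter also makes it easy to test the logic offline with a fake `fetch`.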
Surprises
• Banned users (once reached, they seem to have 0 friends)
• Server not responding
• Friendship graph not connected (solution: consider only the component connected to user 'rj')
• Case sensitivity? (rj == RJ?)
• …
Your program has to deal with them!
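One possible way to handle two of these surprises is sketched below: a retry wrapper for an unresponsive server and a name normalizer for the case question. Both helpers are hypothetical. In particular, lower-casing user names is an assumption that the slides leave open, so it should be checked against real responses before relying on it.

```python
import time


def with_retries(fetch, attempts=3, delay=1.0, sleep=time.sleep):
    """Wrap `fetch` so that a failed call is retried after a growing pause."""
    def wrapped(*args):
        for attempt in range(1, attempts + 1):
            try:
                return fetch(*args)
            except OSError:  # urllib's URLError and socket timeouts are OSErrors
                if attempt == attempts:
                    raise    # give up after the last attempt
                sleep(delay * attempt)
    return wrapped


def canonical(name):
    """Assumption: 'rj' and 'RJ' denote the same user; verify on real data."""
    return name.strip().lower()
```

For banned users, one option is to treat a node that reports 0 friends as a dead end and let the walker return to the node it came from, so that the walk does not get stuck.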
[Figure: example graph with nodes labeled A to N]
Data: LastFM, the component connected to user 'rj'
1) Random node
Use MHRW of length L=50 to select a node uniformly at random from LastFM. Repeat it 100 times. Report the average degree of the selected nodes, and of their neighbors. What changes if L counts only unique nodes in MHRW? Why? What happens if you use RW instead of MHRW?
2) RW vs RWRW
Run RW in LastFM. What are the average <playcount>, <playlists>, <age>, <id>, and number of friends observed in the sample? How do they change after correcting for the degree bias (RWRW)?
3) Component size
Based on RW, estimate the size of the component connected to user 'rj'. Use two approaches: [Katzir’11] and [Kurant’13?].
4) BFS
Collect a BFS sample starting from user 'rj' in LastFM. What node degrees, <playcount>, <playlists>, <age>, and <id> do you sample as you collect more nodes? How about implementing it on multiple threads?
5) Barbarian sampling
Try to download the entire component connected to user 'rj'. You will probably need to use a cluster of machines, multiple threads, etc. Please use your own API key. Once you have it, report basic properties: size, average degree, degree distribution, etc. (e.g., average <age>?). Compare with others.
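As a starting point for miniprojects 1 and 2, the three walks can be sketched on a small in-memory graph. The dictionary `star` is a hypothetical stand-in for API lookups, and all function names are illustrative. RW is the simple random walk, MHRW is the Metropolis-Hastings random walk (a move from u to v is accepted with probability min(1, deg(u)/deg(v)), which targets a uniform distribution over nodes), and RWRW re-weights RW samples by 1/degree.

```python
import random


def rw_step(graph, node, rng):
    """Simple random walk: move to a uniformly chosen neighbor."""
    return rng.choice(graph[node])


def mhrw_step(graph, node, rng):
    """Metropolis-Hastings step: propose a neighbor, accept or stay put."""
    candidate = rng.choice(graph[node])
    accept = min(1.0, len(graph[node]) / len(graph[candidate]))
    return candidate if rng.random() < accept else node


def walk(graph, start, length, step, rng):
    """Return the start node followed by `length` steps of the given walk."""
    node = start
    visited = [node]
    for _ in range(length):
        node = step(graph, node, rng)
        visited.append(node)
    return visited


def rwrw_mean(values, degrees):
    """Re-weighted RW estimate of a mean: weight each sample by 1/degree."""
    weights = [1.0 / d for d in degrees]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Toy graph (hypothetical): hub 'rj' with four friends of degree 1.
star = {"rj": ["a", "b", "c", "d"],
        "a": ["rj"], "b": ["rj"], "c": ["rj"], "d": ["rj"]}

rng = random.Random(1)
sample = walk(star, "rj", 3, rw_step, rng)    # hub, leaf, hub, leaf
degrees = [len(star[v]) for v in sample]
print("naive average degree:", sum(degrees) / len(degrees))
print("RWRW average degree: ", rwrw_mean(degrees, degrees))
```

On this toy graph the true average degree is 8/5 = 1.6. The RW sample alternates between the hub and a leaf, so its naive average degree is 2.5, while the re-weighted estimate recovers 1.6. On LastFM, `graph[node]` would be replaced by calls to the API.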