Social Networks at Scale
Transcript of Social Networks at Scale
![Page 1: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/1.jpg)
Social Networks @ Scale
Eoin Hurrell, PhD Data Lead, Cohort
@eoinhurrell
![Page 2: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/2.jpg)
Cohort as a use-case
![Page 3: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/3.jpg)
Social Networks
👤
👤
👤
👤
👤
👤
👤
👤
![Page 4: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/4.jpg)
Social Network Analysis
📚 ☕📜
• Provides many tools to solve problems
• Consider your problem before you consider your tools!
• SNA has a long history in sociology
What do you want to know?
![Page 5: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/5.jpg)
Let's talk Graphs
![Page 6: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/6.jpg)
Let's talk Graphs
source: http://www.nltk.org/book_1ed/ch04.html Fig 4.16
![Page 7: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/7.jpg)
Social Networks as Big Data
![Page 8: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/8.jpg)
Options for Getting Networks
• Start a new social network from scratch
• Scrape a bunch of data from target social networks ahead of time
![Page 9: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/9.jpg)
Options for Examining Networks
• networkx
• A graph database such as Neo4j
• pandas, dask and other standard PyData tools are either not focused on networks or cause issues when run as part of a production service
![Page 10: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/10.jpg)
Cohort as a use-case
We want to understand friend-of-a-friend relationships, and the knowledge of the people in them, so any existing data is important to us. Python is an excellent fit because sklearn, networkx and other data science libraries exist. It also lets us use Spark and Kafka as we scale.
We need to get existing data from social networks and be able to process large amounts of it intelligently.
Over half a billion relationships, 72+ million people
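The friend-of-a-friend relationships mentioned on this slide can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Cohort's implementation: the `build_graph` and `friends_of_friends` helpers and the names in `edges` are hypothetical, and at the scale quoted above the adjacency data would not fit in a single dict like this.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (person, person) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def friends_of_friends(graph, person):
    """Return people exactly two hops away from `person`: friends of
    friends who are neither the person nor already a direct friend."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


# Hypothetical data, for illustration only.
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "eve")]
graph = build_graph(edges)
fof = friends_of_friends(graph, "ann")
```

Here `fof` contains only `"cat"`: she is reachable through `bob`, while `dan` is three hops away and `eve` is already a direct friend.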
![Page 11: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/11.jpg)
Streaming architecture
👤
👤👤 👤👤
Single Source of Truth
www.kappa-architecture.com
🤖
=
Realised Views
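One way to read this diagram: every change is appended to a single log, and each realised view is a function of that log that can be rebuilt at any time by replaying it. The sketch below illustrates that idea with an in-memory list standing in for the log (a system such as Kafka, mentioned earlier, could play that role). The event shapes and the `append` and `realise_friendships` functions are assumptions made for illustration.

```python
from collections import defaultdict

# Append-only log: the single source of truth. Event shapes are hypothetical.
log = []


def append(event):
    """Record an event; existing entries are never modified."""
    log.append(event)


def realise_friendships(events):
    """Replay events in order to build a 'who is connected to whom' view."""
    view = defaultdict(set)
    for event in events:
        a, b = event["a"], event["b"]
        if event["type"] == "connected":
            view[a].add(b)
            view[b].add(a)
        elif event["type"] == "disconnected":
            view[a].discard(b)
            view[b].discard(a)
    return view


append({"type": "connected", "a": "ann", "b": "bob"})
append({"type": "connected", "a": "ann", "b": "cat"})
append({"type": "disconnected", "a": "ann", "b": "bob"})

friendships = realise_friendships(log)
```

Because the view is derived purely from the log, changing how a view is computed only requires replaying the same events through the new function.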
![Page 12: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/12.jpg)
Streaming architecture
👤
👤👤 👤👤
www.kappa-architecture.com
🤖🤖🤖🤖
![Page 13: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/13.jpg)
In production
![Page 14: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/14.jpg)
Batch calculation
👤
👤
👤
👤
👤
👤
👤
👤
👤
Community detection
![Page 15: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/15.jpg)
Batch calculation
👤
👤
👤
👤
👤
👤
👤
👤
👤
Popularity models (e.g. PageRank)
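As a rough illustration of a popularity model, here is a small power-iteration PageRank in plain Python. It is a textbook-style sketch rather than the method used in the talk: the `pagerank` function, its default parameters, and the `follows` data are assumptions, and a batch job at the scale discussed here would run on a distributed framework rather than in a single loop.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> list of nodes it links to."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links.get(node, [])
            if targets:
                # Split this node's rank evenly across the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical "follows" graph: everyone follows ann.
follows = {"bob": ["ann"], "cat": ["ann"], "dan": ["ann", "bob"], "ann": []}
ranks = pagerank(follows)
most_popular = max(ranks, key=ranks.get)
```

The ranks always sum to 1, and in this toy graph `ann` comes out on top because every other person links to her.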
![Page 16: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/16.jpg)
Handling Batch calculation
One Trillion Edges: Graph Processing at Facebook-Scale, VLDB '15, A. Ching et al.
![Page 17: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/17.jpg)
How to handle messages like Twitter
SELECT * FROM posts WHERE user_id IN :friend_list ORDER BY timestamp DESC LIMIT 100;
This does not scale 💀
![Page 18: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/18.jpg)
How to handle messages like Twitter
Redis
👤:1
✉✉✉✉✉✉
✉✉✉
✉✉✉✉✉✉✉✉
✉✉✉✉
✉✉✉✉✉✉✉
📨
📨
📨
📨
❤
❤
❤
posts a new
Single Source of Truth
📨
![Page 19: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/19.jpg)
How to handle messages like Twitter
SELECT * FROM posts WHERE id IN :timeline_ids
This scales 😻
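The difference between the two queries is where the work happens. The first builds the timeline at read time by scanning posts from every friend. The second only looks up post ids that were already written to a per-user timeline when each post was published. The sketch below shows that fan-out-on-write pattern, with in-memory deques standing in for per-user Redis lists. No real Redis client is used, and `publish`, `read_timeline`, and the data are hypothetical.

```python
from collections import defaultdict, deque

TIMELINE_LENGTH = 100  # mirrors the LIMIT 100 in the read-time query

posts = {}                    # post_id -> post; stands in for the posts table
followers = defaultdict(set)  # author -> users who follow them
# Per-user list of recent post ids; stands in for one Redis list per user.
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LENGTH))


def publish(post_id, author, text):
    """Store the post once, then push its id onto each follower's timeline."""
    posts[post_id] = {"id": post_id, "user_id": author, "text": text}
    for follower in followers[author]:
        # Newest first; once the deque is full the oldest id drops off the end.
        timelines[follower].appendleft(post_id)


def read_timeline(user):
    """Rough equivalent of: SELECT * FROM posts WHERE id IN :timeline_ids"""
    return [posts[post_id] for post_id in timelines[user]]


followers["ann"].update({"bob", "cat"})
publish(1, "ann", "hello")
publish(2, "ann", "world")
bob_timeline = read_timeline("bob")
```

Reading a timeline is now a lookup of at most `TIMELINE_LENGTH` ids by primary key. The cost moves to write time, where one post triggers one push per follower.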
![Page 20: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/20.jpg)
Conclusion
• Networks are dense but useful data
• Scalable data science depends on usage, not just traditional form
• Python is useful and powerful at every level of this stack
![Page 21: Social Networks at Scale](https://reader031.fdocuments.net/reader031/viewer/2022030305/5871a1c91a28ab044e8b6fcb/html5/thumbnails/21.jpg)
Thank You!
🔬
Cohort helps you find what you need through the people you know and trust
cohort.is
Eoin Hurrell, PhD Data Lead, Cohort
@eoinhurrell