Automatic Extraction of Soccer Game Event Data from Twitter
![Page 1: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/1.jpg)
Automatic extraction of soccer game event data
from Twitter
Guido van Oorschot, Marieke van Erp and Chris Dijkshoorn
Monday, November 12, 12
![Page 2: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/2.jpg)
Soccer data
![Page 3: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/3.jpg)
Theory
1. Fair body of research on automated sports highlight extraction
2. Twitter data can offer interesting insights into real-world phenomena
![Page 4: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/4.jpg)
Automated highlight detection
Let’s Use Twitter data!
![Page 5: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/5.jpg)
3 Tasks
1. Detecting events: In which minutes did events occur?
2. Classifying events: Is the event a goal, card or substitution?
3. Assigning events to teams: Is the event for the home team or the away team?
![Page 6: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/6.jpg)
5 types of events
- Goal
- Own Goal
- Red Card
- Yellow Card
- Substitution
![Page 7: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/7.jpg)
Methodology
1. Gathering the data
2. Exploring and cleaning the data
3. Classifying interesting data points
![Page 8: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/8.jpg)
Gathering data
- Collect all tweets with game hashtags
#ajafey #nacgro #psvutr
- Collect official data for each match
Goals, cards, substitutions
![Page 9: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/9.jpg)
Our data
- 6 months
- 61 games
- 661 events
- 10,643 tweets
![Page 10: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/10.jpg)
Three Experiments
1. Detecting events
2. Classifying events
3. Assigning events to teams
![Page 11: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/11.jpg)
1. Detecting events
![Page 12: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/12.jpg)
1. Detecting events
![Page 13: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/13.jpg)
1. Experimental Setup
- Goal: detect peaks in the tweets-per-minute signal to extract events
- Setup: Test three peak detection methods:
1. LocMaxNoBaseLineCorr
2. IntThresNoBaseLineCorr
3. IntThresWithBaseLineCorr
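The slide only names the three methods, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that "LocMax" means local maxima of the raw signal, that "IntThres" means an intensity threshold (here mean plus `k` standard deviations), and that baseline correction means subtracting a moving average. The function names, `k`, `window` and the sample signal are all hypothetical.

```python
from statistics import mean, pstdev

def local_maxima(signal):
    """Indices whose value is strictly greater than both neighbours."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]]

def intensity_threshold(signal, k=1.0):
    """Indices whose value exceeds mean + k * standard deviation (assumed rule)."""
    cutoff = mean(signal) + k * pstdev(signal)
    return [i for i, value in enumerate(signal) if value > cutoff]

def baseline_correct(signal, window=5):
    """Subtract a centred moving average; the window size is a hypothetical choice."""
    half = window // 2
    corrected = []
    for i, value in enumerate(signal):
        neighbourhood = signal[max(0, i - half): i + half + 1]
        corrected.append(value - mean(neighbourhood))
    return corrected

# Made-up tweets-per-minute counts with two obvious bursts.
tweets_per_minute = [10, 12, 11, 60, 14, 12, 13, 11, 45, 12, 10]

loc_max = local_maxima(tweets_per_minute)
int_thres = intensity_threshold(tweets_per_minute)
int_thres_corrected = intensity_threshold(baseline_correct(tweets_per_minute))
```

On this toy signal the local-maximum variant also flags small bumps, while both threshold variants keep only the two bursts.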
![Page 14: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/14.jpg)
1. Results
![Page 15: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/15.jpg)
1. Findings
- Goals and red cards are detected better than yellow cards and substitutions
- None of the three peak selection methods works well
- Highlights can be extracted, but not precisely enough
![Page 16: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/16.jpg)
Three Experiments
1. Detecting events
2. Classifying events
3. Assigning events to teams
![Page 17: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/17.jpg)
2. Classifying Events
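- Goal: Classify minutes into event classes

| minute | "goal" | "1" | "red" | "card" | "boring" | class |
| --- | --- | --- | --- | --- | --- | --- |
| 34 | 0 | 2 | 0 | 1 | 20 | nothing |
| 35 | 23 | 34 | 0 | 0 | 0 | goal |
| 12 | 1 | 2 | 0 | 0 | 5 | nothing |
| 13 | 1 | 0 | 22 | 11 | 0 | red card |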
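A per-minute word-count matrix like the one on this slide could be built roughly as follows. This is a minimal sketch: the whitespace tokenisation, the function names and the sample tweets are assumptions made for illustration, not details from the slides.

```python
from collections import Counter, defaultdict

def minute_term_counts(tweets):
    """Map each game minute to a Counter of lower-cased tokens seen in that minute."""
    counts = defaultdict(Counter)
    for minute, text in tweets:
        counts[minute].update(text.lower().split())
    return counts

def to_rows(counts, vocabulary):
    """Turn per-minute counters into fixed-width rows, one column per vocabulary word."""
    return {minute: [counter[word] for word in vocabulary]
            for minute, counter in sorted(counts.items())}

# Made-up (minute, tweet text) pairs.
tweets = [
    (35, "GOAL 1 0"),
    (35, "what a goal"),
    (13, "red card"),
]
vocabulary = ["goal", "1", "red", "card"]
rows = to_rows(minute_term_counts(tweets), vocabulary)
```

Each row would then be paired with the class of that minute taken from the official match data.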
![Page 18: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/18.jpg)
Issues
Problem: Huge, sparse matrix
1. Reduce features: choose words/features smartly
2. Reduce instances: choose minutes smartly
![Page 19: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/19.jpg)
2. Experimental Setup
- 3 Instance selection settings
1. AllMinutes
2. PeakMinutes
3. EventMinutes
![Page 20: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/20.jpg)
2. Experimental Setup
- 7 Feature selection settings
1. AllMoreThanOnce
2. Top500TotalFreq
3. Top10MinuteFreq
4. Top500TotalTfIdf
5. Top10MinuteTfIdf
6. Top50Infogain
7. Top50GainRatio
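The last two settings rank words by information gain and gain ratio. As a minimal sketch of those two standard measures, the code below scores a single word, assuming the feature is binarised to "word occurs in this minute or not". The binarisation, the function names and the toy labels are assumptions, and the slides do not say how the scores were actually computed.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())

def info_gain_and_ratio(present, labels):
    """present[i] is True if the word occurs in instance i; labels[i] is its class."""
    total = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [label for p, label in zip(present, labels) if p == value]
        if subset:
            gain -= len(subset) / total * entropy(subset)
    # Gain ratio normalises the gain by the entropy of the split itself.
    split_info = entropy(present)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio

labels = ["goal", "goal", "nothing", "nothing"]
# A word that occurs exactly in the goal minutes is maximally informative.
gain, ratio = info_gain_and_ratio([True, True, False, False], labels)
# A word spread evenly over both classes carries no information.
flat_gain, flat_ratio = info_gain_and_ratio([True, False, True, False], labels)
```

A "Top50" setting would then keep the 50 words with the highest score.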
![Page 21: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/21.jpg)
2. Experimental Setup
- 6 types of classifiers
1. C4.5
2. RandomForest
3. NaiveBayes
4. NaiveBayesMultinomial
5. libSVM
6. IB1
![Page 22: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/22.jpg)
2. Results
![Page 23: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/23.jpg)
2. Discussion
- Top50GainRatio best feature selection
- libSVM best classifier
- EventMinutes results:

| Class | F-measure |
| --- | --- |
| OVERALL | 0.822 |
| Goal | 0.841 |
| Own goal | 0.000 |
| Red card | 0.848 |
| Yellow card | 0.785 |
| Substitution | 0.839 |
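For reference, a per-class F-measure can be computed as below. The slides do not state which variant was used, so this sketch assumes the balanced F1 score (the harmonic mean of precision and recall). The function name and the toy labels are made up for illustration.

```python
def f_measure(true_labels, predicted_labels, target):
    """Balanced F-measure (F1) for one class; returns 0.0 when there are no true positives."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

true_labels = ["goal", "goal", "own goal", "red card"]
predicted_labels = ["goal", "goal", "goal", "red card"]

goal_f = f_measure(true_labels, predicted_labels, "goal")
own_goal_f = f_measure(true_labels, predicted_labels, "own goal")
```

In this toy example the class that is never predicted correctly scores 0.0.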
![Page 24: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/24.jpg)
Three Experiments
1. Detecting events
2. Classifying events
3. Assigning events to teams
![Page 25: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/25.jpg)
3. Experimental Setup
- Goal: Assign events to teams
- Based on the ratio between tweets from fans of the home team and fans of the away team
- But first: extract fans
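One way to picture the ratio-based assignment is the sketch below. It is a hypothetical decision rule, not the one from the talk: it assumes the sets of home and away fans are already known, and that an event belongs to the team whose fans account for a larger share of the tweets around it than some baseline share. The function name, the `baseline_home_share` parameter and the sample users are all made up.

```python
def assign_team(event_tweet_users, home_fans, away_fans, baseline_home_share=0.5):
    """Return 'home', 'away', or None if no known fan tweeted around the event."""
    home = sum(1 for user in event_tweet_users if user in home_fans)
    away = sum(1 for user in event_tweet_users if user in away_fans)
    if home + away == 0:
        return None
    home_share = home / (home + away)
    # Assumed rule: more home-fan activity than the baseline points to the home team.
    return "home" if home_share > baseline_home_share else "away"

home_fans = {"a", "b", "c"}
away_fans = {"d"}
users_around_event = ["a", "b", "c", "d"]

default_guess = assign_team(users_around_event, home_fans, away_fans)
strict_guess = assign_team(users_around_event, home_fans, away_fans,
                           baseline_home_share=0.8)
no_fans_guess = assign_team(["x", "y"], home_fans, away_fans)
```

The baseline matters because one team usually has more fans tweeting throughout the game, so a raw majority alone could be misleading.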
![Page 26: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/26.jpg)
3. Extracting fans
- Hypothesis:
People who tweet about the same team every week are probably fans of that team
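The hypothesis can be sketched in code as follows. This is only an illustration under stated assumptions: it treats a game hashtag as two three-letter team codes (as in `#ajafey`), and it calls a user a fan when exactly one team appears in every game they tweeted about, across at least `min_games` games. The function names, the `min_games` threshold, the user names and the `#psvaja` tag are hypothetical.

```python
def teams_in_hashtag(hashtag):
    """Assume a six-letter tag made of two three-letter team codes, e.g. #ajafey."""
    code = hashtag.lstrip("#").lower()
    return code[:3], code[3:6]

def extract_fans(user_hashtags, min_games=2):
    """Map each user to the single team present in every game they tweeted about."""
    fans = {}
    for user, hashtags in user_hashtags.items():
        games = set(hashtags)
        if len(games) < min_games:
            continue  # too little evidence for this user
        common = set.intersection(*(set(teams_in_hashtag(tag)) for tag in games))
        if len(common) == 1:
            fans[user] = common.pop()
    return fans

user_hashtags = {
    "alice": ["#ajafey", "#psvaja", "#ajafey"],  # always a game involving "aja"
    "bob": ["#ajafey", "#nacgro"],               # no team in common
    "carol": ["#psvutr"],                        # only one game
}
fans = extract_fans(user_hashtags)
```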
![Page 27: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/27.jpg)
3. Extracting fans
- Extracted 38,527 fans from 146,326 users (26%)
- This method of extracting fans works well:
| Right team | Not clear | Wrong team |
| --- | --- | --- |
| 88% | 10% | 2% |
![Page 28: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/28.jpg)
3. Results
![Page 29: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/29.jpg)
3. Results
- Performance of assigning events to teams is above baseline performance:

| Class | Baseline | Performance |
| --- | --- | --- |
| OVERALL | 52% | 58% |
| Goal | 58% | 69% |
| Red card | 50% | 62% |
| Yellow card | 63% | 63% |
| Substitution | 52% | 57% |
![Page 30: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/30.jpg)
Conclusion
1. Detecting events => difficult
2. Classifying events => good results
3. Assigning events to teams => promising results
![Page 31: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/31.jpg)
Future Work
- Use sentiment in tweets (for detecting events and assigning events to teams)
- Player detection
- Other sports
![Page 32: Automatic Extraction of Soccer Game Event Data from Twitter](https://reader034.fdocuments.net/reader034/viewer/2022042623/54830c86b4af9f730d8b4947/html5/thumbnails/32.jpg)
Questions?