Video Games Presentation for Policymaking in the big data era conference
juan-mateos-garcia -
Transcript of Video Games Presentation for Policymaking in the big data era conference
2
This presentation
Talks about Nesta and Ukie’s research mapping the UK games industry with web (biggish) data.
Focuses on data collection, and compares the results with what one would obtain using standard (SIC-based) approaches.
Is less focused on reviewing all our findings. For those, you can download the full report here:
https://www.nesta.org.uk/sites/default/files/map_uk_games_industry_wv.pdf
3
1. Context
Exam question
To measure and map a fast-moving, innovative, entrepreneurial sector.
Opportunity
The ‘big data’ revolution:
• Unstructured web inputs
• Combining varied datasets
• Open, interactive outputs (datasets + platforms)
Audiences:
• Policymakers
• Industry
• Other innovation agents
• Researchers
4
2. How do we FIND UK games companies? Using official data
[Diagram labels: Business, Analyst, SIC Code, Data, Govt]
Do these SIC codes capture games companies? Some issues:
1. Inadequate SIC codes: games SIC codes only appeared in 2007.
2. Misclassification:
• Companies have no incentive to select the right SIC code.
• Companies straddle sectors (educational games, games app developers etc.)
Is this data relevant? Some issues:
1. It misses smaller companies.
2. Lags in the publication of the data (~1/2 years).
3. Data only available in aggregate form. Not possible to identify companies (due to disclosure issues).
4. Data doesn’t include industry-relevant questions.
5
2. How do we FIND UK games companies? Using surveys
[Diagram labels: Industry expert, Analyst, Domain knowledge, Survey, Sample]
Excellent source of data, tried and tested methodology:
• Used in many policy-relevant reports.
• Allows targeting existing companies, and obtaining very relevant information.
Limitations:
• Very expensive
• Very low response rates
• Snapshot
6
2. How do we FIND UK games companies? Using web data (our approach)
[Diagram labels: Business, Analyst, Activity, Data, Web]
Advantages
• Definition not based on SIC codes but on economic/creative activity
• ‘Real-time’ data
• Relevant data
Not a silver bullet… as we will see.
7
An illustration of the pitfalls of web data
Several academic papers have used an approach similar to ours, but based on a single data source (MobyGames). However, MobyGames is heavily skewed towards older, niche gaming platforms rather than new, mainstream ones. This reflects biases in the platform’s user base.
8
2. How do we FIND UK games companies? Process
Data scraping carried out by an external agency with IT + domain expertise. Analysis carried out in-house.
9
2. How do we FIND UK games companies? Some observations
Not all observations are born equal:
• Matching companies from web sources with Companies House (CH) data is a probabilistic process.
• False positives/negatives are costly not just in terms of accuracy, but also of perceptions.
• Strategies to address this:
– Manual (expensive, stringent) verification of companies using web information: only 23% of companies verified (80% of those validated were correct).
– Decision tree (CHAID) to identify groups of companies similar to those verified positively: 546 companies added.
– Quality assurance with domain experts (Ukie):
  • Remove 17 companies (BBC, gambling companies)
  • Incorporate 184 companies with no web presence.
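To make the idea of probabilistic matching concrete, here is a minimal sketch of fuzzy name matching between a web-sourced company name and a list of register names. It is not the matching procedure used in the research. The suffix list, the 0.85 threshold, the function names and the company names in the usage note are all illustrative assumptions.

```python
from difflib import SequenceMatcher

# Assumption: legal-form words that are dropped before comparing names.
SUFFIXES = ("limited", "ltd", "plc", "llp")


def normalise(name):
    """Lower-case a name, replace punctuation with spaces, drop trailing legal-form words."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    tokens = cleaned.split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def best_match(web_name, register_names, threshold=0.85):
    """Return (register_name, score) for the closest register entry, or None.

    The score is difflib's similarity ratio in [0, 1]; the threshold is an
    assumed cut-off that trades false positives against false negatives.
    """
    if not register_names:
        return None
    target = normalise(web_name)
    scored = [
        (SequenceMatcher(None, target, normalise(candidate)).ratio(), candidate)
        for candidate in register_names
    ]
    score, name = max(scored)
    return (name, score) if score >= threshold else None
```

With a made-up register such as `["EXAMPLE GAMES LIMITED", "SAMPLE STUDIOS LTD"]`, calling `best_match("Example Games Ltd.", register)` pairs the web name with the first entry. Raising the threshold reduces false positives at the cost of more missed matches, which is why a manual verification step remains useful.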
10
4. Results: Coverage
We identify 1902 companies active in 2015 (cf. 1320 according to IDBR in 2013, ~500 in most domain-expert generated company lists).
Just over a third of these companies are covered by official games SIC codes.
20% of the companies have no official SIC code yet, but are identified by our approach.
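Coverage shares of this kind can be computed by bucketing each company's SIC code, as in the sketch below. The two codes in `GAMES_SIC` are the UK SIC 2007 codes usually described as the games codes (publishing of computer games, and ready-made interactive leisure and entertainment software development). Treat that set, the function name and the input format as assumptions rather than the definitions used in the report.

```python
from collections import Counter

# Assumption: the UK SIC 2007 codes treated as "games" codes.
GAMES_SIC = {"58210", "62011"}


def coverage_shares(sic_codes):
    """Return the share of companies with a games SIC, another SIC, or no SIC.

    `sic_codes` holds one entry per company: a code string, or None when the
    company has not been assigned a SIC code yet.
    """
    if not sic_codes:
        raise ValueError("need at least one company")

    def bucket(code):
        if code is None:
            return "no_sic"
        return "games_sic" if code in GAMES_SIC else "other_sic"

    counts = Counter(bucket(code) for code in sic_codes)
    total = len(sic_codes)
    return {key: counts[key] / total for key in ("games_sic", "other_sic", "no_sic")}
```

For a toy list of six companies, `["58210", "62011", "62012", None, "59111", "62020"]`, the function reports one third under games codes, one half under other codes and one sixth with no code. These figures are made up for illustration and are not the results above.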
11
4. Results: Specialisation profile
Companies targeting emerging platforms (iOS) are less well covered by games SICs than those targeting established platforms (consoles).
12
4. Results: Geography [1]
Correlation matrix of location quotients (LQ):
              breslq  ark.lq  idbrcount.lq
breslq        1.00    0.38    0.46
ark.lq        0.38    1.00    0.53
idbrcount.lq  0.46    0.53    1.00
Gini coefficients:
Gini BRES  Gini IDBR  Gini WD
0.929      0.898      0.801
Our approach shows a geography of the UK games industry echoing official data sources, but with less concentration.
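For readers unfamiliar with the two measures on this slide, the sketch below shows one common way to compute a location quotient per area and a Gini coefficient across areas. The exact definitions used in the research are not stated here, so both formulas, the function names and the toy numbers are assumptions.

```python
def location_quotients(sector_counts, total_counts):
    """Location quotient per area: local sector share divided by national sector share.

    Both arguments map area name -> number of companies. A value above 1 means
    the sector is over-represented in that area relative to the country.
    """
    national_share = sum(sector_counts.values()) / sum(total_counts.values())
    return {
        area: (sector_counts[area] / total_counts[area]) / national_share
        for area in sector_counts
    }


def gini(values):
    """Gini coefficient of non-negative values: 0 is an even spread, close to 1 is concentrated."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Standard rank-weighted formula on values sorted in ascending order.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n
```

As a toy check, four areas with the same number of companies give `gini([1, 1, 1, 1]) == 0`, while putting every company in one of four areas gives `gini([0, 0, 0, 4]) == 0.75`. A lower Gini for the web data than for BRES or IDBR is what "less concentration" refers to.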
13
4. Results: Geography [2]
Differences in “hotspots” when we compare our data and IDBR.
Conversations with Ukie suggest the extra hubs identified by our analysis are more credible than those using IDBR (Liverpool + Cardiff vs. Hull + Reading)
14
4. Results: Hub composition
One explanation
“New” games hubs with more diversified creative economies tend to include fewer companies covered by official SIC codes, compared with “longstanding” hubs.
15
4. Results: Micro-geography
Our data allows us to map the games industry at the micro (company address) level -> this is policy-relevant information.
16
4. Results: Some issues
Poor availability of financial data (only 6% report it to CH) -> We can’t produce estimates of employment or value added.
We rely on trading addresses for our mapping, and we know some of them are inaccurate.
How many of our companies specialise in games vs. make some games? What goes in and what goes out?
17
5. Conclusions
Lessons learned
Structured domain-specific resources help, but they are not available for all sectors.
It’s not web vs SIC, but web + SIC.
Combining automated data collection and matching with domain knowledge is preferable.
Do not underestimate the risks of errors, or the costs of minimising them.
Next steps
Our strategy to improve the quality of the data is to open it up for the games industry by developing an interactive, dynamic platform. Watch this space.