Public policy in the ‘big data’ age: Martin Ralphs presentation
Uploaded by: youngpolicyprofessionals
Category: News & Politics
Data Science in Government
Government Data Science Partnership
Raise awareness of data science potential
Embed new approaches and new skills and improve existing capability
Engage with departments to understand opportunities and issues
Build and support a cross-government data science community to share expertise
Break down technical barriers and understand ethical issues
Learning by doing
Government Innovation Group
Government Digital Service
What is data science?
Data science
Volume
Velocity
Variety
New approach
New technology
New approach: A ‘data first’ mindset; exploring the data to find insights and potential improvements using new and innovative techniques
New technology: New, low-priced storage in the cloud, with unrestricted technology capable of running software which can gain speedy insights
How can data science improve government policy and operations?
Data visualisation
New data sets and collection methods
Machine learning
Social media
Web scraping
Prediction
Clustering
Unstructured data
Real time data
Interactive web apps
Real-time feeds
Personalisation
Data sources for official statistics
Surveys – e.g. of businesses and households
Census – every 10 years
Administrative data – by-product of government processes
Big Data?
“Data that is difficult to collect, store or process within the conventional systems of statistical organizations. Either, their volume, velocity, structure or variety requires the adoption of new statistical software processing techniques and/or IT infrastructure to enable cost-effective insights to be made.”
(UNECE, 2013)
Big data sources
Social media: posts, pictures and videos
Purchase transaction records
Mobile phone GPS and cell tower signals
High volume administrative & transactional records
Sensors gathering information: e.g. climate, traffic, internet of things etc.
Digital satellite images
New data sets and collection methods
Social media
Web scraping
Real time data
Web scraping supermarket prices
● Prices collection currently manual
● Web scraping offers more detailed, more frequent data at lower cost
● Web scraped data provides an opportunity to gain experience in processing high volume price data
Prototype web scrapers
● 3 supermarkets
● 35 CPI/RPI item categories
● Written using Python (scrapy)
● Daily collection (around 6500 price quotes)
● Item counts monitored daily
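As a rough illustration of the daily collection and item-count monitoring listed above, the sketch below parses price quotes from a made-up HTML fragment. The prototypes were written with scrapy; this version swaps in Python's standard-library `html.parser` so that it runs without extra packages. The page markup, the class names, the prices and the 50% alert tolerance are all assumptions for illustration, not details from the project.

```python
from html.parser import HTMLParser

# Hypothetical markup and invented prices: real supermarket pages will differ.
SAMPLE_PAGE = """
<ul>
  <li class="product"><span class="name">Tesco Pure Apple Juice 2 Litre</span>
      <span class="price">£1.80</span></li>
  <li class="product"><span class="name">Tesco Mango Juice Drink 1ltr</span>
      <span class="price">£1.00</span></li>
</ul>
"""


class PriceQuoteParser(HTMLParser):
    """Collects (name, price) pairs from spans with class 'name' / 'price'."""

    def __init__(self):
        super().__init__()
        self.quotes = []      # completed (name, price_in_pounds) tuples
        self._field = None    # which span we are currently inside, if any
        self._name = None     # product name waiting for its price

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            cls = dict(attrs).get("class")
            if cls in ("name", "price"):
                self._field = cls

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "name":
            self._name = text
        elif self._field == "price" and self._name is not None:
            self.quotes.append((self._name, float(text.lstrip("£"))))
            self._name = None

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_quotes(html):
    """Return the list of (name, price) quotes found in one page."""
    parser = PriceQuoteParser()
    parser.feed(html)
    return parser.quotes


def item_count_alert(todays_count, usual_count, tolerance=0.5):
    """Flag a collection whose item count falls well below the usual level."""
    return todays_count < usual_count * tolerance


quotes = scrape_quotes(SAMPLE_PAGE)
print(quotes)
print(item_count_alert(len(quotes), usual_count=2))
```

A sudden drop in the item count is a cheap signal that a site layout has changed and the scraper needs attention, which is one plausible reason to monitor counts daily.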
Classification challenge
“This is a dessert apple”
“This is fruit juice (not orange)”
“This is fruit juice (not orange)” and not a dessert apple!
Tesco Mango Juice Drink 1ltr
Tesco Pure Apple Juice 2 Litre
Training Set
Supervised machine learning
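To make the classification challenge concrete, here is a minimal sketch of supervised learning on product names. The slides only say that supervised machine learning with a training set was used; the multinomial naive Bayes classifier below is a stand-in chosen for brevity, not the project's method. Apart from the two Tesco product names taken from the slide, the training examples are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Tiny hand-labelled training set (hypothetical examples, apart from the two
# product names quoted on the slide).
TRAINING_SET = [
    ("Tesco Mango Juice Drink 1ltr", "fruit juice (not orange)"),
    ("Tesco Pure Apple Juice 2 Litre", "fruit juice (not orange)"),
    ("Pineapple Juice Carton 1 Litre", "fruit juice (not orange)"),
    ("Braeburn Apple Loose", "dessert apple"),
    ("Gala Apples Bag Of 6", "dessert apple"),
    ("Pink Lady Apple Pack", "dessert apple"),
]


def tokenize(text):
    """Lower-case the product name and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Count tokens per label for a multinomial naive Bayes model."""
    token_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        tokens = tokenize(text)
        token_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return token_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    token_counts, label_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_examples)
        denom = sum(token_counts[label].values()) + len(vocabulary)
        for token in tokenize(text):
            score += math.log((token_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_SET)
print(classify("Value Apple Juice 1 Litre", model))
```

The word "apple" appears in both categories, so the classifier has to lean on the surrounding tokens ("juice", "litre") to tell an apple juice from a dessert apple, which is the difficulty the slide points at.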
Price quote distributions
Whisky
Onions
Price Indices Publication 1st September 2015
http://bit.ly/1PRKMGx
“The real finding of the initial research was not that inflation is too high, but the method of collecting prices matters rather a lot”
Paul Johnson, IFS
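One way to see how a set of price quotes turns into an index number is the sketch below. The slides do not state which index formula was used; the Jevons index (an unweighted geometric mean of price relatives over products quoted in both periods) is assumed here purely as a common elementary aggregate. The product keys and prices are invented.

```python
import math


def jevons_index(base_prices, current_prices):
    """Unweighted geometric mean of price relatives, scaled to base = 100.

    Only products quoted in both periods are used (a matched sample).
    """
    matched = [p for p in base_prices if p in current_prices]
    if not matched:
        raise ValueError("no products quoted in both periods")
    log_relatives = [
        math.log(current_prices[p] / base_prices[p]) for p in matched
    ]
    return 100 * math.exp(sum(log_relatives) / len(log_relatives))


# Invented quotes for illustration only.
base = {"whisky_a": 20.0, "whisky_b": 10.0, "whisky_c": 15.0}
current = {"whisky_a": 22.0, "whisky_b": 10.0, "whisky_d": 18.0}

print(round(jevons_index(base, current), 2))
```

Because web-scraped product ranges change from day to day, the choice of which quotes count as "matched" can move the result, which fits the point that the method of collecting prices matters.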
Smart meters
Rationale: Smart-type electricity meter data to model occupancy or household composition with energy use profiles
Support more efficient field operations (in 2011, £6.6m spent trying to enumerate vacant properties)
Data from smart meter trials in Great Britain and Republic of Ireland
A range of potential methods identified
Significant issues around privacy and ethics
Electricity: smart meters
Half hourly electricity consumption over 7 days at one meter, through 28 consecutive 7 day periods.
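The half-hourly profiles described above could feed a vacancy check along the following lines. The slides mention a range of potential methods without naming them, so this rule (low, flat consumption on every day) is a hypothetical heuristic, and the thresholds and the synthetic readings are placeholders rather than project values.

```python
from statistics import mean, pstdev

READINGS_PER_DAY = 48  # half-hourly readings


def daily_profiles(readings):
    """Split a flat list of half-hourly readings (kWh) into whole days."""
    if len(readings) % READINGS_PER_DAY != 0:
        raise ValueError("expected a whole number of days")
    return [
        readings[i:i + READINGS_PER_DAY]
        for i in range(0, len(readings), READINGS_PER_DAY)
    ]


def looks_vacant(readings, max_mean_kwh=0.05, max_variation=0.2):
    """Hypothetical rule: low, flat usage on every day suggests vacancy.

    The thresholds are placeholders and would need calibrating on real data.
    """
    for day in daily_profiles(readings):
        avg = mean(day)
        # Coefficient of variation: spread relative to the day's mean.
        variation = pstdev(day) / avg if avg > 0 else 0.0
        if avg > max_mean_kwh or variation > max_variation:
            return False
    return True


# Synthetic weeks: a constant background load versus a morning/evening pattern.
vacant_week = [0.02] * (READINGS_PER_DAY * 7)
occupied_day = [0.05] * 14 + [0.4] * 4 + [0.1] * 16 + [0.6] * 10 + [0.05] * 4
occupied_week = occupied_day * 7

print(looks_vacant(vacant_week), looks_vacant(occupied_week))
```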
Twitter
Rationale: Using geo-located Tweets to explore mobility and migration
7 months of geo-located tweets within Great Britain (about 100 million data points)
Can infer place of usual residence
Significant issues around privacy and ethics
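A simple way to picture how place of usual residence might be inferred from geo-located tweets is sketched below. The slides do not describe the method; taking the most frequent night-time grid cell for each account is an assumption, as are the grid size, the night-time hours and the minimum tweet count. The coordinates are synthetic.

```python
from collections import Counter
from datetime import datetime


def grid_cell(lat, lon, cell_size=0.01):
    """Snap a coordinate to a coarse grid cell (about 1 km in latitude)."""
    return (round(lat / cell_size), round(lon / cell_size))


def infer_usual_residence(tweets, night_start=22, night_end=6, min_tweets=3):
    """Return the most common night-time grid cell, or None if too few tweets.

    `tweets` is an iterable of (timestamp, lat, lon) for a single account.
    """
    night_cells = Counter(
        grid_cell(lat, lon)
        for ts, lat, lon in tweets
        if ts.hour >= night_start or ts.hour < night_end
    )
    if sum(night_cells.values()) < min_tweets:
        return None
    return night_cells.most_common(1)[0][0]


# Synthetic tweets for one hypothetical account.
tweets = [
    (datetime(2014, 9, 1, 23, 15), 51.5890, -2.9980),
    (datetime(2014, 9, 2, 13, 0), 51.4816, -3.1791),
    (datetime(2014, 9, 2, 22, 40), 51.5891, -2.9979),
    (datetime(2014, 9, 3, 1, 5), 51.5889, -2.9981),
    (datetime(2014, 9, 3, 23, 30), 51.4816, -3.1791),
]

print(infer_usual_residence(tweets))
```

Snapping to a coarse grid and requiring a minimum number of night-time tweets are also modest safeguards, since exact coordinates for thinly observed accounts are where the privacy concerns noted above bite hardest.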
Geolocated Tweet penetration rate by local authority
Demographics and Twitter data
Geo-located Tweet volumes by Device Type, Great Britain, 15 August to 31 October 2014
Predicting Norovirus (ahead of lab reports) using social media
Machine learning
Prediction
Clustering
Unstructured data
Segmentation
Thank you
[email protected]
Twitter: @GoodPracticeMR
ONS Big Data Project web page http://bit.ly/1OZAOzs
GDS Data Science blog http://bit.ly/1QCT5Xs
Government Data Programme
Policies and Governance
Modern Data Infrastructure
Data Science
Open Data
Data Leaders Network
Data Steering Group
Inter-Ministerial Group for Digital Transformation
National Information Infrastructure
Common Technology
Services
Platforms and Standards
Registers
Digital Services
Departmental Transformation
Government as a Platform