Data Analytics at NICTA - University of Tasmania

Page 1

NICTA Copyright 2013 From imagination to impact

Data Analytics at NICTA

Stephen Hardy National ICT Australia (NICTA)

[email protected]

Page 2

Outline

•  Big data = science!
•  Data analytics at NICTA
   –  Discrete
   –  Finite
   –  Infinite

•  Machine Learning for the natural sciences

Page 3

Data, Data, Everywhere…

Page 4

Evolution vs. Revolution

[Diagram linking Machine Learning, Statistics and Computer Science with Scientific Challenges, Societal Challenges, Personal, Enterprise and Government, via arrows labelled "techniques" and "problems".]

Analysis of data to prove or disprove hypotheses = science!!

Page 5

Not just the data…

•  Data Scale
   –  Volume
   –  Velocity
   –  Variety
•  Algorithmic complexity
   –  Graphical models
   –  Deep learning
   –  Non-parametric statistics
   –  Random forests
   –  Graph learning
•  Infrastructure
   –  File systems
   –  Distributed computation
   –  SQL / NoSQL
   –  Analytics Engines
   –  Machine learning toolkits

[Diagram regions: Big Data, Big Analytics, Analytics, Data]

Page 6

What is NICTA?

•  Australia's National Centre of Excellence in Information and Communication Technology
   –  700 Staff, 5 labs, $100m/y revenue
•  NICTA objectives
   –  Research Excellence in ICT
   –  Wealth Creation for Australia
•  Transforming Industry
   –  $3bn/y direct impact on GDP from projects
•  New Industries
   –  Eleven spin-outs, working with ICT SMEs
•  Skills and Capacity
   –  17 University partners, 280 PhD Students

Page 7

Data Analytics: A summary

•  Discrete, $\aleph$, $P(n_i)$: Events, People
•  Finite, $\Re^n$, $P(x_i)$: Signals, Location
•  Infinite, $\Im$, $P(f_i)$: Spatial Fields, Temporal Fields

Page 8

NICTA Data Analytics (1)

Discrete, $\aleph$, $P(n_i)$: Events, People, Text, Gene Sequences

•  Risk Estimation
•  Sentiment analysis
•  Behaviour prediction
•  Bioinformatics
•  Biomedical informatics
•  Xenome
•  GWIS
•  Opinion Watch
•  Biomedical texts
•  Event Watch
•  Patent analysis
•  Offer targeting
•  "Scoobi" data mining / Active learning
•  Machine learning for Natural Language Processing
•  Efficient compressed storage and search for sequence data
•  Energy constrained machine learning
•  Edge-distributed learning

Page 9

Event watch

•  Demo
   –  http://pmo-eventwatch.research.nicta.com.au/demo/

Sentiment Analysis

•  40,000 word lexicon
•  Part of Speech
•  Sentiment
•  Named Entity Recognition
•  Key phrase extractor
•  LDA: Latent Dirichlet Allocation
•  Differential topic modeling
•  Supervised LDA

Page 10

Key technology - Topic modeling

Documents consist of words

[Figure: Documents 1 to 5]

Documents are modeled as a mixture of topics

[Figure: each of Documents 1 to 5 has a probability distribution over Topics A to D]

Words are associated with topics

[Figure: each of Topics A to D has a probability distribution over the vocabulary]

“Latent Dirichlet Allocation” learns the distributions and allocates every word in each document to a topic
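The slide's description of LDA can be sketched in a few lines. The following is a minimal collapsed Gibbs sampler, one common way to fit LDA; the slides do not say which inference method NICTA uses, so the sampler, the hyperparameters `alpha` and `beta`, the function name `lda_gibbs` and the toy documents are all assumptions made for illustration.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of tokenised documents.

    Returns (assignments, doc_topic, topic_word):
      assignments[d][i] is the topic allocated to word i of document d,
      doc_topic[d] is the topic mixture of document d,
      topic_word[k] maps each vocabulary word to its probability under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)

    # Count tables used by the sampler.
    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [Counter() for _ in range(num_topics)]
    topic_totals = [0] * num_topics

    # Start from a random allocation of every word to a topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][w] += 1
            topic_totals[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this word's current allocation from the counts.
                k = assignments[d][i]
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][w] -= 1
                topic_totals[k] -= 1

                # Topic weight = (document's affinity for the topic)
                #              * (topic's affinity for the word).
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][w] + beta)
                    / (topic_totals[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][w] += 1
                topic_totals[k] += 1

    # Turn the final counts into smoothed probability distributions.
    doc_topic = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic_counts)
    ]
    topic_word = [
        {w: (topic_word_counts[k][w] + beta) / (topic_totals[k] + V * beta)
         for w in vocab}
        for k in range(num_topics)
    ]
    return assignments, doc_topic, topic_word


# Hypothetical toy corpus, not taken from the talk.
docs = [
    "pipe water main pipe failure".split(),
    "water pipe pressure failure main".split(),
    "solar cloud power solar forecast".split(),
    "cloud solar forecast power grid".split(),
]
assignments, doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
```

Here `doc_topic` plays the role of the per-document probability distributions over topics, `topic_word` the per-topic distributions over the vocabulary, and `assignments` the allocation of every word in each document to a topic.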

Page 11

NICTA Data Analytics (2)

Finite, $\Re^n$, $P(x_i)$: Signals, Location, Genetics

•  Fault Prediction
•  Preventative Maintenance
•  Disease expression
•  SparSNP
•  Efficient distributed sparse regression method
•  Non-parametric Bayesian methods
•  Distributed, autonomous, real-time data with classification / clustering
•  Critical Water Mains
•  SmartGrid
•  Structural Health Monitoring
•  Service optimisation

Page 12

Page 13

Existing data

•  Age
•  Type
•  Material
•  Size
•  Length
•  Failures
•  Soil
•  Pressure
•  Location
•  Weather
•  …
•  and many more

NICTA's analysis

•  Hierarchical Beta Process
•  Complex data mix

Condition assessment

•  Accurate
•  Improved prediction

[Charts: Risk / age, Risk / type, Risk / size, Age profile]

•  Data-driven prediction from multiple existing data sources
•  Dynamic model update and aggregation

Machine Learning Process

Page 14

Improvement on failure prediction

•  Use 1998-2008 break records for model building
•  Use 2009-2011 break records for testing
•  Multiple factors
   –  Laid year, material, size, coating, and soil

[Figure: Wollongong. Failures detected versus length of condition assessment, comparing the NICTA model with a Weibull model, with a zoomed-in view (2.5%).]
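The evaluation described on this slide (fit on one period, test on a later one, then ask how many failures are caught for a given length of condition assessment) can be sketched as follows. This is only an illustration of the protocol: the risk score used here is a plain historical break rate, not NICTA's model and not a Weibull fit, and the function names and toy records are hypothetical.

```python
from collections import Counter

def split_breaks(breaks, train_years=(1998, 2008), test_years=(2009, 2011)):
    """Split (pipe_id, year) break records into training and test periods."""
    train = [b for b in breaks if train_years[0] <= b[1] <= train_years[1]]
    test = [b for b in breaks if test_years[0] <= b[1] <= test_years[1]]
    return train, test

def failures_detected(risk, lengths, test_breaks, inspected_fraction):
    """Fraction of test-period breaks that fall on inspected pipes, when the
    riskiest pipes are inspected first up to a fraction of network length."""
    budget = inspected_fraction * sum(lengths.values())
    ranked = sorted(lengths, key=lambda p: risk.get(p, 0.0), reverse=True)
    inspected, used = set(), 0.0
    for pipe in ranked:
        if used + lengths[pipe] > budget:
            break  # the next pipe would exceed the inspection budget
        inspected.add(pipe)
        used += lengths[pipe]
    if not test_breaks:
        return 0.0
    hits = sum(1 for pipe, _ in test_breaks if pipe in inspected)
    return hits / len(test_breaks)

# Hypothetical toy data: pipe lengths in metres and (pipe_id, year) breaks.
lengths = {"A": 100.0, "B": 100.0, "C": 100.0, "D": 100.0}
breaks = [("A", 2001), ("A", 2005), ("B", 2007),
          ("B", 2009), ("A", 2010), ("C", 2011)]

train, test = split_breaks(breaks)

# Placeholder risk score (an assumption): training-period breaks per metre.
train_counts = Counter(pipe for pipe, _ in train)
risk = {p: train_counts[p] / lengths[p] for p in lengths}

detected_25 = failures_detected(risk, lengths, test, 0.25)
detected_50 = failures_detected(risk, lengths, test, 0.50)
```

Sweeping `inspected_fraction` from 0 to 1 traces out one curve of the kind plotted on the slide; a second model's scores passed as `risk` would give the comparison curve.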

Page 15

Risk Map

Risk ranking of pipes based on likelihood of failure

Top 10% pipes

10% ~ 40% pipes

40% ~ 60% pipes

Last 40% pipes

Actual breaks in the following year

Red = highest

Blue = lowest
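The four bands in the map legend amount to cutting a ranked list of pipes at the 10%, 40% and 60% marks. A small sketch, assuming each pipe already has a likelihood-of-failure score (the scores and the function name `risk_bands` below are made up for illustration):

```python
def risk_bands(risk):
    """Group pipe ids into the four rank bands named in the map legend."""
    ranked = sorted(risk, key=risk.get, reverse=True)  # riskiest first
    n = len(ranked)
    bands = {"top 10%": [], "10% ~ 40%": [], "40% ~ 60%": [], "last 40%": []}
    for position, pipe in enumerate(ranked):
        # Integer comparisons avoid floating-point edge cases at the cut-offs.
        if 10 * position < 1 * n:
            bands["top 10%"].append(pipe)
        elif 10 * position < 4 * n:
            bands["10% ~ 40%"].append(pipe)
        elif 10 * position < 6 * n:
            bands["40% ~ 60%"].append(pipe)
        else:
            bands["last 40%"].append(pipe)
    return bands

# Hypothetical likelihood-of-failure scores for ten pipes.
scores = [0.91, 0.05, 0.40, 0.33, 0.72, 0.10, 0.02, 0.64, 0.21, 0.15]
risk = {f"pipe{i}": score for i, score in enumerate(scores)}
bands = risk_bands(risk)
```

Overlaying the following year's actual breaks on these bands, as the slide does, is then a visual check of whether most breaks land in the highest-ranked bands.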

Page 16

NICTA Data Analytics (3)

Infinite, $\Im$, $P(f_i)$: Spatial Fields, Temporal Fields

•  Geothermal
•  Groundwater
•  Soils
•  Air quality
•  Solar
•  Data Fusion with uncertainty estimation
•  Resource exploration
•  Non-parametric Bayesian methods
•  Resource management

Solar Energy Forecast Software

Research Excellence in ICT Wealth Creation for Australia

Technical Contact [email protected] Business Contact [email protected]

The Solar Energy Forecast Software project is part of NICTA’s Security and Environment Business Team, providing security for people, resources and critical systems.

Did you know failure to predict solar energy production will mean we won't fully capture available solar resources?

The Problem

Electricity grids around the world were not designed to manage large fluctuations of supply in power generation. Traditional forms of power supply such as coal-fired stations provide a stable, non-fluctuating form of power supply. However, the energy we receive from the sun is much more unpredictable and grids are not designed to cope with the dynamic nature of renewable energy production.

Current prediction methods are not accurate enough at the suburb level and not fine-grained enough (i.e. currently a matter of days, not minutes). Current methods also require expensive (up to $75,000) and obtrusive equipment in a large area to collect the required data.

Impact

NICTA aims to lower the costs of solar monitoring systems to allow for fast, affordable forecast systems to be installed all over Australia. Specifically, we aim to:

• Develop low-cost devices ($500) that measure current levels of rooftop solar power production by monitoring 150 households across the ACT.
• Utilise low-cost sky cameras ($250) to detect cloud cover. From these images, NICTA's researchers will project the motion of the clouds and estimate the 'darkness' of their shadows, thereby predicting their inhibitive effect on power output.
• Develop software that will predict solar energy production by suburb within minutes and hours rather than days.
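To give a feel for minute-scale forecasting from cloud motion, here is a deliberately simple sketch. It assumes a circular cloud shadow moving at constant velocity over flat ground and a single 'darkness' factor between 0 and 1; none of these modelling choices, nor the function names, come from the flyer.

```python
import math

def minutes_until_shadow(cloud_xy, velocity_xy, site_xy, radius):
    """Minutes until a circular shadow moving at constant velocity first
    covers the site, or None if it never does.
    Distances in metres, velocity in metres per minute."""
    # Work relative to the site: solve |p + v t| = radius for t >= 0.
    px, py = cloud_xy[0] - site_xy[0], cloud_xy[1] - site_xy[1]
    vx, vy = velocity_xy
    a = vx * vx + vy * vy
    b = 2 * (px * vx + py * vy)
    c = px * px + py * py - radius * radius
    if c <= 0:
        return 0.0  # the site is already shaded
    if a == 0:
        return None  # stationary shadow that does not cover the site
    disc = b * b - 4 * a * c
    if disc < 0:
        return None  # the shadow passes to one side of the site
    t = (-b - math.sqrt(disc)) / (2 * a)  # earlier of the two crossings
    return t if t >= 0 else None  # negative means it is moving away

def forecast_power(clear_sky_kw, darkness, shaded):
    """Output in kW given clear-sky output and shadow darkness in [0, 1]."""
    return clear_sky_kw * (1.0 - darkness) if shaded else clear_sky_kw

# Hypothetical scene: shadow centre 1 km west of the site, drifting east.
eta = minutes_until_shadow((-1000, 0), (100, 0), (0, 0), radius=200)
shaded_kw = forecast_power(3.0, darkness=0.6, shaded=True)
```

In this toy scene the shadow edge is 800 m away and moves at 100 m per minute, so `eta` is 8 minutes; a real system would estimate position, velocity and darkness from the sky-camera images.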

Renewable Energy

Collaborators

Image credits: google.com.au/images, en.wikipedia.org

Big Data Knowledge Discovery

•  Resource discovery
•  Plant system diversity
•  Non-linear laser physics

Transparent Machine Learning

Page 17

Engineered Geothermal Systems

Page 18

Geophysical Data

•  Gravity
•  Magnetics
•  Core Samples
•  Temperature
•  Reflection Seismic
•  Magnetotellurics
•  Gravity Gradiometry
•  Down-hole Geophysics
•  Stress
•  Porosity
•  Passive Seismic
•  Micro Seismic
•  ...

Page 19

Distributions of geologies

Magnetotellurics, Seismic, Magnetism, Gravity

Probability Distribution

Page 20

Results – fusing gravity & boreholes


Predicted mean density and uncertainty
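As a rough intuition for why fusing two data sources yields both a mean and an uncertainty, consider combining two independent Gaussian estimates of density at a single location. This is a much simpler stand-in for the non-parametric Bayesian methods named earlier in the talk, not the method behind the slide's result, and the numbers are invented.

```python
import math

def fuse_gaussian(estimates):
    """Fuse independent Gaussian estimates given as (mean, std) pairs.
    Returns the (mean, std) of their precision-weighted combination."""
    precisions = [1.0 / (std * std) for _, std in estimates]
    total = sum(precisions)
    mean = sum(m * p for (m, _), p in zip(estimates, precisions)) / total
    return mean, math.sqrt(1.0 / total)

# Hypothetical density estimates (kg/m^3) for one location.
gravity = (2600.0, 200.0)   # broad estimate, e.g. from gravity data
borehole = (2700.0, 50.0)   # tighter estimate, e.g. from a nearby borehole
mean, std = fuse_gaussian([gravity, borehole])
```

The fused mean sits closer to the more certain source, and the fused standard deviation is smaller than either input, which is the qualitative behaviour a predicted mean and uncertainty map is meant to convey.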

Page 21

Reuse

[Diagram linking Machine Learning, Statistics and Computer Science with Scientific Challenges, Societal Challenges, Personal, Enterprise and Government, via arrows labelled "techniques" and "problems".]

How can we apply new techniques of machine learning / analytics to science?

Page 22

Machine Learning in the Natural Sciences

•  Big Data Knowledge Discovery
•  Science and Industry Endowment Fund (www.sief.org) project
•  Collaboration between
   –  NICTA (machine learning)
   –  SIRCA (big data)
   –  Sydney Uni (plate tectonics)
   –  Macquarie Uni (forest ecosystems, non-linear laser physics)
•  How do we make machine learning easier to use in the natural sciences?

Page 23

The End