25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda...

Advanced Data Analytics An Introduction Data Mining in Advanced Analytics Dr Paul Kennedy [email protected] Centre for Quantum Computation & Intelligent Systems School of Software, Faculty of Engineering & IT 1 Friday, 5 July 2013

Transcript of 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda...

Page 1: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Advanced Data Analytics

An Introduction

Data Mining in Advanced Analytics

Dr Paul Kennedy

[email protected]

Centre for Quantum Computation & Intelligent Systems

School of Software, Faculty of Engineering & IT

Friday, 5 July 2013

Page 2: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Outline

• What is Data Analytics (DA)?

• Motivation for DA

• Main approaches

• DA professionals

• Links to other topics

• Overview of techniques


Page 3: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

What is Data Analytics?


Page 4: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

• Data Analytics is the analysis of large databases to find novel, commercially valuable and exploitable patterns.

• Aim: discover meaningful insights and knowledge from data.

• Discoveries expressed as models.

• Data mining = process of building models.


Page 5: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Models

• A model

• Captures the essence of the discovered knowledge.

• Can assist in understanding the world.

• Can be used to make predictions.


Page 6: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Where applied?

• Who by?

• Business, government, financial services, biology, medicine, risk and intelligence, science and engineering.

• Data collected about

• Businesses, customers, human resources, products, manufacturing processes, suppliers, business partners, local and international markets & competitors.

• Why?

• Better support managers, find fraudulent behaviour, understand scientific processes, find opportunities.


Page 7: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Motivation for DA


Page 8: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Collecting Data

• We have always collected, checked and organised data.

• 5500 years ago Sumerians marked tax records onto dried mud tablets.

• Scientists have looked through microscopes and telescopes and drawn what they saw.

• Market researchers ran surveys or had TV diaries

• Medical laboratories take dozens of measurements per patient


Page 9: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Data

• Analysing

• Since then, people have sought ways to use the recorded information to improve their lives (financially, health, ...)

• Understanding

• People can understand these amounts of data.

• But nowadays, there is a data explosion.


Page 10: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Data explosion

• Most data now goes straight to computers without humans seeing it.

• Tax records submitted electronically

• Telescopes are operated remotely and digital images go to computer files.

• Market and POS data go to data warehouses.

• High-throughput technology makes simultaneous measurements of 1000s of genes per patient.

• This deluge of data is useless to unaided people!


Page 11: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

[Figure: cover page of the TechAmerica Foundation Federal Big Data Commission report "Demystifying Big Data: A Practical Guide To Transforming The Business of Government"]

Big Data ...

• Huge global interest currently.

• Obama administration in 2012 announced $200m for Big Data R&D in US

• TechAmerica Foundation released report describing “transformational” power of Big Data and recommendations for training the huge number of data scientists & analysts urgently needed.

Source: http://www.techamericafoundation.org/bigdata


Page 12: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Is it really an “explosion”?

• 2011: 1.8 zettabytes of information created globally and expected to double each year

• = 200 billion 2-hour HD movies that one person could watch for 47 million years straight!

• From sensors, satellites, social media, mobile comms, email, RFID and enterprise applications.

• Source: Demystifying Big Data, TechAmerica Foundation, 2012.
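
The movie comparison above can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: it assumes the decimal (SI) zettabyte of 10^21 bytes and a 365.25-day year, neither of which is stated on the slide. Under those assumptions each movie works out at about 9 GB, and continuous viewing comes to roughly 46 million years, the same order as the 47 million quoted (the gap may simply reflect rounding in the source).

```python
# Back-of-envelope check of the movie comparison; inputs are the slide's figures.
ZETTABYTE = 10**21                 # bytes, decimal (SI) definition (assumption)
total_bytes = 1.8 * ZETTABYTE      # information created in 2011
movies = 200e9                     # 200 billion movies
hours_per_movie = 2

bytes_per_movie = total_bytes / movies
viewing_years = movies * hours_per_movie / (24 * 365.25)

print(f"{bytes_per_movie / 1e9:.0f} GB per movie")
print(f"{viewing_years / 1e6:.1f} million years of continuous viewing")
```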


Page 13: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Data Analytics Successes


Page 14: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Helping to catch the backpacker killer

• Australia’s most notorious serial murder case

• Early 1990s, 7 young backpackers murdered.

• Police had developed a profile.

• Huge dataset generated of vehicle records, gym memberships, gun licensing and police records.

• Link analysis software from Sydney company NetMap Analytics narrowed the list of suspects from 18 million to 32, which included the murderer: Ivan Milat.


Page 15: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Predicting the 2012 US election result

• Nate Silver used predictive analytics & statistics to correctly predict outcomes of 50 out of 50 states from polling and related data.

• Republican pundits were confident in their landslide-win predictions. Democratic pundits predicted a razor-thin victory.

• Shows the power of a data-centric approach over “gut-feeling”.


Page 16: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

How does it fit to business?


Page 17: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Fitting to the business

• Understanding the business context and, more strongly, framing a business question.

• Translating the business question into a data analytics question.

• Collecting, understanding and processing data from across the business and possibly externally.

• Building models and evaluating them.

• Deploying the results in the business to deliver benefits.

• Iterative process.


Page 18: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Fitting to the business

[Diagram labels: Mathematical Model; Predict ‘class’ of unseen rows (e.g. customers, e.g. to target); Find relationships between rows or columns (e.g. customer groups)]

Page 19: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Two main approaches

• Unsupervised methods

• Model tries to make sense of the data set or characterise it.

• Supervised methods

• Model learns a relationship between inputs and outputs from historical data.

• Model can then be used to predict output for new data.


Page 20: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Fitting to the business

[Diagram labels: Mathematical Model; Predict ‘class’ of unseen rows (e.g. customers, e.g. to target); Find relationships between rows or columns (e.g. customer groups)]

Page 21: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Data Warehousing to Data Mining

• Data Warehouse: organisation-wide, integrated access to a centralised repository + data models

• On-Line Analytical Processing (OLAP):

• statistical summaries and basic analytical modeling

• build and cache fixed ‘cubes’ (business intelligence)

• restructure data for efficient analysis

• Fast summarisation and aggregation at different levels
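
Summarisation "at different levels" can be illustrated with a small roll-up over a fact table. The sketch below is not how an OLAP server is implemented; it uses a plain Python list, and the `sales` rows, column order and the `roll_up` helper are all hypothetical names invented for illustration.

```python
from collections import defaultdict

# Hypothetical sales facts: (state, city, product, amount)
sales = [
    ("NSW", "Sydney", "DVD", 120.0),
    ("NSW", "Sydney", "MP3 player", 300.0),
    ("NSW", "Newcastle", "DVD", 80.0),
    ("VIC", "Melbourne", "DVD", 150.0),
    ("VIC", "Melbourne", "MP3 player", 250.0),
]

def roll_up(rows, key_indices):
    """Sum the amount (last field) grouped by the chosen dimension columns."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in key_indices)
        totals[key] += row[-1]
    return dict(totals)

by_city = roll_up(sales, (0, 1))   # finest level: state and city
by_state = roll_up(sales, (0,))    # rolled up to state
grand_total = roll_up(sales, ())   # rolled up to a single total

print(by_city)
print(by_state)
print(grand_total)
```

Caching the result of each `roll_up` call would be a crude analogue of a prebuilt cube: the aggregation is paid for once and later queries read the stored totals.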


Page 22: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Data Mining to Knowledge Discovery

• Data: raw uninterpreted facts

e.g. Tom, 20 years old, student

• Information relates items of Data together

e.g. Tom is 20 years old

• Knowledge relates items of Information together

e.g. Tom is 20 years old → Tom pays > $1500 insurance

• Modeling the world (= generalising)

e.g. [18 - 25] years old → P(accident) = high
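
As a rough illustration of the generalising step, the sketch below estimates an accident rate per age band from individual records. The records, names and the `accident_rate` helper are made up for the example; real insurance modelling would need far more data and care.

```python
# Hypothetical records: (name, age, had_accident)
records = [
    ("Tom", 20, True),
    ("Ann", 22, True),
    ("Raj", 24, False),
    ("Mei", 35, False),
    ("Sam", 41, False),
    ("Lee", 52, True),
    ("Kim", 47, False),
    ("Joe", 38, False),
]

def accident_rate(rows, low, high):
    """Fraction of people aged low..high (inclusive) who had an accident."""
    in_band = [had for _, age, had in rows if low <= age <= high]
    return sum(in_band) / len(in_band) if in_band else float("nan")

young = accident_rate(records, 18, 25)   # 2 of the 3 people in this band
older = accident_rate(records, 26, 99)   # 1 of the 5 people in this band

print(f"P(accident | 18-25) ~ {young:.2f}")
print(f"P(accident | 26+)   ~ {older:.2f}")
```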


Page 23: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Data Mining - a Business Intelligence view

[Diagram labels: Business Problem; Data mining problem(s); Data Mining; Patterns; Business Intelligence]

Page 24: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Data Mining - a Business Intelligence view

[Diagram labels: Business Problem; Data mining problem(s); Data Mining; Patterns; Business Intelligence; Domain; Domain]

Page 25: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Data Mining - a Business Intelligence view

[Diagram labels: Business Problem; Data mining problem(s); Data Mining; Patterns; Business Intelligence; Domain; Domain; Data & Information Visualisation; Data Warehousing; Methods and Frameworks; Knowledge Discovery Techniques]

Page 26: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

CRISP-DM view

[Figure: CRISP-DM process diagram]

Source: Kenneth Jensen / Wikimedia Commons / Public Domain

Page 27: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

DA professionals


Page 28: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

The rising profession of Data Analyst

• “Data mining as a profession is definitely growing because data is growing. Data is becoming more and more usable because of data warehousing (where information from many locations can be centrally mined). So the only way is up.” - Eugene Dubossarsky (Ernst & Young)

• “If you are looking for a career where your services will be in high demand, you should find something where you provide a scarce, complementary service to something that is getting ubiquitous and cheap. So what’s getting ubiquitous and cheap? Data. And what is complementary to data? Analysis.” - Prof. Hal Varian, UC Berkeley, Chief Economist at Google.

• The ATO has a network of 30+ data miners working with another 70 or so analytics staff. - Dr Warwick Graco, Australian Taxation Office


Page 29: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Data Miners / Data Analysts

• Typical data mining jobs pay six-figure salaries. The required blend of skills makes good data miners a rare breed. - Ronnie Chan, senior IT specialist, IBM's DB2 team

• Data miners are the SAS of the IT industry, and it's not a job for beginners. Demand is strong for people who have the technical skills combined with business knowledge. “To produce useable results, data miners must draw on advanced analytical approaches such as predictive modelling, association discovery and sequence discovery.” - Peter Norris, Business Manager, Computer Associates


Page 30: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

10 Hot IT Skills for 2013

• ComputerWorld, 24/9/12

• #5 Business Intelligence / Analytics

• “Big data is one of the top priorities for many companies, but getting the right people to analyze all that information is challenging, says Jerry Luftman, managing director at the Global Institute for IT Management and a leader in the Society for Information Management.

• The best candidates have technical know-how, business knowledge and strong statistical and mathematical backgrounds -- an uncommon mix of skills, Luftman says. In fact, some companies are hiring statisticians and teaching them about technology and business.”


Page 31: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Gartner Top 10 Strategic Technology Trends for 2013

• Gartner identifies the Top 10 Strategic Technology Trends for 2013, October 23, 2012

• Of the 10 strategic trends, two were for data analytics.

• Strategic Big Data

• Actionable Analytics


Page 32: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Gartner Top 10 Strategic Technology Trends for 2013

• Strategic Big Data

• “Big Data is moving from a focus on individual projects to an influence on enterprises’ strategic information architecture. Dealing with data volume, variety, velocity and complexity is forcing changes to many traditional approaches. This realization is leading organizations to abandon the concept of a single enterprise data warehouse containing all information needed for decisions. Instead they are moving towards multiple systems, including content management, data warehouses, data marts and specialized file systems tied together with data services and metadata, which will become the "logical" enterprise data warehouse.”


Page 33: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Gartner Top 10 Strategic Technology Trends for 2013

• Actionable Analytics

• “Analytics is increasingly delivered to users at the point of action and in context. With the improvement of performance and costs, IT leaders can afford to perform analytics and simulation for every action taken in the business. The mobile client linked to cloud-based analytic engines and big data repositories potentially enables use of optimization and simulation everywhere and every time. This new step provides simulation, prediction, optimization and other analytics, to empower even more decision flexibility at the time and place of every business process action.” 


Page 34: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Institute of Analytics Professionals of Australia

• “Our mission is to unite, inform, support and promote analytics professionals in Australia. We provide information sources, a virtual community, a networking hub and a professional identity. We promote the benefits of analytics in modern business.”

• www.iapa.org.au


Page 35: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Privacy

• Privacy is important and it is an ethical concern for data analysts.

• Laws directly govern data mining in Australia and overseas.

• Some basic principles from OECD:

• Collection limitation: data should be obtained lawfully and fairly

• Data quality: data should be relevant to the stated purposes, accurate, complete and up-to-date.

• Purpose specification: the purpose for using the data should be given, and data should be destroyed if it no longer serves that purpose.

• Use limitation: use of data for other purposes than specified is forbidden


Page 36: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Some examples


Page 37: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Market analysis & management

• Data sources?

• Credit card transactions, loyalty cards, discount coupons, customer complaint calls, social media, plus (public) lifestyle studies

• Target marketing

• Find clusters of ‘model’ customers who share same characteristics: interest, income level, spending habits, etc.

• Determine customer purchasing patterns over time

• e.g. conversion of single to joint bank account: marriage, ...

• Cross-market analysis

• Associations / correlations between product sales

• Prediction based on the association information.


Page 38: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Market analysis & management (cont’d)

• Customer profiling

• Data analytics can tell you what types of customers buy what products (clustering or classification)

• Identifying customer requirements

• Identifying the best products for different customers

• Use prediction to find what factors will attract new customers.

• Provide summary information

• Various multidimensional summary reports

• Statistical summary information (mean and variance ...)


Page 39: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Links to other topics


Page 40: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

The Knowledge Discovery Process

[Diagram: Databases → (Data Cleaning & Integration) → Data Warehouse → (Data Selection) → Task-relevant Data → (Data Mining) → Patterns → (Pattern Evaluation) → Knowledge]

Note: iterative process, not waterfall!

Page 41: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

The KDD Process

• Learn the application domain (prior knowledge & goals)

• Create target data set: data selection

• Data cleaning and preprocessing (may take 60% of effort!)

• Data reduction and transformation

• Find useful features, dimensionality/variable reduction, invariant representation

• Choose functions of data mining: the “data mining problem”

• Summarisation, classification, regression, association, clustering

• Choose the data mining algorithm(s)

• Data Mining: find patterns of interest

• Pattern evaluation and knowledge presentation

• Visualisation, transformation, remove redundant patterns, ...

• Use of discovered knowledge


Page 42: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

[Diagram: Data Mining at the centre, drawing on Database Technology, Statistics, Artificial Intelligence, Visualisation, Information Science and Other Disciplines]

Other Disciplines

• HCI

• High Performance Computing

• Software Engineering

Page 43: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Database technology

• OLTP → OLAP → OLAM

• Data Warehouses

• Subject-oriented, integrated, time-variant, non-volatile

• Excellent starting point for data mining

• Data Marts: specialised, smaller data store

• OLAP: drill-down, roll-up, slice-n-dice, data cubes


Page 44: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

OLAP vs Data Mining

OLAP - On-Line Analytical Processing

• Emphasis on Query

• Generally know what you want to find.

• Expressible in SQL

• Drill-down, data cubes

Data Mining

• Emphasis on Exploration

• General idea of target but not how to find it.

• Let the machine drive the exploration


Page 45: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Statistics

• Data, Counting, Probabilities, Hypothesis Testing

• Correlation and regression analyses

• Exploratory data analysis

• Predictive models

• CART: Classification And Regression Trees

• MARS: Multivariate Adaptive Regression Splines

• TreeNet

• Random Forest

• Important foundations for data mining and knowledge discovery

• Ensemble methods

• Computational requirements → Sampling
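
Correlation and regression, listed above, can be sketched in a few lines. The example computes Pearson's r and an ordinary least-squares line by hand; the `hours` and `marks` values and both helper functions are hypothetical and exist only to make the formulas concrete.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical data: weekly study hours and final marks
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 68]

r = pearson_r(hours, marks)
slope, intercept = fit_line(hours, marks)
print(f"r = {r:.3f}, marks ~ {slope:.1f} * hours + {intercept:.1f}")
```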


Page 46: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Artificial Intelligence (AI)

• Brings to data analytics

• The inductive approach (machine learning) - the design cycle for predictive modeling

• Knowledge representation

• Inference

• Generalisation: everyone who drank beer in Sydney in 1900 is now dead.

• Inference: Therefore, beer is fatal.

• Warning: it’s easy to get into a similar situation in data analytics!

• Uses Data Analytics

• e.g. as supporting components in multi-agent systems.

• e.g. in multi-agent electronic markets: negotiation agents request information about their opponents & text mining bots deliver that kind of information.


Page 47: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Artificial Intelligence (AI)

• The design cycle for predictive modeling

• Issues:

• Algorithms developed for toy datasets (< few hundred points)

• Prior knowledge (e.g. bias)

• Model deviation from true model

• Sampling distributions

• Computational complexity

[Diagram: design cycle] Collect data → Select features → Select model type → "Train" classifier → Evaluate classifier
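
The five steps of the design cycle can be walked through on a toy problem. This is a minimal sketch, not the method used in the talk: the customer records are invented, the model is a single-threshold rule (a "decision stump") chosen for brevity, and the train/test split is fixed by hand rather than sampled.

```python
# 1. Collect data: hypothetical (monthly_spend, visits, is_loyal) records.
train = [(20, 1, 0), (35, 2, 0), (50, 2, 0), (65, 3, 0),
         (80, 5, 1), (95, 4, 1), (110, 6, 1), (125, 7, 1)]
test = [(40, 3, 0), (90, 5, 1), (70, 4, 1), (100, 6, 1)]  # held out of training

# 2. Select features: use monthly_spend only (column 0).
def feature(row):
    return row[0]

# 3. Select model type: a one-threshold rule ("decision stump").
def predict(threshold, x):
    return int(x > threshold)

# 4. "Train": pick the threshold with the fewest errors on the training rows.
def train_stump(rows):
    values = sorted({feature(r) for r in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum(predict(t, feature(r)) != r[-1] for r in rows)

    return min(candidates, key=errors)

# 5. Evaluate: measure accuracy on rows the model has not seen.
def accuracy(threshold, rows):
    hits = sum(predict(threshold, feature(r)) == r[-1] for r in rows)
    return hits / len(rows)

threshold = train_stump(train)
print(threshold, accuracy(threshold, train), accuracy(threshold, test))
```

The stump fits the training rows perfectly but misclassifies one held-out customer, which is the point of evaluating on unseen data: training accuracy alone overstates how well the model generalises.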


Page 48: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Visualisation

• Deals with visual presentation of the data.

• “A picture is worth a thousand words” - true?

• Taps into human strengths

• In Data Analytics

• Understanding data

• Visualising the process

• Visualising and communicating the results


Page 49: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Overview of approaches


Page 50: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Data Analytics: Techniques (unsupervised)

• Association analysis (correlation and causality)

• Identify attribute-value conditions that frequently occur in the data

• Examples:

• age(P, “20..29”) ^ income(P, “20..29K”) → buys(P, “DVDs”) [support = 2%, confidence = 60%]

• contains(T, “MP3 player”) → contains(T, “sound processing software”) [1%, 75%]

• Support: fraction of data with both ‘attribute’ (the rule's left-hand side) and ‘value’ (its right-hand side).

• Confidence: fraction of data with ‘attribute’ where the rule holds (i.e. where attribute → value).
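
Support and confidence as defined above can be computed directly from a list of transactions. The baskets below are invented for illustration and the `support` and `confidence` helpers are hypothetical names; the sketch only scores one given rule and does not search for rules as a real association-mining algorithm would.

```python
# Hypothetical shopping baskets, one set of items per transaction
transactions = [
    {"MP3 player", "sound processing software"},
    {"MP3 player", "headphones"},
    {"MP3 player", "sound processing software", "headphones"},
    {"DVD"},
    {"DVD", "headphones"},
    {"MP3 player", "sound processing software", "DVD"},
    {"headphones"},
    {"DVD", "MP3 player"},
]

def support(antecedent, consequent, baskets):
    """Fraction of all baskets containing both sides of the rule."""
    both = antecedent | consequent
    return sum(both <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Among baskets containing the antecedent, fraction that also contain the consequent."""
    matching = [basket for basket in baskets if antecedent <= basket]
    return sum(consequent <= basket for basket in matching) / len(matching)

lhs = {"MP3 player"}
rhs = {"sound processing software"}
s = support(lhs, rhs, transactions)
c = confidence(lhs, rhs, transactions)
print(f"support = {s:.1%}, confidence = {c:.0%}")
```

Here 3 of the 8 baskets contain both items (support 37.5%) and 3 of the 5 baskets with an MP3 player also contain the software (confidence 60%).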


Page 51: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Data Analytics: Techniques (unsupervised)

• Clustering (cluster analysis)

• Identify groups within data where data points in the group are similar to one another but different to those in other groups.

• Identify groups within data that maximise intraclass similarity and minimise interclass similarity.

• Examples:

• cluster crime locations based on characteristics of the crimes.

• cluster students based on their marks in assignments for all the core subjects of their degree.

• Building models from unlabelled data: unsupervised learning
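
One common way to find such groups is k-means, sketched below on the student-marks example. The slides do not name a specific clustering algorithm, so k-means is an assumption here, as are the marks, the choice of two clusters and the hand-picked starting centroids.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-centring."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else old for c, old in zip(clusters, centroids)]
    return centroids, clusters

# Hypothetical marks in two core subjects, one tuple per student
marks = [(85, 90), (80, 88), (90, 82), (45, 50), (50, 42), (40, 48)]
centroids, clusters = kmeans(marks, centroids=[marks[0], marks[3]])
print(centroids)
print(clusters)
```

No grades or labels are supplied: the two groups emerge only from how close the students' marks are to one another.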


Page 52: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Data Analytics: Techniques (supervised)

• Classification and Prediction

• Using historical data find a model which describes and distinguishes data classes or concepts for the purpose of using the model to classify or predict the class of unknown entities.

• Examples:

• Build a model to classify countries based on climate or cars based on engine efficiency and on-road behaviour.

• Build a model to predict whether customers are likely to purchase a download of a particular music file.

• Build a model to predict the grade (Z, P, C, D, H) of a student based on students who previously did a subject.

• Building models from labelled data: supervised learning.
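
The grade-prediction example can be sketched with a 1-nearest-neighbour classifier, one of the simplest supervised methods. The slides do not say which classifier to use, so this choice is an assumption, and the `history` of earlier students and their assignment marks is invented.

```python
# Hypothetical labelled history: (assignment 1 mark, assignment 2 mark) -> final grade
history = [
    ((35, 40), "Z"),
    ((52, 55), "P"),
    ((66, 68), "C"),
    ((78, 76), "D"),
    ((90, 92), "H"),
    ((58, 50), "P"),
    ((72, 80), "D"),
]

def dist2(a, b):
    """Squared Euclidean distance between two mark vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict_grade(marks, labelled):
    """Return the grade of the most similar previous student (1-nearest neighbour)."""
    _, grade = min(labelled, key=lambda item: dist2(item[0], marks))
    return grade

print(predict_grade((88, 85), history))
print(predict_grade((55, 53), history))
print(predict_grade((68, 70), history))
```

Unlike clustering, the labels (grades) of earlier students drive the prediction, which is what makes the approach supervised.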


Page 53: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Data Analytics: Techniques

• Outlier analysis

• Identify entities that are different to other entities or to a model of data.

• i.e. Find exceptions to the rule!

• Example: odd patterns can be easily hidden among 10 million transactions, but may indicate fraud.

• Statistics usually treats them as noise or exceptions.

• Data analytics: rare and unusual events or items are generally interesting.

• Time-series analysis

• Identify similar patterns over time - trends, deviation, sequential patterns, periodicity analysis

• Example: predicting trends in share prices
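
For the outlier-analysis bullets, a simple robust rule is the modified z-score based on the median absolute deviation (MAD). This is a swapped-in, commonly used heuristic rather than anything the slides prescribe; the 3.5 cutoff, the 0.6745 scaling constant and the transaction amounts are assumptions for illustration.

```python
from statistics import median

def mad_outliers(values, cutoff=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds the cutoff."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median, so this rule cannot score anything
    return [v for v in values if 0.6745 * abs(v - med) / mad > cutoff]

# Hypothetical transaction amounts with one suspicious entry
amounts = [42, 55, 48, 51, 47, 53, 49, 5000, 50, 46]
flagged = mad_outliers(amounts)
print(flagged)
```

A median-based rule is used because a single extreme value inflates the mean and standard deviation enough that, in a sample this small, an ordinary 3-sigma test would fail to flag it.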


Page 54: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Understandable by Humans

“Understandable” by Computers

Association Rules

Bayesian Networks

Decision Trees

Neural Networks


Page 55: 25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy PowerpointAda intro-kennedy-slides

Questions ...
