No big data without small data

Norman Manley, IT Analyst


15/04/2023

Big Data - a definition

Big data is the name we give to a collection of data so large and complex that it can’t be processed using traditional IT applications and programs. In general, the volume is at least 1,000 times larger than that of traditional data sources.

© Decision Support Systems 2014

“You can have data without information but you can’t have information without data.” Daniel Keys Moran – American author


Small Data - a definition

Data (small data) is a synonym for facts; Merriam-Webster defines it as: “facts or information used usually to calculate, analyse, or plan something”.

But if the facts are not right, then the analysis will be incorrect and we will make the wrong decisions!

| Invoice Date | Customer | Country | Euros excl VAT | VAT | Total | Invoice Number | Payment |
|---|---|---|---|---|---|---|---|
| 22/01/2014 | Mondea | Netherlands | 795.00 | 166.95 | 961.95 | 2014.416 | iDeal |
| 24/01/2014 | Physter Technology | Czech Republic | 795.00 | 0 | 795.00 | 2014.417 | Visa |
| 27/01/2014 | Copenhagen Airports A/S | Denmark | 795.00 | 166.95 | 961.95 | 2014.421 | MC |
| 28/01/2014 | Vista Group | Finland | 575.00 | 120.75 | 695.75 | 2014.423 | MC |
| 07/02/2014 | Global Information | USA | 709.01 | 0 | 709.01 | 2014.441 | Invoice |
| 14/02/2014 | DataPad Inc | United States | 795.00 | 0 | 795.00 | 2014.451 | MC |
| 21/02/2014 | Scrip Companies | USA | 795.00 |  | 795.00 | 2014.464 | PayPal |
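The invoice table is a good test case for this point. A minimal Python sketch, assuming the rows are loaded as strings exactly as they appear, can flag two of its problems mechanically: the missing VAT value on invoice 2014.464 and a country spelled both "USA" and "United States". The names `COUNTRY_ALIASES` and `check_invoices` are hypothetical, and the check does not judge whether each VAT rate itself is correct.

```python
from decimal import Decimal

# Rows copied from the invoice table; None marks the empty VAT cell.
INVOICES = [
    ("2014.416", "Mondea", "Netherlands", "795.00", "166.95", "961.95"),
    ("2014.417", "Physter Technology", "Czech Republic", "795.00", "0", "795.00"),
    ("2014.421", "Copenhagen Airports A/S", "Denmark", "795.00", "166.95", "961.95"),
    ("2014.423", "Vista Group", "Finland", "575.00", "120.75", "695.75"),
    ("2014.441", "Global Information", "USA", "709.01", "0", "709.01"),
    ("2014.451", "DataPad Inc", "United States", "795.00", "0", "795.00"),
    ("2014.464", "Scrip Companies", "USA", "795.00", None, "795.00"),
]

# Hypothetical alias table: spellings that should map to one country.
COUNTRY_ALIASES = {"USA": "United States"}


def check_invoices(rows):
    """Return a list of (invoice_number, problem) tuples."""
    problems = []
    spellings = {}  # canonical country -> set of spellings seen in the data
    for number, _customer, country, net, vat, total in rows:
        canonical = COUNTRY_ALIASES.get(country, country)
        spellings.setdefault(canonical, set()).add(country)
        if vat is None:
            problems.append((number, "VAT missing"))
            continue
        # Decimal compares amounts such as 166.95 exactly, unlike float.
        if Decimal(net) + Decimal(vat) != Decimal(total):
            problems.append((number, "total != net + VAT"))
    for canonical, seen in spellings.items():
        if len(seen) > 1:
            problems.append((None, f"{canonical} spelled as {sorted(seen)}"))
    return problems
```

Running `check_invoices(INVOICES)` reports the missing VAT on 2014.464 and the two spellings of the United States; every total equals net plus VAT, so the arithmetic check passes.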


The goals of working with data

the creation of a consistent, accurate and timely source of processed data (information) that can be used to support the decision-making process

the creation of a historical information source that can serve as the single basis for both comparative and predictive analysis

the integration of data from different sources (both internal and external)

the creation of “one source of the truth”, which we need as the basis for making better decisions


Where does the data come from?

Small Data - structured

Internal applications

Spreadsheets!

Big Data - often unstructured

Organizational processes: measurements, websites, machines

Communication: e-mail, reports, presentations

Social media: Facebook, LinkedIn, Twitter

Sensors: temperature, weather, traffic, rainfall

Archives: old documents, old films


Unstructured data - a definition

Unstructured data is not directly accessible in a database. Examples include Office documents, PDF files, XML, e-mail messages, pictures, videos and sound clips. Their contents are often dates, numbers and other facts, but these are difficult to interpret directly with current technology.
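To illustrate why such content is hard to use directly, here is a minimal Python sketch that pulls dates and euro amounts out of a hypothetical e-mail with regular expressions. It assumes day/month/year dates (as in the invoice table) and amounts written as `123.45 euros`; real documents vary far more than this, which is exactly the difficulty described above.

```python
import re
from datetime import datetime

# A hypothetical e-mail body: the facts are there, but not in database fields.
EMAIL = """Hi Norman,
Invoice 2014.441 for 709.01 euros was sent on 07/02/2014.
Please confirm payment before 07/03/2014.
"""

# Assumed formats: dd/mm/yyyy dates and "<amount> euro(s)".
DATE_RE = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")
AMOUNT_RE = re.compile(r"\b(\d+\.\d{2})\s*euros?\b", re.IGNORECASE)


def extract_facts(text):
    """Turn free text into structured facts: a list of dates and amounts."""
    dates = [datetime.strptime(d, "%d/%m/%Y").date() for d in DATE_RE.findall(text)]
    amounts = [float(a) for a in AMOUNT_RE.findall(text)]
    return {"dates": dates, "amounts": amounts}
```

`extract_facts(EMAIL)` finds the amount 709.01 and the dates 7 February and 7 March 2014. The invoice number 2014.441 is not mistaken for an amount only because the pattern requires the word "euro" after it; a slightly different wording would break the extraction.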

A Letter from the Chairman of IBM

The market for data and analytics is estimated at $187 billion by 2015. To capture this growth potential, we have built the world’s broadest and deepest capabilities in Big Data and analytics: both technology and expertise. We have invested more than $24 billion, including $17 billion of gross spend on more than 30 acquisitions. We have 15,000 consultants and 400 mathematicians. Two thirds of IBM Research’s work is now devoted to data, analytics and cognitive computing. IBM has earned 4,000 analytics patents.


Big Data is an addition

Big Data is an additional source, not something that just exists independently

the goal is to complement the existing data

“revenue” from Big Data must have the same definition as “revenue” from Small Data

quality is just as important; if that is not the case, Big Data is just a lot of Bad Data

What we call things is important


How much did I sell? Revenue = € 100,000

How much can I book? Revenue = € 96,422
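The two figures on this slide cannot be reproduced from the presentation, but the invoice table on the Small Data slide shows the same effect on a small scale. A minimal Python sketch, assuming "revenue" may be defined either excluding or including VAT; the function names are hypothetical:

```python
from decimal import Decimal

# (net amount excl. VAT, VAT) per invoice, copied from the invoice table.
# The empty VAT cell of invoice 2014.464 is treated as 0 here, which is
# itself a definition choice.
LINES = [
    (Decimal("795.00"), Decimal("166.95")),
    (Decimal("795.00"), Decimal("0")),
    (Decimal("795.00"), Decimal("166.95")),
    (Decimal("575.00"), Decimal("120.75")),
    (Decimal("709.01"), Decimal("0")),
    (Decimal("795.00"), Decimal("0")),
    (Decimal("795.00"), Decimal("0")),
]


def revenue_excl_vat(lines):
    """Definition 1: revenue is the sum of net amounts."""
    return sum((net for net, _vat in lines), Decimal("0"))


def revenue_incl_vat(lines):
    """Definition 2: revenue is the sum of invoice totals, VAT included."""
    return sum((net + vat for net, vat in lines), Decimal("0"))
```

For these seven invoices, `revenue_excl_vat` gives 5259.01 and `revenue_incl_vat` gives 5713.66: two different "revenue" figures from the same data, depending only on the definition.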


Data quality – a problem?

Just like most other IT analysts, I am convinced that data quality poses a huge risk to our decision-making processes – the problem is that the quality of the data is so bad that we can’t prove it!

Norman Manley, IT analyst


Small data – what are the problems?

the files come in many different formats, which makes them very difficult to read

it is often unclear what a field contains (and also what it means)

privacy is a problem – are we allowed to see certain things, and are we allowed to do anything with the data?

data is often missing (both individual fields and parts of files)

data is not up to date
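Two of these problems, missing data and out-of-date data, can at least be measured. A minimal Python sketch over hypothetical customer records; the record names, fields and the 365-day threshold are assumptions, not taken from the presentation:

```python
from datetime import date

# Hypothetical customer records; None marks a missing field.
RECORDS = [
    {"customer": "Customer A", "country": "Netherlands", "email": None,
     "updated": date(2014, 1, 22)},
    {"customer": "Customer B", "country": None, "email": "b@example.com",
     "updated": date(2012, 5, 1)},
    {"customer": "Customer C", "country": "Finland", "email": "c@example.com",
     "updated": date(2014, 2, 14)},
]


def profile(records, today, max_age_days=365):
    """Count missing values per field and list records not updated recently."""
    fields = sorted({key for record in records for key in record})
    missing = {f: sum(1 for r in records if r.get(f) is None) for f in fields}
    stale = [
        r["customer"]
        for r in records
        if r.get("updated") is not None
        and (today - r["updated"]).days > max_age_days
    ]
    return {"missing": missing, "stale": stale}
```

With `today=date(2014, 3, 1)`, the profile reports one missing country and one missing e-mail address, and flags Customer B as not updated for more than a year.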

ETL (Extract, Transform, Load) process - the basic elements
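The basic ETL steps can be sketched in Python as follows. This is a minimal illustration, assuming two hypothetical source files that encode rows from the invoice table differently (semicolons, day/month/year dates and decimal commas in one; commas and ISO dates in the other). The code extracts both, transforms them to one convention, and loads the result into an in-memory SQLite table; all names are hypothetical.

```python
import csv
import io
import sqlite3
from datetime import datetime
from decimal import Decimal

# Two invoices from the table, re-encoded in two hypothetical file formats.
SOURCE_A = "date;customer;country;net\n22/01/2014;Mondea;Netherlands;795,00\n"
SOURCE_B = "date,customer,country,net\n2014-02-07,Global Information,USA,709.01\n"

# Hypothetical mapping from source spellings to one agreed spelling.
COUNTRY_ALIASES = {"USA": "United States"}


def extract(text, delimiter):
    """Extract: read delimited text into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def transform(row, date_format):
    """Transform: ISO dates, one country spelling, net amount in euro cents."""
    invoice_date = datetime.strptime(row["date"], date_format).date().isoformat()
    country = COUNTRY_ALIASES.get(row["country"], row["country"])
    # Assumes no thousands separators, so a decimal comma can become a point.
    net_cents = int(Decimal(row["net"].replace(",", ".")) * 100)
    return (invoice_date, row["customer"].strip(), country, net_cents)


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices "
        "(invoice_date TEXT, customer TEXT, country TEXT, net_cents INTEGER)"
    )
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl():
    conn = sqlite3.connect(":memory:")
    rows = [transform(r, "%d/%m/%Y") for r in extract(SOURCE_A, ";")]
    rows += [transform(r, "%Y-%m-%d") for r in extract(SOURCE_B, ",")]
    load(conn, rows)
    return conn
```

After `run_etl()`, the table holds both invoices with ISO dates, a single spelling of the country ("United States" instead of "USA") and amounts in integer euro cents, which avoids floating-point rounding.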


Big Data: What can we use it for?

How useful is big data?

It seems that the four engines of a Boeing 747 generate more data on a single flight to New York than most companies do in a year.

The question remains: do we need to save all the data, for how long, what are we going to do with it, and can we generate “actionable information”?


Very Big Data!

At up to 500 MB per flight, this is a huge amount of data.


Big Data successes

Vestas, a Danish wind turbine manufacturer, collects data from 35,000 meteorological stations and 45,000 of its own turbines. This allows it to choose the best locations, in terms of wind conditions, for placing new turbines. It expects to collect as much as 24 petabytes of data (it already has 2.8 petabytes). The time needed to analyse the suitability of a new location has been reduced from several weeks to 15 minutes.


Big Data successes

Los Angeles and Santa Cruz police, together with PredPol (a software vendor) and a mathematician from the University of Santa Clara, have developed a system that can predict where criminal activity will take place, accurate to an area of 50 m². A combination of historical data and feeds from “live” cameras is used to predict where the local police should patrol to prevent burglary, among other crimes. The number of burglaries has been reduced by 33% in the last year. This is called “predictive policing”.
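The slide does not describe how PredPol's model works, so the following is not that algorithm. It is a deliberately naive Python sketch of the general idea of turning historical incident data into patrol priorities: incidents are binned into square grid cells, and recent incidents count more than old ones. The cell size, the 30-day half-life and all names are hypothetical.

```python
from collections import defaultdict


def cell_of(x_m, y_m, size_m):
    """Map a position (metres east/north of an arbitrary origin) to a grid cell."""
    return (int(x_m // size_m), int(y_m // size_m))


def hotspot_ranking(incidents, today, k=3, size_m=50.0, half_life_days=30.0):
    """Rank grid cells by recency-weighted incident counts.

    incidents: iterable of (x_m, y_m, day) tuples, where day is a day number.
    An incident that is half_life_days old counts half as much as one from today.
    """
    scores = defaultdict(float)
    for x_m, y_m, day in incidents:
        age = today - day
        scores[cell_of(x_m, y_m, size_m)] += 0.5 ** (age / half_life_days)
    # Highest score first; these cells would be suggested for patrol.
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, `hotspot_ranking([(10, 10, 0), (20, 20, 5), (120, 10, 9)], today=10)` returns `[(0, 0), (2, 0)]`: two older incidents in one cell outweigh one recent incident elsewhere. The sketch also shows why data quality matters here: a wrongly located incident simply moves weight to the wrong cell.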


Conclusions

if Small Data doesn’t work properly, then Big Data has no chance

Big Data in itself has no value - but it does give us the possibility to generate new insights

the crux is accuracy - bad data quality leads to information that is even worse


"Not everything that can be counted counts, and not everything that counts can be counted."

William Bruce Cameron “Informal Sociology: A Casual Introduction to Sociological Thinking” 1963