Big data intro.pptx
sreenidhi-kotha -
Category
Data & Analytics
Transcript of Big data intro.pptx
1. Introduction to Big Data
1
Glance at Data in Modern Era
2
Data Classification
Structured
• Granular Queryability
• Tables with rows & columns
• E.g.: relational databases (RDBMS) queried with SQL
• Contribution: 5%
Semi-Structured
• Spectrum between Structured & Unstructured
• Contains tags, schema contained within the data
• E.g.: XML, JSON, NoSQL
Unstructured
• Not Queryable
• E.g.: audio, video, text, images, e-mail
• Contribution : 80%
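The three categories above can be illustrated with a minimal Python sketch. The table, records, and e-mail text are made-up examples, not data from the slides; the point is only how each kind of data can (or cannot) be queried.

```python
import json
import sqlite3

# Structured: fixed rows and columns, so individual fields can be queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Asha", "Hyderabad"), (2, "Ravi", "Chennai")],  # hypothetical rows
)
structured_result = conn.execute(
    "SELECT name FROM customers WHERE city = ?", ("Chennai",)
).fetchall()
conn.close()

# Semi-structured: the field names (the schema) travel inside the data itself.
record = json.loads('{"id": 3, "name": "Meena", "tags": ["new", "mobile"]}')
semi_structured_result = record["tags"]

# Unstructured: free text has no fields, so we can only scan the raw content.
email_body = "Hi team, the delivery to Chennai was late again this week."
unstructured_result = "Chennai" in email_body
```

The SQL query selects by column, the JSON record is navigated by its embedded keys, and the e-mail can only be searched as a whole string.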
3
Overview of Big Data
• What?
Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis.
• Why?
i. Data sets are so large and complex that traditional data processing methods struggle to handle them.
ii. Handling a variety of new and existing data warrants innovative solutions that deliver real business benefits.
• Where?
i. Analyse for insights that lead to better decisions and strategic business moves.
ii. Processing large volumes or wide varieties of data remains merely a technological solution unless it is tied to business goals and objectives.
iii. Greater operational efficiency, reduced risk and lower costs.
iv. Reveal patterns, trends and associations related to human behavior and interactions.
v. Better understand consumer habits and target marketing campaigns.
4
Characteristics of Big Data
While the term “big data” is relatively new, the practice of gathering and storing large amounts of information for eventual analysis is ages old. The concept gained momentum in the early 2000s when industry analyst Doug Laney articulated it “
5
Big Data: Volume, Velocity, Variety, Veracity
Volume: Data will grow from 4.4 zettabytes today to around 44 zettabytes.
Velocity: By 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet.
Variety: Smartphones will be shipped, all packed with sensors capable of collecting all kinds of data, not to mention the data the users create themselves.
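To put the Volume and Velocity figures above in perspective, here is a small back-of-the-envelope calculation. It assumes decimal units (1 GB = 1000 MB) and simply restates the slide's numbers; it is not an independent estimate.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Volume: growth from 4.4 ZB to about 44 ZB is roughly a tenfold increase.
growth_factor = 44 / 4.4

# Velocity: 1.7 MB per second per person, accumulated over one day.
mb_per_person_per_day = 1.7 * SECONDS_PER_DAY
gb_per_person_per_day = mb_per_person_per_day / 1000  # assuming 1 GB = 1000 MB

print(f"Growth factor: about {growth_factor:.0f}x")
print(f"Per person per day: about {gb_per_person_per_day:.1f} GB")
```

At the stated rate, each person would account for roughly 147 GB of new data per day.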
4 V’s
Volume
The enormous amount of data generated by machines, networks and human interaction on systems like social media.
Velocity
The pace at which data flows in from sources like business processes, machines, networks and human interaction with things like social media sites, mobile devices, etc. The flow of data is massive and continuous.
Variety
Variety refers to the many sources and types of data both structured and unstructured. Now data comes in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. This variety of unstructured data creates problems for storage, mining and analyzing data.
Veracity
Refers to the biases, noise and abnormality in data. Is the data that is being stored and mined meaningful to the problem being analyzed?
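As one rough illustration of a veracity check, the sketch below separates plausible sensor readings from missing or abnormal ones. The function name `split_clean_and_suspect`, the threshold `max_deviation`, and the sample readings are all hypothetical; real pipelines would choose rules suited to their own data.

```python
from statistics import median

def split_clean_and_suspect(readings, max_deviation=10.0):
    """Separate readings into plausible values and suspect ones.

    A reading is suspect if it is missing (None) or lies more than
    max_deviation away from the median of the values that are present.
    Assumes at least one reading is present.
    """
    present = [r for r in readings if r is not None]
    centre = median(present)
    clean = [r for r in present if abs(r - centre) <= max_deviation]
    suspect = [
        r for r in readings
        if r is None or abs(r - centre) > max_deviation
    ]
    return clean, suspect

# Hypothetical temperature readings with one gap and one abnormal spike.
readings = [21.5, 22.0, None, 21.8, 95.0, 22.3]
clean, suspect = split_clean_and_suspect(readings)
```

Here the missing value and the 95.0 spike are set aside for review, while the remaining readings are kept for analysis.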