Tools and techniques adopted for big data analytics
Uploaded by: joseph-francis
Category: Data & Analytics
WHAT IS BIG DATA?
WHY ANALYSIS ON BIG DATA IS CRUCIAL FOR VALUE-BASED
SERVICES AND PRODUCTS?
BIG DATA CHARACTERISTICS
A BRIEF HISTORY ON ORIGINS OF BIG DATA
PHASES IN BIG DATA ANALYSIS
CHALLENGES IN BIG DATA ANALYSIS
TOOLS AND TECHNIQUES FOR DATA ANALYTICS
CASE STUDIES
CONCLUSION
WHAT IS BIG DATA?
Extremely large data sets that may be analysed
computationally to reveal patterns, trends, and
associations, especially relating to human
behaviour and interactions.
WHY ANALYSIS ON BIG DATA IS CRUCIAL FOR VALUE-BASED
SERVICES AND PRODUCTS?
BUSINESS INTELLIGENCE
DECISION SUPPORT
PREDICTIVE ANALYTICS
GOVERNMENTS
HEALTHCARE
RESEARCH
MARKETING STRATEGIES
A brief history on origins of big data
1880 - The Start of Information Overload
8 years to complete the US census
1932 - The Population Boom
1956 - Virtual Memory
1966 - Centralized Computing Systems Enter the Scene
1970 - Relational Database
1985 - Manufacturing Resources Planning Systems
1989 - Business Intelligence
1995 - The World Wide Web Explodes
1999 - Predictive Analysis Changes Business as Usual
http://www.winshuttle.com/big-data-timeline/
Phases in Big Data analysis
Data Acquisition and Recording
Information Extraction and Cleaning
Data Integration, Aggregation, and Representation
Query Processing, Data Modelling, and Analysis
Interpretation
Tools and Techniques
A/B testing
Crowdsourcing
Genetic algorithms
Machine learning
Natural language processing
Time series analysis
Visualization
Data mining
Association rule learning
Classification tree analysis
Regression analysis
A/B testing
A form of statistical hypothesis testing that compares two
variants, A and B; in statistics it is known as two-sample
hypothesis testing.
H0 (null hypothesis): variants A and B perform the same
H1 (alternative hypothesis): variants A and B perform differently
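As a rough illustration of the two-sample test described above, the sketch below runs a two-proportion z-test on conversion counts for two variants. The function name `two_proportion_z_test`, the visitor and conversion counts, and the 5% significance level are all assumptions made for illustration; none of them come from the presentation.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under H0 (both variants convert at the same rate).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 2000 visitors per variant (illustrative only).
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
reject_h0 = p_value < 0.05  # assumed 5% significance level
```

A small p-value suggests the observed difference between the variants is unlikely under H0. The z-test relies on a normal approximation, so it is only reasonable for fairly large samples.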
Crowdsourcing
Crowdsourcing represents the act of a company or institution
taking a function once performed by employees and outsourcing
it to an undefined (and generally large) network of people in the
form of an open call.
Analysis of the reviews for opinion
Analysis of the interactions for need and intent
Analysis of social network interactions
Machine learning
- a scientific discipline that explores the construction and
study of algorithms that can learn from data.
- such algorithms build a model from example inputs and use
it to make predictions or decisions,
- rather than following fixed, explicitly programmed instructions.
Machine learning is closely related to, and often overlaps
with, computational statistics, a discipline that also specializes
in prediction-making.
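The idea of building a model from example inputs and then using it to predict can be sketched with a very small learner. The example below fits a straight line by least squares, which is only one of many possible models. The function names `fit_line` and `predict` and the training points are hypothetical.

```python
def fit_line(xs, ys):
    """Learn a slope and intercept from example (x, y) pairs by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the learned model to an input it has not seen before."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical training examples that lie on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
model = fit_line(xs, ys)
prediction = predict(model, 5.0)
```

The program is never told the rule y = 2x + 1. It recovers the slope and intercept from the examples alone, which is the sense in which the model is learned rather than hand-coded.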
Indian Elections 2014
- Volume: the size of the Indian electorate, with 814 million
voters, compared with the USA’s 193.6 million and the UK’s 45.5
million.
[Bar chart: electorate size in millions (axis 0 to 900) for India, USA and UK]
- Variety of data: India’s voter rolls, in 12 different
languages and 900,000 PDFs amounting to 25 million pages,
made for a heterogeneous, non-uniform and deeply diverse
information set.
- Veracity: the information was often questionable.
One report noted that some voters were listed as 19,545
years old, and others as a confounding 0 years old. Overlapping
names (there are 327,000 women named “Sita” in
Bihar alone) further complicated the process.
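Veracity problems like the ones above are usually caught with simple validation passes during the cleaning phase. The sketch below flags implausible ages and names shared by several records in the same state. The record layout, the function names and the 18 to 120 age range are assumptions; the sample records are invented and only echo the kinds of errors described.

```python
from collections import Counter

def flag_suspect_ages(records, min_age=18, max_age=120):
    """Return records whose age falls outside an assumed plausible range."""
    return [r for r in records if not (min_age <= r["age"] <= max_age)]

def ambiguous_names(records):
    """Return (state, name) pairs that appear on more than one record."""
    counts = Counter((r["state"], r["name"]) for r in records)
    return {key: n for key, n in counts.items() if n > 1}

# Invented sample records (illustrative only, not real voter data).
records = [
    {"name": "Sita", "state": "Bihar", "age": 34},
    {"name": "Sita", "state": "Bihar", "age": 0},
    {"name": "Ravi", "state": "Bihar", "age": 19545},
    {"name": "Meena", "state": "Kerala", "age": 52},
]
suspect = flag_suspect_ages(records)
shared = ambiguous_names(records)
```

A shared name is not an error in itself. It only signals that the name cannot serve as a unique key, so matching records would need further fields.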
Airbnb
- Airbnb’s team had a hunch that better photos would
increase rentals.
- They tested the idea by putting the least effort
possible into a test that would give them valid results.
- When the experiment showed good results, they
built the necessary components and rolled it out to all
customers.
Shoppers Stop
Shoppers Stop stores retail clothing,
accessories, handbags, shoes, jewelry, fragrances,
cosmetics, health and beauty products, home furnishing
and decor products.
Shoppers Stop launched its e-store with delivery
across major cities in India in 2008. The website retails
all the products available at Shoppers Stop stores,
including apparel, cosmetics and accessories. Shoppers
Stop opened stores in Amritsar, Bhopal and
Aurangabad.
After analysing its First Citizen base, the company
observed that not all those who buy shirts also buy trousers.
But those who buy both men’s shirts and trousers
spend 60% more a year on average than those who buy only
shirts, and thrice as much as those who don’t buy men’s shirts at
all.
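The shirts-and-trousers observation is the kind of pattern that association rule learning quantifies. The sketch below computes support, confidence and lift for the rule "shirt implies trousers" over a handful of baskets. The function name `rule_metrics` and the baskets are hypothetical and are not the company's data.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(baskets)
    has_a = sum(1 for b in baskets if antecedent in b)
    has_c = sum(1 for b in baskets if consequent in b)
    has_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = has_both / n            # share of baskets containing both items
    confidence = has_both / has_a     # share of antecedent baskets with consequent
    lift = confidence / (has_c / n)   # confidence relative to the baseline rate
    return support, confidence, lift

# Invented baskets (illustrative only).
baskets = [
    {"shirt", "trousers"},
    {"shirt"},
    {"shirt", "trousers", "shoes"},
    {"shoes"},
    {"shirt", "handbag"},
]
support, confidence, lift = rule_metrics(baskets, "shirt", "trousers")
```

A lift above 1 means trousers appear with shirts more often than their overall frequency would suggest. A confidence well below 1 matches the observation that not every shirt buyer also buys trousers.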
9,00,000
- included customers who showed a pattern
of being interested in new brands in other, non-trouser
categories. They were sent information on
new trouser brand launches and fits.
- exhibited multiple buying patterns in
other categories. They were sent attractive deals if
they bought two or more trousers.
- “control group” to measure success or
failure of the promotions.
The campaign delivered a 30% increase in sales, equivalent to
30 crore.
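Measuring a promotion against a control group comes down to comparing sales per customer between the targeted and untargeted groups. The sketch below shows that arithmetic. The function name `uplift_vs_control` and all figures are hypothetical and do not reproduce the campaign's actual numbers.

```python
def uplift_vs_control(target_sales, target_size, control_sales, control_size):
    """Relative lift in sales per customer for the targeted group over control."""
    target_avg = target_sales / target_size
    control_avg = control_sales / control_size
    return (target_avg - control_avg) / control_avg

# Hypothetical figures: 1000 targeted customers and 500 in the control group.
lift = uplift_vs_control(
    target_sales=125_000.0, target_size=1_000,
    control_sales=50_000.0, control_size=500,
)
```

Dividing by group size first matters because the control group is usually much smaller than the targeted segments, so raw sales totals are not comparable.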