Unit-6 · HIVE is an infrastructure tool for processing structured data. Summarization of Big...
Introduction
Prepared By: Prof. V. K. Wani
Big Data
High Velocity
Huge Volume
Variety of Data
Hadoop
Hadoop is an open-source framework for big data, used for reliable storage and processing of large-scale data.
MapReduce
MapReduce provides parallel processing of huge amounts of data.
It balances job execution by automatically dividing a big job into smaller tasks.
Load balancing and automatic recovery from failure are important features of MapReduce.
It has two functions: the Mapper and the Reducer.
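The Mapper and Reducer roles can be illustrated with a word count. This is a minimal single-process sketch in plain Python, not Hadoop code; the function names (mapper, shuffle, reducer, word_count) and the sample lines are assumptions made for illustration.

```python
from collections import defaultdict

def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # Each line could be mapped on a different machine; here it runs in a loop.
    mapped = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(k, v) for k, v in shuffle(mapped).items())

counts = word_count(["big data needs big tools", "hadoop stores big data"])
print(counts)
```

In a real cluster the map and reduce calls run in parallel on many nodes; the sketch only shows how the work is divided.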
HDFS
Hadoop Distributed File System (HDFS)
It is designed to store very large data sets reliably and to stream those data sets at high bandwidth to user applications.
It is a fault-tolerant system.
Huge data storage and easy access to data are the main features of HDFS.
It has two main components: the NameNode and the DataNode.
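The division of work between the NameNode (metadata) and the DataNodes (block storage), and the way replication gives fault tolerance, can be sketched with a toy model. Everything here is an assumption for illustration: the ToyNameNode class, the 4-byte block size, and the replication factor of 2 are not real HDFS settings or APIs.

```python
import itertools

BLOCK_SIZE = 4    # bytes per block in this toy model (real HDFS blocks are far larger)
REPLICATION = 2   # copies kept of each block (illustrative value)

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a file's bytes into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

class ToyNameNode:
    """Holds only metadata: which DataNodes store each block of a file."""

    def __init__(self, datanodes):
        self.datanodes = datanodes   # node name -> {block_id: bytes}
        self.block_map = {}          # block_id -> list of node names

    def write(self, filename, data):
        names = itertools.cycle(sorted(self.datanodes))
        for index, block in enumerate(split_into_blocks(data)):
            block_id = f"{filename}#{index}"
            targets = [next(names) for _ in range(REPLICATION)]
            for name in targets:
                self.datanodes[name][block_id] = block
            self.block_map[block_id] = targets

    def read(self, filename, failed=()):
        """Rebuild the file, skipping any DataNode listed as failed."""
        parts, index = [], 0
        while f"{filename}#{index}" in self.block_map:
            block_id = f"{filename}#{index}"
            alive = [n for n in self.block_map[block_id] if n not in failed]
            parts.append(self.datanodes[alive[0]][block_id])
            index += 1
        return b"".join(parts)

nodes = {"dn1": {}, "dn2": {}, "dn3": {}}
namenode = ToyNameNode(nodes)
payload = b"hello hdfs world"
namenode.write("notes.txt", payload)
print(namenode.read("notes.txt", failed={"dn2"}))
```

Because every block lives on two different nodes, the file can still be read when any single node is unavailable.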
HIVE
HIVE is an infrastructure tool for processing structured data.
Its functions are summarization of big data, querying, and analysis.
It is designed for OLAP.
The query language for HIVE is known as HQL or HiveQL.
HIVE was originally developed by Facebook and is now maintained by the Apache Software Foundation, hence the name Apache HIVE.
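HiveQL is SQL-like, so a summarization query has the familiar SELECT ... GROUP BY shape. The sketch below uses Python's built-in sqlite3 module as a stand-in engine, not HIVE itself; the sales table and its columns are hypothetical.

```python
import sqlite3

# In-memory database standing in for a table of structured data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0), ("south", 20.0)],
)

# A summarization query: total and average amount per region.
summary_query = """
    SELECT region, SUM(amount) AS total, AVG(amount) AS average
    FROM sales
    GROUP BY region
    ORDER BY region
"""
summary = conn.execute(summary_query).fetchall()
print(summary)
```

On a cluster, a query of this shape would be compiled into distributed jobs over data in HDFS rather than run against a local file.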
HIVE Cont…
HIVE architecture diagram components:
User Interface: Web User Interface, HIVE Command Line, HDInsight
Metastore
HQL Process Engine
Hive Execution Engine
MapReduce
HDFS or HBASE Data Storage
PIG
Apache Pig is a platform for analyzing large data sets.
It is used for high-level data processing with scalability and reliability.
It is also used for data manipulation in Hadoop.
PIG applications include:
1. Web log processing
2. Data processing for web search platforms
3. Ad-hoc queries across large data sets
etc.
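Web log processing typically follows a load, filter, group, and count flow. The sketch below mirrors that flow in plain Python rather than Pig Latin; the log format "<ip> <path> <status>" and the sample lines are hypothetical.

```python
from collections import Counter

# Hypothetical log lines in the form "<ip> <path> <status>".
log_lines = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /missing 404",
    "10.0.0.1 /about.html 200",
    "10.0.0.3 /missing 404",
    "10.0.0.2 /index.html 200",
]

def load(lines):
    """LOAD: parse each raw line into a record with named fields."""
    for line in lines:
        ip, path, status = line.split()
        yield {"ip": ip, "path": path, "status": int(status)}

def hits_per_path(records, status=200):
    """FILTER by status, GROUP by path, then COUNT each group."""
    return Counter(r["path"] for r in records if r["status"] == status)

ok_hits = hits_per_path(load(log_lines))
print(ok_hits.most_common())
```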
Data Warehouse Appliance
It is a combination of hardware and software designed for analytical processing.
It is a single platform for all types of functionality.
Its physical design and auto-tuning between software and hardware are its main features.
It contains an operating system, servers, storage, hardware, software, and a database.
It is designed with a Massively Parallel Processing (MPP) architecture, which provides high performance and scalability.
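The idea behind Massively Parallel Processing is that each node works on its own slice of the data and only small partial results are combined. The sketch below simulates nodes with threads on one machine, so it shows the pattern rather than real appliance behaviour; the function names and the node count are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_nodes):
    """Distribute rows round-robin so each node owns its own slice of the data."""
    return [rows[i::n_nodes] for i in range(n_nodes)]

def local_aggregate(rows):
    """Work done independently on one node: a partial sum and a row count."""
    return sum(rows), len(rows)

def mpp_average(rows, n_nodes=4):
    slices = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_aggregate, slices))
    # Final step: combine the small partial results into one answer.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

amounts = list(range(1, 101))
print(mpp_average(amounts))
```

Adding nodes shrinks each slice, which is where the scalability of the design comes from.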
Data Warehouse Appliance Benefits
Reduction in Administration
Reduction in Cost
Parallel Performance
Built-in High Availability
Scalability
Rapid time to value
Simplified Support
Leading DW Appliance Vendors
1. Netezza
2. Teradata
3. Oracle Exadata
4. Datallegro
5. Vertica
6. Infobright
etc.
Netezza
Netezza uses an Asymmetric Massively Parallel Processing (AMPP) architecture, unlike Symmetric Multiprocessing (SMP) systems.
It combines open, blade-based servers and disk storage with data-filtering capabilities.
Netezza is suitable for large data warehouses that need quick response times.
AMPP is a two-tiered architecture.
Smart Change Data Capture
Diagram: changes from sources A, B, and C flow through Capture into a Change Table, and Apply loads them into the DW.
It is the process of capturing the changes and updates made to data sources over time.
Smart Change Data Capture
Diagram: Source System, Source Table, Change Table, Change Set
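One simple way to picture change data capture is to compare two snapshots of a source table, record the differences as a change set, and apply only that change set to the warehouse. This is a sketch of the snapshot-comparison approach only; production systems often read database logs or use triggers instead. The function names and sample rows are hypothetical.

```python
def capture_changes(old_rows, new_rows):
    """Compare two snapshots of a source table, both keyed by primary key."""
    change_set = []
    for key, row in new_rows.items():
        if key not in old_rows:
            change_set.append(("insert", key, row))
        elif old_rows[key] != row:
            change_set.append(("update", key, row))
    for key in old_rows:
        if key not in new_rows:
            change_set.append(("delete", key, None))
    return change_set

def apply_changes(warehouse, change_set):
    """Apply the captured changes to the warehouse copy of the table."""
    for operation, key, row in change_set:
        if operation == "delete":
            warehouse.pop(key, None)
        else:
            warehouse[key] = row
    return warehouse

before = {1: "Asha", 2: "Ravi", 4: "Omar"}
after = {1: "Asha", 2: "Ravi K.", 3: "Meera"}

changes = capture_changes(before, after)
warehouse = apply_changes(dict(before), changes)
print(changes)
print(warehouse)
```

Only three rows travel to the warehouse here, not the whole table, which is the point of capturing changes.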
Real Time BI
A real-time system is a system that updates information as soon as it is received or generated.
Real-time BI is the process of delivering information as soon as it appears.
Real time means delivering the information within milliseconds or a few seconds.
It reflects the current state of the information.
Latency
Latency is the delay between the time an action is initiated and the time at which the action's impact is identified.
1. Data Latency: Latency in collecting data from the source.
2. Analytic Latency: Latency in accessing and analyzing the data.
3. Decisional Latency: Latency in deciding the action to be taken.
Real Time BI Latency stages: Data Extraction, Data Loading, Data Aggregation, Querying
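The three latencies add up to the total delay between a business event and the action taken on it. The sketch below computes each one from four timestamps; the timestamps and variable names are hypothetical values chosen for illustration.

```python
from datetime import datetime

# Hypothetical timestamps for one business event.
event_occurred = datetime(2024, 1, 1, 9, 0, 0)   # the event happens at the source
data_available = datetime(2024, 1, 1, 9, 0, 2)   # data has been collected and loaded
analysis_ready = datetime(2024, 1, 1, 9, 0, 5)   # data has been accessed and analyzed
action_taken = datetime(2024, 1, 1, 9, 0, 9)     # a decision has been made

data_latency = (data_available - event_occurred).total_seconds()
analytic_latency = (analysis_ready - data_available).total_seconds()
decisional_latency = (action_taken - analysis_ready).total_seconds()
total_latency = data_latency + analytic_latency + decisional_latency

print(data_latency, analytic_latency, decisional_latency, total_latency)
```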
Operational BI
Operational BI is also called real-time BI. It is an approach to data analysis that enables decisions based on real-time data that is generated and used on a day-to-day basis.
Features of Operational BI
1. Real-time monitoring
2. Real-time situation detection
3. Correlation of events
4. Multidimensional analysis
5. Root cause analysis
etc.
Embedded BI
Embedded BI is useful for improving analysis and reporting.
Operational BI takes data from local storage, while Embedded BI takes data from centralized storage.
It has additional functionalities such as:
1. Real-time visual alerts
2. Customized and personalized dashboard reports
3. Tracking Key Performance Indicators (KPIs)
etc.
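KPI tracking with a visual alert can be reduced to comparing an actual value against a target and mapping the result to a colour. This is a minimal sketch that assumes higher values are better; the kpi_status function, the 90% warning ratio, and the sample KPIs are all hypothetical.

```python
def kpi_status(name, actual, target, warn_ratio=0.9):
    """Return the attainment of one KPI and the alert colour a dashboard could show."""
    ratio = actual / target
    if ratio >= 1.0:
        level = "green"    # target met or exceeded
    elif ratio >= warn_ratio:
        level = "amber"    # close to target, worth watching
    else:
        level = "red"      # well below target, raise an alert
    return {"kpi": name, "attainment": round(ratio, 2), "alert": level}

dashboard = [
    kpi_status("orders_shipped", actual=120, target=100),
    kpi_status("monthly_sales", actual=95, target=100),
    kpi_status("new_customers", actual=40, target=100),
]
for row in dashboard:
    print(row)
```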
Agile BI
It is a method that reduces the time taken by traditional BI processes.
It helps in adapting to changing business requirements.
Unlike the traditional process, Agile BI executes BI processes in an overlapped manner, which results in faster BI process execution.
With Agile BI, BI functionality can be delivered in small, manageable chunks using shorter development cycles.
Agile BI
Agile BI Environment diagram components:
Agile Development Methodologies
Agile Project Mgmt Methodologies
Agile Infrastructure
Cloud Agile BI
IT Organization & Agile BI
Agile BI Cont…
Build Data Warehouse
Rearrange Data Using ETL
Model Data Using Modeling Tools
Report Data Using Reporting Tools
BI On Cloud
BI applications can be hosted on the cloud, which allows them to run from a shared location.
BI On Cloud
Easy access with optimum results
Better Return on Investment
Lower Implementation Cost
Scalability
Flexibility
DW on Cloud
etc
BI Tools: My Report
It is a reporting solution for small and medium businesses; its output can be presented in Excel or OpenOffice.
Its modules are:
1. My Report Data
2. My Report Data Run
3. My Report Builder
4. My Report Viewer
5. My Report Messenger
Pentaho
It is a business analytics suite that offers the following functionalities:
Data Integration
ETL
OLAP services
Data Mining
Reporting Dashboard