Unit-6 Recent Trends

Prepared By: Prof. V. K. Wani

Introduction

Big Data

High Velocity

Huge Volume

Variety of Data

Hadoop

Hadoop is an open-source framework for big data, used for reliable storage and processing of large-scale data.

Map Reduce

MapReduce provides parallel processing of huge amounts of data.

It balances job execution by automatically dividing a big job into smaller tasks.

Load balancing and automatic recovery from failure are the important features of MapReduce.

It has two functions: the Mapper and the Reducer.
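The Mapper and Reducer roles can be illustrated with a word count. This is a minimal single-process sketch in plain Python, not Hadoop's actual API: the function names `mapper`, `shuffle`, `reducer`, and `word_count` are chosen here for illustration, and a real cluster would run the map and reduce tasks in parallel on different machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as happens between the map and reduce steps."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single result."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Each line stands in for one small task split off from the big job.
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(word, counts) for word, counts in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data big job", "big cluster"]))
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread over many workers, which is what makes the model suitable for parallel processing.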

HDFS

The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably and to stream those data sets at high bandwidth to user applications.

It is a fault-tolerant system.

Huge data storage and easy access to data are the main features of HDFS.

It has two main components: the NameNode and the DataNode.
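The split between the two components can be sketched as follows. This is a toy in-memory model, not the HDFS API: the classes `NameNode` and `DataNode` below are illustrative, the block size of 4 bytes and the replication factor of 2 are assumptions made to keep the example small, and the round-robin placement is a simplification.

```python
from typing import Dict, List


class DataNode:
    """Stores the actual block contents, keyed by block id."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[str, bytes] = {}
        self.alive = True


class NameNode:
    """Keeps only metadata: which blocks form a file and where the copies live."""

    def __init__(self, data_nodes: List[DataNode], block_size: int = 4, replication: int = 2) -> None:
        self.data_nodes = data_nodes
        self.block_size = block_size    # tiny on purpose (assumption for the demo)
        self.replication = replication  # must not exceed the number of data nodes
        self.file_blocks: Dict[str, List[str]] = {}      # file path -> ordered block ids
        self.locations: Dict[str, List[DataNode]] = {}   # block id -> nodes holding a copy

    def write(self, path: str, data: bytes) -> None:
        block_ids = []
        for index, start in enumerate(range(0, len(data), self.block_size)):
            block_id = f"{path}#{index}"
            # Place each replica on a different node, round-robin.
            targets = [
                self.data_nodes[(index + r) % len(self.data_nodes)]
                for r in range(self.replication)
            ]
            for node in targets:
                node.blocks[block_id] = data[start:start + self.block_size]
            self.locations[block_id] = targets
            block_ids.append(block_id)
        self.file_blocks[path] = block_ids

    def read(self, path: str) -> bytes:
        chunks = []
        for block_id in self.file_blocks[path]:
            # Fault tolerance: read from the first replica whose node is still alive.
            node = next(n for n in self.locations[block_id] if n.alive)
            chunks.append(node.blocks[block_id])
        return b"".join(chunks)
```

With three data nodes and two replicas per block, marking any single node as failed still leaves one live copy of every block, so the file can be read back in full.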

HIVE

Hive is an infrastructure tool for processing structured data.

Summarizing big data, querying, and analysis are the main functions of Hive.

It is designed for OLAP (online analytical processing).

The query language for Hive is known as HQL or HiveQL.

Hive was originally developed at Facebook and is now maintained by the Apache Software Foundation, hence the name Apache Hive.
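A summarization query of the kind Hive is used for can be sketched as below. The example runs on Python's built-in `sqlite3` module as a stand-in, since no Hive cluster is assumed here; the `sales` table, its columns, and the sample rows are hypothetical. The `SELECT ... GROUP BY` statement itself is ordinary SQL of a form that HiveQL also accepts.

```python
import sqlite3
from typing import List, Tuple

# An OLAP-style summary: one output row per region.
SUMMARY_QUERY = """
SELECT region, COUNT(*) AS orders, SUM(amount) AS total
FROM sales
GROUP BY region
ORDER BY region
"""


def summarize_sales(rows: List[Tuple[str, float]]) -> List[Tuple[str, int, float]]:
    """Load sample rows into an in-memory table and run the summary query."""
    conn = sqlite3.connect(":memory:")  # stand-in for a Hive table (assumption)
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        return conn.execute(SUMMARY_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("north", 120.0), ("south", 80.0), ("north", 30.0)]
    for region, orders, total in summarize_sales(sample):
        print(region, orders, total)
```

In Hive the same kind of query would be compiled into jobs that run over data stored in HDFS, rather than being executed by a local database engine.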

HIVE Cont…

[Figure: Hive architecture. Components shown: User Interface (Web User Interface, Hive Command Line, HDInsight), Metastore, HQL Process Engine, Hive Execution Engine, MapReduce, and HDFS or HBase data storage.]

PIG

Apache Pig is a platform for analyzing large data sets.

It is used for high-level data processing with scalability and reliability.

It is also used for data manipulation in Hadoop.

Pig applications include:

1. Web log processing

2. Data processing for web search platforms

3. Ad-hoc queries across large data sets
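Web log processing, the first application listed, typically follows a load, filter, group, and count pattern. The sketch below expresses that data flow in plain Python rather than in Pig Latin; the log format ("ip path status") and the function names are assumptions made for illustration, and the comments name the Pig Latin operators that play a comparable role.

```python
from collections import Counter
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Hit(NamedTuple):
    ip: str
    path: str
    status: int


def load(lines: Iterable[str]) -> Iterator[Hit]:
    # Comparable to LOAD: parse raw text lines into typed records.
    for line in lines:
        ip, path, status = line.split()
        yield Hit(ip, path, int(status))


def successful_hits_per_page(lines: Iterable[str]) -> List[Tuple[str, int]]:
    records = load(lines)
    ok = (hit for hit in records if hit.status == 200)  # comparable to FILTER
    counts = Counter(hit.path for hit in ok)            # comparable to GROUP plus COUNT
    return counts.most_common()                         # most requested page first


if __name__ == "__main__":
    log = [
        "10.0.0.1 /home 200",
        "10.0.0.2 /home 200",
        "10.0.0.1 /cart 500",
        "10.0.0.3 /cart 200",
    ]
    print(successful_hits_per_page(log))
```

Each step consumes the output of the previous one, which mirrors how a Pig script is written as a sequence of named data transformations.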

PIG Data Types

PIG Commands

Data Warehouse Appliance

A data warehouse appliance is a combination of hardware and software designed for analytical processing.

It is a single platform for all types of functionality.

Its physical design and the automatic tuning between software and hardware are its main features.

It contains an operating system, servers, storage, hardware, software, and a database.

It is designed with a Massively Parallel Processing (MPP) architecture, which provides high performance and scalability.
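The idea behind massively parallel processing can be sketched as: partition the data across nodes, let every node aggregate its own share, then merge the partial results. The example below is only an illustration in Python; worker threads stand in for appliance nodes, and the names `partition`, `node_sum`, and `mpp_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def partition(rows: List[int], nodes: int) -> List[List[int]]:
    """Spread the rows over the nodes round-robin so each node owns one slice."""
    return [rows[i::nodes] for i in range(nodes)]


def node_sum(rows: List[int]) -> int:
    """Work done locally on one node, using only that node's slice of the data."""
    return sum(rows)


def mpp_sum(rows: List[int], nodes: int = 4) -> int:
    # Threads stand in for separate appliance nodes (assumption for the demo).
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(node_sum, partition(rows, nodes)))
    # The coordinator merges the partial results into the final answer.
    return sum(partials)


if __name__ == "__main__":
    print(mpp_sum(list(range(1, 101))))
```

Adding nodes shrinks each node's slice, which is the source of the scalability claimed for this architecture.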

Data Warehouse Appliance Benefits

Reduction in Administration

Reduction in Cost

Parallel Performance

Built-in High Availability

Scalability

Rapid time to value

Simplified Support

Leading DW Appliance Vendors

1. Netezza

2. Teradata

3. Oracle Exadata

4. DATAllegro

5. Vertica

6. Infobright

etc.

Netezza

Netezza uses an Asymmetric Massively Parallel Processing (AMPP) architecture rather than symmetric multiprocessing (SMP).

It combines open, blade-based servers and disk storage with data filtering.

Netezza is suitable for large data warehouses that need quick response times.

AMPP is a two-tiered architecture.

Teradata

Smart Change Data Capture

[Figure: change data capture flow. Labels shown: A, B, C, Capture, Change Table, Apply, DW.]

Change data capture is the process of capturing the changes and updates made to data sources over time.

[Figure: change data capture components. Labels shown: Source System, Source Table, Change Table, Change Set.]
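The capture and apply steps can be sketched as below. This is one simple approach, comparing two snapshots of a source table, and it is an assumption rather than the mechanism the slides describe; production tools often read database logs or use triggers instead. The names `capture_changes` and `apply_changes` and the sample rows are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]
Table = Dict[int, Row]                      # primary key -> row
Change = Tuple[str, int, Optional[Row]]     # (operation, key, new row or None)


def capture_changes(previous: Table, current: Table) -> List[Change]:
    """Build the change set by comparing the old and new state of the source table."""
    changes: List[Change] = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key in previous:
        if key not in current:
            changes.append(("delete", key, None))
    return changes


def apply_changes(warehouse: Table, changes: List[Change]) -> None:
    """Apply the change set to the warehouse copy instead of reloading everything."""
    for operation, key, row in changes:
        if operation == "delete":
            warehouse.pop(key, None)
        else:
            warehouse[key] = row  # insert and update both store the new row


if __name__ == "__main__":
    old = {1: {"name": "A"}, 2: {"name": "B"}}
    new = {1: {"name": "A2"}, 3: {"name": "C"}}
    change_set = capture_changes(old, new)
    dw = dict(old)
    apply_changes(dw, change_set)
    print(change_set, dw)
```

Only the rows that changed travel to the warehouse, which is the main reason to capture changes rather than copy the full source table each time.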

Real Time BI

A real-time system is a system that updates information as soon as it is received or generated.

Real-time BI is the process of delivering information as soon as it appears.

Real time means delivering the information within milliseconds or a few seconds.

It contains the current state of the information.

Latency

Latency is the delay between the time an action is initiated and the time at which the action's impact is identified.

1. Data Latency: latency in collecting data from the source.

2. Analytic Latency: latency in accessing and analyzing the data.

3. Decisional Latency: latency in deciding the action to be taken.

[Figure: Real Time BI Latency. Stages shown: Data Extraction, Data Loading, Data Aggregation, Querying.]
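The three kinds of latency add up to the total delay between a business event and the action taken on it. The sketch below works this out from four timestamps; the class `EventTimeline`, its field names, and the sample times are hypothetical and chosen only to make the arithmetic visible.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class EventTimeline:
    occurred: datetime   # the business event happens at the source
    stored: datetime     # the data has been collected and loaded
    analyzed: datetime   # the information has been accessed and analyzed
    acted: datetime      # a decision has been made and action taken

    @property
    def data_latency(self) -> timedelta:
        return self.stored - self.occurred

    @property
    def analytic_latency(self) -> timedelta:
        return self.analyzed - self.stored

    @property
    def decisional_latency(self) -> timedelta:
        return self.acted - self.analyzed

    @property
    def total_latency(self) -> timedelta:
        # Equals the sum of the three components above.
        return self.acted - self.occurred


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0, 0)
    timeline = EventTimeline(
        occurred=start,
        stored=start + timedelta(seconds=2),
        analyzed=start + timedelta(seconds=5),
        acted=start + timedelta(seconds=65),
    )
    print(timeline.data_latency, timeline.analytic_latency,
          timeline.decisional_latency, timeline.total_latency)
```

In this made-up timeline the decision step dominates the total, which shows why faster data delivery alone does not guarantee a faster response.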

Operational BI

Operational BI is also called real-time BI. It is an approach to data analysis that enables decisions based on real-time data that is generated and used on a day-to-day basis.

Features of Operational BI

1. Real-time monitoring

2. Real-time situation detection

3. Correlation of events

4. Multidimensional analysis

5. Root cause analysis

etc.

Embedded BI

Embedded BI is useful for improving analysis and reporting.

Operational BI takes its data from local storage, while Embedded BI takes its data from centralized storage.

It has additional functionalities such as:

1. Real-time visual alerts

2. Customized and personalized dashboard reports

3. Tracking Key Performance Indicators (KPIs)

etc.
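KPI tracking with a visual alert can be sketched as a comparison of an actual value against its target. Everything in this example is an assumption made for illustration: the `Kpi` class, the 90% and 70% thresholds, and the green, amber, and red levels are not taken from any particular BI product.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    actual: float
    target: float

    @property
    def attainment(self) -> float:
        """Share of the target reached so far, e.g. 0.95 for 95%."""
        return self.actual / self.target


def alert_level(kpi: Kpi, warn_below: float = 0.9, critical_below: float = 0.7) -> str:
    """Map a KPI to a dashboard colour; the thresholds are hypothetical defaults."""
    if kpi.attainment < critical_below:
        return "red"
    if kpi.attainment < warn_below:
        return "amber"
    return "green"


if __name__ == "__main__":
    for kpi in (Kpi("Monthly revenue", 95, 100), Kpi("Orders shipped", 60, 100)):
        print(kpi.name, alert_level(kpi))
```

A dashboard would re-evaluate such a rule whenever new data arrives, so the colour shown reflects the current state of the KPI.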

Agile BI

Agile BI is a method that reduces the time taken by traditional BI processes.

It helps in adapting to changing business requirements.

Unlike the traditional process, Agile BI executes the BI processes in an overlapped manner, which makes BI process execution faster.

With Agile BI, the BI functionalities can be delivered in small, manageable chunks using shorter development cycles.

[Figure: Agile BI environment. Elements shown: Agile Development Methodologies, Agile Project Management Methodologies, Agile Infrastructure, Cloud Agile BI, IT Organization and Agile BI.]

Agile BI Cont…

Build the data warehouse

Rearrange data using ETL

Model data using modeling tools

Report data using reporting tools

BI On Cloud

BI applications can be hosted on the cloud, which allows the applications to be hosted at a shared location.

BI on the cloud offers the following:

Easy access with optimum results

Better return on investment

Lower Implementation Cost

Scalability

Flexibility

DW on Cloud

etc

BI Tools: My Report

My Report is a reporting solution for small and medium businesses. Its output can be presented in Excel or OpenOffice.

Its modules are:

1. My Report Data

2. My Report Data Run

3. My Report Builder

4. My Report Viewer

5. My Report Messenger

Pentaho

Pentaho is a business analytics suite that offers the following functionalities:

Data Integration

ETL

OLAP services

Data Mining

Reporting Dashboard
