BIG DATA DIPLOMA - epsiloneg.com · I- Introduction to Big Data, Developing with Spark and Hadoop...

BIG DATA DIPLOMA

1) Learning Methodology

• Instructor-Led Classroom Training (ILT).

2) Prerequisites:

• Basic skills with at least one programming language are desirable.

3) Training Program Description:

• The capability to collect and store huge amounts of diverse data necessitates the development and use of new techniques and methodologies for processing and analyzing big data. This course provides comprehensive coverage of a number of technologies at the foundation of the big data movement, including the Hadoop architecture and its ecosystem of tools.

• The program provides an introduction to machine learning and statistical data analysis, including basic probability theory and statistics. Topics such as parameter estimation, hypothesis testing and regression analysis will be covered. In addition, the course will focus on machine learning topics including Bayes classifiers, k-NN, decision trees, SVM, K-means, principal component analysis, independent component analysis and neural networks.

• Trainees build on their combined knowledge of Big Data technologies (e.g. Hadoop, Spark) and Data Science (e.g. statistics, machine learning) and learn how this combination is used to solve real-world problems. In addition to this main goal, the program aims to familiarize trainees with the latest technological and scientific trends in the field and with how Big Data and data science are used in modern business enterprises. Use cases based on real problems such as network traffic, text analytics, and financial applications will be addressed in this course.

• The program links machine learning theories and methods to practical, real-life use cases. The hands-on labs reinforce these methods, with a deeper focus on applying them to help customers realize business value.

• Length of Program: 150 Hrs.

4) Projects

This program comprises many career-oriented projects. Each project you build will be an opportunity to demonstrate what you've learned in the lessons. Your completed projects will become part of a career portfolio that will demonstrate to potential employers that you have skills in data analysis and feature engineering, machine learning algorithms, and training and evaluating models. One of our main goals at ETI is to help you create a job-ready portfolio of completed projects.

Building a project is one of the best ways both to test the skills you've acquired and to demonstrate your newfound abilities to future employers or colleagues. Throughout this program, you'll have the opportunity to prove your skills by building the following projects:

• Project 1: Exploring the Titanic Survival Data

• Project 2: Predicting Housing Prices

• Project 3: Finding Donors for Charity

• Project 4: Creating Customer Segments

Deep learning

• Project 5: Dog Breed Recognition

• Project 6: Teach a Quadcopter to Fly

• Project 7: Explore Weather Trends

• Project 8: Investigate a Dataset

• Project 9: Analyze Experiment Results

• Project 10: Wrangle and Analyze Data

• Project 11: Communicate Data Findings

• Project 12: Crime Prediction

• Project 13: Simulating and Predicting Traffic

• Project 14: Fraud Detection

Capstone projects in many fields:

1- Business

2- Trading

5) Training Program Curriculum:

I- Introduction to Big Data, Developing with Spark and Hadoop

• Introduction to Hadoop and MapReduce SQL JOINS

o Hadoop Ecosystems

o Hadoop Clusters

o MapReduce API Concepts

o Basics of Writing and Testing MapReduce Programs

• Hadoop API

o ToolRunner Class

o Accessing HDFS Programmatically

o Using the Hadoop API's Library of Mappers, Reducers and Partitioners

• Managing Data Input and Output

• Common MapReduce Algorithms

o Sorting and Searching Large Data Sets

o Indexing Data

o Computing Term Frequency-Inverse Document Frequency (TF-IDF)

o Calculating Word Co-Occurrence

• Joining Data Sets in MapReduce Jobs

• Hadoop Tools for Data Acquisition

• Practical Development Tips and Techniques

o Strategies for Debugging and Testing MapReduce Code

o Reusing Objects

o Creating Map-only MapReduce Jobs

• PIG

o Complex Data Analysis with Pig

o Multi Dataset Operations with Pig

o Extending Pig

o Pig Troubleshooting and Optimization

• Hive

o Relational Data Analysis with Hive

o Hive Data Management

o Text Processing with Hive

o Hive Optimization

o Extending Hive

• Analyzing Data with Impala

• Introduction to Spark

o Spark Basics

o Working with Resilient Distributed Datasets (RDDs)

II- Introduction to Machine Learning and Statistical Analysis

A- Introduction

▪ Applications

▪ Relation between Statistics and Learning

▪ Supervised, Unsupervised and Reinforcement Learning

o Linear Algebra Review

▪ Vector and Matrix Operations

▪ Matrix Inverse and Decomposition

▪ The Eigenvalue Problem

o Analysis Tools

▪ Python Programming

▪ Waikato Environment for Knowledge Analysis (WEKA)

▪ Azure Platform

B- Statistical Analysis

o Probability Theory Review

▪ Marginal and Joint Probabilities

▪ Conditional Probabilities

▪ Bayes’ Rule

▪ Prior and Posterior Probabilities

▪ Probability Distributions

▪ Expected Value, Variance and Covariance

o Statistical Parameter Estimation:

▪ Types of Estimators

▪ Random Sampling of a Population

▪ Estimation of the Mean and Variance

▪ Detection of Outliers

▪ Data Representation and Visualization

o Hypothesis Testing

▪ Confidence Interval and p-value

▪ Alternative Hypotheses

▪ Z-test and T-test

o Regression Analysis

▪ Assumptions of Linear Regression

▪ Simple Linear Regression

▪ Error Analysis

C- Machine Learning

o Linear Classification:

▪ Discriminant Functions

▪ Discriminant Functions Properties

▪ Least Squares Classifier

▪ Fisher’s Linear Discriminant

▪ Perceptron

o Probabilistic Generative Models

▪ Maximum Likelihood Estimation of Gaussian Generative Model

▪ Naive Bayes Classifier

o Probabilistic Discriminative Models

▪ Logistic Regression

o Non-linear Classification

▪ Instance-based Learning:

▪ K-nearest Neighbor Classifier

• Cross-validation

• Weighted K-nearest Neighbor Classifier

▪ Support Vector Machines

▪ Decision Tree Learning

▪ Artificial Neural Networks:

• Network Architecture

• Back-propagation Learning

o Introduction to Reinforcement Learning

▪ Markov Decision Process

▪ Q-learning

▪ Non-deterministic Rewards and Actions

III- Advanced Big Data Analytics Technologies and Applications

• Analyzing Data with Scala and Spark

• Predicting Forest Cover with Decision Trees

• Anomaly Detection in Network Traffic with K-means Clustering

• Understanding Wikipedia with Latent Semantic Analysis

• Analyzing Co-occurrence Networks with GraphX

• Geospatial and Temporal Data Analysis on Taxi Trip Data

• Estimating Financial Risk through Monte Carlo Simulation

IV- Practical Data Science Using Machine Learning Techniques

• Big Data and Data Science: Use Cases

o Walk through example use cases

o The use case: solution overview and architecture

o The practice of the Data Sciences vs Traditional DW/BI

o Data Sciences and Big Data Applications: Value to the Business

• Data Preprocessing

o Features Extraction and Transformations

o Dimensionality Reduction

o Visualization and Exploratory Data Analysis

o Data Integration, Quality and Implications

o Handling Big Datasets

• Advanced Data Analysis Methods with Applications

o Unstructured Data Methods

o Association Rules: Understanding Customer Behavior

o Clustering Techniques: Optimized Logistics

o Classification Methods: Prediction of Traffic Status

o Network Analysis Techniques: Discovering Social Patterns

o Big Graph: Analyzing Electric Power Grids

o Ensemble Learning Techniques

• Practicing Big Data Sciences in Real Life

o The Data Sciences modeling lifecycle

o Machine Learning modeling for Big Data applications

o Data Sciences application implementation lifecycle

o Deep Learning for Complex Data Science Models

o Machine Learning Agile modeling approach

o Data Driven Transformation for Organizations

o Consulting Skills for the Data Sciences and Big Data Solutions

o Deployment Considerations for the Big Data Platforms

V- Hands-on Group Project Based on Real-life Use Case

FOR MORE INFORMATION:

Website: https://epsiloneg.com

E-mail: [email protected]

Mobile: +2 01122885566 / +2 01011933233 / +20 2 22749985

Address: Elserag Shopping Mall, Residential Building 1, Entrance 1, Makram Ebeid, Nasr City, Cairo, Egypt

Contact us to get more details regarding special discounts for groups.

CERTIFICATE

• Participants will be granted a completion certificate from Epsilon Training Institute, Delaware, USA, if they attend a minimum of 80 percent of the direct contact hours of the program and fulfill the program requirements (passing both the final exam and the project).

REGISTRATION PROCEDURES

• Confirmation of registration is based on receipt of a Purchase Order or Registration Form.

• Training program registrations will not be confirmed until registration is complete and billing information is received in full.

PAYMENT TERMS AND METHODS

• Payment must be made prior to course commencement at Epsilon Training Center, Nasr City HQ.

• In person:

o In cash at our address: Elserag Shopping Mall, Residential Building 1, Entrance 1, Floor 11

o By cheque, payable to: Epsilon Training Center

• Bank transfer to our account at QNB ALAHLI: Acc. 20318280579-69 (EGP), Branch code 00078

• Vodafone Cash to 01011933233

REFUND

• Any cancellation must be made three (3) weeks prior to course commencement in order to receive a full refund of paid registration fees.

• A 50% cancellation fee will be imposed for any course cancellation received within two (2) weeks of, or on the date of, course commencement.

o Cancellation 3 weeks prior to the training program start date: 100% refund

o Cancellation 2 weeks prior to the training program start date: 50% refund of training program fees

o Cancellation 1 week prior to the training program start date: no refund

• Any refund request must be submitted by a documented email or in writing.

RECAP

• A recap is available for only one session, subject to the available dates.

• If you need to recap a session you have already attended, the fee is 200 LE per session, subject to the available dates.

POSTPONING

• Postponement is possible only before the start of the training program, at least 10 days in advance.

Get in Touch

Egypt

Location: Elserag Shopping Mall, Residential Building 1, Entrance 1, Makram Ebeid, Nasr City, Cairo 11762

Telephone: +2 (011) 2288-5566 / +2 (010) 1193-3233 / +2 02 2274 9985

Website: https://epsiloneg.com

Email: [email protected]

CR# 118268 TAX# 672-411-008

USA

Location: 919 N Market St, Wilmington, DE 19801

Telephone: +1 (408) 641-4068

Website: https://epsilonti.org

Email: [email protected]

CR# 7078427 TAX# 38-4095665