Post on 03-Mar-2021
COURSE CURRICULUM
BIG DATA HADOOP (FULL)
Pre-requisites for the Big Data Hadoop Training Course? There are no pre-requisites. Knowledge of Java/Python, SQL and Linux will be beneficial, but is not mandatory. Ducat provides a crash course covering the pre-requisites required to begin Big Data training.
Apache Hadoop on AWS Cloud
This module will help you understand how to configure a Hadoop Cluster on AWS Cloud:
- Introduction to Amazon Elastic MapReduce
- AWS EMR Cluster
- AWS EC2 Instance: Multi-Node Cluster Configuration
- AWS EMR Architecture
- Web Interfaces on Amazon EMR
- Amazon S3
- Executing MapReduce Jobs on EC2 & EMR
- Apache Spark on AWS, EC2 & EMR
- Submitting Spark Jobs on AWS
- Hive on EMR
- Available Storage Types: S3, RDS & DynamoDB
- Apache Pig on AWS EMR
- Processing NY Taxi Data using Spark on Amazon EMR
Learning Big Data and Hadoop
This module will help you understand Big Data:
- Common Hadoop ecosystem components
- Hadoop Architecture
- HDFS Architecture
- Anatomy of File Write and Read
- How the MapReduce Framework works
- Hadoop high-level Architecture
- MR2 Architecture
- Hadoop YARN
- Hadoop 2.x core components
- Hadoop Distributions
- Hadoop Cluster Formation
Hadoop Architecture and HDFS
This module will help you understand Hadoop & HDFS Cluster Architecture:
- Configuration files in a Hadoop Cluster (FSimage & edit log file)
- Setting up Single & Multi-Node Hadoop Clusters
- HDFS File Permissions
- HDFS Installation & Shell Commands
- Daemons of HDFS: NameNode, DataNode & Secondary NameNode
- YARN Daemons: Resource Manager & Node Manager
- HDFS Read & Write Commands
- NameNode & DataNode Architecture
- HDFS Operations
- Hadoop MapReduce Jobs
- Executing a MapReduce Job
Hadoop MapReduce Framework
This module will help you understand the Hadoop MapReduce framework:
- How MapReduce works on HDFS data sets
- MapReduce Algorithm
- MapReduce Hadoop Implementation
- Hadoop 2.x MapReduce Architecture
- MapReduce Components
- YARN Workflow
- MapReduce Combiners
- MapReduce Partitioners
- MapReduce Hadoop Administration
- MapReduce APIs
- Input Split & String Tokenizer in MapReduce
- MapReduce Use Cases on Data Sets
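The map → shuffle/sort → reduce flow described above can be sketched as a tiny in-memory word count (plain Python standing in for the Hadoop Java API; the input lines are invented):

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle/sort: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts emitted for each word."""
    return {word: sum(values) for word, values in groups.items()}

lines = ["hadoop stores data in HDFS", "MapReduce processes data in parallel"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["data"])  # -> 2
```

On a real cluster, each phase runs in parallel across many nodes and the shuffle moves data over the network, but the dataflow is exactly this.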
Advanced MapReduce Concepts
This module will help you learn:
- Job Submission & Monitoring
- Counters
- Distributed Cache
- Map & Reduce Joins
- Data Compressors
- Job Configuration
- Record Reader
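A map-side (replicated) join of the kind the Distributed Cache enables can be sketched as follows: the small table is held in memory on every mapper while the large record stream is scanned. Table contents here are invented for illustration:

```python
# Map-side (replicated) join: the small "users" table plays the role of a file
# shipped to every mapper via the Distributed Cache; the large "orders" set is streamed.
users = {1: "alice", 2: "bob"}           # small side, held in memory on each mapper
orders = [(101, 1), (102, 2), (103, 1)]  # (order_id, user_id) records, streamed

# Each streamed record is joined against the in-memory table with no shuffle needed.
joined = [(order_id, users[user_id]) for order_id, user_id in orders if user_id in users]
print(joined)  # [(101, 'alice'), (102, 'bob'), (103, 'alice')]
```

Because no shuffle phase is needed, a map-side join is typically much cheaper than a reduce-side join when one input is small enough to fit in memory.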
Pig
This module will help you understand Pig Concepts:
- Pig Architecture
- Pig Installation
- Pig Grunt Shell
- Pig Running Modes
- Pig Latin Basics
- Pig LOAD & STORE Operators
- Diagnostic Operators: DESCRIBE, EXPLAIN, ILLUSTRATE & DUMP
- Grouping & Joining: GROUP, COGROUP, JOIN & CROSS Operators
- Combining & Splitting: UNION & SPLIT Operators
- Filtering: FILTER, DISTINCT & FOREACH Operators
- Sorting: ORDER BY & LIMIT Operators
- Built-in Functions: EVAL, LOAD & STORE, Bag & Tuple, String, Date-Time & MATH Functions
- Pig UDFs (User Defined Functions)
- Pig Scripts in Local Mode
- Pig Scripts in MapReduce Mode
- Analysing XML Data using Pig
- Analysing JSON Data using Pig
- Pig Use Cases (Data Analysis on Social Media Sites, Banking, Stock Market & Others)
- Testing Pig Scripts
Hive
This module will build your concepts in learning:
- Hive Installation
- Hive Data Types
- Hive Architecture & Components
- Hive Metastore
- Hive Tables (Managed Tables & External Tables)
- Hive Partitioning & Bucketing
- Hive Joins & Sub-Queries
- Running Hive Scripts
- Hive Indexing & Views
- Hive Queries (HQL): Order By, Group By, Distribute By, Cluster By, with Examples
- Hive Functions: Built-in & UDF (User Defined Functions)
- Hive ETL: Loading JSON, XML & Text Data, with Examples
- Hive Querying Data
- Hive Use Cases
- Hive Optimization Techniques
- Partitioning (Static & Dynamic Partitions) & Bucketing
- Hive Joins: Map, Bucket-Map, SMB (Sorted-Bucket-Map) & Skew Joins
- Hive File Formats (ORC, SEQUENCE, TEXT, AVRO & PARQUET)
- CBO (Cost-Based Optimization)
- Vectorization
- Indexing (Compact & BitMap)
- Integration with TEZ & Spark
- Hive SerDe (Custom & In-Built)
- Hive Integration with NoSQL (HBase, MongoDB & Cassandra)
- Thrift API (Thrift Server)
- UDF, UDTF & UDAF
- Hive Multiple Delimiters
- Loading XML & JSON Data into Hive
- Aggregation & Windowing Functions in Hive
- Connecting Hive with Tableau
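Hive's bucketing (CLUSTERED BY ... INTO N BUCKETS) assigns each row to a bucket by hashing the bucketing column modulo the bucket count. A minimal sketch of that idea, using crc32 as an illustrative stand-in for Hive's own hash function:

```python
import zlib

NUM_BUCKETS = 4

def bucket_for(key: str) -> int:
    """Bucket id = hash(bucketing column) mod bucket count.
    crc32 is only a stand-in here; Hive uses its own Java hash."""
    return zlib.crc32(key.encode()) % NUM_BUCKETS

# Equal keys always land in the same bucket. This is what makes bucket-map
# joins possible: two tables bucketed the same way can be joined bucket-by-bucket.
assert bucket_for("user_42") == bucket_for("user_42")
print(bucket_for("user_42") in range(NUM_BUCKETS))  # True
```

Because the mapping is deterministic, sampling and bucket-map joins can read only the buckets they need instead of scanning the whole table.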
Sqoop
This module will help you understand Sqoop Concepts:
- Sqoop Installation
- Loading Data from RDBMS using Sqoop
- Sqoop Import & Import-All-Tables
- Fundamentals & Architecture of Apache Sqoop
- Sqoop Job
- Sqoop Codegen
- Sqoop Incremental Import & Incremental Export
- Sqoop Merge
- Import Data from MySQL to Hive using Sqoop
- Sqoop: Hive Import
- Sqoop Metastore
- Sqoop Use Cases
- Sqoop-HCatalog Integration
- Sqoop Scripts
- Sqoop Connectors
Flume
This module will help you learn Flume Concepts:
- Flume Introduction
- Flume Architecture
- Flume Data Flow
- Flume Configuration
- Flume Agent Component Types
- Flume Setup
- Flume Interceptors
- Multiplexing (Fan-Out) & Fan-In Flows
- Flume Channel Selectors
- Flume Sink Processors
- Fetching Streaming Data using Flume (Social Media Sites: YouTube, LinkedIn, Twitter)
- Flume + Kafka Integration
- Flume Use Cases
Kafka
This module will help you learn Kafka Concepts:
- Kafka Fundamentals
- Kafka Cluster Architecture
- Kafka Workflow
- Kafka Producer & Consumer Architecture
- Integration with Spark
- Kafka Topic Architecture
- Zookeeper & Kafka
- Kafka Partitions
- Kafka Consumer Groups
- KSQL (SQL Engine for Kafka)
- Kafka Connectors
- Kafka REST Proxy
- Kafka Offsets
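Within a consumer group, each partition is owned by exactly one consumer, so partitions are divided among the group's members. A sketch of a simple round-robin assignment (Kafka's real assignors are more elaborate and support rebalancing; the names are invented):

```python
def assign_round_robin(partitions, consumers):
    """Sketch of round-robin partition assignment within one consumer group:
    each partition ends up owned by exactly one consumer."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

# Six partitions of one topic shared by a two-consumer group
assignment = assign_round_robin(partitions=[0, 1, 2, 3, 4, 5], consumers=["c1", "c2"])
print(assignment)  # {'c1': [0, 2, 4], 'c2': [1, 3, 5]}
```

Adding a third consumer to the group would trigger a rebalance and shrink each member's share; adding more consumers than partitions leaves the extras idle.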
Oozie
This module will help you understand Oozie Concepts:
- Oozie Introduction
- Oozie Workflow Specification
- Oozie Coordinator Functional Specification
- Oozie HCatalog Integration
- Oozie Bundle Jobs
- Oozie CLI Extensions
- Automating MapReduce, Pig, Hive & Sqoop Jobs using Oozie
- Packaging & Deploying an Oozie Workflow Application
HBase
This module will help you learn HBase Architecture:
- HBase Architecture, Data Flow & Use Cases
- Apache HBase Configuration
- HBase Shell & General Commands
- HBase Schema Design
- HBase Data Model
- HBase Region & Master Servers
- HBase & MapReduce
- Bulk Loading in HBase
- Create, Insert & Read Tables in HBase
- HBase Admin APIs
- HBase Security
- HBase vs Hive
- Backup & Restore in HBase
- Apache HBase External APIs (REST, Thrift, Scala)
- HBase & Spark
- Apache HBase Coprocessors
- HBase Case Studies
- HBase Troubleshooting
Data Processing with Apache Spark
Spark executes in-memory data processing, which is why a Spark job typically runs faster than the equivalent Hadoop MapReduce job. This module will also help you understand the Spark ecosystem and its related APIs — Spark SQL, Spark Streaming, Spark MLlib, Spark GraphX — as well as Spark Core concepts. It will help you understand Data Analytics and how Machine Learning algorithms are applied to various data sets to process and analyse large amounts of data.
- Spark RDDs
- Spark RDD Actions & Transformations
- Spark SQL: Connectivity with various relational sources and converting them into DataFrames using Spark SQL
- Spark Streaming
- Understanding the role of RDDs
- Spark Core Concepts: Creating RDDs (Parallel RDDs, MappedRDD, HadoopRDD, JdbcRDD)
- Spark Architecture & Components
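The transformation/action distinction at the heart of Spark Core can be mimicked locally with generators: transformations build a lazy pipeline, and nothing is computed until an action pulls data through the chain. This is a pure-Python analogue, not the PySpark API:

```python
from functools import reduce

data = [1, 2, 3, 4, 5]

# Transformations: lazy, nothing runs yet (like rdd.map / rdd.filter)
mapped = (x * x for x in data)           # like rdd.map(lambda x: x * x)
filtered = (x for x in mapped if x > 4)  # like rdd.filter(lambda x: x > 4)

# Action: forces evaluation of the whole chain (like rdd.reduce(add))
total = reduce(lambda a, b: a + b, filtered)
print(total)  # 9 + 16 + 25 = 50
```

In real Spark the same laziness lets the scheduler fuse transformations into stages and run them in parallel across partitions before the action collects a result.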
Project #1: Working with MapReduce, Pig, Hive & Flume
Problem Statement: Fetch structured and unstructured data sets from various sources (social media sites, web servers, and structured sources such as MySQL, Oracle and others), dump them into HDFS, and then analyse these data sets using Pig, HQL queries and MapReduce to gain proficiency in the Hadoop stack and its ecosystem tools.
Data analysis steps:
- Dump XML & JSON data sets into HDFS.
- Convert semi-structured data formats (JSON & XML) into structured form using Pig, Hive & MapReduce.
- Push the data sets into the Pig & Hive environments for further analysis.
- Write Hive queries and push the output into a relational database (RDBMS) using Sqoop.
- Render the results as box plots, bar graphs and other charts using R & Python integration with Hadoop.
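The conversion step above, turning semi-structured JSON into structured (tabular) rows, can be sketched with the standard library alone; the event and its field names are invented for illustration:

```python
import json

# Hypothetical social-media event as it might arrive from a web server log
raw = '{"user": {"id": 7, "name": "asha"}, "action": "like"}'

def flatten(record, prefix=""):
    """Flatten nested JSON into flat column -> value pairs (e.g. 'user.id'),
    the shape a Hive table or Pig relation expects."""
    row = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=name + "."))
        else:
            row[name] = value
    return row

row = flatten(json.loads(raw))
print(row)  # {'user.id': 7, 'user.name': 'asha', 'action': 'like'}
```

In the project itself the same flattening is done at scale with Pig, Hive SerDes or a MapReduce job, but the per-record logic is this.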
Project #2: Analyze Stock Market Data
Industry: Finance
Data: The data set contains stock information from the New York Stock Exchange, such as daily quotes, the stock's highest price and the stock's opening price.
Problem Statement: Calculate the covariance for stock data to solve the storage & processing problems associated with a huge volume of data.
- Positive covariance: if investment instruments or stocks tend to be up or down during the same time periods, they have positive covariance.
- Negative covariance: if returns move inversely, i.e. one investment tends to be up while the other is down, this shows negative covariance.
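Per pair of stocks, the covariance calculation is as follows; the daily return series are invented for illustration:

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length return series:
    mean of the products of each series' deviations from its own mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical daily returns for two stocks that move together
stock_a = [0.01, -0.02, 0.03, 0.00]
stock_b = [0.02, -0.01, 0.04, 0.01]
print(covariance(stock_a, stock_b) > 0)  # True: positive covariance, same-direction moves
```

In the Hadoop version, the per-stock means and the cross-term sums are each a MapReduce aggregation; the formula itself does not change.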
Project #3: Hive, Pig & MapReduce with New York City Uber Trips
Problem Statement: Analyse a month of Uber dispatch data to answer:
- What was the busiest dispatch base by trips on a particular day and over the entire month?
- Which day had the most active vehicles?
- Which days had the most trips, sorted from most to fewest?
Dispatching_Base_Number is the NYC Taxi & Limousine company code of the base that dispatched the Uber; active_vehicles is the number of active Uber vehicles for a particular date & company (base); trips is the number of trips for a particular base & date.
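The "busiest dispatch base by trips" question reduces to a group-and-sum over (base, date, active_vehicles, trips) records; the sample rows below are invented, matching the column layout described above:

```python
from collections import Counter

# (dispatching_base_number, date, active_vehicles, trips) -- hypothetical sample rows
rows = [
    ("B02512", "2015-01-01", 190, 1132),
    ("B02598", "2015-01-01", 870, 6903),
    ("B02512", "2015-01-02", 225, 1765),
    ("B02598", "2015-01-02", 785, 6155),
]

# Group by base and sum trips, the same aggregation a GROUP BY in Hive would do
trips_by_base = Counter()
for base, date, active_vehicles, trips in rows:
    trips_by_base[base] += trips

busiest_base, total_trips = trips_by_base.most_common(1)[0]
print(busiest_base, total_trips)  # B02598 13058
```

The per-day variants are the same aggregation keyed on date (or on the (base, date) pair) instead of base alone.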
BIG DATA PROJECTS

PITAMPURA (DELHI): Plot No. 366, 2nd Floor, Kohat Enclave, Pitampura (Near Kohat Metro Station), Above Allahabad Bank, New Delhi-110034. Ph: 70-70-90-50-90
NOIDA: A-43 & A-52, Sector-16, Noida-201301 (U.P.), INDIA. Ph: 70-70-90-50-90, +91 99-9999-3213
GHAZIABAD: 1, Anand Industrial Estate, Near ITS College, Mohan Nagar, Ghaziabad (U.P.). Ph: 70-70-90-50-90
GURGAON: 1808/2, 2nd Floor, Old DLF, Near Honda Showroom, Sec-14, Gurgaon (Haryana). Ph: 70-70-90-50-90
SOUTH EXTENSION (DELHI): D-27, South Extension-1, New Delhi-110049. Ph: +91 98-1161-2707
www.facebook.com/ducateducation
Project #4: Analyze Tourism Data
Data: The tourism data comprises: city pair, seniors travelling, children travelling, adults travelling, car booking price & air booking price.
Problem Statement: Analyse the tourism data to find:
- The top 20 destinations tourists frequently travel to: based on the given data we can find the most popular destinations, ranked by the number of trips booked for each destination.
- The top 20 high air-revenue destinations, i.e. the 20 cities that generate the highest airline revenues, so that discount offers can be given to attract more bookings for these destinations.
- The top 20 locations from which most trips start, based on booked trip count.
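Ranking the top destinations by booked trip count is a count-and-rank; a sketch on toy data (top 3 shown instead of 20, city pairs invented):

```python
from collections import Counter

# One city-pair entry per booked trip -- hypothetical rows
trips = ["DEL-GOA", "DEL-GOA", "BOM-DXB", "DEL-GOA", "BOM-DXB", "MAA-SIN"]

# Count bookings per destination pair and take the top N
top = Counter(trips).most_common(3)
print(top)  # [('DEL-GOA', 3), ('BOM-DXB', 2), ('MAA-SIN', 1)]
```

The air-revenue ranking is the same pattern with `sum(air_booking_price)` per city as the sort key instead of a trip count.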
Project #5: Airport Flight Data Analysis
Industry: Aviation
We will analyse Airport Information System data that gives information on flight delays, source & destination details, diverted routes and more.
Problem Statement: Analyse the flight data to find:
- The list of delayed flights.
- Flights with zero stops.
- The list of active airlines across all countries.
- Source & destination details of flights.
- The reasons why flights get delayed.
- Times in different formats.
Project #6: Analyze Movie Ratings
Industry: Media
Data: Movie data from sites like Rotten Tomatoes, IMDB, etc.
Problem Statement: Analyse the movie ratings given by different users to:
- Find the user who has rated the most movies
- Find the user who has rated the fewest movies
- Count the total movies rated by users belonging to a specific occupation
- Find the number of underage users
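The per-user questions above reduce to counting rating events by user; a sketch on invented data (user ids and ratings are hypothetical):

```python
from collections import Counter

# (user_id, movie_id, rating) -- hypothetical rating events
ratings = [(1, "m1", 4), (1, "m2", 5), (2, "m1", 3), (1, "m3", 2), (3, "m2", 4)]

# Count how many movies each user has rated
per_user = Counter(user for user, movie, rating in ratings)

most_active = max(per_user, key=per_user.get)   # user who rated the most movies
least_active = min(per_user, key=per_user.get)  # user who rated the fewest movies
print(most_active, per_user[most_active])  # 1 3
```

The occupation and underage-user questions follow the same shape, with the count keyed (or filtered) on a joined user-profile attribute instead of the raw user id.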
Project #7: Analyze Social Media Channels
Channels: Facebook, Twitter, Instagram & YouTube
Industry: Social Media
Data: Data set columns: VideoId, Uploader, Day of establishment on YouTube & date of upload of the video, Category, Length, Rating, Number of comments.
Problem Statement: Identify the top 5 categories in which the most videos are uploaded, the top 10 rated videos, and the top 10 most viewed videos.
Apart from these, there are some twenty more use cases to choose from, e.g.:
- Twitter Data Analysis
- Market Data Analysis