IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
![Page 1: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/1.jpg)
Crunch Big Data in the Cloud with IBM BigInsights and Hadoop IBD-3475
Leons Petrazickis, IBM Canada
@leonsp
© 2013 IBM Corporation
![Page 2: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/2.jpg)
Please note
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
![Page 3: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/3.jpg)
First step
Request a lab environment
http://bit.ly/requestLab
![Page 4: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/4.jpg)
BigDataUniversity.com
![Page 5: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/5.jpg)
Hadoop Architecture
![Page 6: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/6.jpg)
Agenda
• Terminology review
• Hadoop architecture
– HDFS
– Blocks
– MapReduce
– Type of nodes
– Topology awareness
– Writing a file to HDFS
![Page 7: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/7.jpg)
Terminology review
Hadoop cluster
• Rack 1: Node 1, Node 2, …, Node n
• Rack 2: Node 1, Node 2, …, Node n
• …
• Rack n: Node 1, Node 2, …, Node n
![Page 8: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/8.jpg)
Hadoop architecture
• Two main components:
– Hadoop Distributed File System (HDFS)
– MapReduce Engine
![Page 9: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/9.jpg)
Hadoop distributed file system (HDFS)
• A Hadoop file system that runs on top of the existing file system
• Designed to handle very large files with streaming data access patterns
• Uses blocks to store a file or parts of a file
![Page 10: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/10.jpg)
HDFS - Blocks
• File Blocks
– 64MB (default), 128MB (recommended), compared to 4KB in UNIX
– Behind the scenes, 1 HDFS block is supported by multiple operating system (OS) blocks (diagram: one 128MB HDFS block made up of many OS blocks)
• Advantages of blocks:
– Fixed size, so it is easy to calculate how many fit on a disk
– A file can be larger than any single disk in the network
– If a file or a chunk of the file is smaller than the block size, only the needed space is used. E.g., a 420MB file is split as: 128MB, 128MB, 128MB, 36MB
– Fits well with replication to provide fault tolerance and availability
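The 420MB example can be checked with a few lines of JavaScript. This is only a sketch of the arithmetic; `splitIntoBlocks` is a hypothetical helper name, not part of Hadoop.

```javascript
// Hypothetical helper: list the HDFS block sizes (in MB) used by one file.
function splitIntoBlocks(fileSizeMB, blockSizeMB) {
  var blocks = [];
  var remaining = fileSizeMB;
  while (remaining > 0) {
    // Every block is full-sized except possibly the last one.
    var size = Math.min(blockSizeMB, remaining);
    blocks.push(size);
    remaining -= size;
  }
  return blocks;
}

console.log(splitIntoBlocks(420, 128)); // [ 128, 128, 128, 36 ]
```

The last block holds only the 36MB that remain, which is the "only the needed space is used" point from the slide.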
![Page 11: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/11.jpg)
HDFS - Replication
• Blocks with data are replicated to multiple nodes
• Allows for node failure without data loss
[Diagram: Node 1, Node 2, Node 3]
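A minimal sketch of why replication tolerates a node failure. The round-robin placement below is an assumption made for illustration (the function names are hypothetical, and real HDFS placement also takes racks into account); it only shows that a block with more than one replica survives the loss of a single node.

```javascript
// Hypothetical placement: block b goes to `replicas` consecutive nodes (round-robin).
// Assumes replicas <= nodes.length so the holders of a block are distinct.
function placeBlocks(blockCount, nodes, replicas) {
  var placement = [];
  for (var b = 0; b < blockCount; b++) {
    var holders = [];
    for (var r = 0; r < replicas; r++) {
      holders.push(nodes[(b + r) % nodes.length]);
    }
    placement.push(holders);
  }
  return placement;
}

// True if every block still has at least one replica on a surviving node.
function survivesFailure(placement, failedNode) {
  return placement.every(function (holders) {
    return holders.some(function (node) { return node !== failedNode; });
  });
}

var nodes = ["Node 1", "Node 2", "Node 3"];
var placement = placeBlocks(4, nodes, 2);
console.log(survivesFailure(placement, "Node 2")); // true
```

With a single replica per block, the same check fails for any node that holds a block.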
![Page 12: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/12.jpg)
MapReduce engine
• Technology from Google
• A MapReduce program consists of map and reduce functions
• A MapReduce job is broken into tasks that run in parallel
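To make the map and reduce roles concrete, here is an in-memory word count in JavaScript. It is a single-process sketch of the programming model, not the Hadoop API; `mapWords`, `reduceCounts`, and `runJob` are hypothetical names.

```javascript
// In-memory sketch of the MapReduce model; not the Hadoop API.
function mapWords(line) {
  // Map: emit a [word, 1] pair for every word in the line.
  return line.toLowerCase().split(/\s+/).filter(Boolean).map(function (w) {
    return [w, 1];
  });
}

function reduceCounts(word, counts) {
  // Reduce: combine all values that share a key.
  return counts.reduce(function (a, b) { return a + b; }, 0);
}

function runJob(lines) {
  var groups = Object.create(null);
  // Each line could be mapped by a separate task; here they run one after another.
  lines.forEach(function (line) {
    mapWords(line).forEach(function (pair) {
      (groups[pair[0]] = groups[pair[0]] || []).push(pair[1]);
    });
  });
  var result = {};
  Object.keys(groups).forEach(function (word) {
    result[word] = reduceCounts(word, groups[word]);
  });
  return result;
}

console.log(runJob(["big data", "big cluster"])); // { big: 2, data: 1, cluster: 1 }
```

Because each call to `mapWords` depends only on its own line, the map step is the part that a cluster can split into parallel tasks.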
![Page 13: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/13.jpg)
Types of nodes - Overview
13
• HDFS nodes
– NameNode
– DataNode
• MapReduce nodes
– JobTracker
– TaskTracker
• There are other nodes not discussed in this course
![Page 14: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/14.jpg)
Types of nodes - Overview
14
![Page 15: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/15.jpg)
Types of nodes - NameNode
• NameNode
– Only one per Hadoop cluster
– Manages the filesystem namespace and metadata
– Single point of failure, but mitigated by writing state to multiple filesystems
– Because it is a single point of failure with large memory requirements, don’t use inexpensive commodity hardware for this node
![Page 16: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/16.jpg)
Types of nodes - DataNode
• DataNode
– Many per Hadoop cluster
– Manages blocks with data and serves them to clients
– Periodically reports to the NameNode the list of blocks it stores
– Use inexpensive commodity hardware for this node
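The periodic block reports can be pictured as the input from which block locations are derived. The sketch below is an assumed in-memory model, not the real HDFS protocol; the node names, block IDs, and `buildBlockMap` are hypothetical.

```javascript
// Hypothetical in-memory model of block reports; not the real HDFS protocol.
// reports: { dataNodeName: [blockId, ...] }
function buildBlockMap(reports) {
  var locations = {};
  Object.keys(reports).forEach(function (node) {
    reports[node].forEach(function (blockId) {
      // Record that this DataNode holds a copy of the block.
      (locations[blockId] = locations[blockId] || []).push(node);
    });
  });
  return locations;
}

var blockMap = buildBlockMap({
  "datanode-1": ["blk_1", "blk_2"],
  "datanode-2": ["blk_2", "blk_3"]
});
console.log(blockMap.blk_2); // [ 'datanode-1', 'datanode-2' ]
```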
![Page 17: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/17.jpg)
Types of nodes - JobTracker
• JobTracker node
– One per Hadoop cluster
– Receives job requests submitted by the client
– Schedules and monitors MapReduce jobs on TaskTrackers
![Page 18: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/18.jpg)
Types of nodes - TaskTracker
18
• TaskTracker node
– Many per Hadoop cluster
– Executes MapReduce operations
– Reads blocks from DataNodes
![Page 24: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/24.jpg)
Topology awareness
Bandwidth becomes progressively smaller in the following scenarios:
1. Process on the same node
2. Different nodes on the same rack
3. Nodes on different racks in the same data center
4. Nodes in different data centers
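One common way to describe these four scenarios is as a distance in a tree of data centers, racks, and nodes: the further apart two locations are, the less bandwidth is available between them. The sketch below assumes locations written as `/datacenter/rack/node` paths; the path format and the `distance` helper are illustrative assumptions, not a Hadoop API.

```javascript
// Distance = number of tree hops from each location up to their closest common ancestor.
// Locations are hypothetical paths of the form /datacenter/rack/node.
function distance(a, b) {
  var pa = a.split("/").filter(Boolean);
  var pb = b.split("/").filter(Boolean);
  var common = 0;
  while (common < pa.length && common < pb.length && pa[common] === pb[common]) {
    common++;
  }
  return (pa.length - common) + (pb.length - common);
}

console.log(distance("/d1/r1/n1", "/d1/r1/n1")); // 0: process on the same node
console.log(distance("/d1/r1/n1", "/d1/r1/n2")); // 2: different nodes on the same rack
console.log(distance("/d1/r1/n1", "/d1/r2/n3")); // 4: different racks, same data center
console.log(distance("/d1/r1/n1", "/d2/r3/n4")); // 6: different data centers
```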
![Page 25: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/25.jpg)
Writing a file to HDFS
[Slides 25–35: diagram sequence; the images are not reproduced in this transcript]
![Page 36: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/36.jpg)
Thank You
![Page 37: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/37.jpg)
What is Hadoop?
![Page 38: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/38.jpg)
Agenda
38
• What is Hadoop?
• What is Big Data?
• Hadoop-related open source projects
• Examples of Hadoop in action
• Big Data solutions and the Cloud
![Page 39: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/39.jpg)
What is Hadoop?
[Slides 39–46: diagram sequence]
Relational Database: 1GB, 10GB, 100GB, 1TB, 10TB, 100TB
RFIDs, Sensors
![Page 47: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/47.jpg)
What is Hadoop?
• Open source project
• Written in Java
• Optimized to handle:
– Massive amounts of data through parallelism
– A variety of data (structured, unstructured, semi-structured)
– Using inexpensive commodity hardware
• Great performance
• Reliability provided through replication
• Not for OLTP, not for OLAP/DSS, good for Big Data
• Current version: 0.20.2
![Page 48: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/48.jpg)
What is Big Data?
48
RFID Readers
![Page 49: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/49.jpg)
What is Big Data?
49
2 Billion internet users
![Page 50: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/50.jpg)
What is Big Data?
50
4.6 Billion mobile phones
![Page 51: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/51.jpg)
What is Big Data?
51
7TB of data processed by Twitter every day
![Page 52: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/52.jpg)
What is Big Data?
52
10TB of data processed by Facebook every day
![Page 53: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/53.jpg)
What is Big Data?
53
About 80% of this data is unstructured
![Page 54: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/54.jpg)
Hadoop-related open source projects
• Jaql
• Pig
• ZooKeeper
![Page 55: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/55.jpg)
Examples of Hadoop in action – IBM Watson
55
![Page 56: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/56.jpg)
Examples of Hadoop in action
56
• In the telecommunication industry
• In the media
• In the technology industry
![Page 57: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/57.jpg)
Hadoop is not for all types of work
57
• Not good for processing transactions (random access)
• Not good when work cannot be parallelized
• Not good for low latency data access
• Not good for processing lots of small files
• Not good for intensive calculations with little data
![Page 58: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/58.jpg)
Big Data solutions and the Cloud
58
• Big Data solutions are more than just Hadoop
– Add business intelligence/analytics functionality
– Derive information from data in motion
• Big Data solutions and the Cloud are a perfect fit.
– The Cloud allows you to set up a cluster of systems in minutes and it’s relatively inexpensive.
![Page 59: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/59.jpg)
Thank You
![Page 60: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/60.jpg)
HDFS – Command Line
![Page 61: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/61.jpg)
Agenda
• HDFS Command Line Interface
• Examples
61
![Page 62: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/62.jpg)
HDFS Command line interface
• File System Shell (fs)
• Invoked as follows:
hadoop fs <args>
• Example: listing the current directory in HDFS
hadoop fs -ls .
![Page 63: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/63.jpg)
HDFS Command line interface
• FS shell commands take path URIs as arguments
• URI format:
scheme://authority/path
• Scheme:
– For the local filesystem, the scheme is file
– For HDFS, the scheme is hdfs
hadoop fs -copyFromLocal file://myfile.txt hdfs://localhost/user/keith/myfile.txt
• Scheme and authority are optional
– Defaults are taken from the configuration file core-site.xml
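The optional scheme and authority can be illustrated with a small parser. This is a sketch of the idea only: `parseFsUri` is a hypothetical helper, and `DEFAULT_FS` stands in for whatever default a cluster's core-site.xml actually defines.

```javascript
// Hypothetical default; on a real cluster it would come from core-site.xml.
var DEFAULT_FS = { scheme: "hdfs", authority: "localhost" };

// Split scheme://authority/path into its parts, filling in defaults when they are omitted.
function parseFsUri(uri, defaults) {
  var m = /^([A-Za-z][A-Za-z0-9+.-]*):\/\/([^\/]*)(\/.*)?$/.exec(uri);
  if (m) {
    return { scheme: m[1], authority: m[2], path: m[3] || "/" };
  }
  // No scheme://authority prefix: fall back to the configured defaults.
  return { scheme: defaults.scheme, authority: defaults.authority, path: uri };
}

var full = parseFsUri("hdfs://localhost/user/keith/myfile.txt", DEFAULT_FS);
var bare = parseFsUri("/user/keith/myfile.txt", DEFAULT_FS);
console.log(full.scheme, full.authority, full.path); // hdfs localhost /user/keith/myfile.txt
console.log(bare.scheme, bare.authority, bare.path); // hdfs localhost /user/keith/myfile.txt
```

Under these assumed defaults, the full URI and the bare path refer to the same file.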
![Page 64: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/64.jpg)
HDFS Command line interface
64
• Many POSIX-like commands
• cat, chgrp, chmod, chown, cp, du, ls, mkdir, mv, rm, stat, tail
• Some HDFS-specific commands
• copyFromLocal, copyToLocal, get, getmerge, put, setrep
![Page 65: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/65.jpg)
HDFS – Specific commands
• copyFromLocal / put
– Copy files from the local file system into fs
hadoop fs -copyFromLocal <localsrc> ... <dst>
Or
hadoop fs -put <localsrc> ... <dst>
![Page 66: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/66.jpg)
HDFS – Specific commands
• copyToLocal / get
– Copy files from fs into the local file system
hadoop fs -copyToLocal [-ignorecrc] [-crc] <src> <localdst>
Or
hadoop fs -get [-ignorecrc] [-crc] <src> <localdst>
![Page 67: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/67.jpg)
HDFS – Specific commands
• getmerge
– Get all the files in the directories that match the source file pattern
– Merge and sort them into a single file on the local fs
– <src> is kept
hadoop fs -getmerge <src> <localdst>
![Page 68: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/68.jpg)
HDFS – Specific commands
• setrep
– Set the replication level of a file
– The -R flag requests a recursive change of replication level for an entire tree
– If -w is specified, waits until the new replication level is achieved
hadoop fs -setrep [-R] [-w] <rep> <path/file>
![Page 69: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/69.jpg)
Thank You
![Page 70: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/70.jpg)
Hadoop MapReduce
![Page 71: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/71.jpg)
Agenda
71
• Map operations
• Reduce operations
• Submitting a MapReduce job
• Distributed Mergesort Engine
• Two fundamental data types
• Fault tolerance
• Scheduling
• Task execution
![Page 72: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/72.jpg)
What is a Map operation?
• Doing something to every element in an array is a common operation:
var a = [1,2,3];
for (var i = 0; i < a.length; i++)
  a[i] = a[i] * 2;
• New value for variable a would be:
var a = [2,4,6];
• The loop body can be written as a function call:
for (var i = 0; i < a.length; i++)
  a[i] = fn(a[i]);
• Here fn is a function defined as:
function fn(x) { return x*2; }
![Page 76: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/76.jpg)
What is a Map operation?
• The whole loop can also be converted into a “map” function, where fn is a function passed as an argument:
function map(fn, a) {
  for (var i = 0; i < a.length; i++)
    a[i] = fn(a[i]);
}
• You can invoke this map function like this:
map(function(x){return x*2;}, a);
• The first argument is the function fn, whose definition is included in the call
![Page 80: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/80.jpg)
What is a Map operation?
• In summary, now you can rewrite:
for (var i = 0; i < a.length; i++)
  a[i] = a[i] * 2;
as a map operation:
map(function(x){return x*2;}, a);
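Put together as a runnable snippet, the slides' map function looks like this. The only change assumed here is declaring the loop counter with `var`; the comparison with JavaScript's built-in `Array.prototype.map` is an addition for illustration.

```javascript
// The map function from the slides, with the loop counter declared locally.
// Note that it updates the array in place.
function map(fn, a) {
  for (var i = 0; i < a.length; i++)
    a[i] = fn(a[i]);
}

var a = [1, 2, 3];
map(function (x) { return x * 2; }, a);
console.log(a); // [ 2, 4, 6 ]

// JavaScript's built-in Array.prototype.map returns a new array instead.
var b = [1, 2, 3].map(function (x) { return x * 2; });
console.log(b); // [ 2, 4, 6 ]
```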
![Page 81: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/81.jpg)
What is a Reduce operation?
• Another common operation on arrays is to combine all their values:
function sum(a) {
  var s = 0;
  for (var i = 0; i < a.length; i++)
    s += a[i];
  return s;
}
• The addition can be written as a function call:
function sum(a) {
  var s = 0;
  for (var i = 0; i < a.length; i++)
    s = fn(s, a[i]);
  return s;
}
• Here fn is defined so that it adds its arguments:
function fn(a,b){ return a+b; }
• The whole function sum can also be rewritten so that fn is passed as an argument. The function is renamed to reduce, and it now takes three arguments: a function, an array, and an initial value:
function reduce(fn, a, init) {
  var s = init;
  for (var i = 0; i < a.length; i++)
    s = fn(s, a[i]);
  return s;
}
![Page 86: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/86.jpg)
What is a Reduce operation?
• Another common operation on arrays is to combine all their values. The function sum:
function sum(a) {
  var s = 0;
  for (var i = 0; i < a.length; i++)
    s += a[i];
  return s;
}
can be rewritten as a reduce operation:
reduce(function(a,b){return a+b;},a,0);
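As a runnable snippet, the reduce function from the slides can be reused with different combining functions. The `Math.max` example and the comparison with the built-in `Array.prototype.reduce` are additions for illustration; the variable names are arbitrary.

```javascript
// The reduce function from the slides, with the loop counter declared locally.
function reduce(fn, a, init) {
  var s = init;
  for (var i = 0; i < a.length; i++)
    s = fn(s, a[i]);
  return s;
}

var values = [1, 2, 3];

// Summing: the same result as the original sum(a).
var total = reduce(function (a, b) { return a + b; }, values, 0);
console.log(total); // 6

// A different combining function gives a different reduction.
var largest = reduce(function (a, b) { return Math.max(a, b); }, values, -Infinity);
console.log(largest); // 3

// The built-in Array.prototype.reduce takes the function and the initial value.
var builtIn = values.reduce(function (a, b) { return a + b; }, 0);
console.log(builtIn); // 6
```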
![Page 88: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/88.jpg)
Submitting a MapReduce job
[Slides 88–97: diagram sequence; the images are not reproduced in this transcript]
![Page 98: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/98.jpg)
98
…lesson continued in the next video>
![Page 99: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/99.jpg)
MapReduce – Distributed Mergesort Engine
![Page 111: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/111.jpg)
Two Fundamental data types
        | Input          | Output
map     | <k1, v1>       | list(<k2, v2>)
reduce  | <k2, list(v2)> | list(<k3, v3>)
• Key/value pairs
• Lists
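The map and reduce signatures can be sketched in plain Python. This is a single-process illustration, not Hadoop's API: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are made up for the sketch, and word count is an assumed example job.

```python
from collections import defaultdict

def map_fn(k1, v1):
    """map: <k1, v1> -> list(<k2, v2>). Here k1 is a line offset, v1 a line of text."""
    return [(word, 1) for word in v1.split()]

def reduce_fn(k2, values):
    """reduce: <k2, list(v2)> -> list(<k3, v3>). Here it sums the counts for one word."""
    return [(k2, sum(values))]

def run_mapreduce(records, map_fn, reduce_fn):
    # Map phase: every input pair may emit any number of intermediate pairs.
    intermediate = []
    for k1, v1 in records:
        intermediate.extend(map_fn(k1, v1))
    # Shuffle: group intermediate values by key (stand-in for the sort/merge step).
    groups = defaultdict(list)
    for k2, v2 in intermediate:
        groups[k2].append(v2)
    # Reduce phase: one call per distinct intermediate key, in sorted key order.
    output = []
    for k2 in sorted(groups):
        output.extend(reduce_fn(k2, groups[k2]))
    return output

lines = [(0, "big data big cluster"), (21, "big insights")]
print(run_mapreduce(lines, map_fn, reduce_fn))
# prints [('big', 3), ('cluster', 1), ('data', 1), ('insights', 1)]
```

Only key/value pairs and lists cross the boundaries between the phases, which is why those are the two fundamental data types.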
![Page 116: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/116.jpg)
Simple data flow example
![Page 122: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/122.jpg)
Fault tolerance
• Task Failure
  • If a child task fails, the child JVM reports to the TaskTracker before it exits. The attempt is marked failed, freeing up its slot for another task.
  • If the child task hangs, it is killed. The JobTracker reschedules the task on another machine.
  • If the task continues to fail, the job is failed.
• TaskTracker Failure
  • The JobTracker receives no heartbeat.
  • The JobTracker removes the TaskTracker from the pool of TaskTrackers to schedule tasks on.
• JobTracker Failure
  • Single point of failure. The job fails.
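The task-failure rules can be sketched as a retry loop. This is an illustration only, not JobTracker code: `run_with_retries`, `flaky_task`, and the node names are hypothetical, and the limit of 4 attempts is an assumption made for this sketch.

```python
MAX_ATTEMPTS = 4  # assumption for this sketch

class JobFailed(Exception):
    """Raised when a task has used up all of its attempts."""

def run_with_retries(task, nodes, max_attempts=MAX_ATTEMPTS):
    """Try `task` on successive nodes; fail the job once attempts run out."""
    failures = []
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]  # reschedule on another machine
        try:
            return task(node), failures
        except Exception as err:
            # The attempt is marked failed and its slot is freed.
            failures.append((node, str(err)))
    raise JobFailed(f"task failed {max_attempts} times: {failures}")

def flaky_task(node):
    # Hypothetical task that only succeeds on node "c".
    if node != "c":
        raise RuntimeError(f"crash on {node}")
    return f"done on {node}"

result, failed_attempts = run_with_retries(flaky_task, ["a", "b", "c"])
print(result, failed_attempts)
```

With only nodes "a" and "b" available, every attempt fails and the sketch raises `JobFailed`, mirroring the rule that a task that keeps failing fails the whole job.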
![Page 133: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/133.jpg)
Scheduling
• FIFO scheduler (with priorities)
  • Each job uses the whole cluster, so jobs wait their turn.
• Fair scheduler
  • Jobs are placed in pools. A user who submits more jobs than another user will not, on average, get any more cluster resources than that other user. Custom pools with guaranteed minimum capacity can be defined.
• Capacity scheduler
  • Allows Hadoop to simulate, for each user, a separate MapReduce cluster with FIFO scheduling.
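The difference between FIFO and fair scheduling can be illustrated with a toy slot allocator. This is a simplified sketch, not how the Hadoop schedulers are implemented: it ignores priorities, minimum capacities, and changes over time, it assumes one pool per user, and the function names and the queue are made up.

```python
def fifo_allocation(jobs, total_slots):
    """FIFO: the job at the head of the queue gets the whole cluster."""
    allocation = {job: 0 for job, _ in jobs}
    if jobs:
        allocation[jobs[0][0]] = total_slots
    return allocation

def fair_allocation(jobs, total_slots):
    """Fair: slots are split evenly across users (one pool per user),
    then evenly across each user's jobs."""
    if not jobs:
        return {}
    users = {}
    for job, user in jobs:
        users.setdefault(user, []).append(job)
    per_user = total_slots / len(users)
    return {job: per_user / len(user_jobs)
            for user_jobs in users.values() for job in user_jobs}

# (job, user) pairs in submission order; alice submits three jobs, bob one.
queue = [("a1", "alice"), ("a2", "alice"), ("a3", "alice"), ("b1", "bob")]
print(fifo_allocation(queue, 12))  # a1 gets all 12 slots
print(fair_allocation(queue, 12))  # alice's jobs get 2 each, bob's job gets 6
```

Submitting three jobs does not give alice more of the cluster than bob: each user's pool receives half of the slots.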
![Page 140: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/140.jpg)
Task execution
• Speculative Execution
  • Job completion time is sensitive to slow-running tasks. Hadoop detects slow-running tasks and launches another, equivalent task as a backup. The output from whichever of these tasks finishes first is used.
• Task JVM Reuse
  • Tasks run in their own JVMs for isolation. Jobs that have a large number of short-lived tasks, or tasks with lengthy initialization, can benefit from sequential JVM reuse, which is enabled through configuration.
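Speculative execution can be sketched with two threads racing to do the same work. This is only an analogy: `attempt` and `run_speculatively` are hypothetical names, the delays are made up, and unlike Hadoop the sketch cannot kill the losing attempt, so it simply ignores its result.

```python
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def attempt(name, delay):
    """Hypothetical task attempt: same work, different speed."""
    time.sleep(delay)
    return name

def run_speculatively(delays):
    """Launch one attempt per entry in `delays`; return the first to finish."""
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(attempt, name, d) for name, d in delays.items()]
        done, _pending = wait(futures, return_when=FIRST_COMPLETED)
        # The slower attempt still runs to completion when the pool shuts
        # down, but its output is discarded.
        return next(iter(done)).result()

winner = run_speculatively({"original (straggler)": 0.5, "backup": 0.05})
print(winner)
```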
![Page 145: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/145.jpg)
Thank You
![Page 146: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/146.jpg)
Pig, Hive, and JAQL
![Page 147: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/147.jpg)
Agenda
• Overview
• Pig
• Hive
• Jaql
![Page 149: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/149.jpg)
Similarities of Pig, Hive and Jaql
All translate their respective high-level languages to MapReduce jobs
All offer significant reductions in program size over Java
All provide points of extension to cover gaps in functionality
All provide interoperability with other languages
None support random reads/writes or low-latency queries
![Page 150: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/150.jpg)
Comparing Pig, Hive, and Jaql
                               | Pig                              | Hive                               | Jaql
Developed by                   | Yahoo!                           | Facebook                           | IBM
Language name                  | Pig Latin                        | HiveQL                             | Jaql
Type of language               | Data flow                        | Declarative (SQL dialect)          | Data flow
Data structures it operates on | Complex                          | Geared towards structured data     | Loosely structured data, JSON
Schema optional?               | Yes                              | No, but data can have many schemas | Yes
Turing complete?               | Yes when extended with Java UDFs | Yes when extended with Java UDFs   | Yes
![Page 152: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/152.jpg)
Pig components
• Two components
  • Language (called Pig Latin)
  • Compiler
• Two execution environments
  • Local (single JVM): pig -x local
  • Distributed (Hadoop cluster): pig -x mapreduce, or simply pig
[Diagram: Pig consists of Pig Latin and the Compiler; the Execution Environment is Local or Distributed]
![Page 153: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/153.jpg)
Running Pig
• Script: pig scriptfile.pig
• Grunt (command line): pig (to launch the command-line tool)
• Embedded: call in to Pig from Java
![Page 154: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/154.jpg)
Pig Latin sample code
# pig
grunt> records = LOAD 'econ_assist.csv'
           USING PigStorage(',')
           AS (country:chararray, sum:long);
grunt> grouped = GROUP records BY country;
grunt> thesum = FOREACH grouped
           GENERATE group,
           SUM(records.sum);
grunt> DUMP thesum;
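For readers unfamiliar with Pig Latin, the same load, group, and sum steps can be written in plain Python. This is a rough single-machine equivalent, not what Pig generates; the rows in `SAMPLE` are made up to stand in for econ_assist.csv, and `sum_by_country` is a hypothetical name.

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for econ_assist.csv (country,sum).
SAMPLE = "Kenya,100\nPeru,40\nKenya,25\n"

def sum_by_country(csv_text):
    totals = defaultdict(int)
    # LOAD ... USING PigStorage(',') AS (country:chararray, sum:long)
    for country, amount in csv.reader(io.StringIO(csv_text)):
        # GROUP records BY country, then SUM(records.sum) per group
        totals[country] += int(amount)
    # DUMP prints one (group, total) tuple per country
    return sorted(totals.items())

print(sum_by_country(SAMPLE))
# prints [('Kenya', 125), ('Peru', 40)]
```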
![Page 155: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/155.jpg)
Pig Latin – Statements, operations & commands
Pig Latin program:
  … LOAD 'input.txt';   (an operation as a statement)
  … ls *.txt            (a command as a statement)
  …
  … DUMP …
[Diagram: Pig Latin program → Logical Plan → (compile) → Physical Plan → (execute)]
![Page 156: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/156.jpg)
Pig Latin statements
• UDF statements: REGISTER, DEFINE
• Commands
  • Hadoop filesystem (cat, ls, etc.)
  • Hadoop MapReduce (kill)
  • Utility (exec, help, quit, run, set)
• Operators
  • Diagnostic: DESCRIBE, EXPLAIN, ILLUSTRATE
  • Relational: LOAD, STORE, DUMP, FILTER, etc.
![Page 157: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/157.jpg)
Pig Latin – Relational operators
• Loading and storing. Eg: LOAD (into a program), STORE (to disk), DUMP (to the screen)
• Filtering. Eg: FILTER, DISTINCT, FOREACH...GENERATE, STREAM, SAMPLE
• Grouping and joining. Eg: JOIN, COGROUP, GROUP, CROSS
• Sorting. Eg: ORDER, LIMIT
• Combining and splitting. Eg: UNION, SPLIT
![Page 158: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/158.jpg)
Pig Latin – Relations and schema
• The result of a relational operator is a relation
• A relation is a bag (an unordered collection) of tuples
• Relations can be named using an alias (Eg: "x")
    x = LOAD 'sample.txt' AS (id:int, year:int);
    DUMP x;
• Each line of output is a tuple. Eg: (1,1987)
![Page 159: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/159.jpg)
Pig Latin – Relations and schema
• The structure of a relation is a schema
• Use the DESCRIBE operator to see the schema. Eg:
    DESCRIBE x;
• The output is the schema:
    x: {id: int, year: int}
![Page 160: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/160.jpg)
Pig Latin expressions
Statements that contain relational operators may also contain expressions.
Kinds of expressions: Constant, Field, Projection, Map lookup, Cast, Arithmetic, Conditional, Boolean, Comparison, Functional, Flatten
![Page 161: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/161.jpg)
Pig Latin – Data types
• Simple types: int, long, float, double, bytearray, chararray
• Complex types:
  • Tuple – Sequence of fields of any type
  • Bag – Unordered collection of tuples
  • Map – Set of key-value pairs. Keys must be chararray.
![Page 162: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/162.jpg)
Pig Latin – Function types
• Eval
  • Input: One or more expressions
  • Output: An expression
  • Example: MAX
• Filter
  • Input: Bag or map
  • Output: boolean
  • Example: IsEmpty
![Page 163: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/163.jpg)
Pig Latin – Function types
• Load
  • Input: Data from external storage
  • Output: A relation
  • Example: PigStorage
• Store
  • Input: A relation
  • Output: Data to external storage
  • Example: PigStorage
![Page 164: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/164.jpg)
Pig Latin – User-Defined Functions
• Written in Java
• Packaged in a JAR file
• Register the JAR file using the REGISTER statement
• Optionally, alias it with the DEFINE statement
![Page 166: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/166.jpg)
Hive architecture
[Diagram components: JDBC/ODBC, CLI, and Web Interface clients; DDL and Queries; Parser, Planner, Optimizer; Metastore (relational database for metadata); Hadoop]
![Page 167: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/167.jpg)
Running Hive
Hive Shell
• Interactive: hive
• Script: hive -f myscript
• Inline: hive -e 'SELECT * FROM mytable'
![Page 168: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/168.jpg)
Hive services
hive --service servicename
where servicename can be:
• hiveserver: server for Thrift, JDBC, ODBC clients
• hwi: web interface
• jar: hadoop jar with Hive jars in classpath
• metastore: out-of-process metastore
![Page 169: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/169.jpg)
Hive - Metastore
• Stores Hive metadata
• Configurations
  • Embedded: in-process metastore, in-process database
  • Local: in-process metastore, out-of-process database
  • Remote: out-of-process metastore, out-of-process database
![Page 170: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/170.jpg)
Hive – Schema-On-Read
• Faster loads into the database (files are simply copied or moved, not parsed)
• Slower queries (the schema is applied when the data is read)
• Flexibility: multiple schemas for the same data
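Schema-on-read can be illustrated in a few lines of Python. This is a conceptual sketch, not Hive code: the raw rows are made up, `read_with_schema` is a hypothetical helper, and the point is only that the stored text stays untouched while each reader applies its own schema.

```python
RAW = ["1,1987,Acme", "2,1990,Exxme"]  # made-up rows, stored as-is

def read_with_schema(lines, schema):
    """Apply (name, type) pairs to each raw line at read time."""
    rows = []
    for line in lines:
        fields = line.split(",")
        # zip stops at the shorter sequence, so a narrower schema
        # simply ignores trailing fields.
        rows.append({name: cast(value)
                     for (name, cast), value in zip(schema, fields)})
    return rows

# Two schemas over the same stored data.
typed = read_with_schema(RAW, [("id", int), ("year", int), ("merchant", str)])
loose = read_with_schema(RAW, [("id", str), ("year", str)])
print(typed[0])  # {'id': 1, 'year': 1987, 'merchant': 'Acme'}
print(loose[0])  # {'id': '1', 'year': '1987'}
```

Loading costs nothing because nothing is parsed up front; the parsing cost is paid on every read instead, which is the trade-off listed on the slide.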
![Page 171: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/171.jpg)
Hive - Configuration
• Three ways to configure Hive:
  • hive-site.xml
    - fs.default.name
    - mapred.job.tracker
    - Metastore configuration settings
  • hive --hiveconf
  • SET command in the Hive shell
![Page 172: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/172.jpg)
Hive Query Language (HiveQL)
• SQL dialect
• Does not support the full SQL-92 specification
• No support for:
  • HAVING clause in SELECT
  • Correlated subqueries
  • Subqueries outside FROM clauses
  • Updateable or materialized views
  • Stored procedures
![Page 173: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/173.jpg)
Sample code
# hive
hive> CREATE TABLE foreign_aid
        (country STRING, sum BIGINT)
        ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
        STORED AS TEXTFILE;
hive> SHOW TABLES;
hive> DESCRIBE foreign_aid;
hive> LOAD DATA INPATH 'econ_assist.csv'
        OVERWRITE INTO TABLE foreign_aid;
hive> SELECT * FROM foreign_aid LIMIT 10;
hive> SELECT country, SUM(sum) FROM foreign_aid
        GROUP BY country;
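To try the shape of the final query without a Hive installation, a similar table and GROUP BY can be run against SQLite from Python. SQLite is only a stand-in here: it is not Hive, it does not use MapReduce, and the rows below are made up in place of econ_assist.csv.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foreign_aid (country TEXT, sum INTEGER)")
# Made-up rows standing in for econ_assist.csv.
conn.executemany("INSERT INTO foreign_aid VALUES (?, ?)",
                 [("Kenya", 100), ("Peru", 40), ("Kenya", 25)])
rows = conn.execute(
    "SELECT country, SUM(sum) FROM foreign_aid "
    "GROUP BY country ORDER BY country"
).fetchall()
print(rows)
conn.close()
```

The ORDER BY is added only to make the printed result deterministic; the slide's query does not sort.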
![Page 174: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/174.jpg)
Hive Query Language (HiveQL)
• Extensions
  • MySQL-like extensions
  • MapReduce extensions
    - Multi-table insert, MAP, REDUCE, TRANSFORM clauses
• Data types
  • Simple: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, BOOLEAN, STRING
  • Complex: ARRAY, MAP, STRUCT
![Page 175: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/175.jpg)
Hive Query Language (HiveQL)
• Built-in functions
  • List them with SHOW FUNCTIONS
  • Inspect one with DESCRIBE FUNCTION
![Page 176: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/176.jpg)
Hive – User-Defined Functions
• Written in Java
• Three UDF types:
  • UDF. Input: single row, output: single row
  • UDAF. Input: multiple rows, output: single row
  • UDTF. Input: single row, output: multiple rows
• Register the UDF using ADD JAR
• Create an alias using CREATE TEMPORARY FUNCTION
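The three UDF types differ only in how many rows go in and come out. The Python functions below illustrate those shapes; they are not Hive UDFs (which the slide says are written in Java), and the function names and sample rows are made up.

```python
def udf_upper(row):
    """UDF shape: one row in, one row out."""
    return row.upper()

def udaf_max_len(rows):
    """UDAF shape: many rows in, one row out."""
    return max(len(r) for r in rows)

def udtf_split(row):
    """UDTF shape: one row in, many rows out."""
    return row.split(",")

rows = ["a,b", "c,d,e"]
print([udf_upper(r) for r in rows])                  # ['A,B', 'C,D,E']
print(udaf_max_len(rows))                            # 5
print([out for r in rows for out in udtf_split(r)])  # ['a', 'b', 'c', 'd', 'e']
```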
![Page 178: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/178.jpg)
Jaql architecture
[Diagram components: Interactive shell / Applications; Script; Compiler / Parser / Rewriter; I/O layer; Storage layer with File Systems (HDFS, GPFS, Local), Databases (DBMS, HBase), and Streams (Web, Pipes)]
![Page 179: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/179.jpg)
Jaql data model: JSON
• JSON = JavaScript Object Notation
• Flexible (schema is optional)
• Powerful modeling for semi-structured data
• Popular exchange format
![Page 180: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/180.jpg)
JSON example
[
  {ACCT_NUM:18,AUTH_DATE:"2011-01-29",
   AUTH_AMT:"111.11",ZIP:98765,MERCH_NAME:"Acme"},
  {ACCT_NUM:19,AUTH_DATE:"2011-01-29",
   AUTH_AMT:"222.22",ZIP:98765,MERCH_NAME:"Exxme",
   NICKNAME:"Xyz"},
  {ACCT_NUM:20,AUTH_DATE:"2011-01-30",
   AUTH_AMT:"3.33",ZIP:12345,MERCH_NAME:"Acme",
   ROUTE:["68.86.85.188","64.215.26.111"]},
  … ]
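The optional-schema property is easy to see when such records are read programmatically. The Python sketch below parses two of the records from the example; the field names are quoted because Python's `json` module accepts only strict JSON, which is an adaptation made for this sketch.

```python
import json

# Strict-JSON rendering of two records from the example (keys quoted).
text = '''[
  {"ACCT_NUM": 19, "AUTH_DATE": "2011-01-29", "AUTH_AMT": "222.22",
   "ZIP": 98765, "MERCH_NAME": "Exxme", "NICKNAME": "Xyz"},
  {"ACCT_NUM": 20, "AUTH_DATE": "2011-01-30", "AUTH_AMT": "3.33",
   "ZIP": 12345, "MERCH_NAME": "Acme",
   "ROUTE": ["68.86.85.188", "64.215.26.111"]}
]'''

records = json.loads(text)
for rec in records:
    # Optional fields: use .get() because not every record has them.
    print(rec["ACCT_NUM"], rec.get("NICKNAME"), len(rec.get("ROUTE", [])))
```

Records in the same array carry different fields (NICKNAME on one, ROUTE on another), and ROUTE is itself a nested array, which a flat relational row could not hold directly.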
![Page 181: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/181.jpg)
Running Jaql
• Jaql Shell
  • Interactive. Eg: jaqlshell
  • Batch. Eg: jaqlshell -b myscript.jaql
  • Inline. Eg: jaqlshell -e jaqlstatement
• Modes
  • Cluster. Eg: jaqlshell -c
  • Minicluster. Eg: jaqlshell
![Page 182: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/182.jpg)
Jaql query language
• Sources and sinks
  Eg: Copy data from a local file to a new file on HDFS
    read(file("input.json")) -> write(hdfs("output"))
  Here read(...) is the source and write(...) is the sink.
• Core operators: Filter, Transform, Expand, Group, Join, Union, Tee, Sort, Top
[Diagram: source -> operator -> operator -> … -> sink]
![Page 183: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/183.jpg)
Jaql query language
• Variables
  • Equal operator (=) binds source output to a variable
    e.g. $tweets = read(hdfs("twitterfeed"))
• Pipes, streams, and consumers
  • Pipe operator (->) streams data to a consumer
  • Pipe expects an array as input
    e.g. $tweets -> filter $.from_src == 'tweetdeck';
  • $ – implicit variable referencing the current array value
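The variable-then-pipe pattern resembles chaining generators in Python. The sketch below is an analogy only, not Jaql: `source` and `jaql_filter` are hypothetical stand-ins, and the records are made up.

```python
def source():
    """Stand-in for read(hdfs("twitterfeed")): yields made-up records."""
    yield {"from_src": "tweetdeck", "text": "hello"}
    yield {"from_src": "web", "text": "hi"}
    yield {"from_src": "tweetdeck", "text": "bye"}

def jaql_filter(records, predicate):
    """Stand-in for `-> filter ...`: keeps elements where predicate($) holds."""
    return (rec for rec in records if predicate(rec))

tweets = source()  # like: $tweets = read(...)
from_tweetdeck = jaql_filter(tweets, lambda rec: rec["from_src"] == "tweetdeck")
print([rec["text"] for rec in from_tweetdeck])
# prints ['hello', 'bye']
```

The lambda's argument plays the role of `$`: it refers to whichever array element is currently flowing through the pipe.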
![Page 184: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/184.jpg)
Jaql query language
• Categories of built-in functions: system, core, hadoop, io, array, index, schema, xml, regex, binary, date, nil, agg, number, string, function, random, record
![Page 185: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/185.jpg)
Jaql – Data Storage
• Data store examples: Amazon S3, DB2, HBase, HDFS, HTTP, JDBC, Local FS
• Data format examples: JSON, AVRO, CSV, XML
![Page 186: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/186.jpg)
Jaql sample code
# jaqlshell -c
jaql> $foreignaid =
        read(del("econ_assist.csv",
                 {schema: schema
                    {country: string, sum: long}
                 } )
        );
jaql> $foreignaid
        -> group by $country = ($.country)
           into {$country.country, sum($[*].sum)};
![Page 187: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/187.jpg)
Hadoop core lab – Part 3
![Page 188: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/188.jpg)
BigDataUniversity.com
![Page 189: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/189.jpg)
Acknowledgements and Disclaimers
Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates.
The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
© Copyright IBM Corporation 2013. All rights reserved.
• U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM, the IBM logo, ibm.com, InfoSphere and BigInsights, Streams, and DB2 are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
Other company, product, or service names may be trademarks or service marks of others.
![Page 190: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/190.jpg)
Communities
• On-line communities, User Groups, Technical Forums, Blogs, Social networks, and more
o Find the community that interests you …
• Information Management bit.ly/InfoMgmtCommunity
• Business Analytics bit.ly/AnalyticsCommunity
• Enterprise Content Management bit.ly/ECMCommunity
• IBM Champions
o Recognizing individuals who have made the most outstanding contributions to Information Management, Business Analytics, and Enterprise Content Management communities
• ibm.com/champion
![Page 191: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c664884a79594b538b4714/html5/thumbnails/191.jpg)
Thank You
Your feedback is important!
• Access the Conference Agenda Builder to complete your session surveys
o Any web or mobile browser at http://iod13surveys.com/surveys.html
o Any Agenda Builder kiosk onsite