ROMMEL GARCIA, VIRTUALIZING HADOOP (files.meetup.com/1444478/Virtualizing Hadoop.pdf)
VIRTUALIZING HADOOP
ROMMEL GARCIA
HADOOP USAGE
Where Hadoop runs, today vs. in 2 years:
▸ On-premise, unvirtualized servers in your data center: 40% today, 28% in 2 years (pCAGR: -16%)
▸ On-premise, virtualized servers in your data center: 39% today, 51% in 2 years (pCAGR: 14%)
▸ Off-premise, on public cloud infrastructure such as AWS or Google: 21% today, 21% in 2 years (pCAGR: 1%)
Adoption status:
▸ Currently use: 26%
▸ Actively evaluating: 21%
▸ Have evaluated but decided not to use: 8%
▸ May consider it in the future: 30%
▸ No interest whatsoever: 9%
▸ Never heard of it: 5%
▸ Don't know: 2%
▸ Other: 0%
Source: Internal VMware Core Metrics Study, July 2015
COMMODITY VS. APPLIANCE
VIRTUALIZATION HARDWARE
VIRTUALIZATION PLATFORM
SCENARIO 1
▸ SAN Storage (LUN)
▸ Generic Blade Servers for Compute
▸ 1/10GbE Network
▸ VM sizes are typically small
▸ 4 vCPU
▸ 32GB vRAM
VIRTUALIZATION PLATFORM
SCENARIO 2
▸ Storage Appliance for Hadoop
▸ EMC Isilon
▸ NetApp Open Solution
▸ Purpose-built Virtualization Blade Servers for Compute
▸ Fabric Interconnect/InfiniBand
▸ VM sizes are typically bigger
▸ up to 16 vCPU
▸ up to 120GB vRAM
VIRTUALIZATION PLATFORM
SCENARIO 3
▸ Local Storage for Hadoop
▸ Rack Mounted Servers
▸ 1/10GbE Network
▸ VM sizes are typically bigger
▸ up to 16 vCPU
▸ up to 120GB vRAM
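As a rough sanity check on the VM sizes above, the sketch below counts how many identical VMs fit on one host without overcommitting CPU or memory. The host capacity (32 physical cores, 256GB RAM) is a hypothetical example, not a figure from the slides, and the check ignores hypervisor overhead and hyperthreading.

```python
def vms_per_host(host_cores: int, host_ram_gb: int, vcpu: int, vram_gb: int) -> int:
    """Number of identical VMs that fit on a host with no CPU or RAM overcommit.

    Simplification: ignores hypervisor memory overhead and hyperthreads.
    """
    return min(host_cores // vcpu, host_ram_gb // vram_gb)


# Hypothetical host: 32 physical cores, 256GB RAM (not from the slides).
HOST_CORES, HOST_RAM_GB = 32, 256

# VM sizes taken from the scenarios above.
scenarios = {
    "Scenario 1 (4 vCPU, 32GB)": (4, 32),
    "Scenario 2/3 (16 vCPU, 120GB)": (16, 120),
}

for name, (vcpu, vram) in scenarios.items():
    print(f"{name}: {vms_per_host(HOST_CORES, HOST_RAM_GB, vcpu, vram)} VMs per host")
```

The function returns whichever limit is hit first, CPU or memory; swap in real host specifications before relying on the numbers.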
VIRTUALIZATION PLATFORM OF CHOICE
COMMON CHOICE
▸ VMware vSphere
▸ ahead of the curve and much more mature
▸ BDE (vSphere Big Data Extensions) provisions Hadoop
▸ OpenStack
▸ newer; the only open-source option, and a promising one
CAN WE USE IT FOR POC, DEV, UAT, PROD???
THE ANSWER IS YES.
REAL-WORLD SETUP
VIRTUALIZATION ARCHITECTURE
QUICK REVIEW OF HADOOP ARCHITECTURE
HADOOP ARCHITECTURE
[Diagram: an InputFile submitted as a Job is divided into 64MB splits (Split1 to Split3), which are stored in HDFS as 64MB blocks (Block1 to Block3) on the Datanodes of WorkerNode1, WorkerNode2, and WorkerNode3. Each worker also runs a Nodemanager that hosts the job's YARN containers (AppMaster-1, Container-2, Container-3). Master roles: Namenode, Resourcemanager.]
Image credit: VMware
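The 64MB splits in the diagram come from a simple calculation: a file is cut into block-sized pieces, and only the last piece may be smaller. This is a minimal sketch; the 64MB value mirrors the diagram (Hadoop 2.x defaults dfs.blocksize to 128MB), and the round-robin assignment of splits to worker nodes is a simplification made here for illustration. Real HDFS placement is rack-aware and replicates each block.

```python
import math

MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # matches the diagram; Hadoop 2.x defaults dfs.blocksize to 128MB


def split_file(file_size: int, block_size: int = BLOCK_SIZE) -> list[tuple[int, int]]:
    """Return (offset, length) pairs for each block-sized split of a file."""
    splits = []
    for offset in range(0, file_size, block_size):
        splits.append((offset, min(block_size, file_size - offset)))
    return splits


def assign_round_robin(splits, workers):
    """Hypothetical helper: spread splits across workers in turn.

    Real HDFS placement is rack-aware and replicated; this only mirrors
    the one-block-per-worker picture in the diagram.
    """
    return {i + 1: workers[i % len(workers)] for i in range(len(splits))}


splits = split_file(192 * MB)
print(len(splits), math.ceil(192 * MB / BLOCK_SIZE))  # 3 3
print(assign_round_robin(splits, ["WorkerNode1", "WorkerNode2", "WorkerNode3"]))
```

A 192MB file yields exactly three 64MB splits, matching Split1 to Split3 in the diagram; a 100MB file would yield one full 64MB split and one 36MB split.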
VIRTUALIZATION ARCHITECTURE
HADOOP WITH ISILON
[Diagram: three Hadoop virtual nodes on one virtualization host, backed by Isilon shared storage/NAS. Hadoop Virtual Node 1 runs the Resourcemanager; Hadoop Virtual Nodes 2 and 3 each run a Nodemanager. Each node has an OS image VMDK and an Ext4-formatted temp VMDK. The Isilon cluster provides the HDFS Namenode (NN) and Datanode functions.]
Image credit: VMware
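In the Isilon layout, the compute VMs run Nodemanagers while HDFS is served by the NAS, so the key client-side setting is fs.defaultFS pointing at the Isilon cluster. The sketch below writes a minimal core-site.xml with Python's standard library. The hostname isilon-hdfs.example.com is hypothetical (in practice this would typically be an Isilon SmartConnect zone name), and port 8020 is assumed as the HDFS RPC port.

```python
import xml.etree.ElementTree as ET


def core_site_xml(default_fs: str) -> str:
    """Build a minimal Hadoop core-site.xml that sets fs.defaultFS."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = default_fs
    return ET.tostring(configuration, encoding="unicode")


# Hypothetical endpoint; substitute your Isilon SmartConnect zone name.
print(core_site_xml("hdfs://isilon-hdfs.example.com:8020"))
```

Because the NAS provides the Namenode and Datanode functions, the VMs in this layout do not need local Datanode processes or large data VMDKs.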
VIRTUALIZATION ARCHITECTURE
DAS WITH HADOOP
[Diagram: one virtualization host server running the Hadoop Node 1 and Hadoop Node 2 virtual machines. Each VM runs a Datanode and a Nodemanager on an OS VMDK, with six local DAS disks per virtual machine, each presented as a VMDK formatted Ext4.]
Image credit: VMware
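With six local DAS disks per VM, each disk is typically given its own Ext4 mount point and listed in the Datanode's dfs.datanode.data.dir property (and, for YARN intermediate data, in yarn.nodemanager.local-dirs) so I/O is spread across all spindles. The sketch below builds those comma-separated values; the /data/diskN mount-point naming is a hypothetical convention, not something the slides specify.

```python
def disk_dirs(num_disks: int, subdir: str, mount_prefix: str = "/data/disk") -> str:
    """Comma-separated list of per-disk directories, one per local DAS disk.

    The /data/diskN layout is an assumed naming convention.
    """
    return ",".join(f"{mount_prefix}{i}/{subdir}" for i in range(1, num_disks + 1))


DISKS_PER_VM = 6  # "Six Local DAS disks per Virtual Machine"

settings = {
    # hdfs-site.xml: where the Datanode stores HDFS blocks
    "dfs.datanode.data.dir": disk_dirs(DISKS_PER_VM, "hdfs/data"),
    # yarn-site.xml: Nodemanager scratch space for container data
    "yarn.nodemanager.local-dirs": disk_dirs(DISKS_PER_VM, "yarn/local"),
}

for key, value in settings.items():
    print(f"{key}={value}")
```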
VIRTUALIZATION ARCHITECTURE
STORAGE DISK LAYOUT
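[Diagram: a hypervisor on the hardware, with vSAN and local disks as storage. The Hadoop master node VM keeps its OS image VMDK and master role (Ext4) on vSAN. The Hadoop slave node VM keeps its OS image VMDK on vSAN and runs the Datanode and NodeManager on Ext4-formatted VMDKs backed by local disks.]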
Image credit: VMware
SOME BENCHMARKS
VSPHERE 6 RESULTS - 32 HOSTS, 23 DISKS PER HOST - 2014 REPORT
http://www.vmware.com/resources/techresources/10452
VM LAYOUT MATTERS
LARGE DEPLOYMENT ARCHITECTURE
DEPLOYMENT LAYOUT
LAYOUT 1: ONE VSPHERE CLUSTER PER RACK
Rack01 Rack02 Rack03 Rack04 Rack05 Rack06 Rack07 Rack08
Cluster01 Cluster02 Cluster03 Cluster04 Cluster05 Cluster06 Cluster07 Cluster08
host001 host005 host009 host013 host017 host021 host025 host029
host002 host006 host010 host014 host018 host022 host026 host030
host003 host007 host011 host015 host019 host023 host027 host031
host004 host008 host012 host016 host020 host024 host028 host032
host033 host037 host041 host045 host049 host053 host057 host061
host034 host038 host042 host046 host050 host054 host058 host062
host035 host039 host043 host047 host051 host055 host059 host063
host036 host040 host044 host048 host052 host056 host060 host064
host065 host069 host073 host077 host081 host085 host089 host093
host066 host070 host074 host078 host082 host086 host090 host094
host067 host071 host075 host079 host083 host087 host091 host095
host068 host072 host076 host080 host084 host088 host092 host096
host097 host101 host105 host109 host113 host117 host121 host125
host098 host102 host106 host110 host114 host118 host122 host126
host099 host103 host107 host111 host115 host119 host123 host127
host100 host104 host108 host112 host116 host120 host124 host128
host129 host133 host137 host141 host145 host149 host153 host157
host130 host134 host138 host142 host146 host150 host154 host158
host131 host135 host139 host143 host147 host151 host155 host159
host132 host136 host140 host144 host148 host152 host156 host160
Image credit: VMware
DEPLOYMENT LAYOUT
LAYOUT 2: CROSS-RACK CLUSTER LAYOUT
Rack01 Rack02 Rack03 Rack04 Rack05 Rack06 Rack07 Rack08
Cluster1
host001 host005 host009 host013 host017 host021 host025 host029
host002 host006 host010 host014 host018 host022 host026 host030
host003 host007 host011 host015 host019 host023 host027 host031
host004 host008 host012 host016 host020 host024 host028 host032
Cluster2
host033 host037 host041 host045 host049 host053 host057 host061
host034 host038 host042 host046 host050 host054 host058 host062
host035 host039 host043 host047 host051 host055 host059 host063
host036 host040 host044 host048 host052 host056 host060 host064
Cluster3
host065 host069 host073 host077 host081 host085 host089 host093
host066 host070 host074 host078 host082 host086 host090 host094
host067 host071 host075 host079 host083 host087 host091 host095
host068 host072 host076 host080 host084 host088 host092 host096
Cluster4
host097 host101 host105 host109 host113 host117 host121 host125
host098 host102 host106 host110 host114 host118 host122 host126
host099 host103 host107 host111 host115 host119 host123 host127
host100 host104 host108 host112 host116 host120 host124 host128
Cluster5
host129 host133 host137 host141 host145 host149 host153 host157
host130 host134 host138 host142 host146 host150 host154 host158
host131 host135 host139 host143 host147 host151 host155 host159
host132 host136 host140 host144 host148 host152 host156 host160
Image credit: VMware
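Layouts 1 and 2 place the same 160 hosts in the same racks (each rack holds four consecutive hosts from every block of 32); they differ only in how hosts are grouped into vSphere clusters. The sketch below derives rack and cluster membership from a host number, following the host grids on the slides. The function names are hypothetical.

```python
HOSTS = range(1, 161)   # host001..host160
RACKS = 8
HOSTS_PER_RACK_ROW = 4  # each rack holds 4 consecutive hosts from every 32-host block


def rack_of(host: int) -> int:
    """Rack number (1-8) for a host, following the slide grids."""
    return (host - 1) // HOSTS_PER_RACK_ROW % RACKS + 1


def cluster_layout1(host: int) -> int:
    """Layout 1: one vSphere cluster per rack."""
    return rack_of(host)


def cluster_layout2(host: int) -> int:
    """Layout 2: each cluster is a 32-host block spanning all 8 racks."""
    return (host - 1) // (HOSTS_PER_RACK_ROW * RACKS) + 1


def members(cluster_fn, cluster: int) -> list[str]:
    """Host names belonging to `cluster` under the given layout."""
    return [f"host{h:03d}" for h in HOSTS if cluster_fn(h) == cluster]


print(members(cluster_layout1, 1))
print(members(cluster_layout2, 1)[:5], "...")
print(sorted({rack_of(h) for h in HOSTS if cluster_layout2(h) == 1}))
```

The arithmetic makes the trade-off visible: in layout 1, a rack outage removes every host of one vSphere cluster, while in layout 2 it removes 4 of the 32 hosts from each of the five clusters.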
DEPLOYMENT LAYOUT
VIRTUAL MACHINE ROLES - MASTERS AND CLIENTS
MasterVMs
mst01 to mst05, one per host on host001, host037, host073, host109, host145
Each master VM: Disk: 8192GB, RAM: 120GB, vCPU: 16
Roles across the master VMs: NAMENODE ×2, RESOURCEMANAGER ×2, ZKFC ×2, JOURNALNODE ×3, ZOOKEEPER_SERVER ×5, GANGLIA_MONITOR ×5, OOZIE_SERVER ×2, HIVE_METASTORE, HIVE_SERVER, FALCON_SERVER, NAGIOS_SERVER, GANGLIA_SERVER, HISTORYSERVER, MYSQL_SERVER*, APP_TIMELINE_SERVER*, WEBHCAT_SERVER*, SECONDARY_NAMENODE*
Client VMs: five, each labeled cln01
Each client VM: Disk: 8192GB, RAM: 120GB, vCPU: 16
Roles on every client VM: PIG, SQOOP, HIVE_CLIENT, MAPREDUCE2_CLIENT, HDFS_CLIENT, YARN_CLIENT, ZOOKEEPER_CLIENT, OOZIE_CLIENT, FALCON_CLIENT, GANGLIA_MONITOR
Image credit: VMware
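The master layout spreads the HA coordination roles across VMs: three JOURNALNODE instances and five ZOOKEEPER_SERVER instances, plus a ZKFC for each of the two NAMENODEs. JournalNodes and ZooKeeper both need a strict majority of members alive, so an ensemble of n members tolerates floor((n - 1) / 2) failures. The sketch counts a subset of the slide's master roles and applies that rule; it is illustrative arithmetic, not a check performed by any Hadoop tool.

```python
from collections import Counter

# Subset of the master-VM roles listed on the slide.
MASTER_ROLES = (
    ["NAMENODE"] * 2 + ["RESOURCEMANAGER"] * 2 + ["ZKFC"] * 2
    + ["JOURNALNODE"] * 3 + ["ZOOKEEPER_SERVER"] * 5
)


def tolerated_failures(members: int) -> int:
    """Failures a majority-quorum ensemble of `members` nodes can survive."""
    return (members - 1) // 2


counts = Counter(MASTER_ROLES)
for role in ("JOURNALNODE", "ZOOKEEPER_SERVER"):
    n = counts[role]
    print(f"{role}: {n} members, quorum {n // 2 + 1}, "
          f"tolerates {tolerated_failures(n)} failure(s)")

# Automatic NameNode failover runs one ZKFC alongside each NameNode.
print("one ZKFC per NameNode:", counts["ZKFC"] == counts["NAMENODE"])
```

Odd ensemble sizes are the usual choice because adding a fourth member to a three-member ensemble raises the quorum without raising the number of tolerated failures.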
DEPLOYMENT LAYOUT
VIRTUAL MACHINE ROLES - WORKERS
Workers
Hosts host002, host003, …, host159, host160: two worker VMs per host (wrk01, wrk02)
Hosts host225, host226, …, host239, host240: two worker VMs per host (wrk01, wrk02)
Each worker VM: Disk: 8192GB, RAM: 120GB, vCPU: 16
Roles on every worker VM: NODEMANAGER, DATANODE, GANGLIA_MONITOR
Image credit: VMware
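Each worker VM on the slide carries 8192GB of disk, and each worker host runs two worker VMs (wrk01 and wrk02). The sketch turns that into a rough HDFS capacity estimate. It assumes the default HDFS replication factor of 3 (dfs.replication) and treats all 8192GB as HDFS space. The host count is a parameter because the slide elides the full worker host list; the 100-host value below is purely hypothetical.

```python
VM_DISK_GB = 8192   # per worker VM, from the slide
VMS_PER_HOST = 2    # wrk01 and wrk02 on each worker host
REPLICATION = 3     # HDFS default dfs.replication


def hdfs_capacity_gb(worker_hosts: int, vms_per_host: int = VMS_PER_HOST,
                     disk_gb: int = VM_DISK_GB,
                     replication: int = REPLICATION) -> tuple[int, float]:
    """Return (raw GB, usable GB after replication) for the worker tier.

    Ignores non-DFS reserved space and temporary/shuffle data.
    """
    raw = worker_hosts * vms_per_host * disk_gb
    return raw, raw / replication


raw, usable = hdfs_capacity_gb(100)  # hypothetical 100 worker hosts
print(f"raw: {raw} GB, usable at {REPLICATION}x replication: {usable:.0f} GB")
```

In practice, reserve headroom for intermediate data and for re-replication after a host failure before treating the usable figure as a storage budget.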
DEPLOYMENT LAYOUT
LAYOUT 3: EXPANDED RACK LAYOUT (HADOOP/ANALYTICS APPS)
Rack09 Rack10 Rack11 Rack12 Rack13 Rack14 Rack15 Rack16
Cluster6
host161 host165 host169 host173 host177 host181 host185 host189
host162 host166 host170 host174 host178 host182 host186 host190
host163 host167 host171 host175 host179 host183 host187 host191
host164 host168 host172 host176 host180 host184 host188 host192
Cluster7
host193 host197 host201 host205 host209 host213 host217 host221
host194 host198 host202 host206 host210 host214 host218 host222
host195 host199 host203 host207 host211 host215 host219 host223
host196 host200 host204 host208 host212 host216 host220 host224
Cluster8
host225 host227 host229 host231 host233 host235 host237 host239
host226 host228 host230 host232 host234 host236 host238 host240
Legend: ESXi cluster · Power rack · Master node · Worker node · 1:1 high-mem · Master node (analytics app) · Worker node (analytics app)
Image credit: VMware
?