Key Points of Hadoop System Design and Operations

Key Points of Hadoop System Construction and Operations. Shimauchi, Customer Operations Engineer, Cloudera. 2012/11/07

Description

Slides on key points of Hadoop system design and operations, presented at Cloudera World Tokyo 2012.

Transcript of "Key Points of Hadoop System Design and Operations"

• 1. Key Points of Hadoop System Construction and Operations, Cloudera, 2012/11/07

• 2. Agenda: HDFS, MapReduce, CDH
• 3. Self-introduction: joined Cloudera in April 2011; email: [email protected]; twitter: @shiumachi
• 4.
• 5. Network: 1Gbit LAN / 10Gbit LAN
• 6. Slave node hardware (HDFS / MapReduce / HBase): no RAID; disks 1TB * 8, 2TB * 8, or 3TB * 12; no SSD, no 15,000rpm drives, 3.5inch 7,200rpm SATA; CPU 4core * 2 or 6core * 2; RAM 32GB (HBase: 48GB) or 48GB (HBase: 64GB); ECC
• 7. Master node hardware (NN and SNN for HDFS, the MapReduce master, HMaster for HBase, ZooKeeper for HBase/HA, JournalNode for HA): RAID, 1TB * 4 in RAID 1+0; no SSD, no 15,000rpm drives, 3.5inch 7,200rpm SATA; CPU 4core * 2; 24GB to 48GB RAM; ECC
• 8. Small cluster layout: 4 nodes (3 slaves) + 1; 2 master nodes, one with NN / HMaster / ZK / JournalNode and one with SNN / JT / HMaster / ZK / JournalNode; ZooKeeper and JournalNode: 1 each; ZooKeeper in odd numbers (3 or 5); (1GB)
• 9. Hardware Q&A: Hadoop does not need RAID (RAID 0 at most); disks (1TB); 128GB; CPU: one MapReduce task per core; ECC memory; HBase: add 1 core and 16GB of RAM
• 10. Number of Hadoop clusters: 1; Yahoo.com: 3
• 11.
• 12. CDH3 to CDH4 configuration renames: e.g. fs.default.name becomes fs.defaultFS; the old names are deprecated and log a WARN; Cloudera Manager; see http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-common/DeprecatedProperties.html; the hadoop command is split into hdfs / mapred (plain hadoop is deprecated and logs a WARN); MapReduce v1 is the CDH3-style JT/TT (hadoop-0.20)
• 13. OS setup: Oracle Java 6 (Oracle Java 7); OpenJDK and other JVMs; DNS and /etc/hosts; SELinux; 64-bit OS
• 14. Compression: 1: Hadoop; 2: MapReduce (CPU trade-off); Snappy and LZO; LZO is GPL-licensed, so it is not bundled with Apache Hadoop
• 15. HDFS
• 16. HDFS memory sizing: DN: 1GB per 1,000,000 blocks; JVM option -XX:+UseCompressedOops; NN: 1GB per 1,000,000 blocks; SNN: size it like the NN; CDH3u2: 2 per DN (HDFS-2379); block size 128MB, 256MB at PB scale
• 17. HDFS capacity: replication factor 3 plus MapReduce intermediate data leaves roughly 1/4 to 1/5 of raw space usable; OS and HDFS; 16TB per node yields about 3-4TB; example: 100 nodes, 400TB usable (1.2PB in DFS)
• 18. NN heap: up to about 60GB (GC); with 128MB blocks that is roughly 7.2PB, about 458 nodes at 16TB each (500-600)
• 19. With 256MB blocks, 1,800-2,000 nodes per NN; roughly 1,000 nodes / 10PB
• 20. NN metadata directories: dfs.name.dir (CDH3) / dfs.namenode.name.dir (CDH4); keep 3 copies (1 on NFS); with QJM, NFS is not needed
• 21. DN data directories: dfs.data.dir (CDH3) / dfs.datanode.data.dir (CDH4); JBOD
• 22. MapReduce
• 23. Map task waves: e.g. 100 map tasks on 50 slots run in 2 waves
• 24. TT map/reduce slot counts
• 25. Slot count = CPU cores minus overhead (2 for DN and TT, 3 if an RS also runs); with hyper-threading count cores x 1.5; split map:reduce = 4:3 or 2:1 (> IO)
• 26. Slave memory sizing, CDH3 / CDH4-MR1: (map slots mapred.tasktracker.map.tasks.maximum + reduce slots mapred.tasktracker.reduce.tasks.maximum) x task heap (mapred.child.java.opts -Xmx) + TaskTracker + DataNode + RegionServer + non-Hadoop (OS) memory
• 27. The same for CDH4-MR2: (mapreduce.tasktracker.map.tasks.maximum + mapreduce.tasktracker.reduce.tasks.maximum) x (mapred.child.java.opts -Xmx) + TaskTracker + DataNode + RegionServer + non-Hadoop (OS) memory
• 28. Example (1): 4core x 2 sockets with HT, 32GB RAM, with HBase: slots = 8 x 1.5 - 3 = 9, so 6 map and 3 reduce; mapred.child.java.opts -Xmx1g gives 9GB for tasks; TT and DN at 1GB each plus a 16GB RS makes 27GB, tight in 32GB; 48GB to make full use of the CPUs (a sample mapred-site.xml for this example appears after this slide list)
• 29. Example (2): 6core x 2 sockets with HT, 48GB RAM, with HBase: slots = 12 x 1.5 - 3 = 15, so 10 map and 5 reduce; -Xmx1g gives 15GB for tasks; TT and DN at 1GB each plus a 16GB RS makes 33GB, so 32GB is not enough and 48GB is needed
• 30. HDFS
• 31. Checking HDFS health on CDH3: hadoop fsck /; hadoop dfsadmin -report; hadoop fs -lsr /; hadoop fs -put ... /tmp; hadoop fs -get /tmp/...; hadoop dfsadmin -safemode get; http://dn:50075/blockScannerReport (add ?listblocks); JMX: http://dn:50075/jmx
• 32. The same on CDH4: hdfs fsck /; hdfs dfsadmin -report; hdfs dfs -ls -R /; hdfs dfs -put ... /tmp; hdfs dfs -get /tmp/...; hdfs dfsadmin -safemode get; http://dn:50075/blockScannerReport (?listblocks); JMX: http://dn:50075/jmx
• 33. NN fsimage backup: hdfs dfsadmin -fetchImage pulls the fsimage from the NN; fsimage/edits
• 34. HDFS recovery tools (an fsck for the edits log, in the spirit of Linux fsck): in CDH3u4 and CDH4.0; 3u3 / 3u4; see http://www.cloudera.co.jp/blog/namenode-recovery-tools-for-the-hadoop-distributed-file-system.html
• 35. NameNode recovery: hadoop namenode -recover (CDH3) / hdfs namenode -recover (CDH4); on a broken edits entry it offers 4 choices: continue, stop (edits / fsimage), quit (fsimage), always (continue for everything)
• 36. CDH4 hdfs tools: hdfs oiv and hdfs oev (oiv reads the fsimage, oev the edits); hdfs oiv -i ... -o ... -p ...; hdfs oev -p stat; NN settings: hdfs getconf
• 37. Too Many Open Files: is it Hadoop?; the nofile limit defaults to 1024; raise it in /etc/security/limits.conf: hdfs - nofile 32768, mapred - nofile 32768, hbase - nofile 32768
• 38. DNS and /etc/hosts
• 39. MapReduce
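As a rough sketch only (the deck gives the numbers but not a config file), Example (1) on slide 28 could be written into a CDH3 / MR1 mapred-site.xml along these lines, using the property names listed on slides 26 and 28; the values are the worked example's, not general recommendations:

  <!-- mapred-site.xml on each slave: 6 map slots + 3 reduce slots, 1 GB heap per task -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>6</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>3</value>
  </property>
  <property>
    <!-- 9 task slots x 1 GB = 9 GB for tasks, on top of TT/DN (1 GB each) and a 16 GB RegionServer -->
    <name>mapred.child.java.opts</name>
    <value>-Xmx1g</value>
  </property>

Slide 27 names the corresponding mapreduce.* properties for CDH4 / MR2.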
• 40. Too many fetch-failures: reducers fetch map output from each mapper's TT; too many fetch failures against a TT; possible causes: DNS, the mapper-to-reducer http transfer, a Jetty bug (CDH3u1)
• 41. Too many fetch-failures, mitigations: have reducers wait until 80% of the maps are done: mapred.reduce.slowstart.completed.maps = 0.80 (CDH3, MR1) / mapreduce.job.reduce.slowstart.completedmaps = 0.80 (CDH4, MR2); give the TTs more threads for serving map output to reducers: tasktracker.http.threads = 80 (CDH3, MR1) / mapreduce.tasktracker.http.threads = 80 (CDH4, MR2)
• 42. Too many fetch-failures (continued): set the reducers' parallel copies to roughly SQRT(number of nodes) (examples: 500 nodes, 20; 1,000 nodes, 30): mapred.reduce.parallel.copies (CDH3, MR1) / mapreduce.reduce.shuffle.parallelcopies (CDH4, MR2); the CDH3u1 Jetty bug that caused fetch failures was fixed in CDH3u2 (MAPREDUCE-2980) (a sample mapred-site.xml with these settings appears at the end of this transcript)
• 43. Reduce-side OOME: mapred.tasktracker.reduce.tasks.maximum x task heap; ulimit; keep within 50% of RAM; RAM and swap
• 44. JobTracker OOME: is the JT heap larger than RAM?; too many jobs
• 45. JobTracker OOME: inspect the heap with sudo -u mapred jmap -J-d64 -histo:live against the JT process; JT; NN; HDFS block size (128 to 256MB); raise the minimum split size to 256MB: mapred.min.split.size (CDH3, MR1) / mapreduce.input.fileinputformat.split.minsize (CDH4, MR2); mapred.jobtracker.completeuserjobs.maximum = 5; give the JT more RAM
• 46. Not Able to Place Enough Replicas: the NN cannot place blocks on enough DNs; the requested replication (dfs setting) is greater than the number of DNs; MR job submission replication defaults to 10: mapred.submit.replication (CDH3, MR1) / mapreduce.client.submit.file.replication (CDH4, MR2); the DN xceiver limit (default 256)
• 47. Not Able to Place Enough Replicas: raise the DN limit to 4096: dfs.datanode.max.xcievers (CDH3, MR1) / dfs.datanode.max.transfer.threads (CDH4, MR2); = xceiver x DN; check the DN log
• 48. ENOENT: No such file or directory: dfs.datanode.du.reserved; at 10%, a 1TB disk reserves 100GB and leaves the DN (900GB); the userlog directory should be owned mapred:mapred with mode 755
• 49. JT: check default.name (fs.default.name) in core-site.xml
• 50. CDH
• 51. Upgrading (to CDH4.1): https://ccp.cloudera.com/display/CDH4DOC/Upgrading+from+CDH3+to+CDH4
• 52. HDFS upgrade: hadoop dfsadmin -finalizeUpgrade; hadoop dfsadmin -rollBack
• 53.
• 54. The "Hadoop" book by Cloudera's Tom White; 2nd edition; 3rd edition (Hadoop 2.0, HA) (9/10)
• 55. O'Reilly "Hadoop Operations" by Cloudera's Eric Sammer
• 56. Cloudera University: http://university.cloudera.com / http://www.jp.cloudera.com/university; Hadoop / HBase training. 2012 Cloudera, Inc. All Rights Reserved.
• 57.
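As a similar illustrative sketch (assembled from the property names and values on slides 41 and 42, not taken from the presenter's own files), the fetch-failure mitigations for CDH3 / MR1 might look like this in mapred-site.xml; the parallel-copies value of 20 assumes a cluster of roughly 500 nodes, per the SQRT(nodes) rule of thumb on slide 42:

  <!-- mapred-site.xml: ease "Too many fetch-failures" -->
  <property>
    <!-- start reducers only after 80% of the maps have finished -->
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.80</value>
  </property>
  <property>
    <!-- more TaskTracker threads for serving map output to reducers -->
    <name>tasktracker.http.threads</name>
    <value>80</value>
  </property>
  <property>
    <!-- reducer fetch parallelism, roughly SQRT(number of nodes); 20 assumes about 500 nodes -->
    <name>mapred.reduce.parallel.copies</name>
    <value>20</value>
  </property>

Slides 41 and 42 list the mapreduce.* equivalents for CDH4 / MR2.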