History of CDH and Overview of New Features in CDH5 #at_tokuben

CDH5, 2014/01/23, Cloudera K.K. (Cloudera株式会社), Sho Shimauchi (嶋内 翔)

description

These are my slides on CDH5, presented at @特勉 (@IT 特集連動勉強会, the study group linked to @IT feature articles). http://atnd.org/events/46924

Transcript of History of CDH and Overview of New Features in CDH5 #at_tokuben

  • 1. CDH5, 2014/01/23, Cloudera
  • 2. April 2011, Cloudera
  • 3. Cloudera Impala PDF, Cloudera, John Russell; Hadoop, HBase, Hadoop, Hive; Cloudera; Cloudera World Tokyo
  • 4. CDH; CDH5: HDFS, YARN, MapReduce, Cloudera Impala, Cloudera Search, Spark
  • 5. CDH
  • 6. Apache Hadoop +
  • 7. HDFS [diagram: a file split into blocks 1 to 5, each block stored three times across the HDFS nodes]
  • 8. HDFS [diagram: block replicas 1 to 5 across the HDFS nodes]
  • 9. MapReduce [diagram: MR over blocks 1 to 5, using the same block layout as the HDFS diagram]
  • 10. CDH: Cloudera's Distribution including Apache Hadoop; 100%
  • 11. CDH: MapReduce, Cloudera Impala, Cloudera Search, etc. [diagram labels: MAPREDUCE, HIVE, PIG; SQL: CLOUDERA IMPALA; CLOUDERA SEARCH; MAHOUT, DATAFU]
  • 12. CDH [timeline, 2009 to 2013; marked points: Q3 2009, Q1 2010, Q2 2011, Q2 2012, 2013]; 2013: YARN, HDFS NFS, Impala, Search, Spark, etc.
  • 13. CDH2 (2010): Hadoop, Hive, Pig
  • 14. CDH3 (April 2011): Hadoop, HBase, Flume, RDBMS, Sqoop
  • 15. CDH4 (June 2012): HDFS (HA), MapReduce HA, Mahout, HBase, Flume, Hue
  • 16. CDH5: YARN, HDFS NFS; Impala, Search (Solr), Sentry, Accumulo, Spark
  • 17. CDH5
  • 18. HDFS: Hadoop SPOF!, CDH4; mmap, HDFS REST API, NFSv3; CDH4, CDH5, CDH4, CDH5, CDH5
  • 19. CDH5 HDFS: /path/to/dir/.snapshot; Cloudera Manager GUI
  • 20. HBase: HDFS, HDFS; CDH4, CDH4, CDH4, CDH5; HBase; CDH5
  • 21. HBase: CDH; CDH3, CDH4, CDH5
  • 22. YARN (CDH5): Yet-Another-Resource-Negotiator; JobTracker; MapReduce, YARN; Impala, Spark, YARN
  • 23. MapReduce 1.0 [diagram: Job Client, Submit Job, JobTracker; eight TaskTrackers; Map Slot, Reduce Slot]
  • 24. CDH5 YARN [diagram: two Clients, Submit Application, ResourceManager; four NodeManagers hosting AppMasters and Containers]
  • 25. CDH5: MapReduce, Cloudera Impala, Spark, Cloudera Search
  • 26. MapReduce: MapReduce 2.0 (MRv2); CDH5, YARN; ResourceManager (RM) + ApplicationMaster (AM): JobTracker; NodeManager (NM): TaskTracker; MRv1
  • 27. Cloudera Impala: SQL; HiveQL; Hive; MapReduce; HDFS, HBase; x1030, x23; CDH5, CDH; CDH5: Llama, Impala, YARN
  • 28. Cloudera Search: CDH; Apache Solr; CDH5, CDH; HDFS; MapReduce; Flume (NRT); HBase
  • 29. 1:
  • 30. 2: Twitter
  • 31. CDH5 Spark: Scala; Scala, Java, Python
        val file = sc.textFile("hdfs://.../pagecounts-*.gz")
        val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
        counts.saveAsTextFile("hdfs://.../word-count")
        (Scala)
  • 32.
  • 33. Hadoop
  • 34. Hadoop: API, API; BI + JDBC/ODBC; Web; SQL; Hadoop; RDBMS; DWH
  • 35. CDH5: HDFS, MapReduce; CDH5: http://tiny.cloudera.com/cdh5doc
  • 36. Cloudera Manager 5: CDH; YARN GUI; Standard
  • 37. Cloudera Manager: Hadoop; 1001; YARN; Cloudera Manager + CDH5: http://cloudera.com/content/support/en/downloads.html
  • 38. CDH ML: CDH () [email protected]; CDH; Cloudera: http://www.cloudera.co.jp/newsletter; Cloudera; CDH/CM
  • 39. We are Hiring! Cloudera; Hadoop, Hadoop; () [email protected]
  • 40.
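
The Spark word count on slide 31 needs a running cluster and a SparkContext (`sc`). As a rough sketch of the same pipeline, the steps can be tried with plain Scala collections instead. Nothing below comes from the slides: the object name `WordCountSketch`, the sample input, and the use of `groupBy` as a stand-in for Spark's `reduceByKey` are all assumptions made for illustration, and the empty-token filter is an extra step the slide does not have.

```scala
// Plain-Scala sketch of the slide 31 word count; no Spark required.
// WordCountSketch and the sample lines are hypothetical, not from the slides.
object WordCountSketch {
  // Mirrors flatMap -> map -> reduceByKey using standard collections.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // split each line into words
      .filter(_.nonEmpty)                 // added step: drop empty tokens from repeated spaces
      .map(word => (word, 1))             // pair each word with a count of 1
      .groupBy(_._1)                      // local stand-in for Spark's shuffle by key
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // like reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    val lines = Seq("hadoop spark hadoop", "spark impala") // hypothetical sample input
    // Print the counts sorted by word, one "word<TAB>count" pair per line.
    wordCount(lines).toSeq.sortBy(_._1).foreach { case (w, c) => println(s"$w\t$c") }
  }
}
```

On a real cluster the `groupBy` and `sum` pair would be replaced by `reduceByKey(_ + _)` on an RDD, which combines counts on each node before shuffling; the local version only shows the shape of the transformation chain.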