Struggles in Embedding Cloudera Impala into a Service

Struggles in Embedding Cloudera Impala into a Service. 2014-10-31. 株式会社セラン, R&D Strategy Office, 須幸憲 (@sudabon). Copyright © CELLNT Corp. All Rights Reserved. http://www.xdata.jp/

description

Slides presented at Impala Meetup

Transcript of Struggles in Embedding Cloudera Impala into a Service

  • 1. Struggles in Embedding Cloudera Impala into a Service. 2014-10-31, R&D, @sudabon
  • 2. @sudabon, R&D. 1997-2004: NEC. 2005-2006: BIGLOBE; BtoBSNIP. 2012/8: Hadoop2; Hadoop, Hive
  • 3. 1. MOBYLOG: Web; 2005/12; PC; MOBYLOG ENGINE; Omniture (Adobe) SiteCatalyst, WebTrends (WebTrends Analytics); OEM. 2. xross data: LITE; 2014/8; Web
  • 4. impala; Web; OLAP; JOIN; 1 X ... 1. 2. 3. 4. EXISTS, IN, get_json_object
  • 5. 1. 2. RDBMS 3.
  • 6. 1: impala / Hive+MapReduce; HDFS
  • 7. 2: Hive+MapReduce, HDFS; impala, HDFS
  • 8. 1. 2. RDBMS 3.
  • 9. xross data: Cloudera Manager; Hive; Hive, MapReduce (CDH4.6); Hive, impala 1.4 (CDH5.1); RDBMS; impala; Web; HDFS
  • 10. RDBMS to HDFS: Sqoop import --direct. 1. base_t 2. Sqoop import: RDBMS to temp_t 3. base_t and temp_t: VIEW view_t 4. view_t. Four Steps Strategy for Incremental Updates in Apache Hive on Hadoop: http://jp.hortonworks.com/blog/four-step-strategy-incremental-updates-hive/
  • 11. 1. 2. RDBMS 3.
  • 12. HDFS; Hive; Gzip TextFile; impala; Snappy RCFile; ParquetFile+Snappy; ParquetFile+Snappy; ML
  • 13. impala version 1.0. [Bar chart: Avg. Job Latency [sec] by file format (TextFile, SequenceFile, RCFile, ParquetFile) and compression (No Comp., Gzip, Snappy); bar values 36.612, 6.083, 29.736, 24.024, 19.586, 16.205.] cf. Performance Evaluation of Cloudera Impala GA: http://www.slideshare.net/sudabon/performance-evaluation-of-cloudera-impala-ga
  • 14. 2; TextFile: YYYYMMDD; Snappy RCFile: YYYYMMDD, ID. [Diagram: #1 10 A; #2 10; #3 10; #4 10; B; C; X]
  • 16. impala 2.0 (2014/10/14): EXISTS, IN; RANK(), LAG(), LEAD(); get_json_object; spill to disk; OutOfMemory; User ML...; Regexp_Replace/Extract...; 2.0...; impala 2.0
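The four steps on slide 10 (base_t, temp_t, view_t) can be illustrated with a small in-memory sketch. It is not the author's implementation. It assumes, following the referenced Hortonworks post, that view_t keeps the most recently modified row per key and that step 4 rewrites base_t from view_t. The column names `id`, `name`, and `modified`, and the functions `reconcile` and `compact`, are hypothetical.

```python
def reconcile(base_rows, temp_rows, key="id", ts="modified"):
    """Play the role of view_t: the latest row per key across base_t and temp_t."""
    latest = {}
    # temp_t rows are visited after base_t rows, so on a timestamp tie
    # the incremental row replaces the base row.
    for row in list(base_rows) + list(temp_rows):
        current = latest.get(row[key])
        if current is None or row[ts] >= current[ts]:
            latest[row[key]] = row
    return sorted(latest.values(), key=lambda r: r[key])


def compact(base_rows, temp_rows):
    """Assumed step 4: rebuild base_t from view_t and empty temp_t."""
    return reconcile(base_rows, temp_rows), []


# Step 1: base_t holds the initial full import (hypothetical sample rows).
base_t = [
    {"id": 1, "name": "alice", "modified": "2014-10-01"},
    {"id": 2, "name": "bob", "modified": "2014-10-01"},
]
# Step 2: temp_t holds rows changed in the RDBMS since the last import.
temp_t = [
    {"id": 2, "name": "bob2", "modified": "2014-10-30"},
    {"id": 3, "name": "carol", "modified": "2014-10-30"},
]
# Step 3: view_t merges both tables.
view_t = reconcile(base_t, temp_t)
# Step 4: the merged result becomes the new base_t.
new_base_t, new_temp_t = compact(base_t, temp_t)
```

In a real deployment the merge would be a SQL view over the two tables rather than Python; the sketch only shows the "latest row per key wins" rule that such a view encodes.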