Looking Back on the History of Hadoop Big Data Platforms #cwt2015
Uploaded by: cloudera-japan
Category: Technology
Transcript of Looking Back on the History of Hadoop Big Data Platforms #cwt2015
-
1 Cloudera, Inc. All rights reserved.
Hadoop
-
April 2011, Cloudera
twitter: @shiumachi
-
Enterprise Data Hub (EDH)
[Component diagram. Surviving labels, one group per line:]
Sqoop, Flume
MapReduce, Hive, Pig, Spark
Impala
Solr
SAS, R, Spark, Mahout
NoSQL: HBase
Spark Streaming
HDFS, HBase
YARN, Cloudera Manager, Cloudera Navigator
-
DISCLAIMER: Hadoop, EDH, DWH
Cloudera
(HA)
-
(1)
[Architecture diagram, built up step by step over slides 12-19. Labels in build order:]
tar.gz
HDFS put → HDFS (tar.gz)
MapReduce → HDFS (Avro)
Hive → HDFS (RCFile)
Hive
Flume Source → Flume Sink (HDFS Sink) → HDFS (SequenceFile)
-
2009-2012: Hadoop = MapReduce; Java, Hive, Pig
Hive, BI
-
MapReduce: Hadoop
HDFS: 2012
Hive: SQL, MapReduce, Pig
Avro
RCFile: Parquet
Flume: three components, Source - Channel - Sink; Source, Sink
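The Source - Channel - Sink split named above can be illustrated with a small model. This is a toy sketch in plain Python, not Flume's API: the `Channel` class, `run_source`, `run_sink`, and the event shape are all hypothetical names chosen for illustration. It only shows the idea that a buffering channel decouples the component that receives data from the component that writes it out.

```python
from collections import deque


class Channel:
    """Bounded buffer sitting between a source and a sink (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._events = deque()

    def put(self, event):
        # A full channel pushes back on the source instead of dropping data.
        if len(self._events) >= self.capacity:
            raise OverflowError("channel full")
        self._events.append(event)

    def take(self, max_events):
        # Hand out up to max_events buffered events, oldest first.
        batch = []
        while self._events and len(batch) < max_events:
            batch.append(self._events.popleft())
        return batch


def run_source(lines, channel):
    """Source: wrap each raw input line as an event and put it on the channel."""
    for line in lines:
        channel.put({"body": line.rstrip("\n")})


def run_sink(channel, destination, batch_size=2):
    """Sink: drain the channel in batches and append bodies to the destination."""
    while True:
        batch = channel.take(batch_size)
        if not batch:
            break
        destination.extend(event["body"] for event in batch)


channel = Channel(capacity=10)
stored = []  # stands in for a destination such as a file on HDFS
run_source(["a\n", "b\n", "c\n"], channel)
run_sink(channel, stored)
print(stored)
```

Because the source and the sink only share the channel, either side could be swapped (for example a different destination) without touching the other, which is the property the three-part design is usually credited with.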
-
(2) BI
[Architecture diagram, built up step by step over slides 22-25. It carries over every label from diagram (1) and adds, in build order:]
Hive → HDFS (Parquet)
HBase (get/put API)
Impala → BI
-
(2) BI: 2012; Impala, BI, Hadoop; Parquet, HBase; BI; HBase + Parquet
(HBase)
-
BI
Impala: October 2012; Hadoop, SQL; Hadoop, MapReduce
Parquet: Cloudera, Twitter
HBase: NoSQL; HBase, 2009; Impala
-
[Comparison table; row labels lost in extraction. Columns: Parquet, HBase, Parquet + HBase. Surviving cell values: (Parquet), (HBase), (HBase), (HBase)]
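The Parquet and HBase pairing on this slide can be read as two access patterns: keyed reads and writes of single rows (the get/put API shown in the diagram) versus scans over one column of many rows. The following is a toy sketch of that contrast in plain Python. `RowStore` and `to_columns` are hypothetical names, and neither reproduces the real HBase client or the Parquet file format.

```python
class RowStore:
    """Rows addressed by key, in the spirit of a get/put API (hypothetical model)."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column, value):
        # Writing the same row key and column again overwrites the old value.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        # Return a copy so callers cannot mutate the stored row.
        return dict(self._rows.get(row_key, {}))


def to_columns(rows, column_names):
    """Pivot a list of row dicts into one list per column (column-oriented layout)."""
    return {name: [row.get(name) for row in rows] for name in column_names}


store = RowStore()
store.put("user1", "clicks", 3)
store.put("user2", "clicks", 5)
store.put("user1", "clicks", 4)  # cheap in-place update by row key

# An analytic query touches one column across all rows, so a columnar
# layout lets it read only that column.
columns = to_columns([store.get("user1"), store.get("user2")], ["clicks"])
total_clicks = sum(columns["clicks"])
print(store.get("user1"), total_clicks)
```

The sketch suggests why the two are combined rather than substituted for each other: the keyed store handles frequent small updates, while the columnar copy serves aggregate queries.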
-
(3) Spark, EDH
[Architecture diagram, built up step by step over slides 29-33. It carries over every label from diagram (2) and adds, in build order:]
Solr; Flume Sink (NRT)
Lily HBase Indexer (NRT)
Solr
Spark
-
(3) Spark, EDH: 2013; Cloudera Search; Hadoop; Spark; SQL
-
Spark, EDH
Solr: OSS; Solr, Hadoop; Cloudera; Solr, Cloudera Search, OSS
Lily HBase Indexer: HBase, Solr
Spark: MapReduce, API
-
(4) Kafka
[Architecture diagram, built up step by step over slides 36-40. It carries over every label from diagram (3) and adds, in build order:]
Kafka Broker; Flume Source (Kafka Source)
Kafka Producer (Producer API)
Flume Sink (HBase Sink)
Spark Streaming
-
(4) Kafka: 2015; Kafka, Spark Streaming; end-to-end
-
Kafka: Kafka, Flume, Kafka; 1
Spark Streaming: Spark
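In the diagram, a Kafka Broker sits between producers and more than one downstream reader (a Flume Kafka Source and Spark Streaming). A toy sketch of why that works is given below: the broker keeps an append-only log, and each reader tracks its own offset. `Topic` and the offset variables are hypothetical names for illustration; this is plain Python and not the Kafka client API.

```python
class Topic:
    """Append-only message log; readers keep their own offsets (hypothetical model)."""

    def __init__(self):
        self._log = []

    def produce(self, message):
        # Append and return the offset assigned to the new message.
        self._log.append(message)
        return len(self._log) - 1

    def consume(self, offset, max_messages=100):
        # Reading never removes data, so several readers can share one topic.
        return self._log[offset:offset + max_messages]


topic = Topic()
for message in ["e1", "e2", "e3"]:
    topic.produce(message)

# Two independent readers, standing in for a batch path and a streaming path.
batch_offset = 0
stream_offset = 0

batch_messages = topic.consume(batch_offset)
batch_offset += len(batch_messages)

stream_messages = topic.consume(stream_offset, max_messages=1)
stream_offset += len(stream_messages)

print(batch_messages, batch_offset, stream_messages, stream_offset)
```

Each reader advances at its own pace, so a slow batch writer does not hold back a streaming job reading the same data.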
-
SLA
-
SLA1SLA()
-
SLA
1: SLA; Impala; 51
2: SLA; Hadoop
3: SLA
-
end-to-end SLA
Hadoop
SLA
Impala, Parquet, Flume (Parquet)
Impala
HBase ()
Impala ()
Impala
Hadoop
Hadoop, end-to-end
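The surviving text pairs "end-to-end" with "SLA" and names several pipeline stages. One common reading is that an SLA has to be checked against the whole path from data arrival to query result, not against a single component. The sketch below assumes that reading. The stage names and every latency figure are hypothetical and chosen only for illustration; none of them come from the slides.

```python
# Hypothetical worst-case latency per stage, in seconds (illustrative values only).
stage_latency_s = {
    "ingest (Flume)": 60,
    "convert to Parquet": 300,
    "query (Impala)": 5,
}


def end_to_end_latency(stages):
    """Worst-case time from data arrival to query result, summed over all stages."""
    return sum(stages.values())


def meets_sla(stages, sla_seconds):
    """True when the whole pipeline, not just one stage, fits inside the SLA."""
    return end_to_end_latency(stages) <= sla_seconds


total = end_to_end_latency(stage_latency_s)
print(total, meets_sla(stage_latency_s, 600), meets_sla(stage_latency_s, 300))
```

In this made-up example the query stage alone is fast, yet the pipeline misses a 300-second target because the conversion stage dominates, which is the kind of gap an end-to-end view is meant to expose.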
-
Parquet, Impala
HBase
Spark, MapReduce
-
[Complete architecture diagram combining (1) through (4). Labels:]
tar.gz; HDFS (tar.gz); HDFS (Avro); HDFS (RCFile); HDFS (SequenceFile); HDFS (Parquet); HBase; Solr; Kafka Broker
HDFS put; MapReduce; Hive; Impala, BI; Spark; Spark Streaming
Flume Source; Flume Source (Kafka Source); Flume Sink (HDFS Sink); Flume Sink (HBase Sink); Flume Sink (NRT)
HBase get/put API; Lily HBase Indexer (NRT); Kafka Producer (Producer API)
-
SLA; SLA; end-to-end SLA
SLA
-
We are hiring!
-
Thank you