Hadoop Introduction
Wang Xiaobo
2011-12-8
TeleNav Confidential
Outline
Install Hadoop
HDFS
MapReduce
WordCount analysis
Compile image data
Install Hadoop
Download and unzip Hadoop
Install JDK 1.6 or higher
SSH key authentication (master/slaves)
Configure hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.6.0_16
Edit core-site.xml / hdfs-site.xml / mapred-site.xml
Startup/Shutdown
sh start-all.sh
sh stop-all.sh
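A minimal single-node sketch of the three configuration files named above, for Hadoop 0.20.x. The hostname `master` and ports 9000/9001 are placeholder assumptions; adjust them to your cluster:

```xml
<!-- core-site.xml: where the HDFS namenode lives -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: how many copies of each block to keep -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

<!-- mapred-site.xml: where the JobTracker lives -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
```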
Install Hadoop
Monitor Hadoop
http://172.16.101.227:50030 (JobTracker web UI)
http://172.16.101.227:50070 (NameNode web UI)
Shell commands
hadoop dfs -ls
hadoop jar ../hadoop-0.20.2-examples.jar wordcount input/ output/
HDFS
Single namenode
Block storage (64 MB blocks)
Replication
Designed for big files
Not suited for low-latency applications
Not suited for large numbers of small files
150 million files need about 32 GB of namenode memory
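That figure can be sanity-checked with rough arithmetic. The numbers below are assumptions, not from the slides: each file contributes about two namespace objects in namenode heap (the file entry plus roughly one block entry), and a common rule of thumb puts each object at on the order of 100-150 bytes:

```java
public class NamenodeMemoryEstimate {
    // Assumed figures: ~2 namespace objects per file (file + ~1 block),
    // ~110 bytes per object. These are rules of thumb, not exact costs.
    static long estimateBytes(long files, long objectsPerFile, long bytesPerObject) {
        return files * objectsPerFile * bytesPerObject;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(150_000_000L, 2, 110);
        // 150M files * 2 objects * 110 bytes = 33 GB, the same order
        // of magnitude as the 32 GB quoted on the slide.
        System.out.println("~" + bytes / 1_000_000_000L + " GB of namenode heap");
    }
}
```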
Single writer per file
MapReduce
InputFormat
InputSplit
RecordReader
Combiner: same logic as the Reducer, but runs locally on the map machine
Partitioner: controls which reducer each key goes to; the default hash partitioner spreads keys evenly
Reducer
RecordWriter
OutputFormat
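The default partitioning step above is simple enough to show in plain Java: Hadoop's HashPartitioner computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, masking the sign bit so the result is never negative. A minimal sketch (`HashPartitionDemo` is an illustrative name, not from the slides):

```java
public class HashPartitionDemo {
    // Mirrors the default HashPartitioner logic: the & with
    // Integer.MAX_VALUE clears the sign bit before the modulo,
    // so keys with negative hash codes still land in [0, numReduceTasks).
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String word : new String[] {"the", "Apache", "Hadoop"}) {
            System.out.println(word + " -> reducer " + getPartition(word, reducers));
        }
    }
}
```

The same key always hashes to the same reducer, which is what guarantees all values for one key meet in a single reduce call.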
WordCount
public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
  Job job = new Job(conf, "word count");           // set a user-defined job name
  job.setJarByClass(WordCount.class);
  job.setMapperClass(TokenizerMapper.class);       // set the Mapper class for the job
  job.setCombinerClass(IntSumReducer.class);       // set the Combiner class for the job
  job.setReducerClass(IntSumReducer.class);        // set the Reducer class for the job
  job.setOutputKeyClass(Text.class);               // set the key class for the job output
  job.setOutputValueClass(IntWritable.class);      // set the value class for the job output
  FileInputFormat.addInputPath(job, new Path(otherArgs[0]));    // set the input path
  FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));  // set the output path
  System.exit(job.waitForCompletion(true) ? 0 : 1);             // run the job
}
WordCount
public static class TokenizerMapper
    extends Mapper<Object, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}
WordCount
Input:   the Apache Hadoop software library is a framework that allows for the…
Map:     <the, 1> <Apache, 1> … <the, 1>
Reduce:  <the, [1,1]> <Apache, [1]>
Output:  <the, 2> <Apache, 1>
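The map → shuffle → reduce flow above can be simulated with plain JDK collections, no cluster needed. `WordCountFlow` and its helper are illustrative names, not part of the slides:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountFlow {
    // Simulates the whole pipeline in-process: the map phase emits
    // (word, 1) pairs, and merging into the map plays the role of the
    // shuffle (grouping by key) plus the reduce (summing the 1s).
    static Map<String, Integer> wordCount(String input) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer itr = new StringTokenizer(input);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "the Apache Hadoop software library is a framework that allows for the";
        System.out.println(wordCount(input)); // "the" maps to 2, every other word to 1
    }
}
```

In the real job the Combiner already does this summation on each map machine, which is why the slides can reuse IntSumReducer for both roles.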
Use Hadoop to compile image data
Old compiler (DataCompiler) pipeline diagram:
Data format layer: TXD files → Zoom TXD → MMD files
Cache work layer / Zoom work layer: 1D link, 2D merge... → Cache files
Use Hadoop to compile image data
Hadoop compiler (DataCompiler_distribute) diagram:
Jobs: PrepareWork Job, Writer1D2D Job, LabelConflict Job, WriterLabel Job, ...
PrepareWork Job: TXD files → Prepare Mapper → multiple Prepare Reducers in parallel → Zoom TXD
Use Hadoop to compile image data
Job dependency diagram; the jobs shown:
data.prepare.job
write.to.txd.job
traffic.job
write.traffic.to.txd.job
write.to.label.job
write.to.largelabel.job
write.to.dpoi.job
collision.detection.job0, job1, job3, job4, job5, job6
Use Hadoop to compile image data
Reduce compile time from 5 days to 5 hours
Q&A
Thanks !