Hadoop on AWS

Cluster Starting up

Cluster Finished Startup

Master node public DNS

Upload your jar file to run a job using steps. You can also run a job by connecting to the master node over SSH (shown later)

Location of the jar file on S3

EMR started the master and worker nodes as EC2 instances

Create a key pair if you don’t already have one

Save the key pair file
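ssh and scp generally refuse a private key file that other users can read, so the saved key pair file usually needs its permissions tightened before use. A minimal sketch, assuming the key was saved locally on a Unix-like system; the helper name and the file name `my-key.pem` are hypothetical:

```python
import os
import stat


def restrict_key_permissions(key_path):
    """Make the key file readable by its owner only (equivalent to chmod 400)."""
    os.chmod(key_path, stat.S_IRUSR)
    # Return the resulting permission bits so the caller can verify them.
    return stat.S_IMODE(os.stat(key_path).st_mode)


if __name__ == "__main__":
    # "my-key.pem" is a placeholder for wherever the key pair file was saved.
    if os.path.exists("my-key.pem"):
        print(oct(restrict_key_permissions("my-key.pem")))
```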

Copy the input files to the master node using scp and the key pair
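The copy step can be sketched as a small helper that assembles the scp command. This is an illustration under assumptions: the helper name, the key file name, and the host name are placeholders, and `hadoop` is the login user that EMR sets up on the master node.

```python
import shlex


def build_scp_command(key_file, local_paths, master_dns, remote_dir="", user="hadoop"):
    """Return the argv list that copies local files to the master node."""
    # -i selects the private key from the key pair.
    # An empty remote_dir means the remote user's home directory.
    return ["scp", "-i", key_file, *local_paths, f"{user}@{master_dns}:{remote_dir}"]


if __name__ == "__main__":
    # Placeholder values; substitute the master node public DNS shown in the console.
    cmd = build_scp_command("my-key.pem", ["input1.txt", "input2.txt"], "master-public-dns")
    print(shlex.join(cmd))
```

The printed command can be pasted into a terminal, or the list can be passed to `subprocess.run(cmd, check=True)`.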

Create directories on HDFS and put your input files there
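On the master node, this step usually comes down to two `hdfs dfs` commands. The sketch below only builds them; the helper name and the directory and file names are hypothetical.

```python
def build_hdfs_upload_commands(local_files, hdfs_dir):
    """Return the two commands that create an HDFS directory and upload files into it."""
    # -mkdir -p creates missing parent directories and succeeds if the directory exists.
    mkdir = ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]
    # -put copies files from the local file system of the node into HDFS.
    put = ["hdfs", "dfs", "-put", *local_files, hdfs_dir]
    return [mkdir, put]


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    for cmd in build_hdfs_upload_commands(["input1.txt"], "/user/hadoop/input"):
        print(" ".join(cmd))
```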

Run your jar file as a Hadoop job (provide the proper arguments)
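A hedged sketch of the run step: it assumes the job takes an HDFS input directory and an HDFS output directory as its arguments, which is common but depends on how the jar was written. The helper name, jar name, class name, and paths are placeholders.

```python
def build_hadoop_jar_command(jar_path, input_dir, output_dir, main_class=None):
    """Return the argv list that submits a jar as a Hadoop job."""
    cmd = ["hadoop", "jar", jar_path]
    # The main class is only needed when the jar manifest does not name one.
    if main_class:
        cmd.append(main_class)
    # Everything after the jar (and optional class) is passed to the job itself.
    cmd += [input_dir, output_dir]
    return cmd


if __name__ == "__main__":
    # Placeholder names for illustration only.
    print(" ".join(build_hadoop_jar_command(
        "myjob.jar", "/user/hadoop/input", "/user/hadoop/output", main_class="WordCount")))
```

For typical MapReduce jobs the output directory must not exist beforehand, otherwise the job fails at submission.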

Check the output after the job is finished
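Checking the output typically means listing the output directory and printing the result files. The sketch below builds those two commands, assuming the job wrote standard MapReduce output (files whose names start with `part-`); the helper name and path are hypothetical.

```python
def build_output_check_commands(output_dir):
    """Return commands that list the output directory and print the result files."""
    # A successful MapReduce job normally leaves a _SUCCESS marker next to the part files.
    list_cmd = ["hdfs", "dfs", "-ls", output_dir]
    # The glob is expanded by hdfs itself, so it is passed through unchanged.
    cat_cmd = ["hdfs", "dfs", "-cat", f"{output_dir}/part-*"]
    return [list_cmd, cat_cmd]


if __name__ == "__main__":
    # Placeholder path for illustration only.
    for cmd in build_output_check_commands("/user/hadoop/output"):
        print(" ".join(cmd))
```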