Post on 18-Jan-2018
Cloud Computing project
NSYSU Sec. 1 Demo
NSYSU EE IT_LAB
Outline
– Our system's architecture
– Flow chart of the Hadoop job (web crawler) working on the Hadoop cluster
  – Basic setup
  – Flow chart
– Compare the crawler's efficiency on different types of Hadoop clusters
Architecture
Hardware
– 2 ASUS servers, Intel Xeon CPU X3330 2.66 GHz, 1 TB HD & 3 GB RAM (master, slave1)
– 1 PC, Intel Core 2 Quad CPU Q6600 2.40 GHz, 500 GB HD, 4 GB RAM (slave2)
Software
– CentOS 5.03
– Hadoop 0.20.1
Architecture
Machine 01 – master (x.x.x.1): Namenode, JobTracker, Datanode, TaskTracker
Machine 02 – slave1 (x.x.x.2): Datanode, TaskTracker
Machine 03 – slave2 (x.x.x.3): Datanode, TaskTracker

The administrator monitors the cluster through the web UIs at http://x.x.x.1:50070 (HDFS) and http://x.x.x.1:50030 (jobs); the user submits jobs to the master.
HDFS web UI – http://x.x.x.1:50070 (screenshots)

Job admin web UI – http://x.x.x.1:50030 (screenshots)
Basic setup (Hadoop)
1. Set up password-less communication between the nodes over the SSH protocol
2. Install Java
3. Set the Java path (and the paths of any other files needed) in {hadoop dir}/conf/hadoop-env.sh
4. Set the Namenode and JobTracker host names in {hadoop dir}/conf/hadoop-site.xml
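On one node, steps 1–4 might look like the sketch below. The key file name, the JAVA_HOME path, and the ports 9000/9001 are assumptions (common defaults), not values taken from the slides; only the x.x.x.1 address and the property names come from the Hadoop 0.20 configuration format.

```shell
# Step 1: generate an SSH key with no passphrase and authorize it
# (run on the master, then append the public key to each slave's
#  authorized_keys so the master can log in without a password).
ssh-keygen -t rsa -N "" -f id_rsa_demo
cat id_rsa_demo.pub >> authorized_keys

# Step 3: point Hadoop at the Java install in conf/hadoop-env.sh
# (the JAVA_HOME path below is an assumption; use your own).
echo 'export JAVA_HOME=/usr/lib/jvm/java' >> hadoop-env.sh

# Step 4: name the Namenode and JobTracker hosts in conf/hadoop-site.xml
cat > hadoop-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://x.x.x.1:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>x.x.x.1:9001</value>
  </property>
</configuration>
EOF
```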
Basic setup (Hadoop)
5. Set up the masters file and the slaves file
6. Format HDFS (the Hadoop Distributed File System)
7. Start Hadoop
8. Check Hadoop:
   HDFS – http://<namenode's ip>:50070
   Job admin – http://<JobTracker's ip>:50030
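A sketch of steps 5–8, using the example addresses from the architecture slide (the master also runs a Datanode/TaskTracker there, so it appears in slaves too). The format/start commands need a real cluster, so they are shown as comments only:

```shell
# Step 5: list the master host and the worker hosts in conf/
printf 'x.x.x.1\n' > masters
printf 'x.x.x.1\nx.x.x.2\nx.x.x.3\n' > slaves

# Steps 6-8: format HDFS, start the daemons, then check the web UIs:
#   bin/hadoop namenode -format
#   bin/start-all.sh
#   HDFS UI:      http://x.x.x.1:50070
#   Job admin UI: http://x.x.x.1:50030
```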
Basic setup (crawler)
1. Check your web robot agent file
2. Set up the URL filter file
3. Set your seed URLs file, either by manual input or from a web URL package
(Some detailed settings are omitted here.)
Flow chart
1. The user supplies seed URLs and runs the crawl command as a Hadoop job.
2. The job's fragments (map & reduce tasks) are assigned to each TaskTracker, which goes and fetches the web data.
3. The fetched content is stored in the output directory on HDFS: the link log, the new fetch list, the document data, and the fetch log.
Hadoop cluster – 1 node
Machine 01 – master (x.x.x.1): Namenode, JobTracker, Datanode, TaskTracker
Hadoop cluster – 2 nodes
Machine 01 – master (x.x.x.1): Namenode, JobTracker, Datanode, TaskTracker
Machine 02 – slave1 (x.x.x.2): Datanode, TaskTracker
Hadoop cluster – 3 nodes
Machine 01 – master (x.x.x.1): Namenode, JobTracker, Datanode, TaskTracker
Machine 02 – slave1 (x.x.x.2): Datanode, TaskTracker
Machine 03 – slave2 (x.x.x.3): Datanode, TaskTracker
URL set
– Get a URL package from http://dmoz.org/
– Select one URL out of every 500, which leaves around 10,000 URLs
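Keeping every 500th line of the URL dump is a one-liner with awk. The block below builds a stand-in input file (the real dmoz package is much larger), so the file names and URLs are placeholders:

```shell
# Stand-in for the dmoz URL dump: 10000 fake URLs, one per line.
seq 1 10000 | sed 's#^#http://example.org/page#' > urls.txt

# Keep every 500th line: 10000 input lines -> 20 sampled lines.
awk 'NR % 500 == 0' urls.txt > seeds.txt
wc -l < seeds.txt
```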
Crawler input (seeds.txt)
Crawler output
Output to HDFS
Speed comparison
Time taken by the Hadoop job (9199 URLs):
1 worker node  – 1888 seconds
2 worker nodes – 1679 seconds
3 worker nodes – 1628 seconds
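The gains in the table are modest, plausibly because fetching is bound by the network rather than by CPU. Relative to the single-node run, the speedups work out as:

```shell
# Speedup relative to the 1-node run (times from the table above).
awk 'BEGIN {
  t1 = 1888; t2 = 1679; t3 = 1628
  printf "2 nodes: %.2fx\n", t1 / t2   # prints "2 nodes: 1.12x"
  printf "3 nodes: %.2fx\n", t1 / t3   # prints "3 nodes: 1.16x"
}'
```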
Thanks for your attention!!