O’Reilly – Hadoop: The Definitive Guide Ch.1 Meet Hadoop


May 28th, 2010, Taewhi Lee

Outline
Data!
Data Storage and Analysis
Comparison with Other Systems
– RDBMS
– Grid Computing
– Volunteer Computing
The Apache Hadoop Project

‘Digital Universe’ Nears a Zettabyte

Digital Universe: the total amount of data stored in the world’s computers
Zettabyte: 10^21 bytes > Exabyte (10^18) > Petabyte (10^15) > Terabyte (10^12)
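To make these orders of magnitude concrete, here is a minimal sketch that converts between decimal (SI) byte units. It assumes powers of 1,000, as on the slide, rather than binary powers of 1,024. The `UNITS` table and the `convert` helper are hypothetical names used only for illustration.

```python
# Decimal (SI) byte units, as used on the slide: each step is a factor of 1,000.
# Assumption: decimal prefixes, not binary (KiB/MiB/...) prefixes.
UNITS = {
    "TB": 10**12,  # terabyte
    "PB": 10**15,  # petabyte
    "EB": 10**18,  # exabyte
    "ZB": 10**21,  # zettabyte
}


def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert an amount of data from one decimal byte unit to another."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


if __name__ == "__main__":
    # Express one zettabyte in each smaller unit.
    for unit in ("EB", "PB", "TB"):
        print(f"1 ZB = {convert(1, 'ZB', unit):,.0f} {unit}")
```

Running it prints that 1 ZB is 1,000 EB, 1,000,000 PB and 1,000,000,000 TB.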

Flood of Data

– The New York Stock Exchange generates 1 TB of new trade data per day


– Facebook hosts 10 billion photos (1 petabyte of storage)


– The Internet Archive stores 2 petabytes of data

Individuals’ Data are Growing Apace

– It is becoming ever easier to take more and more photos


– LifeLog: “my life in a terabyte” (Microsoft Research’s MyLifeBits project), covering capture and encoding

Amount of Public Data Increases

Available public data sets on AWS:
– Annotated human genome
– Public database of chemical structures
– Various census data and labor statistics

Large Data!

How do we store and analyze large data?

“More data usually beats better algorithms”

Current HDD
– Capacity: 1 TB
– Transfer rate: 100 MB/s

How long does it take to read all the data off the disk? 1 TB ÷ 100 MB/s = 10,000 s, i.e. more than two and a half hours.

How about using multiple disks?
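The arithmetic behind the multiple-disk idea can be sketched as follows. It assumes the data is split evenly across the disks and that all disks are read fully in parallel at the same rate, ignoring seek time and coordination overhead. The 100-disk figure and the `read_time_seconds` helper are hypothetical, chosen only to illustrate the scaling.

```python
TB = 10**12  # bytes (decimal)
MB = 10**6   # bytes (decimal)


def read_time_seconds(capacity_bytes: float, rate_bytes_per_s: float, disks: int = 1) -> float:
    """Time to read `capacity_bytes` split evenly over `disks` drives,
    each read in parallel at `rate_bytes_per_s` (idealized: no seeks)."""
    if disks < 1:
        raise ValueError("need at least one disk")
    return capacity_bytes / (rate_bytes_per_s * disks)


if __name__ == "__main__":
    one = read_time_seconds(1 * TB, 100 * MB)            # a single 1 TB disk
    hundred = read_time_seconds(1 * TB, 100 * MB, 100)   # hypothetical 100 disks
    print(f"1 disk:    {one:,.0f} s ({one / 3600:.1f} h)")
    print(f"100 disks: {hundred:,.0f} s ({hundred / 60:.1f} min)")
```

Under these assumptions, one disk needs 10,000 s (about 2.8 h), while 100 disks read in parallel need 100 s (about 1.7 min).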

Problems with Multiple Disks
– Hardware failure
– Most analysis tasks need to combine data that is distributed across the disks
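A common answer to hardware failure is replication: keep several copies of each block on different disks. HDFS keeps three replicas by default. The sketch below uses a simple round-robin placement, which is a hypothetical policy for illustration and not how HDFS actually places replicas. It checks that losing any single disk leaves every block readable.

```python
from typing import Dict, List, Set


def place_replicas(num_blocks: int, num_disks: int, replication: int = 3) -> Dict[int, List[int]]:
    """Assign each block to `replication` distinct disks, round-robin.
    Hypothetical placement policy for illustration only."""
    if replication > num_disks:
        raise ValueError("replication factor cannot exceed the number of disks")
    return {
        block: [(block + r) % num_disks for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement: Dict[int, List[int]], failed: Set[int]) -> Set[int]:
    """Blocks that still have at least one replica on a healthy disk."""
    return {b for b, disks in placement.items() if any(d not in failed for d in disks)}


if __name__ == "__main__":
    replicated = place_replicas(num_blocks=10, num_disks=5)
    # With three replicas, any single disk failure leaves all 10 blocks readable.
    for disk in range(5):
        assert len(readable_blocks(replicated, {disk})) == 10
    # Without replication, losing disk 0 loses the blocks stored only there.
    single = place_replicas(num_blocks=10, num_disks=5, replication=1)
    print(len(readable_blocks(single, {0})))
```

With `replication=1`, disk 0 holds blocks 0 and 5, so the script prints 8.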

What Hadoop Provides
– Reliable shared storage (HDFS)
– Reliable analysis system (MapReduce)
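To show the shape of the MapReduce model, here is a minimal in-memory sketch of word count. Each line is mapped to (word, 1) pairs, the pairs are grouped by key (the "shuffle"), and the values are summed per word in the reduce step. This is plain Python that simulates the model on one machine. It is not Hadoop's Java API, and `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(lines: Iterable[str]) -> Dict[str, int]:
    # Shuffle: group intermediate values by key, as the framework does
    # between the map and reduce phases (here, in a single process).
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))


if __name__ == "__main__":
    docs = ["Hadoop stores data", "Hadoop analyzes data"]
    print(run_mapreduce(docs))
```

For the two sample lines, the result is `{'analyzes': 1, 'data': 2, 'hadoop': 2, 'stores': 1}`. In a real cluster, the map and reduce calls would run on many machines, close to where the data is stored.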

RDBMS
* Low latency for point queries or updates
** Update times of a relatively small amount of data

Grid Computing
– Shared storage (SAN)
– Works well for predominantly CPU-intensive jobs
– Becomes a problem when nodes need to access large volumes of data
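The problem with large data on shared storage can be illustrated with back-of-the-envelope arithmetic. The sketch assumes the shared storage link is the bottleneck that every node must read through, whereas each node could otherwise read its own local disk in parallel. The cluster size, data sizes and bandwidths are hypothetical numbers chosen only for illustration.

```python
GB = 10**9  # bytes (decimal)
MB = 10**6  # bytes (decimal)


def shared_storage_time(nodes: int, bytes_per_node: float, shared_bw: float) -> float:
    """All nodes pull their input through one shared link of `shared_bw` bytes/s."""
    return nodes * bytes_per_node / shared_bw


def local_disk_time(bytes_per_node: float, local_bw: float) -> float:
    """Each node reads its own input from a local disk, all in parallel."""
    return bytes_per_node / local_bw


if __name__ == "__main__":
    # Hypothetical cluster: 100 nodes, 10 GB of input per node,
    # a 1 GB/s shared link versus 100 MB/s local disks.
    shared = shared_storage_time(nodes=100, bytes_per_node=10 * GB, shared_bw=1 * GB)
    local = local_disk_time(bytes_per_node=10 * GB, local_bw=100 * MB)
    print(f"shared storage: {shared:,.0f} s, local disks: {local:,.0f} s")
```

Here, shared storage takes 1,000 s while the local reads take 100 s. For a CPU-intensive job with little input, both times would be small relative to the computation, which is why the grid approach works well there.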

Volunteer Computing
– Volunteers donate CPU time from their idle computers
– Work units are sent to computers around the world
– Suitable for very CPU-intensive work with small data sets
– Risky due to running work on untrusted machines
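One way to cope with untrusted machines, assumed here for illustration, is redundant computation: send the same work unit to several volunteers and accept a result only when enough of them agree. The sketch below uses a quorum of two. The `verify_work_unit` function and the sample results are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def verify_work_unit(results: List[str], quorum: int = 2) -> Optional[str]:
    """Accept a result only if at least `quorum` independent volunteers
    returned the same value; otherwise return None (reissue the unit)."""
    if not results:
        return None
    value, votes = Counter(results).most_common(1)[0]
    return value if votes >= quorum else None


if __name__ == "__main__":
    # The same work unit sent to three volunteers; one returns a bad answer.
    print(verify_work_unit(["42", "42", "17"]))
    # No two volunteers agree, so the result is rejected.
    print(verify_work_unit(["42", "17", "99"]))
```

The first call accepts `"42"` and the second returns `None`. The cost of this approach is that every work unit is computed several times, which is acceptable for CPU-bound work with small inputs but not for data-heavy jobs.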

Brief History of Hadoop
– Created by Doug Cutting
– Originated in Apache Nutch (2002), an open source web search engine and part of the Lucene project
– NDFS (Nutch Distributed File System, 2004)
– MapReduce (2005)
– Doug Cutting joins Yahoo! (Jan 2006)
– Official start of the Apache Hadoop project (Feb 2006)
– Adoption of Hadoop by the Yahoo! Grid team (Feb …)

The Apache Hadoop Project
Subprojects: Core, Avro, MapReduce, HDFS, ZooKeeper, Pig, Chukwa, Hive, HBase
