Hadoop Hadoop & Spark meetup - Altiscale

Date posted: 06-Jan-2017

Altiscale
Big Data-as-a-Service
Paul Tibaldi, RSD & Ajay Jha, SA

Market Background
Who is Altiscale?
Why are we different/better?
Hadoop Admin
Apache Hadoop Stack
Platform/Access/Demo
Q/A

Big Data As A Service

Market Background

Interest in Big Data is growing fast

Big Data in The Cloud is Accelerating

On-Premises: 32%
Cloud Only: 23%
Cloud Plus On-Premises: 29%

Source: Hadoop Expansion Boosts Cloud and Unsupported On-Premises Deployments, Merv Adrian, Nick Heudecker, 3 September 2015

But the journey has dangers

Gartner: 70% of independent Big Data implementations will fail to meet revenue and cost objectives, through 2018.

Altiscale solves the big data challenge for companies by providing a Hadoop ecosystem that is immediately available, automatically scalable, high performance, and secure.

We are at the advent of the Big Data age, and Hadoop is rapidly emerging as the leading technology to store, process, and analyze massive amounts of information. However, Hadoop is difficult to implement, scale, manage, and secure - that is, if you can even find the right experienced people to do it. On-premises solutions are expensive, take months to get set up, and require significant resources to manage over time. Gartner expects 70% of Hadoop implementations to fail to meet objectives through 2018.

Forrester says Altiscale reduces job failures by 60%.

Who is Altiscale?

Altiscale Data Cloud GA in 2014

Financed by top-tier technology investors

Recognized innovator in Hadoop-as-a-Service

About Altiscale

Led by experienced, renowned Hadoop team from Yahoo!
Raymie Stata, CEO. Former Yahoo! CTO, well-known advocate of the Apache Software Foundation
David Chaiken, CTO. Former Yahoo! Chief Architect

Built and managed by veterans of Big Data, SaaS, and enterprise software
From Google, Netflix, LinkedIn, VMware, Oracle, and Yahoo!

40,000 nodes
500 PB
1,000 users
$ billions at stake

Raymie Stata, CEO

David Chaiken, CTO

Ricardo Jenez, VP of Engineering

Charles Wimmer, Head of Operations

Big data built for speed
Fast time to value: days, not months
Easier, faster scalability: with elastic scaling
Operations support: so your jobs get done
Lower TCO: for fast investment payback

Unmatched Security

Altiscale is the only provider that delivers integrated security encompassing its Big Data platform offering

Complete best of breed

Big Data is complex.
It gets more complicated as you scale.

Up until now everything has seemed easy, but sooner or later you realize that you can't run everything on one machine. Supporting multiple machines and concurrent jobs requires a totally different architecture. All of a sudden you start realizing that things aren't so simple.

An additional note: when you have multiple resources to manage, they often come from different groups, which makes it even harder to coordinate them.

Last update: June 11, 2007

Big Data-as-a-Service


The Altiscale Data Cloud Core

Altiscale Data Cloud is 100% based on Apache open source. Our current Altiscale Data Cloud 4.0 release is composed of the following Apache components and versions:
Apache Hadoop 2.7.1
Apache Spark 1.5*
Apache Hive (& HCatalog) 1.2
Apache Tez 0.7.0
Apache Pig 0.15.1
Apache Oozie 4.2.0
Apache Flume 1.5.2
Avro 1.7.4
JDK/JRE 7 (Sun/Oracle version)
HttpFS

In addition to the above, we also offer our customers the three latest versions of Spark. This gives customers the option of a conservative approach as well as the option to work with the fast-moving, bleeding-edge Spark community.

Concurrency with Apache Versioning

Hire an expert to take care of the cluster
Hardware setup and cluster installation
Address hardware failure
Upgrade Hadoop stack
Tuning config parameters
  yarn-site.xml, e.g. yarn.nodemanager.resource.memory-mb
  mapred-site.xml, e.g. mapreduce.task.io.sort.mb
  hdfs-site.xml, e.g. dfs.blocksize
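To make the tuning item concrete, here is a minimal sketch of how the three properties named on the slide sit in Hadoop's name/value configuration layout. The property names come from the slide; the numeric values are illustrative assumptions only, not tuning advice, and the helper `render_site_xml` is a hypothetical name introduced for this example.

```python
import xml.etree.ElementTree as ET

# Property names are taken from the slide. The values are assumed
# placeholders for illustration and are not recommendations.
SETTINGS = {
    "yarn-site.xml": {"yarn.nodemanager.resource.memory-mb": "8192"},
    "mapred-site.xml": {"mapreduce.task.io.sort.mb": "100"},
    "hdfs-site.xml": {"dfs.blocksize": "134217728"},
}


def render_site_xml(properties):
    """Render name/value pairs in the <configuration>/<property> layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for filename, properties in SETTINGS.items():
        print(filename)
        print(render_site_xml(properties))
```

Each file holds settings for one layer of the stack (YARN, MapReduce, HDFS), which is part of why tuning usually calls for someone who knows all three.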

Hadoop Administration

Accessing the cloud

Spark example
Build the Spark code on a laptop using Maven
Build the jar and copy it over to Altiscale's workbench (gateway) node
Launch the Spark job on YARN
Monitor using the Resource Manager

Quick Spark Demo
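The demo steps above could be sketched as the following command sequence. This is only an outline under stated assumptions: the jar path, workbench host, and main class are hypothetical placeholders, and the exact commands used in the talk are not given in the slides. The script only assembles and prints the commands; it does not run them.

```python
import shlex

# Hypothetical placeholders; none of these names come from the slides.
JAR = "target/spark-example-1.0.jar"
WORKBENCH = "user@workbench.example.com"
MAIN_CLASS = "com.example.SparkExample"


def demo_commands(jar=JAR, workbench=WORKBENCH, main_class=MAIN_CLASS):
    """Return the four demo steps as argument lists, in order."""
    jar_name = jar.rsplit("/", 1)[-1]
    return [
        # 1. Build the jar locally with Maven.
        ["mvn", "package"],
        # 2. Copy the jar to the workbench (gateway) node's home directory.
        ["scp", jar, workbench + ":"],
        # 3. Launch the job on YARN from the workbench node.
        ["ssh", workbench, "spark-submit", "--master", "yarn",
         "--deploy-mode", "cluster", "--class", main_class, jar_name],
        # 4. List applications known to the Resource Manager.
        ["ssh", workbench, "yarn", "application", "-list"],
    ]


if __name__ == "__main__":
    for command in demo_commands():
        print(" ".join(shlex.quote(arg) for arg in command))
```

Step 4 uses the YARN command line as a stand-in for monitoring; the Resource Manager web UI shows the same application list.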

Thank You!