Cosmos fiware

Open APIs for Open Minds

Building your first application using FI-WARE

Cosmos, Big Data GE implementation

Big Data: What is it and how much data is there?

What is big data?

> small data

What is big data?

> big data

http://commons.wikimedia.org/wiki/File:Interior_view_of_Stockholm_Public_Library.jpg

How much data is there?

Data growth forecast

[Figure, 2012: global users (billions), global networked devices (billions), global broadband speed (Mbps), global traffic (zettabytes)]

http://www.cisco.com/en/US/netsol/ns827/networking_solutions_sub_solution.html#~forecast

1 zettabyte = 10^21 bytes = 1,000,000,000,000,000,000,000 bytes

It is not only about storing big data but using it!

> tools

> big data

http://commons.wikimedia.org/wiki/File:Interior_view_of_Stockholm_Public_Library.jpg

How to deal with it: The Hadoop reference

Hadoop was created by Doug Cutting at Yahoo!...

… based on the MapReduce patent by Google

Well, MapReduce was really invented by Julius Caesar

Divide et impera*

* Divide and conquer

An example

How many pages are written in Latin among the books in the Ancient Library of Alexandria?

Books: LATIN REF1 P45, GREEK REF2 P128, EGYPT REF3 P12, LATIN REF4 P73, LATIN REF5 P34, EGYPT REF6 P10, GREEK REF7 P20, GREEK REF8 P230

Mappers: LATIN pages 45 / EGYPTIAN / still reading…

Reducer: 45 (ref 1)

An example

Books: GREEK REF2 P128, LATIN REF4 P73, LATIN REF5 P34, EGYPT REF6 P10, GREEK REF7 P20, GREEK REF8 P230

Mappers: still reading… / EGYPTIAN

Reducer: 45 (ref 1)

An example

Books: LATIN REF4 P73, LATIN REF5 P34, GREEK REF7 P20, GREEK REF8 P230

Mappers: LATIN pages 73 / EGYPTIAN / LATIN pages 34

Reducer: 45 (ref 1) + 73 (ref 4) + 34 (ref 5)

An example

Books: GREEK REF7 P20, GREEK REF8 P230

Mappers: idle…

Reducer: 45 (ref 1) + 73 (ref 4) + 34 (ref 5)

An example

Mappers: idle…

Reducer: 45 (ref 1) + 73 (ref 4) + 34 (ref 5) = 152 TOTAL
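The library example can be sketched in plain Java, without any Hadoop dependency. This is only an illustration of the map and reduce roles: the class name `LatinPageCount`, the `Book` type and the method names are hypothetical, and a parallel stream stands in for the independent mappers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LatinPageCount {
    /** One book in the library: language, reference number, page count. */
    static final class Book {
        final String language;
        final int ref;
        final int pages;

        Book(String language, int ref, int pages) {
            this.language = language;
            this.ref = ref;
            this.pages = pages;
        }
    }

    /** The eight books shown on the slides. */
    static final List<Book> LIBRARY = Arrays.asList(
            new Book("LATIN", 1, 45), new Book("GREEK", 2, 128),
            new Book("EGYPT", 3, 12), new Book("LATIN", 4, 73),
            new Book("LATIN", 5, 34), new Book("EGYPT", 6, 10),
            new Book("GREEK", 7, 20), new Book("GREEK", 8, 230));

    /** Map step: each book is inspected independently; only Latin books emit a page count. */
    static List<Integer> map(List<Book> books) {
        return books.parallelStream()
                .filter(b -> "LATIN".equals(b.language))
                .map(b -> b.pages)
                .collect(Collectors.toList());
    }

    /** Reduce step: add up every page count emitted by the mappers. */
    static int reduce(List<Integer> emitted) {
        int total = 0;
        for (int pages : emitted) {
            total += pages;
        }
        return total;
    }

    static int countLatinPages(List<Book> books) {
        return reduce(map(books));
    }

    public static void main(String[] args) {
        System.out.println(countLatinPages(LIBRARY) + " TOTAL"); // prints "152 TOTAL"
    }
}
```

In a real cluster the books would be split across machines and the reducer would receive the partial results over the network; the structure of the computation is the same.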

Hadoop architecture

head node

FI-WARE proposal: Cosmos Big Data

What is Cosmos?

• Cosmos is Telefónica's Big Data platform
  • Dynamic creation of private computing clusters as a service
  • Infinity, a cluster for persistent storage

• Cosmos is Hadoop ecosystem-based
  • HDFS as its distributed file system
  • Hadoop core as its MapReduce engine
  • HiveQL and Pig for querying the data
  • Oozie as remote MapReduce jobs and Hive launcher

• Plus other proprietary features
  • Infinity protocol (secure WebHDFS)
  • Cygnus, an injector for context data coming from Orion CB

Cosmos architecture

What can be done with Cosmos?

What | Locally (ssh'ing into the Head Node) | Remotely (connecting your app)
Clusters operation | Cosmos CLI | REST API
I/O operation | 'hadoop fs' command | REST API (WebHDFS, HttpFS, Infinity protocol)
Querying tools (basic analysis) | Hive CLI | JDBC, Thrift*
MapReduce (advanced analysis) | 'hadoop jar' command | Oozie REST API

Clusters operation: Getting your own Roman legion

Using the RESTful API (1)

Using the Python CLI

• Creating a cluster
  $ cosmos create --name <STRING> --size <INT>

• Listing all the clusters
  $ cosmos list

• Showing the details of a cluster
  $ cosmos show <CLUSTER_ID>

• Connecting to the Head Node of a cluster
  $ cosmos ssh <CLUSTER_ID>

• Terminating a cluster
  $ cosmos terminate <CLUSTER_ID>

• Listing available services
  $ cosmos list-services

• Creating a cluster with specific services
  $ cosmos create --name <STRING> --size <INT> --services <SERVICES_LIST>

How to exploit the data:

Commanding your Roman legion

1. Hadoop filesystem commands

• Hadoop general command
  $ hadoop

• Hadoop file system subcommand
  $ hadoop fs

• Hadoop file system options
  $ hadoop fs -ls
  $ hadoop fs -mkdir <hdfs-dir>
  $ hadoop fs -rmr <hdfs-file>
  $ hadoop fs -cat <hdfs-file>
  $ hadoop fs -put <local-file> <hdfs-dir>
  $ hadoop fs -get <hdfs-file> <local-dir>

• http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html

2. WebHDFS/HttpFS REST API

• List a directory
  GET http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS

• Create a new directory
  PUT http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=MKDIRS[&permission=<OCTAL>]

• Delete a file or directory
  DELETE http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=DELETE[&recursive=<true|false>]

• Rename a file or directory
  PUT http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=RENAME&destination=<PATH>

• Concat files
  POST http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CONCAT&sources=<PATHS>

• Set permission
  PUT http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETPERMISSION[&permission=<OCTAL>]

• Set owner
  PUT http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETOWNER[&owner=<USER>][&group=<GROUP>]
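All the operations listed here share one URL shape, so a remote application can build them with a small helper. The following is a minimal sketch, not part of Cosmos: the class `WebHdfsUrls`, the host `cosmos.example.org` and the port `14000` are hypothetical placeholders, and the path is assumed to contain no characters that need escaping.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Collections;
import java.util.Map;

class WebHdfsUrls {
    /** Builds http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=<OP>[&name=value...]. */
    static String build(String host, int port, String path, String op,
                        Map<String, String> params) {
        StringBuilder url = new StringBuilder("http://")
                .append(host).append(':').append(port).append("/webhdfs/v1");
        if (!path.startsWith("/")) {
            url.append('/');
        }
        url.append(path).append("?op=").append(op);
        // optional parameters, e.g. permission, recursive, destination
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append('&').append(p.getKey()).append('=').append(encode(p.getValue()));
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // hypothetical host, port and HDFS path
        System.out.println(build("cosmos.example.org", 14000, "/user/frb/data",
                "LISTSTATUS", Collections.<String, String>emptyMap()));
        System.out.println(build("cosmos.example.org", 14000, "/user/frb/newdir",
                "MKDIRS", Collections.singletonMap("permission", "755")));
    }
}
```

The resulting string can be handed to any HTTP client together with the verb shown for each operation (GET, PUT, POST or DELETE).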

2. WebHDFS/HttpFS REST API (cont.)

• Create a new file with initial content (2-step operation)
  PUT http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE[&overwrite=<true|false>][&blocksize=<LONG>][&replication=<SHORT>][&permission=<OCTAL>][&buffersize=<INT>]
  HTTP/1.1 307 TEMPORARY_REDIRECT
  Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE...
  Content-Length: 0
  PUT -T <LOCAL_FILE> http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE...

• Append to a file (2-step operation)
  POST http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=APPEND[&buffersize=<INT>]
  HTTP/1.1 307 TEMPORARY_REDIRECT
  Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=APPEND...
  Content-Length: 0
  POST -T <LOCAL_FILE> http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=APPEND...

2. WebHDFS/HttpFS REST API (cont.)

• Open and read a file (2-step operation)
  GET http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN[&offset=<LONG>][&length=<LONG>][&buffersize=<INT>]
  HTTP/1.1 307 TEMPORARY_REDIRECT
  Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=OPEN...
  Content-Length: 0
  GET http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=OPEN...

• http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html

• HttpFS does not redirect to the Datanode but to the HttpFS server, hiding the Datanodes (and saving tens of public IP addresses)
  • The API is the same
  • http://hadoop.apache.org/docs/current/hadoop-hdfs-httpfs/index.html
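The 2-step CREATE operation can be driven from Java with the standard `HttpURLConnection` class. The sketch below is an assumption-laden illustration rather than a Cosmos client: the class `WebHdfsCreate` and its method names are hypothetical, no authentication is handled, and the `Content-Type` header is set because HttpFS is understood to expect it for uploads (worth checking against your own server).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;

class WebHdfsCreate {
    /** Returns the URL to use in step 2, or fails if the server did not redirect. */
    static String redirectTarget(int status, String location) {
        if (status != 307 || location == null || location.isEmpty()) {
            throw new IllegalStateException(
                    "expected 307 with a Location header, got status " + status);
        }
        return location;
    }

    /** Runs both steps and returns the HTTP status of the second one. */
    static int create(String createUrl, byte[] content) throws IOException {
        // Step 1: PUT without a body; redirects must not be followed automatically
        HttpURLConnection first =
                (HttpURLConnection) URI.create(createUrl).toURL().openConnection();
        first.setRequestMethod("PUT");
        first.setInstanceFollowRedirects(false);
        String target = redirectTarget(first.getResponseCode(),
                first.getHeaderField("Location"));
        first.disconnect();

        // Step 2: PUT the file content to the location given by the server
        HttpURLConnection second =
                (HttpURLConnection) URI.create(target).toURL().openConnection();
        second.setRequestMethod("PUT");
        second.setDoOutput(true);
        second.setRequestProperty("Content-Type", "application/octet-stream");
        try (OutputStream out = second.getOutputStream()) {
            out.write(content);
        }
        int status = second.getResponseCode(); // WebHDFS documents 201 Created on success
        second.disconnect();
        return status;
    }
}
```

A caller would pass a CREATE URL of the form shown above, for instance `WebHdfsCreate.create("http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE", bytes)`. APPEND follows the same pattern with POST instead of PUT.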

3. Local Hive CLI

• Hive is a querying tool
  • Queries are expressed in HiveQL, a SQL-like language
  • https://cwiki.apache.org/confluence/display/Hive/LanguageManual

• Hive uses pre-defined MapReduce jobs for
  • Column selection
  • Fields grouping
  • Table joining
  • …

• All the data is loaded into Hive tables

3. Local Hive CLI (cont.)

• Log on to the Master node
• Run the hive command
• Type your SQL-like sentence!

$ hive
Hive history file=/tmp/myuser/hive_job_log_opendata_XXX_XXX.txt
hive> select column1,column2,otherColumns from mytable where column1='whatever' and columns2 like '%whatever%';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Starting Job = job_201308280930_0953, Tracking URL = http://cosmosmaster-gi:50030/jobdetails.jsp?jobid=job_201308280930_0953
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=cosmosmaster-gi:8021 -kill job_201308280930_0953
2013-10-03 09:15:34,519 Stage-1 map = 0%, reduce = 0%
2013-10-03 09:15:36,545 Stage-1 map = 67%, reduce = 0%
2013-10-03 09:15:37,554 Stage-1 map = 100%, reduce = 0%
2013-10-03 09:15:44,609 Stage-1 map = 100%, reduce = 33%
…

4. Remote Hive client

• Hive CLI is OK for human-driven testing purposes
  • But it is not usable by remote applications

• Hive has no REST API

• Hive has several drivers and libraries
  • JDBC for Java
  • Python
  • PHP
  • ODBC for C/C++
  • Thrift for Java and C++
  • https://cwiki.apache.org/confluence/display/Hive/HiveClient

• A remote Hive client usually performs:
  • A connection to the Hive server (TCP/10000)
  • The query execution

4. Remote Hive client – Get a connection

// requires java.sql.Connection, java.sql.DriverManager and java.sql.SQLException
private Connection getConnection(String ip, String port, String user, String password) {
    try {
        // dynamically load the Hive JDBC driver
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    } catch (ClassNotFoundException e) {
        System.out.println(e.getMessage());
        return null;
    } // try catch

    try {
        // return a connection based on the Hive JDBC driver, default DB
        return DriverManager.getConnection("jdbc:hive://" + ip + ":" + port
                + "/default?user=" + user + "&password=" + password);
    } catch (SQLException e) {
        System.out.println(e.getMessage());
        return null;
    } // try catch
} // getConnection

https://github.com/telefonicaid/fiware-connectors/tree/develop/resources/hive-basic-client

4. Remote Hive client – Do the query

private void doQuery() {
    try {
        // from here on, everything is SQL!
        Statement stmt = con.createStatement();
        ResultSet res = stmt.executeQuery("select column1,column2,"
                + "otherColumns from mytable where column1='whatever' and "
                + "columns2 like '%whatever%'");

        // iterate on the result
        while (res.next()) {
            String column1 = res.getString(1);
            int column2 = res.getInt(2); // ResultSet has getInt, not getInteger
            // whatever you want to do with this row, here
        } // while

        // close everything
        res.close();
        stmt.close();
        con.close();
    } catch (SQLException ex) {
        // report the failure and exit with a non-zero status
        System.out.println(ex.getMessage());
        System.exit(1);
    } // try catch
} // doQuery

https://github.com/telefonicaid/fiware-connectors/tree/develop/resources/hive-basic-client

4. Remote Hive client – Plague Tracker demo

https://github.com/telefonicaid/fiware-connectors/tree/develop/resources/plague-tracker

5. MapReduce applications

• MapReduce applications are commonly written in Java
  • Can be written in other languages through Hadoop Streaming

• They are executed from the command line
  $ hadoop jar <jar-file> <main-class> <input-dir> <output-dir>

• A MapReduce job consists of:
  • A driver, a piece of software that defines inputs, outputs, formats, etc. and is the entry point for launching the job
  • A set of Mappers, given by a piece of software defining their behaviour
  • A set of Reducers, given by a piece of software defining their behaviour

• There are 2 APIs
  • org.apache.hadoop.mapred, the old one
  • org.apache.hadoop.mapreduce, the new one

• Hadoop is distributed with MapReduce examples
  • [HADOOP_HOME]/hadoop-examples.jar

5. MapReduce applications – Map

/* org.apache.hadoop.mapred example */
public static class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        /* use the input value, the input key is the offset within the
           file and it is not necessary in this example */
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);

        /* iterate on the string, getting each word */
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            /* emit an output (key,value) pair based on the word and 1 */
            output.collect(word, one);
        } // while
    } // map
} // MapClass

5. MapReduce applications – Reduce

/* org.apache.hadoop.mapred example */
public static class ReduceClass extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;

        /* iterate on all the values and add them */
        while (values.hasNext()) {
            sum += values.next().get();
        } // while

        /* emit an output (key,value) pair based on the word and its count */
        output.collect(key, new IntWritable(sum));
    } // reduce
} // ReduceClass

5. MapReduce applications – Driver

/* org.apache.hadoop.mapred example */
package my.org;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(MapClass.class);
        conf.setCombinerClass(ReduceClass.class);
        conf.setReducerClass(ReduceClass.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    } // main
} // WordCount

6. Launching tasks with Oozie

• Oozie is a workflow scheduler system to manage Hadoop jobs
  • Java map-reduce
  • Pig and Hive
  • Sqoop
  • System specific jobs (such as Java programs and shell scripts)

• Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions

• Writing Oozie applications is about including in a package
  • The MapReduce jobs, Hive/Pig scripts, etc. (executable code)
  • A Workflow
  • Parameters for the Workflow

• Oozie can be used locally or remotely
  • https://oozie.apache.org/docs/4.0.0/index.html#Developer_Documentation

6. Launching tasks with Oozie – Java client

OozieClient client = new OozieClient("http://130.206.80.46:11000/oozie/");

// create a workflow job configuration and set the workflow application path
Properties conf = client.createConfiguration();
conf.setProperty(OozieClient.APP_PATH, "hdfs://cosmosmaster-gi:8020/user/frb/mrjobs");
conf.setProperty("nameNode", "hdfs://cosmosmaster-gi:8020");
conf.setProperty("jobTracker", "cosmosmaster-gi:8021");
conf.setProperty("outputDir", "output");
conf.setProperty("inputDir", "input");
conf.setProperty("examplesRoot", "mrjobs");
conf.setProperty("queueName", "default");

// submit and start the workflow job
String jobId = client.run(conf);

// wait until the workflow job finishes, printing the status every 10 seconds
while (client.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
    System.out.println("Workflow job running ...");
    Thread.sleep(10 * 1000);
} // while

System.out.println("Workflow job completed");

Useful references

• Hive resources:
  • HiveQL language https://cwiki.apache.org/confluence/display/Hive/LanguageManual
  • How to create Hive clients https://cwiki.apache.org/confluence/display/Hive/HiveClient
  • Hive client example https://github.com/telefonicaid/fiware-connectors/tree/develop/resources/hive-basic-client
  • Plague Tracker demo https://github.com/telefonicaid/fiware-livedemoapp/tree/master/cosmos/plague-tracker
  • Plague Tracker instance http://130.206.81.65/plague-tracker/

• Hadoop filesystem commands:
  • http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html

• WebHDFS and HttpFS REST APIs:
  • http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
  • http://hadoop.apache.org/docs/current/hadoop-hdfs-httpfs/index.html

• Oozie
  • https://oozie.apache.org/docs/4.0.0/index.html#Developer_Documentation

Cosmos' place in FI-WARE: Typical scenarios

General IoT platform

[Figure: platform architecture. Component labels: IoT Backend / Device Management; COSMOS (Big Data) with data processing and data querying; Open Data; Context Broker; measures / commands; IoT/Sensor Open Data; Sensor 2 Things; Accounting & …ent & Billing; short-term historic; real-time processing; rules definition; operational dashboard; KPI governance; open data portals; GIS; context adapters; service orchestration; city services]

You don't have to use them all!

Real time context data persistence (architecture)

https://forge.fi-ware.eu/plugins/mediawiki/wiki/fiware/index.php/How_to_persist_Orion_data_in_Cosmos

https://github.com/telefonicaid/fiware-connectors/tree/develop/flume

Real time context data persistence (detail)

Real time context data persistence (examples)

• Information coming from city sensors
  • Presence map gradients, agglomerations…
  • Services usage distributions, top users (if available), top POIs, unused resources…

• Information generated by smartphones
  • Geolocation routes, map gradients, agglomerations…
  • Issues reporting: top neighbourhoods in incidents, criminality, noises, garbage, plagues…

• Any other real time information
  • Depending on your app, this could be product likes, product consumption, user-2-user feedback… recommendations, advertisement…

Roadmap: More functionalities and integrations

Roadmap

• Integrate the clusters creation with the cloud portal
  • No more REST API work

• Streaming analysis capabilities
  • Not all the analysis can wait for a batch processing

• Geolocation analysis capabilities
  • An important source of data nowadays

• Integrate with CKAN
  • As a source of batch data

• Integrate with the Marketplace
  • Selling datasets
  • Selling analysis results
  • Selling applications and algorithms

fiware-lab-help@lists.fi-ware.org

francisco.romerobueno@telefonica.com

http://fi-ppp.eu

http://fi-ware.eu

Follow @Fiware on Twitter!

Thanks!
