Big Data ODS - HighQSoft

Big Data ODS
Setting up a prototype

HighQSoft GmbH | www.highqsoft.de | 22.12.2014


Performance and Scalability
Topics

1. Why Big Data?
2. General Overview
3. HighQSoft Approach
4. Summary


What is the ODS 6.0 Proposal?
Overview

[Diagram: ODS Client, ODS Server, ODS API Definition, CORBA Technology, Physical Storage: Meta-Data, Physical Storage: Mass-Data]

Objectives of Proposal:
- Re-work of the ODS API Definition (streamlining in focus; may include enhancements, e.g. web services)
- Replacement of CORBA Technology


What is Big Data Integration?
Overview

[Diagram: ODS Client, ODS Server, ODS API 6.0 Definition, ODS 6.0 Technology, Big Data Enhancements, Big Data Zoo]

Objectives of Proposal:
- Integration of Big Data
- Enhancement of the ASAM Base Model
- Enhancement of ODS Server functionality
- Defining an interface to Big Data

The proposal is independent of the current ODS 6.0 proposal. It also covers other areas.


Why Big Data?
Overview

The volume of data produced is growing to a large scale. Some MDF4.1 measurement files already cannot be integrated into ODS efficiently.

- In terms of ODS, the files can contain a great number of external components (millions per file)
- The files are too large to move around (server to client)

There are limits within Oracle:
- 1*10^50 entries is a limit
- With 3*10^8 entries, a “select * from table where id = xy” takes 30 seconds (no indexing)
- The latency grows linearly

We want to handle 1000 vehicles, each with 100 measurements of 10^3 to 10^5 channels, per day (2*10^10 / year).
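
The yearly figure above can be reproduced with a short calculation. The sketch below is one possible reading, not the authors' derivation: it assumes roughly 200 measurement days per year and that each channel corresponds to one table entry (neither assumption is stated on the slide). It also extrapolates the quoted Oracle latency linearly, as the slide suggests.

```python
# Figures taken from the slide.
VEHICLES = 1_000
MEASUREMENTS_PER_VEHICLE_PER_DAY = 100
CHANNELS_LOW = 10**3    # lower bound of channels per measurement
CHANNELS_HIGH = 10**5   # upper bound of channels per measurement

# Assumption (not on the slide): about 200 measurement days per year.
MEASUREMENT_DAYS_PER_YEAR = 200

# Observation from the slide: 3*10^8 entries -> 30 s for an unindexed select.
OBSERVED_ENTRIES = 3 * 10**8
OBSERVED_SECONDS = 30.0


def channels_per_year(channels_per_measurement: int) -> int:
    """Number of channels produced per year for a given channel count."""
    per_day = VEHICLES * MEASUREMENTS_PER_VEHICLE_PER_DAY * channels_per_measurement
    return per_day * MEASUREMENT_DAYS_PER_YEAR


def estimated_select_seconds(entries: int) -> float:
    """Linear extrapolation of the unindexed select latency."""
    return OBSERVED_SECONDS * entries / OBSERVED_ENTRIES


low = channels_per_year(CHANNELS_LOW)
high = channels_per_year(CHANNELS_HIGH)
print(f"channels per year: {low:.0e} to {high:.0e}")
print(f"estimated select latency at lower bound: {estimated_select_seconds(low):.0f} s")
```

Under these assumptions the lower bound matches the 2*10^10 quoted on the slide, and a single unindexed lookup would already take on the order of half an hour.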






Prototype Development
The ODS Setup

[Diagram: ODS Server with Database; SPARK; YARN; four TASK CPU units; HDFS (USA), HBASE (USA), SOLR / … (USA)]

The general ODS setup remains largely the same:
- The ODS Server remains as an organizing entity
- A database (not Oracle) will still be required

- Environment
- Security
- “Big Data Configuration” (Catalogs)


Prototype Development
Objective 1: Defining a BIG ODS interface

The integration of BIG ODS requires:
- Definition of an ODS request interface (may have an impact on the base model), e.g.
  - Information: ask for value matrix
  - Location: USA
  - Technology / physical storage: HDFS / MDF4…
- Driver implementation

Here, SPARK is used as a middleware / umbrella technology.
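
The three example fields of the request interface listed above can be pictured as a small record. The following is a minimal sketch only: the class name `BigOdsRequest`, its field names, and the JSON serialization are hypothetical and not part of any ODS definition.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BigOdsRequest:
    """Hypothetical request record; all names are illustrative only."""

    information: str  # what is asked for, e.g. a value matrix
    location: str     # where the data lives, e.g. a cluster site
    technology: str   # technology / physical storage holding the data

    def to_json(self) -> str:
        """Serialize the request so it could be handed to a driver."""
        return json.dumps(asdict(self), sort_keys=True)


# Example values follow the slide; the exact strings are assumptions.
request = BigOdsRequest(
    information="value_matrix",
    location="USA",
    technology="HDFS/MDF4",
)
print(request.to_json())
```

Keeping the request free of any storage-specific detail beyond a technology name is what would leave the actual retrieval logic to the driver.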


Prototype Development
Using middleware technologies like SPARK

SPARK is a processing engine. It processes and distributes work on a logical level, “independent” of the physical storage:
- Where is the information (cluster location)?
- Who has the information (technology)?

How does it work?
- The ODS Server sends a request (“order”)
- A “job”, a pre-defined execution box, processes an order
- “Apps” containing one or multiple jobs are executed
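
The relation between order, job, and app described above can be sketched in plain Python. This is an illustration of the concept only and does not use the SPARK API; the names `Order`, `App`, `location_job`, and `technology_job` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# An order is what the ODS Server sends; here simply a dictionary of strings.
Order = Dict[str, str]
# A job is a pre-defined unit that processes one order.
Job = Callable[[Order], str]


def location_job(order: Order) -> str:
    """Answers 'where is the information?' for an order."""
    return f"location={order['location']}"


def technology_job(order: Order) -> str:
    """Answers 'who has the information?' for an order."""
    return f"technology={order['technology']}"


@dataclass
class App:
    """An app bundles one or multiple jobs and runs them for an order."""

    name: str
    jobs: List[Job] = field(default_factory=list)

    def execute(self, order: Order) -> List[str]:
        return [job(order) for job in self.jobs]


order = {"information": "value_matrix", "location": "USA", "technology": "HDFS"}
app = App(name="lookup", jobs=[location_job, technology_job])
results = app.execute(order)
print(results)
```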


Prototype Development
Objective 2: Defining Jobs as part of the ODS interface(?)

Part of the interface definition may be the definition of what a job is. A job depends on:
- the order (ODS; what to retrieve?)
- the big data technology / physical storage used (here: SPARK; how to retrieve?)

Top-level technologies (like SPARK) need to be defined / supported.


Prototype Development
Processing tasks

YARN is a resource manager. It processes and distributes work on a physical level:
- CPU / memory
- Workload distribution (within the cluster)
- Supports major physical storage technologies

How does it work?
- SPARK apps are executed as tasks
- Tasks are outsourced to the cluster and executed


Prototype Development
Objective 3: Is a physical storage definition required?

Big Data technologies are used as implemented at the customer (HDFS, HBASE, MongoDB, SOLR, …).

- There is (probably) no definition of physical storage
- Each technology needs a new job definition based on the “order” (each job has one derivation per technology / physical storage)

How does it work?
- The information is retrieved from the physical storage (which is defined only in the job)
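
The idea of one job derivation per technology can be sketched as a base class with one subclass per physical storage. The code below is a hedged illustration: all class and function names are hypothetical, and the `run` methods return placeholder strings instead of accessing HDFS or HBASE.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class ValueMatrixJob(ABC):
    """One order type; each subclass is the derivation for one storage technology."""

    @abstractmethod
    def run(self, measurement_id: str) -> str:
        ...


class HdfsValueMatrixJob(ValueMatrixJob):
    def run(self, measurement_id: str) -> str:
        # A real job would read from HDFS here; this only names the source.
        return f"hdfs:{measurement_id}"


class HBaseValueMatrixJob(ValueMatrixJob):
    def run(self, measurement_id: str) -> str:
        # A real job would query HBASE here; this only names the source.
        return f"hbase:{measurement_id}"


# The physical storage is known only to the job derivation selected here.
JOBS: Dict[str, Type[ValueMatrixJob]] = {
    "HDFS": HdfsValueMatrixJob,
    "HBASE": HBaseValueMatrixJob,
}


def dispatch(technology: str, measurement_id: str) -> str:
    """Pick the job derivation for a technology and run it."""
    try:
        job_class = JOBS[technology]
    except KeyError:
        raise ValueError(f"no job derivation for technology {technology!r}") from None
    return job_class().run(measurement_id)


print(dispatch("HDFS", "m1"))
print(dispatch("HBASE", "m1"))
```

Supporting a further technology would then mean adding one subclass and one registry entry, while the order itself stays unchanged.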


Prototype Development
Conclusions

- The big data technology zoo can be managed with assumptions on a middleware technology
- There will not be “a” (as in one) solution
- Solutions will depend on the use-cases and technologies used


Prototype Development
Conclusion: Performance and Scalability

- e.g. Avalon Distributor (horizontal and vertical)
- Indexer (Notification Server and ODS API Security!)
- This is a cluster. Get more clusters.
- Ramp up your cluster.
- Enhance disk space.
- Index / pre-process for on-demand performance (with the technologies wanted)
- ElasticSearch?



HighQSoft Approach
General Ideas

- Import data from various formats (MDF, DBC, …) and from distributed sites
- Provide interfaces to typical data analytics and visualization tools
- Support state-of-the-art security (today ODS has its own role models etc. plus LDAP)
- Migrate ODS 5.3 to a big data solution (this shall be transparent for the client)

- Understand your project (analyze measurement data) -> status quo
- Understand your organization (from measurement events to ODS “project” data) -> big data challenge
- Sizes: #channels ~ 10^5, #measurements/day ~ 10^5

Aliases for channel names etc. (v_fz vfz -> v_veh velocity)
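
The channel-name aliasing mentioned above could look like the following minimal sketch. It assumes that `v_fz` and `vfz` are two spellings of the same velocity channel and that `v_veh` is the canonical name; the table, the function names, and the extra channel `n_mot` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical alias table: both spellings are assumed to name the same channel.
CHANNEL_ALIASES: Dict[str, str] = {
    "v_fz": "v_veh",
    "vfz": "v_veh",
}


def canonical_channel_name(name: str) -> str:
    """Return the canonical name, or the name itself if no alias is known."""
    return CHANNEL_ALIASES.get(name, name)


def group_by_canonical_name(channels: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Merge the values of channels that share a canonical name."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for name, values in channels.items():
        grouped[canonical_channel_name(name)].extend(values)
    return dict(grouped)


merged = group_by_canonical_name({"v_fz": [10.0], "vfz": [12.0], "n_mot": [900.0]})
print(merged)
```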


HighQSoft Approach
Solution Architecture

[Diagram: HighQSoft (Avalon ODS Server, ATHOS, Driver, CORBA / ODS 6.0) and Big Data Cluster / Partner (Web Service, SPARK, Future DB, MQ | SM | …, V | T | L | …, … ?)]

The ODS Server remains as ODS gateway / security.

The Driver is independent of partner / project and connects to big data:
- Use-Case 1: write
- Use-Case 2: read
- Use-Case 3: stream (read)

The cluster is set up by the partner / customer according to the use-case.

HighQSoft requirements are to be implemented. The Web Service is independent of partner / project, but contains specific “jobs” (see use-case).

Big Data contains measured values, quantities, and units (utilized).

The data format is to be defined (standardized?).
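
The driver use-cases named above (write, read, stream) can be sketched as an abstract interface with one method per use-case. This is an assumption-laden illustration: `BigDataDriver`, `InMemoryDriver`, and all method signatures are hypothetical, and the in-memory store merely stands in for a big data cluster.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List


class BigDataDriver(ABC):
    """Hypothetical driver interface with one method per use-case."""

    @abstractmethod
    def write(self, channel: str, values: List[float]) -> None:
        ...

    @abstractmethod
    def read(self, channel: str) -> List[float]:
        ...

    @abstractmethod
    def stream(self, channel: str, chunk_size: int) -> Iterator[List[float]]:
        ...


class InMemoryDriver(BigDataDriver):
    """Stand-in implementation that keeps all values in a dictionary."""

    def __init__(self) -> None:
        self._store: Dict[str, List[float]] = {}

    def write(self, channel: str, values: List[float]) -> None:
        self._store.setdefault(channel, []).extend(values)

    def read(self, channel: str) -> List[float]:
        return list(self._store.get(channel, []))

    def stream(self, channel: str, chunk_size: int) -> Iterator[List[float]]:
        # Yield the stored values in chunks instead of returning them at once.
        values = self._store.get(channel, [])
        for start in range(0, len(values), chunk_size):
            yield values[start:start + chunk_size]


driver = InMemoryDriver()
driver.write("v_veh", [0.0, 1.5, 3.0])
print(driver.read("v_veh"))
print(list(driver.stream("v_veh", chunk_size=2)))
```

Because the interface carries no partner- or project-specific detail, a cluster-backed implementation could replace the in-memory one without changes on the ODS Server side.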


HighQSoft Approach
Solution Architecture

[Diagram: Big Data Cluster / Partner (Web Service, SPARK, Future DB, MQ | SM | …, V | T | L | …, … ?) with File XY Parser / Importer and File XY Generator]

Specific file formats: when data is itemized, files
- need to be parsed for import (big data import, notification to ODS)
- need to be (partly) generated for file-based third-party tools



Thanks

