Big Data ODS - HighQSoft ODS Server ODS Client ODS API 6.0 Definition ODS 6.0 Technology Objectives


  • date post

    03-Aug-2020
  • Category

    Documents


  • Big Data ODS: Setting up a prototype

    HighQSoft GmbH | www.highqsoft.de | 22.12.2014


  • Performance and Scalability

    Topics

      1. Why Big Data?
      2. General Overview
      3. HighQSoft Approach
      4. Summary

  • What is the ODS 6.0 Proposal?

    Overview

    [Diagram: ODS Client; ODS Server; ODS API Definition / CORBA Technology; Physical Storage: Meta-Data; Physical Storage: Mass-Data]

    Objectives of the proposal:
      • Rework of the ODS API Definition (streamlining in focus; may include enhancements, e.g. web services)
      • Replacement of the CORBA technology

  • What is Big Data Integration?

    Overview

    [Diagram: ODS Client; ODS Server; ODS API 6.0 Definition / ODS 6.0 Technology; Big Data Zoo; Big Data Enhancements]

    Objectives of the proposal:
      • Integration of Big Data
      • Enhancement of the ASAM Base Model
      • Enhancement of the ODS Server functionality
      • Definition of an interface to Big Data

    The proposal is independent of the current ODS 6.0 proposal. It also covers other areas.

  • Why Big Data?

    Overview

    The data produced is growing to a large scale in volume. Some MDF4.1 measurement files already cannot be integrated into ODS efficiently:
      • In terms of ODS, the files can contain a great number of external components (millions per file).
      • The files are too large to move around (server to client).

    There are limits within Oracle:
      • 1*10^50 entries is a limit.
      • With 3*10^8 entries, a “select * from table where id = xy” takes 30 seconds (no indexing).
      • The latency grows linearly.

    We want to handle 1000 vehicles with 100 measurements with 10^3 to 10^5 channels a day (2*10^10 / year).
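    The linear growth of an unindexed lookup can be illustrated with a small sketch. It uses SQLite from the Python standard library as a stand-in (not Oracle), and the table and index names are assumptions made for illustration. The query plan shows a full table scan without an index and an index search once one exists.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return the plan SQLite reports for a statement, as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return "; ".join(row[-1] for row in rows)

# Hypothetical table; "id" is deliberately not a primary key, so no index exists.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurement (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?)",
    ((i, float(i)) for i in range(10_000)),
)

lookup = "SELECT * FROM measurement WHERE id = 4711"
plan_without_index = query_plan(conn, lookup)  # every row is visited

conn.execute("CREATE INDEX idx_measurement_id ON measurement (id)")
plan_with_index = query_plan(conn, lookup)     # the index is searched instead

print(plan_without_index)
print(plan_with_index)
```

    A full scan touches every row, so its cost grows with the table size; an index search does not, which is why the 30 seconds quoted above are tied to "no indexing".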


  • Prototype Development

    The ODS Setup

    [Diagram: Database; ODS Server; SPARK; YARN; four TASK / CPU units; storage: HDFS (USA), HBASE (USA), SOLR / … (USA)]

    The general ODS setup remains much the same:
      • The ODS Server remains as an organizing entity.
      • A database (not Oracle) will still be required.

    Environment, Security, “Big Data Configuration” (Catalogs)

  • Prototype Development

    Objective 1: Defining a BIG ODS interface

    The integration of BIG ODS requires:
      • Definition of an ODS request interface (may have an impact on the base model), e.g.
          Information: ask for value matrix
          Location: USA
          Technology / Physical Storage: HDFS / MDF4 …
      • Driver implementation

    Here, SPARK is used as a middleware / umbrella technology.
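    The request described above can be sketched as a small data structure. This is only an illustration: the class name `OdsRequest` and its field names are assumptions and not part of any ODS definition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OdsRequest:
    """Hypothetical request ("order") an ODS Server could hand to a big data driver."""
    information: str     # what is asked for, e.g. a value matrix
    location: str        # where the data lives, e.g. a cluster site
    technology: str      # big data technology holding the data
    storage_format: str  # physical storage format of the files

    def describe(self) -> str:
        return (f"{self.information} from {self.technology}/{self.storage_format} "
                f"at {self.location}")

# The example values are taken from the slide: value matrix, USA, HDFS / MDF4.
request = OdsRequest(
    information="value_matrix",
    location="USA",
    technology="HDFS",
    storage_format="MDF4",
)
print(request.describe())
```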

  • Prototype Development

    Using middleware technologies like SPARK

    SPARK is a processing engine. It processes and distributes work on a logical level, “independent” of the physical storage:
      • Where is the information (cluster location)?
      • Who has the information (technology)?

    How does it work?
      • The ODS Server sends a request (an “order”).
      • A “job”, a pre-defined execution box, processes an order.
      • “Apps” containing one or multiple jobs are executed.
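    The order / job / app relationship can be sketched in plain Python. This does not use the SPARK API; `Order`, `Job`, `App` and the two example jobs are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

Order = dict                   # hypothetical: a request sent by the ODS Server
Job = Callable[[Order], dict]  # a pre-defined execution box for one order

@dataclass
class App:
    """Hypothetical container that runs one or more jobs for an order."""
    name: str
    jobs: list[Job] = field(default_factory=list)

    def execute(self, order: Order) -> list[dict]:
        # Each job processes the same order and contributes one result.
        return [job(order) for job in self.jobs]

def locate(order: Order) -> dict:
    # Answers "where is the information?"
    return {"step": "locate", "location": order["location"]}

def fetch(order: Order) -> dict:
    # Answers "what is to be retrieved?"
    return {"step": "fetch", "information": order["information"]}

app = App(name="value-matrix-app", jobs=[locate, fetch])
results = app.execute({"information": "value matrix", "location": "USA"})
print(results)
```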

  • Prototype Development

    Objective 2: Defining Jobs as part of the ODS interface(?)

    Part of the interface definition may be the definition of what a job is. A job depends on:
      • the order (ODS: what to retrieve?)
      • the big data technology / physical storage used (here: SPARK; how to retrieve?)

    Top-level technologies (like SPARK) need to be defined / supported.

  • Prototype Development

    Processing tasks

    YARN is a resource manager. It processes and distributes work on a physical level:
      • CPU / memory
      • Workload distribution (within the cluster)
      • Supports major physical storage technologies

    How does it work?
      • SPARK apps are executed as tasks.
      • Tasks are outsourced and executed.
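    The idea of handing tasks to free workers can be sketched with a thread pool from the Python standard library. The pool is only a stand-in for a resource manager and is not YARN; the channel names and values are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def run_task(channel_values: list[float]) -> float:
    """One task: compute the mean of one channel's values."""
    return sum(channel_values) / len(channel_values)

# Hypothetical app: one task per channel.
channels = {
    "v_veh": [10.0, 20.0, 30.0],
    "n_eng": [1000.0, 3000.0],
    "t_oil": [80.0, 90.0, 100.0, 110.0],
}

# The pool stands in for the resource manager: it hands tasks to free workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in input order, so they can be zipped with the names.
    means = dict(zip(channels, pool.map(run_task, channels.values())))

print(means)
```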

  • Prototype Development

    Objective 3: Is a physical storage definition required?

    Big data technologies are used as implemented at the customer (HDFS, HBASE, MongoDB, SOLR, …).
      • There is (probably) no definition of physical storage.
      • Each technology needs a new job definition based on the “order” (each job has one derivation per technology / physical storage).

    How does it work?
      • The information is retrieved from the physical storage (which is defined only in the job).
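    "One derivation per technology / physical storage" can be sketched as one abstract job with a subclass per storage. All class and function names are hypothetical, and the methods only return a description instead of accessing a real cluster.

```python
from abc import ABC, abstractmethod

class ValueMatrixJob(ABC):
    """Hypothetical job for the order "ask for value matrix"."""

    @abstractmethod
    def retrieve(self, measurement_id: int) -> str:
        """Describe how this derivation would read from its storage."""

class HdfsValueMatrixJob(ValueMatrixJob):
    def retrieve(self, measurement_id: int) -> str:
        # Only the job knows that the data sits in files on HDFS.
        return f"read file for measurement {measurement_id} from HDFS"

class HBaseValueMatrixJob(ValueMatrixJob):
    def retrieve(self, measurement_id: int) -> str:
        # Only the job knows that the data sits in an HBASE table.
        return f"scan rows for measurement {measurement_id} in HBASE"

# One derivation per technology / physical storage.
JOBS = {"HDFS": HdfsValueMatrixJob(), "HBASE": HBaseValueMatrixJob()}

def process(technology: str, measurement_id: int) -> str:
    """Pick the derivation for the technology; the caller never sees the storage."""
    return JOBS[technology].retrieve(measurement_id)

print(process("HDFS", 42))
```

    Supporting a further technology then means adding one more subclass and one more entry in `JOBS`, while the order itself stays unchanged.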

  • Prototype Development

    Conclusions

      • The big data technology zoo can be managed by making assumptions about a middleware technology.
      • There will not be “a” (as in one) solution.
      • Solutions will depend on the use cases and the technologies used.

  • Prototype Development

    Conclusion: Performance and Scalability

    e.g. Avalon Distributor (horizontal and vertical), Indexer (Notification Server and ODS API Security!)

    This is a cluster. Get more clusters.

    Ramp up your cluster.

    Enhance disk space. Index / pre-process for on-demand performance (with the technologies wanted).

    Elastic Search?


  • HighQSoft Approach

    General Ideas

      • Import data from various formats (MDF, DBC, …) and from distributed sites.
      • Provide interfaces to typical data analytics and visualization tools.
      • Support state-of-the-art security (today ODS has its own role models etc. plus LDAP).
      • Migrate from ODS 5.3 to a big data solution (this shall be transparent for the client).

      • Understand your project (analyze measurement data) → status quo
      • Understand your organization (from measurement events to ODS “project” data) → big data challenge
      • Sizes: #channels ~ 10^5, #measurements/day ~ 10^5
      • Aliases for channel names etc. (v_fz, vfz -> v_veh, velocity)
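    Alias handling for channel names can be sketched as a simple lookup. The alias table below is an assumption derived from the example on the slide (it treats "v_fz" and "vfz" as aliases of "v_veh"); a real table would come from the project.

```python
# Assumed alias table: "v_fz" and "vfz" are treated as aliases of "v_veh".
ALIASES = {"v_fz": "v_veh", "vfz": "v_veh"}

def canonical_name(channel: str) -> str:
    """Map a channel name to its canonical form; unknown names pass through."""
    key = channel.strip().lower()
    return ALIASES.get(key, key)

# "n_eng" is a made-up channel without an alias entry.
names = [canonical_name(n) for n in ["v_fz", "VFZ", "v_veh", "n_eng"]]
print(names)
```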

  • HighQSoft Approach

    Solution Architecture

    [Diagram: ATHOS; CORBA / ODS 6.0; Avalon ODS Server; Driver; Web Service; SPARK; Future DB; MQ | SM | .; V | T | L | …; … ?; regions labeled “HighQSoft” and “Big Data Cluster / Partner”]

      • The ODS Server remains as ODS gateway / security.
      • The Driver is independent of partner / project and connects to big data.
          Use-Case 1: write
          Use-Case 2: read
          Use-Case 2: stream (read)
      • The cluster is set up by the partner / customer according to the use case. HighQSoft requirements are to be implemented.
      • The Web Service is independent of partner / project, but contains specific “jobs” (see use case).
      • Big Data contains measured values as well as quantities and units (utilized).
      • The data format is to be defined (standardized?).
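    The three driver use cases (write, read, stream) can be sketched as one small interface. `InMemoryDriver` is a hypothetical name, and a dictionary stands in for the big data cluster; this is not the actual driver.

```python
from typing import Iterator

class InMemoryDriver:
    """Hypothetical driver; a dict stands in for the big data cluster."""

    def __init__(self) -> None:
        self._channels: dict[str, list[float]] = {}

    def write(self, channel: str, values: list[float]) -> None:
        # Use case "write": append measured values to a channel.
        self._channels.setdefault(channel, []).extend(values)

    def read(self, channel: str) -> list[float]:
        # Use case "read": return all values of a channel at once.
        return list(self._channels.get(channel, []))

    def stream(self, channel: str, chunk_size: int = 2) -> Iterator[list[float]]:
        # Use case "stream (read)": yield the values chunk by chunk.
        values = self._channels.get(channel, [])
        for start in range(0, len(values), chunk_size):
            yield values[start:start + chunk_size]

driver = InMemoryDriver()
driver.write("v_veh", [10.0, 20.0, 30.0])
chunks = list(driver.stream("v_veh"))
print(driver.read("v_veh"), chunks)
```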

  • HighQSoft Approach

    Solution Architecture

    [Diagram: Big Data Cluster / Partner; Web Service; SPARK; Future DB; MQ | SM | .; V | T | L | …; … ?; File XY Generator; File XY Parser / Importer]

    Specific file formats: when data is itemized, files
      • need to be parsed for import (big data import, notification to ODS)
      • need to be (partly) generated for file-based third-party tools

  • Thanks