Managing Big Data using New Innovations with HPCC Systems
Bob Foreman – Senior Software Engineer/ECL Instructor
Twitter: #ATO2017 #HPCCSystems


Welcome!

• HPCC Systems has been open source since June 2011.

• Although the base technology has remained consistent, the last 6 years have seen many new support technologies unfold.

• These technologies have enhanced and extended the base technology, and HPCC Systems remains ahead of the curve with these new innovations.

• We will look at many of them in this presentation.


Today’s Agenda:

• Quick intro on the platform and history before 2011
• Open Source in 2011
• Machine Learning, February 2012 - 2017: many changes
• Continuous updates and improvements in speed and compiler power
• Changes in the ECL Watch (Version 5 and 6)
• ECL Playground
• New services, like WSSQL
• Plugin support (Ganglia, Nagios, Kafka, Security Manager, etc.)
• EMBED support - new feature for EMBED
• KEL lite
• Looking ahead to Version 7


History of HPCC Systems (High Performance Computing Cluster)

Open sourcing a long-established big data strategy

Why Does HPCC Systems Exist?

• It was NOT developed with the idea of selling the technology to anybody else!

• It was all created only to solve some of the data-handling problems that we encountered as we were developing our products.


The Result Of All That Development?

HPCC Systems
A single, fully-integrated platform supporting the entire life cycle of Big Data product development:

• Raw Data Ingest – Thor
• Data Transformation to Product – Thor
• End-user Query Development – Thor
• End-user Query Delivery – ROXIE


The Complete Big Data Value Chain

• Collection – collecting structured, unstructured and semi-structured data

• Ingestion – consuming vast amounts of data, including extraction, transformation and loading

• Discovery & Cleansing - clean up, formatting and statistical analysis of the data

• Integration – linking, indexing and data fusion

• Analysis – statistics and machine learning

• Delivery – querying, visualization, redundancy, and enterprise-class availability


HPCC Systems Platform

There are two types of clusters in HPCC Systems:

• Data Refinery (THOR) – Processes every one of billions of records in order to create billions of "improved" records – runs one job at a time.

• Rapid Data Delivery Engine (ROXIE) – Searches quickly for a particular record or set of records – handles thousands of concurrent transactions per second.

• Both are tightly coupled to the infrastructure that supports their operation, and the ECL programming language that defines the work done on them.
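The division of labor between the two cluster types can be sketched in a few lines of Python. This is only an illustration of the idea, not HPCC Systems code: the record layout, the `refine` and `query` functions, and the in-memory dictionary standing in for an index are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical raw records; the field names are invented for this sketch.
raw_records = [
    {"id": 1, "name": " alice smith ", "state": "fl"},
    {"id": 2, "name": "BOB JONES", "state": "ga"},
    {"id": 3, "name": "Carol White", "state": "FL"},
]

def refine(record):
    """Thor-style step: clean one record; applied to every record in the batch."""
    return {
        "id": record["id"],
        "name": " ".join(record["name"].split()).title(),
        "state": record["state"].upper(),
    }

def build_index(records, key):
    """Build a lookup structure once, at the end of the batch job."""
    index = defaultdict(list)
    for record in records:
        index[record[key]].append(record)
    return dict(index)

def query(index, value):
    """ROXIE-style step: answer one request from the prebuilt index, no full scan."""
    return index.get(value, [])

improved = [refine(r) for r in raw_records]   # touches every record
by_state = build_index(improved, "state")     # built once, then read-only

print([r["name"] for r in query(by_state, "FL")])
# prints ['Alice Smith', 'Carol White']
```

The batch side pays the cost of visiting every record once; the delivery side then serves many small requests against data that no longer changes.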


Data Flow Oriented Big Data Platform

[Diagram labels: Raw data from several sources; Batch Subscribers; Portal; ESP Middleware Services; Batch requests for scoring and analytics; Batch Processed Data]

Thor
• Shared Nothing MPP Architecture
• Commodity Hardware
• Batch ETL and Analytics

ECL
• Easy to use
• Implicitly Parallel
• Compiles to C++

ROXIE
• Shared Nothing MPP Architecture
• Commodity Hardware
• Real-time Index-Based Query
• Low Latency, Highly Concurrent and Highly Redundant


Thor – The Batch Processing Analytics Engine


Massively Parallel Extract Transform and Load (ETL) engine
• Built from the ground up as a parallel data environment.
• Leverages inexpensive locally attached storage.
• Doesn’t require a SAN infrastructure.

Enables data integration on a scale not previously available
• Current LexisNexis person data build process generates 350 billion intermediate results at peak.

Suitable for:
• Massive joins/merges
• Massive sorts and transformations
• Any N² problem
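To make the "N² problem" concrete, here is a minimal Python sketch of an all-pairs comparison, where every record is checked against every other record. The names, the `normalized` helper, and the exact-match rule are hypothetical; a real linking job would use a fuzzier comparison, which is precisely why each pair has to be examined.

```python
from itertools import combinations

def normalized(name):
    """Lower-case and strip punctuation/extra spaces so near-duplicates compare equal."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

def matching_pairs(names):
    """Compare every record with every other record: n * (n - 1) / 2 comparisons."""
    comparisons = 0
    matches = []
    for left, right in combinations(names, 2):
        comparisons += 1
        if normalized(left) == normalized(right):
            matches.append((left, right))
    return comparisons, matches

names = ["J. Smith", "j smith", "Ann Lee", "ANN  LEE", "Bo Chan"]
comparisons, matches = matching_pairs(names)
print(comparisons)  # 10 comparisons for 5 records
print(matches)
```

Five records need 10 comparisons; a million records need roughly 500 billion, which is the kind of growth that calls for a parallel engine.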


ROXIE – The Real-Time Analytics Delivery Engine


A massively parallel, high throughput, structured query response engine.

Ultra fast due to its read-only nature.

Allows indices to be built onto data for efficient multi-user retrieval of data.

Suitable for:
• Volumes of structured queries
• Full text ranked Boolean search


HPCC Systems Hardware

• Clusters of commercial off-the-shelf (COTS) components. Components are ideally homogeneous (all processing and disk storage components the same) and the system is tightly coupled.

• Nodes are managed en masse instead of individually, which allows coordinated processing like global sorts (unlike Grid systems).


Thor Cluster

• Brute force: Thor operates on massive amounts of data where datasets typically contain billions of records

• Open Data Model: The data model is defined by the user, not constrained by the limitations of a strict key-value paradigm

• Scalable: Horizontally linear scalability provides room to accommodate future data and performance growth

• Truly parallel: Datagraph nodes can be processed in parallel as data seamlessly flows through them, effectively avoiding the well-known “long tail problem”, resulting in higher and predictable performance.

• Powerful optimizer: The HPCC Systems optimizer ensures submitted ECL code executes at the maximum possible speed for the underlying hardware. Advanced techniques such as lazy execution and code reordering are thoroughly utilized to maximize performance


ROXIE Cluster

• Low latency: Data queries typically complete sub-second

• Not a key-value store: ROXIE is not limited by the constraints of key-value data stores, allowing for complex queries, multi-key retrieval, fuzzy matching and more

• Highly available: ROXIE operates in critical environments under the most rigorous service level requirements

• Scalable: Horizontally linear scalability provides room to accommodate future data and performance growth

• Highly concurrent: In a typical environment, thousands of concurrent clients can be simultaneously executing transactions on the same ROXIE system

• Redundant: A shared-nothing architecture with no single point of failure provides extreme fault tolerance


HPCC Systems Platform

• Batteries included: All components create a consistent and homogeneous platform

• Over 15 years of experience: The HPCC Systems platform is the technology underpinning LexisNexis data offerings – its development began in 1999

• Few moving parts: HPCC Systems is an integrated solution extending across the entire data lifecycle, from data ingest and transformation to data delivery – no third party tools needed

• Multiple data formats: Supported out of the box, including fixed and variable length, delimited records, and XML

• ECL inside: One language describes both the data transformations in Thor and the data delivery strategies in ROXIE. Solutions to complex data problems are expressed easily and directly in terms of high level ECL primitives.

• Consistent tools: Thor and ROXIE share the same set of tools, which provides consistency across the platform.


Data on HPCC Systems

• Open Data Model: The data model is defined by the user, as standard files, records, and fields (tables, rows, and columns)

• Simple: Solutions to complex data problems can be expressed easily and directly in terms of high level ECL primitives

• Implicitly parallel: Data is always in distributed datasets whose parts are managed by the DFU, eliminating the need for programmers to manage the complexity of working with distributed datasets


Data on HPCC Systems

• Data is stored in ISAM Files
• Native support for:
o Flat files, with fixed or variable-length records
o CSV-type files (any delimiters may be used)
o XML datasets
o New JSON format support
• Each Record is always whole and complete on a single node
• A Record may have as many fields as needed
• Indexes are always LZW compressed and may contain “payload” fields in addition to search terms
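Two of the points above, whole records living on a single node and indexes carrying "payload" fields, can be illustrated with a small Python sketch. Everything here is an assumption made for the example (the node count, the CRC-based placement, the field names); it does not describe how the platform actually places records or lays out its index files.

```python
import zlib

NODE_COUNT = 3  # hypothetical cluster size for this sketch

def node_for(key):
    """Pick a node from a stable hash of the key (crc32 is deterministic across runs)."""
    return zlib.crc32(str(key).encode("utf-8")) % NODE_COUNT

def distribute(records, key_field):
    """Place each record, whole, on exactly one node."""
    nodes = [[] for _ in range(NODE_COUNT)]
    for record in records:
        nodes[node_for(record[key_field])].append(record)
    return nodes

def build_payload_index(records, key_field, payload_fields):
    """Map each search key to a small payload, so common lookups skip the full record."""
    return {
        record[key_field]: {field: record[field] for field in payload_fields}
        for record in records
    }

people = [
    {"id": 101, "name": "Alice", "city": "Miami", "notes": "long text ..."},
    {"id": 102, "name": "Bob", "city": "Atlanta", "notes": "long text ..."},
    {"id": 103, "name": "Carol", "city": "Raleigh", "notes": "long text ..."},
]

nodes = distribute(people, "id")
index = build_payload_index(people, "id", ["name", "city"])

print(sum(len(part) for part in nodes))  # every record lands on exactly one node
print(index[102])
```

A query that only needs the name and city for a given id can be answered from the index entry alone, without fetching the full record.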


What is ECL? (Enterprise Control Language)

• Declarative programming language: “Describes what needs to be done, not how to do it”

• Powerful: Unlike Java, high level primitives such as JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available. Higher level code means fewer programmers and shorter time to deliver complete projects

• Extensible: As new definitions are created, they become primitives that other programmers can use

• Implicitly parallel: Parallelism is built into the underlying platform. The programmer need not be concerned with it

• Maintainable: A high level programming language with no side effects and encapsulated definitions makes for more succinct, reliable and easier to troubleshoot code

• Complete: Unlike Pig and Hive, ECL provides for a complete programming paradigm.

• Homogeneous: One language to express data algorithms across the entire HPCC Systems platform, including data ETL and delivery.
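The "extensible" and "no side effects" points can be approximated in Python, although this is an analogy rather than ECL: each definition below is a pure function over a list of records, and a new definition is composed from earlier ones. The function names and sample data are hypothetical.

```python
from operator import itemgetter

def adults(records):
    """Filter definition: keeps matching records, leaves the input untouched."""
    return [r for r in records if r["age"] >= 18]

def sorted_by_name(records):
    """Sort definition: returns a new ordered list."""
    return sorted(records, key=itemgetter("last", "first"))

def name_only(records):
    """Projection definition: keeps only the name fields."""
    return [{"first": r["first"], "last": r["last"]} for r in records]

def adult_roster(records):
    """A new definition composed from earlier ones, itself reusable by others."""
    return name_only(sorted_by_name(adults(records)))

people = [
    {"first": "Zed", "last": "Adams", "age": 40},
    {"first": "Amy", "last": "Adams", "age": 17},
    {"first": "Bo", "last": "Chan", "age": 30},
    {"first": "Al", "last": "Adams", "age": 22},
]

print(adult_roster(people))
print(len(people))  # the source data is unchanged: still 4 records
```

Because no definition modifies its input, the steps can be reordered or reused freely, which is the property the slide credits for easier troubleshooting.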


ECL – The Data Flow Oriented Programming Language


• An easy to use, data-centric programming language optimized for large-scale data management and query processing

• Highly efficient — automatically distributes workload across all nodes

• 80% more efficient than C++, Java and SQL: 1/3 reduction in programmer time to maintain/enhance existing applications

• Benchmark against SQL (5 times more efficient) for code generation

• Automatic parallelization and synchronization of sequential algorithms for parallel and distributed processing

• Large library of built-in modules to handle common data manipulation tasks

Declarative programming language … powerful, extensible, implicitly parallel, maintainable, complete and homogeneous


Machine Learning


Machine Learning and HPCC Systems

• The HPCC Machine Learning Library contains an extensible collection of machine learning routines which are easy and efficient to use and are designed to execute in parallel across a cluster.

• In 2012 the first set of modules was released:
o Associations (ML.Associate)
o Classify (ML.Classify)
o Cluster (ML.Cluster)
o Correlations (ML.Correlate)
o Discretize (ML.Discretize)
o Distribution (ML.Distribution)
o Field Aggregates (ML.FieldAggregates)
o Regression (ML.Regression)
o Visualization (ML.VL)

• https://hpccsystems.com/download/free-modules/ecl-ml


Machine Learning and HPCC Systems

• In 2017, there are several new ML algorithms, some already implemented and others under development.

• These algorithms now use the ECL bundle technology.
o PBBlas (parallel block basic linear algebra subprograms)
o Time Series (TS)
o Neural Networks
o Deep Learning
o Ensemble
o NFold Cross Validation
o Population Estimate
o LDA (Linear Discriminant Analysis)
o LSA (Latent Semantic Analysis)
o StepwiseLogistic
o SVM (Support Vector Machine)

• https://github.com/hpcc-systems/ecl-ml


Internal Updates and Improvements


HPCC Version 6.x Internal Improvements

• Virtual Slave THOR (better sharing of resources – e.g. RAM)
• Parallel Activity Execution (taking advantage of multiple CPU cores)
• Affinity support in Thor (binding a slave process to a single CPU socket)
• Optimized merge sort for large number of cores
• LZ4 compression for temporary files
• Refresh Boolean option on persist
• Parallel child query execution in Thor
• Memory management improvements
• Lookup JOINS in child queries
• Compiler optimization
• Improved INDEX reads on THOR and ROXIE
• Enhanced Performance Test Suite

References:
https://hpccsystems.com/resources/blog/lchapman/hpcc-systems-60x-feature-highlights-part-1
https://hpccsystems.com/resources/blog/lchapman/hpcc-systems-60x-feature-highlights-part-2
https://hpccsystems.com/resources/blog/lchapman/hpcc-systems-62x-here-whats-it-you
https://hpccsystems.com/blog/performance_improvements_640


ECL Watch (Version 5 and 6)


ECL Watch

• Long awaited face lift in Version 5.0
• Upgrade was completed in Version 6.0 with even more features
• Ability to spray multiple files of same type with one click
• New File Uploader
• Hex Previewer
• Enhanced filtering throughout
• Improved Query Viewer, including Package Maps
• New Plug-in interface
• Improved Workunit Graphs
• Built in Visualization

References:

https://www.youtube.com/watch?v=fupH_to2i84#action=share

https://www.youtube.com/watch?v=wm4xtNsR4bA#action=share


ECL Playground


ECL Playground

• New ESP web service

References:

http://cdn.hpccsystems.com/releases/CE-Candidate-6.4.2/docs/ECL_Playground-6.4.2-1.pdf

http://cdn.hpccsystems.com/podcasts/2012_0904_v1_ECL_Playground.mp3


WSSQL


WSSQL

• Add-on service that provides an SQL interface into HPCC Systems

• Submit SQL queries directly to HPCC via SOAP

• Access HPCC data files and published queries

• Analyze HPCC data using familiar SQL syntax

• Supports SQL SELECT or CALL syntax
o Access HPCC data files as DB Tables
o Access published queries as DB Stored Procedures

• Supports SQL Create and Load Syntax

• Harnesses the full power of HPCC under the covers
o Submitted SQL request generates ECL code which is submitted, compiled, and executed on your target cluster
o Automatic Index fetching capabilities for quicker data fetches

• Creates entry-point for programmatic data access

• Leverage HPCC data without need to learn and write ECL!
o Opens the door for non ECL programmers to access HPCC data.

References:

http://cdn.hpccsystems.com/releases/CE-Candidate-6.4.0/docs/WsSQL_ESP_Web_Service_Users_Guide-6.4.0-1.pdf

https://hpccsystems.com/download/free-modules/WSSQL


Plugins!


Library and Datastore PlugIns

• New Plugin interface with ECL Watch

• Built ins (Debug, File Services)

• Audit and Logging

• dMetaphone (double metaphone)

• Apache Kafka

• Security Manager

• Redis

• Memcached

References:
https://github.com/hpcc-systems/HPCC-Platform/tree/master/plugins

Embedded Language PlugIns

• C++

• R Integration

• Couchbase

• Java

• JavaScript

• MySQL

• Python/Python 3

• SQLite3

• Cassandra

• AWS SQS (Simple Queue Service)

References:
https://hpccsystems.com/resources/blog/lchapman/using-your-favorite-language-or-data-source-hpcc-systems
https://hpccsystems.com/resources/blog/richardkchapman/projecting-fields-embeds
Use and abuse of the EMBED feature: https://hpccsystems.com/bb/viewtopic.php?f=41&t=1509


New Horizons - Working with TensorFlow


Working with TensorFlow

• Wonderful blog written by Richard Chapman

• TensorFlow is a new open-source program from Google

• Performs linear algebra operations on tensors (matrices) and connects multiple operations together.

• Particularly suited for machine learning applications and large datasets

• Works with HPCC 6.2 and greater versions

• Implemented in ECL using Python EMBED

• Shows how a TensorFlow model could be used inside an ECL workflow!

• This test resulted in enhanced Python plug-in capabilities.

References:
https://hpccsystems.com/resources/blog/richardkchapman/embedding-tensorflow-operations-ecl
https://www.tensorflow.org/

KEL (Knowledge Engineering Language)


Knowledge Engineering Language

• Designed for Data Modeling

• KEL expands the ECL specification of data flows and algorithms.

• Presumes that the user wants control over:

• the logical data model

• the analytic logic

• the mathematics

• ENTITY, MODEL, and ASSOCIATION

• Data Mapping (USE)

• Logic (GLOBAL)

• OUTPUT or QUERY

References:
https://hpccsystems.com/download/free-modules/kel-lite

Sample KEL:


Coming Soon…HPCC Version 7


Coming soon in 2018…

• Ease of use – ECL Watch Dashboards, session management, query access
• Reliability and Stability
• Machine Learning
• Security – ROXIE SSL, Encryption in Transit, Restricted file access, dropzone whitelists
• Interoperability (Spark, UnicodeLib, R, Tensorflow)
• Dali Replacement for DFS
• Opportunistic Improvements
• Text Search, XML Improvements
• Multi-core support
• Extended Built-In Visualization
• Cloud/Hive 360 Support

And contributions and suggestions from YOU !!!

References:
https://track.hpccsystems.com/secure/Dashboard.jspa
https://hpccsystems.com/community/how-to-contribute

Getting Started

Install:

1. Oracle’s VirtualBox: https://www.virtualbox.org/wiki/Downloads

2. ECL IDE and Documentation:
https://hpccsystems.com/download/developer-tools/ecl-ide
https://hpccsystems.com/download/documentation


Getting Started

Run:

1. Launch your VM player.

2. Import the HPCC Virtual Machine .ova file: http://hpccsystems.com/download/hpcc-vm-image

3. Note the IP next to the IP Address: prompt at the top of the VM.

This IP address is the key to allowing the HPCC Systems client tools to access the environment.
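Once you have the IP address, a small helper like the one below can turn it into the address to open in a browser. This is a hedged sketch: the port number 8010 is an assumption (it is the usual default for ECL Watch, but the slides do not state it), the function name is invented, and the sample address is a placeholder for whatever your VM displays.

```python
import ipaddress

ECL_WATCH_PORT = 8010  # assumed default ECL Watch port; confirm against your VM's banner

def ecl_watch_url(ip_text):
    """Validate the IPv4 address shown by the VM and build the URL to open in a browser."""
    address = ipaddress.IPv4Address(ip_text.strip())  # raises ValueError if malformed
    return f"http://{address}:{ECL_WATCH_PORT}"

print(ecl_watch_url("192.168.56.101"))
# prints http://192.168.56.101:8010
```

Validating the address first catches typos before they show up as a confusing connection error in the client tools.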


That’s All Folks!

And there’s so much more to learn!!!
Thanks for Attending!


Email me! [email protected] https://hpccsystems.com/community/events/All-Things-Open-2017