aFlux version 0.0.3.0-SNAPSHOT Documentation
Date: May 31, 2017
Revision | Date | Author | Description
0.0.1 | May 31, 2017 | Tanmaya Mahapatra | Document creation: Functionality / Walkthrough / Build and Deploy aFlux / How to Create a Plugin / Code Summary
0.0.1.1 | June 3, 2017 | Tanmaya Mahapatra | Add Creating Plugin in IntelliJ IDEA
0.0.1.2 | July 31, 2017 | Tanmaya Mahapatra | Pig Script Flow Example
0.0.1.3 | August 7, 2017 | Tanmaya Mahapatra | Hive Flow Example
0.0.1.4 | August 14, 2017 | Tanmaya Mahapatra | Common Analytics Example
0.0.2.0 | August 17, 2017 | Tanmaya Mahapatra | Common Analytics Language (in Code Summary section)
0.0.2.1 | August 17, 2017 | Tanmaya Mahapatra | Common Analytics Example update
0.0.2.2 | August 17, 2017 | Tanmaya Mahapatra | Added Hive samples to CA -> Hive and mvn antlr4 plugin mention
0.0.2.3 | August 17, 2017 | Tanmaya Mahapatra | Add Subflow section
0.0.2.4 | August 18, 2017 | Tanmaya Mahapatra | How Pig Flows Work / How Pig Executor Works
0.0.2.5 | August 21, 2017 | Tanmaya Mahapatra | Add Local Subflow section
0.0.3.0 | May 2, 2018 | Tanmaya Mahapatra | Frontend Description
aFlux
Functionality
List of use cases

Topic | Function | Url | Controller | User Action | Description
Settings | read settings | /settings | FlowSettingsController | automatic | reads the settings
FlowElementType | refresh tools | /flowElementType | FluxElementTypeController | automatic | on refresh toolbar / activate-deactivate plugins / remove plugin
FlowElementType | | /elementTypes/{name} | FluxElementTypeController | not used |
FlowElementType | | /elementBaseProperties | FluxElementTypeController | automatic | showing element properties / creating a new element
Job | get all jobs | /jobs | FluxJobController | Open Job dialog |
Job | open a job | /jobs/get/{id} | FluxJobController | Menu: Job / Open; Toolbar: folder icon | gets a specific job
Job | save a job | /jobs/save | FluxJobController | Menu: Job / Save; Toolbar: save icon | persists the job information
Job | rename job | /jobs/saveAs | FluxJobController | Menu: Job / Save As | persists the job under another name
Job | delete job | /job/remove/{jobId} | FluxJobController | job selector screen (accessed via Job / Open), press the remove icon | removes the job's information from the application (it cannot be recovered)
Job | new | | | Menu: Job / New | creates a new job in the frontend; it must be saved to be persisted
Job | rename | | | Menu: Job / Rename | renames the job; it must be saved to be persisted. Job / Rename followed by Save is equivalent to Job / Save As
Activity | add | | | press the + sign on the toolbar and set a name | a new tab (activity) is created in the current job
Activity | remove | | | press - on the toolbar | the selected tab (activity) is removed if the current job has more than one activity
Plugin | get all available plugins | /plugins | FluxPluginController | upload a plugin (+ sign below the left toolbar) / Menu: Plugin | shows the list of plugins so they can be managed
Plugin | activate plugin | /plugin/activate/{id} | FluxPluginController | press Activate in the Manage Plugin dialog | loads the classes of the selected plugin to make the related tools available, provided there is no conflict with the current tools (each tool is identified by its fully qualified class name, so two implementations of the same tool cannot coexist). A deactivated tool can be reactivated only after restarting the application
Plugin | deactivate plugin | /plugin/deactivate/{id} | FluxPluginController | press Deactivate in the Manage Plugin dialog | deactivates the selected plugin; all related tools are removed from the left toolbar. To reactivate the plugin, the application must be restarted
Plugin | remove plugin | /plugin/remove/{id} | FluxPluginController | press the remove icon in the Manage Plugin dialog | deactivates the plugin and erases the uploaded jar; the plugin is deleted permanently
Plugin | upload plugin | /plugin/upload | FluxPluginController | + sign below the left toolbar | uploads the jar file associated with a plugin. It is initially in the deactivated state; use the Plugin Manager menu option to activate it
Output | show execution output | /showOutput | ShowOutputController | generated automatically when an activity or job runs | shows the output generated by running elements. The footer of the screen shows the output of the running processes, generated by the sendMessage calls in tool implementations
Output | change size | | | drag the splitter bar | changes the size of the output footer panel
Output | clear output | | | Clear Output icon in the toolbar (X and minus inside a circle) | clears the execution output
Output | refresh output | /showOutput | ShowOutputController | press the Refresh icon | asks the server for the last generated output. Even without pressing Refresh, this operation runs periodically
Running Processes | show running processes | /getEnvironment/status | ShowEnvironmentStatusController | click on any region of the canvas with no element or connector | shows the list of running processes in the right pane (Properties panel)
Running Processes | show running processes | /running | RunFluxController | not used |
Execution | run activity | /run/{jobName}/{activityName} | RunFluxController | Play button in the toolbar | runs the current activity. The output is shown in the footer, and during execution the selected job/activity is shown in the Properties panel (if no other element is selected)
Execution | run job | /runAll/{jobName} | RunFluxController | Run Job icon in the toolbar | runs all activities of the current job; activities are added to the running processes during execution
Execution | stop activity | /stop/{jobName}/{activityName} | RunFluxController | Stop button in the toolbar | stops the activity execution if it is running
Execution | stop job | /stopAll/{jobName} | RunFluxController | Stop Job icon in the toolbar | stops all running activities of the current job
Edit Activity | add element | | | drag a tool from the left toolbar to the canvas | adds an element to the current activity
Edit Activity | add connector | | | drag from the output (right) connector of an element to the input (left) connector of another element on the canvas | adds a connector between two elements
Edit Activity | delete element | | | select an element on the canvas and press the delete icon in the toolbar | deletes the selected element on the canvas
Edit Activity | delete connector | | | select a connector on the canvas and press the delete icon in the toolbar | deletes the selected connector on the canvas
Edit Activity | edit element properties | | | select an element on the canvas and modify the data shown in the Properties (right) pane | edits the custom properties of an element
Walkthrough
This section shows some example operations on the application.
Open the application by loading http://localhost:8080 in a browser.
(NOTE: The samples are shown using localhost:3000, running the process in a Node.js server. To run the app from a browser against a Node.js server, it is recommended to use the CORS Toggle extension in Google Chrome.)
Add a plugin by pressing + below the left toolbar.
Press Upload Plugin File, select a jar file containing the tools implementation, and press Accept.
Press the Activate button to activate the tools associated with the plugin.
The list of tools is shown in the left panel.
Create a new activity by pressing the + sign in the toolbar and set the name of the new activity (e.g. registerTwitter).
Components Loaded
Activity Creation
Select activity-1 by pressing on it, then press the - button to remove it. Answer Yes to the "Are you sure?" question.
The selected activity is removed.
Drag tools to the canvas to define a flow. Generate connectors by dragging from the output interfaces of elements to the input interfaces of other elements.
Select individual elements and set their configuration properties to perform the intended task.
Select the element and edit the properties.
Save the job using the option Job / Save As and setting a name.
Drag the splitter bar to increase the output size.
Unselect the selected element on the canvas by clicking in a blank area (to view Running Processes on the right).
Run the process to view the output in the footer.
During execution, the list of running activities can be viewed in the right side panel.
Subflows in aFlux
Subflows in aFlux are flows that can be used as a tool in other activities.
A subflow is defined by setting the subflow property of an activity to true.
The activity properties can be accessed by clicking on any empty area of the flow canvas of the activity.
When saving a job that has at least one activity marked as a subflow, a new tool representing that activity is added to the left tool pane.
Subflow tools, like other tools, can be dragged onto the flow canvas and used in the same way. Executing a subflow box executes the internal flow that defines it.
When a subflow tool is added to the canvas, a new activity is also added for editing the components of the added tool.
The added tab is used to edit the subflow components in the current job. Since the subflow is an instance in the current job, it is independent of the original subflow definition: editing a subflow in a job does not affect the original definition.
More tools can be added to the canvas in the new job, and also to the subflows.
In the example, the subflow is formed by 2 wait tools:
wait: 2500 ms
wait: 1500 ms
After adding the tool Wait2515Job:wait2515 to the canvas of the new job, two waits are also added, before and after the subflow tool:
wait 1000 ms
wait 2000 ms
When run, the flow executes the following actions:
wait 1000 ms
subflow: wait 2500 ms
wait 1500 ms
wait 2000 ms
as shown in the image.
Subflow tools also have an async output connector that allows launching the next activity as soon as the subflow tool starts.
Local SubFlows
Local subflows are subflows that are used within a job's scope. Subflows defined as local are not shown in the left side toolbar, so they cannot be used in other jobs. The main goal of a local subflow is to simplify a complex flow by dividing its definition; the local subflow definition will not be reused by more than one job.
To create a local subflow, select the Local SubFlow tool to be added to the canvas of an activity in any job.
Drag it to the canvas.
When the Local SubFlow tool is dropped on the canvas, a new tab with the same name as the subflow is also added to the activities pane (as with global subflows). Local subflows are not added to the left toolbar after saving the job.
In the Properties panel, the node element name can be changed, and the tab name changes automatically with it.
The definition of the subflow should be provided in the newly created tab. On execution, the subflow node executes the provided definition.
How to Build and Deploy aFlux
Initially, aFlux is composed of 4 main projects:
- aFlux: Java project - main application
- aflux-tool-base: Java project - base library
- aflux-tool-mainplugins: Java project - plugin library with a set of base tools
- nodejs/async-flows: JavaScript project - application frontend

Build process
1.- Build the frontend
1.1.- In nodejs/async-flows run: npm run build
This generates a static folder inside the build folder.
1.2.- Copy the static folder to aFlux/src/main/resources and rename the JavaScript files to main.js and main.css.
2.- Build the backend:
2.1.- Run mvn clean install in the aflux-tool-base project
2.2.- Run mvn clean install in the aflux-tool-mainplugins project
2.3.- Run mvn clean install in aFlux
3.- Start the MongoDB database: mongod
(To start the application from scratch, drop the database collections:)
db.flowSetting.drop()
db.flowElementType.drop()
db.flowPlugin.drop()
db.flowJob.drop()
4.- Run the application; in aFlux run:
java -jar aflux-0.0.1-SNAPSHOT.jar
5.- Access the application from a browser by loading http://localhost:8080
6.- The app initially starts with no tools. To add tools, upload and activate the plugin file target/aflux-tool-mainplugins-0.0.1-SNAPSHOT-plugin-jar-with-dependencies.jar from the aflux-tool-mainplugins project.

Option B: Run the Node.js frontend
1.- Build the application
2.- Follow the steps to deploy the app, except the step that copies the static folder
3.- In the nodejs project run: npm start
4.- Access the application from a browser by loading http://localhost:3000
NOTE: In order to access the app from the Node.js port 3000 in Google Chrome, install the CORS Toggle extension and set it on.
How to Create a Plugin
A plugin is a set of tools that can be used in aFlux.
Create a Maven Java project.
pom.xml
In the dependencies section, add the dependency:

<dependency>
  <groupId>de.tum.in.aflux</groupId>
  <artifactId>aflux-tool-base</artifactId>
  <version>0.0.1-SNAPSHOT</version>
</dependency>

This uses the jar aflux-tool-base.jar.
In the plugins section, add:

<build>
  <plugins>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>3.0.0</version>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <descriptors>
          <descriptor>assembly/plugin-assembly.xml</descriptor>
        </descriptors>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id> <!-- this is used for inheritance merges -->
          <phase>package</phase> <!-- bind to the packaging phase -->
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

In the folder assembly, add the file plugin-assembly.xml with the following content:
<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.0.0 http://maven.apache.org/xsd/assembly-2.0.0.xsd">
  <id>plugin-jar-with-dependencies</id>
  <formats>
    <format>jar</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <dependencySets>
    <dependencySet>
      <outputDirectory>/</outputDirectory>
      <useProjectArtifact>true</useProjectArtifact>
      <unpack>true</unpack>
      <scope>runtime</scope>
      <excludes>
        <exclude>junit:junit</exclude>
        <exclude>de.tum.in.aflux:aflux-tool-base</exclude>
        <exclude>com.typesafe.akka:akka-stream_2.11</exclude>
        <exclude>org.scala-lang:scala-library</exclude>
        <exclude>com.typesafe.akka:akka-actor_2.11</exclude>
        <exclude>com.typesafe:config</exclude>
        <exclude>org.scala-lang.modules:scala-java8-compat_2.11</exclude>
        <exclude>org.reactivestreams:reactive-streams</exclude>
        <exclude>com.typesafe:ssl-config-core_2.11</exclude>
        <exclude>org.scala-lang.modules:scala-parser-combinators_2.11</exclude>
        <exclude>org.springframework:spring-context</exclude>
        <exclude>org.springframework:spring-aop</exclude>
        <exclude>org.springframework:spring-beans</exclude>
        <exclude>org.springframework:spring-core</exclude>
        <exclude>commons-logging:commons-logging</exclude>
        <exclude>org.springframework:spring-expression</exclude>
        <exclude>org.slf4j:slf4j-api</exclude>
        <exclude>org.mongodb:mongodb-driver</exclude>
        <exclude>org.mongodb:bson</exclude>
        <exclude>org.mongodb:mongodb-driver-core</exclude>
      </excludes>
    </dependencySet>
  </dependencySets>
</assembly>
With this plugin configuration, mvn clean install generates the resulting jar in target:
<plugin-project-name>-<version>-plugin-jar-with-dependencies.jar
which is the file that should be uploaded to aFlux.
The jar contains all the classes needed to run the plugin, excluding the base classes already present in aFlux, to avoid conflicts.
Implement the tools
Each plugin can implement more than one tool.
Implementing a tool requires creating at least 2 classes:
- Executor: a class that must extend de.tum.in.aflux.tools.core.AbstractMainExecutor
- Actor: a class that must extend de.tum.in.aflux.tools.core.AbstractAFluxActor
More classes and dependencies can be added as needed to execute the desired task.
Executor: the class that describes the tool as a component that interacts with aFlux and can be used by aFlux in its environment. The executor is the glue between aFlux and the actor.
Actor: the class that contains the implementation of what should be executed every time aFlux runs the element. The task is implemented in the runCore method of this class.
Typical parts of an executor
The WriteMongoDB implementation is shown as a sample to describe the main sections.
- Definition
  - extends AbstractMainExecutor
  - NAME is not mandatory; it only provides a visual description of the tool
  - connectionProperties
    - An array of ToolProperty defining a set of custom properties that can be edited for each instance of the tool and used (read or written) during execution
    - At runtime these properties can be accessed using this.getPropertyValues[index]
    - These properties are shown in the Properties Editor pane in aFlux, where the user can set a value for each
    - These values are added to the base set of values all elements have (name / color / dispatcher / width / mailbox)
    - The properties are set by passing them as the last parameter of the constructor
- Constructor with no parameters
  - Establishes the constructor for the tool

The constructor of the superclass takes the following data:

public AbstractMainExecutor(String name, String actorClassName, String className, int inputInterfaces, int outputInterfaces, int launchedBy, Boolean async, ToolProperty[] properties)
Constructor parameters
name: name of the tool, shown in the tool gallery.
actorClassName: fully qualified class name of the related actor that contains the implementation of the task.
className: the class name of this class (the executor class).
inputInterfaces: number of input interfaces (includes data and no-data interfaces). Input interfaces are represented as connectors that can receive data (or a signal) from another element. "No data" interfaces express a precedence relation. There is no structural difference between data and no-data connectors; the real difference depends on the implementation. Usually a "no data" input means that the runCore implementation in the actor can execute its task regardless of the message it has received. If the implementation does not take data from the received message, the element can be said to be launched by SIGNAL. In this version, inputInterfaces should always be 1.
outputInterfaces: number of output data interfaces. This number does not include the async output connector. Actors can send data to other actors by calling setOutput(index, message) with index between 1 and outputInterfaces.
launchedBy: indicates whether this tool is launched by receiving data or by a start signal. Possible values are LAUNCHED_BY_DATA and LAUNCHED_BY_SIGNAL. (See the note on data vs. no-data connectors under inputInterfaces.)
async: indicates whether the tool is async capable. Async-capable elements can have a connector that sends a simple signal message at the time the element starts execution; no setOutput(index, message) call is needed to trigger it.
properties: names and initial values of the editable properties, in order. This is the set of properties that lets users set values for each instance of each element. Each of these properties appears in the Properties pane, editable per element. Beyond these defined properties, each element also has a set of base properties that can be edited (name / width / color / dispatcher / mailbox). The value the user enters for each property can be accessed in the runCore implementation by calling this.getProperties().get(index).
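To make the constructor shape concrete, here is a minimal, self-contained sketch of a WriteMongoDB-style executor. ToolProperty and AbstractMainExecutor below are simplified stand-ins for the real classes in aflux-tool-base (only the constructor signature quoted above is reproduced), and the actor class name, property names, and constant value are hypothetical:

```java
// Simplified stand-ins for the aflux-tool-base classes (illustration only).
class ToolProperty {
    final String name;
    final String value;
    ToolProperty(String name, String value) { this.name = name; this.value = value; }
}

abstract class AbstractMainExecutor {
    final String name;
    final ToolProperty[] properties;
    AbstractMainExecutor(String name, String actorClassName, String className,
                         int inputInterfaces, int outputInterfaces,
                         int launchedBy, Boolean async, ToolProperty[] properties) {
        this.name = name;
        this.properties = properties;
    }
}

// A WriteMongoDB-style executor: one input, one output, launched by data.
class WriteMongoDBExecutor extends AbstractMainExecutor {
    static final int LAUNCHED_BY_DATA = 0; // placeholder constant, not the real value
    static final ToolProperty[] CONNECTION_PROPERTIES = {
        new ToolProperty("host", "localhost"),
        new ToolProperty("port", "27017"),
        new ToolProperty("collection", "output")
    };
    public WriteMongoDBExecutor() {
        super("Write MongoDB",
              "my.plugin.WriteMongoDBActor",        // hypothetical actor class name
              WriteMongoDBExecutor.class.getName(), // this executor class
              1,                   // inputInterfaces: always 1 in this version
              1,                   // outputInterfaces (async connector not counted)
              LAUNCHED_BY_DATA,
              false,               // not async capable
              CONNECTION_PROPERTIES);
    }
}
```

The no-argument constructor is the part aFlux instantiates; everything the platform needs to know about the tool is passed to the superclass constructor.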
Typical parts of an actor
The WriteMongoDBActor implementation is shown as a sample to describe the main sections.
- Definition
  - extends AbstractAFluxActor
- Declare constructor
  Generally it will only call its superclass constructor:

public WriteMongoDBActor(Long fluxId, FluxEnvironment fluxEnvironment, FluxRunner fluxRunner, Map<String, String> properties) {
    super(fluxId, fluxEnvironment, fluxRunner, properties);
}

- Implement the runCore method

protected void runCore(Object message) throws Exception

runCore contains the implementation of the task to be executed. The operations that can be used in the context of aFlux are described below.
Operations that can be used inside runCore

this.setOutput(index, message)
Sends a value to the specific connector indicated by index. Possible values for index are 1 to the value set in outputInterfaces. message can be any value (Object) and is sent to all elements connected to the indicated output connector. Notice that setOutput can be called at any point of the runCore method, or in any method called by it. This means that the message can be sent (and other elements can be launched) before runCore finishes executing.

this.sendOutput(String)
Sends a value that is shown in the output section in the footer.

this.getProperties().get(index)
Reads a value set by the user in the Properties panel for this element instance.

async connections
Before runCore() begins executing, the runner sends a signal message to the elements connected to the async connector.
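The runCore operations can be sketched with a self-contained example. AbstractAFluxActor is stubbed here with in-memory stand-ins for setOutput / sendOutput / getProperties (the real base class dispatches Akka messages instead); the MongoDB write itself is omitted and the property layout is hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stub of the real base class, for illustration only.
abstract class AbstractAFluxActor {
    final Map<Integer, Object> outputs = new HashMap<>(); // connector index -> message sent
    final List<String> footerOutput = new ArrayList<>();  // what the output footer would show
    private final List<String> properties;
    AbstractAFluxActor(List<String> properties) { this.properties = properties; }
    void setOutput(int index, Object message) { outputs.put(index, message); }
    void sendOutput(String text) { footerOutput.add(text); }
    List<String> getProperties() { return properties; }
    protected abstract void runCore(Object message) throws Exception;
}

class WriteMongoDBActor extends AbstractAFluxActor {
    WriteMongoDBActor(List<String> properties) { super(properties); }
    @Override
    protected void runCore(Object message) throws Exception {
        String collection = getProperties().get(0); // value set in the Properties pane
        // ... write `message` to MongoDB here (omitted in this sketch) ...
        sendOutput("stored in " + collection);      // shown in the output footer
        setOutput(1, message);                      // forward to output connector 1
    }
}
```

Note that setOutput(1, message) could equally be called before the write finishes, which is what allows downstream elements to start early.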
Plugin Installation
To install a plugin, the resulting jar must be uploaded. Uploading it creates a copy in the server environment in uploads/<datetime>/<jarname>; afterwards the plugin must be activated. On activation, the classes in the plugin jar are loaded into the aFlux classloader, provided there are no errors or conflicts. Plugins containing actor or executor classes already loaded in aFlux cannot be activated.
Plugins can be activated / deactivated / removed according to the following state machine description.
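The lifecycle implied above can be sketched as a small state machine. The state names and transition table here are illustrative, not the actual aFlux enum; they encode the rules stated in the text (activation requires no class conflict, and a deactivated plugin can only become active again after a restart):

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum PluginState { UPLOADED, ACTIVE, DEACTIVATED, REMOVED }

class PluginLifecycle {
    private static final Map<PluginState, Set<PluginState>> TRANSITIONS =
            new EnumMap<>(PluginState.class);
    static {
        // a freshly uploaded plugin can be activated or removed
        TRANSITIONS.put(PluginState.UPLOADED,
                EnumSet.of(PluginState.ACTIVE, PluginState.REMOVED));
        // an active plugin can be deactivated or removed
        TRANSITIONS.put(PluginState.ACTIVE,
                EnumSet.of(PluginState.DEACTIVATED, PluginState.REMOVED));
        // a deactivated plugin can only be reactivated after a restart,
        // so within one run it can only be removed
        TRANSITIONS.put(PluginState.DEACTIVATED, EnumSet.of(PluginState.REMOVED));
        // removal is final: the jar is erased
        TRANSITIONS.put(PluginState.REMOVED, EnumSet.noneOf(PluginState.class));
    }
    static boolean canTransition(PluginState from, PluginState to) {
        return TRANSITIONS.get(from).contains(to);
    }
}
```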
Generating a Plugin in IntelliJ IDEA
The plugin project should be a Maven project.
It can be built using the options View / Tool Windows / Maven Projects; then, in the Maven Projects section, run Lifecycle / Install.
The plugin jar is generated in the target folder.
Maven Projects in IntelliJ
Pig Script Flow Example
As an example, the following flow is presented.
Function:
- Load data from a csv file available in HDFS
- Take only a defined number of records
- Store the selected Pig data
The elements to build the script are:
- Pig LOAD
- Pig LIMIT
- Pig STORE
- Pig Execute
- show value
The parameters of each element are:

Pig LOAD:
source data: /user/root/links.csv
function (optional): PigStorage(',')
schema (optional): movieId:int,imdbId:int,tmdbId:int
alias: PIGLINKS

This tool generates the Pig Latin sentence:
PIGLINKS = LOAD '/user/root/links.csv' USING PigStorage(',') AS 'movieId:int,imdbId:int,tmdbId:int';
source data refers to an existing HDFS path; alias is the target element where the result will be allocated.
Output is a Pig Execution Plan that contributes to generating the final sentences.
If function and schema are specified, they allow interpreting the content as a set of records; if not, the content is considered a single record.
Pig LIMIT:
source data: <empty>
limit: 3
target alias: PIGLINKSLIMITED

This tool produces the Pig sentence:
PIGLINKSLIMITED = LIMIT PIGLINKS 3;
In this case, the PIGLINKS element is taken from the last tool that assigned an element. If there are more elements in the preceding path, any previously used element can be specified in source data. If the alias to use corresponds to the last one, there is no need to declare it.
Pig STORE:
alias: <empty>
directory: /user/root/limitresult9
function: <empty>

This tool generates the sentence:
STORE PIGLINKSLIMITED INTO '/user/root/limitresult9';
The alias is deduced from the previous tool used (Pig LIMIT). It generates the sentence to store the desired element.
Output is a Pig Execution Plan that contributes to generating the final sentences.
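The way the three tools' parameters combine into Pig Latin sentences can be sketched as plain string building. This is an illustrative sketch, not the actual aFlux generator code; the method names and the rule that empty optional parts are omitted are assumptions:

```java
// Illustrative generators for the Pig LOAD / LIMIT / STORE sentences above.
class PigSentences {
    // LOAD with optional USING function and optional AS schema
    static String load(String sourceData, String function, String schema, String alias) {
        StringBuilder s = new StringBuilder(alias + " = LOAD '" + sourceData + "'");
        if (function != null && !function.isEmpty()) s.append(" USING ").append(function);
        if (schema != null && !schema.isEmpty()) s.append(" AS '").append(schema).append("'");
        return s.append(";").toString();
    }
    static String limit(String sourceAlias, int limit, String targetAlias) {
        return targetAlias + " = LIMIT " + sourceAlias + " " + limit + ";";
    }
    static String store(String alias, String directory) {
        return "STORE " + alias + " INTO '" + directory + "';";
    }
}
```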
Pig Execute:
script: <empty>

In the script property, a string can be set representing a script to be executed before the execution plan established in the flow. Pig Execute is the tool that sends the previously defined script to the Pig server and executes it.
Pig Execute generates 3 outputs (in this example all are sent to the same show value tool):
1st output: the result: it sends the data stored in '/user/root/limitresult9'.
2nd output: the final script: it outputs the following strings:
PIGLINKS = LOAD '/user/root/links.csv' function= PigStorage(',') schema='movieId:int,imdbId:int,tmdbId:int';
PIGLINKSLIMITED = LIMIT PIGLINKS 3;
STORE PIGLINKSLIMITED INTO '/user/root/limitresult9';
3rd output: the representation of the Java object Execution Plan held by each element, with the parameter values.
Hive Flow Example
As an example, the following flow is presented.
Function:
- Create a Hive table
- Load data from a csv file available in HDFS
- Select some records based on a condition
- Run the sentences and show the results
The elements to build the script are:
- Hive CREATE TABLE
- Hive LOAD
- Hive SELECT
- Hive Execute
- show value
The parameters of each element are:

Hive CREATE TABLE:
temporary: false
external: false
name: movies
columns: movieId int,title String,genres String
comment: table for deal with movies
partitioned: <blank>
row format: SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ("separatorChar" = ",","quoteChar" = "'","escapeChar" = "\\")
format: TEXTFILE
location: <blank>

This tool generates the Hive sentence:
CREATE TABLE IF NOT EXISTS movies
(movieId int,title String,genres String) COMMENT 'table for deal with movies'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",","quoteChar" = "'","escapeChar" = "\\")
STORED AS TEXTFILE;

temporary: if true, adds the TEMPORARY clause to the sentence
external: if true, adds the EXTERNAL clause to the sentence
name: name of the Hive table to be created
columns: comma-separated list defining the columns of the table, composed of Name Type pairs separated by a space (no enclosing quotes)
comment: table comment for the table definition (no enclosing quotes)
partitioned: string to add a PARTITIONED BY clause (no enclosing parentheses)
row format: string specifying the ROW FORMAT clause; the example shows a typical CSV row format
format: STORED AS clause; allowed values: TEXTFILE / SEQUENCEFILE / ORC / PARQUET / AVRO / RCFILE
location: HDFS location
Output is a Hive Execution Plan that contributes to generating the final sentences.
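The clause-by-clause assembly described above can be sketched as a string builder. This is an illustrative sketch, not the actual aFlux generator; the convention that blank parameters omit their clause is an assumption, and the test below uses a simplified DELIMITED row format rather than the SerDe example:

```java
// Illustrative builder for the Hive CREATE TABLE sentence described above.
class HiveCreateTable {
    static String sentence(boolean temporary, boolean external, String name,
                           String columns, String comment, String partitioned,
                           String rowFormat, String format, String location) {
        StringBuilder s = new StringBuilder("CREATE ");
        if (temporary) s.append("TEMPORARY ");
        if (external) s.append("EXTERNAL ");
        s.append("TABLE IF NOT EXISTS ").append(name)
         .append(" (").append(columns).append(")");
        // blank parameters simply omit their clause
        if (!comment.isEmpty()) s.append(" COMMENT '").append(comment).append("'");
        if (!partitioned.isEmpty()) s.append(" PARTITIONED BY (").append(partitioned).append(")");
        if (!rowFormat.isEmpty()) s.append(" ROW FORMAT ").append(rowFormat);
        if (!format.isEmpty()) s.append(" STORED AS ").append(format);
        if (!location.isEmpty()) s.append(" LOCATION '").append(location).append("'");
        return s.append(";").toString();
    }
}
```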
Hive LOAD:
hdfs file: /user/root/movies.csv
overwrite: false
name: movies
partition: <blank>

This tool generates the Hive sentence:
LOAD DATA INPATH '/user/root/movies.csv' INTO TABLE movies

hdfs file: source file to be loaded. As it refers to an HDFS resource, the specified resource is "moved", i.e. deleted from its former HDFS location
overwrite: if set to true, the contents of the target table (or partition) are deleted and replaced by the files referred to by the file path; otherwise those files are added to the table
name: name of the Hive table where data will be inserted
partition: string to declare the partition clause
Output is a Hive Execution Plan that contributes to generating the final sentences.
Hive SELECT:
all/distinct: ALL
expressions: movieId,title,genres
source list: movies
condition: title > 'K'

This tool generates the Hive sentence:
SELECT ALL movieId,title,genres FROM movies WHERE title>'K';

all/distinct: ALL or DISTINCT clause
expressions: list of expressions defining the columns in the resulting query answer
source list: name of the main Hive table to be queried
partition: string to declare the partition clause
group by / having: strings added to the GROUP BY and HAVING clauses
order by: expression added to the ORDER BY clause
limit: numeric expression generating a LIMIT clause
union: indicates a following UNION clause. Considered values are ALL / DISTINCT / <blank>. If ALL or DISTINCT is set, the tool should be followed by a Hive Union tool
Output is a Hive Execution Plan that contributes to generating the final sentences.
Hive Execute:
<no properties>

Hive Execute is the tool that sends the previously defined sentences to the HiveServer and executes them (HiveServer2 should be running).
Hive Execute generates 3 outputs (in this example all are sent to the same show value tool):
1st output: the result: it sends the result of the last SELECT statement. If there are several SELECT statements, it outputs only the result of the last one.
2nd output: the final list of sentences: it outputs the following strings:
CREATE TABLE IF NOT EXISTS movies(movieId int,title String,genres String) COMMENT 'table for deal with movies' ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ("separatorChar" = ",","quoteChar" = "'","escapeChar" = "\\") STORED AS TEXTFILE;
LOAD DATA INPATH '/user/root/movies.csv' INTO TABLE movies;
SELECT ALL movieId,title,genres FROM movies WHERE title>'K';
3rd output: the representation of the Java object Execution Plan held by each element, with the parameter values.
Common Analytics Flow Example
As an example, the following flow is presented.
Function:
- Generate a CA sentence to load data from a csv file and generate a table
- Generate a CA sentence to select records from the table
- Generate a CA sentence to show the selected records
- Translate the generated CA sentences to Pig or Hive and execute the resulting script
- Show the data, the CA sentences, and the CA Execution Plan
The elements to build the script are:
- CA LOAD
- CA SELECT
- CA SHOW
- CA Execute
- show value
The parameters of each element are:

CA LOAD:
file name: /user/root/links.csv
structure: movieId:INT,imdbId:INT,tmdbId:INT
alias: CALinks3

This tool generates the CA sentence:
LOAD '/user/root/links.csv' TO CALinks3 STRUCTURE(movieId:INT,imdbId:INT,tmdbId:INT);

file name: the HDFS resource, including its path; its content should be in csv format
structure: comma-separated list of <name>:<type> pairs; type can be INT or STRING
alias: target collection name
Outputs a Common Analytics Execution Plan with the load sentence added (to the received input Common Analytics Execution Plan)
CA SELECT:
target alias: SelectedCALinks
source alias: CALinks3
columns: *
filter: imdbId>1000

This tool generates the CA sentence:
SelectedCALinks = SELECT CALinks3 (*) FILTER (imdbId>1000);

target alias: the collection where the result will be stored
source alias: the source collection name
columns: comma-separated list of expressions to be generated; * means all the table columns
filter: condition to filter the source data
Output is a CA Execution Plan that contributes to generating the final sentences.
CA SHOW:
alias: <empty>

This tool generates the CA sentence:
SHOW SelectedCALinks

alias: the collection to be shown. If it is empty, the alias of the last collection generated in the preceding flow is taken.
Output is a CA Execution Plan that contributes to generating the final sentences.
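The three CA sentences above can likewise be sketched as string building. This is an illustrative sketch, not the actual CA language generator in aFlux; the trailing semicolon on SHOW follows the sentence list shown later in this section, and the empty-filter convention is an assumption:

```java
// Illustrative generators for the CA LOAD / SELECT / SHOW sentences above.
class CASentences {
    static String load(String fileName, String structure, String alias) {
        return "LOAD '" + fileName + "' TO " + alias + " STRUCTURE(" + structure + ");";
    }
    static String select(String targetAlias, String sourceAlias, String columns, String filter) {
        String s = targetAlias + " = SELECT " + sourceAlias + " (" + columns + ")";
        if (!filter.isEmpty()) s += " FILTER (" + filter + ")"; // FILTER omitted when empty
        return s + ";";
    }
    static String show(String alias) {
        return "SHOW " + alias + ";";
    }
}
```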
CA Execute:
executor: PIG
run: true

CA Execute translates the previously generated CA sentences to Pig or Hive, depending on the value of the executor property. If run is set to true, it also asks aFlux to execute the resulting sentences.
executor: PIG or HIVE; selects the executor to which the CA sentences will be translated
run: if set to true, runs the resulting script; if not, only generates the CA Plan (outputs 2 and 3)
CA Execute generates 3 outputs (in this example all are sent to the same show value tool):
1st output: the result: it sends the result of the last SELECT statement. If there are several SELECT statements, it outputs only the result of the last one.
For example, for the current flow, if PIG is selected it generates the following Pig Latin sentences:
CALinks3 = LOAD '/user/root/links.csv' USING PigStorage(',') AS (movieId:int,imdbId:int,tmdbId:int);
SelectedCALinks = FILTER CALinks3 BY imdbId>1000;
STORE SelectedCALinks INTO 'SelectedCALinks_tmp_20170817093935186' USING PigStorage(',');
If HIVE is selected, it generates the following sentences:
CREATE TABLE IF NOT EXISTS CALinks3 (movieId INT,imdbId INT,tmdbId INT) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ("separatorChar" = ",","quoteChar" = "'","escapeChar" = "\\") STORED AS TEXTFILE
LOAD DATA INPATH '/user/root/links.csv' OVERWRITE INTO TABLE CALinks3
CREATE TABLE IF NOT EXISTS SelectedCALinks row format delimited fields terminated by '|' STORED AS RCFile AS SELECT ALL * FROM CALinks3 WHERE imdbId>1000
SELECT ALL * FROM CALinks3 WHERE imdbId>1000
SELECT ALL * FROM SelectedCALinks
2nd output: The final list of CA sentences: It outputs the following strings:
LOAD '/user/root/links.csv' TO CALinks3 STRUCTURE(movieId:INT,imdbId:INT,tmdbId:INT); SelectedCALinks = SELECT CALinks3 (*) FILTER (imdbId>1000); SHOW SelectedCALinks;
The Pig or Hive translation is available in the log if debug is enabled.
3rd output: the representation of the Java object CA Execution Plan, which contains each element and the parameter values.
Pig output Example
Hive Output Example
Code Summary
setOutput(index,message) getProperties.get(index)
fluxRunnerImpl.broadcast(…)
sendOutput(message)
Actor functions: what aspects do they represent on the front-end?
Sequence diagram describing sample execution of an activity
The diagram does not show all the calls; it is intended to identify the main interactions.
FluxRunnerEnvironment:
It is the main executor. It contains a list of all running activities. Each running activity generates a FluxRunnerImpl instance that is created on start and destroyed on finish.
FluxRunnerImpl: It builds a map of elements and connections based on the activity being run. It creates each element of the activity (instantiating the executor: AbstractMainExecutor and the actor). It creates a new ActorSystem and registers all actors in it. The process starts all elements that have no predecessors and need no input data.
Before starting each element, the class AbstractAfluxActor sends the signal for async outputs (not shown in this graph). The call to async methods is made by calling broadcast with index=-1.
After they are started, each element, while running runCore, can call the setOutput method.
On calling setOutput, it invokes the "broadcast" method, which sends messages to all connected elements.
When each element finishes runCore, AbstractAFluxActor calls the finish method, indicating to FluxRunnerImpl that the process has ended.
Every time an element is launched or finished, FluxRunnerImpl notifies the class ExecutionEnvironment (not shown in the graph), which keeps count of each started and finished running process in order to determine when the entire flux has finished (no running processes).
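The launch / setOutput / broadcast / finish cycle described above can be sketched with simplified, hypothetical classes. MiniFluxRunner and Element are illustrative stand-ins, not the real FluxRunnerImpl / AbstractAFluxActor API, and the sketch runs synchronously, whereas the real system sends asynchronous Akka messages:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch of the run cycle: an element runs its core, the output
// is broadcast to all connected elements, and the runner counts started and
// finished elements to decide when the whole flux has finished.
public class MiniFluxRunner {

    static class Element {
        final String name;
        final Function<String, String> runCore;        // input -> output
        final List<Element> successors = new ArrayList<>();

        Element(String name, Function<String, String> runCore) {
            this.name = name;
            this.runCore = runCore;
        }
    }

    private int running = 0;                           // processes still running
    final List<String> log = new ArrayList<>();

    // Launch an element: run its core, broadcast the output, then finish.
    void launch(Element e, String input) {
        running++;                                     // "element started" notification
        String output = e.runCore.apply(input);        // element calls setOutput(...)
        for (Element next : e.successors) {
            launch(next, output);                      // broadcast to connected elements
        }
        running--;                                     // finish() notification
        log.add(e.name + " finished");
    }

    boolean flowFinished() {
        return running == 0;                           // no running processes left
    }

    public static String demo() {
        Element load = new Element("LOAD", in -> "data");
        Element show = new Element("SHOW", in -> "shown:" + in);
        load.successors.add(show);

        MiniFluxRunner runner = new MiniFluxRunner();
        runner.launch(load, null);                     // start: LOAD has no predecessors
        return runner.flowFinished() + "|" + runner.log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```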
Frontend Summary
The aFlux frontend is implemented using React + Redux.
See: https://redux.js.org/basics
Main Components in Folders
aFlux Location Description React+Redux Element
index.js Entry point to application main
components Parts of main graphical elements
containers Element or regions that group other elements
functions Generally used functions - util folder
images Static images
libs Downloaded resources - the original resources do not affect behavior
model Data model - They are the nexus between backend model and graphical elements
reducers Only aFluxMainReducer. This is a reducer, a pure function with a (state, action) => state signature. It describes how an action transforms the state into the next state. It is called by index.js once: var store = createStore(mainReducer)
Reducer
redux-actions Each event that happens (generated by the user, by another event, by a timeout, or by any external event) generates an action. The action has a unique type and the set of values that are enough to generate a reaction. Actions are the parameters passed to reducers to change the state.
Action
redux-components These are like React components. They are basically the visual representations that generate the view. Each element has a body where the properties and functions to be used are defined. They return the typical "render" function of React components; the render function defines how the component should be shown. Like HTML tags, they represent other defined elements with equivalent names.
React Components: https://reactjs.org/tutorial/tutorial.html
redux-containers Containers correspond to components. They define the behavior of the events launched on views. Most of them call redux dispatch, passing the actions as parameters to trigger the desired change of state. Components are the views and containers the behavior.
Typical React Containers
Description of the flow
- React reacts to events.
- The state is the whole set of values used to show the "view" at an instant of time.
- The view is drawn by React following the definitions of the return values of "redux-components" and the values of the current state.
- When something happens, "redux-containers" generate an "action".
- The "action" and the current "state" are passed as parameters to the "redux-reducer" to generate the new "state".
- React redraws the view based on the new state and waits for new events.
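The (state, action) => state cycle can be illustrated in any language. The following Java sketch is illustrative only (the aFlux frontend does this in JavaScript with Redux); it shows a pure reducer producing a new state from the current state and an action, where the toy "state" is just the list of open activity tabs:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the Redux idea: a reducer is a pure function that
// maps (current state, action) to a new state without mutating the old one.
public class ReducerSketch {

    static List<String> reduce(List<String> state, String actionType, String payload) {
        List<String> next = new ArrayList<>(state);    // copy: never mutate the old state
        if ("ADD_TAB".equals(actionType)) {
            next.add(payload);
        } else if ("REMOVE_TAB".equals(actionType)) {
            next.remove(payload);
        }                                              // unknown actions leave state as-is
        return next;
    }

    public static List<String> demo() {
        List<String> s0 = List.of();                   // initial state
        List<String> s1 = reduce(s0, "ADD_TAB", "Activity1");
        List<String> s2 = reduce(s1, "ADD_TAB", "Activity2");
        return reduce(s2, "REMOVE_TAB", "Activity1");  // the view is redrawn from this
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```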
COMPONENTS
redux-reducers The main element is AsyncFlowsAppReducer. It contains the mapping that takes "current state" and "action" as parameters and generates the "new state": (state, action) => state. After that, React redraws the view based on the new state.
Redux Reducer
screen Simple Dialog Screens like “Open Jobs”
CONTAINERS
ActivityTitleConnectors
FooterContainer
HeaderContainer
ToolbarContainer
MODEL
Element Description
Flow Set of nodes and connectors with its set property values.
FlowActivity An activity is represented by a tab in the view panel. It has a name and contains a flow.
FlowConnector A link between two nodes.
FlowPlugin Information about a plugin. A plugin is a set of node types that can be loaded and used.
FlowElement A node.
FlowElementProperty Property of a node.
FlowElementType The template of a node. Generally, FlowElementTypes are shown in the left tool panel.
FlowElementTypeProperty Property of a FlowElementType.
FlowJob Set of activities.
REDUX COMPONENTS
Element Description
ReduxActivityProperties Right panel where the user can edit node properties.
ReduxActivityTab A single tab in the tab pane; it contains the name of the activity and can be clicked to select the activity.
ReduxAsyncFlowsApp The whole view.
ReduxCanvas Whiteboard: the white part of the screen where nodes and links are drawn. The canvas contains the flows that are drawn.
ReduxPropertiesPanel Right panel to hold properties.
ReduxRunningProcessesProperties Properties panel content while it is showing the number of processes that are running.
ReduxSideBar Left toolbar container
ReduxToolNodeElement Each one of the tools present in the left tool panel
ReduxWorkspace The whole view. The only child of ReduxAsyncFlowsApp
ReduxCanvasNodeElement Container to hold Nodes. It holds ReduxNewNodeElement or ReduxNodeElement. It is the node from the perspective of the canvas (Canvas does not know if it is new or not… it only has a box in a position)
ReduxFlowTabs Tab Panel where Activity tabs are shown
ReduxNewNodeElement Nodes that have not been saved (added recently). When they are saved, they are converted to ReduxCanvasNodeElement.
ReduxNodeElement Nodes in the canvas. These represent nodes that were previously saved.
ReduxNodeProperties Properties of a node
Common Analytics Language
The following section describes the Common Analytics source code solution.
The Common Analytics plugin provides a set of tools with reduced and simple elements to run Big Data tasks, executing them in defined target Big Data related languages.
The first version translates to Pig and Hive. Spark will be added in a future version.
Main sentences of CA language are the following simple sentences:
<load_sentence>: LOAD <file_name> TO <collection_name> STRUCTURE(<structure_definition_list>);
<show_sentence>: SHOW <collection_name>;
<select_sentence>: <collection_name> = SELECT <collection_name>(<expression_list>)
FILTER ( [condition_list] );
<summarize_sentence>: <collection_name> =
SUMMARIZE <collection_name> ( [summary_expression_list] ) KEYS ( [groupby_list] );
<join_sentence>: <collection_name> = JOIN <collection_name> AND <collection_name>
COLUMNS( <expression_list> ) MATCH ( [matching_list] );
The complete grammar is in aflux-tool-commonanalytics (location: /src/main/antlr4/CommonAnalytics.g4), using ANTLR4 grammar notation.
Sample sentences are also available in src/main/resources/resources/afluxSentencesSample.gr
LOAD 'data1' TO A STRUCTURE(a1:INT,a2:INT,a3:INT);
LOAD 'data1' TO A STRUCTURE(a1:STRING,a2:INT,a3:INT);
TMP1 = SELECT T1(col1,col2) FILTER();
TMP1 = SUMMARIZE T1() KEYS(col1,col2);
TMP1 = SUMMARIZE T1() KEYS(col1);
X = SUMMARIZE A() KEYS(*);
TMP1 = SELECT sales(*) FILTER (amount>10 AND region=='US');
X = SELECT A(*) FILTER(f3==3);
X = SELECT A(*) FILTER((f1 == 8) OR (NOT (f2+f3 > f1)));
X = SELECT A(*) FILTER();
X = SELECT A(a1,a2) FILTER();
B = SUMMARIZE A() KEYS (age);
t2 = SUMMARIZE t1(SUM(col2) AS col2sum) KEYS(col1);
tmp1 = SELECT t2(col1) FILTER(t2.colsum>10);
X = JOIN A AND B COLUMNS (*) MATCH (a.id = b.id);
X = JOIN A AND B COLUMNS(*) MATCH(a.id = b.id,a.department= b.department);
X = JOIN A AND B COLUMNS(*) MATCH(a.a1=b.b1);
TMP1 = SELECT customers(*) FILTER ();
TMP1 = SUMMARIZE t1(SUM(col2) AS SUM_COL2) KEYS(col1);
TMP2 = SELECT TMP1(col1) FILTER(SUM_COL2>10);
LOAD '/user/root/links.csv' TO CALinks2 STRUCTURE(movieId:INT,imdbid:INT,tmdbId:INT);
SelectedCALinks = SELECT CALinks2 (*) FILTER (imddbId>1000);
SHOW SelectedCALinks;
These sentences can be tested by running CommonAnalyticsSample.java in package de.tum.in.aflux.component.commonanalytics.util (to view the resulting Pig and Hive sentences).
There are 5 main tools available:
- CommonAnalyticsLoad
- CommonAnalyticsSelect
- CommonAnalyticsShow
- CommonAnalyticsSummarize
- CommonAnalyticsJoin
and their corresponding actor classes, which are related to each defined sentence.
A CommonAnalyticsExecute tool is also provided; it translates the Common Analytics Language to Pig or Hive depending on the set properties (executor: PIG / HIVE).
How Common Analytics Language tools work
A common analytics flow basically generates a CommonAnalyticsExecutionPlan (see CommonAnalyticsExecutionPlan.java in the code).
The plan has almost no behavior; it only stores enough data to represent a list of sentences (each of them implemented in CommonAnalyticsExecutionStep.java).
Each CommonAnalyticsExecutionStep has a direct relation with a CA sentence.
aflux-tool-commonanalytics Packages and Classes
Each sentence-type tool adds the corresponding sentence to the plan, passing it to the next flow.
Each tool receives a plan as input and generates as output a new plan with the sentence added.
Each tool has enough parameters the user can fill in to complete the sentence.
The plain text for the generated plan can be obtained by calling the getScript() method on a CommonAnalyticsExecutionPlan object.
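The plan-as-data idea can be sketched with simplified hypothetical classes that mirror the roles of CommonAnalyticsExecutionPlan and CommonAnalyticsExecutionStep (these are illustrative reductions, not the real aFlux API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: each tool adds one step (one CA sentence) to the plan
// it received and passes the enlarged plan on; getScript() joins the stored
// sentences into the plain-text script.
public class CaPlanSketch {

    static class Step {                                // stands in for ...ExecutionStep
        final String sentence;
        Step(String sentence) { this.sentence = sentence; }
    }

    static class Plan {                                // stands in for ...ExecutionPlan
        final List<Step> steps = new ArrayList<>();

        void addStep(String sentence) {                // called by each sentence-type tool
            steps.add(new Step(sentence));
        }

        String getScript() {                           // plain text of the whole plan
            StringBuilder sb = new StringBuilder();
            for (Step s : steps) {
                sb.append(s.sentence).append('\n');
            }
            return sb.toString();
        }
    }

    public static String demo() {
        Plan plan = new Plan();                        // the first tool creates the plan
        plan.addStep("LOAD 'links.csv' TO CALinks STRUCTURE(movieId:INT);");
        plan.addStep("Selected = SELECT CALinks (*) FILTER (movieId>1000);");
        plan.addStep("SHOW Selected;");
        return plan.getScript();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```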
Generally the end tool in a CA flux should be a CommonAnalyticsExecute tool.
As can be seen in CommonAnalyticsExecuteActor.java, depending on the "executor" property the runCore method will translate the script obtained from the plan into a Pig Latin script or a Hive script.
The translation is made using a parser (described later in this manual). The translation produces a PigExecutionPlan.java or HiveExecutionPlan.java, classes provided by aFlux in aflux-tool-base. To run the resulting Pig or Hive script, CommonAnalyticsExecuteActor calls PigExecutor.java or HiveExecutor.java, present in aflux-base-tool.jar. PigExecutor and HiveExecutor are classes provided by aFlux and can be used by any other plugin built in the future. They use the properties specified in the aFlux application. Before running, PigExecutor and/or HiveExecutor generate a text script based on the plan.
If the run property of the execute tool is set to "true", it also runs the Pig or Hive script.
Common analytics execute has 3 outputs:
1.- The result (data)
2.- finalScript (txt in common analytics language)
3.- executionPlan (CommonAnalyticsExecutionPlan java object)
How Translation is made
There are 2 main classes that make the translation, CommonAnalyticsPigListener.java for Pig Latin and CommonAnalyticsHiveListener.java for Hive.
These classes take specific actions when entering or exiting each lexical element of the CA Language script (plain text), generating a PigExecutionPlan or HiveExecutionPlan (defined in aflux-tool-base). An empty plan is passed as a parameter to the parser; after the parser process is executed, the plan contains the translated steps (or sentences).
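The listener mechanism can be sketched as follows. This is a strong simplification: the real classes are ANTLR listeners walking the parse tree of the CA script, only a SHOW-like callback is shown here, and the emitted Pig and Hive text follows the examples earlier in this document but is illustrative, not the exact generator output:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of listener-style translation: a "walk" over recognized
// CA sentences fires a callback per sentence, and each target-specific
// listener appends a translated step to its own execution plan.
public class CaTranslationSketch {

    interface CaListener {
        void enterShow(String collection);             // one callback per grammar rule
    }

    static class PigListener implements CaListener {   // like CommonAnalyticsPigListener
        final List<String> plan = new ArrayList<>();   // like PigExecutionPlan
        public void enterShow(String c) {
            plan.add("STORE " + c + " INTO '" + c + "_tmp';");
        }
    }

    static class HiveListener implements CaListener {  // like CommonAnalyticsHiveListener
        final List<String> plan = new ArrayList<>();   // like HiveExecutionPlan
        public void enterShow(String c) {
            plan.add("SELECT ALL * FROM " + c);
        }
    }

    // Stand-in for the ANTLR tree walk: fire the callback for each sentence.
    static void walk(List<String> showCollections, CaListener listener) {
        for (String collection : showCollections) {
            listener.enterShow(collection);
        }
    }

    public static String demo(String executor) {
        List<String> shows = List.of("SelectedCALinks");
        if ("PIG".equals(executor)) {
            PigListener l = new PigListener();
            walk(shows, l);
            return String.join("\n", l.plan);
        }
        HiveListener l = new HiveListener();
        walk(shows, l);
        return String.join("\n", l.plan);
    }

    public static void main(String[] args) {
        System.out.println(demo("PIG"));
        System.out.println(demo("HIVE"));
    }
}
```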
Building and deploying changes on CA Grammar
Parser implementation is present in package de.tum.in.aflux.component.commonanalytics.grammar
These classes are generated automatically by ANTLR. Developers should not modify them.
To change the grammar, follow these steps:
1.- Change CommonAnalytics.g4 file in /src/main/antlr4
The file contains the formal grammar definition.
2.- Run mvn clean install
3.- Copy the generated classes
mvn clean install generates the parser classes in /target/generated-sources/antlr4. The generated deliverables are:
CommonAnalytics.tokens
CommonAnalyticsBaseListener.java
CommonAnalyticsLexer.java
CommonAnalyticsListener.java
CommonAnalyticsParser.java
Those classes are generated by the Maven plugin antlr4-maven-plugin present in pom.xml.
Copy and overwrite these classes to de.tum.in.aflux.component.commonanalytics.grammar
Fix the package declaration for each one and remove all @Override annotations in class CommonAnalyticsBaseListener.java
4.- Change the CommonAnalyticsHiveListener.java and CommonAnalyticsPigListener.java to consider the new grammar elements (if they affect the translation to Pig or Hive)
5.- Optional: add a sample including the new syntactical elements in afluxSentencesSample.gr, and add tests to CommonAnalyticsSample.java. There is no need to run aFlux to test CommonAnalyticsSample.java; it can be run as a simple Java app.
6.- Build the plugin as usual and import it into the aFlux application.
How Pig Flows Work
Pig tool actors run like any other aFlux actor. The sequence diagram example shows how a PigLoadActor runs, followed by a PigLimitActor.
As with all actors, FluxRunnerImpl starts by launching (sending Akka messages to) all SIGNAL TYPE actors that have no predecessors.
The complete set of executors is seen by FluxRunnerImpl as AbstractMainExecutor instances. It has access to the ActorSystem created previously by FluxRunnerImpl.
In this case, AbstractMainExecutor (related to PigLoadActor) sends a message to the Akka system. The recipient will be PigLoadActor, running its runCore method.
Most Pig tools' main task involves the following activities:
- Generate a PigExecutionPlan: they receive a PigExecutionPlan as input. If the input is null, a new PigExecutionPlan is created; if it is not null, the received PigExecutionPlan is used.
- Get the inputs the user entered by calling getProperty(…), depending on each tool.
- Build the Pig Latin sentence (String), helped by PigBuilder (for building some clauses).
- Most times, create a PigExecutionStep and add it to the PigExecutionPlan.
- setOutput -> set the new PigExecutionPlan as output.
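The per-tool cycle above can be sketched with simplified hypothetical types: PigPlan stands in for PigExecutionPlan, the property maps stand in for getProperty(...), and the generated sentences are illustrative, not the real tools' output:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch of a Pig tool's runCore: receive a plan (or create one
// if the input is null), build the Pig Latin sentence from user properties,
// add it as a new step, and set the enlarged plan as output.
public class PigToolSketch {

    static class PigPlan {                             // stands in for PigExecutionPlan
        final List<String> steps = new ArrayList<>();
        String getScript() { return String.join("\n", steps); }
    }

    // runCore of a hypothetical LOAD tool
    static PigPlan loadRunCore(PigPlan input, Map<String, String> props) {
        PigPlan plan = (input != null) ? input : new PigPlan();  // null-input check
        String sentence = props.get("alias") + " = LOAD '" + props.get("file")
                + "' USING PigStorage(',');";           // PigBuilder's role, simplified
        plan.steps.add(sentence);                       // add the PigExecutionStep
        return plan;                                    // setOutput(new plan)
    }

    // runCore of a hypothetical LIMIT tool, fed with the LOAD tool's output
    static PigPlan limitRunCore(PigPlan input, Map<String, String> props) {
        PigPlan plan = (input != null) ? input : new PigPlan();
        plan.steps.add(props.get("alias") + " = LIMIT " + props.get("source")
                + " " + props.get("rows") + ";");
        return plan;
    }

    public static String demo() {
        PigPlan plan = loadRunCore(null, Map.of("alias", "A", "file", "links.csv"));
        plan = limitRunCore(plan, Map.of("alias", "B", "source", "A", "rows", "10"));
        return plan.getScript();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```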
The output is used by FluxRunnerImpl to be sent to the following tool (in this case Pig LIMIT) via the Akka system.
Pig LIMIT receives the plan (containing the LOAD sentence) and, executing a similar process, generates the LIMIT sentence and adds it to the plan, which it sends to the following Pig tool.
Generally the last tool should be "Pig Execute", which receives a PigExecutionPlan (like all other Pig actors) and is explained below.
How Pig Executor Works
The following sequence diagram shows the main classes involved in a Pig execution.
As with all Pig actors, "Pig Execute" receives a PigExecutionPlan as input. From the plan it gets the script (in text form) and calls the execute method of the PigExecutor class (present in aFlux, independent of any plugin). PigExecutor.java uses the Spring template class (PigOperators) that is available as a resource in the FluxEnvironmentImpl class (which can be accessed by any actor).