aFlux version 0.0.3.0-SNAPSHOT Documentation
Date: May 31, 2017
Revision | Date | Author | Description
0.0.1 | May 31, 2017 | Tanmaya Mahapatra | Document creation: Functionality / Walkthrough / Build and Deploy aFlux / How to Create a Plugin / Code Summary
0.0.1.1 | June 3, 2017 | Tanmaya Mahapatra | Add Creating Plugin in IntelliJ IDEA
0.0.1.2 | July 31, 2017 | Tanmaya Mahapatra | Pig Script Flow Example
0.0.1.3 | August 7, 2017 | Tanmaya Mahapatra | Hive Flow Example
0.0.1.4 | August 14, 2017 | Tanmaya Mahapatra | Common Analytics Example
0.0.2.0 | August 17, 2017 | Tanmaya Mahapatra | Common Analytics Language (in Code Summary section)
0.0.2.1 | August 17, 2017 | Tanmaya Mahapatra | Common Analytics Example update
0.0.2.2 | August 17, 2017 | Tanmaya Mahapatra | Added Hive samples to CA -> Hive and mvn antlr4 plugin mention
0.0.2.3 | August 17, 2017 | Tanmaya Mahapatra | Add Subflow section
0.0.2.4 | August 18, 2017 | Tanmaya Mahapatra | How Pig Flows Work / How Pig Executor Works
0.0.2.5 | August 21, 2017 | Tanmaya Mahapatra | Add Local Subflow section
0.0.3.0 | May 2, 2018 | Tanmaya Mahapatra | Frontend Description
aFlux
Functionality
List of use cases

Topic | Function | Url | Controller | User Action | Description
Settings | read settings | /settings | FlowSettingsController | automatic | reads the settings
FlowElementType | refresh tools | /flowElementType | FluxElementTypeController | automatic | on refresh toolbar / activate-deactivate plugins / remove plugin
FlowElementType | | /elementTypes/{name} | FluxElementTypeController | not used |
FlowElementType | | /elementBaseProperties | FluxElementTypeController | automatic | showing element properties / creating a new element
Job | get all jobs | /jobs | FluxJobController | Open Job dialog |
Job | open a job | /jobs/get/{id} | FluxJobController | Menu: Job / Open; Toolbar: folder icon | gets a specific job
Job | save a job | /jobs/save | FluxJobController | Menu: Job / Save; Toolbar: save icon | persists the job information
Job | rename job | /jobs/saveAs | FluxJobController | Menu: Job / Save As | persists the job under another name
Job | delete job | /job/remove/{jobId} | FluxJobController | job selector screen (accessed via Job / Open), press the remove icon | removes the job's information from the application (it cannot be recovered)
Job | new | | | Menu: Job / New | creates a new job in the frontend; it must be saved to be persisted
Job | rename | | | Menu: Job / Rename | renames the job; it must be saved to be persisted. Job / Rename followed by Save is equivalent to Job / Save As
Activity | add | | | press the + sign on the toolbar and set a name | a new tab (activity) is created in the current job
Activity | remove | | | press - on the toolbar | the selected tab (activity) is removed if the current job has more than one activity
Plugin | get all available plugins | /plugins | FluxPluginController | upload a plugin (+ sign below the left toolbar) / Menu: Plugin | shows the list of plugins so they can be managed
Plugin | activate plugin | /plugin/activate/{id} | FluxPluginController | press Activate in the Manage Plugin dialog | loads the classes of the selected plugin to make the related tools available, provided there is no conflict with the current tools (each tool is identified by its fully qualified class name, so two implementations of the same tool cannot coexist). A deactivated tool can be reactivated only after restarting the application
Plugin | deactivate plugin | /plugin/deactivate/{id} | FluxPluginController | press Deactivate in the Manage Plugin dialog | deactivates the selected plugin; all related tools are removed from the left toolbar. To reactivate the plugin, the application must be restarted
Plugin | remove plugin | /plugin/remove/{id} | FluxPluginController | press the remove icon in the Manage Plugin dialog | deactivates the plugin and erases the uploaded jar; the plugin is deleted permanently
Plugin | upload plugin | /plugin/upload | FluxPluginController | + sign below the left toolbar | uploads the jar file associated with a plugin. It is initially in the deactivated state; use the Plugin Manager menu option to activate it
Output | show execution output | /showOutput | ShowOutputController | generated automatically when an activity or job runs | shows the output generated by running elements. The footer of the screen shows the output of the running processes, generated by the sendMessage calls in tool implementations
Output | change size | | | drag the splitter bar | changes the size of the output footer panel
Output | clear output | | | Clear Output icon in the toolbar (X and minus inside a circle) | clears the execution output
Output | refresh output | /showOutput | ShowOutputController | press the Refresh icon | asks the server for the last generated output. Even without pressing Refresh, this operation runs periodically
Running Processes | show running processes | /getEnvironment/status | ShowEnvironmentStatusController | click on any region of the canvas with no element or connector | shows the list of running processes in the right pane (Properties panel)
Running Processes | show running processes | /running | RunFluxController | not used |
Execution | run activity | /run/{jobName}/{activityName} | RunFluxController | Play button in the toolbar | runs the current activity. The output is shown in the footer, and during execution the selected job/activity is shown in the Properties panel (if no other element is selected)
Execution | run job | /runAll/{jobName} | RunFluxController | Run Job icon in the toolbar | runs all activities of the current job; activities are added to the running processes during execution
Execution | stop activity | /stop/{jobName}/{activityName} | RunFluxController | Stop button in the toolbar | stops the activity execution if it is running
Execution | stop job | /stopAll/{jobName} | RunFluxController | Stop Job icon in the toolbar | stops all running activities of the current job
Edit Activity | add element | | | drag a tool from the left toolbar to the canvas | adds an element to the current activity
Edit Activity | add connector | | | drag from the output (right) connector of an element to the input (left) connector of another element on the canvas | adds a connector between two elements
Edit Activity | delete element | | | select an element on the canvas and press the delete icon in the toolbar | deletes the selected element on the canvas
Edit Activity | delete connector | | | select a connector on the canvas and press the delete icon in the toolbar | deletes the selected connector on the canvas
Edit Activity | edit element properties | | | select an element on the canvas and modify the data shown in the Properties (right) pane | edits the custom properties of an element
Walkthrough
This section shows some example operations on the application.
Open the application by loading http://localhost:8080 in a browser.
(NOTE: The samples are shown using localhost:3000, running the process in a Node.js server. To run the app from a browser against a Node.js server, it is recommended to use the CORS Toggle extension in Google Chrome.)
Add a plugin by pressing + below the left toolbar.
Press Upload Plugin File, select a jar file containing the tools implementation, and press Accept.
Press the Activate button to activate the tools associated with the plugin.
The list of tools is shown in the left panel.
Create a new activity by pressing the + sign in the toolbar and set the name of the new activity (e.g. registerTwitter).
Components Loaded
Activity Creation
Select activity-1 by pressing on it, then press the - button to remove it. Answer Yes to the "Are you sure?" question.
The selected activity is removed.
Drag tools to the canvas to define a flow. Generate connectors by dragging from the output interfaces of elements to the input interfaces of other elements.
Select individual elements and set their configuration properties to perform the intended task.
Select the element and edit the properties.
Save the job using the option Job / Save As and setting a name.
Drag the splitter bar to increase the output size.
Unselect the selected element on the canvas by clicking in a blank area (to view Running Processes on the right).
Run the process to view the output in the footer.
During execution, the list of running activities can be viewed in the right side panel.
Subflows in aFlux
Subflows in aFlux are flows that can be used as a tool in other activities.
A subflow is defined by setting the subflow property of an activity to true.
The activity properties can be accessed by clicking on any empty area of the flow canvas of the activity.
When saving a job that has at least one activity marked as a subflow, a new tool representing that activity is added to the left tool pane.
Subflow tools, like other tools, can be dragged onto the flow canvas and used in the same way. Executing a subflow box executes the internal flow that defines it.
When a subflow tool is added to the canvas, a new activity is also added for editing the components of the added tool.
The added tab is used to edit the subflow components in the current job. Since the subflow is an instance in the current job, it is independent of the original subflow definition: editing a subflow in a job does not affect the original definition.
More tools can be added to the canvas in the new job, and also to the subflows.
In the example, the subflow is formed by 2 wait tools:
wait: 2500 ms
wait: 1500 ms
After adding the tool Wait2515Job:wait2515 to the canvas of the new job, two waits are also added, before and after the subflow tool:
wait 1000 ms
wait 2000 ms
When run, the flow executes the following actions:
wait 1000 ms
subflow: wait 2500 ms
wait 1500 ms
wait 2000 ms
as shown in the image.
Subflow tools also have an async output connector that allows launching the next activity as soon as the subflow tool starts.
Local SubFlows
Local subflows are subflows that are used within a job's scope. Subflows defined as local are not shown in the left side toolbar, so they cannot be used in other jobs. The main goal of a local subflow is to simplify a complex flow by dividing its definition; the local subflow definition will not be reused by more than one job.
To create a local subflow, select the Local SubFlow tool to be added to the canvas of an activity in any job.
Drag it to the canvas.
When the Local SubFlow tool is dropped on the canvas, a new tab with the same name as the subflow is also added to the activities pane (as with global subflows). Local subflows are not added to the left toolbar after saving the job.
In the Properties panel, the node element name can be changed, and the tab name changes automatically with it.
The definition of the subflow should be provided in the newly created tab. On execution, the subflow node executes the provided definition.
How to Build and Deploy aFlux
Initially, aFlux is composed of 4 main projects:
- aFlux: Java project - main application
- aflux-tool-base: Java project - base library
- aflux-tool-mainplugins: Java project - plugin library with a set of base tools
- nodejs/async-flows: JavaScript project - application frontend

Build process
1.- Build the frontend
1.1.- In nodejs/async-flows run: npm run build
This generates a static folder inside the build folder.
1.2.- Copy the static folder to aFlux/src/main/resources and rename the JavaScript files to main.js and main.css.
2.- Build the backend:
2.1.- Run mvn clean install in the aflux-tool-base project
2.2.- Run mvn clean install in the aflux-tool-mainplugins project
2.3.- Run mvn clean install in aFlux
3.- Start the MongoDB database: mongod
(To start the application from scratch, drop the database collections:)
db.flowSetting.drop()
db.flowElementType.drop()
db.flowPlugin.drop()
db.flowJob.drop()
4.- Run the application; in aFlux run:
java -jar aflux-0.0.1-SNAPSHOT.jar
5.- Access the application from a browser by loading http://localhost:8080
6.- The app initially starts with no tools. To add tools, upload and activate the plugin file target/aflux-tool-mainplugins-0.0.1-SNAPSHOT-plugin-jar-with-dependencies.jar from the aflux-tool-mainplugins project.

Option B: Run the Node.js frontend
1.- Build the application
2.- Follow the steps to deploy the app, except the step that copies the static folder
3.- In the nodejs project run: npm start
4.- Access the application from a browser by loading http://localhost:3000
NOTE: In order to access the app from the Node.js port 3000 in Google Chrome, install the CORS Toggle extension and set it on.
How to Create a Plugin
A plugin is a set of tools that can be used in aFlux.
Create a Maven Java project.
pom.xml
In the dependencies section, add the dependency:

<dependency>
  <groupId>de.tum.in.aflux</groupId>
  <artifactId>aflux-tool-base</artifactId>
  <version>0.0.1-SNAPSHOT</version>
</dependency>

This uses the jar aflux-tool-base.jar.
In the plugins section, add:

<build>
  <plugins>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>3.0.0</version>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <descriptors>
          <descriptor>assembly/plugin-assembly.xml</descriptor>
        </descriptors>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id> <!-- this is used for inheritance merges -->
          <phase>package</phase> <!-- bind to the packaging phase -->
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

In the folder assembly, add the file plugin-assembly.xml with the following content:
<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.0.0 http://maven.apache.org/xsd/assembly-2.0.0.xsd">
  <id>plugin-jar-with-dependencies</id>
  <formats>
    <format>jar</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <dependencySets>
    <dependencySet>
      <outputDirectory>/</outputDirectory>
      <useProjectArtifact>true</useProjectArtifact>
      <unpack>true</unpack>
      <scope>runtime</scope>
      <excludes>
        <exclude>junit:junit</exclude>
        <exclude>de.tum.in.aflux:aflux-tool-base</exclude>
        <exclude>com.typesafe.akka:akka-stream_2.11</exclude>
        <exclude>org.scala-lang:scala-library</exclude>
        <exclude>com.typesafe.akka:akka-actor_2.11</exclude>
        <exclude>com.typesafe:config</exclude>
        <exclude>org.scala-lang.modules:scala-java8-compat_2.11</exclude>
        <exclude>org.reactivestreams:reactive-streams</exclude>
        <exclude>com.typesafe:ssl-config-core_2.11</exclude>
        <exclude>org.scala-lang.modules:scala-parser-combinators_2.11</exclude>
        <exclude>org.springframework:spring-context</exclude>
        <exclude>org.springframework:spring-aop</exclude>
        <exclude>org.springframework:spring-beans</exclude>
        <exclude>org.springframework:spring-core</exclude>
        <exclude>commons-logging:commons-logging</exclude>
        <exclude>org.springframework:spring-expression</exclude>
        <exclude>org.slf4j:slf4j-api</exclude>
        <exclude>org.mongodb:mongodb-driver</exclude>
        <exclude>org.mongodb:bson</exclude>
        <exclude>org.mongodb:mongodb-driver-core</exclude>
      </excludes>
    </dependencySet>
  </dependencySets>
</assembly>
With this plugin configuration, mvn clean install generates the resulting jar in target:
<plugin-project-name>-<version>-plugin-jar-with-dependencies.jar
which is the file that should be uploaded to aFlux.
The jar contains all the classes needed to run the plugin, excluding the base classes already present in aFlux, to avoid conflicts.
Implement the tools
Each plugin can implement more than one tool.
Implementing a tool requires creating at least 2 classes:
- Executor: a class that must extend de.tum.in.aflux.tools.core.AbstractMainExecutor
- Actor: a class that must extend de.tum.in.aflux.tools.core.AbstractAFluxActor
More classes and dependencies can be added as needed to execute the desired task.
Executor: the class that describes the tool as a component that interacts with aFlux and can be used by aFlux in its environment. The executor is the glue between aFlux and the actor.
Actor: the class that contains the implementation of what should be executed every time aFlux runs the element. The task is implemented in the runCore method of this class.
Typical parts of an executor
The WriteMongoDB implementation is shown as a sample to describe the main sections.
- Definition
  - extends AbstractMainExecutor
  - NAME is not mandatory; it only provides a visual description of the tool
  - connectionProperties
    - An array of ToolProperty defining a set of custom properties that can be edited for each instance of the tool and used (read or written) during execution
    - At runtime these properties can be accessed using this.getPropertyValues[index]
    - These properties are shown in the Properties Editor pane in aFlux, where the user can set a value for each
    - These values are added to the base set of values all elements have (name / color / dispatcher / width / mailbox)
    - The properties are set by passing them as the last parameter of the constructor
- Constructor with no parameters
  - Establishes the constructor for the tool

The constructor of the superclass takes the following data:

public AbstractMainExecutor(String name, String actorClassName, String className, int inputInterfaces, int outputInterfaces, int launchedBy, Boolean async, ToolProperty[] properties)
Constructor parameters
name: name of the tool, shown in the tool gallery.
actorClassName: fully qualified class name of the related actor that contains the implementation of the task.
className: the class name of this class (the executor class).
inputInterfaces: number of input interfaces (includes data and no-data interfaces). Input interfaces are represented as connectors that can receive data (or a signal) from another element. "No data" interfaces express a precedence relation. There is no structural difference between data and no-data connectors; the real difference depends on the implementation. Usually a "no data" input means that the runCore implementation in the actor can execute its task regardless of the message it has received. If the implementation does not take data from the received message, the element can be said to be launched by SIGNAL. In this version, inputInterfaces should always be 1.
outputInterfaces: number of output data interfaces. This number does not include the async output connector. Actors can send data to other actors by calling setOutput(index, message) with index between 1 and outputInterfaces.
launchedBy: indicates whether this tool is launched by receiving data or by a start signal. Possible values are LAUNCHED_BY_DATA and LAUNCHED_BY_SIGNAL. (See the note on data vs. no-data connectors under inputInterfaces.)
async: indicates whether the tool is async capable. Async-capable elements can have a connector that sends a simple signal message at the time the element starts execution; no setOutput(index, message) call is needed to trigger it.
properties: names and initial values of the editable properties, in order. This is the set of properties that lets users set values for each instance of each element. Each of these properties appears in the Properties pane, editable per element. Beyond these defined properties, each element also has a set of base properties that can be edited (name / width / color / dispatcher / mailbox). The value the user enters for each property can be accessed in the runCore implementation by calling this.getProperties().get(index).
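To make the constructor shape concrete, here is a minimal, self-contained sketch of a WriteMongoDB-style executor. ToolProperty and AbstractMainExecutor below are simplified stand-ins for the real classes in aflux-tool-base (only the constructor signature quoted above is reproduced), and the actor class name, property names, and constant value are hypothetical:

```java
// Simplified stand-ins for the aflux-tool-base classes (illustration only).
class ToolProperty {
    final String name;
    final String value;
    ToolProperty(String name, String value) { this.name = name; this.value = value; }
}

abstract class AbstractMainExecutor {
    final String name;
    final ToolProperty[] properties;
    AbstractMainExecutor(String name, String actorClassName, String className,
                         int inputInterfaces, int outputInterfaces,
                         int launchedBy, Boolean async, ToolProperty[] properties) {
        this.name = name;
        this.properties = properties;
    }
}

// A WriteMongoDB-style executor: one input, one output, launched by data.
class WriteMongoDBExecutor extends AbstractMainExecutor {
    static final int LAUNCHED_BY_DATA = 0; // placeholder constant, not the real value
    static final ToolProperty[] CONNECTION_PROPERTIES = {
        new ToolProperty("host", "localhost"),
        new ToolProperty("port", "27017"),
        new ToolProperty("collection", "output")
    };
    public WriteMongoDBExecutor() {
        super("Write MongoDB",
              "my.plugin.WriteMongoDBActor",        // hypothetical actor class name
              WriteMongoDBExecutor.class.getName(), // this executor class
              1,                   // inputInterfaces: always 1 in this version
              1,                   // outputInterfaces (async connector not counted)
              LAUNCHED_BY_DATA,
              false,               // not async capable
              CONNECTION_PROPERTIES);
    }
}
```

The no-argument constructor is the part aFlux instantiates; everything the platform needs to know about the tool is passed to the superclass constructor.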
Typical parts of an actor
The WriteMongoDBActor implementation is shown as a sample to describe the main sections.
- Definition
  - extends AbstractAFluxActor
- Declare constructor
  Generally it will only call its superclass constructor:

public WriteMongoDBActor(Long fluxId, FluxEnvironment fluxEnvironment, FluxRunner fluxRunner, Map<String, String> properties) {
    super(fluxId, fluxEnvironment, fluxRunner, properties);
}

- Implement the runCore method

protected void runCore(Object message) throws Exception

runCore contains the implementation of the task to be executed. The operations that can be used in the context of aFlux are described below.
Operations that can be used inside runCore

this.setOutput(index, message)
Sends a value to the specific connector indicated by index. Possible values for index are 1 to the value set in outputInterfaces. message can be any value (Object) and is sent to all elements connected to the indicated output connector. Notice that setOutput can be called at any point of the runCore method, or in any method called by it. This means that the message can be sent (and other elements can be launched) before runCore finishes executing.

this.sendOutput(String)
Sends a value that is shown in the output section in the footer.

this.getProperties().get(index)
Reads a value set by the user in the Properties panel for this element instance.

async connections
Before runCore() begins executing, the runner sends a signal message to the elements connected to the async connector.
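The runCore operations can be sketched with a self-contained example. AbstractAFluxActor is stubbed here with in-memory stand-ins for setOutput / sendOutput / getProperties (the real base class dispatches Akka messages instead); the MongoDB write itself is omitted and the property layout is hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stub of the real base class, for illustration only.
abstract class AbstractAFluxActor {
    final Map<Integer, Object> outputs = new HashMap<>(); // connector index -> message sent
    final List<String> footerOutput = new ArrayList<>();  // what the output footer would show
    private final List<String> properties;
    AbstractAFluxActor(List<String> properties) { this.properties = properties; }
    void setOutput(int index, Object message) { outputs.put(index, message); }
    void sendOutput(String text) { footerOutput.add(text); }
    List<String> getProperties() { return properties; }
    protected abstract void runCore(Object message) throws Exception;
}

class WriteMongoDBActor extends AbstractAFluxActor {
    WriteMongoDBActor(List<String> properties) { super(properties); }
    @Override
    protected void runCore(Object message) throws Exception {
        String collection = getProperties().get(0); // value set in the Properties pane
        // ... write `message` to MongoDB here (omitted in this sketch) ...
        sendOutput("stored in " + collection);      // shown in the output footer
        setOutput(1, message);                      // forward to output connector 1
    }
}
```

Note that setOutput(1, message) could equally be called before the write finishes, which is what allows downstream elements to start early.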
Plugin Installation
To install a plugin, the resulting jar must be uploaded. Uploading it creates a copy in the server environment in uploads/<datetime>/<jarname>; afterwards the plugin must be activated. On activation, the classes in the plugin jar are loaded into the aFlux classloader, provided there are no errors or conflicts. Plugins containing actor or executor classes already loaded in aFlux cannot be activated.
Plugins can be activated / deactivated / removed according to the following state machine description.
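The lifecycle implied above can be sketched as a small state machine. The state names and transition table here are illustrative, not the actual aFlux enum; they encode the rules stated in the text (activation requires no class conflict, and a deactivated plugin can only become active again after a restart):

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum PluginState { UPLOADED, ACTIVE, DEACTIVATED, REMOVED }

class PluginLifecycle {
    private static final Map<PluginState, Set<PluginState>> TRANSITIONS =
            new EnumMap<>(PluginState.class);
    static {
        // a freshly uploaded plugin can be activated or removed
        TRANSITIONS.put(PluginState.UPLOADED,
                EnumSet.of(PluginState.ACTIVE, PluginState.REMOVED));
        // an active plugin can be deactivated or removed
        TRANSITIONS.put(PluginState.ACTIVE,
                EnumSet.of(PluginState.DEACTIVATED, PluginState.REMOVED));
        // a deactivated plugin can only be reactivated after a restart,
        // so within one run it can only be removed
        TRANSITIONS.put(PluginState.DEACTIVATED, EnumSet.of(PluginState.REMOVED));
        // removal is final: the jar is erased
        TRANSITIONS.put(PluginState.REMOVED, EnumSet.noneOf(PluginState.class));
    }
    static boolean canTransition(PluginState from, PluginState to) {
        return TRANSITIONS.get(from).contains(to);
    }
}
```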
Generating a Plugin in IntelliJ IDEA
The plugin project should be a Maven project.
It can be built using the options View / Tool Windows / Maven Projects; then, in the Maven Projects section, run Lifecycle / Install.
The plugin jar is generated in the target folder.
Maven Projects in IntelliJ
Pig Script Flow Example
As an example, the following flow is presented.
Function:
- Load data from a csv file available in HDFS
- Take only a defined number of records
- Store the selected Pig data
The elements to build the script are:
- Pig LOAD
- Pig LIMIT
- Pig STORE
- Pig Execute
- show value
The parameters of each element are:

Pig LOAD:
source data: /user/root/links.csv
function (optional): PigStorage(',')
schema (optional): movieId:int,imdbId:int,tmdbId:int
alias: PIGLINKS

This tool generates the Pig Latin sentence:
PIGLINKS = LOAD '/user/root/links.csv' USING PigStorage(',') AS 'movieId:int,imdbId:int,tmdbId:int';
source data refers to an existing HDFS path; alias is the target element where the result will be allocated.
Output is a Pig Execution Plan that contributes to generating the final sentences.
If function and schema are specified, they allow interpreting the content as a set of records; if not, the content is considered a single record.
Pig LIMIT:
source data: <empty>
limit: 3
target alias: PIGLINKSLIMITED

This tool produces the Pig sentence:
PIGLINKSLIMITED = LIMIT PIGLINKS 3;
In this case, the PIGLINKS element is taken from the last tool that assigned an element. If there are more elements in the preceding path, any previously used element can be specified in source data. If the alias to use corresponds to the last one, there is no need to declare it.
Pig STORE:
alias: <empty>
directory: /user/root/limitresult9
function: <empty>

This tool generates the sentence:
STORE PIGLINKSLIMITED INTO '/user/root/limitresult9';
The alias is deduced from the previous tool used (Pig LIMIT). It generates the sentence to store the desired element.
Output is a Pig Execution Plan that contributes to generating the final sentences.
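The way the three tools' parameters combine into Pig Latin sentences can be sketched as plain string building. This is an illustrative sketch, not the actual aFlux generator code; the method names and the rule that empty optional parts are omitted are assumptions:

```java
// Illustrative generators for the Pig LOAD / LIMIT / STORE sentences above.
class PigSentences {
    // LOAD with optional USING function and optional AS schema
    static String load(String sourceData, String function, String schema, String alias) {
        StringBuilder s = new StringBuilder(alias + " = LOAD '" + sourceData + "'");
        if (function != null && !function.isEmpty()) s.append(" USING ").append(function);
        if (schema != null && !schema.isEmpty()) s.append(" AS '").append(schema).append("'");
        return s.append(";").toString();
    }
    static String limit(String sourceAlias, int limit, String targetAlias) {
        return targetAlias + " = LIMIT " + sourceAlias + " " + limit + ";";
    }
    static String store(String alias, String directory) {
        return "STORE " + alias + " INTO '" + directory + "';";
    }
}
```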
Pig Execute:
script: <empty>

In the script property, a string can be set representing a script to be executed before the execution plan established in the flow. Pig Execute is the tool that sends the previously defined script to the Pig server and executes it.
Pig Execute generates 3 outputs (in this example all are sent to the same show value tool):
1st output: the result: it sends the data stored in '/user/root/limitresult9'.
2nd output: the final script: it outputs the following strings:
PIGLINKS = LOAD '/user/root/links.csv' function= PigStorage(',') schema='movieId:int,imdbId:int,tmdbId:int';
PIGLINKSLIMITED = LIMIT PIGLINKS 3;
STORE PIGLINKSLIMITED INTO '/user/root/limitresult9';
3rd output: the representation of the Java object Execution Plan held by each element, with the parameter values.
Hive Flow Example
As an example, the following flow is presented.
Function:
- Create a Hive table
- Load data from a csv file available in HDFS
- Select some records based on a condition
- Run the sentences and show the results
The elements to build the script are:
- Hive CREATE TABLE
- Hive LOAD
- Hive SELECT
- Hive Execute
- show value
The parameters of each element are:

Hive CREATE TABLE:
temporary: false
external: false
name: movies
columns: movieId int,title String,genres String
comment: table for deal with movies
partitioned: <blank>
row format: SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ("separatorChar" = ",","quoteChar" = "'","escapeChar" = "\\")
format: TEXTFILE
location: <blank>

This tool generates the Hive sentence:
CREATE TABLE IF NOT EXISTS movies
(movieId int,title String,genres String) COMMENT 'table for deal with movies'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",","quoteChar" = "'","escapeChar" = "\\")
STORED AS TEXTFILE;

temporary: if true, adds the TEMPORARY clause to the sentence
external: if true, adds the EXTERNAL clause to the sentence
name: name of the Hive table to be created
columns: comma-separated list defining the columns of the table, composed of Name Type pairs separated by a space (no enclosing quotes)
comment: table comment for the table definition (no enclosing quotes)
partitioned: string to add a PARTITIONED BY clause (no enclosing parentheses)
row format: string specifying the ROW FORMAT clause; the example shows a typical CSV row format
format: STORED AS clause; allowed values: TEXTFILE / SEQUENCEFILE / ORC / PARQUET / AVRO / RCFILE
location: HDFS location
Output is a Hive Execution Plan that contributes to generating the final sentences.
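The clause-by-clause assembly described above can be sketched as a string builder. This is an illustrative sketch, not the actual aFlux generator; the convention that blank parameters omit their clause is an assumption, and the test below uses a simplified DELIMITED row format rather than the SerDe example:

```java
// Illustrative builder for the Hive CREATE TABLE sentence described above.
class HiveCreateTable {
    static String sentence(boolean temporary, boolean external, String name,
                           String columns, String comment, String partitioned,
                           String rowFormat, String format, String location) {
        StringBuilder s = new StringBuilder("CREATE ");
        if (temporary) s.append("TEMPORARY ");
        if (external) s.append("EXTERNAL ");
        s.append("TABLE IF NOT EXISTS ").append(name)
         .append(" (").append(columns).append(")");
        // blank parameters simply omit their clause
        if (!comment.isEmpty()) s.append(" COMMENT '").append(comment).append("'");
        if (!partitioned.isEmpty()) s.append(" PARTITIONED BY (").append(partitioned).append(")");
        if (!rowFormat.isEmpty()) s.append(" ROW FORMAT ").append(rowFormat);
        if (!format.isEmpty()) s.append(" STORED AS ").append(format);
        if (!location.isEmpty()) s.append(" LOCATION '").append(location).append("'");
        return s.append(";").toString();
    }
}
```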
Hive LOAD:
hdfs file: /user/root/movies.csv
overwrite: false
name: movies
partition: <blank>

This tool generates the Hive sentence:
LOAD DATA INPATH '/user/root/movies.csv' INTO TABLE movies

hdfs file: source file to be loaded. As it refers to an HDFS resource, the specified resource is "moved", i.e. deleted from its former HDFS location
overwrite: if set to true, the contents of the target table (or partition) are deleted and replaced by the files referred to by the file path; otherwise those files are added to the table
name: name of the Hive table where data will be inserted
partition: string to declare the partition clause
Output is a Hive Execution Plan that contributes to generating the final sentences.
Hive SELECT:
all/distinct: ALL
expressions: movieId,title,genres
source list: movies
condition: title > 'K'

This tool generates the Hive sentence:
SELECT ALL movieId,title,genres FROM movies WHERE title>'K';

all/distinct: ALL or DISTINCT clause
expressions: list of expressions defining the columns in the resulting query answer
source list: name of the main Hive table to be queried
partition: string to declare the partition clause
group by / having: strings added to the GROUP BY and HAVING clauses
order by: expression added to the ORDER BY clause
limit: numeric expression generating a LIMIT clause
union: indicates a following UNION clause. Considered values are ALL / DISTINCT / <blank>. If ALL or DISTINCT is set, the tool should be followed by a Hive Union tool
Output is a Hive Execution Plan that contributes to generating the final sentences.
Hive Execute:
<no properties>

Hive Execute is the tool that sends the previously defined sentences to the HiveServer and executes them (HiveServer2 should be running).
Hive Execute generates 3 outputs (in this example all are sent to the same show value tool):
1st output: the result: it sends the result of the last SELECT statement. If there are several SELECT statements, it outputs only the result of the last one.
2nd output: the final list of sentences: it outputs the following strings:
CREATE TABLE IF NOT EXISTS movies(movieId int,title String,genres String) COMMENT 'table for deal with movies' ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ("separatorChar" = ",","quoteChar" = "'","escapeChar" = "\\") STORED AS TEXTFILE;
LOAD DATA INPATH '/user/root/movies.csv' INTO TABLE movies;
SELECT ALL movieId,title,genres FROM movies WHERE title>'K';
3rd output: the representation of the Java object Execution Plan held by each element, with the parameter values.
Common Analytics Flow Example
As an example, the following flow is presented.
Function:
- Generate a CA sentence to load data from a csv file and generate a table
- Generate a CA sentence to select records from the table
- Generate a CA sentence to show the selected records
- Translate the generated CA sentences to Pig or Hive and execute the resulting script
- Show the data, the CA sentences, and the CA Execution Plan
The elements to build the script are:
- CA LOAD
- CA SELECT
- CA SHOW
- CA Execute
- show value
The parameters of each element are:

CA LOAD:
file name: /user/root/links.csv
structure: movieId:INT,imdbId:INT,tmdbId:INT
alias: CALinks3

This tool generates the CA sentence:
LOAD '/user/root/links.csv' TO CALinks3 STRUCTURE(movieId:INT,imdbId:INT,tmdbId:INT);

file name: the HDFS resource, including its path; its content should be in csv format
structure: comma-separated list of <name>:<type> pairs; type can be INT or STRING
alias: target collection name
Outputs a Common Analytics Execution Plan with the load sentence added (to the received input Common Analytics Execution Plan)
CA SELECT:
target alias: SelectedCALinks
source alias: CALinks3
columns: *
filter: imdbId>1000

This tool generates the CA sentence:
SelectedCALinks = SELECT CALinks3 (*) FILTER (imdbId>1000);

target alias: the collection where the result will be stored
source alias: the source collection name
columns: comma-separated list of expressions to be generated; * means all the table columns
filter: condition to filter the source data
Output is a CA Execution Plan that contributes to generating the final sentences.
CA SHOW:
alias: <empty>

This tool generates the CA sentence:
SHOW SelectedCALinks

alias: the collection to be shown. If it is empty, the alias of the last collection generated in the preceding flow is taken.
Output is a CA Execution Plan that contributes to generating the final sentences.
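The three CA sentences above can likewise be sketched as string building. This is an illustrative sketch, not the actual CA language generator in aFlux; the trailing semicolon on SHOW follows the sentence list shown later in this section, and the empty-filter convention is an assumption:

```java
// Illustrative generators for the CA LOAD / SELECT / SHOW sentences above.
class CASentences {
    static String load(String fileName, String structure, String alias) {
        return "LOAD '" + fileName + "' TO " + alias + " STRUCTURE(" + structure + ");";
    }
    static String select(String targetAlias, String sourceAlias, String columns, String filter) {
        String s = targetAlias + " = SELECT " + sourceAlias + " (" + columns + ")";
        if (!filter.isEmpty()) s += " FILTER (" + filter + ")"; // FILTER omitted when empty
        return s + ";";
    }
    static String show(String alias) {
        return "SHOW " + alias + ";";
    }
}
```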
CA Execute:
executor: PIG
run: true

CA Execute translates the previously generated CA sentences to Pig or Hive, depending on the value of the executor property. If run is set to true, it also asks aFlux to execute the resulting sentences.
executor: PIG or HIVE; selects the executor to which the CA sentences will be translated
run: if set to true, runs the resulting script; if not, only generates the CA Plan (outputs 2 and 3)
CA Execute generates 3 outputs (in this example all are sent to the same show value tool):
1st output: the result: it sends the result of the last SELECT statement. If there are several SELECT statements, it outputs only the result of the last one.
For example, for the current flow, if PIG is selected it generates the following Pig Latin sentences:
CALinks3 = LOAD '/user/root/links.csv' USING PigStorage(',') AS (movieId:int,imdbId:int,tmdbId:int);
SelectedCALinks = FILTER CALinks3 BY imdbId>1000;
STORE SelectedCALinks INTO 'SelectedCALinks_tmp_20170817093935186' USING PigStorage(',');
If HIVE is selected, it generates the following sentences:
CREATE TABLE IF NOT EXISTS CALinks3 (movieId INT,imdbId INT,tmdbId INT) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ("separatorChar" = ",","quoteChar" = "'","escapeChar" = "\\") STORED AS TEXTFILE
LOAD DATA INPATH '/user/root/links.csv' OVERWRITE INTO TABLE CALinks3
CREATE TABLE IF NOT EXISTS SelectedCALinks row format delimited fields terminated by '|' STORED AS RCFile AS SELECT ALL * FROM CALinks3 WHERE imdbId>1000
SELECT ALL * FROM CALinks3 WHERE imdbId>1000
SELECT ALL * FROM SelectedCALinks
2nd output: The final list of CA sentences: It outputs the following strings:
LOAD '/user/root/links.csv' TO CALinks3 STRUCTURE(movieId:INT,imdbId:INT,tmdbId:INT); SelectedCALinks = SELECT CALinks3 (*) FILTER (imdbId>1000); SHOW SelectedCALinks;
The Pig or Hive translation is available in the log if debug is enabled.
3rd output: the representation of the Java object CA Execution Plan, which contains each element and the parameter values.
Pig output Example
Hive Output Example
Code Summary
setOutput(index,message) getProperties.get(index)
fluxRunnerImpl.broadcast(…)
sendOutput(message)
Actor functions: what aspects do they represent on the front-end?
Sequence diagram describing sample execution of an activity
The diagram does not show all the calls; it is intended to identify the main interactions.
FluxRunnerEnvironment:
It is the main executor. It contains a list of all running activities. Each running activity generates a FluxRunnerImpl instance that is created on start and destroyed on finish.
FluxRunnerImpl: It builds a map of elements and connections based on the activity being run. It creates each element of the activity (instantiating the executor: AbstractMainExecutor and the actor). It creates a new ActorSystem and registers all actors in it. The process starts all elements that have no predecessors and need no input data.
Before starting each element, the class AbstractAfluxActor sends the signal for async outputs (not shown in this graph). The call to async methods is made by calling broadcast with index=-1.
After they are started, each element, while running runCore, can call the setOutput method.
On calling setOutput, it invokes the "broadcast" method, which sends messages to all connected elements.
When each element finishes runCore, AbstractAFluxActor calls the finish method, indicating to FluxRunnerImpl that the process has ended.
Every time an element is launched or finished, FluxRunnerImpl notifies the class ExecutionEnvironment (not shown in the graph), which keeps count of each started and finished running process in order to determine when the entire flux has finished (no running processes).
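The launch / setOutput / broadcast / finish cycle described above can be sketched with simplified, hypothetical classes. MiniFluxRunner and Element are illustrative stand-ins, not the real FluxRunnerImpl / AbstractAFluxActor API, and the sketch runs synchronously, whereas the real system sends asynchronous Akka messages:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch of the run cycle: an element runs its core, the output
// is broadcast to all connected elements, and the runner counts started and
// finished elements to decide when the whole flux has finished.
public class MiniFluxRunner {

    static class Element {
        final String name;
        final Function<String, String> runCore;        // input -> output
        final List<Element> successors = new ArrayList<>();

        Element(String name, Function<String, String> runCore) {
            this.name = name;
            this.runCore = runCore;
        }
    }

    private int running = 0;                           // processes still running
    final List<String> log = new ArrayList<>();

    // Launch an element: run its core, broadcast the output, then finish.
    void launch(Element e, String input) {
        running++;                                     // "element started" notification
        String output = e.runCore.apply(input);        // element calls setOutput(...)
        for (Element next : e.successors) {
            launch(next, output);                      // broadcast to connected elements
        }
        running--;                                     // finish() notification
        log.add(e.name + " finished");
    }

    boolean flowFinished() {
        return running == 0;                           // no running processes left
    }

    public static String demo() {
        Element load = new Element("LOAD", in -> "data");
        Element show = new Element("SHOW", in -> "shown:" + in);
        load.successors.add(show);

        MiniFluxRunner runner = new MiniFluxRunner();
        runner.launch(load, null);                     // start: LOAD has no predecessors
        return runner.flowFinished() + "|" + runner.log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```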
Frontend Summary
The aFlux frontend is implemented using React + Redux.
See: https://redux.js.org/basics
Main Components in Folders
aFlux Location Description React+Redux Element
index.js Entry point to application main
components Parts of main graphical elements
containers Element or regions that group other elements
functions Generally used functions - util folder
images Static images
libs Downloaded resources - the original resources do not affect behavior
model Data model - They are the nexus between backend model and graphical elements
reducers Only aFluxMainReducer. This is a reducer, a pure function with a (state, action) => state signature. It describes how an action transforms the state into the next state. It is called by index.js once: var store = createStore(mainReducer)
Reducer
redux-actions Each event that happens (generated by the user, by another event, by a timeout, or by any external event) generates an action. The action has a unique type and the set of values that are enough to generate a reaction. Actions are the parameters passed to reducers to change the state.
Action
redux-components These are like React components. They are basically the visual representations that generate the view. Each element has a body where the properties and functions to be used are defined. They return the typical "render" function of React components; the render function defines how the component should be shown. Like HTML tags, they represent other defined elements with equivalent names.
React Components: https://reactjs.org/tutorial/tutorial.html
redux-containers Containers correspond to components. They define the behavior of the events launched on views. Most of them call redux dispatch, passing the actions as parameters to trigger the desired change of state. Components are the views and containers the behavior.
Typical React Containers
Description of the flow
- React reacts to events.
- The state is the whole set of values used to show the "view" at an instant of time.
- The view is drawn by React following the definitions of the return values of "redux-components" and the values of the current state.
- When something happens, "redux-containers" generate an "action".
- The "action" and the current "state" are passed as parameters to the "redux-reducer" to generate the new "state".
- React redraws the view based on the new state and waits for new events.
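The (state, action) => state cycle can be illustrated in any language. The following Java sketch is illustrative only (the aFlux frontend does this in JavaScript with Redux); it shows a pure reducer producing a new state from the current state and an action, where the toy "state" is just the list of open activity tabs:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the Redux idea: a reducer is a pure function that
// maps (current state, action) to a new state without mutating the old one.
public class ReducerSketch {

    static List<String> reduce(List<String> state, String actionType, String payload) {
        List<String> next = new ArrayList<>(state);    // copy: never mutate the old state
        if ("ADD_TAB".equals(actionType)) {
            next.add(payload);
        } else if ("REMOVE_TAB".equals(actionType)) {
            next.remove(payload);
        }                                              // unknown actions leave state as-is
        return next;
    }

    public static List<String> demo() {
        List<String> s0 = List.of();                   // initial state
        List<String> s1 = reduce(s0, "ADD_TAB", "Activity1");
        List<String> s2 = reduce(s1, "ADD_TAB", "Activity2");
        return reduce(s2, "REMOVE_TAB", "Activity1");  // the view is redrawn from this
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```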
COMPONENTS
redux-reducers The main element is AsyncFlowsAppReducer. It contains the mapping that takes "current state" and "action" as parameters and generates the "new state": (state, action) => state. After that, React redraws the view based on the new state.
Redux Reducer
screen Simple Dialog Screens like “Open Jobs”
CONTAINERS
ActivityTitleConnectors
FooterContainer
HeaderContainer
ToolbarContainer
MODEL
Element Description
Flow Set of nodes and connectors with its set property values.
FlowActivity An activity is represented by a tab in the view panel. It has a name and contains a flow.
FlowConnector A link between two nodes.
FlowPlugin Information about a plugin. A plugin is a set of node types that can be loaded and used.
FlowElement A node.
FlowElementProperty Property of a node.
FlowElementType The template of a node. Generally, FlowElementTypes are shown in the left tool panel.
FlowElementTypeProperty Property of a FlowElementType.
FlowJob Set of activities.
REDUX COMPONENTS
Element Description
ReduxActivityProperties Right panel where the user can edit node properties.
ReduxActivityTab A single tab in the tab pane; it contains the name of the activity and can be clicked to select the activity.
ReduxAsyncFlowsApp The whole view.
ReduxCanvas Whiteboard: the white part of the screen where nodes and links are drawn. The canvas contains the flows that are drawn.
ReduxPropertiesPanel Right panel to hold properties.
ReduxRunningProcessesProperties Properties panel content while it is showing the number of processes that are running.
ReduxSideBar Left toolbar container
ReduxToolNodeElement Each one of the tools present in the left tool panel
ReduxWorkspace The whole view. The only child of ReduxAsyncFlowsApp
ReduxCanvasNodeElement Container to hold Nodes. It holds ReduxNewNodeElement or ReduxNodeElement. It is the node from the perspective of the canvas (Canvas does not know if it is new or not… it only has a box in a position)
ReduxFlowTabs Tab Panel where Activity tabs are shown
ReduxNewNodeElement Nodes that have not been saved (added recently). When they are saved, they are converted to ReduxCanvasNodeElement.
ReduxNodeElement Nodes in the canvas. These represent nodes that were previously saved.
ReduxNodeProperties Properties of a node
Common Analytics Language
The following section describes the Common Analytics source code solution.
The Common Analytics plugin provides a set of tools with reduced and simple elements to run Big Data tasks, executing them in defined target Big Data related languages.
The first version translates to Pig and Hive. Spark will be added in a future version.
Main sentences of CA language are the following simple sentences:
<load_sentence>: LOAD <file_name> TO <collection_name> STRUCTURE(<structure_definition_list>);
<show_sentence>: SHOW <collection_name>;
<select_sentence>: <collection_name> = SELECT <collection_name>(<expression_list>)
FILTER ( [condition_list] );
<summarize_sentence>: <collection_name> =
SUMMARIZE <collection_name> ( [summary_expression_list] ) KEYS ( [groupby_list] );
<join_sentence>: <collection_name> = JOIN <collection_name> AND <collection_name>
COLUMNS( <expression_list> ) MATCH ( [matching_list] );
The complete grammar is in aflux-tool-commonanalytics (location: /src/main/antlr4/CommonAnalytics.g4), using ANTLR4 grammar notation.
Sample sentences are also available in src/main/resources/resources/afluxSentencesSample.gr
LOAD 'data1' TO A STRUCTURE(a1:INT,a2:INT,a3:INT);
LOAD 'data1' TO A STRUCTURE(a1:STRING,a2:INT,a3:INT);
TMP1 = SELECT T1(col1,col2) FILTER();
TMP1 = SUMMARIZE T1() KEYS(col1,col2);
TMP1 = SUMMARIZE T1() KEYS(col1);
X = SUMMARIZE A() KEYS(*);
TMP1 = SELECT sales(*) FILTER (amount>10 AND region=='US');
X = SELECT A(*) FILTER(f3==3);
X = SELECT A(*) FILTER((f1 == 8) OR (NOT (f2+f3 > f1)));
X = SELECT A(*) FILTER();
X = SELECT A(a1,a2) FILTER();
B = SUMMARIZE A() KEYS (age);
t2 = SUMMARIZE t1(SUM(col2) AS col2sum) KEYS(col1);
tmp1 = SELECT t2(col1) FILTER(t2.colsum>10);
X = JOIN A AND B COLUMNS (*) MATCH (a.id = b.id);
X = JOIN A AND B COLUMNS(*) MATCH(a.id = b.id,a.department= b.department);
X = JOIN A AND B COLUMNS(*) MATCH(a.a1=b.b1);
TMP1 = SELECT customers(*) FILTER ();
TMP1 = SUMMARIZE t1(SUM(col2) AS SUM_COL2) KEYS(col1);
TMP2 = SELECT TMP1(col1) FILTER(SUM_COL2>10);
LOAD '/user/root/links.csv' TO CALinks2 STRUCTURE(movieId:INT,imdbid:INT,tmdbId:INT);
SelectedCALinks = SELECT CALinks2 (*) FILTER (imddbId>1000);
SHOW SelectedCALinks;
These sentences can be tested by running CommonAnalyticsSample.java in package de.tum.in.aflux.component.commonanalytics.util (to view the resulting Pig and Hive sentences).
There are 5 main tools available:
- CommonAnalyticsLoad
- CommonAnalyticsSelect
- CommonAnalyticsShow
- CommonAnalyticsSummarize
- CommonAnalyticsJoin
and their corresponding actor classes, which are related to each defined sentence.
A CommonAnalyticsExecute tool is also provided; it translates the Common Analytics Language to Pig or Hive depending on the set properties (executor: PIG / HIVE).
How Common Analytics Language tools work
A common analytics flow basically generates a CommonAnalyticsExecutionPlan (see CommonAnalyticsExecutionPlan.java in the code).
The plan has almost no behavior; it only stores enough data to represent a list of sentences (each of them implemented in CommonAnalyticsExecutionStep.java).
Each CommonAnalyticsExecutionStep has a direct relation with a CA sentence.
aflux-tool-commonanalytics Packages and Classes
Each sentence-type tool adds the corresponding sentence to the plan, passing it to the next flow.
Each tool receives a plan as input and generates as output a new plan with the sentence added.
Each tool has enough parameters the user can fill in to complete the sentence.
The plain text for the generated plan can be obtained by calling the getScript() method on a CommonAnalyticsExecutionPlan object.
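The plan-as-data idea can be sketched with simplified hypothetical classes that mirror the roles of CommonAnalyticsExecutionPlan and CommonAnalyticsExecutionStep (these are illustrative reductions, not the real aFlux API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: each tool adds one step (one CA sentence) to the plan
// it received and passes the enlarged plan on; getScript() joins the stored
// sentences into the plain-text script.
public class CaPlanSketch {

    static class Step {                                // stands in for ...ExecutionStep
        final String sentence;
        Step(String sentence) { this.sentence = sentence; }
    }

    static class Plan {                                // stands in for ...ExecutionPlan
        final List<Step> steps = new ArrayList<>();

        void addStep(String sentence) {                // called by each sentence-type tool
            steps.add(new Step(sentence));
        }

        String getScript() {                           // plain text of the whole plan
            StringBuilder sb = new StringBuilder();
            for (Step s : steps) {
                sb.append(s.sentence).append('\n');
            }
            return sb.toString();
        }
    }

    public static String demo() {
        Plan plan = new Plan();                        // the first tool creates the plan
        plan.addStep("LOAD 'links.csv' TO CALinks STRUCTURE(movieId:INT);");
        plan.addStep("Selected = SELECT CALinks (*) FILTER (movieId>1000);");
        plan.addStep("SHOW Selected;");
        return plan.getScript();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```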
Generally the end tool in a CA flux should be a CommonAnalyticsExecute tool.
As can be seen in CommonAnalyticsExecuteActor.java, depending on the "executor" property the runCore method will translate the script obtained from the plan into a Pig Latin script or a Hive script.
The translation is made using a parser (described later in this manual). The translation produces a PigExecutionPlan.java or HiveExecutionPlan.java, classes provided by aFlux in aflux-tool-base. To run the resulting Pig or Hive script, CommonAnalyticsExecuteActor calls PigExecutor.java or HiveExecutor.java, present in aflux-base-tool.jar. PigExecutor and HiveExecutor are classes provided by aFlux and can be used by any other plugin built in the future. They use the properties specified in the aFlux application. Before running, PigExecutor and/or HiveExecutor generate a text script based on the plan.
If the run property of the execute tool is set to "true", it also runs the Pig or Hive script.
Common analytics execute has 3 outputs:
1.- The result (data)
2.- finalScript (txt in common analytics language)
3.- executionPlan (CommonAnalyticsExecutionPlan java object)
How Translation is made
There are 2 main classes that make the translation, CommonAnalyticsPigListener.java for Pig Latin and CommonAnalyticsHiveListener.java for Hive.
These classes take specific actions when entering or exiting each lexical element of the CA Language script (plain text), generating a PigExecutionPlan or HiveExecutionPlan (defined in aflux-tool-base). An empty plan is passed as a parameter to the parser; after the parser process is executed, the plan contains the translated steps (or sentences).
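The listener mechanism can be sketched as follows. This is a strong simplification: the real classes are ANTLR listeners walking the parse tree of the CA script, only a SHOW-like callback is shown here, and the emitted Pig and Hive text follows the examples earlier in this document but is illustrative, not the exact generator output:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of listener-style translation: a "walk" over recognized
// CA sentences fires a callback per sentence, and each target-specific
// listener appends a translated step to its own execution plan.
public class CaTranslationSketch {

    interface CaListener {
        void enterShow(String collection);             // one callback per grammar rule
    }

    static class PigListener implements CaListener {   // like CommonAnalyticsPigListener
        final List<String> plan = new ArrayList<>();   // like PigExecutionPlan
        public void enterShow(String c) {
            plan.add("STORE " + c + " INTO '" + c + "_tmp';");
        }
    }

    static class HiveListener implements CaListener {  // like CommonAnalyticsHiveListener
        final List<String> plan = new ArrayList<>();   // like HiveExecutionPlan
        public void enterShow(String c) {
            plan.add("SELECT ALL * FROM " + c);
        }
    }

    // Stand-in for the ANTLR tree walk: fire the callback for each sentence.
    static void walk(List<String> showCollections, CaListener listener) {
        for (String collection : showCollections) {
            listener.enterShow(collection);
        }
    }

    public static String demo(String executor) {
        List<String> shows = List.of("SelectedCALinks");
        if ("PIG".equals(executor)) {
            PigListener l = new PigListener();
            walk(shows, l);
            return String.join("\n", l.plan);
        }
        HiveListener l = new HiveListener();
        walk(shows, l);
        return String.join("\n", l.plan);
    }

    public static void main(String[] args) {
        System.out.println(demo("PIG"));
        System.out.println(demo("HIVE"));
    }
}
```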
Building and deploying changes on CA Grammar
Parser implementation is present in package de.tum.in.aflux.component.commonanalytics.grammar
These classes are generated automatically by ANTLR. Developers should not modify them.
To change the grammar, follow these steps:
1.- Change CommonAnalytics.g4 file in /src/main/antlr4
The file contains the formal grammar definition.
2.- Run mvn clean install
3.- Copy the generated classes
mvn clean install generates the parser classes in /target/generated-sources/antlr4. The generated deliverables are:
CommonAnalytics.tokens
CommonAnalyticsBaseListener.java
CommonAnalyticsLexer.java
CommonAnalyticsListener.java
CommonAnalyticsParser.java
Those classes are generated by the Maven plugin antlr4-maven-plugin present in pom.xml.
Copy and overwrite these classes to de.tum.in.aflux.component.commonanalytics.grammar
Fix the package declaration for each one and remove all @Override annotations in class CommonAnalyticsBaseListener.java
4.- Change the CommonAnalyticsHiveListener.java and CommonAnalyticsPigListener.java to consider the new grammar elements (if they affect the translation to Pig or Hive)
5.- Optional: add a sample including the new syntactical elements in afluxSentencesSample.gr, and add tests to CommonAnalyticsSample.java. There is no need to run aFlux to test CommonAnalyticsSample.java; it can be run as a simple Java app.
6.- Build the plugin as usual and import it into the aFlux application.
How Pig Flows Work
Pig tool actors run like any other aFlux actor. The sequence diagram example shows how a PigLoadActor runs, followed by a PigLimitActor.
As with all actors, FluxRunnerImpl starts by launching (sending Akka messages to) all SIGNAL TYPE actors that have no predecessors.
The complete set of executors is seen by FluxRunnerImpl as AbstractMainExecutor instances. It has access to the ActorSystem created previously by FluxRunnerImpl.
In this case, AbstractMainExecutor (related to PigLoadActor) sends a message to the Akka system. The recipient will be PigLoadActor, running its runCore method.
Most Pig tools' main task involves the following activities:
- Generate a PigExecutionPlan: they receive a PigExecutionPlan as input. If the input is null, a new PigExecutionPlan is created; if it is not null, the received PigExecutionPlan is used.
- Get the inputs the user entered by calling getProperty(…), depending on each tool.
- Build the Pig Latin sentence (String), helped by PigBuilder (for building some clauses).
- Most times, create a PigExecutionStep and add it to the PigExecutionPlan.
- setOutput -> set the new PigExecutionPlan as output.
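The per-tool cycle above can be sketched with simplified hypothetical types: PigPlan stands in for PigExecutionPlan, the property maps stand in for getProperty(...), and the generated sentences are illustrative, not the real tools' output:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch of a Pig tool's runCore: receive a plan (or create one
// if the input is null), build the Pig Latin sentence from user properties,
// add it as a new step, and set the enlarged plan as output.
public class PigToolSketch {

    static class PigPlan {                             // stands in for PigExecutionPlan
        final List<String> steps = new ArrayList<>();
        String getScript() { return String.join("\n", steps); }
    }

    // runCore of a hypothetical LOAD tool
    static PigPlan loadRunCore(PigPlan input, Map<String, String> props) {
        PigPlan plan = (input != null) ? input : new PigPlan();  // null-input check
        String sentence = props.get("alias") + " = LOAD '" + props.get("file")
                + "' USING PigStorage(',');";           // PigBuilder's role, simplified
        plan.steps.add(sentence);                       // add the PigExecutionStep
        return plan;                                    // setOutput(new plan)
    }

    // runCore of a hypothetical LIMIT tool, fed with the LOAD tool's output
    static PigPlan limitRunCore(PigPlan input, Map<String, String> props) {
        PigPlan plan = (input != null) ? input : new PigPlan();
        plan.steps.add(props.get("alias") + " = LIMIT " + props.get("source")
                + " " + props.get("rows") + ";");
        return plan;
    }

    public static String demo() {
        PigPlan plan = loadRunCore(null, Map.of("alias", "A", "file", "links.csv"));
        plan = limitRunCore(plan, Map.of("alias", "B", "source", "A", "rows", "10"));
        return plan.getScript();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```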
The output is used by FluxRunnerImpl to be sent to the following tool (in this case Pig LIMIT) via the Akka system.
Pig LIMIT receives the plan (containing the LOAD sentence) and, executing a similar process, generates the LIMIT sentence and adds it to the plan, which it sends to the following Pig tool.
Generally the last tool should be "Pig Execute", which receives a PigExecutionPlan (like all other Pig actors) and is explained below.
How Pig Executor Works
The following sequence diagram shows the main classes involved in a Pig execution.
As with all Pig actors, "Pig Execute" receives a PigExecutionPlan as input. From the plan it gets the script (in text form) and calls the execute method of the PigExecutor class (present in aFlux, independent of any plugin). PigExecutor.java uses the Spring template class (PigOperators) that is available as a resource in the FluxEnvironmentImpl class (which can be accessed by any actor).