DataLines: a framework for building streaming data applications
![Page 1: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/1.jpg)
DataLines: a framework for building streaming data applications
Mike Haberman
Senior Software/Network Engineer
![Page 2: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/2.jpg)
The Problem
• Data deluge: routers, switches, IDS, servers (web, mail, logs, etc), software (tcpdump, web100, SNMP, tarpit, etc), sensors, taps, … (help me)
![Page 3: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/3.jpg)
The Problem (continued)
• Disparate data formats
• Software (sometimes) to manage each
• Tweaking to get what you want (custom software)
• Correlating data (more custom software)
![Page 4: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/4.jpg)
DataLines
• Can we build a framework that can remove all (most) of the tedium of working with all these disparate data formats?
![Page 5: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/5.jpg)
DataLines Framework
• designed to manage and build streaming data processing applications
![Page 7: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/7.jpg)
DataLines Framework
• Manage: would like one tool to handle all these different data sources.
![Page 8: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/8.jpg)
DataLines Framework
• Build: uniform way of creating a data processing application.
![Page 9: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/9.jpg)
DataLines Framework
• Streaming data:
  – Never-ending stream of ‘manageable’ chunks of data
  – No random access, no blocking operators
  – One look; linear or sub-linear algorithms/data ops
  – Each data item (a tuple in DataLines) is an independent entity
  – Many tools were not designed for streaming data
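As an illustration of the "one look" constraint, here is a minimal Python sketch of a single-pass statistic (Welford's running mean and variance). It is not part of DataLines; `RunningStats`, `packet_sizes`, and `summarize` are hypothetical names, and the source is finite only so the example terminates.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class RunningStats:
    """Single-pass mean and variance using Welford's update."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 until at least one item has been seen.
        return self.m2 / self.count if self.count else 0.0


def packet_sizes() -> Iterator[int]:
    """Hypothetical stand-in for an unbounded source (finite here)."""
    yield from [100, 200, 300, 400]


def summarize(stream: Iterable[float]) -> RunningStats:
    stats = RunningStats()
    for item in stream:  # each item is seen exactly once, then discarded
        stats.update(item)
    return stats
```

The state kept between items is constant in size, so nothing needs to be buffered or revisited.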
![Page 10: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/10.jpg)
DataLines Framework
• Processing:
  – Something you want to do to the data (e.g. reading, writing, parsing, event generation, filtering, statistics, reports, data synopsis, …)
![Page 11: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/11.jpg)
DataLines
• Creating a DataLines application:
XML → “compile” → DataLines Application
![Page 12: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/12.jpg)
DataLines
• XML file defines 3 major components:
  – Data Processors
    • What one does with the data
  – Processing Order
    • The order in which the processors will operate on the data
  – Event Management
    • What to do when a processor generates an event
![Page 13: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/13.jpg)
DataLines Processors
• Data Processors are the heart of DataLines
  – I/O: socket, file
  – Filters: inline, dispatch
  – Collectors: binning, windowing (w/operators)
  – GUI: charts, picture taking
  – Converters: binary to tuple
  – Misc: printers, counters, iterators, timers, data generators, gates, delays
• Processors can generate events
• Processors can drop, mutate, mutilate the tuple being processed, generate new tuples
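The three things a processor can do to a tuple (drop it, mutate it, or generate new ones) can be sketched in Python as functions that return zero or more tuples. This is only an illustration under assumed names: `Record`, `Processor`, `drop_small`, `add_kilobytes`, and `split_endpoints` are hypothetical and are not the DataLines library API.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]                       # stand-in for a DataLines tuple
Processor = Callable[[Record], List[Record]]  # returns zero or more tuples


def drop_small(min_bytes: int) -> Processor:
    """Filter: drop tuples whose 'bytes' field is below a threshold."""
    def process(t: Record) -> List[Record]:
        return [t] if t["bytes"] >= min_bytes else []
    return process


def add_kilobytes(t: Record) -> List[Record]:
    """Mutator: return a copy of the tuple with a derived field added."""
    return [{**t, "kb": t["bytes"] / 1024}]


def split_endpoints(t: Record) -> List[Record]:
    """Generator: emit one new tuple per endpoint of a flow."""
    return [{"ip": t["src"]}, {"ip": t["dst"]}]
```

Returning a list keeps the three behaviours uniform: an empty list is a drop, one item is a pass-through or mutation, and several items are newly generated tuples.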
![Page 14: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/14.jpg)
DataLines Pipelines
• Control tuple movement among processors
• Can connect either processors or other pipelines
• Two paths within a pipeline: binary and tuple
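A rough Python sketch of how a pipeline might move tuples between processors, and why a pipeline can be connected like a processor. It models only the tuple path, not the binary path, and `pipeline`, `keep_tcp`, and `tag_seen` are hypothetical names rather than DataLines code.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Stage = Callable[[Record], List[Record]]  # a processor or a nested pipeline


def pipeline(*stages: Stage) -> Stage:
    """Compose stages in order; the result is itself a stage, so pipelines nest."""
    def run(t: Record) -> List[Record]:
        current = [t]
        for stage in stages:
            following: List[Record] = []
            for item in current:
                following.extend(stage(item))
            current = following
        return current
    return run


def keep_tcp(t: Record) -> List[Record]:
    """Filter stage: pass only TCP tuples."""
    return [t] if t["proto"] == "tcp" else []


def tag_seen(t: Record) -> List[Record]:
    """Mutating stage: mark the tuple as seen."""
    return [{**t, "seen": True}]


inner = pipeline(keep_tcp)         # a pipeline of processors
outer = pipeline(inner, tag_seen)  # a pipeline connecting a pipeline and a processor
```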
![Page 15: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/15.jpg)
Event Management
• Allow processors to signal an event
  – timers, open/close, client connects, etc.
• Allow the user to tie in domain logic
• Allow the user to call a processor-specific API
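One way to picture event management is a routing table from (event name, source processor) to method calls on target processors. The Python sketch below is an assumption about the idea, not the DataLines implementation; `EventManager`, `Parser`, `on`, and `signal` are hypothetical names.

```python
from typing import Dict, List, Tuple


class Parser:
    """Hypothetical processor exposing a small processor-specific API."""
    def __init__(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False


class EventManager:
    """Routes an event signalled by a source processor to calls on targets."""
    def __init__(self, processors: Dict[str, object]) -> None:
        self._processors = processors
        self._routes: Dict[Tuple[str, str], List[Tuple[str, str]]] = {}

    def on(self, event: str, source: str, method: str, target: str) -> None:
        self._routes.setdefault((event, source), []).append((method, target))

    def signal(self, event: str, source: str) -> int:
        """Invoke every registered call; return how many calls were made."""
        calls = self._routes.get((event, source), [])
        for method, target in calls:
            getattr(self._processors[target], method)()
        return len(calls)
```

For example, registering `on("alert", "reader", "stop", "parser")` means that when the reader signals `alert`, the parser's `stop()` method is called.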
![Page 16: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/16.jpg)
DataLines Data
• The generalization of data is a DlTuple
• Tuple is just a set of values
• DlTuple is the interface processors use
  – String[] <-- getFieldNames()
  – DlValue <-- getValue(fieldname)
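A Python rendering of the two-method interface named on the slide, as a sketch only. The method names are snake_case translations of `getFieldNames()` and `getValue(fieldname)`; `DictTuple` and `describe` are hypothetical helpers, and plain Python values stand in for `DlValue`.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class DlTuple(ABC):
    """The only view of the data that a processor needs."""

    @abstractmethod
    def get_field_names(self) -> List[str]:
        ...

    @abstractmethod
    def get_value(self, field_name: str) -> Any:
        ...


class DictTuple(DlTuple):
    """Hypothetical dict-backed implementation."""
    def __init__(self, values: Dict[str, Any]) -> None:
        self._values = dict(values)

    def get_field_names(self) -> List[str]:
        return list(self._values)

    def get_value(self, field_name: str) -> Any:
        return self._values[field_name]


def describe(t: DlTuple) -> str:
    """A 'printer'-style processor written only against the interface."""
    return ", ".join(f"{n}={t.get_value(n)}" for n in t.get_field_names())
```

Because `describe` uses only the interface, it works for any data source whose converter produces a `DlTuple`.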
![Page 17: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/17.jpg)
DataLines Data
• Tuples can have virtual fields
  – calculated values, static values
• Tuples can have composite fields
• The creation of the tuple is left to the processor in charge of conversion
![Page 18: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/18.jpg)
XML Syntax … run away!
<application>
  <dataline name="dl">
    <processor name="reader" type="FileReader">
      <configInfo></configInfo>
    </processor>
    <pipeline name="p1">
      <pipe from="reader" to="parser" />
      <pipe from="parser" to="printer" />
    </pipeline>
    <eventManagement>
      <event name="start">
        <call method="start" target="reader" />
      </event>
      <event name="alert" from="reader">
        <call method="stop" target="parser" />
      </event>
    </eventManagement>
  </dataline>
</application>
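To show how such a file could be read, the following Python sketch parses a trimmed copy of the example with the standard `xml.etree.ElementTree` module and recovers the processing order and the event routing. It is not the DataLines compiler; `processing_order` and `event_calls` are hypothetical helpers, and the XML here uses straight quotes and a closing `</dataline>` tag so that it is well-formed.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

APP_XML = """
<application>
  <dataline name="dl">
    <processor name="reader" type="FileReader"><configInfo/></processor>
    <pipeline name="p1">
      <pipe from="reader" to="parser"/>
      <pipe from="parser" to="printer"/>
    </pipeline>
    <eventManagement>
      <event name="alert" from="reader">
        <call method="stop" target="parser"/>
      </event>
    </eventManagement>
  </dataline>
</application>
""".strip()


def processing_order(xml_text: str) -> List[str]:
    """Follow the <pipe> elements, assuming they form one linear chain."""
    root = ET.fromstring(xml_text)
    pipes = [(p.get("from"), p.get("to")) for p in root.iter("pipe")]
    if not pipes:
        return []
    order = [pipes[0][0]]
    for src, dst in pipes:
        if src != order[-1]:
            raise ValueError(f"pipe from {src!r} does not continue the chain")
        order.append(dst)
    return order


def event_calls(xml_text: str) -> List[Tuple[str, str, str, str]]:
    """Return (event, source, method, target) for every <call> in the file."""
    root = ET.fromstring(xml_text)
    return [
        (ev.get("name"), ev.get("from"), call.get("method"), call.get("target"))
        for ev in root.iter("event")
        for call in ev.iter("call")
    ]
```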
![Page 19: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/19.jpg)
Data Example
<arg name="tupleField">
  <map name="name" value="Src Ip" />
  <map name="peer" value="IpV4AddressPeer" />
  <map name="length" value="4" />
</arg>
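As a guess at what a 4-byte IPv4 field conversion involves, this Python sketch turns four raw bytes into a dotted-quad string with the standard `ipaddress` module. How `IpV4AddressPeer` actually works is not shown in the slides; `decode_ipv4` is a hypothetical name.

```python
import ipaddress


def decode_ipv4(raw: bytes) -> str:
    """Convert a 4-byte, network-order field into dotted-quad text."""
    if len(raw) != 4:
        raise ValueError("an IPv4 field must be exactly 4 bytes long")
    return str(ipaddress.IPv4Address(raw))
```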
![Page 20: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/20.jpg)
Data Example
<arg name="tupleField">
  <map name="name" value="A" />
  <map name="peer" value="IntegerPeer" />
  <map name="length" value="4" />
</arg>
<arg name="tupleField">
  <map name="name" value="B" />
  <map name="peer" value="IntegerPeer" />
  <map name="length" value="4" />
</arg>
<arg name="tupleField">
  <map name="name" value="C" />
  <map name="peer" value="JepPeer" />
  <data name="expression">${A} + ${B}</data>
</arg>
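The configuration above describes two 4-byte integer fields and a calculated field C = A + B. A minimal Python sketch of the same idea follows, assuming big-endian signed integers (the slides do not state the byte order); `FIELDS`, `VIRTUAL`, and `to_tuple` are hypothetical names, and a Python lambda stands in for the expression evaluated by `JepPeer`.

```python
import struct
from typing import Callable, Dict, List, Tuple

# Physical fields: (name, struct format). ">i" is a big-endian 4-byte signed
# integer; the byte order is an assumption made for this sketch.
FIELDS: List[Tuple[str, str]] = [("A", ">i"), ("B", ">i")]

# Virtual fields: computed from the already-parsed values, not read from bytes.
VIRTUAL: Dict[str, Callable[[Dict[str, int]], int]] = {
    "C": lambda t: t["A"] + t["B"],
}


def to_tuple(raw: bytes) -> Dict[str, int]:
    """Binary-to-tuple conversion: unpack physical fields, then add virtual ones."""
    values: Dict[str, int] = {}
    offset = 0
    for name, fmt in FIELDS:
        (values[name],) = struct.unpack_from(fmt, raw, offset)
        offset += struct.calcsize(fmt)
    for name, expr in VIRTUAL.items():
        values[name] = expr(values)
    return values
```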
![Page 21: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/21.jpg)
DataLines Tutorial
• Fast forward past a painful 3-hour tutorial covering each of those sections in detail (tuples, processors, pipelines, event management, configurations)
• You have seen all the XML though!
![Page 22: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/22.jpg)
DataLines Distilled
• A library of data processors that operate on “Tuples”
– one of the processors takes the raw data and creates the tuple
• An XML compiler that takes the XML file and the library and creates an application
![Page 23: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/23.jpg)
DataLines Example
![Page 24: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/24.jpg)
DataLines in use
• DataLines does make it easier to hit the ground running. Much of the tedious work you need to do is taken care of.
• For highly specific needs, you still need to write code, but that code then becomes part of the DataLines library that others can build on.
![Page 25: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/25.jpg)
Balance Sheet
• Positive
  – Flexible (vendor neutral, data, debugging)
  – Reusable (pipelines, processors)
  – Fast development time
  – “Easy” to change the client (CLI, desktop, web page)
• Negative
  – May need to write domain-specific code
  – Learning curve: processor config, data expectations, events
![Page 26: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/26.jpg)
DataLines in Action
• Network Engineering group
  – Monitor router, tar pit, IDS, packet sampling, L2/L3 mappings
• Security Group
  – Network forensics
• Intergroup Wiring
  – Use DataLines to share data between groups/projects
![Page 27: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/27.jpg)
DataLines in Action
• Network Research group
  – Monitor cluster network activity from MPI layer
  – Data Mining
  – Misc. NSF data-oriented projects
![Page 28: DataLines a framework for building steaming data applications](https://reader030.fdocuments.net/reader030/viewer/2022032805/56813196550346895d9808d2/html5/thumbnails/28.jpg)
Future
• Open Source
• More Info: [email protected]
• http://datalines.ncsa.uiuc.edu (a work in progress)