HPCC Systems - ECL for Programmers - Big Data - Data Scientist
![Page 1: HPCC Systems - ECL for Programmers - Big Data - Data Scientist](https://reader033.fdocuments.net/reader033/viewer/2022052911/559f003a1a28ab28378b45a3/html5/thumbnails/1.jpg)
By Fujio Turner
HPCC Systems - ECL Intro: Big Data Querying Made EZ
Enterprise Control Language explained for Programmers
@FujioTurner
![Page 2: HPCC Systems - ECL for Programmers - Big Data - Data Scientist](https://reader033.fdocuments.net/reader033/viewer/2022052911/559f003a1a28ab28378b45a3/html5/thumbnails/2.jpg)
Who is LexisNexis?
LexisNexis is a provider of legal, tax, regulatory, news, business information, and analysis to legal, corporate, government, accounting and academic markets.
LexisNexis has been in business since 1977 with over 30,000 employees worldwide.
What is HPCC Systems?
LexisNexis Risk is the division of LexisNexis that focuses on data, Big Data processing, linking and vertical expertise, and it supports HPCC Systems as an open source project under the Apache 2.0 License.
![Page 3: HPCC Systems - ECL for Programmers - Big Data - Data Scientist](https://reader033.fdocuments.net/reader033/viewer/2022052911/559f003a1a28ab28378b45a3/html5/thumbnails/3.jpg)
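Comparison: Hadoop vs. HPCC Systems

| | Hadoop | HPCC Systems |
|---|---|---|
| Language | Java | C++ |
| Data scale | Petabytes | Exabytes |
| Throughput | 1-80,000 jobs/day | Thor (non-indexed): 4X-13X; Roxie (indexed): 2K-3K jobs/sec |
| Storage | Block based | File based |
| In use since | 2005 | 2000 |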
![Page 4: HPCC Systems - ECL for Programmers - Big Data - Data Scientist](https://reader033.fdocuments.net/reader033/viewer/2022052911/559f003a1a28ab28378b45a3/html5/thumbnails/4.jpg)
What Is ECL?
ECL (Enterprise Control Language) is a declarative query language, compiled to C++, for use with the HPCC Systems Big Data platform. ECL's syntax and format are simple and easy to learn.
Note: ECL is very similar to Hadoop's Pig, but more expressive and feature-rich.
![Page 5: HPCC Systems - ECL for Programmers - Big Data - Data Scientist](https://reader033.fdocuments.net/reader033/viewer/2022052911/559f003a1a28ab28378b45a3/html5/thumbnails/5.jpg)
Comparing ECL to General Programming
In this presentation you will see how loading and querying data in ECL is just like reading and searching data in a plain text file.
The following slides set general programming (common, general logic) side by side with ECL: General code | ECL code.
![Page 6: HPCC Systems - ECL for Programmers - Big Data - Data Scientist](https://reader033.fdocuments.net/reader033/viewer/2022052911/559f003a1a28ab28378b45a3/html5/thumbnails/6.jpg)
Example Text File: Customer Data May 2010

| Name | State | Age |
|---|---|---|
| Kevin | CA | 45 |
| Mark | MI | 27 |
| Sara | FL | 64 |

`~/cdata_2010.txt` = example file name (general programming)
`~hpcc::cdata_2010` = ECL example file, distributed in the HPCC cluster
![Page 8: HPCC Systems - ECL for Programmers - Big Data - Data Scientist](https://reader033.fdocuments.net/reader033/viewer/2022052911/559f003a1a28ab28378b45a3/html5/thumbnails/8.jpg)
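Opening File: general programming vs ECL
General: `d = fopen('~/cdata_2010.txt')`
ECL: `d := DATASET('~hpcc::cdata_2010', cs, THOR);`
File Location: the path `'~/cdata_2010.txt'` vs. the logical file name `'~hpcc::cdata_2010'`
Open File Function: `fopen` vs. `DATASET` (`cs` is the record layout introduced in the Organizing step; `THOR` declares that the file is stored in HPCC's native Thor format, while a delimited text file would be declared with `CSV` instead)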
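To make the "General" column concrete, here is a minimal Python sketch of the open-file step. It assumes the sample rows are stored one per line with the `\r\n` endings the later slides split on; the temporary path and the `open_file` helper are hypothetical stand-ins for `~/cdata_2010.txt` and `fopen`, not part of HPCC Systems.

```python
import os
import tempfile

# Hypothetical contents of ~/cdata_2010.txt: one customer per line
# (Name State Age), using the CRLF line endings the slides split on.
SAMPLE = "Kevin CA 45\r\nMark MI 27\r\nSara FL 64\r\n"


def open_file(path: str) -> str:
    """Read the whole file into one string, like `d = fopen(...)` on the slide."""
    # newline="" keeps the CRLF endings instead of translating them to LF.
    with open(path, "r", newline="") as f:
        return f.read()


# File location: a temporary path stands in for ~/cdata_2010.txt.
path = os.path.join(tempfile.gettempdir(), "cdata_2010.txt")
with open(path, "w", newline="") as f:
    f.write(SAMPLE)

d = open_file(path)
print(d.count("\r\n"))  # 3 rows
```

Unlike ECL's `DATASET`, this only returns raw text; nothing about columns or types is known yet.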
![Page 10: HPCC Systems - ECL for Programmers - Big Data - Data Scientist](https://reader033.fdocuments.net/reader033/viewer/2022052911/559f003a1a28ab28378b45a3/html5/thumbnails/10.jpg)
Organizing: general programming vs ECL

General: split the data (d) by row.

    d = fopen('~/cdata_2010.txt')
    new_d = split(d, "\r\n")

ECL: use this schema on the file to give structure to the data.

    cs := RECORD
        STRING20 Name;
        STRING2  State;
        INTEGER3 Age;
    END;
    d := DATASET('~hpcc::cdata_2010', cs, THOR);

Rows: Kevin CA 45 | Mark MI 27 | Sara FL 64
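The two halves of this slide can be sketched in Python as follows: split the raw text by row, then impose a fixed schema on each row. The `Customer` NamedTuple is a hypothetical stand-in for the ECL `cs` RECORD; unlike ECL's `STRING20`, Python strings are not fixed-width or space-padded, and the parser assumes exactly one space between columns.

```python
from typing import List, NamedTuple


class Customer(NamedTuple):
    """Hypothetical Python stand-in for the ECL `cs` RECORD (Name, State, Age)."""
    name: str
    state: str
    age: int


def split_rows(d: str) -> List[str]:
    """Split raw file contents by row on CRLF, dropping the empty piece after the last line."""
    return [line for line in d.split("\r\n") if line]


def apply_schema(rows: List[str]) -> List[Customer]:
    """Split each row by column (one space) and convert Age to an int."""
    return [Customer(name, state, int(age))
            for name, state, age in (row.split(" ") for row in rows)]


raw = "Kevin CA 45\r\nMark MI 27\r\nSara FL 64\r\n"
new_d = split_rows(raw)
customers = apply_schema(new_d)
print(customers[2])  # Customer(name='Sara', state='FL', age=64)
```

In general code the schema lives in parsing logic you write yourself; in ECL it is declared once in the RECORD and attached to the file by `DATASET`.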
![Page 15: HPCC Systems - ECL for Programmers - Big Data - Data Scientist](https://reader033.fdocuments.net/reader033/viewer/2022052911/559f003a1a28ab28378b45a3/html5/thumbnails/15.jpg)
Find "Sara": general programming vs ECL

General: split data by column, filter data, output.

    d = fopen('~/cdata_2010.txt')
    new_d = split(d, "\r\n")
    for (x = 0; x < 3; x++) {
        row = new_d[x]
        new_row = split(row, " ")    // columns 0, 1, 2 = Name, State, Age
        if (new_row[0] == 'Sara') {
            print "Found Sara"
        }
    }

ECL: the schema splits data by column, `d(Name = 'Sara')` filters the data, and `OUTPUT` returns the result.

    cs := RECORD
        STRING20 Name;
        STRING2  State;
        INTEGER3 Age;
    END;
    d := DATASET('~hpcc::cdata_2010', cs, THOR);
    sara := d(Name = 'Sara');
    OUTPUT(sara);

Rows: Kevin CA 45 | Mark MI 27 | Sara FL 64
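A runnable Python version of the "General" loop, next to a filter-expression version that reads closer to ECL's `d(Name = 'Sara')`, might look like this. It is a sketch under the assumption that rows are already split by line; the function names are hypothetical, and the loop collects messages instead of printing so the result can be checked.

```python
from typing import List, Tuple

# Rows of the sample file after splitting by line.
NEW_D = ["Kevin CA 45", "Mark MI 27", "Sara FL 64"]


def find_sara_loop(new_d: List[str]) -> List[str]:
    """Imperative version: split each row by column and test column 0 (Name)."""
    found = []
    for row in new_d:
        new_row = row.split(" ")  # columns 0, 1, 2 = Name, State, Age
        if new_row[0] == "Sara":
            found.append("Found Sara")  # the slide prints; we collect instead
    return found


def find_sara_filter(records: List[Tuple[str, str, int]]) -> List[Tuple[str, str, int]]:
    """Filter-expression version: state which rows you want, not how to loop."""
    return [r for r in records if r[0] == "Sara"]


records = [(n, s, int(a)) for n, s, a in (row.split(" ") for row in NEW_D)]
print(find_sara_loop(NEW_D))      # ['Found Sara']
print(find_sara_filter(records))  # [('Sara', 'FL', 64)]
```

The second form is the closer analogy: like an ECL filter, it describes the result set and leaves the iteration to the runtime.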
![Page 16: HPCC Systems - ECL for Programmers - Big Data - Data Scientist](https://reader033.fdocuments.net/reader033/viewer/2022052911/559f003a1a28ab28378b45a3/html5/thumbnails/16.jpg)
Find "Sara" & Older than 50: general programming vs ECL

General:

    d = fopen('~/cdata_2010.txt')
    new_d = split(d, "\r\n")
    for (x = 0; x < 3; x++) {
        row = new_d[x]
        new_row = split(row, " ")
        if (new_row[0] == 'Sara' and int(new_row[2]) > 50) {
            print "Found Sara"
        }
    }

ECL:

    cs := RECORD
        STRING20 Name;
        STRING2  State;
        INTEGER3 Age;
    END;
    d := DATASET('~hpcc::cdata_2010', cs, THOR);
    sara := d(Name = 'Sara' AND Age > 50);
    OUTPUT(sara);

Rows: Kevin CA 45 | Mark MI 27 | Sara FL 64
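Adding the age condition exposes one difference the slide glosses over: in a plain text file the Age column is text, so general code must convert it before comparing numerically, while the ECL schema already types `Age` as an integer. A minimal Python sketch, with a hypothetical `min_age` parameter added only to make the threshold easy to vary:

```python
from typing import List

NEW_D = ["Kevin CA 45", "Mark MI 27", "Sara FL 64"]


def find_sara_over(new_d: List[str], min_age: int = 50) -> List[str]:
    """Report rows named Sara whose Age column is greater than min_age."""
    found = []
    for row in new_d:
        new_row = row.split(" ")
        # Column 2 (Age) is a string here; comparing "64" > 50 would raise TypeError.
        if new_row[0] == "Sara" and int(new_row[2]) > min_age:
            found.append("Found Sara")
    return found


print(find_sara_over(NEW_D))      # ['Found Sara']
print(find_sara_over(NEW_D, 70))  # []
```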
![Page 17: HPCC Systems - ECL for Programmers - Big Data - Data Scientist](https://reader033.fdocuments.net/reader033/viewer/2022052911/559f003a1a28ab28378b45a3/html5/thumbnails/17.jpg)
ECL is EZ
• Make your own functions & libraries in ECL.
• Modularize your code with `IMPORT` to reuse old code.
• Machine Learning built in: http://hpccsystems.com/ml
![Page 18: HPCC Systems - ECL for Programmers - Big Data - Data Scientist](https://reader033.fdocuments.net/reader033/viewer/2022052911/559f003a1a28ab28378b45a3/html5/thumbnails/18.jpg)
ECL Plugin for Eclipse IDE
http://hpccsystems.com/products-and-services/products/plugins/eclipse-ide
![Page 19: HPCC Systems - ECL for Programmers - Big Data - Data Scientist](https://reader033.fdocuments.net/reader033/viewer/2022052911/559f003a1a28ab28378b45a3/html5/thumbnails/19.jpg)
ECL + Other Languages
ECL compiles to C++, so existing C/C++ code can be embedded in ECL (for example with `BEGINC++`), and you can also use other languages and methods, like those below, to query.
![Page 20: HPCC Systems - ECL for Programmers - Big Data - Data Scientist](https://reader033.fdocuments.net/reader033/viewer/2022052911/559f003a1a28ab28378b45a3/html5/thumbnails/20.jpg)
ECL GUIDE: http://hpccsystems.com/download/docs/ecl-language-reference
JOIN, MERGE, LENGTH, REGEX, ROUND, SUM, COUNT, TRIM, WHEN, AVE, ABS, CASE, DEDUP, NORMALIZE, DENORMALIZE, IF, SORT, GROUP, and more …
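To give a feel for a few of the operations listed above, here is a rough Python analogue of SORT, DEDUP, COUNT, SUM and AVE on the example data. It is an illustration only, not ECL semantics: the fourth row is a hypothetical duplicate added to show deduplication, and the dedup step mimics removing adjacent duplicates after sorting, which is how DEDUP is commonly paired with SORT.

```python
from itertools import groupby
from typing import List, NamedTuple


class Customer(NamedTuple):
    name: str
    state: str
    age: int


# The slide's three rows plus one hypothetical duplicate to show deduplication.
d: List[Customer] = [
    Customer("Kevin", "CA", 45),
    Customer("Mark", "MI", 27),
    Customer("Sara", "FL", 64),
    Customer("Mark", "MI", 27),
]

sorted_d = sorted(d, key=lambda r: r.name)       # roughly SORT by Name
deduped = [rec for rec, _ in groupby(sorted_d)]  # roughly DEDUP: drop adjacent identical records
count = len(deduped)                             # roughly COUNT
total_age = sum(r.age for r in deduped)          # roughly SUM of Age
average_age = total_age / count                  # roughly AVE of Age

print(count, total_age, round(average_age, 2))  # 3 136 45.33
```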
![Page 21: HPCC Systems - ECL for Programmers - Big Data - Data Scientist](https://reader033.fdocuments.net/reader033/viewer/2022052911/559f003a1a28ab28378b45a3/html5/thumbnails/21.jpg)
Query with Plain SQL
SQL to ECL: http://www.slideshare.net/FujioTurner/meet-up-sqldemopp
or
For more HPCC "How To's", go to http://www.slideshare.net/hpccsystems/jdbc-hpcc
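For readers who think in SQL, the "Sara older than 50" filter maps directly onto a `WHERE` clause. The sketch below uses Python's built-in `sqlite3` purely as a local stand-in to show the query shape; it is not how HPCC Systems executes SQL, and the table name `cdata_2010` is a hypothetical choice mirroring the example file.

```python
import sqlite3

# sqlite3 is only a local stand-in for SQL access; HPCC Systems is not involved here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cdata_2010 (Name TEXT, State TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO cdata_2010 VALUES (?, ?, ?)",
    [("Kevin", "CA", 45), ("Mark", "MI", 27), ("Sara", "FL", 64)],
)

# SQL counterpart of the ECL filter d(Name = 'Sara' AND Age > 50).
rows = conn.execute(
    "SELECT Name, State, Age FROM cdata_2010 WHERE Name = ? AND Age > ?",
    ("Sara", 50),
).fetchall()
conn.close()

print(rows)  # [('Sara', 'FL', 64)]
```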
![Page 22: HPCC Systems - ECL for Programmers - Big Data - Data Scientist](https://reader033.fdocuments.net/reader033/viewer/2022052911/559f003a1a28ab28378b45a3/html5/thumbnails/22.jpg)
Watch how to install HPCC Systems in 5 Minutes: http://www.youtube.com/watch?v=8SV43DCUqJg
Download HPCC Systems Open Source Community Edition: http://hpccsystems.com/download/
or
Source Code: https://github.com/hpcc-systems