Post on 27-Jul-2015
From Mining To Analytics
Making Sense of Medicare Data
Tripfilms.com
The Achievement Network
Archway Health Advisors
Medicare
• Centers for Medicare & Medicaid Services (CMS)
• Medicare is a national social insurance program, in place since 1966, covering Americans aged 65 and older
• "Fee For Service" model
Medicare Fee for Service
[Diagram: under fee for service, each provider bills CMS separately. Hospital stay: 3 days, $3,000. SNF: 18 days, $12,000. A hospital readmission (2 days) and Home Health (12 visits) add another $5,000 + $4,000, for a total of $24,000.]
Bundled Payments for Care
• Better care, smarter spending, and healthier people
• The Bundled Payments for Care Improvement (BPCI) initiative
  Payment arrangements based on financial and performance accountability
• 4 models:
  Model 1: Retrospective Acute Care Hospital Stay Only
  Model 2: Retrospective Acute Care Hospital Stay and Post-Acute Care
  Model 3: Retrospective Post-Acute Care Only
  Model 4: Acute Care Hospital Stay Only
Medicare BPCI
[Diagram: CMS pays a single episode price of $20,000. The Hospital stay (3 days, $3,000) and SNF (18 days, $12,000) claims are reconciled against it: $20,000 - ($3,000 + $12,000) = $5,000.]
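The arithmetic behind the two diagrams can be sketched in a few lines. This is an illustrative toy, not CMS's actual reconciliation logic; only the dollar amounts come from the slides, everything else is an assumption.

```python
def ffs_total(claims):
    """Fee for service: CMS simply pays every claim as billed."""
    return sum(claims.values())

def bpci_reconcile(target_price, claims):
    """Retrospective bundle: actual claim costs are netted against the
    episode target price; the remainder is the savings (or the loss,
    if negative)."""
    return target_price - sum(claims.values())

episode_claims = {"hospital": 3000, "snf": 12000}
print(bpci_reconcile(20000, episode_claims))  # 20000 - (3000 + 12000) = 5000
```

The same claims that would each be paid individually under fee for service are, under BPCI, just inputs to a single episode-level settlement.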
Claims
• Statement of services and costs from a healthcare provider:
  Patient information
  Diagnosis information
  Procedure(s) information
• Types:
  Hospital (Inpatient) claims
  Skilled Nursing Facility (SNF) claims
  Home Health Agency (HHA) claims
  …
Episode of Care
[Timeline: inpatient stays (IP 1, IP 2, IP 3), skilled nursing stays (SNF 1, SNF 2), and home health spells (HHA 1, HHA 2) laid out over time.]
Mining Claims
[Diagram: rows of IP, SNF, and HHA claims are partitioned into hospital stays and post-acute care; an anchor hospital stay acts as the episode initiator.]
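The grouping idea in the diagram, sketched as code: walk a time-ordered claim stream, let an inpatient stay open an episode (the anchor / episode initiator), and attach post-acute care and readmissions that start within the episode window. The field names and the 90-day window are assumptions for illustration, not the BPCI specification.

```python
EPISODE_WINDOW = 90  # days after anchor discharge (assumed, not the BPCI rule)

def build_episodes(claims):
    """claims: dicts with 'type' (IP/SNF/HHA), 'start', 'end' as day
    numbers, sorted by 'start'. Returns a list of episodes."""
    episodes = []
    for claim in claims:
        current = episodes[-1] if episodes else None
        in_window = current and claim["start"] <= current["anchor_end"] + EPISODE_WINDOW
        if claim["type"] == "IP" and not in_window:
            # A hospital stay outside any open window anchors a new episode.
            episodes.append({"anchor_end": claim["end"], "claims": [claim]})
        elif in_window:
            # Post-acute care, or a readmission, inside the window.
            current["claims"].append(claim)
        # Anything else (e.g. SNF with no open episode) is ignored in this sketch.
    return episodes

stream = [
    {"type": "IP",  "start": 0,   "end": 3},
    {"type": "SNF", "start": 4,   "end": 22},
    {"type": "HHA", "start": 23,  "end": 35},
    {"type": "IP",  "start": 200, "end": 202},  # outside the window: new anchor
]
eps = build_episodes(stream)
print(len(eps))  # 2
```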
Technologies considered
• Pig (& Hadoop)
  Data processing language
  Procedural language
  Relational-oriented
• SAS
  Business analytics & BI software
  De-facto standard in the healthcare industry
  Proprietary
HPCC Systems Quick Introduction
Rodrigo Pastrana, Consulting Software Engineer
WHT082311
What is HPCC Systems?
• Open source, distributed, data-intensive computing platform
• Provides end-to-end Big Data workflow management: scheduler, integration tools, etc.
• Runs on commodity computing/storage nodes
• Binary packages available for the most common Linux distributions
• Originally designed circa 1999 (predates the original MapReduce paper from Dec '04)
• Improved over a decade of real-world Big Data analytics
• In use across critical production environments throughout LexisNexis for more than 10 years
The HPCC Systems platform
The Three HPCC Systems components:

1. HPCC Systems Data Refinery (Thor)
• Massively parallel data processing engine
• Enables data integration on a scale not previously available
• Programmable using ECL

2. HPCC Systems Data Delivery Engine (Roxie)
• A massively parallel, high-throughput query engine
• Low latency, highly concurrent, and highly available
• Several advanced strategies for efficient retrieval
• Programmable using ECL

3. Enterprise Control Language (ECL)
• An easy-to-use, declarative, data-centric programming language optimized for large-scale data management and query processing
• Highly efficient: automatically distributes workload across all nodes, compiles to native machine code
• Automatic parallelization and synchronization

Conclusion: an end-to-end platform; no need for any third-party tools.
Enterprise Control Language (ECL)
• Declarative programming language: describe what needs to be done, not how to do it
• Powerful: high-level data activities like JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available
• Extensible: modular and extensible, it can shape itself to adapt to the type of problem at hand
• Implicitly parallel: parallelism is built into the underlying platform; the programmer need not be concerned with data partitioning and parallelism
• Maintainable: a high-level programming language without side effects and with efficient encapsulation; programs are more succinct, reliable, and easier to troubleshoot
• Complete: ECL provides a complete data programming paradigm
• Homogeneous: one language to express data algorithms across the entire HPCC Systems platform, from data integration to analytics and high-speed delivery
• Polyglot: ECL supports embedding other languages such as Java, Python, R, SQL, and more
Current Status and Resources
• HPCCSystems.com: tutorials, docs, platform distributions, and more
• Latest release, 5.2.0, adds many new features and improvements:
  Drastic GUI improvements
  Ganglia and Nagios plug-ins for system monitoring and alerting
  Security enhancements: tighter authentication measures, intra-component communication encryption
  Embedded languages: Cassandra support, memcached and Redis access
  JSON-based data support
  Dynamic ESDL: provides simple middleware/back-end interface definition
  Java API project: facilitates interaction between Java-based apps and HPCC web services and C++ tools
• Available now at HPCCSystems.com
Data mining with HPCC Systems
• Thor
  Responsible for processing vast amounts of data
  Optimized for extraction, transformation, loading, sorting, and linking of data
• ECL
  Declarative, more data-centric
  Fast & implicitly parallel
  Inline data
  Unit tests in ECL
SQL vs ECL

SQL:
  SELECT diag_group_cd, COUNT(*) AS volume, SUM(pmt_amt) AS costs
  FROM inpatient_claims
  GROUP BY diag_group_cd;

ECL:
  TABLE(inpatient_claims,
        { diag_group_cd;
          INTEGER volume := COUNT(GROUP);
          REAL costs := SUM(GROUP, pmt_amt) },
        diag_group_cd);
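Both queries compute the same crosstab: one output row per diag_group_cd, carrying a claim count and a cost total. A plain-Python sketch of that semantics (the sample rows are made up for illustration):

```python
from collections import defaultdict

def summarize(claims):
    """Group claims by diag_group_cd; return {code: (volume, costs)}."""
    volume = defaultdict(int)
    costs = defaultdict(float)
    for c in claims:
        volume[c["diag_group_cd"]] += 1
        costs[c["diag_group_cd"]] += c["pmt_amt"]
    return {g: (volume[g], costs[g]) for g in volume}

claims = [
    {"diag_group_cd": 470, "pmt_amt": 12000.0},
    {"diag_group_cd": 470, "pmt_amt": 11000.0},
    {"diag_group_cd": 291, "pmt_amt": 9000.0},
]
print(summarize(claims))  # {470: (2, 23000.0), 291: (1, 9000.0)}
```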
SQL:
  SELECT *
  FROM inpatient_claims LEFT JOIN ip_value_codes RIGHT
    ON LEFT.id = RIGHT.id;

ECL:
  JOIN(inpatient_claims, ip_value_codes,
       LEFT.id = RIGHT.id);
SQL vs ECL

SQL:
  DECLARE my_cursor CURSOR FOR SELECT * FROM inpatient_claims;
  OPEN my_cursor;
  FETCH NEXT FROM my_cursor INTO …;
  WHILE @@FETCH_STATUS = 0
  BEGIN
    …
  END
  CLOSE my_cursor;
  DEALLOCATE my_cursor;

ECL:
  ITERATE(inpatient_claims,
          TRANSFORM(inpatient_claim_layout,
                    SELF.is_dropped := is_one_year_or_greater(RIGHT.admsn_dt, RIGHT.dschrgdt);
                    SELF := RIGHT));
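The contrast above is between an explicit cursor loop and ECL's ITERATE, whose transform sees LEFT (the previous output record) and RIGHT (the current input record). A rough Python model of that semantics (a sketch of the idea, not the engine's implementation):

```python
def iterate(dataset, transform):
    """Run transform(LEFT, RIGHT) once per record; LEFT is the previous
    output record, empty for the first row (ECL passes an all-default LEFT)."""
    out = []
    left = {}
    for right in dataset:
        left = transform(left, right)
        out.append(left)
    return out

# A running total, a classic ITERATE use case:
rows = [{"amt": 5}, {"amt": 7}, {"amt": 3}]
result = iterate(rows, lambda l, r: {"amt": r["amt"],
                                     "total": l.get("total", 0) + r["amt"]})
print([row["total"] for row in result])  # [5, 12, 15]
```

The point of the slide survives the translation: the cursor bookkeeping disappears, leaving only the per-record transform.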
ECL ROLLUP
[Diagram: records R1 R2 R3 R4 R5 R6 are compared pairwise as (LEFT, RIGHT); matching neighbors are merged by the transform Tx (into RA and RB), while non-matching records pass through unchanged.]

ROLLUP( dataset, condition(LEFT, RIGHT), transformation(LEFT, RIGHT) )
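The ROLLUP picture can be modeled in Python (a sketch of the semantics, not HPCC's implementation): walk the dataset pairwise; when condition(LEFT, RIGHT) holds, merge RIGHT into the accumulated LEFT via the transform; otherwise emit LEFT and start accumulating from RIGHT.

```python
def rollup(dataset, condition, transform):
    if not dataset:
        return []
    out = []
    left = dataset[0]
    for right in dataset[1:]:
        if condition(left, right):
            left = transform(left, right)  # merge the pair, keep rolling up
        else:
            out.append(left)               # no match: emit and restart
            left = right
    out.append(left)
    return out

# Merge adjacent records sharing a key, summing their amounts:
records = [("a", 1), ("a", 2), ("b", 5), ("b", 1), ("c", 4)]
merged = rollup(records,
                lambda l, r: l[0] == r[0],
                lambda l, r: (l[0], l[1] + r[1]))
print(merged)  # [('a', 3), ('b', 6), ('c', 4)]
```

Because matching is pairwise on neighbors, the dataset must first be sorted so that mergeable records are adjacent, which is exactly why the claims pipeline below SORTs before it ROLLUPs.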
Processing Claims
1. The intent is to make a series of interim claims look like a single claim for most purposes: the admission date of the first claim becomes the admission date of the whole claim, and the discharge date of the last claim in the series becomes the discharge date of the whole claim.
2. The admission date from the first claim in the series and the discharge date from the last claim in the series define the length of the stay.
3. The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is included/excluded as a readmission for an existing episode.
4. Costs across all IP claims included in the single stay are aggregated to the stay level.
5. Claims where the last in the series has patient status (…) ["still a patient", not discharged]: flag these and drop all of the claims in the series from the IP hospital stay file.
Processing Claims With ECL

H_1 := SORT(A, bene_sk, provider, admsn_dt, dschrgdt, thru_dt);
H_2 := ROLLUP(H_1, is_interim(LEFT, RIGHT), merge_interim_claims(LEFT, RIGHT));
H_3 := JOIN(H_2, H_1, LEFT.bene_sk = RIGHT.bene_sk, […], RIGHT ONLY);
H_4 := PROJECT(H_3,
               TRANSFORM(BPCI.Layouts.ip_claim_etl_layout,
                         SELF.is_dropped := TRUE;
                         SELF.dropped_reason_code := BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim;
                         SELF := LEFT));
H := H_2 + H_4;
Template Language

EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO
  LOADXML(pFileSet);
  baseDataDirectory := pBaseDataDirectory + pId + …;
  #FOR(folder)
    #UNIQUENAME(subId)
    %subId% := …;
    #UNIQUENAME(subDS)
    %subDS% := Client.Datasets(%subId%)[…];
    #UNIQUENAME(id)
    %id% := pId + … + …;
    #UNIQUENAME(dataDir)
    %dataDir% := baseDataDirectory + … + …;
    #UNIQUENAME(etl)
    %etl% := Client.ETL(%dataDir%, %id%);
    %etl%.run();
  #END
ENDMACRO;
Template Language

file_set := '<folders>' +
            '<folder>M201409</folder>' +
            '<folder>M201410</folder>' +
            '<folder>M201411</folder>' +
            '<folder>M201412</folder>' +
            '<folder>M201501</folder>' +
            '<folder>M201502</folder>' +
            '<folder>M201503</folder>' +
            '</folders>';

load_all_client_files(1234, file_set, '/volume1/data/');
Beyond Processing Data
• Security & Authentication
• Collaboration
• Unit Tests
• Visualizations
Beyond Processing: Security
• HTTPS
• htpasswd
• LDAP support
• File-level security when using LDAP
Beyond Processing: Workunits
• Workunit identifier
• Attribution
• Query
• Timings
• Results
Beyond Processing: Collaboration
[Screenshot slides]
Beyond Processing: Unit Tests

interim_claims := MODULE
  // Test data
  test_set := BPCITest.Samples.ip_claim( bene_id := 1, claim_id := 1, pmt_amt := 30420 ) +
              BPCITest.Samples.ip_claim( bene_id := 1, claim_id := 2, pmt_amt := 114090 ) +
              …;
  EXPORT oActual := Step2.ip_stays;
  SHARED TestSuite := MODULE
    EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou');
    EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
  END;
  EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
END;
Beyond Processing: Unit Tests (2)

simplified_ip_layout := RECORD
  UNSIGNED bene_id;
  UNSIGNED claim_id;
  STRING   claim_type;
  STRING   provider_number;
  INTEGER4 through_date;
  STRING   status_code;
  INTEGER4 admission_date;
  INTEGER4 discharge_date;
  STRING   ms_drg_code;
END;

// Using an inline dataset
simple_ip_claims := DATASET([ 11001000120000201200001202000020161 ], simplified_ip_layout);
ip_claims := Samples.ip_claims(simple_ip_claims);

// OR passing NAMED parameters
ip_claims := Samples.ip_claims2( bene_id := 1, claim_id := 1, claim_type := '00' );
Beyond Processing: Visualization

Custom Visualization
(No) Insights
[Histogram: episodes (0-80) by cost bucket, costs in $1,000 (1-49); no obvious pattern.]
Insights
[Histogram: episodes (0-80) by cost bucket, costs in $1,000 (1-49), broken out by readmissions: no readmit, 1 readmit, 2 readmits.]
Data Delivery: Roxie
• Data delivery engine
• Indexed, compressed, and in-memory
• Data warehouse capabilities
• Data services
Data Services
• Web services over the data warehouse
• XML/SOAP, but also JSON
• Web services defined in ECL
• A solution to add/remove data from the cluster
Data Service Query
• Define the service using ECL:

INTEGER4 oOffset := 1 : STORED('Offset');
INTEGER4 oResults := 100 : STORED('Results');
INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
INTEGER4 oEndDate := 20140201 : STORED('End_Date');
oParams := DATASET([{ oOffset, oResults, oStartDate, oEndDate, … }], Layouts.service_parameters_layout);

T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
  RETURN TABLE(pData,
               { STRING bpid := pData.bpid;
                 UNSIGNED INTEGER1 model := pData.model;
                 UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length;
                 INTEGER8 total_episodes := COUNT(GROUP);
                 DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay);
                 DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay);
                 DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay)) },
               bpid, model, post_dsch_prd_length);
END;

ReportServices.BaseService.run_it('Summary', oParams, T, bpid);
Data (Web) Services

Request:
{ "summary": { "offset": 1, "results": 10,
               "begin_date": 20130101, "end_date": 20130201, … } }

Response:
{ "summaryResponse": { …, "Results": { …, "Summary": { "Row": [
    { "bpid": "9999", "model": 2, "post_dsch_prd_length": 90,
      "total_episodes": 987, "total_costs": 987654321,
      "average_costs": 1235813, … } ] } … } } }

https://…/WsEcl/forms/json/query/roxie/summary
Data Services: WsECL
[Screenshot slides]
Loading up data
• Logical vs physical file names:
  ~abc::subfolder::subsubfolder::myfile
  abc/subfolder/subsubfolder/myfile

• ECL to load data into the cluster:

oDS := DATASET(Std.File.ExternalLogicalFilename('172.0.0.1', '/var/lib/myfile.csv'),
               Layouts.ip_claim_layout, CSV(HEADING(0)));
oDSDistributed := DISTRIBUTE(oDS, bene_id);
OUTPUT(oDSDistributed, , '~somewhere::overhere::myfile', OVERWRITE);

• ECL to use data already loaded into the cluster:

oDS := DATASET('~somewhere::overhere::myfile', Layouts.ip_claim_layout, THOR);
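What the DISTRIBUTE step does conceptually, sketched in Python (an assumption-level model using Python's built-in hash; HPCC uses its own hash functions internally): each record is routed to a node determined by its distribution key, so all records sharing a bene_id land on the same node for subsequent local operations.

```python
def distribute(records, key, n_nodes):
    """Hash-partition records across n_nodes by the given key field."""
    nodes = [[] for _ in range(n_nodes)]
    for rec in records:
        nodes[hash(rec[key]) % n_nodes].append(rec)
    return nodes

claims = [{"bene_id": b, "claim_id": i} for i, b in enumerate([1, 2, 1, 3, 2])]
nodes = distribute(claims, "bene_id", 4)

# Every claim lands on exactly one node, and claims for a given
# beneficiary are never split across nodes.
for node in nodes:
    print(sorted({c["bene_id"] for c in node}))
```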
SuperFiles
• A super file is a symbolic link to a list of sub-files
• Each sub-file must have the same layout

WEBLOGS_FILE := '~somewhere::logs::web';
Std.File.CreateSuperFile(WEBLOGS_FILE);
…
run_report() := FUNCTION
  oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
  RETURN TABLE(oDS, { ip_address; UNSIGNED cnt := COUNT(GROUP) }, ip_address);
END;

• Including (more) data:

SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
  Std.File.FinishSuperFileTransaction()
);
Data Services: Reusability

EXPORT run_it( pServiceName, pParams, pReportFunction, pSortByField ) := MACRO
  // Filtering data based on parameters
  #UNIQUENAME(DS)
  %DS% := WS.Datasets.dsFactEpisodeCostsIndex; […]
  #UNIQUENAME(B)
  %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
  #UNIQUENAME(C)
  %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[… […]
  #UNIQUENAME(report)
  %report% := pReportFunction(%K%);
  #UNIQUENAME(sorted)
  %sorted% := SORT(%report%, pSortByField);
  #UNIQUENAME(O1)
  %O1% := OUTPUT(pParams, NAMED('Request'));
  oSummary := DATASET([{ COUNT(%sorted%) }], WS.Layouts.service_summary_layout);
  #UNIQUENAME(O2)
  %O2% := OUTPUT(oSummary, NAMED('Metadata'));
  #UNIQUENAME(O3)
  %O3% := OUTPUT(CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
                 NAMED(pServiceName), ALL);
  PARALLEL(%O1%, %O2%, %O3%);
ENDMACRO;
AHA System Architecture
[Architecture diagram]

Archway Analytics
[Screenshot]
Thank you! Questions? Feedback?

lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com
HPCC Systems open source portal: http://hpccsystems.com
3
The Achievement Network
4
Archway Health Advisors
Medicare bull Centers for Medicare amp Medicaid Services (CMS)
bull Medicare is a national social insurance program since
1966 covering Americans aged 65 and older
bull ldquoFee For Servicerdquo Model
5
Medicare Fee for Service
6
Hospital SNF
CMS
Claim Claim Claim Claims Claim Claim Claim Claims
Claim Claim Claim Claims Claim Claim Claim Claims
Home Health Hospital (Readmit)
3 days 18 days 2 days 12 visits
$3000 $12000 $5000 $4000
= $24000
Bundled Payments for Care bull Better care smarter spending and healthier people
bull The Bundled Payments for Care Improvement (BPCI)
Payment arrangements Based on financial and performance
bull 4 Models
Model 1 Retrospective Acute Care Hospital Stay Only Model 2 Retrospective Acute Care Hospital Stay And Post-
acute care Model 3 Retrospective Post-Acute Care Only Model 4 Acute Care Hospital Stay Only
7
Medicare BPCI
8
Hospital SNF
CMS
Claim Claim Claim Claims Claim Claim Claim Claims
Home Health Hospital (Readmit)
3 days 18 days
Episode $20000
$20000 - ($3000 + $12000) = $5000
$3000 $12000
Claims bull Statement of services and costs from a healthcare
provider Patient information Diagnosis information Procedure(s) information
bull Types
Hospital (Inpatient) Claims Skilled Nursing Facility (SNF) Claims Home Health Agency (HHA) Claims hellip
9
10
Episode of Care
SNF 1 SNF 2
IP 1
IP 2
SNF 2
IP 3
HHA 1
HHA 2
time
Mining Claims
11
IP SNF HHA IP IP IP IP SNF SNF
IP SNF HHA IP IP IP IP SNF SNF
IP SNF HHA IP IP IP IP SNF SNF
IP SNF HHA IP IP IP IP SNF SNF
IP SNF HHA IP IP IP IP SNF SNF
IP
IP
IP
IP
IP
Hospital Stays
Post-Acute Care
Anchor
Episode Initiator
Technologies considered bull Pig (amp Hadoop)
Data Processing Language Procedural Language Relational-oriented
bull SAS Business Analytics amp BI Software De-facto standard in Healthcare Industry Proprietary
12
HPCC Systems Quick Introduction
Rodrigo Pastrana -Consulting Software Engineer
WHT082311
What is HPCC Systems
14
bull Open Source distributed data-intensive computing platform bull Provides end-to-end Big Data workflow management scheduler
integration tools etc bull Runs on commodity computingstorage nodes bull Binary packages available for the most common Linux distributions bull Originally designed circa 1999 (predates the original paper on
MapReduce from Dec lsquo04) bull Improved over a decade of real-world Big Data analytics bull In use across critical production environments throughout
LexisNexis for more than 10 years
WHT082311
The HPCC Systems platform
15
WHT082311
bull Massively Parallel data processing engine bull Enables data integration on a scale not previously available bull Programmable using ECL
HPCC Systems Data Refinery (Thor)
HPCC Systems Data Delivery Engine (Roxie) bull A massively parallel high throughput query engine bull Low latency highly concurrent and highly available bull Several advanced strategies for efficient retrieval bull Programmable using ECL
Enterprise Control Language (ECL) bull An easy to use declarative data-centric programming language optimized for large-scale data management and query processing
bull Highly efficient automatically distributes workload across all nodes compiles to native machine code
bull Automatic parallelization and synchronization
1
2
3
The Three HPCC Systems components
Conclusion End to End platform bull No need for any third party tools
16
WHT082311
Enterprise Control Language (ECL)
• Declarative programming language: describe what needs to be done, not how to do it.
• Powerful: high-level data activities like JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available.
• Extensible: modular and extensible, it can shape itself to adapt to the type of problem at hand.
• Implicitly parallel: parallelism is built into the underlying platform; the programmer need not be concerned with data partitioning and parallelism.
• Maintainable: a high-level programming language without side effects and with efficient encapsulation; programs are more succinct, reliable, and easier to troubleshoot.
• Complete: ECL provides a complete data programming paradigm.
• Homogeneous: one language to express data algorithms across the entire HPCC Systems platform: data integration, analytics, and high-speed delivery.
• Polyglot: ECL supports the embedding of other languages such as Java, Python, R, SQL, and more.
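As a small illustration of the embedding support mentioned above, a sketch of an ECL function with an embedded Python body (the helper name and logic are made up for illustration):

```ecl
IMPORT Python;

// Hypothetical helper: normalize a provider name using embedded Python
STRING normalize_name(STRING s) := EMBED(Python)
  return s.strip().upper()
ENDEMBED;

OUTPUT(normalize_name('  mercy hospital  ')); // 'MERCY HOSPITAL'
```

The embedded body runs inside the platform's Python plugin; the ECL compiler handles marshalling the STRING arguments and result.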
17
Current Status and Resources
• HPCCSystems.com: tutorials, docs, platform distributions, and more
• Latest release 5.2.0 adds many new features and improvements:
  • Drastic GUI improvements
  • Ganglia and Nagios plug-ins for system monitoring and alerting
  • Security enhancements: tighter authentication measures, intra-component communication encryption
  • Embedded languages: Cassandra support, memcached and Redis access
  • JSON-based data support
  • Dynamic ESDL: provides simple middleware/back-end interface definition
  • Java API project: facilitates interaction between Java-based apps and HPCC web services and C++ tools
• Available now at HPCCSystems.com
Data mining with HPCC Systems
• Thor
  • Responsible for processing vast amounts of data
  • Optimized for extracting, transforming, loading, sorting, and linking data
• ECL
  • Declarative, more data-centric
  • Fast and implicitly parallel
  • Inline data and unit tests in ECL
19
20
SQL vs. ECL

SQL:

    SELECT diag_group_cd,
           COUNT(*) AS volume,
           SUM(pmt_amt) AS costs
    FROM inpatient_claims
    GROUP BY diag_group_cd;

ECL:

    TABLE(inpatient_claims,
          { diag_group_cd,
            INTEGER volume := COUNT(GROUP),
            REAL costs := SUM(GROUP, pmt_amt) },
          diag_group_cd);
SQL:

    SELECT *
    FROM inpatient_claims
    LEFT JOIN ip_value_codes ON inpatient_claims.id = ip_value_codes.id;

ECL:

    JOIN(inpatient_claims, ip_value_codes,
         LEFT.id = RIGHT.id);
21
SQL vs. ECL

SQL:

    DECLARE my_cursor CURSOR FOR
      SELECT * FROM inpatient_claims;
    OPEN my_cursor;
    FETCH NEXT FROM my_cursor INTO ...;
    ...
    WHILE @@FETCH_STATUS = 0
    BEGIN
      ...
    END;
    CLOSE my_cursor;
    DEALLOCATE my_cursor;

ECL:

    ITERATE(inpatient_claims,
            TRANSFORM(inpatient_claim_layout,
                      SELF.is_dropped := is_one_year_or_greater(RIGHT.admsn_dt, RIGHT.dschrgdt),
                      SELF := RIGHT));
22
ECL ROLLUP
[Diagram: records R1–R6 are compared pairwise (LEFT, RIGHT); neighbors matching the condition are merged by the transform Tx into rolled-up records RA and RB.]

    ROLLUP(dataset, condition(LEFT, RIGHT), transform(LEFT, RIGHT))
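A minimal ROLLUP sketch (the layout and sample data here are made up for illustration, not the deck's claims data):

```ecl
r := RECORD
  UNSIGNED bene_id;
  UNSIGNED days;
END;

ds := DATASET([{1, 3}, {1, 18}, {2, 2}], r);

// Adjacent records with the same bene_id are merged; their days are summed.
r merge_them(r le, r ri) := TRANSFORM
  SELF.days := le.days + ri.days;
  SELF := le;
END;

rolled := ROLLUP(SORT(ds, bene_id),
                 LEFT.bene_id = RIGHT.bene_id,
                 merge_them(LEFT, RIGHT));
OUTPUT(rolled); // {1, 21} and {2, 2}
```

Because ROLLUP only compares adjacent records, the dataset must first be sorted on the matching key, exactly as the claims pipeline on the next slides does.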
Processing Claims
1. The intent is to make a series of interim claims look like a single claim for most purposes: the admission date of the first claim becomes the admission date of the whole claim, and the discharge date of the last claim in the series becomes the discharge date of the whole claim.
2. The admission date from the first claim in the series and the discharge date from the last claim in the series define the length of the stay.
3. The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record, or whether the stay is included/excluded as a readmission for an existing episode.
4. Costs across all IP claims included in the single stay are aggregated to the stay level.
5. Where the last claim in the series has patient status (...) ["still a patient", not discharged], flag these claims and drop the whole series from the IP hospital stay file.
23
Processing Claims With ECL

    H_1 := SORT(A, bene_sk, provider, admsn_dt, dschrgdt, thru_dt);
    H_2 := ROLLUP(H_1,
                  is_interim(LEFT, RIGHT),
                  merge_interim_claims(LEFT, RIGHT));
    H_3 := JOIN(H_2, H_1,
                LEFT.bene_sk = RIGHT.bene_sk [...],
                RIGHT ONLY);
    H_4 := PROJECT(H_3,
                   TRANSFORM(BPCI.Layouts.ip_claim_etl_layout,
                             SELF.is_dropped := TRUE,
                             SELF.dropped_reason_code := BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim,
                             SELF := LEFT));
    H := H_2 + H_4;
24
25
Template Language

    EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO
      LOADXML(pFileSet);
      baseDataDirectory := pBaseDataDirectory + pId + ...;
      #FOR(folder)
        #UNIQUENAME(subId)
        %subId% := ...;
        #UNIQUENAME(subDS)
        %subDS% := ClientDatasets(%subId%)[...];
        #UNIQUENAME(id)
        %id% := pId + ...;
        #UNIQUENAME(dataDir)
        %dataDir% := baseDataDirectory + ...;
        #UNIQUENAME(etl)
        %etl% := ClientETL(%dataDir%, %id%);
        %etl%.run();
      #END
    ENDMACRO;
26
Template Language

    file_set := '<folders>' +
                '<folder>M201409</folder>' +
                '<folder>M201410</folder>' +
                '<folder>M201411</folder>' +
                '<folder>M201412</folder>' +
                '<folder>M201501</folder>' +
                '<folder>M201502</folder>' +
                '<folder>M201503</folder>' +
                '</folders>';

    load_all_client_files(1234, file_set, '/volume1/data/');
Beyond Processing Data
• Security & Authentication
• Collaboration
• Unit Tests
• Visualizations
27
Beyond Processing Security
• HTTPS
• Htpasswd
• LDAP support
• File-level security when using LDAP
28
Beyond Processing Workunits
• Workunit identifier
• Attribution
• Query
• Timings
• Results
29
30
Beyond Processing Collaboration
31
Beyond Processing Collaboration (2)
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests

    interim_claims := MODULE
      // Test Data
      test_set :=
        BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 1, pmt_amt := 30420) +
        BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 2, pmt_amt := 114090) +
        ...;
      EXPORT Actual := Step2.ip_stays;
      SHARED TestSuite := MODULE
        EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou...');
        EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
      END;
      EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
    END;
Beyond Processing Unit Tests (2)
35
Using an inline dataset:

    simple_ip_claims := DATASET([{1, 1, '00', ...}], simplified_ip_layout);
    ip_claims := Samples.ip_claims(simple_ip_claims);

    // OR passing NAMED parameters:
    ip_claims := Samples.ip_claims2(bene_id := 1, claim_id := 1, claim_type := '00');

with the layout:

    simplified_ip_layout := RECORD
      UNSIGNED bene_id;
      UNSIGNED claim_id;
      STRING claim_type;
      STRING provider_number;
      INTEGER4 through_date;
      STRING status_code;
      INTEGER4 admission_date;
      INTEGER4 discharge_date;
      STRING ms_drg_code;
    END;
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights

[Chart: number of episodes (y-axis, 0–80) by episode cost in $1,000s (x-axis, 1–49).]
39
Insights

[Chart: the same distribution of episodes (y-axis, 0–80) by episode cost in $1,000s (x-axis, 1–49), split by readmissions: no readmit, 1 readmit, 2 readmits.]
Data Delivery Roxie
• Data Delivery Engine
• Indexed, compressed, and in-memory
• Data Warehouse capabilities
• Data Services
40
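Roxie queries are typically served from keys built on Thor; a minimal sketch of building and filtering such a key (the file names and layout here are assumptions, not the deck's actual datasets):

```ecl
claims := DATASET('~thor::ip_claims',
                  {UNSIGNED8 bene_id,
                   STRING6 provider,
                   REAL8 pmt_amt,
                   UNSIGNED8 fpos {VIRTUAL(fileposition)}},
                  THOR);

// Build a key on bene_id, with payload fields for index-only retrieval
claims_key := INDEX(claims, {bene_id}, {provider, pmt_amt, fpos},
                    '~key::ip_claims_by_bene');
BUILD(claims_key, OVERWRITE);

// Indexed, compressed lookup of one beneficiary's claims
OUTPUT(claims_key(bene_id = 42));
```

Published to a Roxie cluster, a filter like the last line becomes a low-latency keyed read rather than a full scan.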
Data Services
• Web Services over the Data Warehouse
• XML/SOAP, but also JSON
• Web Services defined in ECL
• Solution to add/remove data from the cluster
41
Data Service Query
• Define the service using ECL:
42
    INTEGER4 oOffset := 1 : STORED('Offset');
    INTEGER4 oResults := 100 : STORED('Results');
    INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
    INTEGER4 oEndDate := 20140201 : STORED('End_Date');

    oParams := DATASET([{oOffset, oResults, oStartDate, oEndDate, ...}],
                       Layouts.service_parameters_layout);

    T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
      RETURN TABLE(pData,
                   { STRING bpid := pData.bpid,
                     UNSIGNED INTEGER1 model := pData.model,
                     UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length,
                     INTEGER8 total_episodes := COUNT(GROUP),
                     DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay),
                     DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay),
                     DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay)) },
                   bpid, model, post_dsch_prd_length);
    END;

    ReportServices.BaseService.run_it('Summary', oParameters, T, bpid);
43
Data (Web) Services
Request:

    {"summary": {"offset": 1, "results": 10,
                 "begin_date": 20130101, "end_date": 20130201, ...}}

Response:

    {"summaryResponse": {
       ...
       "Results": {
         ...
         "Summary": {"Row": [
           {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90,
            "total_episodes": 987, "total_costs": 987654321,
            "average_costs": 1235813, ...}
         ]}
       }
    }}

Queried via WsECL at http(s)://.../WsEcl/forms/json/query/roxie/summary
44
Data Services WsECL
45
Data Services WsECL
Loading up data
• Logical vs. physical filenames:

    ~abc::subfolder::subsubfolder::myfile
    /abc/subfolder/subsubfolder/myfile

• ECL to load data into the cluster:

    oDS := DATASET(Std.File.ExternalLogicalFileName('172.0.0.1', '/var/lib/myfile.csv'),
                   Layouts.ip_claim_layout,
                   CSV(HEADING(0)));
    oDSDistributed := DISTRIBUTE(oDS, bene_id);
    OUTPUT(oDSDistributed, , '~somewhere::overhere::myfile', OVERWRITE);

• ECL to use data loaded into the cluster:

    oDS := DATASET('~somewhere::overhere::myfile', Layouts.ip_claim_layout, THOR);

46
SuperFiles
• Super File = symbolic link / list of sub-files
• Each sub-file must have the same layout:

    WEBLOGS_FILE := '~somewhere::logs::web';
    Std.File.CreateSuperFile(WEBLOGS_FILE);
    ...
    run_report() := FUNCTION
      oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
      RETURN TABLE(oDS, { ip_address, COUNT(GROUP) }, ip_address);
    END;

• Including (more) data:

    SEQUENTIAL(
      Std.File.StartSuperFileTransaction(),
      Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
      Std.File.FinishSuperFileTransaction());
48
Data Services Reusability

    EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
      // Filtering data based on parameters
      #UNIQUENAME(DS)
      %DS% := WS.Datasets.dsFactEpisodeCostsIndex;
      [...]
      #UNIQUENAME(B)
      %B% := IF(COUNT(pParams[1].providers) = 0, %A%,
                %A%(provider_id IN pParams[1].providers));
      #UNIQUENAME(C)
      %C% := IF(COUNT(pParams[1].npis) = 0, %B%,
                %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[...
      [...]
      #UNIQUENAME(report)
      %report% := pReportFunction(%K%);
      #UNIQUENAME(sorted)
      %sorted% := SORT(%report%, pSortByField);
      #UNIQUENAME(O1)
      %O1% := OUTPUT(pParameters, NAMED('Request'));
      oSummary := DATASET([{COUNT(%sorted%)}], WS.Layouts.service_summary_layout);
      #UNIQUENAME(O2)
      %O2% := OUTPUT(oSummary, NAMED('Metadata'));
      #UNIQUENAME(O3)
      %O3% := OUTPUT(CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
                     NAMED(pServiceName), ALL);
      PARALLEL(%O1%, %O2%, %O3%);
    ENDMACRO;
49
AHA System Architecture
50
Archway Analytics
51
lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com
HPCC Systems open source portal: http://hpccsystems.com

Thank you! Questions? Feedback?
4
Archway Health Advisors
Medicare bull Centers for Medicare amp Medicaid Services (CMS)
bull Medicare is a national social insurance program since
1966 covering Americans aged 65 and older
bull ldquoFee For Servicerdquo Model
5
Medicare Fee for Service
6
Hospital SNF
CMS
Claim Claim Claim Claims Claim Claim Claim Claims
Claim Claim Claim Claims Claim Claim Claim Claims
Home Health Hospital (Readmit)
3 days 18 days 2 days 12 visits
$3000 $12000 $5000 $4000
= $24000
Bundled Payments for Care bull Better care smarter spending and healthier people
bull The Bundled Payments for Care Improvement (BPCI)
Payment arrangements Based on financial and performance
bull 4 Models
Model 1 Retrospective Acute Care Hospital Stay Only Model 2 Retrospective Acute Care Hospital Stay And Post-
acute care Model 3 Retrospective Post-Acute Care Only Model 4 Acute Care Hospital Stay Only
7
Medicare BPCI
8
Hospital SNF
CMS
Claim Claim Claim Claims Claim Claim Claim Claims
Home Health Hospital (Readmit)
3 days 18 days
Episode $20000
$20000 - ($3000 + $12000) = $5000
$3000 $12000
Claims bull Statement of services and costs from a healthcare
provider Patient information Diagnosis information Procedure(s) information
bull Types
Hospital (Inpatient) Claims Skilled Nursing Facility (SNF) Claims Home Health Agency (HHA) Claims hellip
9
10
Episode of Care
SNF 1 SNF 2
IP 1
IP 2
SNF 2
IP 3
HHA 1
HHA 2
time
Mining Claims
11
IP SNF HHA IP IP IP IP SNF SNF
IP SNF HHA IP IP IP IP SNF SNF
IP SNF HHA IP IP IP IP SNF SNF
IP SNF HHA IP IP IP IP SNF SNF
IP SNF HHA IP IP IP IP SNF SNF
IP
IP
IP
IP
IP
Hospital Stays
Post-Acute Care
Anchor
Episode Initiator
Technologies considered bull Pig (amp Hadoop)
Data Processing Language Procedural Language Relational-oriented
bull SAS Business Analytics amp BI Software De-facto standard in Healthcare Industry Proprietary
12
HPCC Systems Quick Introduction
Rodrigo Pastrana -Consulting Software Engineer
WHT082311
What is HPCC Systems
14
bull Open Source distributed data-intensive computing platform bull Provides end-to-end Big Data workflow management scheduler
integration tools etc bull Runs on commodity computingstorage nodes bull Binary packages available for the most common Linux distributions bull Originally designed circa 1999 (predates the original paper on
MapReduce from Dec lsquo04) bull Improved over a decade of real-world Big Data analytics bull In use across critical production environments throughout
LexisNexis for more than 10 years
WHT082311
The HPCC Systems platform
15
WHT082311
bull Massively Parallel data processing engine bull Enables data integration on a scale not previously available bull Programmable using ECL
HPCC Systems Data Refinery (Thor)
HPCC Systems Data Delivery Engine (Roxie) bull A massively parallel high throughput query engine bull Low latency highly concurrent and highly available bull Several advanced strategies for efficient retrieval bull Programmable using ECL
Enterprise Control Language (ECL) bull An easy to use declarative data-centric programming language optimized for large-scale data management and query processing
bull Highly efficient automatically distributes workload across all nodes compiles to native machine code
bull Automatic parallelization and synchronization
1
2
3
The Three HPCC Systems components
Conclusion End to End platform bull No need for any third party tools
16
WHT082311
bull Declarative programming language Describe what needs to be done and not how to do it
bull Powerful High level data activities like JOIN TRANSFORM PROJECT SORT DISTRIBUTE MAP etc are available
bull Extensible Modular and extensible it can shape itself to adapt to the type of problem at hand
bull Implicitly parallel Parallelism is built into the underlying platform The programmer needs not be concerned with data partitioning and parallelism
bull Maintainable High level programming language without side effects and with efficient encapsulation programs are more succinct reliable and easier to troubleshoot
bull Complete ECL provides a complete data programming paradigm
bull Homogeneous One language to express data algorithms across the entire HPCC Systems platform data integration analytics and high speed delivery
bull Polyglottic ECL supports the embedding of other languages such as Java Python R SQL and more
Enterprise Control Language (ECL)
17
WHT082311
Current Status and Resources
bull HPCCSystemscom ndash Tutorials Docs Platform distributions and more bull Latest release 520 adds many new features and improvements
bull Drastic GUI improvements bull Ganglia and Nagios plug-in for system monitoring and alerting bull Security Enhancements ndash tighter authentication measures intra-
component communication encryption bull Embedded Languages ndash Cassandra support memcache and redis
access bull JSON based data support bull Dynamic ESDL ndash Provides simple middlewareback-end interface
definition bull JAVA API project ndash facilitates interaction between Java based apps and
HPCC web services and c++ tools bull Available now ndash HPCCSystemscom
Data mining with HPCC Systems bull Thor
Responsible for processing vast amount of data Optimized for Extraction Transformation Loading
Sorting and Linking Data
bull ECL Declarative More Data Centric Fast amp Implicitly Parallel Inline data Unit Tests in ECL
19
20
SQL vs ECL
SELECT diag_group_cd COUNT() as volume SUM(pmt_amt) as costs
FROM inpatient_claims
GROUP BY diag_group_cd
TABLE( inpatient_claims
diag_group_cd INTEGER volume =
COUNT(GROUP) REAL costs = SUM(pmt_amt)
diag_group_cd
)
SQL ECL
SELECT
FROM inpatient_claims LEFT JOIN ip_value_codes
RIGHT ON LEFTid = RIGHTid
JOIN( inpatient_claims ip_value_codes LEFTid = RIGHTid
)
21
SQL vs ECL
DECLARE my_cursor CURSOR FOR SELECT FROM inpatient_claims
OPEN my_cursor FETCH NEXT FROM my_cursor INTO hellip hellip WHILE FETCH_STATUS = 0 BEGIN
hellip END CLOSE my_cursor DEALLOCATE my_cursor
ITERATE( inpatient_claims TRANSFORM(inpatient_claim_layout
SELFis_dropped = is_one_year_or_greater(
RIGHTadmsn_dt RIGHTdschrgdt) SELF = RIGHT
) )
SQL ECL
Tx
22
ECL ROLLUP
R1 R2 R3 R4 R5 R6
LEFT RIGHT
RA
Tx LEFT RIGHT
RB R4 R6 R5
ROLLUP( dataset condition(LEFT RIGHT) transformation(LEFT RIGHT) )
Processing Claims 1 The intent here is to make the series of interim claims look like a single claim for
most purposes where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim
2 11130901113090The admission date from the first series in the claim and the discharge date from the last series in the claim define the length of the stay
3 11130901113090The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is includedexcluded as a readmission for an existing episode
4 11130901113090Costs across all IP claims included in the single stay are aggregated to the stay level
5 Claims where the last in the series of claims has patient (hellip) [as ldquostill a patientrdquo not discharged] flag these and drop all of the claims in the series from the IP hospital stay file
23
Processing Claims With ECL H_1 = SORT( A bene_sk provider admsn_dt dschrgdt thru_dt) H_2 = ROLLUP(H_1
is_interim(LEFT RIGHT) merge_interim_claims(LEFT RIGHT))
H_3 = JOIN(H_2 H_1 LEFTbene_sk = RIGHTbene_sk [hellip] RIGHT ONLY) H_4 = PROJECT(H_3 TRANSFORM(BPCILayoutsip_claim_etl_layout SELFis_dropped = TRUE SELFdropped_reason_code = BPCILayoutsDROPPED_REASON_CODESInterimClaim SELF = LEFT )) H = H_2 + H_4
24
25
Template Language EXPORT load_all_client_files(pId pFileSet pBaseDataDirectory) = MACRO LOADXML(pFileSet) baseDataDirectory = pBaseDataDirectory + pId + FOR(folder) UNIQUENAME(subId) subId = UNIQUENAME(subDS) subDS = ClientDatasets(subId) [] UNIQUENAME(id) id = pId + + UNIQUENAME(dataDir) dataDir = baseDataDirectory + + UNIQUENAME(etl) etl = ClientETL(dataDir id) etlrun() END ENDMACRO
26
Template Language file_set = rsquoltfoldersgtrsquo + ltfoldergtM201409ltfoldergt + ltfoldergtM201410ltfoldergt + ltfoldergtM201411ltfoldergt + ltfoldergtM201412ltfoldergt + ltfoldergtM201501ltfoldergt + ltfoldergtM201502ltfoldergt + ltfoldergtM201503ltfoldergt + lsquoltfoldersgtrsquo load_all_client_files(1234 file_set lsquovolume1datalsquo)
Beyond Processing Data
bull Security amp Authentication
bull Collaboration
bull Unit Tests
bull Visualizations
27
Beyond Processing Security
bull HTTPS
bull Htpasswd
bull LDAP support
bull File level security when using LDAP
28
Beyond Processing Workunits bull Workunit Identifier
bull Attribution
bull Query
bull Timings
bull Results
29
30
Beyond Processing Collaboration
31
Beyond Processing Collaboration (2)
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
Medicare bull Centers for Medicare amp Medicaid Services (CMS)
bull Medicare is a national social insurance program since
1966 covering Americans aged 65 and older
bull ldquoFee For Servicerdquo Model
5
Medicare Fee for Service
6
Hospital SNF
CMS
Claim Claim Claim Claims Claim Claim Claim Claims
Claim Claim Claim Claims Claim Claim Claim Claims
Home Health Hospital (Readmit)
3 days 18 days 2 days 12 visits
$3000 $12000 $5000 $4000
= $24000
Bundled Payments for Care bull Better care smarter spending and healthier people
bull The Bundled Payments for Care Improvement (BPCI)
Payment arrangements Based on financial and performance
bull 4 Models
Model 1 Retrospective Acute Care Hospital Stay Only Model 2 Retrospective Acute Care Hospital Stay And Post-
acute care Model 3 Retrospective Post-Acute Care Only Model 4 Acute Care Hospital Stay Only
7
Medicare BPCI
8
Hospital SNF
CMS
Claim Claim Claim Claims Claim Claim Claim Claims
Home Health Hospital (Readmit)
3 days 18 days
Episode $20000
$20000 - ($3000 + $12000) = $5000
$3000 $12000
Claims bull Statement of services and costs from a healthcare
provider Patient information Diagnosis information Procedure(s) information
bull Types
Hospital (Inpatient) Claims Skilled Nursing Facility (SNF) Claims Home Health Agency (HHA) Claims hellip
9
10
Episode of Care
SNF 1 SNF 2
IP 1
IP 2
SNF 2
IP 3
HHA 1
HHA 2
time
Mining Claims
11
IP SNF HHA IP IP IP IP SNF SNF
IP SNF HHA IP IP IP IP SNF SNF
IP SNF HHA IP IP IP IP SNF SNF
IP SNF HHA IP IP IP IP SNF SNF
IP SNF HHA IP IP IP IP SNF SNF
IP
IP
IP
IP
IP
Hospital Stays
Post-Acute Care
Anchor
Episode Initiator
Technologies considered bull Pig (amp Hadoop)
Data Processing Language Procedural Language Relational-oriented
bull SAS Business Analytics amp BI Software De-facto standard in Healthcare Industry Proprietary
12
HPCC Systems Quick Introduction
Rodrigo Pastrana -Consulting Software Engineer
WHT082311
What is HPCC Systems
14
bull Open Source distributed data-intensive computing platform bull Provides end-to-end Big Data workflow management scheduler
integration tools etc bull Runs on commodity computingstorage nodes bull Binary packages available for the most common Linux distributions bull Originally designed circa 1999 (predates the original paper on
MapReduce from Dec lsquo04) bull Improved over a decade of real-world Big Data analytics bull In use across critical production environments throughout
LexisNexis for more than 10 years
WHT082311
The HPCC Systems platform
15
WHT082311
bull Massively Parallel data processing engine bull Enables data integration on a scale not previously available bull Programmable using ECL
HPCC Systems Data Refinery (Thor)
HPCC Systems Data Delivery Engine (Roxie) bull A massively parallel high throughput query engine bull Low latency highly concurrent and highly available bull Several advanced strategies for efficient retrieval bull Programmable using ECL
Enterprise Control Language (ECL) bull An easy to use declarative data-centric programming language optimized for large-scale data management and query processing
bull Highly efficient automatically distributes workload across all nodes compiles to native machine code
bull Automatic parallelization and synchronization
1
2
3
The Three HPCC Systems components
Conclusion End to End platform bull No need for any third party tools
16
WHT082311
bull Declarative programming language Describe what needs to be done and not how to do it
bull Powerful High level data activities like JOIN TRANSFORM PROJECT SORT DISTRIBUTE MAP etc are available
bull Extensible Modular and extensible it can shape itself to adapt to the type of problem at hand
bull Implicitly parallel Parallelism is built into the underlying platform The programmer needs not be concerned with data partitioning and parallelism
bull Maintainable High level programming language without side effects and with efficient encapsulation programs are more succinct reliable and easier to troubleshoot
bull Complete ECL provides a complete data programming paradigm
bull Homogeneous One language to express data algorithms across the entire HPCC Systems platform data integration analytics and high speed delivery
bull Polyglottic ECL supports the embedding of other languages such as Java Python R SQL and more
Enterprise Control Language (ECL)
17
WHT082311
Current Status and Resources
bull HPCCSystemscom ndash Tutorials Docs Platform distributions and more bull Latest release 520 adds many new features and improvements
bull Drastic GUI improvements bull Ganglia and Nagios plug-in for system monitoring and alerting bull Security Enhancements ndash tighter authentication measures intra-
component communication encryption bull Embedded Languages ndash Cassandra support memcache and redis
access bull JSON based data support bull Dynamic ESDL ndash Provides simple middlewareback-end interface
definition bull JAVA API project ndash facilitates interaction between Java based apps and
HPCC web services and c++ tools bull Available now ndash HPCCSystemscom
Data mining with HPCC Systems bull Thor
Responsible for processing vast amount of data Optimized for Extraction Transformation Loading
Sorting and Linking Data
bull ECL Declarative More Data Centric Fast amp Implicitly Parallel Inline data Unit Tests in ECL
19
20
SQL vs ECL
SELECT diag_group_cd COUNT() as volume SUM(pmt_amt) as costs
FROM inpatient_claims
GROUP BY diag_group_cd
TABLE( inpatient_claims
diag_group_cd INTEGER volume =
COUNT(GROUP) REAL costs = SUM(pmt_amt)
diag_group_cd
)
SQL ECL
SELECT
FROM inpatient_claims LEFT JOIN ip_value_codes
RIGHT ON LEFTid = RIGHTid
JOIN( inpatient_claims ip_value_codes LEFTid = RIGHTid
)
21
SQL vs ECL
DECLARE my_cursor CURSOR FOR SELECT FROM inpatient_claims
OPEN my_cursor FETCH NEXT FROM my_cursor INTO hellip hellip WHILE FETCH_STATUS = 0 BEGIN
hellip END CLOSE my_cursor DEALLOCATE my_cursor
ITERATE( inpatient_claims TRANSFORM(inpatient_claim_layout
SELFis_dropped = is_one_year_or_greater(
RIGHTadmsn_dt RIGHTdschrgdt) SELF = RIGHT
) )
SQL ECL
Tx
22
ECL ROLLUP
R1 R2 R3 R4 R5 R6
LEFT RIGHT
RA
Tx LEFT RIGHT
RB R4 R6 R5
ROLLUP( dataset condition(LEFT RIGHT) transformation(LEFT RIGHT) )
Processing Claims 1 The intent here is to make the series of interim claims look like a single claim for
most purposes where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim
2 11130901113090The admission date from the first series in the claim and the discharge date from the last series in the claim define the length of the stay
3 11130901113090The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is includedexcluded as a readmission for an existing episode
4 11130901113090Costs across all IP claims included in the single stay are aggregated to the stay level
5 Claims where the last in the series of claims has patient (hellip) [as ldquostill a patientrdquo not discharged] flag these and drop all of the claims in the series from the IP hospital stay file
23
Processing Claims With ECL H_1 = SORT( A bene_sk provider admsn_dt dschrgdt thru_dt) H_2 = ROLLUP(H_1
is_interim(LEFT RIGHT) merge_interim_claims(LEFT RIGHT))
H_3 = JOIN(H_2 H_1 LEFTbene_sk = RIGHTbene_sk [hellip] RIGHT ONLY) H_4 = PROJECT(H_3 TRANSFORM(BPCILayoutsip_claim_etl_layout SELFis_dropped = TRUE SELFdropped_reason_code = BPCILayoutsDROPPED_REASON_CODESInterimClaim SELF = LEFT )) H = H_2 + H_4
24
25
Template Language EXPORT load_all_client_files(pId pFileSet pBaseDataDirectory) = MACRO LOADXML(pFileSet) baseDataDirectory = pBaseDataDirectory + pId + FOR(folder) UNIQUENAME(subId) subId = UNIQUENAME(subDS) subDS = ClientDatasets(subId) [] UNIQUENAME(id) id = pId + + UNIQUENAME(dataDir) dataDir = baseDataDirectory + + UNIQUENAME(etl) etl = ClientETL(dataDir id) etlrun() END ENDMACRO
26
Template Language file_set = rsquoltfoldersgtrsquo + ltfoldergtM201409ltfoldergt + ltfoldergtM201410ltfoldergt + ltfoldergtM201411ltfoldergt + ltfoldergtM201412ltfoldergt + ltfoldergtM201501ltfoldergt + ltfoldergtM201502ltfoldergt + ltfoldergtM201503ltfoldergt + lsquoltfoldersgtrsquo load_all_client_files(1234 file_set lsquovolume1datalsquo)
Beyond Processing Data
bull Security amp Authentication
bull Collaboration
bull Unit Tests
bull Visualizations
27
Beyond Processing Security
bull HTTPS
bull Htpasswd
bull LDAP support
bull File level security when using LDAP
28
Beyond Processing Workunits bull Workunit Identifier
bull Attribution
bull Query
bull Timings
bull Results
29
30
Beyond Processing: Collaboration
31
Beyond Processing: Collaboration (2)
32
Beyond Processing: Collaboration
33
Beyond Processing: Collaboration
34
Beyond Processing: Unit Tests

interim_claims := MODULE
  // Test Data
  test_set := BPCI.Test.Samples.ip_claim( bene_id := 1, claim_id := 1, pmt_amt := 304.20 ) +
              BPCI.Test.Samples.ip_claim( bene_id := 1, claim_id := 2, pmt_amt := 1140.90 ) + ...;
  EXPORT Actual := Step2.ip_stays;
  SHARED TestSuite := MODULE
    EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou...');
    EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
  END;
  EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
END;
Beyond Processing: Unit Tests (2)
35
Using an inline dataset:

simple_ip_claims := DATASET([{1, 1, '00', ...}], simplified_ip_layout);
ip_claims := Samples.ip_claims(simple_ip_claims);

OR passing NAMED parameters:

ip_claims := Samples.ip_claims2( bene_id := 1, claim_id := 1, claim_type := '00' );

simplified_ip_layout := RECORD
  UNSIGNED bene_id;
  UNSIGNED claim_id;
  STRING claim_type;
  STRING provider_number;
  INTEGER4 through_date;
  STRING status_code;
  INTEGER4 admission_date;
  INTEGER4 discharge_date;
  STRING ms_drg_code;
END;
36
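The inline-sample idea carries over to any language's test framework. A hedged Python sketch (the long-stay drop rule and the day-offset dates are simplified stand-ins, not the real BPCI logic):

```python
def flag_long_stays(claims, max_days=365):
    """Toy stand-in for an ETL drop rule: mark claims whose stay
    spans a year or more. Dates are plain day offsets here."""
    for c in claims:
        c["is_dropped"] = (c["discharge_date"] - c["admission_date"]) >= max_days
    return claims

# Inline test data, analogous to the ECL inline DATASET above.
sample = [
    {"claim_id": 1, "admission_date": 20, "discharge_date": 30},
    {"claim_id": 2, "admission_date": 0, "discharge_date": 400},
]
flagged = flag_long_stays(sample)
kept = [c["claim_id"] for c in flagged if not c["is_dropped"]]
dropped = [c["claim_id"] for c in flagged if c["is_dropped"]]
# kept == [1], dropped == [2]
```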
Beyond Processing: Visualization
37
Custom Visualization
38
(No) Insights
[Chart: episodes (y-axis, 0-80) by costs in $1,000 (x-axis, 1-49)]
39
Insights
[Chart: episodes (y-axis, 0-80) by costs in $1,000 (x-axis, 1-49), split by readmissions: No readmit / 1 Readmit / 2 Readmits]
Data Delivery: Roxie
• Data Delivery Engine
• Indexed, compressed, and in-memory
• Data Warehouse capabilities
• Data Services
40
Data Services
• Web Services over the Data Warehouse
• XML/SOAP, but also JSON
• Web Services defined in ECL
• Solution to add/remove data from the cluster
41
Data Service Query
• Define service using ECL
42
INTEGER4 oOffset := 1 : STORED('Offset');
INTEGER4 oResults := 100 : STORED('Results');
INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
INTEGER4 oEndDate := 20140201 : STORED('End_Date');

oParams := DATASET([{oOffset, oResults, oStartDate, oEndDate, ...}],
                   Layouts.service_parameters_layout);

T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
  RETURN TABLE(pData,
               { STRING bpid := pData.bpid;
                 UNSIGNED INTEGER1 model := pData.model;
                 UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length;
                 INTEGER8 total_episodes := COUNT(GROUP);
                 DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay);
                 DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay);
                 DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay)); },
               bpid, model, post_dsch_prd_length);
END;

ReportServices.BaseService.run_it('Summary', oParams, T, 'bpid');
43
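The same per-(bpid, model, period) aggregation can be sketched in plain Python; the population standard deviation mirrors SQRT(VARIANCE(...)), and field names follow the slide:

```python
import math
from collections import defaultdict

def summarize(episodes):
    """Group episode rows by (bpid, model, post_dsch_prd_length) and
    compute count, total, mean and population std-dev of the pay field."""
    groups = defaultdict(list)
    for e in episodes:
        key = (e["bpid"], e["model"], e["post_dsch_prd_length"])
        groups[key].append(e["sum_post_dsch_prd_pay"])
    out = {}
    for key, pays in groups.items():
        n = len(pays)
        mean = sum(pays) / n
        var = sum((p - mean) ** 2 for p in pays) / n  # population variance
        out[key] = {"total_episodes": n,
                    "total_costs": sum(pays),
                    "average_costs": mean,
                    "std_dev_costs": math.sqrt(var)}
    return out

rows = [{"bpid": "9999", "model": 2, "post_dsch_prd_length": 90,
         "sum_post_dsch_prd_pay": p} for p in (10000.0, 14000.0)]
stats = summarize(rows)[("9999", 2, 90)]
# 2 episodes, average_costs == 12000.0, std_dev_costs == 2000.0
```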
Data (Web) Services

Request:

{ "summary": {
    "offset": 1,
    "results": 10,
    "begin_date": 20130101,
    "end_date": 20130201,
    ...
} }

Response:

{ "summaryResponse": { ...,
    "Results": { ...,
      "Summary": {
        "Row": [
          { "bpid": "9999",
            "model": 2,
            "post_dsch_prd_length": 90,
            "total_episodes": 987,
            "total_costs": 9876543.21,
            "average_costs": 12358.13,
            ... }
        ] } ... } }

https://.../WsEcl/forms/json/query/roxie/summary
44
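A client only needs to POST JSON of the request shape to the WsECL endpoint and unwrap the response. A minimal Python sketch with no network call (the payload and response key names follow the slide; the sample response body is abbreviated and illustrative):

```python
import json

def build_summary_request(offset, results, begin_date, end_date):
    """Build the JSON body for the 'summary' Roxie query."""
    return json.dumps({"summary": {
        "offset": offset, "results": results,
        "begin_date": begin_date, "end_date": end_date,
    }})

def rows_from_response(body):
    """Pull the result rows out of a summaryResponse document."""
    doc = json.loads(body)
    return doc["summaryResponse"]["Results"]["Summary"]["Row"]

# Abbreviated sample response of the shape shown above.
resp = ('{"summaryResponse": {"Results": {"Summary": {"Row": '
        '[{"bpid": "9999", "model": 2, "total_episodes": 987}]}}}}')
rows = rows_from_response(resp)
# rows[0]["bpid"] == "9999"
```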
Data Services WsECL
45
Data Services WsECL
Loading up data
• Logical vs Physical

~abc::subfolder::subsubfolder::myfile (logical) vs /abc/subfolder/subsubfolder/myfile (physical)
• ECL to load data into the cluster
46
oDS := DATASET(Std.File.ExternalLogicalFilename('172.0.0.1', '/var/lib/myfile.csv'),
               Layouts.ip_claim_layout, CSV(HEADING(0)));
oDSDistributed := DISTRIBUTE(oDS, bene_id);
OUTPUT(oDSDistributed,, '~somewhere::overhere::myfile', OVERWRITE);

• ECL to use data loaded into the cluster

oDS := DATASET('~somewhere::overhere::myfile', Layouts.ip_claim_layout, THOR);
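The logical-to-physical correspondence above is mechanical. A hedged Python helper showing just the separator swap (HPCC actually resolves logical names to per-node file parts, which this ignores):

```python
def logical_to_path(logical):
    """'~abc::subfolder::myfile' -> 'abc/subfolder/myfile'.

    Strips the leading scope marker and swaps '::' for '/';
    a naming illustration only, not real file resolution.
    """
    return logical.lstrip("~").replace("::", "/")

logical_to_path("~abc::subfolder::subsubfolder::myfile")
# -> 'abc/subfolder/subsubfolder/myfile'
```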
SuperFiles
• Super File = symbolic link / list of sub-files
• Each sub-file must have the same layout
47
WEBLOGS_FILE := '~somewhere::logs::web';
Std.File.CreateSuperFile(WEBLOGS_FILE);
...
run_report() := FUNCTION
  oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
  RETURN TABLE(oDS, { ip_address, COUNT(GROUP) }, ip_address);
END;
• Including (more) data

SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
  Std.File.FinishSuperFileTransaction()
);
48
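Conceptually, a super file is just a named list of same-layout sub-files that reads as one dataset. A toy Python model of the add-within-transaction flow (class and method names are illustrative, not the Std.File API):

```python
class SuperFile:
    """Toy model: a symbolic list of sub-file names sharing one layout."""

    def __init__(self, name):
        self.name = name
        self.subfiles = []
        self._pending = None

    def start_transaction(self):
        self._pending = list(self.subfiles)

    def add(self, subfile):
        assert self._pending is not None, "add only inside a transaction"
        self._pending.append(subfile)

    def finish_transaction(self):
        self.subfiles = self._pending  # changes become visible atomically
        self._pending = None

    def read(self, storage):
        # Reading the super file concatenates every sub-file's records.
        return [rec for sub in self.subfiles for rec in storage[sub]]

storage = {"~logs::web::20150401": [{"ip": "1.2.3.4"}],
           "~logs::web::20150402": [{"ip": "5.6.7.8"}]}
weblogs = SuperFile("~somewhere::logs::web")
weblogs.start_transaction()
weblogs.add("~logs::web::20150401")
weblogs.add("~logs::web::20150402")
weblogs.finish_transaction()
# weblogs.read(storage) now yields both days' records
```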
Data Services: Reusability

EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
  // Filtering data based on parameters
  #UNIQUENAME(DS)
  %DS% := WS.Datasets.dsFactEpisodeCostsIndex(...);
  [...]
  #UNIQUENAME(B)
  %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
  #UNIQUENAME(C)
  %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[...
  [...]
  #UNIQUENAME(report)
  %report% := pReportFunction(%K%);
  #UNIQUENAME(sorted)
  %sorted% := SORT(%report%, pSortByField);
  #UNIQUENAME(O1)
  %O1% := OUTPUT(pParams, NAMED('Request'));
  oSummary := DATASET([{COUNT(%sorted%)}], WS.Layouts.service_summary_layout);
  #UNIQUENAME(O2)
  %O2% := OUTPUT(oSummary, NAMED('Metadata'));
  #UNIQUENAME(O3)
  %O3% := OUTPUT(CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
                 NAMED(pServiceName), ALL);
  PARALLEL(%O1%, %O2%, %O3%);
ENDMACRO;
49
AHA System Architecture
50
Archway Analytics
51
lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com

HPCC Systems open source portal: http://hpccsystems.com

Thank you! Questions? Feedback?
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com
10
Episode of Care
SNF 1 SNF 2
IP 1
IP 2
SNF 2
IP 3
HHA 1
HHA 2
time
Mining Claims
11
IP SNF HHA IP IP IP IP SNF SNF
IP SNF HHA IP IP IP IP SNF SNF
IP SNF HHA IP IP IP IP SNF SNF
IP SNF HHA IP IP IP IP SNF SNF
IP SNF HHA IP IP IP IP SNF SNF
IP
IP
IP
IP
IP
Hospital Stays
Post-Acute Care
Anchor
Episode Initiator
Technologies considered bull Pig (amp Hadoop)
Data Processing Language Procedural Language Relational-oriented
bull SAS Business Analytics amp BI Software De-facto standard in Healthcare Industry Proprietary
12
HPCC Systems Quick Introduction
Rodrigo Pastrana -Consulting Software Engineer
WHT082311
What is HPCC Systems
14
bull Open Source distributed data-intensive computing platform bull Provides end-to-end Big Data workflow management scheduler
integration tools etc bull Runs on commodity computingstorage nodes bull Binary packages available for the most common Linux distributions bull Originally designed circa 1999 (predates the original paper on
MapReduce from Dec lsquo04) bull Improved over a decade of real-world Big Data analytics bull In use across critical production environments throughout
LexisNexis for more than 10 years
WHT082311
The HPCC Systems platform
15
WHT082311
bull Massively Parallel data processing engine bull Enables data integration on a scale not previously available bull Programmable using ECL
HPCC Systems Data Refinery (Thor)
HPCC Systems Data Delivery Engine (Roxie) bull A massively parallel high throughput query engine bull Low latency highly concurrent and highly available bull Several advanced strategies for efficient retrieval bull Programmable using ECL
Enterprise Control Language (ECL) bull An easy to use declarative data-centric programming language optimized for large-scale data management and query processing
bull Highly efficient automatically distributes workload across all nodes compiles to native machine code
bull Automatic parallelization and synchronization
1
2
3
The Three HPCC Systems components
Conclusion End to End platform bull No need for any third party tools
16
WHT082311
bull Declarative programming language Describe what needs to be done and not how to do it
bull Powerful High level data activities like JOIN TRANSFORM PROJECT SORT DISTRIBUTE MAP etc are available
bull Extensible Modular and extensible it can shape itself to adapt to the type of problem at hand
bull Implicitly parallel Parallelism is built into the underlying platform The programmer needs not be concerned with data partitioning and parallelism
bull Maintainable High level programming language without side effects and with efficient encapsulation programs are more succinct reliable and easier to troubleshoot
bull Complete ECL provides a complete data programming paradigm
bull Homogeneous One language to express data algorithms across the entire HPCC Systems platform data integration analytics and high speed delivery
bull Polyglottic ECL supports the embedding of other languages such as Java Python R SQL and more
Enterprise Control Language (ECL)
17
WHT082311
Current Status and Resources
bull HPCCSystemscom ndash Tutorials Docs Platform distributions and more bull Latest release 520 adds many new features and improvements
bull Drastic GUI improvements bull Ganglia and Nagios plug-in for system monitoring and alerting bull Security Enhancements ndash tighter authentication measures intra-
component communication encryption bull Embedded Languages ndash Cassandra support memcache and redis
access bull JSON based data support bull Dynamic ESDL ndash Provides simple middlewareback-end interface
definition bull JAVA API project ndash facilitates interaction between Java based apps and
HPCC web services and c++ tools bull Available now ndash HPCCSystemscom
Data mining with HPCC Systems bull Thor
Responsible for processing vast amount of data Optimized for Extraction Transformation Loading
Sorting and Linking Data
bull ECL Declarative More Data Centric Fast amp Implicitly Parallel Inline data Unit Tests in ECL
19
20
SQL vs ECL
SELECT diag_group_cd COUNT() as volume SUM(pmt_amt) as costs
FROM inpatient_claims
GROUP BY diag_group_cd
TABLE( inpatient_claims
diag_group_cd INTEGER volume =
COUNT(GROUP) REAL costs = SUM(pmt_amt)
diag_group_cd
)
SQL ECL
SELECT
FROM inpatient_claims LEFT JOIN ip_value_codes
RIGHT ON LEFTid = RIGHTid
JOIN( inpatient_claims ip_value_codes LEFTid = RIGHTid
)
21
SQL vs ECL
DECLARE my_cursor CURSOR FOR SELECT FROM inpatient_claims
OPEN my_cursor FETCH NEXT FROM my_cursor INTO hellip hellip WHILE FETCH_STATUS = 0 BEGIN
hellip END CLOSE my_cursor DEALLOCATE my_cursor
ITERATE( inpatient_claims TRANSFORM(inpatient_claim_layout
SELFis_dropped = is_one_year_or_greater(
RIGHTadmsn_dt RIGHTdschrgdt) SELF = RIGHT
) )
SQL ECL
Tx
22
ECL ROLLUP
R1 R2 R3 R4 R5 R6
LEFT RIGHT
RA
Tx LEFT RIGHT
RB R4 R6 R5
ROLLUP( dataset condition(LEFT RIGHT) transformation(LEFT RIGHT) )
Processing Claims 1 The intent here is to make the series of interim claims look like a single claim for
most purposes where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim
2 11130901113090The admission date from the first series in the claim and the discharge date from the last series in the claim define the length of the stay
3 11130901113090The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is includedexcluded as a readmission for an existing episode
4 11130901113090Costs across all IP claims included in the single stay are aggregated to the stay level
5 Claims where the last in the series of claims has patient (hellip) [as ldquostill a patientrdquo not discharged] flag these and drop all of the claims in the series from the IP hospital stay file
23
Processing Claims With ECL H_1 = SORT( A bene_sk provider admsn_dt dschrgdt thru_dt) H_2 = ROLLUP(H_1
is_interim(LEFT RIGHT) merge_interim_claims(LEFT RIGHT))
H_3 = JOIN(H_2 H_1 LEFTbene_sk = RIGHTbene_sk [hellip] RIGHT ONLY) H_4 = PROJECT(H_3 TRANSFORM(BPCILayoutsip_claim_etl_layout SELFis_dropped = TRUE SELFdropped_reason_code = BPCILayoutsDROPPED_REASON_CODESInterimClaim SELF = LEFT )) H = H_2 + H_4
24
25
Template Language EXPORT load_all_client_files(pId pFileSet pBaseDataDirectory) = MACRO LOADXML(pFileSet) baseDataDirectory = pBaseDataDirectory + pId + FOR(folder) UNIQUENAME(subId) subId = UNIQUENAME(subDS) subDS = ClientDatasets(subId) [] UNIQUENAME(id) id = pId + + UNIQUENAME(dataDir) dataDir = baseDataDirectory + + UNIQUENAME(etl) etl = ClientETL(dataDir id) etlrun() END ENDMACRO
26
Template Language

file_set := '<folders>' +
            '<folder>M201409</folder>' +
            '<folder>M201410</folder>' +
            '<folder>M201411</folder>' +
            '<folder>M201412</folder>' +
            '<folder>M201501</folder>' +
            '<folder>M201502</folder>' +
            '<folder>M201503</folder>' +
            '</folders>';
load_all_client_files(1234, file_set, '/volume1/data/');
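The template-language loop above expands one ETL run per `<folder>` entry at compile time. The same expansion can be sketched in Python (illustrative; the job tuple shape and name patterns are assumptions, not the deck's actual code generation):

```python
import xml.etree.ElementTree as ET

def load_all_client_files(client_id, file_set_xml, base_dir):
    """Expand a <folders> list into one (data_dir, run_id) job per folder,
    mirroring what the ECL #FOR template loop generates for each <folder>."""
    jobs = []
    for folder in ET.fromstring(file_set_xml).findall("folder"):
        sub_id = folder.text  # e.g. "M201409"
        jobs.append((f"{base_dir}/{client_id}/{sub_id}", f"{client_id}_{sub_id}"))
    return jobs

file_set = "<folders><folder>M201409</folder><folder>M201410</folder></folders>"
print(load_all_client_files(1234, file_set, "/volume1/data"))
```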
Beyond Processing Data
• Security & Authentication
• Collaboration
• Unit Tests
• Visualizations
27
Beyond Processing Security
• HTTPS
• Htpasswd
• LDAP support
• File-level security when using LDAP
28
Beyond Processing Workunits
• Workunit Identifier
• Attribution
• Query
• Timings
• Results
29
30
Beyond Processing Collaboration
31
Beyond Processing Collaboration (2)
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests

interim_claims := MODULE
  // Test Data
  test_set := BPCI.Test.Samples.ip_claim( bene_id := 1, claim_id := 1, pmt_amt := 30420 ) +
              BPCI.Test.Samples.ip_claim( bene_id := 1, claim_id := 2, pmt_amt := 114090 ) + …;
  EXPORT Actual := Step2.ip_stays;
  SHARED TestSuite := MODULE
    EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou…');
    EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
  END;
  EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
END;
Beyond Processing Unit Tests (2)
35
Using an inline dataset:

simple_ip_claims := DATASET([{1, 1, '00', …}], simplified_ip_layout);
ip_claims := Samples.ip_claims(simple_ip_claims);

OR passing NAMED parameters:

ip_claims := Samples.ip_claims2( bene_id := 1, claim_id := 1, claim_type := '00' );

simplified_ip_layout := RECORD
  UNSIGNED bene_id;
  UNSIGNED claim_id;
  STRING   claim_type;
  STRING   provider_number;
  INTEGER4 through_date;
  STRING   status_code;
  INTEGER4 admission_date;
  INTEGER4 discharge_date;
  STRING   ms_drg_code;
END;
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
[Chart: number of episodes (y-axis, 0–80) vs. costs in $1,000 (x-axis, 1–49), with no breakdown]
39
Insights
[Chart: number of episodes (y-axis, 0–80) vs. costs in $1,000 (x-axis, 1–49), broken down by readmissions: No readmit / 1 Readmit / 2 Readmits]
Data Delivery Roxie
• Data Delivery Engine
• Indexed, compressed, and in-memory
• Data Warehouse Capabilities
• Data Services
40
Data Services
• Web Services over Data Warehouse
• XML/SOAP but also JSON
• Web Services defined in ECL
• Solution to add/remove data from the cluster
41
Data Service Query
• Define service using ECL
42
INTEGER4 oOffset := 1 : STORED('Offset');
INTEGER4 oResults := 100 : STORED('Results');
INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
INTEGER4 oEndDate := 20140201 : STORED('End_Date');

oParams := DATASET([{oOffset, oResults, oStartDate, oEndDate, …}],
                   Layouts.service_parameters_layout);

T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
  RETURN TABLE(pData,
               {STRING bpid := pData.bpid,
                UNSIGNED INTEGER1 model := pData.model,
                UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length,
                INTEGER8 total_episodes := COUNT(GROUP),
                DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay),
                DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay),
                DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay))},
               bpid, model, post_dsch_prd_length);
END;

ReportServices.BaseService.run_it('Summary', oParameters, T, bpid);
43
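The grouped TABLE in the service computes per-(bpid, model, period) episode counts, totals, means, and standard deviations. A rough Python equivalent (illustrative; the row dictionaries and `pay` field name are assumptions, and it uses the population variance like VARIANCE in the original):

```python
import math
from collections import defaultdict

def summarize(rows):
    """Group episode rows by (bpid, model, post_dsch_prd_length) and compute
    COUNT / SUM / AVE / SQRT(VARIANCE) of the pay field, like the TABLE above."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["bpid"], r["model"], r["post_dsch_prd_length"])].append(r["pay"])
    out = {}
    for key, pays in groups.items():
        n, total = len(pays), sum(pays)
        mean = total / n
        var = sum((p - mean) ** 2 for p in pays) / n  # population variance
        out[key] = {"episodes": n, "total": total, "avg": mean, "std": math.sqrt(var)}
    return out

rows = [
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "pay": 100.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "pay": 300.0},
]
print(summarize(rows))
```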
Data (Web) Services

Request:
{ "summary": { "offset": 1, "results": 10, "begin_date": 20130101, "end_date": 20130201, … } }

Response:
{ "summaryResponse": { …, "Results": { …, "Summary": { "Row": [ { "bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "total_episodes": 987, "total_costs": 987654321, "average_costs": 1235813, … } ] }, … } } }

https://…/WsEcl/forms/json/query/roxie/summary
44
Data Services WsECL
45
Data Services WsECL
Loading up data
• Logical vs Physical
  ~abc::subfolder::subsubfolder::myfile
  /abc/subfolder/subsubfolder/myfile
• ECL to load data into the cluster:

  oDS := DATASET(Std.File.ExternalLogicalFilename('172.0.0.1', '/var/lib/myfile.csv'),
                 Layouts.ip_claim_layout, CSV(HEADING(0)));
  oDSDistributed := DISTRIBUTE(oDS, bene_id);
  OUTPUT(oDSDistributed, , '~somewhere::overhere::myfile', OVERWRITE);

• ECL to use data loaded into the cluster:

  oDS := DATASET('~somewhere::overhere::myfile', Layouts.ip_claim_layout, THOR);

46
SuperFiles
• Super File = symbolic link / list of sub-files
• Each sub-file must have the same layout

47

WEBLOGS_FILE := '~somewhere::logs::web';
Std.File.CreateSuperFile(WEBLOGS_FILE);
…
run_report() := FUNCTION
  oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
  RETURN TABLE(oDS, {ip_address, COUNT(GROUP)}, ip_address);
END;

• Including (more) data:

SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
  Std.File.FinishSuperFileTransaction()
);
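Conceptually a superfile is just a named, ordered list of same-layout sub-files that reads as one dataset, so queries like run_report never change as new days of logs are added. A minimal Python sketch of that idea (the class and `storage` dict are hypothetical stand-ins, not the Std.File API):

```python
class SuperFile:
    """A superfile as a named, ordered list of sub-files sharing one layout;
    reading it concatenates the sub-files into a single dataset."""
    def __init__(self, name):
        self.name, self.subfiles = name, []
    def add(self, subfile):
        self.subfiles.append(subfile)  # like Std.File.AddSuperFile
    def read(self, storage):
        rows = []
        for sub in self.subfiles:      # sub-files are read in order, as one dataset
            rows.extend(storage[sub])
        return rows

storage = {"web::20150401": [{"ip": "1.2.3.4"}], "web::20150402": [{"ip": "5.6.7.8"}]}
weblogs = SuperFile("~somewhere::logs::web")
weblogs.add("web::20150401")
weblogs.add("web::20150402")
print(weblogs.read(storage))  # [{'ip': '1.2.3.4'}, {'ip': '5.6.7.8'}]
```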
48
Data Services Reusability

EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
  // Filtering data based on parameters
  #UNIQUENAME(DS)
  %DS% := WS.Datasets.dsFactEpisodeCostsIndex;
  […]
  #UNIQUENAME(B)
  %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
  #UNIQUENAME(C)
  %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[…
  […]
  #UNIQUENAME(report)
  %report% := pReportFunction(%K%);
  #UNIQUENAME(sorted)
  %sorted% := SORT(%report%, pSortByField);
  #UNIQUENAME(O1)
  %O1% := OUTPUT(pParameters, NAMED('Request'));
  oSummary := DATASET([{COUNT(%sorted%)}], WS.Layouts.service_summary_layout);
  #UNIQUENAME(O2)
  %O2% := OUTPUT(oSummary, NAMED('Metadata'));
  #UNIQUENAME(O3)
  %O3% := OUTPUT(CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
                 NAMED(pServiceName), ALL);
  PARALLEL(%O1%, %O2%, %O3%);
ENDMACRO;
49
AHA System Architecture
50
Archway Analytics
51
lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com
HPCC Systems open source portal: http://hpccsystems.com
Thank you! Questions? Feedback?
HPCC Systems Quick Introduction
Rodrigo Pastrana -Consulting Software Engineer
WHT082311
What is HPCC Systems
14
bull Open Source distributed data-intensive computing platform bull Provides end-to-end Big Data workflow management scheduler
integration tools etc bull Runs on commodity computingstorage nodes bull Binary packages available for the most common Linux distributions bull Originally designed circa 1999 (predates the original paper on
MapReduce from Dec lsquo04) bull Improved over a decade of real-world Big Data analytics bull In use across critical production environments throughout
LexisNexis for more than 10 years
WHT082311
The HPCC Systems platform
15
WHT082311
bull Massively Parallel data processing engine bull Enables data integration on a scale not previously available bull Programmable using ECL
HPCC Systems Data Refinery (Thor)
HPCC Systems Data Delivery Engine (Roxie) bull A massively parallel high throughput query engine bull Low latency highly concurrent and highly available bull Several advanced strategies for efficient retrieval bull Programmable using ECL
Enterprise Control Language (ECL) bull An easy to use declarative data-centric programming language optimized for large-scale data management and query processing
bull Highly efficient automatically distributes workload across all nodes compiles to native machine code
bull Automatic parallelization and synchronization
1
2
3
The Three HPCC Systems components
Conclusion End to End platform bull No need for any third party tools
16
Enterprise Control Language (ECL)
• Declarative programming language: describe what needs to be done, not how to do it
• Powerful: high-level data activities such as JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available
• Extensible: modular and extensible, it can shape itself to adapt to the type of problem at hand
• Implicitly parallel: parallelism is built into the underlying platform; the programmer need not be concerned with data partitioning and parallelism
• Maintainable: a high-level programming language without side effects and with efficient encapsulation; programs are more succinct, reliable, and easier to troubleshoot
• Complete: ECL provides a complete data programming paradigm
• Homogeneous: one language to express data algorithms across the entire HPCC Systems platform, from data integration to analytics and high-speed delivery
• Polyglot: ECL supports the embedding of other languages such as Java, Python, R, SQL, and more
17
Current Status and Resources
• HPCCSystems.com: tutorials, docs, platform distributions, and more
• Latest release (5.2.0) adds many new features and improvements:
  • Drastic GUI improvements
  • Ganglia and Nagios plug-ins for system monitoring and alerting
  • Security enhancements: tighter authentication measures, intra-component communication encryption
  • Embedded languages: Cassandra support, memcached and Redis access
  • JSON-based data support
  • Dynamic ESDL: provides a simple middleware/back-end interface definition
  • Java API project: facilitates interaction between Java-based apps and HPCC web services and C++ tools
• Available now at HPCCSystems.com
Data mining with HPCC Systems
• Thor
  • Responsible for processing vast amounts of data
  • Optimized for Extraction, Transformation, Loading, Sorting, and Linking of data
• ECL
  • Declarative, more data-centric
  • Fast and implicitly parallel
  • Inline data
  • Unit tests in ECL
19
20
SQL vs ECL
Aggregation in SQL:

SELECT diag_group_cd, COUNT(*) AS volume, SUM(pmt_amt) AS costs
FROM inpatient_claims
GROUP BY diag_group_cd;

The same aggregation in ECL:

TABLE(inpatient_claims,
  {
    diag_group_cd,
    INTEGER volume := COUNT(GROUP),
    REAL costs := SUM(GROUP, pmt_amt)
  },
  diag_group_cd
);

A join in SQL:

SELECT *
FROM inpatient_claims
JOIN ip_value_codes ON inpatient_claims.id = ip_value_codes.id;

The same join in ECL:

JOIN(inpatient_claims, ip_value_codes,
  LEFT.id = RIGHT.id
);
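For readers coming from SQL, the semantics of the ECL TABLE aggregation can be sketched in plain Python. The field names come from the slide; the grouping logic is the illustrative part, not any real HPCC API:

```python
from collections import defaultdict

def summarize_by_drg(claims):
    """Group inpatient claims by diag_group_cd, counting volume and summing
    payments - what both the SQL GROUP BY and the ECL TABLE express."""
    groups = defaultdict(lambda: {"volume": 0, "costs": 0.0})
    for claim in claims:
        g = groups[claim["diag_group_cd"]]
        g["volume"] += 1
        g["costs"] += claim["pmt_amt"]
    return dict(groups)

claims = [
    {"diag_group_cd": 470, "pmt_amt": 11000.0},
    {"diag_group_cd": 470, "pmt_amt": 12500.0},
    {"diag_group_cd": 330, "pmt_amt": 9000.0},
]
summary = summarize_by_drg(claims)
```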
21
SQL vs ECL
Row-by-row processing in SQL (cursor):

DECLARE my_cursor CURSOR FOR SELECT * FROM inpatient_claims;
OPEN my_cursor;
FETCH NEXT FROM my_cursor INTO ...;
WHILE @@FETCH_STATUS = 0
BEGIN
  ...
END;
CLOSE my_cursor;
DEALLOCATE my_cursor;

The same transformation in ECL:

ITERATE(inpatient_claims,
  TRANSFORM(inpatient_claim_layout,
    SELF.is_dropped := is_one_year_or_greater(RIGHT.admsn_dt, RIGHT.dschrgdt),
    SELF := RIGHT
  )
);
22
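ECL's ITERATE walks the dataset in order, handing the transform the previous output record (LEFT) and the current record (RIGHT). A rough Python sketch of that behavior, with a hypothetical running-total transform:

```python
def iterate(records, transform):
    """Rough Python analogue of ECL ITERATE: walk records in order, where
    each output record is transform(previous_output, current). The first
    call receives an all-default 'left' record, as in ECL."""
    out = []
    left = {}  # stands in for ECL's all-blank LEFT on the first record
    for right in records:
        left = transform(left, right)
        out.append(left)
    return out

def running_total(left, right):
    # Illustrative transform: carry a running payment total forward.
    merged = dict(right)
    merged["running_pmt"] = left.get("running_pmt", 0) + right["pmt_amt"]
    return merged

rows = [{"pmt_amt": 100}, {"pmt_amt": 250}, {"pmt_amt": 50}]
totals = [r["running_pmt"] for r in iterate(rows, running_total)]
```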
ECL ROLLUP

ROLLUP(dataset, condition(LEFT, RIGHT), transformation(LEFT, RIGHT))

[Diagram: records R1..R6 are compared pairwise; wherever condition(LEFT, RIGHT) holds, the transformation merges the pair, rolling adjacent matching records up into RA and RB.]
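The pairwise merge that ROLLUP performs can be sketched in Python; the condition and transform below are illustrative stand-ins for real ECL TRANSFORMs:

```python
def rollup(records, condition, transform):
    """Python analogue of ECL ROLLUP: compare adjacent records; whenever
    condition(left, right) holds, replace the pair with transform(left, right)
    and keep comparing the merged record against the next one."""
    out = []
    for right in records:
        if out and condition(out[-1], right):
            out[-1] = transform(out[-1], right)
        else:
            out.append(dict(right))
    return out

# Illustrative use: merge runs of claims sharing a DRG, summing amounts.
same_drg = lambda l, r: l["drg"] == r["drg"]
merge = lambda l, r: {"drg": l["drg"], "amt": l["amt"] + r["amt"]}

rows = [{"drg": 1, "amt": 10}, {"drg": 1, "amt": 5}, {"drg": 2, "amt": 7}]
rolled = rollup(rows, same_drg, merge)
```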
Processing Claims
1. The intent here is to make the series of interim claims look like a single claim for most purposes, where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim.
2. The admission date from the first claim in the series and the discharge date from the last claim in the series define the length of the stay.
3. The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is included/excluded as a readmission for an existing episode.
4. Costs across all IP claims included in the single stay are aggregated to the stay level.
5. Where the last in the series of claims has a patient status of (...) [as "still a patient", not discharged], flag these and drop all of the claims in the series from the IP hospital stay file.
23
Processing Claims With ECL

H_1 := SORT(A, bene_sk, provider, admsn_dt, dschrgdt, thru_dt);
H_2 := ROLLUP(H_1,
  is_interim(LEFT, RIGHT),
  merge_interim_claims(LEFT, RIGHT));
H_3 := JOIN(H_2, H_1, LEFT.bene_sk = RIGHT.bene_sk [...], RIGHT ONLY);
H_4 := PROJECT(H_3,
  TRANSFORM(BPCI.Layouts.ip_claim_etl_layout,
    SELF.is_dropped := TRUE,
    SELF.dropped_reason_code := BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim,
    SELF := LEFT));
H := H_2 + H_4;
24
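To make the SORT + ROLLUP pattern concrete, here is a Python sketch of the interim-claim merge for a single beneficiary. The field names and the "next claim starts where the previous one ended" condition are simplified stand-ins for the real is_interim / merge_interim_claims logic:

```python
def merge_interim_claims(stays):
    """Sketch of the interim-claim merge: sort a beneficiary's hospital
    claims, then collapse each chain of interim claims into one stay."""
    claims = sorted(stays, key=lambda c: (c["admsn_dt"], c["dschrg_dt"]))
    merged = []
    for c in claims:
        prev = merged[-1] if merged else None
        if prev and c["admsn_dt"] <= prev["dschrg_dt"]:
            # Rules 1/2: keep the first admission date, take the last discharge date
            prev["dschrg_dt"] = max(prev["dschrg_dt"], c["dschrg_dt"])
            # Rule 3: the discharge MS-DRG comes from the last claim in the stay
            prev["drg"] = c["drg"]
            # Rule 4: costs aggregate to the stay level
            prev["pmt_amt"] += c["pmt_amt"]
        else:
            merged.append(dict(c))
    return merged

claims = [
    {"admsn_dt": 20140101, "dschrg_dt": 20140110, "drg": "470", "pmt_amt": 1000.0},
    {"admsn_dt": 20140110, "dschrg_dt": 20140120, "drg": "871", "pmt_amt": 2000.0},
    {"admsn_dt": 20140301, "dschrg_dt": 20140305, "drg": "292", "pmt_amt": 500.0},
]
stays = merge_interim_claims(claims)
```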
25
Template Language

EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO
  LOADXML(pFileSet);
  baseDataDirectory := pBaseDataDirectory + pId + '...';
  #FOR(folder)
    #UNIQUENAME(subId)
    %subId% := ...;
    #UNIQUENAME(subDS)
    %subDS% := Client.Datasets(%subId%)[...];
    #UNIQUENAME(id)
    %id% := pId + '...';
    #UNIQUENAME(dataDir)
    %dataDir% := baseDataDirectory + '...';
    #UNIQUENAME(etl)
    %etl% := Client.ETL(%dataDir%, %id%);
    %etl%.run();
  #END
ENDMACRO;
26
Template Language

file_set := '<folders>' +
  '<folder>M201409</folder>' +
  '<folder>M201410</folder>' +
  '<folder>M201411</folder>' +
  '<folder>M201412</folder>' +
  '<folder>M201501</folder>' +
  '<folder>M201502</folder>' +
  '<folder>M201503</folder>' +
  '</folders>';
load_all_client_files(1234, file_set, '/volume1/data/');
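The template-language idea (generate one load step per <folder> element in the fileset XML) can be sketched outside ECL as ordinary code generation over the same XML; the paths and names here are illustrative:

```python
import xml.etree.ElementTree as ET

def expand_fileset(client_id, fileset_xml, base_dir):
    """Sketch of what the ECL template macro does: walk the <folders> XML
    and emit one load target (here, a path) per monthly folder."""
    root = ET.fromstring(fileset_xml)
    return [f"{base_dir}{client_id}/{folder.text}/" for folder in root.iter("folder")]

file_set = "<folders><folder>M201409</folder><folder>M201410</folder></folders>"
paths = expand_fileset(1234, file_set, "/volume1/data/")
```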
Beyond Processing: Data
• Security & Authentication
• Collaboration
• Unit Tests
• Visualizations
27
Beyond Processing: Security
• HTTPS
• htpasswd
• LDAP support
• File-level security when using LDAP
28
Beyond Processing: Workunits
• Workunit identifier
• Attribution
• Query
• Timings
• Results
29
30
Beyond Processing: Collaboration
31
Beyond Processing: Collaboration (2)
32
Beyond Processing: Collaboration
33
Beyond Processing: Collaboration
34
Beyond Processing: Unit Tests

interim_claims := MODULE
  // Test Data
  test_set :=
    BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 1, pmt_amt := 30420) +
    BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 2, pmt_amt := 114090) +
    ...;
  EXPORT Actual := Step2.ip_stays;
  SHARED TestSuite := MODULE
    EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou...');
    EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
  END;
  EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
END;
Beyond Processing: Unit Tests (2)
35
Using an inline dataset:

simple_ip_claims := DATASET([...], simplified_ip_layout);
ip_claims := Samples.ip_claims(simple_ip_claims);

// OR passing NAMED parameters
ip_claims := Samples.ip_claims2(bene_id := 1, claim_id := 1, claim_type := '00', ...);

simplified_ip_layout := RECORD
  UNSIGNED bene_id;
  UNSIGNED claim_id;
  STRING claim_type;
  STRING provider_number;
  INTEGER4 through_date;
  STRING status_code;
  INTEGER4 admission_date;
  INTEGER4 discharge_date;
  STRING ms_drg_code;
END;
36
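The same arrange/act/assert structure carries over directly to mainstream test frameworks. A Python unittest sketch of the interim-claim tests, with a toy stand-in for the pipeline step under test (the claim numbers and the status-code-30 "still a patient" check are illustrative):

```python
import unittest

def drop_interim_claims(claims):
    """Toy stand-in for the pipeline step under test: flag claims whose
    patient discharge status says 'still a patient' as dropped."""
    return [dict(c, is_dropped=(c["status"] == "30")) for c in claims]

class InterimClaimTests(unittest.TestCase):
    def setUp(self):
        # Arrange: tiny inline dataset, as in the ECL test module above
        self.actual = drop_interim_claims([
            {"claimno": 1, "status": "01"},
            {"claimno": 2, "status": "01"},
            {"claimno": 3, "status": "30"},
        ])

    def test_kept_claims(self):
        kept = [c["claimno"] for c in self.actual if not c["is_dropped"]]
        self.assertEqual(kept, [1, 2])

    def test_dropped_claims(self):
        dropped = [c["claimno"] for c in self.actual if c["is_dropped"]]
        self.assertEqual(dropped, [3])

if __name__ == "__main__":
    unittest.main()
```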
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
[Chart: number of episodes (y-axis) by episode costs in $1,000 (x-axis, 1..49)]
39
Insights
[Chart: number of episodes (y-axis) by episode costs in $1,000 (x-axis, 1..49), split into: no readmit, 1 readmit, 2 readmits]
Data Delivery: Roxie
• Data Delivery Engine
• Indexed, compressed, and in-memory
• Data warehouse capabilities
• Data services
40
Data Services
• Web services over the data warehouse
• XML/SOAP, but also JSON
• Web services defined in ECL
• A solution to add/remove data from the cluster
41
Data Service Query
• Define the service using ECL:
42

INTEGER4 oOffset := 1 : STORED('Offset');
INTEGER4 oResults := 100 : STORED('Results');
INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
INTEGER4 oEndDate := 20140201 : STORED('End_Date');
oParams := DATASET([{oOffset, oResults, oStartDate, oEndDate, ...}],
                   Layouts.service_parameters_layout);

T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
  RETURN TABLE(pData,
    {
      STRING bpid := pData.bpid,
      UNSIGNED INTEGER1 model := pData.model,
      UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length,
      INTEGER8 total_episodes := COUNT(GROUP),
      DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay),
      DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay),
      DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay))
    },
    bpid, model, post_dsch_prd_length);
END;

ReportServices.BaseService.run_it(Summary, oParams, T, bpid);
43
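The statistics the service's TABLE computes (count, total, average, and a standard deviation via SQRT(VARIANCE(...))) reduce, per group, to the following Python sketch:

```python
import math

def episode_cost_stats(costs):
    """Per-group statistics as in the service's TABLE: count, total, mean,
    and population standard deviation (the square root of the variance)."""
    n = len(costs)
    total = sum(costs)
    mean = total / n
    variance = sum((c - mean) ** 2 for c in costs) / n
    return {"total_episodes": n, "total_costs": total,
            "average_costs": mean, "std_dev_costs": math.sqrt(variance)}

stats = episode_cost_stats([10000.0, 12000.0, 14000.0])
```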
Data (Web) Services

Request (http://.../WsEcl/forms/json/query/roxie/summary):

{
  "summary": {
    "offset": 1,
    "results": 10,
    "begin_date": 20130101,
    "end_date": 20130201,
    ...
  }
}

Response:

{
  "summaryResponse": {
    ...
    "Results": {
      ...
      "Summary": {
        "Row": [
          {
            "bpid": "9999",
            "model": 2,
            "post_dsch_prd_length": 90,
            "total_episodes": 987,
            "total_costs": 987654321,
            "average_costs": 1235813,
            ...
          }
        ]
      }
    }
  }
}
44
Data Services WsECL
45
Data Services WsECL
Loading up data
• Logical vs. physical filenames:
  ~abc::subfolder::subsubfolder::myfile (logical) maps to /abc/subfolder/subsubfolder/myfile (physical)
• ECL to load data into the cluster:
46

oDS := DATASET(Std.File.ExternalLogicalFilename('172.0.0.1', '/var/lib/myfile.csv'),
               Layouts.ip_claim_layout, CSV(HEADING(0)));
oDSDistributed := DISTRIBUTE(oDS, bene_id);
OUTPUT(oDSDistributed, , '~somewhere::overhere::myfile', OVERWRITE);

• ECL to use data loaded into the cluster:

oDS := DATASET('~somewhere::overhere::myfile', Layouts.ip_claim_layout, THOR);
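DISTRIBUTE(oDS, bene_id) hash-partitions records across the cluster's nodes so that all claims for one beneficiary land together, which makes the later per-beneficiary sorts and rollups node-local. A minimal Python sketch of that partitioning (the node count is arbitrary):

```python
def distribute(records, key, n_nodes):
    """Hash-partition records across n_nodes by key, so that all records
    sharing a key value land on the same node (as ECL DISTRIBUTE does)."""
    nodes = [[] for _ in range(n_nodes)]
    for rec in records:
        nodes[hash(rec[key]) % n_nodes].append(rec)
    return nodes

claims = [{"bene_id": b, "claim_id": i} for i, b in enumerate([1, 2, 1, 3, 2])]
nodes = distribute(claims, "bene_id", 4)
```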
SuperFiles
• Super File = symbolic link / list of sub-files
• Each sub-file must have the same layout
47

WEBLOGS_FILE := '~somewhere::logs::web';
Std.File.CreateSuperFile(WEBLOGS_FILE);
...
run_report() := FUNCTION
  oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
  RETURN TABLE(oDS, {ip_address, COUNT(GROUP)}, ip_address);
END;

• Including (more) data:

SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web20150401'),
  Std.File.FinishSuperFileTransaction()
);
48
Data Services: Reusability

EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
  // Filtering data based on parameters
  #UNIQUENAME(DS)
  %DS% := WS.Datasets.dsFactEpisodeCostsIndex[...];
  [...]
  #UNIQUENAME(B)
  %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
  #UNIQUENAME(C)
  %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[...
  [...]
  #UNIQUENAME(report)
  %report% := pReportFunction(%K%);
  #UNIQUENAME(sorted)
  %sorted% := SORT(%report%, pSortByField);
  #UNIQUENAME(O1)
  %O1% := OUTPUT(pParams, NAMED('Request'));
  oSummary := DATASET([{COUNT(%sorted%)}], WS.Layouts.service_summary_layout);
  #UNIQUENAME(O2)
  %O2% := OUTPUT(oSummary, NAMED('Metadata'));
  #UNIQUENAME(O3)
  %O3% := OUTPUT(CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
                 NAMED(pServiceName), ALL);
  PARALLEL(%O1%, %O2%, %O3%);
ENDMACRO;
49
AHA System Architecture
50
Archway Analytics
51
lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com
HPCC Systems open source portal: http://hpccsystems.com
Thank you! Questions? Feedback?
WHT082311
What is HPCC Systems
14
bull Open Source distributed data-intensive computing platform bull Provides end-to-end Big Data workflow management scheduler
integration tools etc bull Runs on commodity computingstorage nodes bull Binary packages available for the most common Linux distributions bull Originally designed circa 1999 (predates the original paper on
MapReduce from Dec lsquo04) bull Improved over a decade of real-world Big Data analytics bull In use across critical production environments throughout
LexisNexis for more than 10 years
WHT082311
The HPCC Systems platform
15
WHT082311
bull Massively Parallel data processing engine bull Enables data integration on a scale not previously available bull Programmable using ECL
HPCC Systems Data Refinery (Thor)
HPCC Systems Data Delivery Engine (Roxie) bull A massively parallel high throughput query engine bull Low latency highly concurrent and highly available bull Several advanced strategies for efficient retrieval bull Programmable using ECL
Enterprise Control Language (ECL) bull An easy to use declarative data-centric programming language optimized for large-scale data management and query processing
bull Highly efficient automatically distributes workload across all nodes compiles to native machine code
bull Automatic parallelization and synchronization
1
2
3
The Three HPCC Systems components
Conclusion End to End platform bull No need for any third party tools
16
WHT082311
bull Declarative programming language Describe what needs to be done and not how to do it
bull Powerful High level data activities like JOIN TRANSFORM PROJECT SORT DISTRIBUTE MAP etc are available
bull Extensible Modular and extensible it can shape itself to adapt to the type of problem at hand
bull Implicitly parallel Parallelism is built into the underlying platform The programmer needs not be concerned with data partitioning and parallelism
bull Maintainable High level programming language without side effects and with efficient encapsulation programs are more succinct reliable and easier to troubleshoot
bull Complete ECL provides a complete data programming paradigm
bull Homogeneous One language to express data algorithms across the entire HPCC Systems platform data integration analytics and high speed delivery
bull Polyglottic ECL supports the embedding of other languages such as Java Python R SQL and more
Enterprise Control Language (ECL)
17
WHT082311
Current Status and Resources
bull HPCCSystemscom ndash Tutorials Docs Platform distributions and more bull Latest release 520 adds many new features and improvements
bull Drastic GUI improvements bull Ganglia and Nagios plug-in for system monitoring and alerting bull Security Enhancements ndash tighter authentication measures intra-
component communication encryption bull Embedded Languages ndash Cassandra support memcache and redis
access bull JSON based data support bull Dynamic ESDL ndash Provides simple middlewareback-end interface
definition bull JAVA API project ndash facilitates interaction between Java based apps and
HPCC web services and c++ tools bull Available now ndash HPCCSystemscom
Data mining with HPCC Systems bull Thor
Responsible for processing vast amount of data Optimized for Extraction Transformation Loading
Sorting and Linking Data
bull ECL Declarative More Data Centric Fast amp Implicitly Parallel Inline data Unit Tests in ECL
19
20
SQL vs ECL
SELECT diag_group_cd COUNT() as volume SUM(pmt_amt) as costs
FROM inpatient_claims
GROUP BY diag_group_cd
TABLE( inpatient_claims
diag_group_cd INTEGER volume =
COUNT(GROUP) REAL costs = SUM(pmt_amt)
diag_group_cd
)
SQL ECL
SELECT
FROM inpatient_claims LEFT JOIN ip_value_codes
RIGHT ON LEFTid = RIGHTid
JOIN( inpatient_claims ip_value_codes LEFTid = RIGHTid
)
21
SQL vs ECL
DECLARE my_cursor CURSOR FOR SELECT FROM inpatient_claims
OPEN my_cursor FETCH NEXT FROM my_cursor INTO hellip hellip WHILE FETCH_STATUS = 0 BEGIN
hellip END CLOSE my_cursor DEALLOCATE my_cursor
ITERATE( inpatient_claims TRANSFORM(inpatient_claim_layout
SELFis_dropped = is_one_year_or_greater(
RIGHTadmsn_dt RIGHTdschrgdt) SELF = RIGHT
) )
SQL ECL
Tx
22
ECL ROLLUP
R1 R2 R3 R4 R5 R6
LEFT RIGHT
RA
Tx LEFT RIGHT
RB R4 R6 R5
ROLLUP( dataset condition(LEFT RIGHT) transformation(LEFT RIGHT) )
Processing Claims 1 The intent here is to make the series of interim claims look like a single claim for
most purposes where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim
2 11130901113090The admission date from the first series in the claim and the discharge date from the last series in the claim define the length of the stay
3 11130901113090The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is includedexcluded as a readmission for an existing episode
4 11130901113090Costs across all IP claims included in the single stay are aggregated to the stay level
5 Claims where the last in the series of claims has patient (hellip) [as ldquostill a patientrdquo not discharged] flag these and drop all of the claims in the series from the IP hospital stay file
23
Processing Claims With ECL H_1 = SORT( A bene_sk provider admsn_dt dschrgdt thru_dt) H_2 = ROLLUP(H_1
is_interim(LEFT RIGHT) merge_interim_claims(LEFT RIGHT))
H_3 = JOIN(H_2 H_1 LEFTbene_sk = RIGHTbene_sk [hellip] RIGHT ONLY) H_4 = PROJECT(H_3 TRANSFORM(BPCILayoutsip_claim_etl_layout SELFis_dropped = TRUE SELFdropped_reason_code = BPCILayoutsDROPPED_REASON_CODESInterimClaim SELF = LEFT )) H = H_2 + H_4
24
25
Template Language EXPORT load_all_client_files(pId pFileSet pBaseDataDirectory) = MACRO LOADXML(pFileSet) baseDataDirectory = pBaseDataDirectory + pId + FOR(folder) UNIQUENAME(subId) subId = UNIQUENAME(subDS) subDS = ClientDatasets(subId) [] UNIQUENAME(id) id = pId + + UNIQUENAME(dataDir) dataDir = baseDataDirectory + + UNIQUENAME(etl) etl = ClientETL(dataDir id) etlrun() END ENDMACRO
26
Template Language file_set = rsquoltfoldersgtrsquo + ltfoldergtM201409ltfoldergt + ltfoldergtM201410ltfoldergt + ltfoldergtM201411ltfoldergt + ltfoldergtM201412ltfoldergt + ltfoldergtM201501ltfoldergt + ltfoldergtM201502ltfoldergt + ltfoldergtM201503ltfoldergt + lsquoltfoldersgtrsquo load_all_client_files(1234 file_set lsquovolume1datalsquo)
Beyond Processing Data
bull Security amp Authentication
bull Collaboration
bull Unit Tests
bull Visualizations
27
Beyond Processing Security
bull HTTPS
bull Htpasswd
bull LDAP support
bull File level security when using LDAP
28
Beyond Processing Workunits bull Workunit Identifier
bull Attribution
bull Query
bull Timings
bull Results
29
30
Beyond Processing Collaboration
31
Beyond Processing Collaboration (2)
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
WHT082311
The HPCC Systems platform
15
WHT082311
bull Massively Parallel data processing engine bull Enables data integration on a scale not previously available bull Programmable using ECL
HPCC Systems Data Refinery (Thor)
HPCC Systems Data Delivery Engine (Roxie) bull A massively parallel high throughput query engine bull Low latency highly concurrent and highly available bull Several advanced strategies for efficient retrieval bull Programmable using ECL
Enterprise Control Language (ECL) bull An easy to use declarative data-centric programming language optimized for large-scale data management and query processing
bull Highly efficient automatically distributes workload across all nodes compiles to native machine code
bull Automatic parallelization and synchronization
1
2
3
The Three HPCC Systems components
Conclusion End to End platform bull No need for any third party tools
16
WHT082311
bull Declarative programming language Describe what needs to be done and not how to do it
bull Powerful High level data activities like JOIN TRANSFORM PROJECT SORT DISTRIBUTE MAP etc are available
bull Extensible Modular and extensible it can shape itself to adapt to the type of problem at hand
bull Implicitly parallel Parallelism is built into the underlying platform The programmer needs not be concerned with data partitioning and parallelism
bull Maintainable High level programming language without side effects and with efficient encapsulation programs are more succinct reliable and easier to troubleshoot
bull Complete ECL provides a complete data programming paradigm
bull Homogeneous One language to express data algorithms across the entire HPCC Systems platform data integration analytics and high speed delivery
bull Polyglottic ECL supports the embedding of other languages such as Java Python R SQL and more
Enterprise Control Language (ECL)
17
WHT082311
Current Status and Resources
bull HPCCSystemscom ndash Tutorials Docs Platform distributions and more bull Latest release 520 adds many new features and improvements
bull Drastic GUI improvements bull Ganglia and Nagios plug-in for system monitoring and alerting bull Security Enhancements ndash tighter authentication measures intra-
component communication encryption bull Embedded Languages ndash Cassandra support memcache and redis
access bull JSON based data support bull Dynamic ESDL ndash Provides simple middlewareback-end interface
definition bull JAVA API project ndash facilitates interaction between Java based apps and
HPCC web services and c++ tools bull Available now ndash HPCCSystemscom
Data mining with HPCC Systems bull Thor
Responsible for processing vast amount of data Optimized for Extraction Transformation Loading
Sorting and Linking Data
bull ECL Declarative More Data Centric Fast amp Implicitly Parallel Inline data Unit Tests in ECL
19
20
SQL vs ECL
SELECT diag_group_cd COUNT() as volume SUM(pmt_amt) as costs
FROM inpatient_claims
GROUP BY diag_group_cd
TABLE( inpatient_claims
diag_group_cd INTEGER volume =
COUNT(GROUP) REAL costs = SUM(pmt_amt)
diag_group_cd
)
SQL ECL
SELECT
FROM inpatient_claims LEFT JOIN ip_value_codes
RIGHT ON LEFTid = RIGHTid
JOIN( inpatient_claims ip_value_codes LEFTid = RIGHTid
)
21
SQL vs ECL
DECLARE my_cursor CURSOR FOR SELECT FROM inpatient_claims
OPEN my_cursor FETCH NEXT FROM my_cursor INTO hellip hellip WHILE FETCH_STATUS = 0 BEGIN
hellip END CLOSE my_cursor DEALLOCATE my_cursor
ITERATE( inpatient_claims TRANSFORM(inpatient_claim_layout
SELFis_dropped = is_one_year_or_greater(
RIGHTadmsn_dt RIGHTdschrgdt) SELF = RIGHT
) )
SQL ECL
Tx
22
ECL ROLLUP
R1 R2 R3 R4 R5 R6
LEFT RIGHT
RA
Tx LEFT RIGHT
RB R4 R6 R5
ROLLUP( dataset condition(LEFT RIGHT) transformation(LEFT RIGHT) )
Processing Claims 1 The intent here is to make the series of interim claims look like a single claim for
most purposes where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim
2 11130901113090The admission date from the first series in the claim and the discharge date from the last series in the claim define the length of the stay
3 11130901113090The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is includedexcluded as a readmission for an existing episode
4 11130901113090Costs across all IP claims included in the single stay are aggregated to the stay level
5 Claims where the last in the series of claims has patient (hellip) [as ldquostill a patientrdquo not discharged] flag these and drop all of the claims in the series from the IP hospital stay file
23
Processing Claims With ECL H_1 = SORT( A bene_sk provider admsn_dt dschrgdt thru_dt) H_2 = ROLLUP(H_1
is_interim(LEFT RIGHT) merge_interim_claims(LEFT RIGHT))
H_3 = JOIN(H_2 H_1 LEFTbene_sk = RIGHTbene_sk [hellip] RIGHT ONLY) H_4 = PROJECT(H_3 TRANSFORM(BPCILayoutsip_claim_etl_layout SELFis_dropped = TRUE SELFdropped_reason_code = BPCILayoutsDROPPED_REASON_CODESInterimClaim SELF = LEFT )) H = H_2 + H_4
24
25
Template Language EXPORT load_all_client_files(pId pFileSet pBaseDataDirectory) = MACRO LOADXML(pFileSet) baseDataDirectory = pBaseDataDirectory + pId + FOR(folder) UNIQUENAME(subId) subId = UNIQUENAME(subDS) subDS = ClientDatasets(subId) [] UNIQUENAME(id) id = pId + + UNIQUENAME(dataDir) dataDir = baseDataDirectory + + UNIQUENAME(etl) etl = ClientETL(dataDir id) etlrun() END ENDMACRO
26
Template Language file_set = rsquoltfoldersgtrsquo + ltfoldergtM201409ltfoldergt + ltfoldergtM201410ltfoldergt + ltfoldergtM201411ltfoldergt + ltfoldergtM201412ltfoldergt + ltfoldergtM201501ltfoldergt + ltfoldergtM201502ltfoldergt + ltfoldergtM201503ltfoldergt + lsquoltfoldersgtrsquo load_all_client_files(1234 file_set lsquovolume1datalsquo)
Beyond Processing Data
bull Security amp Authentication
bull Collaboration
bull Unit Tests
bull Visualizations
27
Beyond Processing Security
bull HTTPS
bull Htpasswd
bull LDAP support
bull File level security when using LDAP
28
Beyond Processing Workunits bull Workunit Identifier
bull Attribution
bull Query
bull Timings
bull Results
29
30
Beyond Processing Collaboration
31
Beyond Processing Collaboration (2)
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles
• Super File = symbolic link to a list of sub-files
• Each sub-file must have the same layout:

  WEBLOGS_FILE := '~somewhere::logs::web';
  Std.File.CreateSuperFile(WEBLOGS_FILE);
  ...
  run_report() := FUNCTION
    oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
    RETURN TABLE(oDS, { ip_address, COUNT(GROUP) }, ip_address);
  END;

• Including (more) data:

  SEQUENTIAL(
    Std.File.StartSuperFileTransaction(),
    Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
    Std.File.FinishSuperFileTransaction()
  );
Data Services: Reusability

EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
  // Filtering data based on parameters
  #UNIQUENAME(DS)
  %DS% := WS.Datasets.dsFactEpisodeCostsIndex; [...]
  #UNIQUENAME(B)
  %B% := IF(COUNT(pParams[1].providers) = 0, %A%,
            %A%(provider_id IN pParams[1].providers));
  #UNIQUENAME(C)
  %C% := IF(COUNT(pParams[1].npis) = 0, %B%,
            %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[ [...]
  #UNIQUENAME(report)
  %report% := pReportFunction(%K%);
  #UNIQUENAME(sorted)
  %sorted% := SORT(%report%, pSortByField);
  #UNIQUENAME(O1)
  %O1% := OUTPUT(pParams, NAMED('Request'));
  oSummary := DATASET([{COUNT(%sorted%)}], WS.Layouts.service_summary_layout);
  #UNIQUENAME(O2)
  %O2% := OUTPUT(oSummary, NAMED('Metadata'));
  #UNIQUENAME(O3)
  %O3% := OUTPUT(CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
                 NAMED(pServiceName), ALL);
  PARALLEL(%O1%, %O2%, %O3%);
ENDMACRO;
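The CHOOSEN(sorted, results, offset) call at the end of the macro is what pages through the sorted report. A minimal Python sketch of that offset/limit behavior (the function name and sample fields are mine, not from the deck; CHOOSEN's start position is 1-based):

```python
def choosen(rows, n, start=1):
    """Mimic ECL CHOOSEN(ds, n, start): return up to n records
    beginning at 1-based position start."""
    return rows[start - 1 : start - 1 + n]

report = [{"bpid": b, "costs": c} for b, c in
          [("A", 50), ("B", 40), ("C", 30), ("D", 20), ("E", 10)]]

# Sort by costs descending, then take a page of 2 rows starting at row 3.
page = choosen(sorted(report, key=lambda r: -r["costs"]), n=2, start=3)
print([r["bpid"] for r in page])  # ['C', 'D']
```

The same two service parameters (results, offset) drive the paging in the web-service request shown earlier.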
AHA System Architecture
(architecture diagram)

Archway Analytics
(product screenshot)
Thank you! Questions? Feedback?
lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com
HPCC Systems open source portal: http://hpccsystems.com
WHT082311
The Three HPCC Systems components

1. HPCC Systems Data Refinery (Thor)
• Massively parallel data processing engine
• Enables data integration on a scale not previously available
• Programmable using ECL

2. HPCC Systems Data Delivery Engine (Roxie)
• A massively parallel, high-throughput query engine
• Low latency, highly concurrent, and highly available
• Several advanced strategies for efficient retrieval
• Programmable using ECL

3. Enterprise Control Language (ECL)
• An easy-to-use, declarative, data-centric programming language optimized for large-scale data management and query processing
• Highly efficient: automatically distributes workload across all nodes; compiles to native machine code
• Automatic parallelization and synchronization
Conclusion
• End-to-end platform
• No need for any third-party tools
Enterprise Control Language (ECL)
• Declarative programming language: describe what needs to be done, not how to do it
• Powerful: high-level data activities like JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available
• Extensible: modular and extensible, it can shape itself to adapt to the type of problem at hand
• Implicitly parallel: parallelism is built into the underlying platform; the programmer need not be concerned with data partitioning and parallelism
• Maintainable: a high-level programming language without side effects and with efficient encapsulation; programs are more succinct, reliable, and easier to troubleshoot
• Complete: ECL provides a complete data programming paradigm
• Homogeneous: one language to express data algorithms across the entire HPCC Systems platform, covering data integration, analytics, and high-speed delivery
• Polyglottic: ECL supports the embedding of other languages such as Java, Python, R, SQL, and more
Current Status and Resources
• HPCCSystems.com: tutorials, docs, platform distributions, and more
• Latest release (5.2.0) adds many new features and improvements:
  • Drastic GUI improvements
  • Ganglia and Nagios plug-ins for system monitoring and alerting
  • Security enhancements: tighter authentication measures, intra-component communication encryption
  • Embedded languages: Cassandra support, memcached and redis access
  • JSON-based data support
  • Dynamic ESDL: provides a simple middleware/back-end interface definition
  • Java API project: facilitates interaction between Java-based apps and HPCC web services and C++ tools
• Available now at HPCCSystems.com
Data mining with HPCC Systems
• Thor
  • Responsible for processing vast amounts of data
  • Optimized for Extraction, Transformation, Loading, Sorting, and Linking of data
• ECL
  • Declarative, more data-centric
  • Fast & implicitly parallel
  • Inline data: unit tests in ECL
SQL vs ECL

SQL:
  SELECT diag_group_cd,
         COUNT(*) AS volume,
         SUM(pmt_amt) AS costs
  FROM inpatient_claims
  GROUP BY diag_group_cd;

ECL:
  TABLE( inpatient_claims,
         { diag_group_cd,
           INTEGER volume := COUNT(GROUP),
           REAL costs := SUM(GROUP, pmt_amt) },
         diag_group_cd );
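Both snippets compute the same thing: per diagnosis group, a row count and a payment total. For readers coming from neither SQL nor ECL, a plain-Python sketch of that grouped aggregation (the field names follow the slide; the sample values are invented):

```python
from collections import defaultdict

claims = [
    {"diag_group_cd": "470", "pmt_amt": 12000.0},
    {"diag_group_cd": "470", "pmt_amt": 9000.0},
    {"diag_group_cd": "291", "pmt_amt": 15000.0},
]

# GROUP BY diag_group_cd: count rows and sum payments per group.
groups = defaultdict(lambda: {"volume": 0, "costs": 0.0})
for c in claims:
    g = groups[c["diag_group_cd"]]
    g["volume"] += 1
    g["costs"] += c["pmt_amt"]

print(dict(groups))
```

ECL's TABLE does this declaratively, and the platform parallelizes the grouping across nodes.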
SQL:
  SELECT *
  FROM inpatient_claims LEFT
  JOIN ip_value_codes RIGHT
    ON LEFT.id = RIGHT.id;

ECL:
  JOIN( inpatient_claims, ip_value_codes,
        LEFT.id = RIGHT.id );
SQL vs ECL

SQL:
  DECLARE my_cursor CURSOR FOR
    SELECT * FROM inpatient_claims;
  OPEN my_cursor;
  FETCH NEXT FROM my_cursor INTO ...;
  ...
  WHILE @@FETCH_STATUS = 0
  BEGIN
    ...
  END;
  CLOSE my_cursor;
  DEALLOCATE my_cursor;

ECL:
  ITERATE( inpatient_claims,
           TRANSFORM( inpatient_claim_layout,
                      SELF.is_dropped := is_one_year_or_greater(RIGHT.admsn_dt, RIGHT.dschrg_dt);
                      SELF := RIGHT ) );
ECL ROLLUP

ROLLUP( dataset, condition(LEFT, RIGHT), transformation(LEFT, RIGHT) )

(diagram: adjacent records R1..R6 are fed pairwise to the condition as LEFT and RIGHT; matching pairs are merged by the transform Tx into combined records such as RA and RB)
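ROLLUP walks the dataset in order: each surviving record becomes LEFT, the next record RIGHT; when the condition holds, the transform merges the pair and the merged record becomes LEFT for the next comparison. A small Python sketch of that semantics (illustrative only, not HPCC code):

```python
def rollup(records, condition, transform):
    """Pairwise fold over ordered records: while condition(left, right)
    holds, merge right into left via transform, as ECL's ROLLUP does."""
    out = []
    for rec in records:
        if out and condition(out[-1], rec):
            out[-1] = transform(out[-1], rec)
        else:
            out.append(rec)
    return out

# Merge adjacent records sharing a key, summing their amounts.
data = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("c", 5)]
merged = rollup(data,
                lambda l, r: l[0] == r[0],
                lambda l, r: (l[0], l[1] + r[1]))
print(merged)  # [('a', 3), ('b', 7), ('c', 5)]
```

Because the comparison is between *adjacent* records, the dataset is normally SORTed first, exactly as the claim-processing example below the next slide does.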
Processing Claims
1. The intent is to make a series of interim claims look like a single claim for most purposes: the admission date of the first claim becomes the admission date of the whole claim, and the discharge date of the last claim in the series becomes the discharge date of the whole claim.
2. The admission date from the first claim in the series and the discharge date from the last claim in the series define the length of the stay.
3. The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is included/excluded as a readmission for an existing episode.
4. Costs across all IP claims included in the single stay are aggregated to the stay level.
5. Where the last claim in the series has a patient status of "still a patient" (not discharged), flag these and drop all of the claims in the series from the IP hospital stay file.
Processing Claims With ECL

H_1 := SORT(A, bene_sk, provider, admsn_dt, dschrg_dt, thru_dt);
H_2 := ROLLUP(H_1, is_interim(LEFT, RIGHT), merge_interim_claims(LEFT, RIGHT));
H_3 := JOIN(H_2, H_1, LEFT.bene_sk = RIGHT.bene_sk [...], RIGHT ONLY);
H_4 := PROJECT(H_3, TRANSFORM(BPCI.Layouts.ip_claim_etl_layout,
                              SELF.is_dropped := TRUE;
                              SELF.dropped_reason_code := BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim;
                              SELF := LEFT));
H := H_2 + H_4;
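The sort-then-merge at the heart of this pipeline can be sketched in Python terms. This is a simplified illustration, not the production logic: it treats every claim in a beneficiary/provider group as one stay, whereas the real is_interim condition only merges genuinely contiguous interim claims. The merge rule itself follows the slide: first admission date, last discharge date, summed costs.

```python
from itertools import groupby

claims = [
    # (bene_sk, provider, admsn_dt, dschrg_dt, pmt_amt)
    (1, "P1", 20130101, 20130110, 3000.0),
    (1, "P1", 20130110, 20130120, 5000.0),  # interim continuation
    (2, "P2", 20130205, 20130210, 2000.0),
]

def merge_stay(series):
    """Collapse a sorted series of interim claims into one stay record:
    first admission date, last discharge date, aggregated costs."""
    first, last = series[0], series[-1]
    return (first[0], first[1], first[2], last[3],
            sum(c[4] for c in series))

stays = [merge_stay(list(grp))
         for _, grp in groupby(sorted(claims), key=lambda c: (c[0], c[1]))]
print(stays)
```

The ECL version then joins the merged stays back against the raw claims (RIGHT ONLY) to find the individual interim claims that were absorbed, and flags them as dropped.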
Template Language

EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO
  LOADXML(pFileSet);
  baseDataDirectory := pBaseDataDirectory + pId + ...;
  #FOR(folder)
    #UNIQUENAME(subId)   %subId% := ...;
    #UNIQUENAME(subDS)   %subDS% := Client.Datasets(%subId%)[...];
    #UNIQUENAME(id)      %id% := pId + ... + ...;
    #UNIQUENAME(dataDir) %dataDir% := baseDataDirectory + ... + ...;
    #UNIQUENAME(etl)     %etl% := Client.ETL(%dataDir%, %id%);
    %etl%.run();
  #END
ENDMACRO;

file_set := '<folders>' +
            '<folder>M201409</folder>' + '<folder>M201410</folder>' +
            '<folder>M201411</folder>' + '<folder>M201412</folder>' +
            '<folder>M201501</folder>' + '<folder>M201502</folder>' +
            '<folder>M201503</folder>' +
            '</folders>';
load_all_client_files(1234, file_set, '/volume1/data/');
Beyond Processing Data
• Security & Authentication
• Collaboration
• Unit Tests
• Visualizations
Beyond Processing: Security
• HTTPS
• Htpasswd
• LDAP support
• File-level security when using LDAP
Beyond Processing: Workunits
• Workunit identifier
• Attribution
• Query
• Timings
• Results
Beyond Processing: Collaboration
(a series of screenshots illustrating collaboration features)
Beyond Processing: Unit Tests

interim_claims := MODULE
  // Test Data
  SHARED test_set := BPCI.Test.Samples.ip_claim( bene_id := 1, claim_id := 1, pmt_amt := 304.20 ) +
                     BPCI.Test.Samples.ip_claim( bene_id := 1, claim_id := 2, pmt_amt := 1140.90 ) + ...;
  EXPORT oActual := Step2.ip_stays;
  SHARED TestSuite := MODULE
    EXPORT Test01 := ASSERT( oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou...' );
    EXPORT Test02 := ASSERT( oActual(is_dropped), claimno IN [3, 5] );
  END;
  EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
END;
Beyond Processing: Unit Tests (2)

simplified_ip_layout := RECORD
  UNSIGNED bene_id;
  UNSIGNED claim_id;
  STRING claim_type;
  STRING provider_number;
  INTEGER4 through_date;
  STRING status_code;
  INTEGER4 admission_date;
  INTEGER4 discharge_date;
  STRING ms_drg_code;
END;

// Using an inline dataset
simple_ip_claims := DATASET([{1, 1, '00', ...}], simplified_ip_layout);
ip_claims := Samples.ip_claims(simple_ip_claims);

// OR passing NAMED parameters
ip_claims := Samples.ip_claims2( bene_id := 1, claim_id := 1, claim_type := '00', ... );
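The Samples.ip_claims2 pattern above is a sample-record factory: every field has a sensible default, and a test overrides only the fields it cares about. A hypothetical Python equivalent of that fixture style (names and defaults are mine):

```python
DEFAULT_IP_CLAIM = {
    "bene_id": 0, "claim_id": 0, "claim_type": "00",
    "provider_number": "", "through_date": 0, "status_code": "",
    "admission_date": 0, "discharge_date": 0, "ms_drg_code": "",
}

def ip_claim(**overrides):
    """Build a test claim from defaults, overriding only named fields."""
    unknown = set(overrides) - set(DEFAULT_IP_CLAIM)
    if unknown:
        raise KeyError(f"unknown field(s): {unknown}")
    return {**DEFAULT_IP_CLAIM, **overrides}

c = ip_claim(bene_id=1, claim_id=1)
print(c["bene_id"], c["claim_type"])  # 1 00
```

Keeping the defaults in one place means a schema change touches the factory once rather than every test fixture.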
Beyond Processing: Visualization
(built-in visualization screenshot)

Custom Visualization
(custom visualization screenshot)
(No) Insights
(chart: number of episodes vs. episode cost in $1,000; all episodes plotted as one series, showing no clear pattern)
Insights
(same chart split into series by readmission count: no readmit, 1 readmit, 2 readmits)
Data Delivery: Roxie
• Data Delivery Engine
• Indexed, compressed, and in-memory
• Data warehouse capabilities
• Data services
Data Services
• Web services over the data warehouse
• XML/SOAP, but also JSON
• Web services defined in ECL
• A solution to add/remove data from the cluster
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
WHT082311
bull Declarative programming language Describe what needs to be done and not how to do it
bull Powerful High level data activities like JOIN TRANSFORM PROJECT SORT DISTRIBUTE MAP etc are available
bull Extensible Modular and extensible it can shape itself to adapt to the type of problem at hand
bull Implicitly parallel Parallelism is built into the underlying platform The programmer needs not be concerned with data partitioning and parallelism
bull Maintainable High level programming language without side effects and with efficient encapsulation programs are more succinct reliable and easier to troubleshoot
bull Complete ECL provides a complete data programming paradigm
bull Homogeneous One language to express data algorithms across the entire HPCC Systems platform data integration analytics and high speed delivery
bull Polyglottic ECL supports the embedding of other languages such as Java Python R SQL and more
Enterprise Control Language (ECL)
17
WHT082311
Current Status and Resources
bull HPCCSystemscom ndash Tutorials Docs Platform distributions and more bull Latest release 520 adds many new features and improvements
bull Drastic GUI improvements bull Ganglia and Nagios plug-in for system monitoring and alerting bull Security Enhancements ndash tighter authentication measures intra-
component communication encryption bull Embedded Languages ndash Cassandra support memcache and redis
access bull JSON based data support bull Dynamic ESDL ndash Provides simple middlewareback-end interface
definition bull JAVA API project ndash facilitates interaction between Java based apps and
HPCC web services and c++ tools bull Available now ndash HPCCSystemscom
Data mining with HPCC Systems bull Thor
Responsible for processing vast amount of data Optimized for Extraction Transformation Loading
Sorting and Linking Data
bull ECL Declarative More Data Centric Fast amp Implicitly Parallel Inline data Unit Tests in ECL
19
20
SQL vs ECL
SELECT diag_group_cd COUNT() as volume SUM(pmt_amt) as costs
FROM inpatient_claims
GROUP BY diag_group_cd
TABLE( inpatient_claims
diag_group_cd INTEGER volume =
COUNT(GROUP) REAL costs = SUM(pmt_amt)
diag_group_cd
)
SQL ECL
SELECT
FROM inpatient_claims LEFT JOIN ip_value_codes
RIGHT ON LEFTid = RIGHTid
JOIN( inpatient_claims ip_value_codes LEFTid = RIGHTid
)
21
SQL vs ECL
DECLARE my_cursor CURSOR FOR SELECT FROM inpatient_claims
OPEN my_cursor FETCH NEXT FROM my_cursor INTO hellip hellip WHILE FETCH_STATUS = 0 BEGIN
hellip END CLOSE my_cursor DEALLOCATE my_cursor
ITERATE( inpatient_claims TRANSFORM(inpatient_claim_layout
SELFis_dropped = is_one_year_or_greater(
RIGHTadmsn_dt RIGHTdschrgdt) SELF = RIGHT
) )
SQL ECL
Tx
22
ECL ROLLUP
R1 R2 R3 R4 R5 R6
LEFT RIGHT
RA
Tx LEFT RIGHT
RB R4 R6 R5
ROLLUP( dataset condition(LEFT RIGHT) transformation(LEFT RIGHT) )
Processing Claims 1 The intent here is to make the series of interim claims look like a single claim for
most purposes where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim
2 11130901113090The admission date from the first series in the claim and the discharge date from the last series in the claim define the length of the stay
3 11130901113090The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is includedexcluded as a readmission for an existing episode
4 11130901113090Costs across all IP claims included in the single stay are aggregated to the stay level
5 Claims where the last in the series of claims has patient (hellip) [as ldquostill a patientrdquo not discharged] flag these and drop all of the claims in the series from the IP hospital stay file
23
Processing Claims With ECL H_1 = SORT( A bene_sk provider admsn_dt dschrgdt thru_dt) H_2 = ROLLUP(H_1
is_interim(LEFT RIGHT) merge_interim_claims(LEFT RIGHT))
H_3 = JOIN(H_2 H_1 LEFTbene_sk = RIGHTbene_sk [hellip] RIGHT ONLY) H_4 = PROJECT(H_3 TRANSFORM(BPCILayoutsip_claim_etl_layout SELFis_dropped = TRUE SELFdropped_reason_code = BPCILayoutsDROPPED_REASON_CODESInterimClaim SELF = LEFT )) H = H_2 + H_4
24
25
Template Language EXPORT load_all_client_files(pId pFileSet pBaseDataDirectory) = MACRO LOADXML(pFileSet) baseDataDirectory = pBaseDataDirectory + pId + FOR(folder) UNIQUENAME(subId) subId = UNIQUENAME(subDS) subDS = ClientDatasets(subId) [] UNIQUENAME(id) id = pId + + UNIQUENAME(dataDir) dataDir = baseDataDirectory + + UNIQUENAME(etl) etl = ClientETL(dataDir id) etlrun() END ENDMACRO
26
Template Language file_set = rsquoltfoldersgtrsquo + ltfoldergtM201409ltfoldergt + ltfoldergtM201410ltfoldergt + ltfoldergtM201411ltfoldergt + ltfoldergtM201412ltfoldergt + ltfoldergtM201501ltfoldergt + ltfoldergtM201502ltfoldergt + ltfoldergtM201503ltfoldergt + lsquoltfoldersgtrsquo load_all_client_files(1234 file_set lsquovolume1datalsquo)
Beyond Processing Data
bull Security amp Authentication
bull Collaboration
bull Unit Tests
bull Visualizations
27
Beyond Processing Security
bull HTTPS
bull Htpasswd
bull LDAP support
bull File level security when using LDAP
28
Beyond Processing Workunits bull Workunit Identifier
bull Attribution
bull Query
bull Timings
bull Results
29
30
Beyond Processing Collaboration
31
Beyond Processing Collaboration (2)
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
WHT082311
Current Status and Resources
bull HPCCSystemscom ndash Tutorials Docs Platform distributions and more bull Latest release 520 adds many new features and improvements
bull Drastic GUI improvements bull Ganglia and Nagios plug-in for system monitoring and alerting bull Security Enhancements ndash tighter authentication measures intra-
component communication encryption bull Embedded Languages ndash Cassandra support memcache and redis
access bull JSON based data support bull Dynamic ESDL ndash Provides simple middlewareback-end interface
definition bull JAVA API project ndash facilitates interaction between Java based apps and
HPCC web services and c++ tools bull Available now ndash HPCCSystemscom
Data mining with HPCC Systems bull Thor
Responsible for processing vast amount of data Optimized for Extraction Transformation Loading
Sorting and Linking Data
bull ECL Declarative More Data Centric Fast amp Implicitly Parallel Inline data Unit Tests in ECL
19
20
SQL vs ECL
SELECT diag_group_cd COUNT() as volume SUM(pmt_amt) as costs
FROM inpatient_claims
GROUP BY diag_group_cd
TABLE( inpatient_claims
diag_group_cd INTEGER volume =
COUNT(GROUP) REAL costs = SUM(pmt_amt)
diag_group_cd
)
SQL ECL
SELECT
FROM inpatient_claims LEFT JOIN ip_value_codes
RIGHT ON LEFTid = RIGHTid
JOIN( inpatient_claims ip_value_codes LEFTid = RIGHTid
)
21
SQL vs ECL
DECLARE my_cursor CURSOR FOR SELECT FROM inpatient_claims
OPEN my_cursor FETCH NEXT FROM my_cursor INTO hellip hellip WHILE FETCH_STATUS = 0 BEGIN
hellip END CLOSE my_cursor DEALLOCATE my_cursor
ITERATE( inpatient_claims TRANSFORM(inpatient_claim_layout
SELFis_dropped = is_one_year_or_greater(
RIGHTadmsn_dt RIGHTdschrgdt) SELF = RIGHT
) )
SQL ECL
Tx
22
ECL ROLLUP
R1 R2 R3 R4 R5 R6
LEFT RIGHT
RA
Tx LEFT RIGHT
RB R4 R6 R5
ROLLUP( dataset condition(LEFT RIGHT) transformation(LEFT RIGHT) )
Processing Claims 1 The intent here is to make the series of interim claims look like a single claim for
most purposes where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim
2 11130901113090The admission date from the first series in the claim and the discharge date from the last series in the claim define the length of the stay
3 11130901113090The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is includedexcluded as a readmission for an existing episode
4 11130901113090Costs across all IP claims included in the single stay are aggregated to the stay level
5 Claims where the last in the series of claims has patient (hellip) [as ldquostill a patientrdquo not discharged] flag these and drop all of the claims in the series from the IP hospital stay file
23
Processing Claims With ECL H_1 = SORT( A bene_sk provider admsn_dt dschrgdt thru_dt) H_2 = ROLLUP(H_1
is_interim(LEFT RIGHT) merge_interim_claims(LEFT RIGHT))
H_3 = JOIN(H_2 H_1 LEFTbene_sk = RIGHTbene_sk [hellip] RIGHT ONLY) H_4 = PROJECT(H_3 TRANSFORM(BPCILayoutsip_claim_etl_layout SELFis_dropped = TRUE SELFdropped_reason_code = BPCILayoutsDROPPED_REASON_CODESInterimClaim SELF = LEFT )) H = H_2 + H_4
24
25
Template Language EXPORT load_all_client_files(pId pFileSet pBaseDataDirectory) = MACRO LOADXML(pFileSet) baseDataDirectory = pBaseDataDirectory + pId + FOR(folder) UNIQUENAME(subId) subId = UNIQUENAME(subDS) subDS = ClientDatasets(subId) [] UNIQUENAME(id) id = pId + + UNIQUENAME(dataDir) dataDir = baseDataDirectory + + UNIQUENAME(etl) etl = ClientETL(dataDir id) etlrun() END ENDMACRO
26
Template Language file_set = rsquoltfoldersgtrsquo + ltfoldergtM201409ltfoldergt + ltfoldergtM201410ltfoldergt + ltfoldergtM201411ltfoldergt + ltfoldergtM201412ltfoldergt + ltfoldergtM201501ltfoldergt + ltfoldergtM201502ltfoldergt + ltfoldergtM201503ltfoldergt + lsquoltfoldersgtrsquo load_all_client_files(1234 file_set lsquovolume1datalsquo)
Beyond Processing Data
bull Security amp Authentication
bull Collaboration
bull Unit Tests
bull Visualizations
27
Beyond Processing Security
bull HTTPS
bull Htpasswd
bull LDAP support
bull File level security when using LDAP
28
Beyond Processing Workunits bull Workunit Identifier
bull Attribution
bull Query
bull Timings
bull Results
29
30
Beyond Processing Collaboration
31
Beyond Processing Collaboration (2)
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
Data mining with HPCC Systems
• Thor
  Responsible for processing vast amounts of data
  Optimized for Extraction, Transformation and Loading (ETL)
  Sorting and Linking of Data
• ECL
  Declarative, more data-centric
  Fast & implicitly parallel
  Inline data
  Unit Tests in ECL
19
20
SQL vs ECL
-- SQL
SELECT diag_group_cd, COUNT(*) AS volume, SUM(pmt_amt) AS costs
FROM inpatient_claims
GROUP BY diag_group_cd;

// ECL
TABLE(inpatient_claims,
      { diag_group_cd;
        INTEGER volume := COUNT(GROUP);
        REAL costs := SUM(GROUP, pmt_amt); },
      diag_group_cd);
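The two snippets express the same aggregation; as a cross-check of the semantics, here is a hand-rolled Python sketch of that GROUP BY (the field names come from the slide, the sample claims are made up):

```python
from collections import defaultdict

# Hand-rolled sketch of the GROUP BY above. Field names (diag_group_cd,
# pmt_amt) come from the slide; the sample claims are made up.
def summarize_by_diag_group(claims):
    acc = defaultdict(lambda: {"volume": 0, "costs": 0.0})
    for claim in claims:
        group = acc[claim["diag_group_cd"]]
        group["volume"] += 1                 # COUNT(*)
        group["costs"] += claim["pmt_amt"]   # SUM(pmt_amt)
    return dict(acc)

claims = [
    {"diag_group_cd": "470", "pmt_amt": 12000.0},
    {"diag_group_cd": "470", "pmt_amt": 11000.0},
    {"diag_group_cd": "292", "pmt_amt": 9000.0},
]
summary = summarize_by_diag_group(claims)
# summary["470"] == {"volume": 2, "costs": 23000.0}
```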
-- SQL
SELECT *
FROM inpatient_claims
JOIN ip_value_codes ON inpatient_claims.id = ip_value_codes.id;

// ECL
JOIN(inpatient_claims, ip_value_codes,
     LEFT.id = RIGHT.id);
21
SQL vs ECL
-- SQL
DECLARE my_cursor CURSOR FOR SELECT * FROM inpatient_claims;
OPEN my_cursor;
FETCH NEXT FROM my_cursor INTO …;
WHILE @@FETCH_STATUS = 0
BEGIN
  …
END;
CLOSE my_cursor;
DEALLOCATE my_cursor;

// ECL
ITERATE(inpatient_claims,
        TRANSFORM(inpatient_claim_layout,
                  SELF.is_dropped := is_one_year_or_greater(RIGHT.admsn_dt, RIGHT.dschrg_dt),
                  SELF := RIGHT));
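ITERATE hands each transform the previous output as LEFT and the current row as RIGHT, replacing the whole cursor loop. A rough Python sketch of that contract (a running sum stands in for the claim logic, which is not what the deck computes):

```python
# Rough sketch of ITERATE's contract: each call sees the previous output
# (LEFT, None on the first row, where ECL would use an all-default row)
# and the current input row (RIGHT), and emits one output row per input.
def iterate(records, transform):
    out = []
    left = None
    for right in records:
        left = transform(left, right)
        out.append(left)
    return out

# A running sum stands in for the claim transform above.
running = iterate([1, 2, 3, 4], lambda left, right: (left or 0) + right)
# running == [1, 3, 6, 10]
```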
22
ECL ROLLUP
[Diagram: adjacent records R1…R6 are compared pairwise as LEFT and RIGHT; where the condition matches, the transform Tx merges the pair into a rolled-up record (RA, RB), while non-matching records pass through unchanged]
ROLLUP( dataset, condition(LEFT, RIGHT), transformation(LEFT, RIGHT) )
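The signature above can be mimicked outside ECL; a minimal Python sketch of ROLLUP's pairwise walk (integers and a de-duplication condition are made-up stand-ins for real claim records):

```python
# Sketch of ROLLUP's pairwise walk over a sorted dataset: while
# condition(LEFT, RIGHT) holds, the transform merges the pair and the
# result becomes the new LEFT; otherwise LEFT is emitted as-is.
def rollup(records, condition, transform):
    out = []
    if not records:
        return out
    left = records[0]
    for right in records[1:]:
        if condition(left, right):
            left = transform(left, right)
        else:
            out.append(left)
            left = right
    out.append(left)
    return out

# De-duplicating integers stands in for merging real claim records.
merged = rollup([1, 1, 2, 3, 3, 3],
                condition=lambda l, r: l == r,
                transform=lambda l, r: l)
# merged == [1, 2, 3]
```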
Processing Claims
1. The intent is to make a series of interim claims look like a single claim for most purposes: the admission date of the first claim becomes the admission date of the whole claim, and the discharge date of the last claim in the series becomes the discharge date of the whole claim.
2. The admission date from the first claim in the series and the discharge date from the last claim in the series define the length of the stay.
3. The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record, or whether the stay is included/excluded as a readmission for an existing episode.
4. Costs across all IP claims included in the single stay are aggregated to the stay level.
5. Where the last claim in the series has a patient status of (…) ["still a patient", not discharged]: flag these claims and drop the whole series from the IP hospital stay file.
23
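Rules 1 through 4 amount to a single merge over a sorted series of interim claims. A hypothetical Python sketch (field names follow the deck's claim fields; the two sample claims are invented):

```python
# Sketch of rules 1-4 above: collapse a sorted series of interim claims
# into one "stay". Sample rows are invented.
def merge_stay(series):
    return {
        "admsn_dt": series[0]["admsn_dt"],            # rules 1/2: first admission
        "dschrg_dt": series[-1]["dschrg_dt"],         # rules 1/2: last discharge
        "ms_drg": series[-1]["ms_drg"],               # rule 3: discharge MS-DRG
        "pmt_amt": sum(c["pmt_amt"] for c in series)  # rule 4: costs roll up
    }

series = [
    {"admsn_dt": 20130101, "dschrg_dt": 20130110, "ms_drg": "291", "pmt_amt": 3000.0},
    {"admsn_dt": 20130110, "dschrg_dt": 20130118, "ms_drg": "292", "pmt_amt": 4500.0},
]
stay = merge_stay(series)
# stay == {"admsn_dt": 20130101, "dschrg_dt": 20130118, "ms_drg": "292", "pmt_amt": 7500.0}
```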
Processing Claims With ECL
H_1 := SORT(A, bene_sk, provider, admsn_dt, dschrg_dt, thru_dt);
H_2 := ROLLUP(H_1,
              is_interim(LEFT, RIGHT),
              merge_interim_claims(LEFT, RIGHT));
H_3 := JOIN(H_2, H_1, LEFT.bene_sk = RIGHT.bene_sk […], RIGHT ONLY);
H_4 := PROJECT(H_3,
               TRANSFORM(BPCI.Layouts.ip_claim_etl_layout,
                         SELF.is_dropped := TRUE,
                         SELF.dropped_reason_code := BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim,
                         SELF := LEFT));
H := H_2 + H_4;
24
25
Template Language
EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO
  LOADXML(pFileSet);
  baseDataDirectory := pBaseDataDirectory + pId + '/';
  #FOR(folder)
    #UNIQUENAME(subId)   %subId% := …;
    #UNIQUENAME(subDS)   %subDS% := Client.Datasets(%subId%)[…];
    #UNIQUENAME(id)      %id% := pId + '_' + …;
    #UNIQUENAME(dataDir) %dataDir% := baseDataDirectory + … + '/';
    #UNIQUENAME(etl)     %etl% := Client.ETL(%dataDir%, %id%);
    %etl%.run();
  #END
ENDMACRO;
26
Template Language
file_set := '<folders>' +
              '<folder>M201409</folder>' +
              '<folder>M201410</folder>' +
              '<folder>M201411</folder>' +
              '<folder>M201412</folder>' +
              '<folder>M201501</folder>' +
              '<folder>M201502</folder>' +
              '<folder>M201503</folder>' +
            '</folders>';
load_all_client_files(1234, file_set, '/volume1/data/');
Beyond Processing Data
• Security & Authentication
• Collaboration
• Unit Tests
• Visualizations
27
Beyond Processing: Security
• HTTPS
• Htpasswd
• LDAP support
• File-level security when using LDAP
28
Beyond Processing: Workunits
• Workunit Identifier
• Attribution
• Query
• Timings
• Results
29
30
Beyond Processing: Collaboration
31
Beyond Processing: Collaboration (2)
32
Beyond Processing: Collaboration
33
Beyond Processing: Collaboration
34
Beyond Processing: Unit Tests
interim_claims := MODULE
  // Test Data
  SHARED test_set :=
      BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 1, pmt_amt := 30420)
    + BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 2, pmt_amt := 114090)
    + …;
  EXPORT oActual := Step2.ip_stays(…);
  SHARED TestSuite := MODULE
    EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou…');
    EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
  END;
  EXPORT AllTests := PARALLEL(TestSuite.Test01, TestSuite.Test02);
END;
Beyond Processing: Unit Tests (2)
35
simplified_ip_layout := RECORD
  UNSIGNED bene_id;
  UNSIGNED claim_id;
  STRING   claim_type;
  STRING   provider_number;
  INTEGER4 through_date;
  STRING   status_code;
  INTEGER4 admission_date;
  INTEGER4 discharge_date;
  STRING   ms_drg_code;
END;

// Using an inline dataset
simple_ip_claims := DATASET([{1, 1, '00', …}], simplified_ip_layout);
ip_claims := Samples.ip_claims(simple_ip_claims);

// OR passing NAMED parameters
ip_claims := Samples.ip_claims2(bene_id := 1, claim_id := 1, claim_type := '00', …);
36
Beyond Processing: Visualization
37
Custom Visualization
38
(No) Insights
[Chart: number of episodes (y-axis, 0-80) per cost bucket (x-axis, 1-49, costs in $1,000)]
39
Insights
[Chart: the same episodes-by-cost distribution, now broken down into No readmit, 1 Readmit and 2 Readmits]
Data Delivery: Roxie
• Data Delivery Engine
• Indexed, compressed and in-memory
• Data Warehouse Capabilities
• Data Services
40
Data Services
• Web Services over the Data Warehouse
• XML/SOAP, but also JSON
• Web Services defined in ECL
• A solution to add/remove data from the cluster
41
Data Service Query
• Define the service using ECL
42
INTEGER4 oOffset    := 1        : STORED('Offset');
INTEGER4 oResults   := 100      : STORED('Results');
INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
INTEGER4 oEndDate   := 20140201 : STORED('End_Date');
oParams := DATASET([{oOffset, oResults, oStartDate, oEndDate, …}],
                   Layouts.service_parameters_layout);

T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
  RETURN TABLE(pData,
               { STRING bpid := pData.bpid;
                 UNSIGNED INTEGER1 model := pData.model;
                 UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length;
                 INTEGER8 total_episodes := COUNT(GROUP);
                 DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay);
                 DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay);
                 DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay)); },
               bpid, model, post_dsch_prd_length);
END;

ReportServices.BaseService.run_it('Summary', oParams, T, bpid);
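The statistics the TABLE computes per (bpid, model, post_dsch_prd_length) group can be sketched in Python; this assumes a population variance under the SQRT (an assumption about ECL's VARIANCE), and the payment amounts are made up:

```python
import math

# Per-group statistics matching the TABLE above: count, sum, mean and a
# standard deviation taken as SQRT of a population variance.
def summarize(payments):
    n = len(payments)
    total = sum(payments)
    mean = total / n
    variance = sum((p - mean) ** 2 for p in payments) / n
    return {"total_episodes": n,
            "total_costs": total,
            "average_costs": mean,
            "std_dev_costs": math.sqrt(variance)}

stats = summarize([10000.0, 12000.0, 14000.0])
# stats["average_costs"] == 12000.0
```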
43
Data (Web) Services
Request:
{ "summary": { "offset": 1, "results": 10,
               "begin_date": 20130101, "end_date": 20130201, … } }
Response:
{ "summaryResponse": { … "Results": { … "Summary": { "Row": [
    { "bpid": "9999", "model": 2, "post_dsch_prd_length": 90,
      "total_episodes": 987, "total_costs": 987654321, "average_costs": 1235813, … } ] } … } } }
https://…/WsEcl/forms/json/query/roxie/summary
44
44
Data Services WsECL
45
Data Services WsECL
Loading up data
• Logical vs Physical file names:
  ~abc::subfolder::subsubfolder::myfile (logical) vs /abc/subfolder/subsubfolder/myfile (physical)
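How a logical name resolves to a physical path depends on cluster configuration; a purely illustrative Python sketch (the base directory below is an assumed default, not taken from the deck):

```python
# Purely illustrative mapping from a logical name to a physical path; the
# real mapping depends on cluster configuration, and the base directory
# is an assumed default.
def logical_to_physical(logical, base="/var/lib/HPCCSystems"):
    parts = logical.lstrip("~").split("::")
    return base + "/" + "/".join(parts)

path = logical_to_physical("~abc::subfolder::subsubfolder::myfile")
# path == "/var/lib/HPCCSystems/abc/subfolder/subsubfolder/myfile"
```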
• ECL to load data into the cluster:
46
IMPORT Std;
oDS := DATASET(Std.File.ExternalLogicalFileName('172.0.0.1', '/var/lib/myfile.csv'),
               Layouts.ip_claim_layout, CSV(HEADING(0)));
oDSDistributed := DISTRIBUTE(oDS, bene_id);
OUTPUT(oDSDistributed, , '~somewhere::overhere::myfile', OVERWRITE);

• ECL to use data already loaded into the cluster:
oDS := DATASET('~somewhere::overhere::myfile', Layouts.ip_claim_layout, THOR);
SuperFiles
• A Super File is a symbolic link to a list of sub-files
• Each sub-file must have the same layout
47
IMPORT Std;
WEBLOGS_FILE := '~somewhere::logs::web';
Std.File.CreateSuperFile(WEBLOGS_FILE);
…
run_report() := FUNCTION
  oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
  RETURN TABLE(oDS, { ip_address; COUNT(GROUP); }, ip_address);
END;

• Including (more) data:
SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
  Std.File.FinishSuperFileTransaction()
);
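The superfile mechanics can be paraphrased outside ECL; a toy Python sketch in which the class, the storage dict and the sub-file names are all hypothetical:

```python
# Toy sketch of the superfile idea: a named list of sub-files that readers
# consume as one dataset.
class SuperFile:
    def __init__(self, name):
        self.name = name
        self.subfiles = []

    def add(self, subfile):
        # "including more data" == registering another same-layout sub-file
        self.subfiles.append(subfile)

    def read(self, storage):
        # readers see the concatenation of every sub-file's records
        records = []
        for subfile in self.subfiles:
            records.extend(storage[subfile])
        return records

storage = {
    "logs::web::20150401": [{"ip_address": "10.0.0.1"}],
    "logs::web::20150402": [{"ip_address": "10.0.0.2"}],
}
weblogs = SuperFile("logs::web")
weblogs.add("logs::web::20150401")
weblogs.add("logs::web::20150402")
all_records = weblogs.read(storage)
# len(all_records) == 2
```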
48
Data Services Reusability
EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
  // Filter data based on parameters
  #UNIQUENAME(DS) %DS% := WS.Datasets.dsFactEpisodeCostsIndex(…); […]
  #UNIQUENAME(B)  %B% := IF(COUNT(pParams[1].providers) = 0, %A%,
                            %A%(provider_id IN pParams[1].providers));
  #UNIQUENAME(C)  %C% := IF(COUNT(pParams[1].npis) = 0, %B%,
                            %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[…]));
  […]
  #UNIQUENAME(report) %report% := pReportFunction(%K%);
  #UNIQUENAME(sorted) %sorted% := SORT(%report%, pSortByField);
  #UNIQUENAME(O1) %O1% := OUTPUT(pParams, NAMED('Request'));
  oSummary := DATASET([{COUNT(%sorted%)}], WS.Layouts.service_summary_layout);
  #UNIQUENAME(O2) %O2% := OUTPUT(oSummary, NAMED('Metadata'));
  #UNIQUENAME(O3) %O3% := OUTPUT(CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
                                 NAMED(pServiceName), ALL);
  PARALLEL(%O1%, %O2%, %O3%);
ENDMACRO;
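CHOOSEN drives the pagination in the macro; a small Python sketch of its (count, start) behavior, assuming a 1-based start position as in ECL:

```python
# Small sketch of CHOOSEN(recset, n, startpos): return at most n records
# starting at a 1-based start position.
def choosen(records, n, startpos=1):
    start = startpos - 1
    return records[start:start + n]

page = choosen(list(range(1, 101)), n=10, startpos=21)
# page == [21, 22, ..., 30]
```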
49
AHA System Architecture
50
Archway Analytics
51
lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com
HPCC Systems open source portal: http://hpccsystems.com
Thank you! Questions? Feedback?
20
SQL vs ECL
SELECT diag_group_cd COUNT() as volume SUM(pmt_amt) as costs
FROM inpatient_claims
GROUP BY diag_group_cd
TABLE( inpatient_claims
diag_group_cd INTEGER volume =
COUNT(GROUP) REAL costs = SUM(pmt_amt)
diag_group_cd
)
SQL ECL
SELECT
FROM inpatient_claims LEFT JOIN ip_value_codes
RIGHT ON LEFTid = RIGHTid
JOIN( inpatient_claims ip_value_codes LEFTid = RIGHTid
)
21
SQL vs ECL
DECLARE my_cursor CURSOR FOR SELECT FROM inpatient_claims
OPEN my_cursor FETCH NEXT FROM my_cursor INTO hellip hellip WHILE FETCH_STATUS = 0 BEGIN
hellip END CLOSE my_cursor DEALLOCATE my_cursor
ITERATE( inpatient_claims TRANSFORM(inpatient_claim_layout
SELFis_dropped = is_one_year_or_greater(
RIGHTadmsn_dt RIGHTdschrgdt) SELF = RIGHT
) )
SQL ECL
Tx
22
ECL ROLLUP
R1 R2 R3 R4 R5 R6
LEFT RIGHT
RA
Tx LEFT RIGHT
RB R4 R6 R5
ROLLUP( dataset condition(LEFT RIGHT) transformation(LEFT RIGHT) )
Processing Claims 1 The intent here is to make the series of interim claims look like a single claim for
most purposes where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim
2 11130901113090The admission date from the first series in the claim and the discharge date from the last series in the claim define the length of the stay
3 11130901113090The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is includedexcluded as a readmission for an existing episode
4 11130901113090Costs across all IP claims included in the single stay are aggregated to the stay level
5 Claims where the last in the series of claims has patient (hellip) [as ldquostill a patientrdquo not discharged] flag these and drop all of the claims in the series from the IP hospital stay file
23
Processing Claims With ECL H_1 = SORT( A bene_sk provider admsn_dt dschrgdt thru_dt) H_2 = ROLLUP(H_1
is_interim(LEFT RIGHT) merge_interim_claims(LEFT RIGHT))
H_3 = JOIN(H_2 H_1 LEFTbene_sk = RIGHTbene_sk [hellip] RIGHT ONLY) H_4 = PROJECT(H_3 TRANSFORM(BPCILayoutsip_claim_etl_layout SELFis_dropped = TRUE SELFdropped_reason_code = BPCILayoutsDROPPED_REASON_CODESInterimClaim SELF = LEFT )) H = H_2 + H_4
24
25
Template Language EXPORT load_all_client_files(pId pFileSet pBaseDataDirectory) = MACRO LOADXML(pFileSet) baseDataDirectory = pBaseDataDirectory + pId + FOR(folder) UNIQUENAME(subId) subId = UNIQUENAME(subDS) subDS = ClientDatasets(subId) [] UNIQUENAME(id) id = pId + + UNIQUENAME(dataDir) dataDir = baseDataDirectory + + UNIQUENAME(etl) etl = ClientETL(dataDir id) etlrun() END ENDMACRO
26
Template Language file_set = rsquoltfoldersgtrsquo + ltfoldergtM201409ltfoldergt + ltfoldergtM201410ltfoldergt + ltfoldergtM201411ltfoldergt + ltfoldergtM201412ltfoldergt + ltfoldergtM201501ltfoldergt + ltfoldergtM201502ltfoldergt + ltfoldergtM201503ltfoldergt + lsquoltfoldersgtrsquo load_all_client_files(1234 file_set lsquovolume1datalsquo)
Beyond Processing Data
bull Security amp Authentication
bull Collaboration
bull Unit Tests
bull Visualizations
27
Beyond Processing Security
bull HTTPS
bull Htpasswd
bull LDAP support
bull File level security when using LDAP
28
Beyond Processing Workunits bull Workunit Identifier
bull Attribution
bull Query
bull Timings
bull Results
29
30
Beyond Processing Collaboration
31
Beyond Processing Collaboration (2)
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
21
SQL vs ECL
DECLARE my_cursor CURSOR FOR SELECT FROM inpatient_claims
OPEN my_cursor FETCH NEXT FROM my_cursor INTO hellip hellip WHILE FETCH_STATUS = 0 BEGIN
hellip END CLOSE my_cursor DEALLOCATE my_cursor
ITERATE( inpatient_claims TRANSFORM(inpatient_claim_layout
SELFis_dropped = is_one_year_or_greater(
RIGHTadmsn_dt RIGHTdschrgdt) SELF = RIGHT
) )
SQL ECL
Tx
22
ECL ROLLUP
R1 R2 R3 R4 R5 R6
LEFT RIGHT
RA
Tx LEFT RIGHT
RB R4 R6 R5
ROLLUP( dataset condition(LEFT RIGHT) transformation(LEFT RIGHT) )
Processing Claims 1 The intent here is to make the series of interim claims look like a single claim for
most purposes where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim
2 11130901113090The admission date from the first series in the claim and the discharge date from the last series in the claim define the length of the stay
3 11130901113090The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is includedexcluded as a readmission for an existing episode
4 11130901113090Costs across all IP claims included in the single stay are aggregated to the stay level
5 Claims where the last in the series of claims has patient (hellip) [as ldquostill a patientrdquo not discharged] flag these and drop all of the claims in the series from the IP hospital stay file
23
Processing Claims With ECL H_1 = SORT( A bene_sk provider admsn_dt dschrgdt thru_dt) H_2 = ROLLUP(H_1
is_interim(LEFT RIGHT) merge_interim_claims(LEFT RIGHT))
H_3 = JOIN(H_2 H_1 LEFTbene_sk = RIGHTbene_sk [hellip] RIGHT ONLY) H_4 = PROJECT(H_3 TRANSFORM(BPCILayoutsip_claim_etl_layout SELFis_dropped = TRUE SELFdropped_reason_code = BPCILayoutsDROPPED_REASON_CODESInterimClaim SELF = LEFT )) H = H_2 + H_4
24
25
Template Language EXPORT load_all_client_files(pId pFileSet pBaseDataDirectory) = MACRO LOADXML(pFileSet) baseDataDirectory = pBaseDataDirectory + pId + FOR(folder) UNIQUENAME(subId) subId = UNIQUENAME(subDS) subDS = ClientDatasets(subId) [] UNIQUENAME(id) id = pId + + UNIQUENAME(dataDir) dataDir = baseDataDirectory + + UNIQUENAME(etl) etl = ClientETL(dataDir id) etlrun() END ENDMACRO
26
Template Language file_set = rsquoltfoldersgtrsquo + ltfoldergtM201409ltfoldergt + ltfoldergtM201410ltfoldergt + ltfoldergtM201411ltfoldergt + ltfoldergtM201412ltfoldergt + ltfoldergtM201501ltfoldergt + ltfoldergtM201502ltfoldergt + ltfoldergtM201503ltfoldergt + lsquoltfoldersgtrsquo load_all_client_files(1234 file_set lsquovolume1datalsquo)
Beyond Processing Data
bull Security amp Authentication
bull Collaboration
bull Unit Tests
bull Visualizations
27
Beyond Processing Security
bull HTTPS
bull Htpasswd
bull LDAP support
bull File level security when using LDAP
28
Beyond Processing Workunits bull Workunit Identifier
bull Attribution
bull Query
bull Timings
bull Results
29
30
Beyond Processing Collaboration
31
Beyond Processing Collaboration (2)
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
Tx
22
ECL ROLLUP
R1 R2 R3 R4 R5 R6
LEFT RIGHT
RA
Tx LEFT RIGHT
RB R4 R6 R5
ROLLUP( dataset condition(LEFT RIGHT) transformation(LEFT RIGHT) )
Processing Claims 1 The intent here is to make the series of interim claims look like a single claim for
most purposes where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim
2 11130901113090The admission date from the first series in the claim and the discharge date from the last series in the claim define the length of the stay
3 11130901113090The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is includedexcluded as a readmission for an existing episode
4 11130901113090Costs across all IP claims included in the single stay are aggregated to the stay level
5 Claims where the last in the series of claims has patient (hellip) [as ldquostill a patientrdquo not discharged] flag these and drop all of the claims in the series from the IP hospital stay file
23
Processing Claims With ECL H_1 = SORT( A bene_sk provider admsn_dt dschrgdt thru_dt) H_2 = ROLLUP(H_1
is_interim(LEFT RIGHT) merge_interim_claims(LEFT RIGHT))
H_3 = JOIN(H_2 H_1 LEFTbene_sk = RIGHTbene_sk [hellip] RIGHT ONLY) H_4 = PROJECT(H_3 TRANSFORM(BPCILayoutsip_claim_etl_layout SELFis_dropped = TRUE SELFdropped_reason_code = BPCILayoutsDROPPED_REASON_CODESInterimClaim SELF = LEFT )) H = H_2 + H_4
24
25
Template Language EXPORT load_all_client_files(pId pFileSet pBaseDataDirectory) = MACRO LOADXML(pFileSet) baseDataDirectory = pBaseDataDirectory + pId + FOR(folder) UNIQUENAME(subId) subId = UNIQUENAME(subDS) subDS = ClientDatasets(subId) [] UNIQUENAME(id) id = pId + + UNIQUENAME(dataDir) dataDir = baseDataDirectory + + UNIQUENAME(etl) etl = ClientETL(dataDir id) etlrun() END ENDMACRO
26
Template Language file_set = rsquoltfoldersgtrsquo + ltfoldergtM201409ltfoldergt + ltfoldergtM201410ltfoldergt + ltfoldergtM201411ltfoldergt + ltfoldergtM201412ltfoldergt + ltfoldergtM201501ltfoldergt + ltfoldergtM201502ltfoldergt + ltfoldergtM201503ltfoldergt + lsquoltfoldersgtrsquo load_all_client_files(1234 file_set lsquovolume1datalsquo)
Beyond Processing Data
bull Security amp Authentication
bull Collaboration
bull Unit Tests
bull Visualizations
27
Beyond Processing Security
bull HTTPS
bull Htpasswd
bull LDAP support
bull File level security when using LDAP
28
Beyond Processing Workunits bull Workunit Identifier
bull Attribution
bull Query
bull Timings
bull Results
29
30
Beyond Processing Collaboration
31
Beyond Processing Collaboration (2)
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
Processing Claims
1. The intent is to make a series of interim claims look like a single claim for most purposes: the admission date of the first claim becomes the admission date of the whole stay, and the discharge date of the last claim in the series becomes the discharge date of the whole stay.
2. The admission date from the first claim in the series and the discharge date from the last claim in the series define the length of the stay.
3. The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record, or whether the stay is included/excluded as a readmission for an existing episode.
4. Costs across all IP claims included in the single stay are aggregated to the stay level.
5. Claims where the last in the series has a patient status of (…) ["still a patient", not discharged]: flag these and drop all of the claims in the series from the IP hospital stay file.
23
Processing Claims With ECL
H_1 := SORT(A, bene_sk, provider, admsn_dt, dschrg_dt, thru_dt);
H_2 := ROLLUP(H_1, is_interim(LEFT, RIGHT), merge_interim_claims(LEFT, RIGHT));
H_3 := JOIN(H_2, H_1, LEFT.bene_sk = RIGHT.bene_sk [...], RIGHT ONLY);
H_4 := PROJECT(H_3, TRANSFORM(BPCI.Layouts.ip_claim_etl_layout,
  SELF.is_dropped := TRUE,
  SELF.dropped_reason_code := BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim,
  SELF := LEFT));
H := H_2 + H_4;
24
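The SORT/ROLLUP pattern above condenses consecutive interim claims into one stay record: first admission date kept, last discharge date kept, costs summed. A minimal Python sketch of the same logic for illustration only (field names follow the slides; the adjacency test `is_interim` is a simplified assumption, not the production rule):

```python
from dataclasses import dataclass, replace

@dataclass
class Claim:
    bene_sk: int      # beneficiary key
    provider: str
    admsn_dt: int     # admission date, YYYYMMDD
    dschrg_dt: int    # discharge date, YYYYMMDD
    pmt_amt: float    # payment amount

def is_interim(left: Claim, right: Claim) -> bool:
    # Simplified adjacency test: same beneficiary and provider, and the
    # next claim starts on or before the previous claim's discharge.
    return (left.bene_sk == right.bene_sk
            and left.provider == right.provider
            and right.admsn_dt <= left.dschrg_dt)

def rollup_stays(claims):
    """Collapse sorted claims into stays, like ECL's ROLLUP: keep the
    first admission date, take the last discharge date, and aggregate
    costs to the stay level."""
    stays = []
    for c in sorted(claims, key=lambda c: (c.bene_sk, c.provider, c.admsn_dt)):
        if stays and is_interim(stays[-1], c):
            prev = stays[-1]
            stays[-1] = replace(prev,
                                dschrg_dt=c.dschrg_dt,
                                pmt_amt=prev.pmt_amt + c.pmt_amt)
        else:
            stays.append(c)
    return stays
```

Two adjacent claims for the same beneficiary and provider roll up into one stay spanning both date ranges, with their payments summed.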
25
Template Language
EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO
  LOADXML(pFileSet);
  baseDataDirectory := pBaseDataDirectory + pId + [...];
  #FOR(folder)
    #UNIQUENAME(subId) %subId% := [...];
    #UNIQUENAME(subDS) %subDS% := Client.Datasets(%subId%) [...];
    #UNIQUENAME(id) %id% := pId + [...];
    #UNIQUENAME(dataDir) %dataDir% := baseDataDirectory + [...];
    #UNIQUENAME(etl) %etl% := Client.ETL(%dataDir%, %id%);
    %etl%.run();
  #END
ENDMACRO;
26
Template Language
file_set := '<folders>'
          + '<folder>M201409</folder>'
          + '<folder>M201410</folder>'
          + '<folder>M201411</folder>'
          + '<folder>M201412</folder>'
          + '<folder>M201501</folder>'
          + '<folder>M201502</folder>'
          + '<folder>M201503</folder>'
          + '</folders>';
load_all_client_files(1234, file_set, '/volume1/data/');
Beyond Processing: Data
• Security & Authentication
• Collaboration
• Unit Tests
• Visualizations
27
Beyond Processing: Security
• HTTPS
• Htpasswd
• LDAP support
• File-level security when using LDAP
28
Beyond Processing: Workunits
• Workunit Identifier
• Attribution
• Query
• Timings
• Results
29
30
Beyond Processing: Collaboration
31
Beyond Processing: Collaboration (2)
32
Beyond Processing: Collaboration
33
Beyond Processing: Collaboration
34
Beyond Processing: Unit Tests
interim_claims := MODULE
  // Test Data
  SHARED test_set := BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 1, pmt_amt := 304.20)
                   + BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 2, pmt_amt := 1140.90)
                   + [...];
  EXPORT oActual := Step2.ip_stays [...];
  SHARED TestSuite := MODULE
    EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou[...]');
    EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
  END;
  EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
END;
Beyond Processing: Unit Tests (2)
35
Using inline dataset:
simple_ip_claims := DATASET([{1, 1, '00', '10001', 20000201, '', 20000120, 20000201, '61'}],
                            simplified_ip_layout);
ip_claims := Samples.ip_claims(simple_ip_claims);
// OR passing NAMED parameters:
ip_claims := Samples.ip_claims2(bene_id := 1, claim_id := 1, claim_type := '00');
simplified_ip_layout := RECORD
  UNSIGNED bene_id;
  UNSIGNED claim_id;
  STRING claim_type;
  STRING provider_number;
  INTEGER4 through_date;
  STRING status_code;
  INTEGER4 admission_date;
  INTEGER4 discharge_date;
  STRING ms_drg_code;
END;
36
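The ip_claims2-style helper above builds test records from sensible defaults plus NAMED overrides, so each test states only the fields it cares about. A Python sketch of that factory pattern (the field defaults and names are illustrative, not the actual ECL module):

```python
# Default field values for a simplified IP claim; tests override only
# the fields they care about, mirroring ECL NAMED parameters.
IP_CLAIM_DEFAULTS = {
    "bene_id": 0,
    "claim_id": 0,
    "claim_type": "00",
    "provider_number": "",
    "through_date": 0,
    "status_code": "",
    "admission_date": 0,
    "discharge_date": 0,
    "ms_drg_code": "",
}

def ip_claim(**overrides):
    """Return one claim record: defaults merged with explicit overrides."""
    unknown = set(overrides) - set(IP_CLAIM_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    return {**IP_CLAIM_DEFAULTS, **overrides}

# Build a two-row test set by overriding only what matters.
test_set = [
    ip_claim(bene_id=1, claim_id=1),
    ip_claim(bene_id=1, claim_id=2),
]
```

Rejecting unknown field names keeps a typo in a test from silently producing a default-valued record.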
Beyond Processing: Visualization
37
Custom Visualization
38
(No) Insights
[Histogram: number of episodes (y-axis, 0–80) vs. episode costs in $1,000 (x-axis, 1–49)]
39
Insights
[Same histogram, episodes vs. costs in $1,000, broken out by readmissions: no readmit, 1 readmit, 2 readmits]
Data Delivery: Roxie
• Data Delivery Engine
• Indexed, compressed, and in-memory
• Data Warehouse Capabilities
• Data Services
40
Data Services
• Web Services over Data Warehouse
• XML/SOAP but also JSON
• Web Services defined in ECL
• Solution to add/remove data from the cluster
41
Data Service Query
• Define service using ECL
42
INTEGER4 oOffset := 1 : STORED('Offset');
INTEGER4 oResults := 100 : STORED('Results');
INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
INTEGER4 oEndDate := 20140201 : STORED('End_Date');
oParams := DATASET([{oOffset, oResults, oStartDate, oEndDate, ...}],
                   Layouts.service_parameters_layout);
T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
  RETURN TABLE(pData, {
      STRING bpid := pData.bpid;
      UNSIGNED INTEGER1 model := pData.model;
      UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length;
      INTEGER8 total_episodes := COUNT(GROUP);
      DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay);
      DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay);
      DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay));
    }, bpid, model, post_dsch_prd_length);
END;
ReportServices.BaseService.run_it('Summary', oParams, T, 'bpid');
43
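The TABLE above groups episodes by (bpid, model, post-discharge period length) and computes a count, total, mean, and standard deviation of costs per group. The same aggregation logic, sketched in Python for illustration (record keys follow the slide's field names; this is not the service code):

```python
from collections import defaultdict
from math import sqrt

def summarize(episodes):
    """Group episode records by (bpid, model, post_dsch_prd_length) and
    compute count, total, mean, and population std dev of costs,
    mirroring COUNT / SUM / AVE / SQRT(VARIANCE) in the ECL TABLE."""
    groups = defaultdict(list)
    for e in episodes:
        key = (e["bpid"], e["model"], e["post_dsch_prd_length"])
        groups[key].append(e["sum_post_dsch_prd_pay"])
    out = []
    for (bpid, model, length), costs in groups.items():
        n = len(costs)
        mean = sum(costs) / n
        var = sum((c - mean) ** 2 for c in costs) / n  # population variance
        out.append({
            "bpid": bpid, "model": model, "post_dsch_prd_length": length,
            "total_episodes": n,
            "total_costs": sum(costs),
            "average_costs": mean,
            "std_dev_costs": sqrt(var),
        })
    return out
```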
Data (Web) Services
Request: https://.../WsEcl/forms/json/query/roxie/summary
{ "summary": { "offset": 1, "results": 10, "begin_date": 20130101, "end_date": 20130201, ... } }
Response:
{ "summaryResponse": { ... "Results": { ... "Summary": { "Row": [ { "bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "total_episodes": 987, "total_costs": 987654321, "average_costs": 1235813, ... } ] } ... } } }
44
Data Services: WsECL
45
Data Services: WsECL
Loading up data
• Logical vs. physical file names:
~abc::subfolder::subsubfolder::myfile ↔ /abc/subfolder/subsubfolder/myfile
• ECL to load data into the cluster:
46
oDS := DATASET(Std.File.ExternalLogicalFilename('172.0.0.1', '/var/lib/myfile.csv'),
               Layouts.ip_claim_layout, CSV(HEADING(0)));
oDSDistributed := DISTRIBUTE(oDS, bene_id);
OUTPUT(oDSDistributed, , '~somewhere::overhere::myfile', OVERWRITE);
• ECL to use data loaded into the cluster:
oDS := DATASET('~somewhere::overhere::myfile', Layouts.ip_claim_layout, THOR);
SuperFiles
• Super File = symbolic list of sub-files
• Each sub-file must have the same layout
47
WEBLOGS_FILE := '~somewhere::logs::web';
Std.File.CreateSuperFile(WEBLOGS_FILE);
...
run_report() := FUNCTION
  oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
  RETURN TABLE(oDS, {ip_address, COUNT(GROUP)}, ip_address);
END;
• Including (more) data:
SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
  Std.File.FinishSuperFileTransaction()
);
48
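A superfile behaves like one logical dataset backed by many same-layout sub-files: reports read the superfile name, and newly added sub-files appear transparently. A minimal in-memory Python sketch of that indirection (purely illustrative; `storage` and the file names are made up):

```python
class SuperFile:
    """Symbolic list of sub-file names; reading it concatenates the
    sub-files' records, so readers never track individual files."""
    def __init__(self):
        self.subfiles = []

    def add(self, name):
        self.subfiles.append(name)

    def read(self, storage):
        records = []
        for name in self.subfiles:
            records.extend(storage[name])
        return records

# Hypothetical backing store mapping file names to record lists.
storage = {
    "~somewhere::logs::web::20150331": [{"ip": "10.0.0.1"}],
    "~somewhere::logs::web::20150401": [{"ip": "10.0.0.2"}],
}
weblogs = SuperFile()
weblogs.add("~somewhere::logs::web::20150331")
# A report over the superfile sees new data as soon as it is added:
weblogs.add("~somewhere::logs::web::20150401")
```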
Data Services: Reusability
EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
  // Filtering data based on parameters
  #UNIQUENAME(DS) %DS% := WS.Datasets.dsFactEpisodeCostsIndex; [...]
  #UNIQUENAME(B) %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
  #UNIQUENAME(C) %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[...]));
  [...]
  #UNIQUENAME(report) %report% := pReportFunction(%K%);
  #UNIQUENAME(sorted) %sorted% := SORT(%report%, pSortByField);
  #UNIQUENAME(O1) %O1% := OUTPUT(pParams, NAMED('Request'));
  oSummary := DATASET([{COUNT(%sorted%)}], WS.Layouts.service_summary_layout);
  #UNIQUENAME(O2) %O2% := OUTPUT(oSummary, NAMED('Metadata'));
  #UNIQUENAME(O3) %O3% := OUTPUT(CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset), NAMED(pServiceName), ALL);
  PARALLEL(%O1%, %O2%, %O3%);
ENDMACRO;
49
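The run_it macro applies each filter only when its parameter list is non-empty, sorts by a caller-chosen field, and pages the result with CHOOSEN(count, offset). The same report skeleton in Python (parameter names mirror the macro; the single `providers` filter stands in for the macro's chain of filters):

```python
def run_report(rows, sort_by, results, offset, providers=None):
    """Filter only when a parameter list is non-empty (like the
    IF(COUNT(...) = 0, ...) guards), then sort and page like CHOOSEN."""
    if providers:
        rows = [r for r in rows if r["provider_id"] in providers]
    rows = sorted(rows, key=lambda r: r[sort_by])
    # CHOOSEN(sorted, results, offset): ECL record offsets are 1-based.
    return rows[offset - 1 : offset - 1 + results]
```

Passing no `providers` returns all rows, so callers pay no penalty for filters they don't use, which is what makes the one macro reusable across services.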
AHA System Architecture
50
Archway Analytics
51
lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com
HPCC Systems open source portal: http://hpccsystems.com
Thank you! Questions? Feedback?
Processing Claims With ECL H_1 = SORT( A bene_sk provider admsn_dt dschrgdt thru_dt) H_2 = ROLLUP(H_1
is_interim(LEFT RIGHT) merge_interim_claims(LEFT RIGHT))
H_3 = JOIN(H_2 H_1 LEFTbene_sk = RIGHTbene_sk [hellip] RIGHT ONLY) H_4 = PROJECT(H_3 TRANSFORM(BPCILayoutsip_claim_etl_layout SELFis_dropped = TRUE SELFdropped_reason_code = BPCILayoutsDROPPED_REASON_CODESInterimClaim SELF = LEFT )) H = H_2 + H_4
24
25
Template Language EXPORT load_all_client_files(pId pFileSet pBaseDataDirectory) = MACRO LOADXML(pFileSet) baseDataDirectory = pBaseDataDirectory + pId + FOR(folder) UNIQUENAME(subId) subId = UNIQUENAME(subDS) subDS = ClientDatasets(subId) [] UNIQUENAME(id) id = pId + + UNIQUENAME(dataDir) dataDir = baseDataDirectory + + UNIQUENAME(etl) etl = ClientETL(dataDir id) etlrun() END ENDMACRO
26
Template Language file_set = rsquoltfoldersgtrsquo + ltfoldergtM201409ltfoldergt + ltfoldergtM201410ltfoldergt + ltfoldergtM201411ltfoldergt + ltfoldergtM201412ltfoldergt + ltfoldergtM201501ltfoldergt + ltfoldergtM201502ltfoldergt + ltfoldergtM201503ltfoldergt + lsquoltfoldersgtrsquo load_all_client_files(1234 file_set lsquovolume1datalsquo)
Beyond Processing Data
bull Security amp Authentication
bull Collaboration
bull Unit Tests
bull Visualizations
27
Beyond Processing Security
bull HTTPS
bull Htpasswd
bull LDAP support
bull File level security when using LDAP
28
Beyond Processing Workunits bull Workunit Identifier
bull Attribution
bull Query
bull Timings
bull Results
29
30
Beyond Processing Collaboration
31
Beyond Processing Collaboration (2)
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
25
Template Language EXPORT load_all_client_files(pId pFileSet pBaseDataDirectory) = MACRO LOADXML(pFileSet) baseDataDirectory = pBaseDataDirectory + pId + FOR(folder) UNIQUENAME(subId) subId = UNIQUENAME(subDS) subDS = ClientDatasets(subId) [] UNIQUENAME(id) id = pId + + UNIQUENAME(dataDir) dataDir = baseDataDirectory + + UNIQUENAME(etl) etl = ClientETL(dataDir id) etlrun() END ENDMACRO
26
Template Language file_set = rsquoltfoldersgtrsquo + ltfoldergtM201409ltfoldergt + ltfoldergtM201410ltfoldergt + ltfoldergtM201411ltfoldergt + ltfoldergtM201412ltfoldergt + ltfoldergtM201501ltfoldergt + ltfoldergtM201502ltfoldergt + ltfoldergtM201503ltfoldergt + lsquoltfoldersgtrsquo load_all_client_files(1234 file_set lsquovolume1datalsquo)
Beyond Processing Data
bull Security amp Authentication
bull Collaboration
bull Unit Tests
bull Visualizations
27
Beyond Processing Security
bull HTTPS
bull Htpasswd
bull LDAP support
bull File level security when using LDAP
28
Beyond Processing Workunits bull Workunit Identifier
bull Attribution
bull Query
bull Timings
bull Results
29
30
Beyond Processing Collaboration
31
Beyond Processing Collaboration (2)
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
26
Template Language file_set = rsquoltfoldersgtrsquo + ltfoldergtM201409ltfoldergt + ltfoldergtM201410ltfoldergt + ltfoldergtM201411ltfoldergt + ltfoldergtM201412ltfoldergt + ltfoldergtM201501ltfoldergt + ltfoldergtM201502ltfoldergt + ltfoldergtM201503ltfoldergt + lsquoltfoldersgtrsquo load_all_client_files(1234 file_set lsquovolume1datalsquo)
Beyond Processing Data
bull Security amp Authentication
bull Collaboration
bull Unit Tests
bull Visualizations
27
Beyond Processing Security
bull HTTPS
bull Htpasswd
bull LDAP support
bull File level security when using LDAP
28
Beyond Processing Workunits bull Workunit Identifier
bull Attribution
bull Query
bull Timings
bull Results
29
30
Beyond Processing Collaboration
31
Beyond Processing Collaboration (2)
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
Beyond Processing Data
bull Security amp Authentication
bull Collaboration
bull Unit Tests
bull Visualizations
27
Beyond Processing Security
bull HTTPS
bull Htpasswd
bull LDAP support
bull File level security when using LDAP
28
Beyond Processing Workunits bull Workunit Identifier
bull Attribution
bull Query
bull Timings
bull Results
29
30
Beyond Processing Collaboration
31
Beyond Processing Collaboration (2)
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services: Reusability

EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
  // Filtering data based on parameters
  #UNIQUENAME(DS); %DS% := WS.Datasets.dsFactEpisodeCostsIndex; [...]
  #UNIQUENAME(B); %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
  #UNIQUENAME(C); %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[...
  [...]
  #UNIQUENAME(report); %report% := pReportFunction(%K%);
  #UNIQUENAME(sorted); %sorted% := SORT(%report%, pSortByField);
  #UNIQUENAME(O1); %O1% := OUTPUT(pParams, NAMED('Request'));
  oSummary := DATASET([{COUNT(%sorted%)}], WS.Layouts.service_summary_layout);
  #UNIQUENAME(O2); %O2% := OUTPUT(oSummary, NAMED('Metadata'));
  #UNIQUENAME(O3); %O3% := OUTPUT(
    CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
    NAMED(pServiceName), ALL);
  PARALLEL(%O1%, %O2%, %O3%);
ENDMACRO;
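The point of the macro is that every report service shares the filtering, sorting, paging, and OUTPUT plumbing, and differs only in its TABLE definition and sort key. A hypothetical second service reusing run_it might look like this (the provider_id grouping is an assumption for illustration; only the run_it call shape comes from the deck):

```ecl
// Hypothetical second service: per-provider episode counts and costs
TByProvider(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
  RETURN TABLE(pData,
    {
      STRING provider_id := pData.provider_id;
      INTEGER8 total_episodes := COUNT(GROUP);
      DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay);
    },
    provider_id);
END;

// Same plumbing, different report function and sort field
ReportServices.BaseService.run_it('ByProvider', oParams, TByProvider, provider_id);
```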
AHA System Architecture

Archway Analytics
lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com

HPCC Systems open source portal: http://hpccsystems.com

Thank you! Questions? Feedback?
Beyond Processing: Security
• HTTPS
• Htpasswd
• LDAP support
• File-level security when using LDAP
Beyond Processing: Workunits
• Workunit identifier
• Attribution
• Query
• Timings
• Results
Beyond Processing: Collaboration
Beyond Processing: Collaboration (2)
Beyond Processing: Unit Tests

interim_claims := MODULE
  // Test Data
  SHARED test_set := BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 1, pmt_amt := 304.20)
                   + BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 2, pmt_amt := 1140.90)
                   + ...;
  EXPORT oActual := Step2.ip_stays(test_set);
  SHARED TestSuite := MODULE
    EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou...');
    EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
  END;
  EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
END;
Beyond Processing: Unit Tests (2)
// Using inline dataset
simple_ip_claims := DATASET([{1, 1, '00', '10001', 20000201, '', 20000120, 20000201, '61'}],
                            simplified_ip_layout);
ip_claims := Samples.ip_claims(simple_ip_claims);

// OR passing NAMED parameters
ip_claims := Samples.ip_claims2(bene_id := 1, claim_id := 1, claim_type := '00');

simplified_ip_layout := RECORD
  UNSIGNED bene_id;
  UNSIGNED claim_id;
  STRING claim_type;
  STRING provider_number;
  INTEGER4 through_date;
  STRING status_code;
  INTEGER4 admission_date;
  INTEGER4 discharge_date;
  STRING ms_drg_code;
END;
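A helper like Samples.ip_claims2 can be sketched as a function whose parameters all carry defaults, so each test overrides only the fields it asserts on. The field list and default values below are assumptions for illustration, not the deck's actual definition:

```ecl
// Hypothetical Samples.ip_claims2-style helper: one-row sample claim
// with defaulted named parameters (field list is an assumption)
EXPORT ip_claims2(UNSIGNED bene_id = 0,
                  UNSIGNED claim_id = 0,
                  STRING claim_type = '00',
                  STRING provider_number = '',
                  INTEGER4 through_date = 0) :=
  DATASET([{bene_id, claim_id, claim_type, provider_number, through_date}],
          {UNSIGNED bene_id; UNSIGNED claim_id; STRING claim_type;
           STRING provider_number; INTEGER4 through_date});
```

Concatenating several such one-row datasets with `+`, as in the test_set above, builds a small, readable fixture without external files.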
Beyond Processing: Visualization

Custom Visualization
(No) Insights
[Chart: number of episodes vs. costs in $1,000]
Insights
[Chart: number of episodes vs. costs in $1,000, split by No readmit / 1 Readmit / 2 Readmits]
Data Delivery: Roxie
• Data delivery engine
• Indexed, compressed, and in-memory
• Data warehouse capabilities
• Data services
Data Services
• Web services over the data warehouse
• XML/SOAP, but also JSON
• Web services defined in ECL
• Solution to add/remove data from the cluster
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
Beyond Processing Workunits bull Workunit Identifier
bull Attribution
bull Query
bull Timings
bull Results
29
30
Beyond Processing Collaboration
31
Beyond Processing Collaboration (2)
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
30
Beyond Processing Collaboration
31
Beyond Processing Collaboration (2)
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
31
Beyond Processing Collaboration (2)
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
32
Beyond Processing Collaboration
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
33
Beyond Processing Collaboration
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
34
Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END
Beyond Processing: Unit Tests (2)

// Using an inline dataset:
simple_ip_claims := DATASET([{1, 1, '00', ...}], simplified_ip_layout);
ip_claims := Samples.ip_claims(simple_ip_claims);

// OR passing NAMED parameters:
ip_claims := Samples.ip_claims2(bene_id := 1, claim_id := 1, claim_type := '00');

simplified_ip_layout := RECORD
  UNSIGNED bene_id;
  UNSIGNED claim_id;
  STRING claim_type;
  STRING provider_number;
  INTEGER4 through_date;
  STRING status_code;
  INTEGER4 admission_date;
  INTEGER4 discharge_date;
  STRING ms_drg_code;
END;
Beyond Processing: Visualization
Custom Visualization
(No) Insights
[Bar chart: Episodes (y-axis, 0-80) by episode cost in $1,000 (x-axis, 1-49)]
Insights
[Same bar chart, with episodes broken out by readmissions: No readmit, 1 Readmit, 2 Readmits]
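The breakdown behind that chart is a simple bucketing of episode costs by readmission count. A sketch with illustrative numbers (field names and values are hypothetical):

```python
from collections import defaultdict

# Hypothetical episode records: total cost and number of readmissions.
episodes = [
    {"cost": 24000, "readmits": 1},
    {"cost": 15000, "readmits": 0},
    {"cost": 31000, "readmits": 2},
    {"cost": 14000, "readmits": 0},
]

# Bucket episodes by readmission count, as in the chart legend.
by_readmits = defaultdict(list)
for e in episodes:
    by_readmits[e["readmits"]].append(e["cost"])

# Average cost per bucket: readmissions visibly drive episode cost.
avg_cost = {k: sum(v) / len(v) for k, v in by_readmits.items()}
```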
Data Delivery: Roxie
• Data delivery engine
• Indexed, compressed, and in-memory
• Data warehouse capabilities
• Data services
Data Services
• Web services over the data warehouse
• XML/SOAP, but also JSON
• Web services defined in ECL
• Solution to add/remove data from the cluster
Data Service Query
• Define the service using ECL:

INTEGER4 oOffset := 1 : STORED('Offset');
INTEGER4 oResults := 100 : STORED('Results');
INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
INTEGER4 oEndDate := 20140201 : STORED('End_Date');

oParams := DATASET([{oOffset, oResults, oStartDate, oEndDate, ...}], Layouts.service_parameters_layout);

T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
  RETURN TABLE(pData,
    {
      STRING bpid := pData.bpid,
      UNSIGNED INTEGER1 model := pData.model,
      UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length,
      INTEGER8 total_episodes := COUNT(GROUP),
      DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay),
      DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay),
      DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay))
    },
    bpid, model, post_dsch_prd_length);
END;

ReportServices.BaseService.run_it('Summary', oParams, T, bpid);
Data (Web) Services

http://<esp-host>/WsEcl/forms/json/query/roxie/summary

Request:
{ "summary": { "offset": 1, "results": 10,
    "begin_date": 20130101, "end_date": 20130201, ... } }

Response:
{ "summaryResponse": { ...
    "Results": { ...
      "Summary": { "Row": [
        { "bpid": "9999", "model": 2, "post_dsch_prd_length": 90,
          "total_episodes": 987, "total_costs": 9876543.21,
          "average_costs": 12358.13, ... } ] } ... } }
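A client-side sketch of that exchange: build the JSON request body and pull the summary rows out of a WsEcl-shaped reply. Host and all values are illustrative; no actual cluster is contacted here:

```python
import json

# Build the JSON request body for the 'summary' query (illustrative values).
request = {"summary": {"offset": 1, "results": 10,
                       "begin_date": 20130101, "end_date": 20130201}}
body = json.dumps(request)

# A reply shaped like the WsEcl response above (values illustrative).
response_text = '''{"summaryResponse": {"Results": {"Summary": {"Row": [
  {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90,
   "total_episodes": 987, "average_costs": 12358.13}]}}}}'''

# Navigate summaryResponse -> Results -> Summary -> Row to get the records.
rows = json.loads(response_text)["summaryResponse"]["Results"]["Summary"]["Row"]
```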
Data Services: WsECL
Loading up data
• Logical vs. physical file names:

  ~abc::subfolder::subsubfolder::myfile  ->  /abc/subfolder/subsubfolder/myfile
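The correspondence between the two naming schemes can be sketched in a few lines. The real physical location is cluster-configuration-dependent; this only illustrates how '::'-separated scopes map to directory levels:

```python
# Sketch: translate an HPCC logical filename ('::'-separated scopes,
# leading '~' for an absolute scope) into a physical-style path.
# The actual on-disk root is determined by cluster configuration.

def logical_to_path(logical: str) -> str:
    scopes = logical.lstrip("~").split("::")
    return "/" + "/".join(scopes)

assert logical_to_path("~abc::subfolder::subsubfolder::myfile") == \
    "/abc/subfolder/subsubfolder/myfile"
```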
• ECL to load data into the cluster:

oDS := DATASET(Std.File.ExternalLogicalFileName('172.0.0.1', '/var/lib/myfile.csv'),
               Layouts.ip_claim_layout, CSV(HEADING(0)));
oDSDistributed := DISTRIBUTE(oDS, bene_id);
OUTPUT(oDSDistributed, , '~somewhere::overhere::myfile', OVERWRITE);

• ECL to use data loaded into the cluster:

oDS := DATASET('~somewhere::overhere::myfile', Layouts.ip_claim_layout, THOR);
SuperFiles
• Super File = symbolic link to a list of sub-files
• Each sub-file must have the same layout

WEBLOGS_FILE := '~somewhere::logs::web';
Std.File.CreateSuperFile(WEBLOGS_FILE);
...
run_report() := FUNCTION
  oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
  RETURN TABLE(oDS, {ip_address, COUNT(GROUP)}, ip_address);
END;

• Including (more) data:

SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
  Std.File.FinishSuperFileTransaction()
);
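Conceptually, a super file is just a named collection of same-layout parts that readers see as one dataset: add a sub-file and every consumer transparently gets the extra rows. A toy model (this is an analogy, not the HPCC implementation):

```python
# Toy model of a super file: a named list of same-layout parts
# that callers read as a single concatenated dataset.
superfiles = {}

def create_super_file(name):
    superfiles[name] = []          # start with no sub-files

def add_sub_file(name, part):
    superfiles[name].append(part)  # register one more part

def read(name):
    # Readers see one dataset, regardless of how many parts exist.
    return [row for part in superfiles[name] for row in part]

create_super_file("~somewhere::logs::web")
add_sub_file("~somewhere::logs::web", [{"ip": "10.0.0.1"}])
add_sub_file("~somewhere::logs::web", [{"ip": "10.0.0.2"}])
assert len(read("~somewhere::logs::web")) == 2
```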
Data Services: Reusability

EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
  // Filtering data based on parameters
  #UNIQUENAME(DS)
  %DS% := WS.Datasets.dsFactEpisodeCostsIndex;
  [...]
  #UNIQUENAME(B)
  %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
  #UNIQUENAME(C)
  %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[...
  [...]
  #UNIQUENAME(report)
  %report% := pReportFunction(%K%);
  #UNIQUENAME(sorted)
  %sorted% := SORT(%report%, pSortByField);
  #UNIQUENAME(O1)
  %O1% := OUTPUT(pParams, NAMED('Request'));
  oSummary := DATASET([{COUNT(%sorted%)}], WS.Layouts.service_summary_layout);
  #UNIQUENAME(O2)
  %O2% := OUTPUT(oSummary, NAMED('Metadata'));
  #UNIQUENAME(O3)
  %O3% := OUTPUT(CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
                 NAMED(pServiceName), ALL);
  PARALLEL(%O1%, %O2%, %O3%);
ENDMACRO;
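The SORT + CHOOSEN tail of the macro is plain sort-then-page logic. In Python terms, under the assumption (suggested by the service parameters) that offset is 1-based and results is the page size:

```python
# Sort-then-paginate, mirroring SORT(report, field) followed by
# CHOOSEN(sorted, results, offset). Assumes a 1-based offset and
# 'results' rows per page.

def run_page(report, sort_key, results, offset):
    ordered = sorted(report, key=lambda row: row[sort_key])
    return ordered[offset - 1 : offset - 1 + results]

rows = [{"bpid": "222"}, {"bpid": "111"}, {"bpid": "333"}]
page = run_page(rows, "bpid", results=2, offset=1)
assert [r["bpid"] for r in page] == ["111", "222"]
```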
AHA System Architecture
Archway Analytics
lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com

HPCC Systems open source portal: http://hpccsystems.com

Thank you! Questions? Feedback?
Beyond Processing Unit Tests (2)
35
Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )
simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
36
Beyond Processing Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episo
des
Costs in $1000
No readmit 1 Readmit 2 Readmits
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
Data Delivery Roxie
bull Data Delivery Engine
bull Indexed compressed and in-memory
bull Data Warehouse Capabilities
bull Data Services
40
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com
Data Services
bull Web Services over Data Warehouse
bull XMLSOAP but also JSON
bull Web Services defined in ECL
bull Solution to addremove data from the cluster
41
Data Service Query bull Define service using ECL
42
INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)
43
Data (Web) Services
summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip
summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813
hellip ] hellip
Request Response
httpsWsEclformsjsonqueryroxiesummary
44
Data Services WsECL
45
Data Services WsECL
Loading up data bull Logical vs Physical
~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile
bull ECL to load data into cluster
46
oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)
Layoutsip_claim_layout CSV(HEADING(0)) )
oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)
oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)
bull ECL to use data loaded into cluster
SuperFiles bull Super File = Symbolic link list of sub-files bull Each sub-file must have the same layout
47
WEBLOGS_FILE = lsquo~somewherelogswebrsquo StdFileCreateSuperFile(WEBLOGS_FILE) hellip run_report() = FUNCTION oDS = DATASET(WEBLOGS_FILE Layoutsweblogs_layout CSV) RETURN TABLE(oDS ip_address COUNT(GROUP) ip_address ) END
SEQUENTIAL( StdFileStartSuperFileTransaction()
StdFileAddSuperFile(WEBLOGS_FILE lsquo~somewherelogsweb20150401rsquo) StdFileFinishSuperFileTransaction() )
bull Including (more) data
48
Data Services Reusability EXPORT run_it( pServiceName pParams pReportFunction pSortByField) = MACRO Filtering data based on parameters UNIQUENAME(DS) DS = WSDatasetsdsFactEpisodeCostsIndex [hellip]
UNIQUENAME(B) B = IF(COUNT(pParams[1]providers) = 0 A A(provider_id IN pParams[1]providers)) UNIQUENAME(C) C = IF(COUNT(pParams[1]npis) = 0 B B(at_npi IN pParams[1]npis OR op_npi IN pParams[ [hellip]
UNIQUENAME(report) report = pReportFunction(K) UNIQUENAME(sorted) sorted = SORT(report pSortByField) UNIQUENAME(O1) O1 = OUTPUT(pParameters NAMED(Request)) oSummary = DATASET([ COUNT(sorted) ] WSLayoutsservice_summary_layout) UNIQUENAME(O2) O2 = OUTPUT(oSummary NAMED(lsquoMetadata)) UNIQUENAME(O3) O3 = OUTPUT(
CHOOSEN(sorted pParams[1]results pParams[1]offset) NAMED(pServiceName) ALL)
PARALLEL(O1 O2 O3) ENDMACRO
49
AHA System Architecture
50
Archway Analytics
51
lpezetarchwayhacom wwwlinkedincominlucpezet
mezzetinblogspotcom
HPCC Systems open source portal httphpccsystemscom
Thank you Questions Feedback Questions Feedback
wwwlinkedincominlucpezet
mezzetin blogspot com