Sampling Data for Better Understanding

Understanding your data is vital. When faced with millions of records, sampling is necessary, and this session will show some great techniques and tricks to better understand your data or Data Mart tables. Foxwoods Resort & Casino, Ledyard, CT, August 22, 2008. Jim Morrow.


Page 1: Sampling Data for Better Understanding

1

Sampling Data for Better Understanding

Understanding your data is vital. When faced with millions of records, sampling is necessary, and this session will show some great techniques and tricks to better understand your data or Data Mart tables.

Foxwoods Resort & Casino

Ledyard, CT

August 22, 2008

Jim Morrow

Page 2: Sampling Data for Better Understanding

2

Topics

Limiting the records processed

Counting Records

Determining The Domain of Data

Finding the biggest / smallest values

Dropping the Outliers

Skipping records in your sample

Page 3: Sampling Data for Better Understanding

3

WHERE RECORDLIMIT EQ 1000

IF RECORDLIMIT EQ 1000

Stops processing of the TABLE request after 1000 records have passed the WHERE or IF test.

WHERE READLIMIT EQ 1000

IF READLIMIT EQ 1000

Stops processing of the TABLE request after 1000 records have been read from the data source. This is not used for FOCUS databases.
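The difference between the two limits can be sketched outside FOCUS. The following Python is only an analogy, not WebFOCUS code: `record_limit`, `read_limit`, and `has_e` are hypothetical helper names, and the names are the first ten LASTNAME values that appear in the EMPDATA listings in this deck.

```python
from itertools import islice


def record_limit(rows, predicate, n):
    """RECORDLIMIT analog: stop once n rows have passed the test."""
    return list(islice((r for r in rows if predicate(r)), n))


def read_limit(rows, predicate, n):
    """READLIMIT analog: stop once n rows have been read, then apply the test."""
    return [r for r in islice(rows, n) if predicate(r)]


def has_e(name):
    """Stand-in for WHERE LASTNAME CONTAINS 'E'."""
    return "E" in name


# LASTNAME of the first ten EMPDATA records shown in this deck.
names = ["VALINO", "BELLA", "CASSANOVA", "ADAMS", "ADDAMS",
         "PATEL", "SANCHEZ", "SO", "PULASKI", "ANDERSON"]

passed = record_limit(names, has_e, 3)  # keeps reading until 3 rows pass
read = read_limit(names, has_e, 3)      # only the first 3 rows are ever tested
```

With a selective test, the read-limited version can return far fewer rows than the limit, because rows that fail the test still count toward it.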

Page 4: Sampling Data for Better Understanding

4

TABLE FILE EMPDATA
PRINT PIN LASTNAME FIRSTNAME
END

NUMBER OF RECORDS IN TABLE= 41 LINES= 41

PIN        LASTNAME   FIRSTNAME
000000010  VALINO     DANIEL
000000020  BELLA      MICHAEL
000000030  CASSANOVA  LOIS
000000040  ADAMS      RUTH
000000050  ADDAMS     PETER
000000060  PATEL      DORINA
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
000000370  WANG       JOHN
000000380  ELLNER     DAVID
000000390  GRAFF      ELAINE
000000400  LOPEZ      ANNE
000000410  CONTI      MARSHALL

Page 5: Sampling Data for Better Understanding

5

TABLE FILE EMPDATA
PRINT PIN LASTNAME FIRSTNAME
WHERE RECORDLIMIT EQ 10
END

PIN LASTNAME FIRSTNAME

000000010 VALINO DANIEL

000000020 BELLA MICHAEL

000000030 CASSANOVA LOIS

000000040 ADAMS RUTH

000000050 ADDAMS PETER

000000060 PATEL DORINA

000000070 SANCHEZ EVELYN

000000080 SO PAMELA

000000090 PULASKI MARIANNE

000000100 ANDERSON TIM

PAGE 1

NUMBER OF RECORDS IN TABLE= 10 LINES= 10

Page 6: Sampling Data for Better Understanding

6

TABLE FILE EMPDATA
PRINT PIN LASTNAME FIRSTNAME
WHERE LASTNAME CONTAINS 'E'
WHERE RECORDLIMIT EQ 10
END

NUMBER OF RECORDS IN TABLE= 10 LINES= 10

PIN        LASTNAME   FIRSTNAME
000000020  BELLA      MICHAEL
000000060  PATEL      DORINA
000000070  SANCHEZ    EVELYN
000000100  ANDERSON   TIM
000000130  CVEK       MARCUS
000000140  WHITE      VERONICA
000000150  WHITE      KARL
000000190  MEDINA     MARK
000000220  LEWIS      CASSANDRA
000000260  ROSENTHAL  KATRINA

Page 7: Sampling Data for Better Understanding

7

-SET &&LIMIT = 'WHERE RECORDLIMIT EQ 2 ';
EX FUN_EMPDATA_REC_2.FEX

-DEFAULT &&LIMIT = ' '
TABLE FILE EMPDATA
PRINT PIN LASTNAME FIRSTNAME
&&LIMIT
END

PIN LASTNAME FIRSTNAME

000000010 VALINO DANIEL

000000020 BELLA MICHAEL

NUMBER OF RECORDS IN TABLE= 2 LINES= 2
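The idea on this slide, a request with an optional limit that defaults to a blank, can be mimicked in a few lines of Python. This is an analogy only: `REQUEST` and `build_request` are hypothetical names, and the sketch does not model how Dialogue Manager actually resolves variables.

```python
# Request text with a placeholder where the limit clause may be injected.
REQUEST = """TABLE FILE EMPDATA
PRINT PIN LASTNAME FIRSTNAME
{limit}
END"""


def build_request(variables):
    """Fill in the request; an unset LIMIT falls back to a blank, like -DEFAULT."""
    limit = variables.get("LIMIT", " ")
    return REQUEST.format(limit=limit)


full = build_request({})                                      # no limit supplied
sampled = build_request({"LIMIT": "WHERE RECORDLIMIT EQ 2"})  # caller sets it
```

The same procedure then serves both the full run and the sampled run, and the caller decides which one it gets.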

Page 8: Sampling Data for Better Understanding

8

FILENAME=EMPDATA, SUFFIX=FOC
SEGNAME=EMPDATA, SEGTYPE=S1
 FIELDNAME=PIN, ALIAS=ID, FORMAT=A9, INDEX=I, $
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
 FIELDNAME=HIREDATE, ALIAS=HDAT, FORMAT=YMD, $
DEFINE AREA/A13=DECODE DIV (NE 'NORTH EASTERN' SE 'SOUTH EASTERN'
 CE 'CENTRAL' WE 'WESTERN' CORP 'CORPORATE' ELSE 'INVALID AREA');$
END
DBA=DBAUSER,$
USER= , ACCESS=RW, $
USER=LIM5, ACCESS=R, RESTRICT=VALUE, NAME=SYSTEM, VALUE=RECORDLIMIT EQ 5,$

Page 9: Sampling Data for Better Understanding

9

TABLE FILE FUN_EMPDATA
PRINT PIN LASTNAME FIRSTNAME
END

NUMBER OF RECORDS IN TABLE= 41 LINES= 41

PIN        LASTNAME   FIRSTNAME
000000010  VALINO     DANIEL
000000020  BELLA      MICHAEL
000000030  CASSANOVA  LOIS
000000040  ADAMS      RUTH
000000050  ADDAMS     PETER
000000060  PATEL      DORINA
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
000000370  WANG       JOHN
000000380  ELLNER     DAVID
000000390  GRAFF      ELAINE
000000400  LOPEZ      ANNE
000000410  CONTI      MARSHALL

Page 10: Sampling Data for Better Understanding

10

SET USER = LIM5
TABLE FILE FUN_EMPDATA
PRINT PIN LASTNAME FIRSTNAME
WHERE LASTNAME CONTAINS 'E'
END

NUMBER OF RECORDS IN TABLE= 5 LINES= 5
ACCESS LIMITED BY PASSWORD

PIN        LASTNAME   FIRSTNAME
000000020  BELLA      MICHAEL
000000060  PATEL      DORINA
000000070  SANCHEZ    EVELYN
000000100  ANDERSON   TIM
000000130  CVEK       MARCUS

Page 11: Sampling Data for Better Understanding

11

FILTER FILE EMPDATA ADD
NAME=LIMIT10,DESC=LIMITING RECORDS TO 1O
IF RECORDLIMIT EQ 10
END
-RUN

SET FILTER=* IN EMPDATA ON

-RUN

? FILTER

-RUN

TABLE FILE EMPDATA PRINT PIN LASTNAME FIRSTNAME WHERE LASTNAME CONTAINS 'E' END

Page 12: Sampling Data for Better Understanding

12

SET FILE     FILTER NAME DESCRIPTION
--- -------- ----------- ---------------------------------
*   EMPDATA  LIMIT10     LIMITING RECORDS TO 1O

NUMBER OF RECORDS IN TABLE= 10 LINES= 10
ACCESS LIMITED BY FILTERS

PIN        LASTNAME   FIRSTNAME
000000020  BELLA      MICHAEL
000000060  PATEL      DORINA
000000070  SANCHEZ    EVELYN
000000100  ANDERSON   TIM
000000130  CVEK       MARCUS
000000140  WHITE      VERONICA
000000150  WHITE      KARL
000000190  MEDINA     MARK
000000220  LEWIS      CASSANDRA
000000260  ROSENTHAL  KATRINA

Page 13: Sampling Data for Better Understanding

13

Topics

Limiting the records processed

Counting Records

Determining The Domain of Data

Finding the biggest / smallest values

Dropping the Outliers

Skipping records in your sample

Page 14: Sampling Data for Better Understanding

14

TABLE FILE EMPDATA
COUNT SALARY/I8C SUM.SALARY
END

NUMBER OF RECORDS IN TABLE= 41 LINES= 1

SALARY
COUNT   SALARY
    41  $2,029,200.00

TABLE FILE EMPDATA
SUM CNT.SALARY/I8C SALARY
END

NUMBER OF RECORDS IN TABLE= 41 LINES= 1

SALARY
COUNT   SALARY
    41  $2,029,200.00

Page 15: Sampling Data for Better Understanding

15

DEFINE FILE EMPDATA
CTR/I8C WITH PIN = 1;
END
TABLE FILE EMPDATA
SUM CTR SALARY
END

NUMBER OF RECORDS IN TABLE= 41 LINES= 1

CTR  SALARY
 41  $2,029,200.00
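The counter-field trick is simply "attach a 1 to every record and add the 1s up". A minimal Python sketch of that equivalence follows; it is an analogy, not FOCUS, and the three salaries are the first three EMPDATA salaries listed later in this deck.

```python
# First three EMPDATA salaries shown later in this deck (VALINO, BELLA, CASSANOVA).
salaries = [55500.00, 62500.00, 70000.00]

count_direct = len(salaries)     # analog of COUNT SALARY or CNT.SALARY
ctr = [1 for _ in salaries]      # analog of DEFINE CTR = 1 on every record
count_by_sum = sum(ctr)          # analog of SUM CTR
total_salary = sum(salaries)     # analog of SUM SALARY
```

Because the counter is an ordinary numeric field, it can be summed at any sort level alongside other fields.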

Page 16: Sampling Data for Better Understanding

16

Topics

Limiting the records processed

Counting Records

Determining The Domain of Data

Finding the biggest / smallest values

Dropping the Outliers

Skipping records in your sample

Page 17: Sampling Data for Better Understanding

17

[Diagram: Data Source -> DEFINE, WHERE & IF -> “Summarization” Processing -> COMPUTE, WHERE TOTAL & IF TOTAL -> Output (Report, Hold File, or Save File)]

DEFINE FILE CAR
CONTINENT/A20 = IF COUNTRY EQ 'JAPAN' THEN 'ASIA' ELSE 'EUROPE';
CTR/I2 WITH LENGTH = 1;
END

TABLE FILE CAR
PRINT CONTINENT CAR RCOST DCOST
COMPUTE PROFIT/D7 = RCOST - DCOST;
WHERE COUNTRY IN ('ENGLAND' , 'JAPAN' , 'FRANCE')
WHERE TOTAL PROFIT GT 1000
BY COUNTRY
END

Page 18: Sampling Data for Better Understanding

18

COUNTRY  CONTINENT  CAR     RETAIL_COST  DEALER_COST  PROFIT
ENGLAND  EUROPE     JAGUAR        8,878        7,427   1,451
         EUROPE     JAGUAR       13,491       11,194   2,297
         EUROPE     JENSEN       17,850       14,940   2,910
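The order of operations behind this report (DEFINE and WHERE as records are read, COMPUTE and WHERE TOTAL after sorting) can be sketched in Python. This is an illustration under assumptions, not FOCUS internals: the first three rows come from the report above, and the two rows marked hypothetical are invented so that each test has something to reject.

```python
rows = [  # (country, car, rcost, dcost)
    ("ENGLAND", "JAGUAR", 8878, 7427),
    ("ENGLAND", "JAGUAR", 13491, 11194),
    ("ENGLAND", "JENSEN", 17850, 14940),
    ("ENGLAND", "EXAMPLE", 5000, 4500),  # hypothetical: profit 500, fails WHERE TOTAL
    ("ITALY", "EXAMPLE", 9000, 7000),    # hypothetical: fails the WHERE on COUNTRY
]


def continent(country):
    """DEFINE analog: evaluated for each record as it is read."""
    return "ASIA" if country == "JAPAN" else "EUROPE"


# 1. WHERE: screen records on the way in.
selected = [r for r in rows if r[0] in ("ENGLAND", "JAPAN", "FRANCE")]
# 2. Sort (BY COUNTRY), then COMPUTE on the sorted lines.
selected.sort(key=lambda r: r[0])
report = [(c, continent(c), car, rc, dc, rc - dc) for c, car, rc, dc in selected]
# 3. WHERE TOTAL: screen on the computed value, after the COMPUTE.
report = [line for line in report if line[5] > 1000]
profits = [line[5] for line in report]
```

A test on PROFIT has to be a WHERE TOTAL because PROFIT does not exist until step 2.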

Page 19: Sampling Data for Better Understanding

19

FILE=NUMBERS, SUFFIX=FIX
SEGNAME=SEG01, SEGTYPE=S0, $
 FIELD=ID , USAGE=A03 , ACTUAL=A03, $
 FIELD=NBR , USAGE=I02L , ACTUAL=A02, $

AA 56
AA 92
AA 19
AA 27
BB 14
BB 13
BB 82
BB 50
CC 03
CC 62
CC 10
CC 12

Page 20: Sampling Data for Better Understanding

20

ID NBR

AA 56

AA 92

AA 19

AA 27

BB 14

BB 13

BB 82

BB 50

CC 03

CC 62

CC 10

CC 12

NUMBER OF RECORDS IN TABLE= 12 LINES= 12

TABLEF FILE NUMBERS
PRINT *
END

Page 21: Sampling Data for Better Understanding

21

TABLE FILE NUMBERS
SUM MIN.NBR MAX.NBR FST.NBR LST.NBR
END

MIN NBR  MAX NBR  FST NBR  LST NBR
     03       92       56       12

ID NBR

AA 56

AA 92

AA 19

AA 27

BB 14

BB 13

BB 82

BB 50

CC 03

CC 62

CC 10

CC 12

Page 22: Sampling Data for Better Understanding

22

ID  MIN NBR  MAX NBR  FST NBR  LST NBR
AA       19       92       56       27
BB       13       82       14       50
CC       03       62       03       12

ID NBR

AA 56

AA 92

AA 19

AA 27

BB 14

BB 13

BB 82

BB 50

CC 03

CC 62

CC 10

CC 12

TABLE FILE NUMBERS
SUM MIN.NBR MAX.NBR FST.NBR LST.NBR
BY ID
END

Page 23: Sampling Data for Better Understanding

23

ID NBR

AA 56

AA 92

AA 19

AA 27

BB 14

BB 13

BB 82

BB 50

CC 3

CC 62

CC 10

CC 12

First Record

ID MIN.NBR MAX.NBR FST.NBR LST.NBR

AA 56 56 56 56

Second Record

ID MIN.NBR MAX.NBR FST.NBR LST.NBR

AA 56 92 56 92

Third Record

ID MIN.NBR MAX.NBR FST.NBR LST.NBR

AA 19 92 56 19

Fourth Record

ID MIN.NBR MAX.NBR FST.NBR LST.NBR

AA 19 92 56 27

Page 24: Sampling Data for Better Understanding

24

ID NBR

AA 56

AA 92

AA 19

AA 27

BB 14

BB 13

BB 82

BB 50

CC 3

CC 62

CC 10

CC 12

Fourth Record

ID MIN.NBR MAX.NBR FST.NBR LST.NBR

AA 19 92 56 27

Fifth Record

ID MIN.NBR MAX.NBR FST.NBR LST.NBR

AA 19 92 56 27

BB 14 14 14 14

Sixth Record

ID MIN.NBR MAX.NBR FST.NBR LST.NBR

AA 19 92 56 27

BB 13 14 14 13

Page 25: Sampling Data for Better Understanding

25

ID NBR

AA 56

AA 92

AA 19

AA 27

BB 14

BB 13

BB 82

BB 50

CC 3

CC 62

CC 10

CC 12

Sixth Record

ID MIN.NBR MAX.NBR FST.NBR LST.NBR

AA 19 92 56 27

BB 13 14 14 13

Seventh Record

ID MIN.NBR MAX.NBR FST.NBR LST.NBR

AA 19 92 56 27

BB 13 82 14 82

Eighth Record

ID MIN.NBR MAX.NBR FST.NBR LST.NBR

AA 19 92 56 27

BB 13 82 14 50

Page 26: Sampling Data for Better Understanding

26

ID NBR

AA 56

AA 92

AA 19

AA 27

BB 14

BB 13

BB 82

BB 50

CC 3

CC 62

CC 10

CC 12

Last Record

ID MIN.NBR MAX.NBR FST.NBR LST.NBR

AA 19 92 56 27

BB 13 82 14 50

CC 3 62 3 12
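The record-by-record walk-through above amounts to a single pass that updates four values per ID. A minimal Python sketch of that pass, using the NUMBERS data from the slides (the function name `summarize` is hypothetical):

```python
records = [("AA", 56), ("AA", 92), ("AA", 19), ("AA", 27),
           ("BB", 14), ("BB", 13), ("BB", 82), ("BB", 50),
           ("CC", 3), ("CC", 62), ("CC", 10), ("CC", 12)]


def summarize(records):
    """One pass over the data: keep MIN, MAX, FST and LST of NBR for each ID."""
    out = {}
    for key, nbr in records:
        if key not in out:
            # First record for this ID seeds all four values.
            out[key] = {"MIN": nbr, "MAX": nbr, "FST": nbr, "LST": nbr}
        else:
            row = out[key]
            row["MIN"] = min(row["MIN"], nbr)
            row["MAX"] = max(row["MAX"], nbr)
            row["LST"] = nbr  # FST is never touched after the first record
    return out


result = summarize(records)
```

FST and LST depend on the order in which records arrive, while MIN and MAX do not.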

Page 27: Sampling Data for Better Understanding

27

TABLE FILE NUMBERS
SUM MIN.NBR MAX.NBR FST.NBR LST.NBR
SUM MIN.NBR MAX.NBR FST.NBR LST.NBR
BY ID
END

MIN NBR  MAX NBR  FST NBR  LST NBR  ID  MIN NBR  MAX NBR  FST NBR  LST NBR
     03       92       56       12  AA       19       92       56       27
                                    BB       13       82       14       50
                                    CC       03       62       03       12

Page 28: Sampling Data for Better Understanding

28

DEFINE FILE EMPDATA
CTR/I8C WITH PIN = 1;
END

TABLE FILE EMPDATA
SUM CTR
COMPUTE AVE_SALARY/P12.2C = SALARY / CTR;
SALARY MIN.SALARY MAX.SALARY
BY DEPT
ON TABLE SUMMARIZE
END

Page 29: Sampling Data for Better Understanding

29

                                          MIN          MAX
DEPT                  CTR  SALARY         SALARY       SALARY       AVE_SALARY
ACCOUNTING              5  $283,300.00    $26,400.00   $83,000.00    56,660.00
ADMIN SERVICES          2  $56,200.00     $25,400.00   $30,800.00    28,100.00
CONSULTING              3  $126,300.00    $35,900.00   $49,500.00    42,100.00
CUSTOMER SUPPORT        4  $198,400.00    $19,300.00   $62,500.00    49,600.00
MARKETING              11  $570,700.00    $32,300.00   $62,500.00    51,881.82
PERSONNEL               5  $216,800.00    $25,000.00   $80,500.00    43,360.00
PROGRAMMING & DVLPMT    4  $182,300.00    $40,900.00   $49,500.00    45,575.00
SALES                   7  $395,200.00    $30,500.00   $115,000.00   56,457.14
TOTAL                  41  $2,029,200.00  $235,700.00  $533,300.00   49,492.68

NUMBER OF RECORDS IN TABLE= 41 LINES= 8
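The TOTAL line in this report appears to add up the per-department minimums and maximums rather than report the file-wide extremes. A quick Python check of the arithmetic, using only the figures printed above, supports that reading:

```python
# (MIN SALARY, MAX SALARY) per DEPT, copied from the report above.
dept_min_max = {
    "ACCOUNTING": (26400, 83000),
    "ADMIN SERVICES": (25400, 30800),
    "CONSULTING": (35900, 49500),
    "CUSTOMER SUPPORT": (19300, 62500),
    "MARKETING": (32300, 62500),
    "PERSONNEL": (25000, 80500),
    "PROGRAMMING & DVLPMT": (40900, 49500),
    "SALES": (30500, 115000),
}
mins = [lo for lo, _ in dept_min_max.values()]
maxs = [hi for _, hi in dept_min_max.values()]

summed_min, summed_max = sum(mins), sum(maxs)  # matches the TOTAL line
true_min, true_max = min(mins), max(maxs)      # the file-wide extremes
```

The summed values match the $235,700.00 and $533,300.00 on the TOTAL line, whereas the lowest and highest salaries in the file are $19,300.00 and $115,000.00.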

Page 30: Sampling Data for Better Understanding

30

DEFINE FILE EMPDATA
CTR/I8C WITH SALARY = 1;
TOT_SAL/D12.2M = SALARY;
END
TABLE FILE EMPDATA
HEADING
" CTR <20 <CTR "
" TOT_SAL <20 <TOT_SAL "
" AVE.TOT_SAL <20 <AVE.TOT_SAL "
" MIN.TOT_SAL <20 <MIN.TOT_SAL "
" MAX.TOT_SAL <20 <MAX.TOT_SAL "
SUM CTR NOPRINT TOT_SAL NOPRINT AVE.TOT_SAL NOPRINT
    MIN.TOT_SAL NOPRINT MAX.TOT_SAL NOPRINT
SUM CTR SALARY AVE.SALARY MIN.SALARY MAX.SALARY
BY DIV
END

Page 31: Sampling Data for Better Understanding

31

CTR          41
TOT_SAL      $2,029,200.00
AVE.TOT_SAL  $49,492.68
MIN.TOT_SAL  $19,300.00
MAX.TOT_SAL  $115,000.00

                        AVE         MIN         MAX
DIV   CTR  SALARY       SALARY      SALARY      SALARY
CE     10  $493,700.00  $49,370.00  $25,000.00  $115,000.00
CORP    8  $436,500.00  $54,562.50  $26,400.00  $83,000.00
NE      7  $304,200.00  $43,457.14  $19,300.00  $62,500.00
SE      7  $390,400.00  $55,771.43  $35,900.00  $80,500.00
WE      9  $404,400.00  $44,933.33  $30,500.00  $70,000.00

Page 32: Sampling Data for Better Understanding

32

Topics

Limiting the records processed

Counting Records

Determining The Domain of Data

Finding the biggest / smallest values

Dropping the Outliers

Skipping records in your sample

Page 33: Sampling Data for Better Understanding

33

TABLE FILE EMPDATA
COUNT PIN/I8C
BY DEPT
END

NUMBER OF RECORDS IN TABLE= 41 LINES= 8

                      PIN
DEPT                  COUNT
ACCOUNTING                5
ADMIN SERVICES            2
CONSULTING                3
CUSTOMER SUPPORT          4
MARKETING                11
PERSONNEL                 5
PROGRAMMING & DVLPMT      4
SALES                     7

Page 34: Sampling Data for Better Understanding

34

TABLE FILE EMPDATA
PRINT COMPUTE MESG/A10 = 'HIGHEST';
RANKED BY HIGHEST 5 SALARY
END

RANK  SALARY       MESG
   1  $115,000.00  HIGHEST
   2  $83,000.00   HIGHEST
   3  $80,500.00   HIGHEST
   4  $79,000.00   HIGHEST
   5  $70,000.00   HIGHEST

TABLE FILE EMPDATA
PRINT COMPUTE MESG/A10 = 'LOWEST';
RANKED BY LOWEST 5 SALARY
END

RANK  SALARY       MESG
   1  $19,300.00   LOWEST
   2  $25,000.00   LOWEST
   3  $25,400.00   LOWEST
   4  $26,400.00   LOWEST
   5  $30,500.00   LOWEST

Page 35: Sampling Data for Better Understanding

35

TABLE FILE EMPDATA
PRINT COMPUTE MESG/A10 = 'HIGHEST';
RANKED BY HIGHEST 5 SALARY
ON TABLE HOLD AS HI_LIST FORMAT ALPHA
END
TABLE FILE EMPDATA
PRINT COMPUTE MESG/A10 = 'LOWEST';
RANKED BY LOWEST 5 SALARY
ON TABLE HOLD AS LO_LIST FORMAT ALPHA
END
TABLE FILE HI_LIST
SUM SALARY
BY MESG AS ''
ACROSS RANK AS ''
MORE
FILE LO_LIST
END

Page 36: Sampling Data for Better Understanding

36

         1            2           3           4           5
HIGHEST  $115,000.00  $83,000.00  $80,500.00  $79,000.00  $70,000.00
LOWEST   $19,300.00   $25,000.00  $25,400.00  $26,400.00  $30,500.00

NUMBER OF RECORDS IN TABLE= 41 LINES= 41
(BEFORE TOTAL TESTS)
NUMBER OF RECORDS IN TABLE= 41 LINES= 41
(BEFORE TOTAL TESTS)
NUMBER OF RECORDS IN TABLE= 10 LINES= 2
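For readers who want to see the ranking idea outside FOCUS, here is a minimal Python sketch. It assumes ranks are assigned over distinct salary values, and it uses only the 13 EMPDATA salaries listed later in this deck, so its results differ from the full 41-record report above. The helper name `ranked` is hypothetical.

```python
# Salaries of the first 13 EMPDATA records, as listed later in this deck.
salaries = [55500, 62500, 70000, 62500, 54100, 55500, 83000,
            43400, 33000, 32400, 19300, 49500, 62500]


def ranked(values, n, highest=True):
    """Return (rank, value) pairs for the n highest or lowest distinct values."""
    ordered = sorted(set(values), reverse=highest)[:n]
    return list(enumerate(ordered, start=1))


hi_list = ranked(salaries, 5, highest=True)   # analog of RANKED BY HIGHEST 5
lo_list = ranked(salaries, 5, highest=False)  # analog of RANKED BY LOWEST 5
```

Holding the two lists and then pivoting rank across the page, as the request above does, puts both ends of the range on one line each.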

Page 37: Sampling Data for Better Understanding

37

Topics

Limiting the records processed

Counting Records

Determining The Domain of Data

Finding the biggest / smallest values

Dropping the Outliers

Skipping records in your sample

Page 38: Sampling Data for Better Understanding

38

-SET &HI = 2;
-SET &LO = 2;
DEFINE FILE EMPDATA
TOT_CTR/I8 WITH PIN = 1;
END
TABLE FILE EMPDATA
SUM TOT_CTR
COMPUTE BOT/I8 = TOT_CTR - &LO;
BY DIV
PRINT
COMPUTE CTR/I8 = IF DIV NE LAST DIV THEN 1 ELSE LAST CTR + 1;
COMPUTE HILO/A4 = IF CTR LE &HI THEN 'HIGH' ELSE IF CTR GT BOT THEN 'LOW' ELSE 'N';
COMPUTE JOIN_KEY/A20 = DIV || EDIT(FST.SALARY);
COMPUTE OUTLIER/A3 = 'YES';
BY DIV BY SALARY
WHERE TOTAL HILO NE 'N';
ON TABLE HOLD AS HOLDOUTL FORMAT FOCUS INDEX JOIN_KEY
END

Page 39: Sampling Data for Better Understanding

39

DIV   TOT_CTR  BOT  SALARY       FOCLIST  CTR  HILO  SALARY       JOIN_KEY          OUTLIER
CE         10    8  $25,000.00         1    1  HIGH  $25,000.00   CE000000025000    YES
CE         10    8  $25,400.00         2    2  HIGH  $25,400.00   CE000000025400    YES
CE         10    8  $33,300.00         3    3  N     $33,300.00   CE000000033300    YES
CE         10    8  $40,900.00         4    4  N     $40,900.00   CE000000040900    YES
CE         10    8  $43,000.00         5    5  N     $43,000.00   CE000000043000    YES
CE         10    8  $45,000.00         6    6  N     $45,000.00   CE000000045000    YES
CE         10    8  $49,500.00         7    7  N     $49,500.00   CE000000049500    YES
CE         10    8  $54,100.00         8    8  N     $54,100.00   CE000000054100    YES
CE         10    8  $62,500.00         9    9  LOW   $62,500.00   CE000000062500    YES
CE         10    8  $115,000.00       10   10  LOW   $115,000.00  CE000000115000    YES
CORP        8    6  $26,400.00         1    1  HIGH  $26,400.00   CORP000000026400  YES
CORP        8    6  $32,400.00         2    2  HIGH  $32,400.00   CORP000000032400  YES
CORP        8    6  $35,200.00         3    3  N     $35,200.00   CORP000000035200  YES
CORP        8    6  $55,500.00         4    4  N     $55,500.00   CORP000000055500  YES
CORP        8    6  $62,500.00         5    5  N     $62,500.00   CORP000000062500  YES
CORP        8    6  $62,500.00         6    6  N     $62,500.00   CORP000000062500  YES
CORP        8    6  $79,000.00         7    7  LOW   $79,000.00   CORP000000079000  YES
CORP        8    6  $83,000.00         8    8  LOW   $83,000.00   CORP000000083000  YES
NE          7    5  $19,300.00         1    1  HIGH  $19,300.00   NE000000019300    YES
NE          7    5  $32,300.00         2    2  HIGH  $32,300.00   NE000000032300    YES
NE          7    5  $39,000.00         3    3  N     $39,000.00   NE000000039000    YES
NE          7    5  $43,600.00         4    4  N     $43,600.00   NE000000043600    YES
NE          7    5  $52,000.00         5    5  N     $52,000.00   NE000000052000    YES
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

The rows with HILO = N in each DIV are eliminated by WHERE TOTAL HILO NE 'N';

Page 40: Sampling Data for Better Understanding

40

JOIN CLEAR *
JOIN JOIN_KEY WITH SALARY IN EMPDATA TO JOIN_KEY IN HOLDOUTL AS JEMPOUT
END

[Structure diagram of the joined file, segments and the fields shown:]

EMPDATA  01  S1    PIN (I), LASTNAME, FIRSTNAME, MIDINITIAL
  SEG02  02  KU    SALARY, FOCLIST, CTR, JOIN_KEY (K)
    SEG01  03  KLU   DIV, TOT_CTR, BOT

Page 41: Sampling Data for Better Understanding

41

DEFINE FILE EMPDATA
JOIN_KEY/A20 WITH SALARY = DIV || EDIT(SALARY);
END

TABLE FILE EMPDATA
" WITH OUTLIERS REMOVED"
SUM MIN.SALARY MAX.SALARY
BY DIV
WHERE OUTLIER NE 'YES'
END

Page 42: Sampling Data for Better Understanding

42

DEFINE FILE EMPDATA
TOT_CTR/I8 WITH PIN = 1;
END
TABLE FILE EMPDATA
SUM TOT_CTR
COMPUTE BOT/I8 = TOT_CTR - &LO;
BY DIV
PRINT
COMPUTE CTR/I8 = IF DIV NE LAST DIV THEN 1 ELSE LAST CTR + 1;
COMPUTE HILO/A4 = IF CTR LE &HI THEN 'HIGH' ELSE IF CTR GT BOT THEN 'LOW' ELSE 'N';
COMPUTE JOIN_KEY/A20 = DIV || EDIT(FST.SALARY);
COMPUTE OUTLIER/A3 = 'YES';
BY DIV BY SALARY
WHERE TOTAL HILO NE 'N';
ON TABLE HOLD AS HOLDOUTL FORMAT FOCUS INDEX JOIN_KEY
END
JOIN CLEAR *
JOIN JOIN_KEY WITH SALARY IN EMPDATA TO JOIN_KEY IN HOLDOUTL AS JEMPOUT
END
DEFINE FILE EMPDATA
JOIN_KEY/A20 WITH SALARY = DIV || EDIT(SALARY);
END
TABLE FILE EMPDATA
" WITH OUTLIERS REMOVED"
SUM MIN.SALARY MAX.SALARY
BY DIV
WHERE OUTLIER NE 'YES'
END

Page 43: Sampling Data for Better Understanding

43

WITH OUTLIERS REMOVED

      MIN         MAX
DIV   SALARY      SALARY
CE    $33,300.00  $54,100.00
CORP  $35,200.00  $62,500.00
NE    $39,000.00  $52,000.00
SE    $49,500.00  $50,500.00
WE    $33,000.00  $54,100.00

NUMBER OF RECORDS IN TABLE= 41 LINES= 41
(BEFORE TOTAL TESTS)
NUMBER OF RECORDS IN TABLE= 20 LINES= 5
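The net effect of the outlier logic, dropping a fixed number of values from each end of every DIV before taking the minimum and maximum, can be sketched in Python. This is a simplified analogy: it trims by position in the sorted list, whereas the FOCUS version flags outliers and removes them through a join on DIV plus SALARY. The CE and CORP salaries are copied from the HOLDOUTL listing earlier in the deck; `trimmed_range` is a hypothetical helper.

```python
# Salaries by DIV from the HOLDOUTL listing (only CE and CORP are shown in full).
salaries = {
    "CE": [25000, 25400, 33300, 40900, 43000, 45000, 49500, 54100, 62500, 115000],
    "CORP": [26400, 32400, 35200, 55500, 62500, 62500, 79000, 83000],
}
HI = 2  # number of values dropped from one end (mirrors the HI setting)
LO = 2  # number of values dropped from the other end (mirrors the LO setting)


def trimmed_range(values, lo=LO, hi=HI):
    """Drop the lo smallest and hi largest values, then return (min, max)."""
    kept = sorted(values)[lo:len(values) - hi]
    return kept[0], kept[-1]


trimmed = {div: trimmed_range(vals) for div, vals in salaries.items()}
```

For CE and CORP the trimmed ranges agree with the "WITH OUTLIERS REMOVED" report above.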

Page 44: Sampling Data for Better Understanding

44

Topics

Limiting the records processed

Counting Records

Determining The Domain of Data

Finding the biggest / smallest values

Dropping the Outliers

Skipping records in your sample

Page 45: Sampling Data for Better Understanding

45

DEFINE FILE EMPDATA
REC_NBR/I5 WITH PIN = LAST REC_NBR + 1;
TAKE/I5 WITH PIN = IMOD(REC_NBR,10,'I5');
END

TABLE FILE EMPDATA
PRINT REC_NBR TAKE LASTNAME SALARY
END

Page 46: Sampling Data for Better Understanding

46

REC_NBR TAKE LASTNAME SALARY

1 1 VALINO $55,500.00

2 2 BELLA $62,500.00

3 3 CASSANOVA $70,000.00

4 4 ADAMS $62,500.00

5 5 ADDAMS $54,100.00

6 6 PATEL $55,500.00

7 7 SANCHEZ $83,000.00

8 8 SO $43,400.00

9 9 PULASKI $33,000.00

10 0 ANDERSON $32,400.00

11 1 RUSSO $19,300.00

12 2 WANG $49,500.00

13 3 CVEK $62,500.00

Page 47: Sampling Data for Better Understanding

47

REC_NBR TAKE LASTNAME SALARY

1 1 VALINO $55,500.00

11 1 RUSSO $19,300.00

21 1 DUBOIS $43,600.00

31 1 LIEBER $52,000.00

41 1 CONTI $32,300.00

DEFINE FILE EMPDATA
REC_NBR/I5 WITH PIN = LAST REC_NBR + 1;
TAKE/I5 WITH PIN = IMOD(REC_NBR,10,'I5');
END
TABLE FILE EMPDATA
PRINT REC_NBR TAKE LASTNAME SALARY
WHERE TAKE EQ 1
END
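The "every tenth record" technique is a systematic sample: number the records, take the record number modulo 10, and keep one remainder. A minimal Python sketch of the same idea follows; the record contents are stand-ins and `systematic_sample` is a hypothetical name.

```python
def systematic_sample(records, step=10, take=1):
    """Keep each record whose 1-based position modulo step equals take."""
    return [(n, rec) for n, rec in enumerate(records, start=1) if n % step == take]


# 41 stand-in records (EMPDATA has 41); only the position matters here.
records = [f"record {n}" for n in range(1, 42)]
sample = systematic_sample(records)
positions = [n for n, _ in sample]
```

Changing `take` shifts which records are chosen without changing the sampling rate, which is handy for drawing a second, non-overlapping sample.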

Page 48: Sampling Data for Better Understanding

48

SET DMPRECISION = 3
-RUN
-SET &DIV = 10000;
-SET &PER = 100 * (1/&DIV);
-SET &PER = &PER || '%';
DEFINE FILE EMPDATA
REC_NBR/I5 WITH PIN = LAST REC_NBR + 1;
TAKE/I5 WITH PIN = IMOD(REC_NBR,&DIV,'I5');
END
TABLE FILE EMPDATA
" SAMPLE &PER OF FILE "
PRINT
PIN LASTNAME FIRSTNAME
WHERE TAKE EQ 1
END

SAMPLE 0.01% OF FILE

PIN        LASTNAME  FIRSTNAME
000000010  VALINO    DANIEL

NUMBER OF RECORDS IN TABLE= 1 LINES= 1

Page 49: Sampling Data for Better Understanding

49

-SET &NUM = 2;
-SET &DIV = 3;
-REPEAT REPEND FOR &I FROM 0 TO 9 STEP 1
SET DMPRECISION = &I
-RUN
-SET &QUOT = &NUM / &DIV;
-TYPE DMPRECISION IS &I &NUM / &DIV = &QUOT
-REPEND

DMPRECISION IS 0 2 / 3 = 1
DMPRECISION IS 1 2 / 3 = 0.7
DMPRECISION IS 2 2 / 3 = 0.67
DMPRECISION IS 3 2 / 3 = 0.667
DMPRECISION IS 4 2 / 3 = 0.6667
DMPRECISION IS 5 2 / 3 = 0.66667
DMPRECISION IS 6 2 / 3 = 0.666667
DMPRECISION IS 7 2 / 3 = 0.6666667
DMPRECISION IS 8 2 / 3 = 0.66666667
DMPRECISION IS 9 2 / 3 = 0.666666667
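The output on this slide is consistent with rounding the quotient to the stated number of decimal places. The Python sketch below reproduces those figures under that assumption (half-up rounding is my assumption, not something the slide states); `quotient` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP


def quotient(num, div, places):
    """Divide, then round half-up to the given number of decimal places."""
    exponent = Decimal(10) ** -places  # 1, 0.1, 0.01, ...
    return (Decimal(num) / Decimal(div)).quantize(exponent, rounding=ROUND_HALF_UP)


results = [str(quotient(2, 3, places)) for places in range(10)]
```

Each entry of `results` corresponds to one precision setting from 0 through 9.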

Page 50: Sampling Data for Better Understanding

50

The PRDNOR and PRDUNI functions generate reproducible random numbers:

{PRDNOR|PRDUNI}(seed, outfield)

PRDNOR generates reproducible double-precision random numbers that are normally distributed with an arithmetic mean of 0 and a standard deviation of 1.

PRDUNI generates reproducible double-precision random numbers uniformly distributed between 0 and 1.

The RDNORM and RDUNIF functions generate random numbers:

{RDNORM|RDUNIF}(outfield)

RDNORM generates double-precision random numbers that are normally distributed with an arithmetic mean of 0 and a standard deviation of 1.

RDUNIF generates double-precision random numbers uniformly distributed between 0 and 1.

Reference: FOCUS for S/390 Using Functions Version 7.2 DN1001140.1101
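The practical difference between the reproducible and non-reproducible functions is the seed. As a rough analogy in Python (this uses Python's own generator, not the FOCUS functions), two generators built with the same seed produce the same stream, while an unseeded generator does not promise that:

```python
import random

# Same seed twice: analog of the reproducible PRDUNI / PRDNOR behaviour.
seeded_a = random.Random(37)
seeded_b = random.Random(37)
uniform_a = [seeded_a.random() for _ in range(5)]    # uniform on [0, 1)
uniform_b = [seeded_b.random() for _ in range(5)]
normal_a = [seeded_a.gauss(0, 1) for _ in range(5)]  # mean 0, standard deviation 1
normal_b = [seeded_b.gauss(0, 1) for _ in range(5)]

# No seed given: analog of RDUNIF / RDNORM, a different stream on each run.
unseeded = random.Random()
fresh = [unseeded.random() for _ in range(5)]
```

A reproducible stream matters when a sample has to be re-created later, for example to rerun a report against the same subset of records.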

Page 51: Sampling Data for Better Understanding

51

DEFINE FILE EMPDATA
REC_NBR/I5 WITH PIN = LAST REC_NBR + 1;
TAKE/I5 WITH PIN = INT(PRDUNI(37,'D12.5') * 10);
END
TABLE FILE EMPDATA
PRINT
REC_NBR TAKE PIN LASTNAME FIRSTNAME
WHERE TAKE EQ 1
END

REC_NBR PRDUNI PIN LASTNAME FIRSTNAME

6 1 000000060 PATEL DORINA

10 1 000000100 ANDERSON TIM

12 1 000000120 WANG KATE

17 1 000000170 MORAN WILLIAM

28 1 000000280 MARTIN ARLEEN

41 1 000000410 CONTI MARSHALL

NUMBER OF RECORDS IN TABLE= 6 LINES= 6
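The same random-bucket idea can be sketched in Python: scale a uniform random number to an integer from 0 to 9 and keep the records that land in one bucket, which selects roughly one record in ten. This is an analogy only. Python's generator will not pick the same records as PRDUNI, and `random_sample` is a hypothetical name.

```python
import random


def random_sample(records, seed=37, buckets=10, take=1):
    """Keep records whose seeded random bucket (0..buckets-1) equals take."""
    rng = random.Random(seed)
    kept = []
    for n, rec in enumerate(records, start=1):
        bucket = int(rng.random() * buckets)
        if bucket == take:
            kept.append((n, rec))
    return kept


records = [f"record {n}" for n in range(1, 42)]  # stand-ins for 41 rows
first_run = random_sample(records)
second_run = random_sample(records)  # same seed, so the same records
```

Unlike the modulo approach, the selected positions are irregular, and the sample size varies around the expected one-tenth of the file.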

Page 52: Sampling Data for Better Understanding

52

DEFINE FILE EMPLOYEE
CTR/I8C WITH DED_AMT = LAST CTR + 1;
TAKE/I8C WITH DED_AMT = IMOD(CTR,10,'I8C');
END
TABLE FILE EMPLOYEE
PRINT
EMP_ID
COMPUTE SELECTED/A3 = 'YES';
WHERE TAKE EQ 1
WHERE RECORDLIMIT EQ 2
ON TABLE HOLD AS EMPLIST FORMAT FOCUS INDEX EMP_ID
END
-RUN
-SET &RECS = &LINES;

NUMBER OF RECORDS IN TABLE= 2 LINES= 2

Page 53: Sampling Data for Better Understanding

53

JOIN CLEAR *
JOIN EMP_ID IN EMPLOYEE TO ALL EMP_ID IN EMPLIST AS JEMPEL

[Structure diagram of the joined file, segments and the fields shown; the diagram continues to the right on the slide:]

(root)    01  S1   EMP_ID, LAST_NAME, FIRST_NAME, HIRE_DATE
  FUNDTRAN  02  U    BANK_NAME, BANK_CODE, BANK_ACCT, EFFECT_DATE
  SEG01     03  KM   FOCLIST, EMP_ID (K), SELECTED
            JOINED C:\ibi\DEVSTU~2\Irv76\wfs\edatemp\...\emplist
  PAYINFO   04  SH1  DAT_INC, PCT_INC, SALARY, JOBCODE
  ADDRESS   08  S1   TYPE, ADDRESS_LN1, ADDRESS_LN2, ADDRESS_LN3

Page 54: Sampling Data for Better Understanding

54

DEFINE FILE EMPLOYEE
TOT_DED_AMT/D12.2M = DED_AMT;
END

TABLE FILE EMPLOYEE
" Employee Sample Report "
" Report has &RECS Employees"
" &DATE &TOD"
SUM SALARY
BY EMP_ID SUMMARIZE
SUM GROSS TOT_DED_AMT AS DEDUCTIONS
COMPUTE NET/D12.2M = GROSS - TOT_DED_AMT;
BY EMP_ID
BY PAY_DATE
SUM DED_AMT/D12.2M
BY EMP_ID
BY PAY_DATE
BY DED_CODE
WHERE SELECTED EQ 'YES';
END

Page 55: Sampling Data for Better Understanding

55

DEFINE FILE EMPLOYEE
CTR/I8C WITH DED_AMT = LAST CTR + 1;
TAKE/I8C WITH DED_AMT = IMOD(CTR,10,'I8C');
END
TABLE FILE EMPLOYEE
PRINT
EMP_ID
COMPUTE SELECTED/A3 = 'YES';
WHERE TAKE EQ 1
WHERE RECORDLIMIT EQ 2
ON TABLE HOLD AS EMPLIST FORMAT FOCUS INDEX EMP_ID
END
-SET &RECS = &LINES;
JOIN CLEAR *
JOIN EMP_ID IN EMPLOYEE TO ALL EMP_ID IN EMPLIST AS JEMPEL
DEFINE FILE EMPLOYEE
TOT_DED_AMT/D12.2M = DED_AMT;
END
TABLE FILE EMPLOYEE
" Employee Sample Report "
" Report has &RECS Employees"
" &DATE &TOD"
SUM SALARY
BY EMP_ID SUMMARIZE
SUM GROSS TOT_DED_AMT AS DEDUCTIONS
COMPUTE NET/D12.2M = GROSS - TOT_DED_AMT;
BY EMP_ID
BY PAY_DATE
SUM DED_AMT/D12.2M
BY EMP_ID
BY PAY_DATE
BY DED_CODE
WHERE SELECTED EQ 'YES';
END
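The overall pattern of this script is two steps: hold a small list of sampled keys, then join back to the full file and report every detail row for just those keys. The Python sketch below shows that pattern in simplified form. All data and IDs are hypothetical, and the key selection is simplified to "every second employee, at most two", whereas the FOCUS version samples on deduction records.

```python
# Hypothetical detail rows: (emp_id, pay_date, ded_code, ded_amt).
deductions = [
    ("E1", "81/11/30", "FED", 70.83),
    ("E1", "81/11/30", "FICA", 58.33),
    ("E2", "81/11/30", "FED", 10.00),
    ("E3", "82/04/30", "STAT", 20.02),
    ("E3", "82/05/28", "CITY", 1.43),
]

# Step 1: pick the sample and hold just the keys (analog of the EMPLIST hold file).
all_ids = sorted({row[0] for row in deductions})
emplist = set(all_ids[::2][:2])  # every 2nd employee, at most 2 of them
recs = len(emplist)              # analog of saving the line count for the heading

# Step 2: "join" back and keep all detail rows for the selected keys only.
report = [row for row in deductions if row[0] in emplist]
```

Sampling keys first keeps each sampled employee's detail complete, instead of returning a scattering of unrelated detail rows.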

Page 56: Sampling Data for Better Understanding

56

Employee Sample Report
Report has 2 Employees
08/07/08 08.03.04

EMP_ID     SALARY      PAY_DATE  GROSS      DEDUCTIONS  NET      DED_CODE  DED_AMT
71382660   $21,000.00  81/11/30  $833.33    $141.66     $691.67  CITY      $0.83
                                                                 FED       $70.83
                                                                 FICA      $58.33
                                                                 STAT      $11.67
                       81/12/31  $833.33    $141.66     $691.67  CITY      $0.83
                                                                 FED       $70.83
                                                                 SAVE      $54.60

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

PAGE 3
Employee Sample Report
Report has 2 Employees
08/07/08 08.03.04

EMP_ID     SALARY      PAY_DATE  GROSS      DEDUCTIONS  NET      DED_CODE  DED_AMT
112847612  $13,200.00  82/04/30  $1,100.00  $334.10     $765.90  STAT      $20.02
                       82/05/28  $1,100.00  $334.10     $765.90  CITY      $1.43
                                                                 FED       $121.55
                                                                 FICA      $100.10
                                                                 HLTH      $22.75
                                                                 LIFE      $13.65
                                                                 SAVE      $54.60
                                                                 STAT      $20.02
                       82/06/30  $1,100.00  $334.10     $765.90  CITY      $1.43

Page 57: Sampling Data for Better Understanding

57

Page 58: Sampling Data for Better Understanding

58

Browser Query String to run a procedure located in an APP

http://<servername>/ibi_apps/WFServlet?IBIF_ex=<focexec>&IBIAPP_app=<appmap>

Replace <servername> with the name of the server.

Replace <focexec> with the name of the WebFOCUS procedure to run.

Replace <appmap> with the virtual directory (app map) defined on the WebFOCUS server in \srv76\wfs\etc\edasprof.prf, which points to the physical folder where the procedure is stored.

For example: http://server1/ibi_apps/WFServlet?IBIF_ex=carinst&IBIAPP_app=ibisamp
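If the query string needs to be generated rather than typed, it can be assembled from its three parts. The following Python is a small sketch using only the parameter names shown on this slide; `procedure_url` is a hypothetical helper, and plain http is assumed because the slide's example uses it.

```python
from urllib.parse import urlencode


def procedure_url(server, focexec, app):
    """Build the WFServlet URL that runs a procedure stored in an application folder."""
    query = urlencode({"IBIF_ex": focexec, "IBIAPP_app": app})
    return f"http://{server}/ibi_apps/WFServlet?{query}"


url = procedure_url("server1", "carinst", "ibisamp")
```

Using `urlencode` also takes care of escaping if a procedure or application name ever contains characters that are not URL-safe.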

Page 59: Sampling Data for Better Understanding

59

\ibi\apps\ibisamp\empdata.mas

FILENAME=EMPDATA, SUFFIX=FOC
SEGNAME=EMPDATA, SEGTYPE=S1
 FIELDNAME=PIN, ALIAS=ID, FORMAT=A9, INDEX=I, $
 FIELDNAME=LASTNAME, ALIAS=LN, FORMAT=A15, $
 FIELDNAME=FIRSTNAME, ALIAS=FN, FORMAT=A10, $
 FIELDNAME=MIDINITIAL, ALIAS=MI, FORMAT=A1, $
 FIELDNAME=DIV, ALIAS=CDIV, FORMAT=A4, $
 FIELDNAME=DEPT, ALIAS=CDEPT, FORMAT=A20, $
 FIELDNAME=JOBCLASS, ALIAS=CJCLAS, FORMAT=A8, $
 FIELDNAME=TITLE, ALIAS=CFUNC, FORMAT=A20, $
 FIELDNAME=SALARY, ALIAS=CSAL, FORMAT=D12.2M, $
 FIELDNAME=HIREDATE, ALIAS=HDAT, FORMAT=YMD, $
$
DEFINE AREA/A13=DECODE DIV (NE 'NORTH EASTERN' SE 'SOUTH EASTERN'
 CE 'CENTRAL' WE 'WESTERN' CORP 'CORPORATE' ELSE 'INVALID AREA');$

Page 60: Sampling Data for Better Understanding

60

\ibi\apps\ibisamp\employee.mas

FILENAME=EMPLOYEE, SUFFIX=FOC

SEGNAME=EMPINFO, SEGTYPE=S1

FIELDNAME=EMP_ID, ALIAS=EID, FORMAT=A9, $

FIELDNAME=LAST_NAME, ALIAS=LN, FORMAT=A15, $

FIELDNAME=FIRST_NAME, ALIAS=FN, FORMAT=A10, $

FIELDNAME=HIRE_DATE, ALIAS=HDT, FORMAT=I6YMD, $

FIELDNAME=DEPARTMENT, ALIAS=DPT, FORMAT=A10, $

FIELDNAME=CURR_SAL, ALIAS=CSAL, FORMAT=D12.2M, $

FIELDNAME=CURR_JOBCODE, ALIAS=CJC, FORMAT=A3, $

FIELDNAME=ED_HRS, ALIAS=OJT, FORMAT=F6.2, $

SEGNAME=FUNDTRAN, SEGTYPE=U, PARENT=EMPINFO

FIELDNAME=BANK_NAME, ALIAS=BN, FORMAT=A20, $

FIELDNAME=BANK_CODE, ALIAS=BC, FORMAT=I6S, $

FIELDNAME=BANK_ACCT, ALIAS=BA, FORMAT=I9S, $

FIELDNAME=EFFECT_DATE, ALIAS=EDATE, FORMAT=I6YMD, $

SEGNAME=PAYINFO, SEGTYPE=SH1, PARENT=EMPINFO

FIELDNAME=DAT_INC, ALIAS=DI, FORMAT=I6YMD, $

FIELDNAME=PCT_INC, ALIAS=PI, FORMAT=F6.2, $

FIELDNAME=SALARY, ALIAS=SAL, FORMAT=D12.2M, $

FIELDNAME=JOBCODE, ALIAS=JBC, FORMAT=A3, $

SEGNAME=ADDRESS, SEGTYPE=S1, PARENT=EMPINFO

FIELDNAME=TYPE, ALIAS=AT, FORMAT=A4, $

FIELDNAME=ADDRESS_LN1, ALIAS=LN1, FORMAT=A20, $

FIELDNAME=ADDRESS_LN2, ALIAS=LN2, FORMAT=A20, $

FIELDNAME=ADDRESS_LN3, ALIAS=LN3, FORMAT=A20, $

FIELDNAME=ACCTNUMBER, ALIAS=ANO, FORMAT=I9L, $

SEGNAME=SALINFO, SEGTYPE=SH1, PARENT=EMPINFO

FIELDNAME=PAY_DATE, ALIAS=PD, FORMAT=I6YMD, $

FIELDNAME=GROSS, ALIAS=MO_PAY, FORMAT=D12.2M, $

SEGNAME=DEDUCT, SEGTYPE=S1, PARENT=SALINFO

FIELDNAME=DED_CODE, ALIAS=DC, FORMAT=A4, $

FIELDNAME=DED_AMT, ALIAS=DA, FORMAT=D12.2M, $

SEGNAME=JOBSEG, SEGTYPE=KU ,PARENT=PAYINFO, CRFILE=JOBFILE,

CRKEY=JOBCODE,$

SEGNAME=SECSEG, SEGTYPE=KLU,PARENT=JOBSEG, CRFILE=JOBFILE,$

SEGNAME=SKILLSEG,SEGTYPE=KL, PARENT=JOBSEG, CRFILE=JOBFILE,$

SEGNAME=ATTNDSEG,SEGTYPE=KM, PARENT=EMPINFO,

CRFILE=EDUCFILE,CRKEY=EMP_ID,$

SEGNAME=COURSEG, SEGTYPE=KLU,PARENT=ATTNDSEG,CRFILE=EDUCFILE,$