Optimization in SDD using Compress Function

A Presentation on Optimizing the IDB Process By Navin Kumar

Page 2: Optimization in SDD using Compress Function

Process-I

Use of the Compress Option

Page 3: Optimization in SDD using Compress Function

The Compress option can greatly reduce the execution time required for dataset creation, in some cases by up to ~40%.

Since IDB deals with huge volumes of data, this option is particularly pertinent to our day-to-day programming.

Page 5: Optimization in SDD using Compress Function

Why use the compress option and what does it do?

Compressing a file is a process that reduces the number of bytes required to represent each observation. It converts each observation to a variable-length record, whereas in an uncompressed dataset each observation is a fixed-length record. As a result, fewer I/O operations are required to read or write the data during processing.

Page 6: Optimization in SDD using Compress Function

The Sub-Options with the Compress Option

Compress=yes or Compress=char: The observations in a SAS data set are compressed by reducing repeated consecutive characters (including blanks) to two-byte or three-byte representations.

Compress=binary: This is highly effective for compressing medium to large (>= 100 MB) blocks of binary data (numeric variables).

Datasets that have a lot of numeric variables (flags), such as events, css and the like, benefit the most.

Page 7: Optimization in SDD using Compress Function

The REUSE Option

Reuse=yes: This option specifies that space is to be reused. Observations that are added to the SAS data set are inserted wherever enough free space exists, instead of at the end of the SAS data set.

Reuse=no: This is the default option and results in less efficient usage of space if you delete, update, or add many observations in a SAS dataset.

(An example would be the dataset Events, which requires run-visit expansion, or Vitals, which requires transposing Vstestcd; basically any dataset that undergoes extensive record count change during its processing.)

It is hence advised to always use the Reuse=yes option whenever we compress datasets.

Page 8: Optimization in SDD using Compress Function

Links describing in detail the syntax, functions, and documentation of these options have been attached at the end of the presentation for reference.

Page 9: Optimization in SDD using Compress Function

What does it look like when implemented in code?

Page 10: Optimization in SDD using Compress Function

Screenshots of Code and Log

Here is an example of how the options statement is to be written.

And here is an example of how the log gives a measure of the extent and effectiveness of the Compress option in your program.
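As a minimal sketch of what such an options statement might look like, assuming it is placed at the top of the driver program so that it applies to every dataset created afterwards. The dataset work.events_demo and its variables are hypothetical and exist only to give the options something to act on.

```sas
/* Session-wide setting: compress every data set created after this
   statement and reuse freed space inside compressed data sets. */
options compress=yes reuse=yes;

/* Hypothetical demo data set, built only for illustration. */
data work.events_demo;
    length comment $200;
    do subjid = 1 to 10000;
        comment = 'No adverse event reported';  /* long, mostly blank character value */
        flag = 0;
        output;
    end;
run;

/* After the step runs, the log should contain a NOTE reporting by what
   percentage compression changed the size of the data set and how many
   pages the compressed and uncompressed versions need. */
```

Because the setting is a system option, no individual data step has to change, which is what makes it convenient for a single driver program.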

Page 11: Optimization in SDD using Compress Function

Some figures in Projects and Results of improved Efficiency

Project Name: XXX1
Size of Compound: ~8.5 GB

Option                        Runtime (hours)   Difference (hours)   Percent reduction vs. old process
Without Compress option       2.553056          -                    -
With Compress+Reuse option    1.432222          1.12083              ~44%
With Compress option only     1.565556          0.9875               ~39%
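The Difference and Percent Reduction figures for XXX1 follow directly from the runtimes: the difference is the baseline runtime minus the runtime with the option, and the percentage is that difference divided by the baseline. A small sketch of the arithmetic, where the dataset and variable names are hypothetical:

```sas
/* Recompute the XXX1 figures from the reported runtimes (in hours). */
data work.runtime_check;
    baseline = 2.553056;                      /* runtime without Compress */
    input option_used :$20. runtime;
    difference    = baseline - runtime;       /* hours saved */
    pct_reduction = 100 * difference / baseline;
    datalines;
Compress+Reuse 1.432222
Compress_only 1.565556
;
run;

proc print data=work.runtime_check noobs;
run;
/* Compress+Reuse: difference about 1.1208 hours, about 43.9 percent
   Compress only:  difference 0.9875 hours, about 38.7 percent */
```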

Page 12: Optimization in SDD using Compress Function

Some figures in Projects and Results of improved Efficiency

Project Name: XXX2
Size of Compound: ~44 GB

Option                        Runtime (hours)   Difference (hours)   Percent reduction vs. old process
Without Compress option       10.14             -                    -
With Compress+Reuse option    6.6               3.54                 ~35%
With Compress option only     6.88              3.26                 ~32%

Page 13: Optimization in SDD using Compress Function

Some figures in Projects and Results of improved Efficiency

Project Name: XXX3
Size of Compound: ~44 GB

Option                        Runtime (hours)   Difference (hours)   Percent reduction vs. old process
Without Compress option       1.8               -                    -
With Compress+Reuse option    1.1               0.7                  ~40%

Page 14: Optimization in SDD using Compress Function

Some figures in Projects and Results of improved Efficiency

Project Name: XXX4
Size of Compound: ~11.5 GB

Option                        Runtime (hours)   Difference (hours)   Percent reduction vs. old process
Without Compress option       4.97              -                    -
With Compress+Reuse option    3.3               1.67                 ~34%

Page 15: Optimization in SDD using Compress Function

Do we have Trade-offs?

Yes. When we try to open a compressed dataset, the data explorer is unable to open it and shows blank variables.

However, we can work around this in either of two ways:

• Set the required dataset into a new dataset, or

• Use the option compress=no in the data step.
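A minimal sketch of the compress=no workaround, assuming the compressed dataset lives in a library called mylib. Both mylib.events and work.events_view are hypothetical names used only for illustration.

```sas
/* Write an uncompressed copy that the data explorer can open.
   The compress=no data set option overrides the session-wide
   COMPRESS= system option for this one output data set. */
data work.events_view (compress=no);
    set mylib.events;
run;
```

The original compressed dataset is left untouched; only the copy used for browsing is stored uncompressed.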

Page 18: Optimization in SDD using Compress Function

Recommended Use and Applicability

It is to be used when dealing with big studies.

• Source programming would benefit most by using it in the single driver program.

• It would help in situations when SDD is very slow.

• Last-minute changes could be implemented faster, so programmers can go home on time.

• If both validation and source programmers are working on last-minute changes near project deadlines, less time will be spent waiting for refreshes.

• Fewer complaints and less ill will toward SDD.

Page 19: Optimization in SDD using Compress Function

Recommended Use and Applicability

Validation Side Advantage

Since validation programs are standalone programs, different for each dataset, validation programmers have the freedom to choose whichever of Compress=char or Compress=binary best suits the dataset.
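One way to make that per-dataset choice is with data set options rather than the global options statement. The sketch below assumes two hypothetical input datasets, work.comments (mostly character variables) and work.events (mostly numeric flags).

```sas
/* Character-heavy data set: compress repeated consecutive characters. */
data work.comments_c (compress=char reuse=yes);
    set work.comments;
run;

/* Numeric-heavy data set with many flags: use binary compression. */
data work.events_c (compress=binary reuse=yes);
    set work.events;
run;
```

Comparing the compression notes in the log for each variant is a simple way to check which setting suits a given dataset.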

Page 20: Optimization in SDD using Compress Function

When is it advised not to use compress?

• Very small datasets. An example is the screenshot below. Here a compressed dataset would be the same as an uncompressed one.

Page 21: Optimization in SDD using Compress Function

Advantage in its Use

An advantage of using compress in the options statement is that the I/O engine automatically decides when to use compression and when not to. Here is an example.
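A small sketch of a case where this automatic decision would be expected to apply. The dataset work.tiny is hypothetical; its records are so short that the per-observation overhead of compression would likely outweigh any saving.

```sas
options compress=yes;

/* One short numeric variable per observation: there is very little
   for compression to remove. */
data work.tiny;
    flag = 1;
run;

/* In a case like this the engine may leave the data set uncompressed
   and say so in a log NOTE instead of reporting a size reduction.
   The exact outcome depends on the record length and the SAS host. */
```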

Page 22: Optimization in SDD using Compress Function

Projects List

Implemented so far in:

• BIV (LY2963016) in QA

• Cialis (LY450190) in QA

Tested in: BIV, Solenza, Cialis, LA294

Hidden as they are projects for a confidential client.

Page 24: Optimization in SDD using Compress Function

THANK YOU