(ATS3-PLAT08) Optimizing Protocol Performance
Transcript of (ATS3-PLAT08) Optimizing Protocol Performance
Eddy Vande Water, Director, EMEA Field [email protected]
Andrew LeBeau, Advisory Product [email protected]
The information on the roadmap and future software development efforts is intended to outline general product direction and should not be relied on in making a purchasing decision.
Agenda
• Profiling and Refactoring
• Data Access
• Data Computing
• Other key T&T
• Server optimization
• Summary
• Consider the first version (V1) of a “complete” protocol…perhaps ~30 components
• Protocol building is typically an incremental process with much iterative design. Therefore, completion of V1 represents the documentation of an intellectual process
• However, very significant optimizations can be achieved by reviewing V1 and considering major (perhaps complete) refactoring of the protocol, using the knowledge developed from building V1
Protocol Refactoring
• Identify protocol bottlenecks
• Ctrl+T to toggle between options:
– Absolute compute time (sec)
– Compute time as a percentage of total execution time
Component Profiling
Demo: Protocol version 01
Protocol development flow
• Get a big file with activity on several targets and lots of other properties
• Need to pivot the data
• Only interested in one target
• Need the structure
• Join my activity data
• Compute a new property
• Need additional data from the database
• Only interested in a range of the data
• Create a nice report
Demo: Protocol version 02
28 seconds instead of 6 minutes!
Why? Because I used some simple principles!
• Keep the records as small as possible for what you need to do. Don’t read in things just because they are there in the file; only read in what you will use! Don’t pass anything further down the pipeline than needed.
• If writing to disk to pass information between pipelines, caches are faster than delimited text (or any other file format).
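These two principles are about Pipeline Pilot data records, but the idea carries over to any data pipeline. As a conceptual sketch (plain Python, not PilotScript; the field names are made up for illustration), keep only the fields the downstream steps will use, and pass intermediate results in a binary cache rather than re-parsing text:

```python
# Conceptual sketch, not Pipeline Pilot code: keep records small by
# dropping fields the pipeline will never use, and cache in a binary
# format instead of delimited text.
import csv
import io
import pickle

# Hypothetical input: a wide file with columns we mostly don't need.
raw = "id,target,activity,smiles,notes\n1,KDR,5.2,CCO,junk\n2,KDR,6.1,CCN,junk\n"
wanted = {"id", "target", "activity"}  # only what downstream steps use

slim_records = []
for row in csv.DictReader(io.StringIO(raw)):
    # Drop unused fields as early as possible, not at the end.
    slim_records.append({k: v for k, v in row.items() if k in wanted})

# A binary cache round-trips without re-parsing text on every read.
cache = pickle.dumps(slim_records)
assert pickle.loads(cache) == slim_records
```

The point is where the filtering happens: trimming the record at read time means every later component handles less data.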
Data Access
• All create implicit caches
• Filter before merging/caching
• Reduce the number of properties
• Merge on a sub-stream, then join back
• Sort before a join – on the primary key
• Cache Writer: use the Pre-Index options if the cache will later be joined on
Merge / Join / Group / Sort / Cluster / etc.
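The "sort before a join on the primary key" advice has a simple rationale: two streams sorted on the same key can be joined in a single linear pass instead of a nested scan per record. A minimal Python sketch of that idea (the data is invented for illustration):

```python
# Illustrative sort-merge join: both streams sorted on the primary key
# (id), then joined in one linear pass.
activities = sorted([(3, 6.1), (1, 5.2), (2, 4.8)])        # (id, activity)
structures = sorted([(2, "CCN"), (1, "CCO"), (3, "CCC")])  # (id, smiles)

joined, j = [], 0
for key, act in activities:
    # Advance the second stream until its key catches up.
    while j < len(structures) and structures[j][0] < key:
        j += 1
    if j < len(structures) and structures[j][0] == key:
        joined.append((key, act, structures[j][1]))

print(joined)  # one pass over each stream instead of a scan per record
```

An unsorted join must either hash one side (extra memory, an implicit cache) or rescan it per record; sorting first keeps the join streaming.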
• Database access should be tuned; see:
– (ATS3-PLAT04) Database Connectivity for Application Development
– (ATS2-23) Managing Data Source Connections
• PP should be located close to the database server
• Join in the database if possible
• Use batch inserts, etc.
• Use batches with the SQL Select for Each Data
Database
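To see why batch inserts matter, compare one round trip per record with one round trip per batch. A hedged sketch using Python's built-in sqlite3 module for illustration (the table and data are made up; the same principle applies to any RDBMS accessed from a protocol):

```python
# Sketch of batched inserts: executemany sends the rows as one batch
# instead of issuing one INSERT round trip per record.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (id INTEGER, target TEXT, value REAL)")

rows = [(1, "KDR", 5.2), (2, "KDR", 6.1), (3, "EGFR", 4.8)]
conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", rows)  # one batch
conn.commit()  # one transaction for the whole batch, not one per row

count, = conn.execute("SELECT COUNT(*) FROM activity").fetchone()
print(count)
```

Against a remote database server the savings are dominated by network latency, which is also why the slide recommends locating the Pipeline Pilot server close to the database.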
• Think about the order in which you need to do things
• Compared with…
When and Where to Calculate Properties
• Allows parallelization of computationally intensive tasks
• Pay attention to batch size – don’t make it too small
– Performance can be almost linear with the number of cores (our numbers and customers’)
• Can be problematic for subprotocols using R, or other external apps
Parallel Processing in Subprotocols
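The batch-size point can be sketched in plain Python (this is an analogy for parallel subprotocols, not Pipeline Pilot code): work is split into batches so that per-task overhead is amortized over many records, rather than paid once per record.

```python
# Conceptual sketch of the batch-size trade-off: submit work in batches
# so the per-task dispatch overhead is amortized over many records.
from concurrent.futures import ThreadPoolExecutor

def process_batch(batch):
    # Stand-in for a computationally intensive per-record calculation.
    return [x * x for x in batch]

records = list(range(100))
batch_size = 25  # too small: overhead dominates; too large: poor load balance
batches = [records[i:i + batch_size] for i in range(0, len(records), batch_size)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = [r for chunk in pool.map(process_batch, batches) for r in chunk]

assert results == [x * x for x in records]
```

With a sensible batch size, speed-up can approach the number of workers; with one-record batches, dispatch overhead eats the gain, which is the slide's warning.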
Demo: Protocol version 2.0
• Prefer linear pipelines
– Most efficient memory usage
• Avoid excessive branching
– Branching pipes causes data cloning, which can be expensive for large data records
• Avoid hash tables as caches
– Use a file cache instead
• Reduce usage of caches and caching components
– Merge, Group, Sort and Cluster create unseen caches
– Be mindful of child nodes
Other key T&T
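The "prefer linear pipelines" guidance can be illustrated with Python generators (an analogy, not Pipeline Pilot code): a linear chain streams one record at a time so memory stays flat, whereas a branch would need its own copy of the stream.

```python
# Illustrative linear pipeline built from generators: each stage pulls
# one record at a time, so no intermediate result set is materialized.
def read_records(n):
    for i in range(n):
        yield {"id": i, "value": i * 1.5}

def filter_records(records, threshold):
    # Filtering early keeps every later stage cheaper.
    return (r for r in records if r["value"] >= threshold)

def add_property(records):
    for r in records:
        r["doubled"] = r["value"] * 2
        yield r

# One linear chain; a branch off the middle would force cloning the stream.
pipeline = add_property(filter_records(read_records(10), 6.0))
out = list(pipeline)
print(len(out))
```

Each record flows through all three stages before the next record is read, which is the memory behavior the slide attributes to linear pipelines.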
Other key T&T (ctd.)
• General relative speed of implementations:
– Components >= Pilot Script >= Java
• Protocol Function
– Use AJAX to call a protocol from within a page
– Can give better performance if only part of a report needs to be updated
• Be careful!
– A Run To Completion (RTC) subprotocol can slow down protocol execution: use sparingly…
– Checkpoints are very useful for debugging, but should not be kept once the protocol is finished.
__PoolID
• PP Server uses daemons and job pooling to speed up job execution
• Setting __PoolID sets which job pool your protocol is executed in
• You CANNOT put the __PoolID parameter on the protocol itself
=> See the admin discussion in (ATS3-PLAT11) Advanced Planning for AEP Deployments_Migrations
Job Pooling Illustration
• Built-in job pools (some job pools are configured to run OOTB):
– Warm-up pool
– Keep-warm pool
– Default pool
• Job pools and impersonation
Using __PoolID
Server optimization
Cluster
• Built into Pipeline Pilot
Grid
• Leverages an existing grid engine:
– Sun Grid Engine
– PBS Pro
– LSF
– Custom scripts
See (ATS3-PLAT11) Advanced Planning for AEP Deployments_Migrations, and also (ATS2-07) Solving Large Computing Challenges with Pipeline Pilot
• Protocol refactoring is a critical step
• Applying basic principles can dramatically improve performance
• Fine tuning requires good knowledge of the context
• Use a specific job pool for your apps
• The Accelrys Enterprise Platform is very scalable
Summary
For more information on the Accelrys Tech Summits and other IT & Developer information, please visit: https://community.accelrys.com/groups/it-dev