(ATS3-PLAT08) Optimizing Protocol Performance
Transcript of (ATS3-PLAT08) Optimizing Protocol Performance
Eddy Vande Water, Director, EMEA Field [email protected]
Andrew LeBeau, Advisory Product [email protected]
The information on the roadmap and future software development efforts is intended to outline general product direction and should not be relied on in making a purchasing decision.
Agenda
• Profiling and Refactoring
• Data Access
• Data Computing
• Other key T&T
• Server optimization
• Summary
• Consider the first version (V1) of a “complete” protocol…perhaps ~30 components
• Protocol building is typically an incremental process with much iterative design. Therefore, completion of V1 represents the documentation of an intellectual process
• However, very significant optimizations can be achieved by reviewing V1 and considering major (perhaps complete) refactoring of the protocol, using the knowledge developed from building V1
Protocol Refactoring
• Identify protocol bottlenecks
• Ctrl+T to toggle between options:
– Absolute compute time (sec)
– Compute time as a percentage of total execution time
Component Profiling
Demo: Protocol version 01
Protocol development flow
• Get a big file with activity on several targets and lots of other properties
• Need to pivot the data
• Only interested in one target
• Need the structure
• Join my activity data
• Compute a new property
• Need additional data from the database
• Only interested in a range of the data
• Create a nice report
Demo: Protocol version 02
28 seconds instead of 6 minutes!
Why? Because I used some simple principles!
• Keep the records as small as possible for what you need to do. Don’t read in things just because they are there in the file; only read in what you will use! Don’t pass anything further down the pipeline than needed.
• If writing to disk to pass information between pipelines, caches are faster than delimited text (or any other file format).
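These two principles are about Pipeline Pilot data records, but the idea carries over to any data pipeline. As a conceptual sketch (plain Python, not PilotScript; the field names are made up for illustration), keep only the fields the downstream steps will use, and pass intermediate results in a binary cache rather than re-parsing text:

```python
# Conceptual sketch, not Pipeline Pilot code: keep records small by
# dropping fields the pipeline will never use, and cache in a binary
# format instead of delimited text.
import csv
import io
import pickle

# Hypothetical input: a wide file with columns we mostly don't need.
raw = "id,target,activity,smiles,notes\n1,KDR,5.2,CCO,junk\n2,KDR,6.1,CCN,junk\n"
wanted = {"id", "target", "activity"}  # only what downstream steps use

slim_records = []
for row in csv.DictReader(io.StringIO(raw)):
    # Drop unused fields as early as possible, not at the end.
    slim_records.append({k: v for k, v in row.items() if k in wanted})

# A binary cache round-trips without re-parsing text on every read.
cache = pickle.dumps(slim_records)
assert pickle.loads(cache) == slim_records
```

The point is where the filtering happens: trimming the record at read time means every later component handles less data.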
Data Access
• All create implicit caches
• Filter before merging/caching
• Reduce the number of properties
• Merge on a sub-stream, then join back
• Sort before a join – on the primary key
• Cache Writer: use the Pre-Index options if the cache will later be joined on
Merge / Join / Group / Sort / Cluster / etc.
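The "sort before a join on the primary key" advice has a simple rationale: two streams sorted on the same key can be joined in a single linear pass instead of a nested scan per record. A minimal Python sketch of that idea (the data is invented for illustration):

```python
# Illustrative sort-merge join: both streams sorted on the primary key
# (id), then joined in one linear pass.
activities = sorted([(3, 6.1), (1, 5.2), (2, 4.8)])        # (id, activity)
structures = sorted([(2, "CCN"), (1, "CCO"), (3, "CCC")])  # (id, smiles)

joined, j = [], 0
for key, act in activities:
    # Advance the second stream until its key catches up.
    while j < len(structures) and structures[j][0] < key:
        j += 1
    if j < len(structures) and structures[j][0] == key:
        joined.append((key, act, structures[j][1]))

print(joined)  # one pass over each stream instead of a scan per record
```

An unsorted join must either hash one side (extra memory, an implicit cache) or rescan it per record; sorting first keeps the join streaming.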
• Database access should be tuned; see:
– (ATS3-PLAT04) Database Connectivity for Application Development
– (ATS2-23) Managing Data Source Connections
• PP should be located close to the database server
• Join in the database if possible
• Use batch inserts, etc.
• Use batches with the SQL Select for Each Data
Database
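To see why batch inserts matter, compare one round trip per record with one round trip per batch. A hedged sketch using Python's built-in sqlite3 module for illustration (the table and data are made up; the same principle applies to any RDBMS accessed from a protocol):

```python
# Sketch of batched inserts: executemany sends the rows as one batch
# instead of issuing one INSERT round trip per record.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (id INTEGER, target TEXT, value REAL)")

rows = [(1, "KDR", 5.2), (2, "KDR", 6.1), (3, "EGFR", 4.8)]
conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", rows)  # one batch
conn.commit()  # one transaction for the whole batch, not one per row

count, = conn.execute("SELECT COUNT(*) FROM activity").fetchone()
print(count)
```

Against a remote database server the savings are dominated by network latency, which is also why the slide recommends locating the Pipeline Pilot server close to the database.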
• Think about the order in which you need to do things
• Compared with…
When and Where to Calculate Properties
• Allows parallelization of computationally intensive tasks
• Pay attention to batch size – don’t make it too small
– Performance can be almost linear with the number of cores (our numbers and customers’)
• Can be problematic for subprotocols using R, or other external apps
Parallel Processing in Subprotocols
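The batch-size point can be sketched in plain Python (this is an analogy for parallel subprotocols, not Pipeline Pilot code): work is split into batches so that per-task overhead is amortized over many records, rather than paid once per record.

```python
# Conceptual sketch of the batch-size trade-off: submit work in batches
# so the per-task dispatch overhead is amortized over many records.
from concurrent.futures import ThreadPoolExecutor

def process_batch(batch):
    # Stand-in for a computationally intensive per-record calculation.
    return [x * x for x in batch]

records = list(range(100))
batch_size = 25  # too small: overhead dominates; too large: poor load balance
batches = [records[i:i + batch_size] for i in range(0, len(records), batch_size)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = [r for chunk in pool.map(process_batch, batches) for r in chunk]

assert results == [x * x for x in records]
```

With a sensible batch size, speed-up can approach the number of workers; with one-record batches, dispatch overhead eats the gain, which is the slide's warning.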
Demo: Protocol version 2.0
• Prefer linear pipelines
– Most efficient memory usage
• Avoid excessive branching
– Branching pipes causes data cloning, which can be expensive for large data records
• Avoid hash tables as caches
– Use a file cache instead
• Reduce usage of caches and caching components
– Merge, Group, Sort and Cluster create unseen caches
– Be mindful of child nodes
Other key T&T
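The "prefer linear pipelines" guidance can be illustrated with Python generators (an analogy, not Pipeline Pilot code): a linear chain streams one record at a time so memory stays flat, whereas a branch would need its own copy of the stream.

```python
# Illustrative linear pipeline built from generators: each stage pulls
# one record at a time, so no intermediate result set is materialized.
def read_records(n):
    for i in range(n):
        yield {"id": i, "value": i * 1.5}

def filter_records(records, threshold):
    # Filtering early keeps every later stage cheaper.
    return (r for r in records if r["value"] >= threshold)

def add_property(records):
    for r in records:
        r["doubled"] = r["value"] * 2
        yield r

# One linear chain; a branch off the middle would force cloning the stream.
pipeline = add_property(filter_records(read_records(10), 6.0))
out = list(pipeline)
print(len(out))
```

Each record flows through all three stages before the next record is read, which is the memory behavior the slide attributes to linear pipelines.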
Other key T&T (ctd.)
• General relative speed of implementations:
– Components >= Pilot Script >= Java
• Protocol Function
– Use AJAX to call a protocol from within a page
– Can give better performance if only part of a report needs to be updated
• Be careful!
– A Run To Completion (RTC) subprotocol can slow down protocol execution: use sparingly…
– Checkpoints are very useful for debugging, but should not be kept once the protocol is finished.
__PoolID
• PP Server uses daemons and job pooling to speed up job execution
• Setting __PoolID sets which job pool your protocol is executed in
• You CANNOT put the __PoolID parameter on the protocol itself
=> See the admin discussion in (ATS3-PLAT11) Advanced Planning for AEP Deployments_Migrations
Job Pooling Illustration
• Built-in job pools (some job pools are configured to run OOTB):
– Warm-up pool
– Keep-warm pool
– Default pool
• Job pools and impersonation
Using __PoolID
Server optimization
Cluster
• Built into Pipeline Pilot
Grid
• Leverages an existing grid engine:
– Sun Grid Engine
– PBS Pro
– LSF
– Custom scripts
See (ATS3-PLAT11) Advanced Planning for AEP Deployments_Migrations, and also (ATS2-07) Solving Large Computing Challenges with Pipeline Pilot
• Protocol refactoring is a critical step
• Applying basic principles can dramatically improve performance
• Fine tuning requires good knowledge of the context
• Use a specific job pool for your apps
• The Accelrys Enterprise Platform is very scalable
Summary
For more information on the Accelrys Tech Summits and other IT & Developer information, please visit: https://community.accelrys.com/groups/it-dev