Follow the evidence: Troubleshooting Performance Issues

T.K. Horeis, salesforce.com, Cloud and Industry Architect, @TKHoreis


Are you hitting your Governor Limits? Is your system performance not up to expectations? Are you worried about your capacity to grow or merge multiple orgs? Then this session is for you. Join us as we line up the suspects, find out who's guilty, and show how you can avoid being a victim in the closest thing to a murder mystery at this year's Dreamforce. We'll walk you through real situations and, most importantly, how we solved them.


Safe harbor

Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter.
These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.


Our customer, let’s call them Brand-X

Tip: Don’t Rush To Judgement

> 9% > 10%

They Were Simply Using Too Much Capacity

Tip: Don’t Rush To Judgement

How Did We Know?

• Count
• Long-running operations
• Total Runtime
• CPU time
• Db CPU time
• Buffer Gets

Tip: Submit a Case To Premier Support

So We Recommended That They Needed To…

▪ Reduce combined buffer gets and DB CPU by 40% – 50%
▪ Reduce combined runTime by 20% – 25%
▪ Reduce combined App cpuTime by 20% – 25%

Here’s What We Knew At The Time

• Capacity usage was proportionally LARGE

• The EU org will be merged into NA

• EU users would use NA business processes only

• Some EMEA users (Ireland) have already been moved over.

And this is where our investigation begins…

Lots Of Evidence At The Scene Of The Crime

• REST
• SOAP
• Visualforce Page Loads

So Just What Are They Loading So Many Times?

• /apex/qbdialer__sid 1814897
• /apex/questionpage 6314
• /apex/oppinformationrequest 5871
• /apex/merchantaddys 5636
• /apex/dnralert 5611
• /apex/redemptionaddy 5503
• /apex/account_geocoder 5080
• /apex/pastdeals 4589
• /apex/inlineoptions 3986
• /apex/accountgoogleanalytics 3444
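To see how far the top page outweighs the rest, the counts above can be tallied directly. The sketch below is illustrative only: it copies the ten figures from the slide and computes each page's share of the listed loads (not of all Visualforce traffic, which the slide does not give). The function name is made up for this example.

```python
# Page-load counts copied from the slide above (top 10 Visualforce pages).
page_loads = {
    "/apex/qbdialer__sid": 1814897,
    "/apex/questionpage": 6314,
    "/apex/oppinformationrequest": 5871,
    "/apex/merchantaddys": 5636,
    "/apex/dnralert": 5611,
    "/apex/redemptionaddy": 5503,
    "/apex/account_geocoder": 5080,
    "/apex/pastdeals": 4589,
    "/apex/inlineoptions": 3986,
    "/apex/accountgoogleanalytics": 3444,
}


def load_shares(counts):
    """Return (page, share of listed loads) pairs, largest first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(page, loads / total) for page, loads in ranked]


if __name__ == "__main__":
    for page, share in load_shares(page_loads):
        print(f"{page:32s} {share:7.2%}")
```

On these figures the dialer page accounts for about 97.5% of the listed loads, which is why it became the first suspect.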

And When We Dig Further…

Where the median # of calls per day is ~1500.
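One simple way to surface users like these is to compare each user's daily call count with the median. The sketch below assumes per-user counts are already available as a dictionary; the sample numbers, the user names, and the factor of 10 are hypothetical and chosen only for illustration.

```python
from statistics import median


def flag_outliers(calls_per_user, factor=10):
    """Return (user, calls) pairs whose count exceeds factor x the median, largest first."""
    typical = median(calls_per_user.values())
    heavy = [(user, calls) for user, calls in calls_per_user.items()
             if calls > factor * typical]
    return sorted(heavy, key=lambda item: item[1], reverse=True)


# Hypothetical daily call counts, invented for this example.
sample = {
    "user_a": 1400,
    "user_b": 1500,
    "user_c": 1600,
    "user_d": 250000,
    "user_e": 1450,
}

if __name__ == "__main__":
    print(flag_outliers(sample))
```

The median is used rather than the mean because a handful of extreme users would otherwise drag the baseline upward and hide themselves.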

Were those top 4 users REALLY making hundreds of thousands of calls per day?

Not Exactly…

An old version of Power Dialer from InsideSales was the source of this problem.

• Users that liberally use tabs can magnify browser issues
• Polling sidebar component created an issue
• Controller ran unnecessary queries
• Vendor produced a patch
• Browser issues can linger if users don’t restart

Bottom Line: A 64% Reduction in VF loads

Lessons Learned

▪ It’s easier to find the culprit when you isolate the scenario.

▪ Don’t make assumptions. Let the data guide you.

• Just because it’s an AppExchange app, doesn’t mean it’s correct.

▪ Trust, but verify

Reporting

Reporting Usage Is Off The Charts!

• ~50% of all db gets come from reports
• Daily Report Stats:
  • ~4000 reports being run daily
  • 45K-60K report executions daily
  • 10B-15B gets from reports
  • 6 users were using 9-13% of their report gets daily.

So Where Would You Start?

Why 4000 Unique Reports?

Lack of Report Governance

• Anyone can create any report they want
• Run it as often as they want
• Many reports are the same or almost the same
• Lather, Rinse, Repeat problem with bad reports

➢ Restrict who can create reports

➢ Restrict reporting access to some objects.

➢ Establish process to collect reporting requests

➢ Create in-demand reports, so they can be shared.

➢ Use scheduled reports and dynamic dashboards

➢ Training, Training, Training

Bottom Line: A work in progress…

Why 45k-60k Report Executions Per Day?

6 users routinely executed a set of similar reports thousands of times / day.

It turns out they were using a browser script to continually refresh.


What Would You Do?

Possible Solutions

• Restrict who can run reports: Customer indicated that they couldn’t do that.
• Tell users to stop running the script: This was done, but wasn’t immediately effective.
• Restrict login hours: Customer wasn’t willing to do that just yet.
• Improve the report: Done (see improvement in db gets).
• Implement workflows: Planned.

Tip: What you see as the problem could simply be another symptom.

Bottom Line: A 34% Reduction in the Execs

Why Are 10B-15B Db Gets Coming From Reports?

Problems

❑ Hundreds of unselective reports that have > 1M gets/exec
❑ Many unselective reports that have 15M-35M gets/exec
❑ Two key report profiles contribute ~30% of overall get count

Solutions

➢ Restrict who can create reports
➢ Your job doesn’t stop with implementing. You need to train your users (cheat sheets, index lists, sample reports).
➢ Revisit the project to audit, profile, and optimize.

Document available indexes

List of Indexed Fields

▪ Primary keys
  • Id
  • Name
  • OwnerId
▪ Foreign keys
  • Lookup
  • Master-detail
  • CreatedById
  • LastModifiedById
▪ Audit dates
  • CreatedDate
  • LastActivityDate
  • LastModifiedDate
  • SystemModstamp
▪ Custom fields
  • Unique
  • External ID

Create your own indexes

Need Additional Indexes?

▪ Salesforce support can create single and two-column custom indexes on most fields (with the exception of multi-picklists, formula fields that reference other objects, and Long Text Area/Rich Text Area)
▪ Open a case and include:
  ➢ Sample Query
  ➢ Bind Variables
  ➢ Org ID and user ID who would run the query
  ➢ Suggested index – OPTIONAL
▪ If they create an index, then SAVE and DOCUMENT where the index was created

Understand SOQL query optimization

▪ Standard indexes
  • Simple predicate targets < 20% of total records or 666K
  • AND predicate targets < 20% of total records
  • OR predicate targets < 10% or 333K of total records
▪ Custom indexes
  • Simple predicate targets < 10% of total records or 333K
  • AND predicate targets < 666K of total records
  • OR predicate targets < 10% or 333K of total records
▪ LIKE
  • Tests first 100K rows for selectivity
▪ Predicates that can lead to full table scans
  • Does not contain
  • Leading wildcards
  • Formula fields
  • NULL
  • Not equal
  • Contains
  • Not In

Cheat Sheet: Selectivity Rules
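The simple-predicate rules above reduce to a small calculation. The sketch below uses the figures exactly as they appear on the slide and assumes that "X% of total records or N" means the lower of the two values; that reading, and the function names, are assumptions. The platform's real thresholds have differed between releases, so check current documentation before relying on these numbers.

```python
# (percent of total records, absolute cap) for a simple predicate,
# taken from the slide. Reading "or" as "whichever is lower" is an assumption.
SIMPLE_PREDICATE_RULES = {
    "standard": (20, 666_000),
    "custom": (10, 333_000),
}


def selectivity_threshold(total_records, index_type):
    """Largest row count a filter may target and still count as selective."""
    percent, cap = SIMPLE_PREDICATE_RULES[index_type]
    return min(total_records * percent // 100, cap)


def is_selective(matching_records, total_records, index_type):
    """True when the filter targets fewer rows than the threshold."""
    return matching_records < selectivity_threshold(total_records, index_type)


if __name__ == "__main__":
    for total in (1_000_000, 10_000_000):
        for kind in ("standard", "custom"):
            print(total, kind, selectivity_threshold(total, kind))
```

A filter matching 150,000 of 1,000,000 rows would pass under the standard-index rule here but fail under the custom-index rule, which is the kind of difference worth putting on a cheat sheet for report authors.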

Potential Improvements Identified

❑ Key Opportunity Report
  • 9-13% of total report gets/day
❑ Custom Object Report
  • 0-16% of report buffer gets/day
❑ Various Account Reports
❑ Various Lead Reports
❑ Various Opportunity Reports

WHERE Opportunity.Writeup_Status_del__c = 'Needs Details'

WHERE Opportunity.Writeup_Status_del__c = 'Needs DQ'

Can’t Index

• This filter field is a complex formula field.
✓ Create sister field using trigger.
✓ Index sister field.
✓ Modify dependent reports.

Potential Improvements Identified

WHERE Call_List_Priority__c IS [NOT] NULL OR Call_List_Priority__c < '.00000000000000000201'

• The '<' and '>' operators can’t be optimized on text fields


✓ Investigate process that populates this field
✓ Retype field as NUMBER
✓ Modify reports so < operator can be optimized
✓ Index field
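A side effect of storing numbers as text is that comparisons follow character order instead of numeric order, so a '<' filter can return rows the report author never intended. The sketch below only illustrates that ordering difference; it says nothing about how the query optimizer behaves, and the values are hypothetical.

```python
from decimal import Decimal

# Hypothetical priority values stored as text, invented for this example.
priorities = ["9", "10", ".5", "100"]

# Text ordering compares character by character.
as_text = sorted(priorities)

# Numeric ordering compares the values the strings represent.
as_number = sorted(priorities, key=Decimal)

if __name__ == "__main__":
    print("text order:   ", as_text)
    print("numeric order:", as_number)
    print('"10" < "9" as text:', "10" < "9")
```

Retyping the field as a number removes this ambiguity in addition to making the filter indexable.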

Potential Improvements Identified

• Using non-selective queries
• Customer has wide objects (e.g. Opportunity)
• Determine appropriate selective field
• Index field
• Create Skinny Tables

Bottom Line: A 98% Reduction in Db Gets

Lessons Learned

▪ Report on reports to isolate the problem
  • Cost of individual reports
  • Number of report runs per day
  • Non-selective queries
  • Problematic queries
▪ Report Governance
  • Who is the author of problematic reports?
  • Custom Report Type models that the masses can customize
▪ Is it really a reporting problem?
  • Data Model
  • Workflow, Business process, etc.
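"Report on reports" can be as simple as rolling an execution log up by report. The sketch below assumes a log of (report name, buffer gets) rows, one per run; that log format, the report names, and the numbers are all hypothetical, since the slides do not show how the figures were collected.

```python
from collections import defaultdict


def summarize_reports(executions):
    """Aggregate (report_name, buffer_gets) rows into per-report totals.

    Returns (name, runs, total_gets, gets_per_run) tuples, costliest first.
    """
    runs = defaultdict(int)
    total_gets = defaultdict(int)
    for name, gets in executions:
        runs[name] += 1
        total_gets[name] += gets
    summary = [
        (name, runs[name], total_gets[name], total_gets[name] // runs[name])
        for name in runs
    ]
    return sorted(summary, key=lambda row: row[2], reverse=True)


# Hypothetical one-day log, invented for this example.
log = [
    ("Open Opportunities", 20_000_000),
    ("Open Opportunities", 22_000_000),
    ("My Leads", 50_000),
    ("My Leads", 70_000),
    ("My Leads", 60_000),
]

if __name__ == "__main__":
    for row in summarize_reports(log):
        print(row)
```

Sorting by total gets shows where capacity goes, while the per-run figure separates a cheap report that runs too often from an expensive report that is unselective.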

Migration

What’s Wrong With Our Migration?

The customer expressed concerns because a Bulk API insert of 17k accounts took 27 minutes with workflows, validations, and triggers enabled.

What’s wrong here?

They also requested that we lower the code coverage requirement to zero.

Why?

What We Found

DupeBlocker was in use, and the vendor advises turning off triggers during migrations. Its triggers update a custom object record during account updates, inserts, and deletes, which can cause significant contention problems that slow down bulk DML operations.

The request to turn code coverage tests to 0% was related to poorly constructed tests that required triggers to be active. This is clearly at odds with Salesforce Best Practice. The following was provided as a workaround.

Tip: Trust, but verify.

// Exit the trigger early when the bypass flag in the custom setting is set.
if (custom_setting__c.getInstance().disable_all_triggers__c == true) return;

We suggested adjusting batch sizes on bulk loads from the default to a lower number. A value of 60 was found to be optimal.
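To make the batch-size recommendation concrete, the sketch below splits a 17k-row load into batches of 60 and counts them. It is a plain illustration of the arithmetic, not the loader's own logic; the function name is made up, and the integers stand in for account rows.

```python
def split_into_batches(records, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


# Integers stand in for the 17k account rows mentioned above.
accounts = list(range(17_000))
batches = split_into_batches(accounts, 60)

if __name__ == "__main__":
    print(len(batches), "batches; last batch holds", len(batches[-1]), "rows")
```

Smaller batches mean more round trips, so a value like 60 only pays off when per-batch contention (here, from the trigger updating a shared record) costs more than the extra requests.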

Tip: Follow Best Practices. They’ve been created for a reason.

Bottom Line: 66% Load Time Improvement

Prep your data to avoid overhead

Disable actions that fire on insert
• Triggers
• Workflow Rules
• Validation Rules

Defer sharing calculations

Just make sure you turn it back on again!!!

OR … load with Public default sharing

What We Found

Tip: Trust, but verify.

What’s the concern here?

Upon further investigation, we found:

Lessons Learned

▪ Trust, but verify
  • Initial reports of causality may be misleading
  • Follow the data to the cause
  • Non-selective queries
  • Problematic queries
▪ Follow Best Practices
  • Prep your data in advance for the best results
  • Understand how to structure your bulk operations
  • You may need to turn off sharing and other automatic functionality to improve performance.

Resources

Architect Core Resource page
• Featured content for architects
• Articles, papers, blog posts, events
• Follow us on Twitter

Updated weekly!

http://developer.force.com/architect

Resources

Apex Governor Limits

http://www.salesforce.com/us/developer/docs/apexcode/Content/apex_gov_limits.htm

Best Practices for Deployments with Large Data Volumes http://www.salesforce.com/us/developer/docs/ldv/salesforce_large_data_volumes_bp.pdf

Loading Large Data Sets with the Force.com Bulk API http://wiki.developerforce.com/page/Loading_Large_Data_Sets_with_the_Force.com_Bulk_API

Report Performance https://na1.salesforce.com/help/doc/en/salesforce_reportperformance_cheatsheet.pdf

Additional Resources

Bulk API Developers Guide http://www.salesforce.com/us/developer/docs/api_asynch/api_bulk.pdf

Bulk API Errors http://www.salesforce.com/us/developer/docs/api_asynch/Content/asynch_api_reference_errors.htm

Batch Apex http://www.salesforce.com/us/developer/docs/apexcode/index_Left.htm#StartTopic=Content/apex_batch.htm

Failing Safe with Apex Data Loader http://tedhusted.blogspot.com/2012/04/failing-safe-with-apex-data-loader-for.html

Tools
– Data Loader - http://wiki.developerforce.com/page/Data_Loader
– Dell Boomi - http://www.boomi.com/
– IBM CastIron - http://ibm.co/PO0Qv8
– Informatica - http://bit.ly/OeRcCi

Additional Info

Record Lock Lifecycle

▪ Record Locks = Data Integrity
▪ Salesforce locks a record before executing a DML operation.
  - This is done before starting the save process.
▪ The save process is documented on this page: Triggers and Order of Execution.
  - Records will remain locked until commit.
▪ Salesforce will wait up to 10 seconds to lock a record before throwing an UNABLE_TO_LOCK_ROW error.
  - Even if no errors occur, waiting for locks can significantly slow DML operations.

Parent-Child Relationships

▪ Insert of Contact requires locking the parent Account.
▪ Insert or Update of Event requires locking both the parent Contact and the parent Account.
▪ Insert or Update of Task requires locking both the parent Contact and parent Account, if the Task is marked as complete.
▪ Insert of Case requires locking the parent Contact and parent Account.
▪ In objects that are part of a master/detail relationship, updating a detail record requires locking the parent if roll-up summary fields exist.

Multi-threaded Operations

▪ Locking should be taken into consideration for API integrations, data loads, apex future methods, etc. where requests will be run in parallel.

▪ Ideally requests that run concurrently should not require the same locks.

  - Different requests can’t update same records
  - Different requests can’t update multiple children of the same parent
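One way to satisfy the second rule above is to sort or group child records by parent before splitting them into batches, so that no two parallel requests contend for the same parent lock. The sketch below is a minimal illustration in plain Python; the record layout, the key name, and the packing strategy are assumptions made for this example.

```python
from collections import defaultdict


def batches_by_parent(records, parent_key, max_batch_size):
    """Pack records into batches so all children of one parent share a batch.

    A parent with more children than max_batch_size still gets one
    (oversized) batch, because splitting it would reintroduce contention.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[parent_key]].append(record)

    batches, current = [], []
    for children in groups.values():
        # Start a new batch when this parent's children would not fit.
        if current and len(current) + len(children) > max_batch_size:
            batches.append(current)
            current = []
        current.extend(children)
    if current:
        batches.append(current)
    return batches


# Hypothetical Contact rows, invented for this example.
contacts = [
    {"Id": "c1", "AccountId": "a1"},
    {"Id": "c2", "AccountId": "a2"},
    {"Id": "c3", "AccountId": "a1"},
    {"Id": "c4", "AccountId": "a3"},
    {"Id": "c5", "AccountId": "a2"},
]

if __name__ == "__main__":
    for batch in batches_by_parent(contacts, "AccountId", 3):
        print([row["Id"] for row in batch])
```

With batches built this way, each parent Account is locked by at most one request at a time, even when the batches run in parallel.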


Prioritize the Data into Tiers

[Diagram labels: Tier 1 Objects, Tier 2 Objects, Tier 3: On Premise, SFDC API, SOA]

Tier 1: Normal Data

▪ Normal SFDC data
▪ List views, standard reporting, and search
▪ Data set should be ~< 10 Million records
▪ Can include snapshot summary of key data to facilitate query optimization

Tier 2: Storage Objects

▪ Custom Objects in a read-only approximation
▪ No standard reporting and search
▪ Use Visualforce pages to make filtered search queries and limit view to users
▪ Data set should be ~< 50 Million records

Tier 3: On-Premise Objects

▪ Stored in an on-premise/data warehouse database

▪ Viewable only through mashups

▪ Integration processes move objects from SFDC to Tier 3, and from Tier 3 to other Tiers

▪ Part of a larger SOA framework

▪ Data set can be > 50 Million records

Denormalizing Data Increases Performance

▪ SELECT Name FROM Contact WHERE Account.SLA_Serial_Number__c IN :ListOfSerialNums

▪ Solution
  ▪ Copy SLA_Serial_Number to a field in Contact (don’t use a formula) and make the field an External Id
  ▪ OR query for the account IDs with the SLA Serial number first

Batch Apex

▪ Running concurrent Batch Apex jobs on the same record set can lead to contention.

- Concurrent jobs will likely perform DML that requires locks on the same records.

▪ One workaround is to parallelize jobs by using an Autonumber field and a Formula field using the MOD function on the Autonumber field.

  - This ensures that each job will operate on an entirely different record set.
  - Be careful with records that require locking their parent records.
    ▪ For example, the child in a master-detail relationship.
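The MOD workaround above amounts to splitting the records by remainder, with each job filtering on one remainder value. The sketch below shows that split on plain integers standing in for the numeric part of an Autonumber field; the function name and the job count of 3 are hypothetical.

```python
def partition_by_mod(autonumbers, job_count):
    """Assign each autonumber to the job whose index equals number MOD job_count."""
    partitions = [[] for _ in range(job_count)]
    for number in autonumbers:
        partitions[number % job_count].append(number)
    return partitions


# Integers 1..10 stand in for the numeric part of an Autonumber field.
parts = partition_by_mod(range(1, 11), 3)

if __name__ == "__main__":
    for job, numbers in enumerate(parts):
        print("job", job, "->", numbers)
```

Because every number has exactly one remainder, the partitions never overlap. Parent locks are a separate concern: two children of the same parent can still land in different partitions.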

Batch Apex – Avoid Errors in Bulk DML

global void execute(Database.BatchableContext bc, List<Account> scope) {
    for (Account acc : scope) { /* do something */ }
    // allOrNone = false: records without errors are still saved
    Database.SaveResult[] results = Database.update(scope, false);
    /* look at results */
}

▪ If an error occurs in this save operation, Salesforce will try to save it one record at a time.

▪ This performs significantly slower than a bulk save without errors.

The Key To Good Discovery Questions

▪ Don’t lead the witness.
▪ Confirm answers from multiple sources.
▪ Ask about median, peak, exceptional event volumes.
▪ When you think you’ve got it all, ask: is there anything we’re forgetting?
▪ Don’t just think about the requirements for today. Ask about growth projections and make some yourself.
▪ Know your governor limits and bulk limits, etc.

Questions To Ask

▪ Stakeholder Expectations
  – What are the performance criteria?
  – How long will migration/synchronization take?
  – Are there business-mandated update windows?

▪ Reporting Requirements
  – How many rows are expected in results?
  – Do they need to view all results or is paging a possibility?
  – Understand Performance expectations.
  – Are Reports Segmented or Filtered?
  – Report Samples may drive additional questions.

▪ Data Access / Sharing
  – What is the likely growth in data volumes?
  – Will the current plan support that growth?
  – How can data updates be segmented?
  – If not, start planning NOW.
  – Dig deep on role hierarchy. How many levels?
  – Is there a large number of OR complex sharing rules?
  – How many records are owned by particular users? (max. vs. median)
  – Will they be using Territory Management? If so, how many levels?

▪ Data Integration
  – What data will be native to SFDC vs. updated from an external master?
  – Will changes be driven from SFDC or the external system?
  – How frequently will it change? Update windows or immediately?
  – Do you need workflows or triggers?

▪ Replication
  – Do I need backups for compliance?
  – What is my backup strategy?
  – When and how can I archive the different types of data in Salesforce?

Questions To Ask

▪ Rollback
  – What happens if an integration load or synch fails?
  – How long will rollback take?
  – What services are affected during a rollback?
  – What is the mitigation plan?

▪ Purging / Archiving Data
  – Is some data transient?
  – What data can be archived?
  – When can it be archived?
  – What are the restore/recover requirements?
  – What are the Compliance / legal requirements on access / availability?

Analyzing Performance – Developer Console

▪ Use the Developer Console to analyze server-side performance.
▪ Apex debug logs will be generated for every server request you perform while the window is open.
▪ Several tools are available in the Developer Console to find performance hotspots.

Leveraging Bulk API Best Practices

▪ Use Parallel Mode when possible.
  – See FAQ for scenarios when you would use serial processing
▪ Organize Batches to Minimize Lock Contention
▪ Be Aware of Operations that Increase Lock Contention
  – Creating new Users
  – Updating ownership of records with private sharing
  – Updating user roles
  – Updating territory hierarchies
  – SOLUTION: Create separate jobs to process data in serial mode.
▪ Minimize # of Fields
  – Foreign keys, lookup relationships, and summary fields are frequent culprits.
▪ Minimize or, better yet, eliminate Workflow Actions.
▪ Minimize or Eliminate Triggers
▪ Optimize Batch Size

  – Any batch that takes more than 10 minutes is suspended and returned to the queue for later processing. The best course of action is to submit batches that process in less than 10 minutes.
  – Techniques: Start with 5000 records and adjust the batch size based on processing time. If it takes more than five minutes to process a batch, it may be beneficial to reduce the batch size. If it takes a few seconds, the batch size should be increased. If you get a timeout error when processing a batch, split your batch into smaller batches, and try again.

▪ Defer complex sharing rules
▪ Speed of operation: Insert, then Update, then Upsert (involves implicit query)
▪ Group and sequence data to avoid parent record locking
▪ Remember database statistics calculate overnight, so wait to do performance testing
▪ Tune the batch size (HTTP keepalives, GZIP compression)
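The batch-size technique in the list above (start at 5000, shrink when a batch runs past five minutes or times out, grow when it finishes in seconds) can be written as a small feedback rule. The sketch below is one possible reading: halving, doubling, and treating "a few seconds" as under 10 seconds are assumptions, and the function name is made up. The 10,000 cap reflects the Bulk API limit on records per batch.

```python
MAX_BATCH_RECORDS = 10_000  # Bulk API limit on records in one batch
START_BATCH_SIZE = 5_000    # starting point suggested on the slide


def next_batch_size(current_size, seconds_taken, timed_out=False):
    """Suggest the next batch size from how the last batch behaved."""
    if timed_out or seconds_taken > 300:
        # Too slow (over five minutes) or timed out: halve (assumed step).
        return max(1, current_size // 2)
    if seconds_taken < 10:
        # Finished in "a few seconds" (assumed < 10 s): double, up to the cap.
        return min(MAX_BATCH_RECORDS, current_size * 2)
    return current_size


if __name__ == "__main__":
    size = START_BATCH_SIZE
    for seconds in (400, 120, 4):
        size = next_batch_size(size, seconds)
        print("after a", seconds, "second batch ->", size)
```

In practice the adjustment would be driven by measured batch durations from a test load, and the step sizes tuned to how sensitive the org's triggers and sharing rules are to batch size.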

Leveraging Batch Apex Best Practices

▪ Consider setting the batch size to the maximum that the execute method can support without running into governor limits. Default is 200.
▪ If you are operating on a large volume of data, limit the Query Locator size and consider running concurrent Batch Apex jobs.
▪ Chain Batch Apex Jobs:
  • Using Apex Scheduler
    In the finish method of the Batch Apex job, create an Apex Scheduler instance to run just once and schedule the next Batch Apex job.
  • Using Email Services
    Create an Apex class that implements the Messaging.InboundEmailHandler interface.
    Configure an Inbound Email Service Handler.
    In the finish method of the Batch Apex job, send an email to the Inbound Email Handler.
    In the Inbound Email Handler class, submit the next Batch Apex job.