Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

45

Transcript of Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Page 1: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”
Page 2: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Thomas KyteOracle Corporation

Finding out what’s wrong – in search of “fast=true”

Page 3: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Who am I

• Been with Oracle for 10 years

• User of Oracle for almost 16 years

• The “Tom” behind AskTom in Oracle Magazine

www.oracle.com/oramag

• Expert One on One Oracle• Beginning Oracle

Page 4: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

My Approach

Page 5: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

What we’ll be doing

• My 3 best friends – autotrace, tkprof and statspack• Their friends – dbms_profiler, runstats, and even

jdev• Build(ing) a test environment• Design to perform, don’t tune to perform (it is a 4

letter word)• Have metrics – and live up to them (and only them)• Benchmark, Benchmark, Benchmark• Instrument your code• Don’t look for shortcuts

Page 6: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Break Glass in case of Emergency

• Triage– In most cases you have no prior information. You are starting from

scratch– Get a statspack– Isolate an application

• Trace it

• If you implement some of the ideas I’ll talk about– You won’t have to triage (you can predict)– When you need to isolate where the issue is – you can– Finding the problem is almost always harder then fixing it! (time

entry example)

Page 7: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

My three best friends

• Autotrace– Too easy– Explain plan– The statistics– Demo007.sql

Page 8: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

My three best friends We have a big customer table which is referenced by many other tables.

The table's primary key is NUMBER(12). However the foreign key columns of the some of the tables which is referencing this customer table were defined as NUMBER. (Seen as NUMBER(38) in data dictionary).

This datatype difference causes a data conversion during the checking of the foreign keys and thus a performance loss. (invalid conclusion based on guessing – no hard facts to back it up)

In order to prevent it, we tried to alter the NUMBER columns to the same type of our customer table's primary key - NUMBER(12).

Since Oracle doesn't allow us to do this unless the column values are NULL, we had to insert them with their rowids to a temporary table, setting them to NULL, altering the column to NUMBER(12), and populating the original values from the temporary table again. (doing a TON of work and they will see no gain from this – other then tons of IO)

This is a very slow process. So that we can't even see its end after 8 hours of working and we had to stop it. My question is, could you please offer us a faster way of doing this operation? (yes we can – stop doing it!)

The total size of the tables we have to alter is about 220 million rows, we are running all DML in parallel and nologging mode, we are creating the temporary table by create as select

• Review demo008.sql and demo008.prf • @trace and select count(*) example

Page 9: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

My three best friends

• Statspack– Keep a history

• I've been asked to look at a site who have just carried out a DB upgrade. The feeling is that there has been some performance degradation since the upgrade. The performance hit has not been quantified yet but is 'felt throughout the day'.

– Use it in sickness and in health– 15-30 minute windows between snaps at most

• Averaging out over 8 hours is meaningless– Don’t try to remove every last wait

• When looking through the statspack report, I found that events, "direct path read" and "direct path write", are always on the top 2 of "Top 5 Wait Events" section as follows. They had 2.92 seconds of waits on it– over a 30minute window!

• Watch for the “so what” waits – control file parallel write

Page 10: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

My three best friends

• So, you have a 20 page statspack report. Now what

– Make sure it is 15-30 minutes long– Make sure timed statistics were on

STATSPACK report for DB Name DB Id Instance Inst Num Release Cluster Host------------ ----------- -------- -------- ----------- ------- ----ORA9I 2272536868 ora9i 1 9.2.0.1.0 NO aria  Snap Id Snap Time Sessions Curs/Sess Comment ------- ------------------ -------- --------- --------Begin Snap: 1 30-Dec-02 09:58:58 67,254 3.0 End Snap: 2 30-Dec-02 10:14:52 67,260 3.0 Elapsed: 15.90 (mins) Cache Sizes (end)~~~~~~~~~~~~~~~~~ Buffer Cache: 96M Std Block Size: 8K Shared Pool Size: 112M Log Buffer: 512K

Page 11: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

My three best friends

• Check hard parses, want almost none• Executes & TPS is an indicator of the “load” on the system• Others are useful for reference

Load Profile~~~~~~~~~~~~ Per Second Per Transaction --------------- --------------- Redo size: 75,733.92 20,737.70 Logical reads: 1,535.11 420.35 Block changes: 449.56 123.10 Physical reads: 562.99 154.16 Physical writes: 62.53 17.12 User calls: 8.04 2.20 Parses: 56.43 15.45 Hard parses: 0.38 0.10 Sorts: 11.63 3.19 Logons: 0.30 0.08 Executes: 94.21 25.80 Transactions: 3.65  % Blocks changed per Read: 29.29 Recursive Call %: 97.14 Rollback per transaction %: 9.41 Rows per Sort: 752.04

Page 12: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

My three best friends

• The ratios – Library hit, Soft parse are the ones I zero in on• Buffer hit – not so much• Execute to parse – depends on system type

– Negative = bad– 20-40% not unreasonable for web based (stateless)– Client/server should be much much higher.

Instance Efficiency Percentages (Target 100%)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Buffer Nowait %: 100.00 Redo NoWait %: 100.00 Buffer Hit %: 94.94 In-memory Sort %: 98.50 Library Hit %: 99.81 Soft Parse %: 99.33 Execute to Parse %: 40.10 Latch Hit %: 99.91Parse CPU to Parse Elapsd %: 86.12 % Non-Parse CPU: 94.69  Shared Pool Statistics Begin End ------ ------ Memory Usage %: 86.95 85.16 % SQL with executions>1: 66.38 67.40 % Memory for SQL w/exec>1: 67.28 69.68

Page 13: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

My three best friends

• Top 5 timed events– Not waits in 9iR2 and up – timed events, includes CPU time– This was a 4 CPU machine, I used 508 out of 3,600 CPU seconds– Direct path write caught my eye

• Direct loads• PDML• Uncached lobs• Sorts to disk

• The rest of the report helps me figure out the top of the report

Top 5 Timed Events~~~~~~~~~~~~~~~~~~ % TotalEvent Waits Time (s) Ela Time-------------------------------- ------------ ----------- --------CPU time 508 37.39direct path write 5,168 279 20.50db file scattered read 38,554 270 19.85log file sync 2,610 88 6.50direct path read 2,702 86 6.29 ------------------------------------------------------

Page 14: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Their three best friends

• DBMS_PROFILER– Source code profiler for PLSQL– Find the low hanging fruit

• Runstats– Simple test harness to show differences in approaches– Demo009.sql

• Jdeveloper– Source code DEBUGGER for PLSQL– Interactive develop/compile/debug or…– Debug from afar

• Submit web page• Attach debugger to mod_plsql invoked procedure

– Quick Demo

Page 15: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Build a Test Environment

• Them: Our production application is behaving badly, it is doing <something>

• Me: When you ran it in test, what was the outcome there

• Them: Test? What is “test”• Me: When you tested this process in your test

environment – that duplicate of production what happened there?

• Them: Oh – well, we don’t have a test environment….

Page 16: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Build a Test Environment

• Certain large institution– Upgraded– “Seemed Slower”– Flipped 16 switches and didn’t have the ability unswitch them

• A certain city asked on Friday if there were any “gotchas” with upgrading – they just wanted to know for tomorrow

• Testing will– Benchmark it, test scale, test performance– Verify fixes actually work– Assure you that upgrade script actually works– Make sure major things don’t knock you out

Page 17: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Build a Test Environment

• The rules of the game:– Don’t test or develop on an empty database

• Importing stats doesn’t cut it• If you can read a query plan and with 100%

accuracy say “its good” – I want to hire you– Don’t test with a single user– Don’t test in a dust free lab

• Stubbed out API’s• Less work then in real life

Page 18: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Design to perform, don’t tune it

• Specific models work better then generic• Try 15 ways to do something

– Consider features you’ve never used• Hash clusters• IOTs• B*tree clusters

– People example (tall skinny table as an index)– Denormalization example (name search)– Demo iot_heap.sql

Page 19: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Design to perform, don’t tune it

I have a table with a blob field, for example:Create table trx ( trxId Number(18), trxType Varchar2(20), objValue Blob )

Blob Fields contains a java serialised object, different objects based on types, though all of them implements same interface. We have always accessed this object through a J2EE container, so it works fine so far. Now users want to use reports using sqlplus, crystal reports etc. So they want a solution to this blob issue.

• Went on to discuss how *slow* it is to parse this out• built tons of views to give a relational view of these structures• slow as slow can possibly be• not to mention a slightly CPU intensive solution• mad at the database now

Page 20: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Design to perform, don’t tune it

• Store the data in such a way that the most FREQUENT or CRITICAL queries/applications can use it

– “we execute this query a billion times an hour – tune it”

select fact.* from fact, members where (fact.fk_mem = members.id or fact.fk_id1 in (correlated subquery) or fact.fk_id2 in (correlated subquery) or (fact.fk_mem is null and fact.fk_id1 is null and fact.fk_id2 is null) ) and members.id = 2

Page 21: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Have metrics and live up to them

• Make it go faster• Them: “Tune this, it is going slow”• Me: “Well, how fast does it have to go?”• Them: “I don’t know, just needs to go faster”• Me: “How do we know when we are done?”• Them: “When it is going fast enough”• Me: “And what is that?”• Them: “Don’t know, we’ll know it when we see

it”

Page 22: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Have metrics and live up to them

• I have a theory – it can always go 1% faster– You never reach “zero” as it is always 1% of a smaller number– The cost of each 1% in time, energy and money exponentially

increases

• You need things like– You’ll have 1,000 users of which 100 will be active

concurrently. They must have response times of 0.25 seconds for this <well defined> transaction

• What I usually hear– It’s going to have lots and lots of users doing stuff. It’s gotta

be really fast.

• You cannot design, let alone size, that system.

Page 23: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Have metrics and live up to them

• Because everyone says so…• Them: “We rebuild our indexes every week”• Me: “Why?”• Them: “Everyone knows you need to”• Me: “Have you ever put together solid metrics that prove you

are doing a good thing?”• Them: “Why?”• Me: “Because by rebuilding the indexes <any ‘maintenance

operation’ could go here actually>, you might actually be decreasing performance, increasing the workload on the machine, and just in general wasting lots of time and energy”

• Them: “But it is common knowledge that rebuilding indexes is best, we don’t need to prove it”

Page 24: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

HELP!!!! Riddle me this batman...

Why does an rebuilding an index cause increased redolog generation AFTER the index has been built?

I have a table 35 million rows and an index (nothing is partitioned) Transactions against this table are constant. It's always 500,000 rows per day. This generally creates 10 logs a day

Once a month the indexes are rebuilt. (Alter index rebuild)On the day following the indexes rebuild 50 logs are createdOn the following days 45...40...35...30....25....down to 10 at 10 logs this remains constant at 10 Mining the logs we see that we have increase INTERNAL INDEX UPDATES

Why does this happen?? Is this always the case??

Page 25: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Have metrics and live up to them

• Keep metrics– Statspack– Application level statistics

• Evaluate against them– Do an index rebuild– Come back tomorrow and verify you did more good

then harm

Page 26: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Benchmark, Benchmark, Benchmark

• Small time benchmarking– Interested in finding the difference between two approaches– Look at number of Latches approach 1 takes vs 2– More important then elapsed time

• Big Time Benchmarking– Most people skip this– Too expensive– You must test with representative data– You must test with realistic inputs– You must verify/validate the results– This is not a chore

Page 27: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Instrumentation

• What they heck is this?• Common pushback:

– This is overhead– This will slow down the code– This is extra code I don’t need

• My answer– Why are you curious about the v$ tables, guess what

they are– SQL_TRACE?– The entire Events subsystem– Do I practice what I preach?

Page 28: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Instrumentation

• Databases were born to write to – every click on every web site I build comes with a builtin insert

• For the last 24 hours, I have observed that accessing your site, clicking on 'ask question', 'read question','Review Question' etc., all gone slow (taking around 2-3 minutes instead of couple of seconds before). Are others facing the same problem?

• I just went to my statistics page, generated right from my audit trail as part of my system

• Owarepl• A certain Apps implementation I worked on

– (that didn’t do this)

Page 29: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Instrumentation

• Use DBMS_APPLICATION_INFO everywhere!– Demo005.sql

• Debug.f– Demo006.sql

• Make it so your applications can all enable SQL_TRACE!

• If you use java – use the J2SE/EE industry standard logging. Help us help you

• Audit is 5 letters, not 4

Page 30: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Don’t look for short cuts

• Everyone is looking for fast=true– (it doesn’t exist)

• Frequently I’m asked:– Tell me about these undocumented parameters– Where can I learn about Oracle internals

• These are people who haven’t read the concepts guide yet.

• Undocumented – for a reason– _trace_files_public for example– Neat event for bitmaps in 7.3.2 – killed intermedia in 8.0

Page 31: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

“Question Authority.”“Question Authority.”

- Unknown

QuoteAnd now, for something completely different

Page 32: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

There are lots of “experts” out there• Make them prove everything• Statements that should raise your eyebrows:

– It is my opinion...– I claim...– I think...– I feel…

• Everything can (and should) be proven• TKPROF goes a long way here• Statspack is great• “Runstats” is a tool I use as well (search

asktom for runstats)

• Things change, expect that• It only takes a single counter case

Page 33: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

Answers

Questionsand

Q&A

Page 34: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

“Rebuilding an index will always save space and increase performance.”

“Rebuilding an index will always save space and increase performance.”

Quote

Page 35: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

“Raw is faster, if you are IO bound, go RAW.”

“Raw is faster, if you are IO bound, go RAW.”

Quote

Page 36: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

“When coding in PL/SQL, one should always use explicit cursors.”

“When coding in PL/SQL, one should always use explicit cursors.”

Quote

Page 37: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

[email protected]> declare 2 l_cnt number; 3 l_start number; 4 cursor c is select count(*) from dual; 5 begin 6 l_start := dbms_utility.get_time; 7 for i in 1 .. 10000 8 loop 9 select count(*) into l_cnt from dual; 10 end loop; 11 dbms_output.put_line( (dbms_utility.get_time-l_start) || ' hsecs' ); 12 13 l_start := dbms_utility.get_time; 14 for i in 1 .. 10000 15 loop 16 open c; 17 fetch c into l_cnt; 18 close c; 19 end loop; 20 dbms_output.put_line( (dbms_utility.get_time-l_start) || ' hsecs' ); 21 end; 22 /198 hsecs260 hsecs

PL/SQL procedure successfully completed.

Page 38: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

“Adding more CPU will make all systems faster for sure.”

“Adding more CPU will make all systems faster for sure.”

Quote

Page 39: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

“Index space is never reused.”

“Index space is never reused.”

Quote

Page 40: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

“NOLOGGING stops all redo log from being generated on that object or tablespace.”

“NOLOGGING stops all redo log from being generated on that object or tablespace.”

Quote

Page 41: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

“A table should optimally be in one extent or as few extents as possible.”

“A table should optimally be in one extent or as few extents as possible.”

Quote

Page 42: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

“The most selective fields must be first in an index.”

“The most selective fields must be first in an index.”

Quote

Page 43: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

“You should commit frequently to save resources and time.”

“You should commit frequently to save resources and time.”

Quote

Page 44: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”

“A cold backup is better and/or easier then a hot backup.”

“A cold backup is better and/or easier then a hot backup.”

Quote

Page 45: Thomas Kyte Oracle Corporation Finding out what’s wrong – in search of “fast=true”