David Clement

September 2004

Oracle versions 7-8

Statspack resembles Bstat and Estat in reporting system-wide performance for a period of time. Unlike Bstat and Estat, Statspack can be run from within SQL*Plus. It has its own user, perfstat, with default password perfstat, and its own tables and views in the Tools tablespace. The report that it generates is even longer and more complex than the one that Estat generates, but it is much better organized. The most informative data is on the first page, and the rest of the report provides a finely detailed drill-down.

To install Statspack, connect as the system DBA and run the scripts that drop and create the Statspack objects. For example, in a Unix or Linux environment you could do this.

sqlplus "/ as sysdba" @$ORACLE_HOME/rdbms/admin/spdrop.sql
sqlplus "/ as sysdba" @$ORACLE_HOME/rdbms/admin/spcreate.sql 
The second script is interactive; you will have to provide the perfstat user's password and default and temporary tablespaces.

To start the Statspack performance analysis, issue the following commands from the SQL*Plus prompt.

connect perfstat/perfstat
execute statspack.snap;
This reads the system performance metrics and stores them in tables for later use. At the end of the sampling interval, whether it be 10 minutes or an hour, run the Statspack snapshot procedure again, just as before. Statspack does its analysis by comparing snapshots, so it requires at least two of them.

You can run the procedure as often as you like. Statspack gurus (such as Oracle's Tom Kyte) recommend running the snapshot procedure at fairly short intervals of 10 to 15 minutes. You can list the snapshot IDs and times with the following query.

select snap_id, snap_time from stats$snapshot;
If the Statspack tables grow large, they can be purged occasionally with another script, sppurge.sql.

Once you have snapshots, you can generate Statspack reports with the script spreport.sql. This is another interactive script, which expects snapshot IDs so that it can tell which snapshots to compare.

The first page of the report is a summary page (see below for a sample page). It has five sections, consisting of a header, the Cache Sizes, the Load Profile, the Instance Efficiency Percentages, and the Top 5 Wait Events. The first two sections are mainly for reference, so that the analyst can tell on what server and for what time this report was run, and how the SGA of the instance was configured.

The Load Profile shows what the system is doing. These numbers are more meaningful in relation to the application design and to each other than in isolation. For example, if the application design involves extensive calculations and comparisons, the number of physical reads (which represent hard disk i/o) should probably be low in comparison to the number of logical reads. In such a design, there is little need for extensive reads from disk. A data warehouse, on the other hand, which would typically retrieve large amounts of data for ad hoc queries, would show a far higher proportion of physical to logical reads.

The number of transactions per second should be high if the application is a classic OLTP application, involving many quick updates of individual records, but low if the application is processing substantial amounts of data in (for instance) batch jobs.

One of the most revealing sets of numbers is the parsing percentages. Applications submit predictable SQL statements; even large applications have only a few thousand separate SQL statements. So a well-tuned application should show little hard parsing. Hard parsing comes either from unpredictable SQL -- that is, the users are making up queries on the fly, as in a data warehouse -- or from a flaw in the application that is forcing previously parsed statements to be parsed again.

A useful way of reading the Load Profile section is to compare it across Statspack reports. This is a quick, reliable way of checking how stable the Oracle instance is.

After the Load Profile comes the section on Instance Efficiency Percentages. These numbers should be close to 100 percent. Figures lower than about 95 to 97 percent are cause for concern, although naturally the application architecture and performance determine what is unacceptable.

The shared pool statistics, under Instance Efficiency Percentages, describe how SQL statements, once parsed, are maintained in the SGA. The first line shows how much of the shared pool is in use, and the next two lines show how much is used by SQL statements that are executed more than once during the time the report covers. In general, the higher the reuse of SQL statements, the more efficient the application. However, this is completely dependent on how long the statements take to execute versus how long the Statspack report covers.

The last of the major sections is the Top 5 Wait Events. Some Statspack gurus argue that this is the most important section in the summary page. It definitely reveals a lot about where the instance is spending its time. This list excludes wait events that might show mere idling (as when the instance is waiting for another server to transmit data). The wait events are listed in descending order of the percentage of total wait time that they consume. Some common wait events and their causes are listed in the table after the sample summary page.

Statspack Report: Sample Summary Page

STATSPACK report for

DB Name         DB Id    Instance     Inst Num Release     OPS Host
------------ ----------- ------------ -------- ----------- --- ------------
TEST          3397982562 TEST                1   NO  hpux5

                Snap Id     Snap Time      Sessions
                ------- ------------------ --------
 Begin Snap:         62 03-Jun-03 15:15:02      234
   End Snap:         63 03-Jun-03 16:15:02      234
    Elapsed:                  60.00 (mins)

Cache Sizes
           db_block_buffers       76800          log_buffer:    2097152
              db_block_size        8192    shared_pool_size:  200000000

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:            330,129.18              1,852.18
              Logical reads:             39,292.95                220.45
              Block changes:              2,080.05                 11.67
             Physical reads:              1,788.78                 10.04
            Physical writes:                133.85                  0.75
                 User calls:              1,900.53                 10.66
                     Parses:                304.20                  1.71
                Hard parses:                  7.02                  0.04
                      Sorts:                594.51                  3.34
                     Logons:                  0.06                  0.00
                   Executes:              1,753.15                  9.84
               Transactions:                178.24

  % Blocks changed per Read:    5.29    Recursive Call %:   59.25
 Rollback per transaction %:    0.29       Rows per Sort:   13.14

Instance Efficiency Percentages (Target 100%)
            Buffer Nowait %:   99.97       Redo NoWait %:  100.00
            Buffer  Hit   %:   95.45    In-memory Sort %:  100.00
            Library Hit   %:   99.15        Soft Parse %:   97.69
         Execute to Parse %:   82.65         Latch Hit %:   99.94
Parse CPU to Parse Elapsd %:   71.30     % Non-Parse CPU:   96.91

 Shared Pool Statistics        Begin   End
                               -----   -----
             Memory Usage %:   67.35   71.78
    % SQL with executions>1:   41.03   39.86
  % Memory for SQL w/exec>1:   40.43   39.58

Top 5 Wait Events
                                                              Wait     % Total
Event                                               Waits  Time (cs)   Wt Time
-------------------------------------------- ------------ ------------ -------
db file sequential read                         1,537,677      371,787   35.98
log file sync                                     265,557      270,484   26.18
log file parallel write                           285,344      180,055   17.43
db file scattered read                            930,129      116,226   11.25
latch free                                         98,534       61,813    5.98

Statspack Report: Some Wait Events

Name Description
buffer busy wait A server process is trying to access a buffer in the database buffer cache that another server process is holding. For example, while a process is writing to a buffer, other processes have to wait to read that buffer. There are several causes of this event but it indicates that the system is i/o-bound in some way.
db file parallel write The DBWR process has waited on multiple simultaneous i/o requests (dirty block buffer cache writes in this case).
db file scattered read A server process is accessing data in multiple non-contiguous blocks in the database buffer cache, and another process had to wait for this to finish. Typically this results from a full table scan, or from a full index scan. If this is the most frequent wait event, the system is probably healthy, unless the database design is such that there should be few or no full table scans, as in a classic OLTP design.
db file sequential read A server process is accessing data in contiguous blocks in the database buffer cache, and another process had to wait for this to finish. Typically this results from contention for access to index segments. Again, if this is a common wait event, it is probably not a cause for concern unless the database was designed not to do this, as in a data warehouse.
direct path read (write) A process waited for completion of a data read or write from a database file directly into another process's PGA (process global area) bypassing the SGA (system global area). Often this implies time-consuming sorts on disk, because the SGA does not manage sort segments.
enqueue A process is waiting for a row-level lock to be released so that it can lock the same row (for an update, for example). This lock should be visible in v$lock.
latch free A process is waiting to acquire one of Oracle's many memory-access latches. Because hard parsing and cursor allocation both use many latches, unsharable SQL statements and poor cursor (closing too much, not closing enough) are common causes of waits on latches.
library cache load lock More than one process has tried to load an object into the library cache at the same time.
library cache pin A process is waiting to be able to pin an object in the library cache, normally because it wants to parse or compile that object. The object might be a view or a stored procedure.
log buffer space The redo log buffer has filled up before LGWR has been able to clear it. Typically, this means the application is generating more redo than the system can handle as tuned.
log file parallel write LGWR itself waited for redo blocks to be written to the whole redo group.
log file sequential read A process waited for ARCH to finish archiving the redo logs.
log file sync A process waited for LGWR to finish its writes to the redo logs: this happens during a commit and is one result of overuse of the commit statement.
SQL*Net more data from client A process had to wait for multiple data packets to be reassembled; this commonly means that long SQL statements have been submitted in-line.