Index Bloating

David Clement

September 2004

Oracle versions 7-8

Oracle's default form of index is the b*tree index segment. This is a sorted list of row identifiers that can be accessed by a balanced tree. This tip won't go into binary sort algorithms and index balancing, but as a quick reminder, the basic idea is that, if you're looking up the row with the indexed last name 'SMITH', you check to see if 'SMITH' is in the first half (A-M) or the last (N-Z), then in the N-Z block to see if 'SMITH' is in the first half of that (N-S) or the last (T-Z), then in the N-S block to see if it's in N-O-P or Q-R-S, and then you can rapidly match it under S for an index unique access. Two notable advantages of this kind of indexing are that it works with any sequence of values, as long as they're sorted, and that it is more efficient the longer the sequence.

The blocks in an Oracle index segment form a doubly-linked list as well as a tree-sorted list. This is to optimize index range scanning. If a query needs to retrieve rows for June through July from a large table, it can look up June and then scan blocks by means of the list pointers until it reaches August. This works because the indexed values are sorted.

A logical implication of this indexing scheme is that if an indexed row is deleted, the free space left in the index segment cannot be reclaimed immediately. The indexed rowids have to be in sorted order, so there is no way that the next indexed item can be written into the first free space. Oracle also needs to allow for rollbacks; it cannot be sure, until a commit, that the free space will not be needed again by the same item that was just deleted. Oracle therefore responds to the deletion of an indexed row by merely leaving the space for that item empty in the index segment. In fact it is flagged as empty, not actually cleared out. If a new row is inserted that should use that space, then the space will be recycled, but not otherwise.

This means that the indexes of volatile tables are in danger of bloating, as I like to call it. Index bloating occurs when more and more space in the index segment is unused because indexed rows are deleted but never replaced. In extreme cases the majority of the blocks in the index segment are empty. I have seen a case in which one index wasted 800 megabytes of disk. This imposes a high penalty on an index range scan; naturally, index unique access is less affected.

This situation can be fairly easily rectified by rebuilding indexes. Oracle allows index rebuilding while users are connected and working. This is the syntax:

alter index indexname rebuild; 
Rebuilding an index is transparent to the end users, but it involves sorting all the data and reallocating the storage in the index segment, so it is not the least expensive process.

An index of a static or slowly changing table is in no danger of bloating. So it is not a good idea to write a script that rebuilds all the indexes in a tablespace or a database. Most of the CPU cycles consumed by such a script are wasted.

Instead, indexes should be periodically checked for usage and only those that use far more space than they need should be rebuilt. The Oracle data dictionary includes a temporary table called index_stats that holds the most recent result of an analyze index indexname validate structure command. This table includes three columns, btree_space, used_space, and pct_used, that will tell you how efficiently the index uses space.

Some senior DBAs advocate rebuilding any index that uses less than 75 percent of its allocated space. This seems a reasonable rule of thumb.