How is SQL Parsed?

David Clement

September 2004

Oracle versions 7-8

SQL parsing is a sequence of six steps in two phases called "soft parsing" and "hard parsing." Soft parsing consists of the first three steps of the process, and hard parsing of the last three.

The first step in the soft parse is a syntax check of the SQL statement that has been submitted to the parser. This step determines if SQL keywords are identifiable, if a legal name follows the keyword "from," and so forth and so on.

The second step is a semantics check to determine if the tables, columns, stored code, and other entities referenced exist in the environment and if they are related to each other as the SQL statement requires, that is, if the columns unambiguously belong to the tables and so forth. At the same time, the parser checks permissions to determine if the user can access the data structures referenced. This design choice is basically why Oracle returns the ORA-00942 error, "Table does not exist," both for a semantics violation and for a permissions violation.

Now comes the outline check. At this point in the process, it is evident that this particular SQL statement could have been processed before, and so the parser now tries to find the statement in the SQL cache (also called the library cache) in the shared pool of the System Global Area (SGA). This cache contains recently processed SQL statements. The parser does not actually compare SQL statements byte by byte. Instead, it generates a hash value from the input literal and compares this to the hash values stored in the SQL cache. The hashing algorithm is affected by capitalization and even by white space, so the result of the comparison is almost the same as the result of a byte-by-byte or character-by-character comparison of string literals. If the hash value is matched, then the cursor and other memory structures previously allocated for the statement can be reused, if they still exist.

If the SQL statement could not be found in the cache or the memory structures have been lost (for example, because the cursor was overwritten by another cursor that was more urgently needed) then three more steps are required. Performing these three steps is what distinguishes the "hard parse" from the "soft parse."

Step four of the parsing process is generating execution plans for the SQL statement. The memory structures used for this purpose cannot be allowed to be corrupted and are therefore latched, that is, access to them is controlled by other memory structures called "latches." The execution plan generator must check for the latches and wait until they are released and acquire them before it can actually run.

Two Oracle initialization parameters, optimizer_search_limit and optimizer_max_permutations, limit the number of execution plans that will be generated. Their default values (5! and 80000) represent a compromise between the need to find the best execution plan and the need to finish generating execution plans in a reasonable amount of time.

Next, step five, the optimizer can assign a cost to each step in each execution plan. One of the optimizer's primary sources of information is database statistics such as the number of rows in a table, the table's high-water mark, the height of the indexes on that table, and so forth. The other primary source of information is operation ranks, assigned by Oracle engineers to the various row-source operations in the knowledge that certain data access methods and database kernel subroutines take longer than others.

From these two kinds of values, the optimizer calculates low costs for simple operations on small tables, and high costs for complex operations on large tables (unless something misleads it). Once this calculation is done, the optimizer simply chooses the cheapest execution plan from those available.

The final step is to generate a binary representation of the chosen execution plan and to cache it for fast access.

The soft parse consumes much less time than the hard parse. Also, since hard parsing requires memory latching, it scales worse than soft parsing (since anything serialized scales worse than an equivalent concurrent operation). For these reasons, it is important to use coding techniques that reduce hard parsing to a minimum.