Tricks with Analytic Functions
4 March 2008
Oracle versions 8-10g
Oracle introduced SQL analytic functions in 8.1.6. As I recall, the documentation stated that their purpose was to support data mining operations in data warehouses. However, the use of these functions is by no means limited to data warehousing. They are intellectually interesting and an amazing performance enhancement at the same time. The enhancement is that, as a rule, a SQL query using analytic functions can generate results from one single pass for complex operations that otherwise would require self-joins or multiple scans of the same data.
This tip page provides a couple of examples of using SQL analytic functions in real-world situations. It does not introduce SQL analytic functions to the puzzled junior user or provide examples that are already widely available. For introductions to SQL analytic functions, you can see Martin Zahn's helpful articles "Analytic Functions in Oracle 8i and 9i" and "The Power of Analytic Functions." I choose to cite Zahn rather than other sites because Zahn covers the syntax in full, gives examples very concisely, and provides a really clever application of analytic functions to deduplication. There are lots of other good authors, such as Rene Nyffenegger, and Tom Kyte has been discussing the uses of analytic functions for a long time, if you follow "Ask Tom." Finally, there is a book on the subject that I have just found and have not read yet, by R. W. Earp and S. S. Bagui, Advanced SQL Functions in Oracle 10g. It looks solid. All of these people explain the syntax and the most common applications, such as Top-N querying and data pivoting.
This tip is not meant to duplicate their work. Instead of that, I want to collect a few examples of cases where analytic functions were used
2. A Data Window
In this example, data is acquired by polling hardware devices, and the results are inserted into an Oracle table. Every now and then, on demand, the application has to show the user the current trend. Is the hardware running hotter? Is it cooling down?
One way to do that is to use the
In the outer query, the
3. Only the Latest Row
It is a common problem that a table contains a mixture of current and historical data and there is a frequent need to report the current data. There are several ways to model your way out of this problem. For instance, you could update rows instead of inserting, so that the table always contains only current data. If the historical data is critical for some other need, you might consider using a before-insert trigger to save the out-of-date row elsewhere before updating it. Another possibility is to use a flag column for the current rows. These data modeling solutions really seem better to me than code solutions.
However, I often see tables that mingle historical data and current data, for which you have to write queries that look like this.
The correlated subquery is ANSI-standard SQL, but it is not very fast, and it scales badly: the larger the table, the worse the query will perform. Another possibility is to rewrite the query so that it uses an in-line view, like this.
The in-line view may not always perform well. It will depend on reading the same table twice and hash-joining the results. A more elegant solution would be to read the table only once. This can be done with a SQL analytic function as follows.
This is simpler than the previous example. The nested
Performance improvements from this kind of query rewriting are sometimes as large as reducing a 20-hour job to 20 minutes.
4. More Sources
Tom Kyte covers SQL analytic functions in Chapter 12 of his book, Expert One-on-One Oracle (see the sidebars to this page) and I've just recently read a nice article by Gary Goodman on the Hotsos Web site, "The Performance Dangers of Database Independent Applications." Both of these sources, as well as those cited onat the beginning of this article, have influenced this tip page.