Semi-Annual SingingSQL Newsletter #8
December 31, 2008
If you wish to unsubscribe at any time, just drop me a line at email@example.com.
I’m taking a lowest-common-denominator approach to the format of this newsletter, sending it as a straight-ASCII note so anyone at all can access it securely, but if you’d rather read something more formatted, go to OnlineNewsletter08. I’ll also have links scattered throughout for supplementary materials.
Although this material is copyrighted, feel free to forward this newsletter, as is, to any friends who you think might be interested.
I’ve had a few people ask about coming class offerings, and it occurred to me that I don’t have a good way to announce them if they come up suddenly, because I want to keep my promise that this newsletter only comes just four times a year. Therefore, I’m starting a new list for anyone interested in announcements of classes and other offerings (such as presentations to local OAUGs) that might come up without enough warning to make it into this newsletter. If you are interested (and there is certainly no commitment in this, on your part) just drop me a line at firstname.lastname@example.org, and I’ll add you to this new list. Please be sure to include your location and some range of area within which you’d consider attending something if I happened by. The area can be as broad as a continent or as narrow as a town – “Let me know if anything comes up in Asia.” to “Let me know if you have a class coming to Darwin, Australia.” Even if you’re off the beaten path, we might get lucky. I won’t send notes for this second list except to the people in the area where I’m offering something, and I doubt you’ll see more than a note or two from this new list per year, even if your area is broad.
If you have any comments, ideas for future newsletter content, or suggestions, feel free to drop me a line, at email@example.com – I’m always happy to hear from you!
To see a cumulative listing of all news, including old news from past newsletters, see all news.
To see a cumulative listing of all links, including links already listed in past newsletters, see all links.
By David Jordan
David has been working with relational databases since 1984 and Oracle
beginning with version 4 on the IBM PC AT. He has been dedicated full
time to performance related issues for several years now. He can be reached at firstname.lastname@example.org.
Like anyone who performs a lot of tuning, I frequently receive poorly running SQL statements from unfamiliar applications. In one recent case I received the following:
FROM STUDENT S
WHERE S.REC_NUM > StuRecNum AND
EXISTS (SELECT *
FROM TESTS T
WHERE T.STUDENT = S.REC_NUM);
I have altered the problem domain to STUDENTS and TESTS to obscure the source as it is part of a vendor written application, but everything else remains the same.
The note told me the statement was consuming six hours in the QA system and only two hours in production. I checked, and sure enough QA was using a Nested Loop join while production was performing a Hash join of a full scan of each table's index.
It's very convenient to have an environment with similar or greater volumes that demonstrates a better execution plan, and the temptation was to write back with a hint to force the better plan and be done with it quickly.
The problem with the quick fix was that the basic form of this query really bothered me. Why was it selecting the minimum of the column REC_NUM where that same column is greater than a bind variable? And why was it consuming at least two hours doing it? A little investigation revealed that the REC_NUM column is the system assigned key for each table. OK, so why is it asking for the next record greater than the one in hand in this manner? What business meaning could that possibly have? To answer that question I had to look to the application for context:
CURSOR testsCur(StuRecParm NUMBER)
IS SELECT REC_NUM
WHERE STUDENT = StuRecParm AND
(LOCKED IS NULL OR LOCKED <> 'A')
ORDER BY DATE, TIME;
StuRecNum := 0;
FROM STUDENT S
WHERE S.REC_NUM > StuRecNum AND
EXISTS (SELECT *
FROM TESTS T
WHERE T.STUDENT = S.REC_NUM);
EXIT WHEN StuRecNum IS NULL;
EXIT WHEN testsCur%NOTFOUND;
CalcTest(recNum, 'n', 'y', 30);
COMMIT; -- commit after each student
The StuRecNum variable is a manual pointer into the student table and each execution of the query shifts the pointer one record, with the cost of each execution inversely proportional to its current progress. The end result is a full-table scan where rather than touching each record once, it finds and sorts N records the first time, then N-1, N-2, N-3, …, 1, 0. That integer series sums to N(N + 1)/2 which while not geometric is still overly susceptible to table size. The reason it was taking a minimum of two hours is now quite clear. The reason the Hash join is faster than the Nested Loop join is also apparent. The break even point for a NL join via index access is typically around 20% of the table, but the average result set for all executions is of course 50%.
The developer didn't seem to understand that they could have just opened a cursor on the student table and fetched from it and the database would keep track of the cursor's position in the data set. So how might such extreme procedural coding have come about? My impression is that this was a file based or IMS application that was recoded for relational database storage, but translated literally rather than being interpreted. It has the flavor of a hierarchical database in that it steps through each parent record (student) before stepping though the child record (test). This procedural navigation is standard in that environment and someone not familiar relational data may not realize that it wasn't necessary.
Further investigation revealed that every record in the TEST table has a matching record in the student table (no join filter action) so in relational environment the student table need never be referenced at all. The following would suffice to replace the entire procedure:
CURSOR TEST_CUR IS
WHERE (LOCKED IS NULL OR LOCKED <> 'A')
ORDER BY STUDENT, DATE, TIME;
FOR TEST_REC IN TEST_CUR LOOP
CalcTest(recNum, 'n', 'y', 30);
This cursor retains the ordering of test within student without touching the student records. This version commits only once which would dramatically reduce log file sync waits (each student has only a few tests) and also avoids fetching across commits.
If you are tuning SQL from an unfamiliar application, don't assume the statement is written correctly. In fact don’t even assume it is required! Dig a little deeper if it doesn't feel right. In this case the application saved six hours by eliminating the problem query rather than a mere four hours through tuning. While most code does make sense you will eventually run across something like this.
If you are a developer take the time to fully understand the power of set processing and SQL. When you code a problem procedurally you are forcing a specific execution plan, one that more often than not is sub-optimal. While you may not be rigidly converting non-relational code to a relational DB or considering N(N + 1)/2 operations for a simple table scan you can still easily fall prey to the procedural trap.
More about the Big Picture (by Dan Tow)
My very first newsletter article in this series of newsletters was Performance and Tuning, the Big Picture, and I recommend it, still, for background to this current article. I think there is much more to say on the subject of the big picture in performance and tuning, focusing on the process of more or less continuous improvement of the slow SQL that tends to dominate IT performance problems. In that earlier article, I imply that there should in general be two sources of high-priority SQL tuning problems:
· Ad-hoc SQL tuning problems that come up because a specific business process is failing to perform fast enough to meet the business need, and this failure is already expensive, or is likely to be expensive, soon.
· Top-runtime SQL that is measured to be in the set of the few SQL statements that dominate the SQL runtime and load on a whole production database, or on a test database simulating future production load for a system under development. On most business-application database instances, no more than 20 (and often just a handful) SQL statements are high-enough load to reward tuning with substantial benefits to the business. Ideally, this top-runtime SQL is measured around the clock for a period of a week or more to produce a single, measure of cumulative runtime per statement that allows a single quantitative ranking of the top-runtime tuning candidates on each database instance.
Imagine (if this is not already true!) that you belong to a team charged with correcting SQL performance problems across an IT organization. (Perhaps you are the team, just you, alone!). Some problems will be assigned to you from outside the team, while you may find other problems proactively:
Let’s first consider ad-hoc SQL tuning problems, which likely come from outside the team, originating either with end user complaints or with developers outside the team who need help tuning a slow process:
The first question to consider is the nature and priority of the problem, from the perspective of the business. Some standard questions can shed light on the priority:
· Is the process online, such as an online form, or user-driven batch, such as a periodic user report, or middleware, usually some sort of batch process run in the background, often apart from users’ awareness, for example a process that keeps data in synch within (for example, materialized views) or between database instances? Online delays are generally the most expensive, minute-for-minute, because the generally must take place during peak hours on the system, when it is most heavily loaded, and because they directly and immediately cost end user productivity, as the end user wastes time “watching the hourglass turn.” User-driven batch load can generally run in the background – the user need not be idle while it runs, so ideally the productivity cost is low. Still, though, it may be useful to the business to present the result, such as a report, on the same business day it is requested, so it may be best to run it during peak business hours, adding to peak load, and if the end user requests it at 4:00, he or she would probably rather see the request return at 4:01 than at 4:50, even if that individual can do useful work while waiting. Also, occasionally, user-driven batch load lies squarely on the critical path for a vital business process, such as shipping product before a quarter-end deadline, so when that load is slow, it may endanger revenue, or it may place externally committed deadlines, such as production of quarter-close results, or payment of payroll, at risk, with potentially high costs to the business. Middleware processes are the least likely to matter much, when it is slow. These can usually run during the least-loaded hours, so their impact on load may not matter – they simply consume resources that would have been idle during the hours they run, anyway. Even middleware has deadlines, though, even while end users may not be conscious of those deadlines. An overnight process refreshing a reporting database, or refreshing materialized views may need to complete by morning work hours so that users have access to the refreshed data as their workday begins. At the very least, such refreshes should complete before the next refresh begins! Also, on Oracle, any too-slow query servicing a refresh process may simply error out, with a snapshot-too-old error, so that a performance problem becomes a problem preventing the process from completing at all. On non-Oracle databases, which do not use rollback to see time-consistent data, there may be no snapshot-too-old error, but this means the process is either seeing dirty data (which may be progressively more nonsensical if the query sees data spread over hours of changes), or the process is read-locking the data over its long runtime, with likely disastrous results to other processes. (While snapshot-too-old errors on Oracle are sometimes mildly inconvenient, I believe that Oracle’s rollback strategy that triggers those errors is vastly better than the alternatives of either locking rows for long reads or risking confusing and misleading results with dirty (time-inconsistent) data.)
· If the process is a slow on-line process:
o Is it affecting more than just the productivity of its internal users? Does it prevent meeting some external business deadline, or does it keep customers on the phone waiting, while the internal user waits for the screen to bring up the customer’s data, or is it directly used by customers, perhaps, online? (If we pay an employee to perform a slow online process, the employee may be frustrated and less productive than he or she would be if the process was faster, but the employee is unlikely to quit. A customer that is frustrated by a slow online process, on the other hand, may very well find another vendor, and a loss of market share will likely cost far more than the loss of a few hours, or even a few hundred hours, of employee productivity!)
o How much time does the slow process cost per week? This requires knowing not only the length of the process delay that you are trying to fix, but also the frequency with which that delay occurs! An organization may declare that any online delay longer than a second is a defect, by definition, but it is nevertheless true that the business-cost priority of a 10-second online employee-user delay defect that happens once per month is over ten-thousand times lower than the priority of a 10-second online employee-user delay that happens 500 times per day! This question is particularly important to ask if you have competing priorities (and who doesn’t!?) and you suspect that the driving cause for the ad-hoc tuning complaint is some sort of failure to meet a more-or-less arbitrary performance target, rather than a business complaint driven directly by actual pain to the business.
· If the process is a slow user-driven batch process:
o Is the slow batch process preventing some business-critical process from completing on time, for example, delaying shipment of product or preventing (or threatening to prevent) the business from meeting an external business deadline?
o Is user productivity harmed by the process being slow? (If the answer is “Yes,” this would generally imply that this is a process that needs to run during business hours, likely during peak-load hours.)
o Can the process be run conveniently during nighttime, off-peak hours? (If so, then there may be no costs to the slow performance at all (yet, anyway), as long as the process completes without error and completes before users arrive the next morning.)
o How much cumulative load (hours per week, for example) comes from this process? Assuming that the process needs to run during peak hours, the primary cost of batch delays can be simply the load created by the batch process, especially (on a well-tuned system) the increase in CPU load.
· If the process is essentially middleware:
o Is there any reason at all that it needs to run during peak hours? If not, is it run off-peak, and, if not, how soon can this be corrected?
o Is there a failure to meet some true business requirement (not just an arbitrary internal deadline) owing to how slow the middleware is, currently? Is the particular performance problem being addressed the entire reason for that failure, or are other, possibly-more-important problems contributing to that failure? (Don’t blame the last link in the chain, necessarily, if the real problem is that the whole chain is too long!)
Next, consider the priority of problems found through monitoring for high-load, high-runtime SQL:
First, it is vital to consider whether you have the data and analysis in place to properly rank these problems according to true load-related and runtime-related costs to the business. The objective is to have a single prioritized list at any given time of problems for any team or individual working those problems. It does little good to have multiple, competing, inconsistent lists of priority-orders for problems, where, for example, a given statement potentially to be tuned is top-priority on one list, tenth-priority on another list, and nonexistent on a third list – this just creates confusion about what the actual priority is! (It is inevitable that priorities shift, over time, as usage patterns shift, problems are solved or go away on their own, new problems appear, and we simply learn more, so I am not suggesting that the priority list should not change, only that we should have one priority list at a time! What should this list reflect? As I argue in the earlier article Performance and Tuning, the Big Picture, time is the proper, natural dimension for prioritizing performance problems, the cumulative sum of time consumed by a problem, especially, in our case, by a given repeating SQL statement. This simultaneously takes into account the amount of time consumed per execution, and the frequency of the execution, so a statement ranks according to the product of average time consumed per execution and its run frequency, or, equivalently, according to the sum of the actual individual times consumed across all the runs. I find that technical measures, such as cumulative logical I/Os, or cumulative physical I/Os, are very poor substitutes for time consumed and map poorly to the runtime and load measures that map directly to real business costs.
Technically, there are actually two measures of “time consumed” that matter, in ranking SQL statements:
1. There is the wall-clock time consumed to execute the statement, end-to-end.
2. There is the amount of resource-time consumed, for a system resource or resources that may become overloaded, especially CPU time and physical I/O time. CPU time, in particular, costs more than most appreciate, if the process runs at peak. Even if you have spare CPU at peak, cutting peak load may delay the need for a costly upgrade, or may allow you to decommission CPUs with surprisingly large savings in licensing and support costs, for hardware and especially for software, even if you cannot sell back the decommissioned CPUs. Even if the savings for a single fix appear marginal (possibly too small to allow a whole CPU to be taken off-line), when you combine savings for one fix with savings for other fixes, these can really add up!
Most SQL statements are single-threaded, and spend almost all their runtime in some combination of CPU time and physical I/O time, so for most statements these two measures are nearly the same, if we choose to give equal weight to CPU resources consumed and disk resources consumed. When SQL uses parallel execution plans, however, or encounters unusual delays, such as locking, these measures can be out-of-synch, somewhat, although any compromise between them would be necessarily a judgment call. In an ideal world, also, we would give unequal weight to wall-clock times consumed, in auto-ranking statements to be tuned, weighting online runtimes higher than batch, and weighting peak-hours batch higher than off-hours batch, for example. In practice, I find it easier to report according to a single set of summed times, without weighting, and then to consider adjustments in priority during the actual process of addressing the top time-consuming SQL. (I have a set of proprietary scripts that do the data collection and reporting (on Oracle) of what I call “TopSQL” for my consulting practice. These have hundreds of hours of my personal development time behind them, so I don’t generally give them away (sorry!) – good SQL ranking of proactively-discovered performance issues is a major advantage, but if you consider what I say, above, you may “re-invent my wheel”!)
Assume that you have a single list prioritizing the highest-runtime SQL according to total runtime over the past week or two, collected and reported with automated tools. How might you then adjust that set of priorities so that you fix problems in the ideal order for the business, picking the “biggest, lowest-hanging fruit” first? I have heard many assume that it is important to measure “efficiency” of the statement, for example, logical I/Os per row returned, to get an early view of the likelihood that a query can be much improved, to avoid wasting time on “unsolvable” performance problems – high-load SQL that is already perfectly tuned. I do not recommend this – in my experience, almost every statement near-enough to the top of a high-load SQL list can be improved by a factor of at least two, and frequently by factors of five or ten or more, and “efficiency” measures do a very poor job of predicting the potential for easy, major gains! (This is not to say that all major gains are easy, but automatable measures of efficiency utterly fail to predict which problems will be easy and which will be hard!)
There are some adjustments you can make, though, to a priority list based strictly on cumulative runtimes:
· Does the process, according to when it runs, how often, from what module, etc., appear to be on online process, or batch? If it is a batch process, is it something like a user-servicing report, or is it just background middleware?
· Does the process run mainly off-hours, or does it appear that it should run off-hours? (The solution may be simply to alter the runtime schedule, rather than to tune the SQL.)
· See the above questions regarding ad-hoc SQL priorities – many of these questions can be answered, or at least guessed at with reasonable confidence, based on data about run frequency and times, average rows returned, modules driving the SQL, and by examining the SQL, itself, without necessarily knowing the business use of the SQL in advance.
· When did the problem appear last, and did it appear multiple times, in a pattern that implies it is likely to re-appear, or was the usage pattern one that makes it appear to have disappeared? Don’t waste too much time on a problem that has likely gone away on its own – this is especially true if you are proactively examining SQL on a development-testing platform, where SQL is under more flux than is usual for production code.
With adjustments such as above, and absolute numbers telling you runtime hours per week, you can have some confidence of how much a many-fold speedup to a high-load SQL tuning problem would be worth. If the fix turns out to require merely a new index, or a change to just the code specifying a single SQL statement, then the gain is likely well worth the effort, even if the statement is fairly low on the priority list. If, however, the fix involves a major rewrite of the application or schema design, for example, an application rewrite to avoid the SQL altogether, or a major denormalization, then the fix might not be worth the cost, especially for problems that appear minor after making the above priority adjustments.
To see a cumulative listing of all research problems (the one below, and the earlier ones from the earlier newsletters), see all research problems. There has been no solution submitted yet for the earlier-published research problems.
Occasionally, cost-based optimizers conclude that the most-efficient path to the data in a SQL query is to read two un-joined rowsets, independently, and to find the Cartesian product, that is, to find all possible combinations of the rows in the two sets, where the product, in general, has m*n rows, where m is the number of rows in the first set, and n is the number of rows in the second set. (After the Cartesian product is performed, there may be more joins, usually conventional joins done as nested loops or hash joins, and it is often the case that after these additional joins are performed, there will eventually be a link-up between the two sets originally joined by Cartesian product, where the later-joined tables ultimately join to each other and to both Cartesian-joined sets. The result of these later joins that join the two Cartesian sets to each other can discard many, or even most, of the rows joined in the intermediate results prior to completing the execution plan.
In general, I have found that these Cartesian joins look attractive to the optimizer when the optimizer estimates that one or both of the rowsets being joined has only a single row (or even less), on average. When at least one of the Cartesian joined sets does have just one row (or none, even better!), this works out nicely, in general. The problem is that when the estimate is wrong, even by a little bit, disaster can result. Consider the following hypothetical case:
FROM Order_Details OD, Orders O, Order_Sources OS
Let’s say that there are 10-million order details, with 1-thousandth of those shipped since yesterday (and the optimizer knows this), 1-million orders, and 500 order sources, with histograms reflecting the facts that 20 order sources have Source_Country=’JAPAN’ and 20 order sources have Source_Type=’RESELLER’. The optimizer makes its usual assumption of statistical independence between these two conditions, calculating that most likely there is at most one row meeting both conditions. Under this assumption, the best possible path to the data is to go first to the Order_Sources table, discarding all but one row (or discarding all rows, and halting execution right there!), then to read the 10000 recently-shipped order details with an index on Ship_Date, forming a Cartesian product between those details and the one Order_Source, then reach the orders through its primary key, with nested loops from the 10000 rows in the Cartesian product. If all goes well, that really is the best plan! What happens if just one assumption is a bit off, though? What if it happens that all the Japanese order sources are resellers? (Perhaps the business model in Japan is rather different than for the rest of the company’s operations!) Well, in this case, the Cartesian product will find 200-thousand combinations of Order_Detail and Order_Source, and the optimizer will perform 200-thousand nested loops to Orders. (These will reach any given order at least 20 times, possibly with pretty good caching, but surely with very wasteful logical I/O!) The plan is now a terrible idea, nowhere near as good as a simple plan following nested loops from Order_Details, to Orders, to Order_Sources, and worse still compared to a plan that pre-cached the rows from Order_Sources, then did a hash join of those rows to the result of the nested-loops join between the other two tables. Even if the correlation between these two conditions was only very weak, and there were just two Order_Sources meeting both conditions, the Cartesian product would be a dramatically bad idea.
So, simply, the research problem is how to avoid all the cases of harmful, non-robust Cartesian products, without losing their occasional benefits?