Quarterly SingingSQL Newsletter #1 

February 6, 2006

Introduction

This is the first edition of the planned quarterly SingingSQL newsletter, I (Dan Tow) will begin with a few remarks. If you found this newsletter without subscribing, but you wish to subscribe to receive new editions by email as they appear, just drop me a line at dantow@singingsql.com. If you wish to unsubscribe at any time, just send a note to the same address with UNSUBSCRIBE in the subject line. I’ll likely eventually get around to setting up a proper email group, through some third party, so that eventually people can subscribe and unsubscribe without communicating with me, but this is simplest for a start, I think. I also understand that BCC ’d addresses sometimes trigger overzealous spam filters to block mail, so let me know if you think it might become an issue for your email system.
 
 I’m taking a lowest-common-denominator approach to the format of the newsletter, sending it as a straight-ASCII note so anyone at all can access it securely, but if you’d rather read something more-formatted, you can follow a link from the email newsletter to these online, more-formatted editions. I’ll also have links scattered throughout for supplementary materials.
 
The layout of this newsletter will almost certainly evolve over time, but for now I’m planning on some regular features:
  1. SingingSQL News: Recent and coming SingingSQL events, and shameless plugs. This first edition will cover news back further in time than future editions.
  2. Links: Links to externally published online material that may be of interest. If this is my own work, I’ll say so, and describe it in brief. If you know of others’ online published work, including your own, that you’d recommend, let me know, and I’ll try to include it, with my own description if I find the time, and/or with your brief description.
  3. Featured Article(s) and/or Script(s): These will be written by me, usually just one or two short articles per newsletter.
  4. Research Problems: These will be some combination of questions, and or answers to earlier questions. I’d welcome contributions from any of you, if you have answers to these research questions I’m proposing, or new questions that you would like to propose, and I’ll give credit to my contributors, of course.

Although this material is copyrighted, feel free to forward this newsletter, as is, to any friends who you think might be interested.

If you have any comments or suggestions, feel free to drop me a line, at dantow@singingsql.com – I’m always happy to hear from you!

SingingSQL News of Public Events

Old news

  • SQL Tuning is now available in Polish and Russian.
  • I presented When Good Optimizers Make Bad Choices at the Hotsos Conference, Dallas, Texas, in March, 2005, and again, with some additions, at the Northern California Oracle Users Group, August, 2005.
  • I presented my Top-Down SQL Tuning Seminar at Oracle Day, in Iceland(!), May, 2005, and at the Northern California Oracle Users Group, May, 2005. This is a one-hour crash introduction to and summary of the material in my book and 2-3-day course.

New news

  • I’ll be presenting my 2-day Top-Down SQL Tuning course based on my book, through Miracle, in Denmark , February 15th and 16th, and in Iceland February 20th and 21st. It is not too late to sign up, if you’ve always dreamed of visiting Denmark or Iceland in the middle of winter! DenmarkClass IcelandClass 
  • I'll be presenting Fixing Broken SQL at the Hotsos Conference, Dallas Texas , on March 7, 2006 . This will describe methods for finding and fixing functional bugs in other developers’ SQL as a side-effect of your efforts to tune that SQL, even with only minimal knowledge of the business context of that SQL. HotsosPresentation. 
  • Tentatively. I'll be presenting my 2-day Top-Down SQL Tuning course, June 1-2, in the San Francisco Bay Area - email dantow@singingsql.com well ahead of time, if you are interested. Cost: $1000 per person.

Links

Externally Published Articles

Range-Keyed Queries: In this O’Reilly-sponsored article, I describe strategies to handle a recurring class of problems where part of the table key is a plain-vanilla ID, but another part of the key specifies a range, most-commonly a date range that specifies the dates for which the record applies. For example, time-dependent historical data is frequently specified with the date on which the record first appeared, and the date on which the record ceased to be current.

Wrong-Errors Bugs: A New Class of Bug: In this O’Reilly-sponsored article, I describe ways to eliminate a new class of bug I’ve described, dubbed the wrong-errors bug, and I propose RDBMS enhancements that would make these bugs much less common and problematic.

Presentation Available Online

When Good Optimizers Make Bad Choices (Sometimes): In this presentation to the Northern California Oracle Users Group (NoCOUG), I described why human SQL tuners can still “beat” cost-based optimizers at the “game” of SQL tuning, by exploiting enormous “unfair” advantages that human tuners have over even the best optimizers. The presentation is accompanied by a White Paper and files with supplementary material, including all of the material necessary to reproduce the examples presented. This is an enhanced version of a presentation I gave earlier, at the 2005 Hotsos Symposium.

Featured Article

Performance and Tuning, the Big Picture

With this being the first SingingSQL newsletter, I think I’ll cover the big picture in this first article. The material covered is empirical, not theoretical, reflecting my personal experiences in 16 years of tuning. Your own experiences may differ – add them to mine, and you may approach something closer to the universal tuning-world truth.

I come from a physical-sciences background, and we always had to pay very close attention to the units that went with any given measurement – 36 miles is a very different distance than 36 millimeters, and you cannot even properly compare 36 millimeters to 36 grams – comparing millimeters to grams amounts to comparing apples-and-oranges, so to speak! If you gather together a roomful of performance specialists (as often happens when an enterprise develops a really desperate performance problem!), you are likely to hear a babble of units, “context switches per second, CPU cycles, network round trips, bits per second bandwidth consumed, megabytes memory consumed, system calls per second, pages per second, swap-ins per second, logical reads, physical reads,” as specialists in hardware, UNIX administration, network administration, and database performance all view the problems from their own perspectives, like so many blind men each feeling a piece of an elephant. It can seem impossible to determine what the highest-priority problem is when each narrow specialist expresses the problem in units that mean very little to the other specialists, and when there is no obvious way to compare two “problems” that are each expressed in apparently-incompatible units. The key to resolving this problem is to recognize that there is ultimately only one thing that really matters, in tuning and performance:

MONEY

If the “performance problem” does not (and will not, in the future) somehow cost the business money, then it is no problem at all! If one performance problem costs the business twice as much as a second performance problem, then solving the first problem is twice as important! There are three components that make a performance problem cost money:

  1. The importance of the business process being delayed, in cost-per-hour of delay in completing the business process.
  2. The cost/hour of the physical resources consumed during the delays – a business process may finish early enough to cost the business nothing, but require so much hardware that an performance fix might save high costs, allowing the business to run on less-expensive hardware, with fewer CPUs, for example.
  3. The duration of the summed delays.

Working out the relative costs of the performance problems is very useful because there are typically hundreds, even thousands, of tiny problems and only a handful of significant problems, and you’ll save enormous losses by solving the problems in order of there importance. (Even the few significant problems often differ in importance by large factors. Separate problems take significant time to resolve, typically, so you cannot just solve them a hundred at a time. You can, however, generally expect that any given problem will be solved efficiently once it is the clear focus of a qualified expert’s attention – I’ve never yet found a high-priority problem that couldn’t be solved!) Unfortunately, it is impractical to directly measure the money lost during a series of IT-system delays. We must substitute something we can measure, in the place of money, preferably something that maps very closely to true money costs. There are two sorts of costs to consider:

  1. Extraordinary costs: Occasionally, it is obvious to the business that one or two particular processes are responsible for almost all of the performance-related costs. In this scenario, generally, one or two business processes are causing obvious pain. These may well threaten the core health of the business.
  2. Routine costs: Frequently, performance-related costs are spread over many processes, and the business may have only the vaguest idea of the relative costs among the delays seen by the diverse processes.

In the first case, it is imperative to focus on the time spent by the business-critical processes that are causing extraordinary pain to the business. Time spent by non-critical processes, in the first case, can be initially ignored. In the second case, you can treat the time spent in all processes roughly equally, and the objective is to reduce all delays, across the whole system. There is a trade-off involved in the choice between focusing on individual processes or on the whole system:

    • If we focus on a single chosen process, we'll get excellent information regarding where that process spends its time, and exactly what to fix to make that process faster, which may be very different from what we might fix if we looked at the whole system. This is clearly the way to go if you have obvious extraordinary costs from a single horribly broken process.
    • If we do not know which process performance issues are costing the company the most money, we could still just choose one process at a time that seems likely to be important (perhaps a process chosen according to who complains the most – the “squeaky wheel” process), and we could still gain some advantages of a single-process focus when analyzing that process. We could then repeat this analysis for one business process after another. This may be tempting, but I believe it is a mistake. Whole-systems analyses are inherently imprecise, for example, often making the assumption that time spent in one process has the same importance as time spent in another, but choosing which business process to tune next according to a vague, unmeasured impression of the importance of that process’s performance problems is even more likely to fail to find the most important fixes early (missing high-impact optimization opportunities until very late in the process).

Whether we focus on problems in a single process (extraordinary costs) or perform whole-system analysis (routine costs), we can adopt a single stand-in for money, a stand-in that we can measure:

TIME

If a problem does not keep an end user, or the whole business, waiting for some result to appear or some significant process to complete for more time than the business can very comfortably afford, then there is no performance problem! If the problem does cause a long wait for a result or a process completion, then how much it matters is simply a function of how long the wait takes, and the business importance of the business process being delayed – the ultimate importance of the problem does not depend on whether the delay is for two minutes of network latency, two minutes of paging, two minutes of physical reads, or of logical reads, or any other operation! The end user is unlikely to know the reason for the wait, and is even less likely to care! Once we recognize that time is our focus, the units problem resolves itself. We may choose seconds, or minutes, or hours, or hours of delay per hour of wall-clock time, but we no longer have a problem of comparing apples and oranges once all performance issues under attack are expressed, and compared, in terms of units of time. (A self-appointed performance expert may persist in describing an alleged problem in units that do not map to time. I recommend that you insist that the problem be expressed in terms of the time lost, at least approximately, and that the translation to time be justified. If the “expert” has no idea how much time a problem costs, or even how to find out, but still insists that the problem is important, I would suggest finding a new expert.)

Almost always, we find that a very modest number of problems are responsible for a large fraction of the delays we need to get rid of. Often, we need repair only a single source of delay to enormously speed a given business-critical process, and even looking at a whole, complex IT system, we can usually eliminate over half the runtime with fewer than 20 fixes, and we can safely ignore all other performance issues as insignificant. If we do not express problems in terms of runtime lost, however, we may need hundreds of guesses before we stumble upon the set of fixes that eliminates most performance-related business losses, or we may, more likely, never find the right problems to focus on.

In my own career, I did not even begin, in 1989, with a focus on database delays, which I knew nothing about measuring. I happened to know a little bit about looking at UNIX system CPU and I/O statistics, so that was naturally my first focus. While searching for, and finding, performance problems in hardware, in operating systems, in local-area and wide-area networks, in special database bottlenecks (such as locking and latching), and in routine database runtime (performing excess physical and logical I/O to execute inefficient SQL), I’ve developed a simple rule of thumb regarding where to look for business-applications performance problems first, how to reduce what could be a very complex search to something that is usually much simpler, focusing, as needed, on either a single process that requires tuning, or on the whole system at once:

Determine the time spent executing the handful of
 highest-runtime SQL statements.
 
If that SQL time accounts for most of the overall
 runtime of the processes that execute those
 statements, and the processes you are concerned
 with, don’t worry about anything else.
  
If you can reduce your whole system-tuning problem to the problem of reducing the runtimes of a handful of SQL statements, then you are nearly finished! There is a popular conception that the CBO (cost-based optimizer) usually leaves very little room for improvement, but this is not so. My own surprisingly consistent experience, across a highly-diverse set of business applications is that, given the top-ten highest-runtime SQL statements on a production system, at least 9 out of 10 can generally be tuned to run at least twice as fast as before, most of them achieving a speedup of at least 5-or-10-fold!
 
 ****************************************************************************************
 (The above statement is the most important assertion in this article, I believe. If you don’t know this surprising fact, it is all too natural to assume that you’re just stuck with most performance issues, and to give up too soon, when you don’t know how to solve the harder problems quickly. If you are not seeing roughly these results, I strongly recommend another, deeper look!)
 *****************************************************************************************

A typical breakdown of Top-10 SQL, after tuning, would look roughly like:

  • 3 statements fixed with targeted index changes. These are relatively low-risk because you know in advance that they fix a Top-10 performance problem, and it is very unlikely that they will create an even bigger problem than they have fixed. (There is a real art to minimizing the chances of creating any significant new problem, though – a good topic for a future article.)
  • 3 statements fixed with inside-the-box SQL changes, the sort of changes the CBO might have come up with on its own, if it was as clever and well-informed as a good SQL tuner. (The human tuner has information that is unavailable to the CBO, and far fewer constraints, so it is not really a fair contest, but that keeps us in business;-) These changes have super-low risk because they are guaranteed not to change the functional result of the changed SQL, and (except for reducing overall load) they will have no effect on the rest of the system.
  • 3 statements fixed with moderate outside-the-box SQL changes. These involve more than just replacing the old query with a functionally-perfectly-identical query. Instead, they require reformulating the approach the application takes to the problem, for example, figuring out that the query need not return so many rows, or that it can safely change its functional behavior in some unlikely corner case. Although these changes are usually not severe, for example, not usually requiring denormalization, these are changes the CBO could never make on its own, because it dare not risk a functional change inconsistent with the exact SQL the database is asked to execute.
  • 1 statement with no recommended fix. Even this statement could likely be fixed if the stakes were really high, but it is just not worth the trouble, because the fix (such as a radical denormalization) is more expensive than the problem.

Research Problem

Stored Outlines Sensitive to Schema?

Is it possible to store two different outlines for the same SQL under the DEFAULT category, where that SQL is interpreted differently, depending on which SQL it is interpreted from, and to have each schema use the appropriate outline? For example, in one schema, "Orders," which is referenced in a query, might be a simple, local table, while in another schema, "Orders" is a UNION-ALL view of the local table and a remote-linked archive table, and the execution plans of the query must necessarily depend on the schema from which the query runs. If it is impossible to store two outlines under the DEFAULT category, what happens when the user attempts to execute the SQL, with user_stored_outlines=TRUE, from the wrong schema, the schema to which the outline does not apply? Does Oracle simply ignore the outline, or does it generate a plan that is wrong for that schema, even a plan that results in a wrong-rows bug? Categories seem to be a mechanism to let the process point to the appropriate set of outlines, depending on which schema the application attaches to, but this is inconvenient if the application has not been specifically written to know that it needs to ALTER SESSION to set user_stored_outlines=<The Correct Category for That Schema> when it connects to a particular schema. Is there some way to automate an ALTER SESSION command so that it happens automatically (some sort of trigger on connect?) when the schema is set or changed?

Copyright 2006 Dan Tow, All rights reserved

Home