Notes
Slide Show
Outline
1
Fixing Broken SQL
  • March, 2006
2
Overview
  • Introduction – the Game of Fixing Broken SQL
  • SQL That Hits Duplicate Rows
  • SQL That Returns Unexpected Results
  • SQL That Doesn’t Grow with the Data
3
Introduction – Game of Fixing Broken SQL
  • Think of tuning as a game: based on my own experiences over more than the past decade:
  • A SQL tuning specialist can tune much more SQL than he or she can originate – tune other people’s SQL!
  • You’ll know far less about the application than the original SQL coder – they know the purpose best!
  • You could just focus on making individual queries faster – just be a better optimizer than the CBO.
4
Introduction – Game of Fixing Broken SQL
  • Don’t stop with just a performance-only fix!
  • Just changing the performance of the SQL will often fail to deliver the best performance fix! – the best performance fix is often combined with a functional fix!
  • Even without knowing the purpose of the SQL code, you can often deliver a functional fix that is independent of the performance fix!
5
Introduction – Game of Fixing Broken SQL
  • If the SQL coder knows the code’s purpose best, how can you find a functional fix?
  • As a SQL tuning specialist, you see much more SQL than an average coder – you will know SQL better than the average coder!
  • There are recurring patterns that frequently point to functionally broken SQL, and you can recognize these patterns even without knowing the application, without knowing the detailed purpose of the SQL!
6
Problem 1
  • What’s wrong with this query?


  • SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
  •   FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND C.Customer_ID = O.Customer_ID;
7
Answer 1
  • Note that Order_Types is never referenced!
  • There is no join to T!!


  • SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
  •   FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND C.Customer_ID = O.Customer_ID;
8
Problem 1.1
  • What happens here?


  • SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
  •   FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND C.Customer_ID = O.Customer_ID;
9
Answer 1.1
  • First, the database will get a Cartesian Product of all the rows in Order_Types and all the rows returned by the query you want (the query in black).
  • Then, the database will perform a sort-unique operation to discard all the duplicates generated by that Cartesian product.


  • SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
  •   FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND C.Customer_ID = O.Customer_ID;
10
Answer 1.1(cont.)
  • Functionally, the query result is probably fine! (When is it not?!)
  • The physical I/O is probably fine, since any extra reads will be to the same blocks, redundantly. This will actually improve your cache hit ratio (while harming your performance!).
  • The logical I/O may (or may not) be increased significantly, depending on the execution plan, but the CPU usage will almost certainly be increased significantly. (Looking for SQL with excessive logical I/O won’t always find these problems!)
  • Without the DISTINCT, we lose an implicit sort. We might need an “ORDER BY” to restore that sort.
  • SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
  •   FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND C.Customer_ID = O.Customer_ID;
11
Fixed Query 1, Conservative Fix
  • Note the added “ORDER BY”


  • SELECT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
  •   FROM Orders O, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND C.Customer_ID = O.Customer_ID
  • ORDER BY O.Order_ID, L.Line_ID;
12
Overview
  • Introduction – the Game of Fixing Broken SQL
  • SQL That Hits Duplicate Rows
  • SQL That Returns Unexpected Results
  • SQL That Doesn’t Grow with the Data
13
Problem 2
  • What’s wrong with this query?


  • SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
  •   FROM Orders O, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND C.Customer_ID = O.Customer_ID;
14
Answer 2
  • Maybe, nothing is wrong with this query.
  • Possibly, the DISTINCT is just a back-handed way to get a necessary sort.
  • Possibly, the sort is unnecessary and is a (probably mild) waste of time.
  • However, the DISTINCT may be a red flag pointing to a hidden many-to-many join.


  • SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
  •   FROM Orders O, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND C.Customer_ID = O.Customer_ID;
15
Problem 2.1
  • SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name
  •   FROM Orders O, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND C.Customer_ID = O.Customer_ID;
  • .1  SELECT STATEMENT
  • ..2  SORT UNIQUE
  • ...3  NESTED LOOPS
  • ....4  NESTED LOOPS
  • .....5  NESTED LOOPS
  • ......6  TABLE ACCESS BY INDEX ROWID 4*CUSTOMERS
  • .......7  INDEX UNIQUE SCAN CUSTOMERS_U1: customer_id
  • ......6  TABLE ACCESS BY INDEX ROWID 1*ORDERS
  • .......7  INDEX RANGE SCAN ORDERS_N4: customer_id
  • .....5  TABLE ACCESS BY INDEX ROWID 2*ORDER_LINES
  • ......6  INDEX RANGE SCAN ORDER_LINES_N1: order_id
  • ....4  TABLE ACCESS BY INDEX ROWID 3*ITEMS
  • .....5 INDEX RANGE SCAN ITEMS_U1: item_id,org_id
16
Clues 2.1
  • SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name
  •   FROM Orders O, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND C.Customer_ID = O.Customer_ID;
  • .1  SELECT STATEMENT
  • ..2  SORT UNIQUE
  • ...3  NESTED LOOPS
  • ....4  NESTED LOOPS
  • .....5  NESTED LOOPS
  • ......6  TABLE ACCESS BY INDEX ROWID 4*CUSTOMERS
  • .......7  INDEX UNIQUE SCAN CUSTOMERS_U1: customer_id
  • ......6  TABLE ACCESS BY INDEX ROWID 1*ORDERS
  • .......7  INDEX RANGE SCAN ORDERS_N4: customer_id
  • .....5  TABLE ACCESS BY INDEX ROWID 2*ORDER_LINES
  • ......6  INDEX RANGE SCAN ORDER_LINES_N1: order_id
  • ....4  TABLE ACCESS BY INDEX ROWID 3*ITEMS
  • .....5 INDEX RANGE SCAN ITEMS_U1: item_id,org_id
17
Answer 2.1
  • SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name
  •   FROM Orders O, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND C.Customer_ID = O.Customer_ID;
  • .1  SELECT STATEMENT
  • ..2  SORT UNIQUE
  • ...3  NESTED LOOPS
  • ....4  NESTED LOOPS
  • .....5  NESTED LOOPS
  • ......6  TABLE ACCESS BY INDEX ROWID 4*CUSTOMERS
  • .......7  INDEX UNIQUE SCAN CUSTOMERS_U1: customer_id
  • ......6  TABLE ACCESS BY INDEX ROWID 1*ORDERS
  • .......7  INDEX RANGE SCAN ORDERS_N4: customer_id
  • .....5  TABLE ACCESS BY INDEX ROWID 2*ORDER_LINES
  • ......6  INDEX RANGE SCAN ORDER_LINES_N1: order_id
  • ....4  TABLE ACCESS BY INDEX ROWID 3*ITEMS
  • .....5 INDEX RANGE SCAN ITEMS_U1: item_id,org_id
  • WHERE clause is missing condition on I.Org_ID that would make the join to Items unique.
18
Issues with Problem 2
  • SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name
  •   FROM Orders O, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND C.Customer_ID = O.Customer_ID;
  • Observations regarding the non-unique join to Items:
  • If every Item sharing the same Item_ID also shares the same Name, then this query will return one row per Order, thanks to the DISTINCT. If different Orgs might name an Item differently, then the query is almost certainly wrong, functionally.
  • The inner join to I.Item_ID might succeed where an inner join to I.Item_ID and to I.Org_ID might fail, causing a subtle functional difference between the two alternatives.
  • The query implies a design question: should there be an Items table keyed on Item_ID, alone, and an Items_Orgs table keyed on  Item_ID, Org_ID?
  • What are the odds that the original developers thought all this through and got it right?!?
  • What are the odds that the corner cases were well tested?!?
19
Fixed Query 2, Alternative Fixes
  • SELECT O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name
  •   FROM Orders O, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND I.Org_ID = :B2
  •    AND C.Customer_ID = O.Customer_ID
  • ORDER BY O.Order_ID, L.Quantity, I.Name;


  • SELECT O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name
  •   FROM Orders O, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND I.Org_ID = L.Shipping_Org_ID
  •    AND C.Customer_ID = O.Customer_ID
  • ORDER BY O.Order_ID, L.Quantity, I.Name;
20
Many-to-Many Joins
  • Many-to-many joins are almost always wrong!
  • Many-to-many joins come from a combination of SQL errors and database-design errors.
  • DISTINCT is a common band-aid that covers up most, but usually not all of the row errors that result from erroneous many-to-many joins! (I do not mean by this that the resulting SQL is correct!)
  • The underlying problem calls for more than a partial, band-aid fix!
21
A Modest Proposal
  • Oracle should have a parameter, settable at the session level especially for testing, that triggers an error any time the session executes SQL with a many-to-many join!


  • More-sophisticated uniqueness constraints (for example, uniqueness of a column combination just for a particular subset of a table) would help, here.
22
Problem 3
  • What’s wrong with this query?


  • SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name Customer, L.Quantity Qty, I.Name Item
  •   FROM Orders O, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND I.Org_ID = :B2
  •    AND C.Customer_ID = O.Customer_ID;
23
Problem 3
  • Consider a possible result:
  • 2 lines, shipped to two addresses:
  • SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name Customer, L.Quantity Qty, I.Name Item
  •   FROM Orders O, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND I.Org_ID = :B2
  •    AND C.Customer_ID = O.Customer_ID;


  • ORDER_ID ORDER_DATE CUSTOMER   QTY ITEM
  • -------- ---------- ---------- --- ----------------
  • 38485    07-MAR-06  C. Millsap 1   SQL For Dummies
  • 38485    07-MAR-06  C. Millsap 2   SQL For Dummies
  • What if the second line had QTY=1 ?!?
24
Problem 3
  • Consider a possible result:
  • 2 lines, shipped to two addresses, same Qty, Same Item:
  • SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name Customer, L.Quantity Qty, I.Name Item
  •   FROM Orders O, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND I.Org_ID = :B2
  •    AND C.Customer_ID = O.Customer_ID;


  • ORDER_ID ORDER_DATE CUSTOMER   QTY ITEM
  • -------- ---------- ---------- --- ----------------
  • 38485    07-MAR-06  C. Millsap 1   SQL For Dummies
  • Does this accurately reflect the shipped order ?!?
  • Get rid of the DISTINCT!
25
Problem 4
  • What’s wrong with this query?


  • SELECT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
  •   FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND C.Customer_ID = O.Customer_ID
  • GROUP BY O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name;
26
Answer 4
  • This is just Problem 1 in disguise: the GROUP BY just performs a back-handed DISTINCT operation!


  • SELECT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
  •   FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND C.Customer_ID = O.Customer_ID
  • GROUP BY O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name;
27
Problem 5
  • Is there a difference?


  • SELECT … FROM … WHERE …
  •    AND :B1 = 1
  • UNION SELECT … FROM … WHERE …
  •    AND :B1 = 2;
  • Versus:
  • SELECT … FROM … WHERE …
  •    AND :B1 = 1
  • UNION ALL SELECT … FROM … WHERE …
  •    AND :B1 = 2;
28
Problem 5
  • SELECT … FROM … WHERE …
  •    AND :B1 = 1
  • UNION SELECT … FROM … WHERE …
  •    AND :B1 = 2;
  • We normally think of UNION as an operation that discards duplicates between different SELECT blocks, but the above case never has such duplicates.
  • However, UNION also discards duplicates (most likely unintentionally!) within each SELECT block!
  • UNION acts like DISTINCT, here!
29
Unnecessary Sort-Unique Operations
  • DISTINCT is more often wrong than right, frequently (imperfectly!) patching a deeper, underlying problem.
  • DISTINCT is not the only way to get a bad sort-unique step, though!
  • UNION can trigger an incorrect sort-unique step: remember that it does not just discard duplicates in different SELECT blocks!.
  • GROUP BY can trigger an incorrect sort-unique step, too.
30
Overview
  • Introduction – the Game of Fixing Broken SQL
  • SQL That Hits Duplicate Rows
  • SQL That Returns Unexpected Results
  • SQL That Doesn’t Grow with the Data
31
SQL That Returns Unexpected Results
  • Fun with Outer Joins
  • View-Using SQL
32
Fun with Outer Joins
  • Ever heard something like this?


  • “No wonder this application has so many performance problems – just look at how many outer joins are scattered all through the SQL!”
33
Fun with Outer Joins
  • Ever heard something like this?


  • “No wonder this application has so many performance problems – just look at how many outer joins are scattered all through the SQL!”


  • The “problem of outer joins” is largely a myth!
  • I have never seen a query where an outer join was the true root cause of a query’s poor performance!
  • Correct outer joins perform as well as inner joins!
  • However, outer joins are subtle and easy to mis-code.
34
Problem 6
  • What’s wrong with this query?


  • SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item
  •   FROM Orders O, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID(+) = L.Item_ID
  •    AND NVL(I.Org_ID,:B2) = :B2
  •    AND C.Customer_ID = O.Customer_ID;
35
Problem 6
  • SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item
  •   FROM Orders O, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID(+) = L.Item_ID
  •    AND NVL(I.Org_ID,:B2) = :B2
  •    AND C.Customer_ID = O.Customer_ID;


  • It’s functionally fine in the case where there is a matching item having the correct Org_ID (even if there are also matching items with other Org_IDs).
  • It’s functionally fine when there are no matching Items at all.
36
Problem 6
  • SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item
  •   FROM Orders O, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID(+) = L.Item_ID
  •    AND NVL(I.Org_ID,:B2) = :B2
  •    AND C.Customer_ID = O.Customer_ID;


  • It is almost certainly wrong in the (possibly very rare) case where there are matching Items, but they all have the wrong Org_IDs!
37
Problem 6
  • SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item
  •   FROM Orders O, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID(+) = L.Item_ID
  •    AND NVL(I.Org_ID,:B2) = :B2
  •    AND C.Customer_ID = O.Customer_ID;


  • It performs badly because it joins to more Items than necessary!
38
Fixed Query 6
  • SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item
  •   FROM Orders O, Order_Lines L, Items I, Customers C
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID(+) = L.Item_ID
  •    AND I.Org_ID(+) = :B2
  •    AND C.Customer_ID = O.Customer_ID;


  • Now, the outer join is clean: order lines are never discarded, and performance is better!
39
Problem 7
  • What’s wrong with this query?


  • SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item,     IT.Name Type
  •   FROM Orders O, Order_Lines L, Items I, Customers C,
  •        Item_Types IT
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID(+) = L.Item_ID
  •    AND I.Org_ID(+) = :B2
  •    AND C.Customer_ID = O.Customer_ID
  •    AND I.Item_Type_ID = IT.Item_Type_ID;
40
Problem 7
  • SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item,     IT.Name Type
  •   FROM Orders O, Order_Lines L, Items I, Customers C,
  •        Item_Types IT
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID(+) = L.Item_ID
  •    AND I.Org_ID(+) = :B2
  •    AND C.Customer_ID = O.Customer_ID
  •    AND I.Item_Type_ID = IT.Item_Type_ID;


  • The inner join from I to IT discards the outer case of the outer join from L to I.
  • The “(+)”s could be removed without changing the functional result, and this would be clearer, if this is the desired result.
  • If you don’t want all inner joins, make I to IT outer.
41
First Fix 7
  • SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item,     IT.Name Type
  •   FROM Orders O, Order_Lines L, Items I, Customers C,
  •        Item_Types IT
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID = L.Item_ID
  •    AND I.Org_ID = :B2
  •    AND C.Customer_ID = O.Customer_ID
  •    AND I.Item_Type_ID = IT.Item_Type_ID;


  • The result is now consistent with the original functionality, but is clearer to understand!
42
Alternate Fix 7
  • SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item,     IT.Name Type
  •   FROM Orders O, Order_Lines L, Items I, Customers C,
  •        Item_Types IT
  •  WHERE O.Customer_ID = :B1
  •    AND L.Order_ID = O.Order_ID
  •    AND I.Item_ID(+) = L.Item_ID
  •    AND I.Org_ID(+) = :B2
  •    AND C.Customer_ID = O.Customer_ID
  •    AND I.Item_Type_ID = IT.Item_Type_ID(+);


  • The result is now consistent with the intent implied by the original outer join from L to I!
43
Problem 8
  • What’s wrong with this query?


  • SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, EM.Last_Name, Em.First_Name
  •   FROM Customers C, Customer_Events CE, Events Ev, Employees Em
  • WHERE C.Customer_ID = :B1
  •   AND C.Customer_ID = CE.Customer_ID
  •   AND CE.Event_ID = Ev.Event_ID
  •   AND CE.Recorded_By = Em.Employee_ID
  • ORDER BY CE.Event_Date;
44
View-Using SQL
  • SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, Em.Last_Name , Em.First_Name
  •   FROM Customers C, Customer_Events CE, Events Ev, Employees Em
  • WHERE C.Customer_ID = :B1
  •   AND C.Customer_ID = CE.Customer_ID
  •   AND CE.Event_ID = Ev.Event_ID
  •   AND CE.Recorded_By = Em.Employee_ID
  • ORDER BY CE.Event_Date;


  • The query seems fine, but you happen to notice that there is no “Employees” table mentioned in the execution plan!
  • You look up the Employees definition:
45
View-Using SQL
  • Does this view create a problem in the context of the view-using SQL in the last slide?


  • CREATE VIEW Employees AS
  • SELECT P.Person_ID Employee_ID, P.Last_Name, P.First_Name, D.Name Department_Name, M.Full_Name Manager_Name
  • FROM Persons P, Persons M, Person_Roles R,
  • Departments D
  • WHERE P.Mgr_ID = M.Person_ID(+)
  •   AND P.Department_ID = D.Department_ID(+)
  •   AND P.Person_ID = R.Person_ID
  •   AND R.Role_Code = 'EMP'
  •   AND SYSDATE BETWEEN R.Start_Date AND
  •                       NVL(R.END_DATE,SYSDATE);



46
First Fix, Problem 8
  • Don’t lose records just because they were recorded by someone who is no longer an employee!


  • SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, EM.Last_Name, Em.First_Name
  •   FROM Customers C, Customer_Events CE, Events Ev, Employees Em
  • WHERE C.Customer_ID = :B1
  •   AND C.Customer_ID = CE.Customer_ID
  •   AND CE.Event_ID = Ev.Event_ID
  •   AND CE.Recorded_By = Em.Employee_ID(+)
  • ORDER BY CE.Event_Date;
47
Second Fix, Problem 8
  • Don’t null out recorded-by names just because they were recorded by someone who is no longer an employee!
  • This performs better!
  • Are we done?


  • SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, EM.Last_Name, Em.First_Name
  •   FROM Customers C, Customer_Events CE, Events Ev, Persons Em
  • WHERE C.Customer_ID = :B1
  •   AND C.Customer_ID = CE.Customer_ID
  •   AND CE.Event_ID = Ev.Event_ID
  •   AND CE.Recorded_By = Em.Person_ID(+)
  • ORDER BY CE.Event_Date;
48
Problem 8 (Cont.)
  • Are we done?


  • SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, EM.Last_Name, Em.First_Name
  •   FROM Customers C, Customer_Events CE, Events Ev, Persons Em
  • WHERE C.Customer_ID = :B1
  •   AND C.Customer_ID = CE.Customer_ID
  •   AND CE.Event_ID = Ev.Event_ID
  •   AND CE.Recorded_By = Em.Person_ID(+)
  • ORDER BY CE.Event_Date;
49
Third Fix, Problem 8
  • Now we are done!


  • SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, EM.Last_Name, Em.First_Name
  •   FROM Persons C, Customer_Events CE, Events Ev, Persons Em
  • WHERE C.Person_ID = :B1
  •   AND C.Person_ID = CE.Customer_ID
  •   AND CE.Event_ID = Ev.Event_ID
  •   AND CE.Recorded_By = Em.Person_ID(+)
  • ORDER BY CE.Event_Date;
50
Overview
  • Introduction – the Game of Fixing Broken SQL
  • SQL That Hits Duplicate Rows
  • SQL That Returns Unexpected Results
  • SQL That Doesn’t Grow with the Data
51
Problem 9
  • What’s wrong with this query?


  • SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, L.Item_ID
  •   FROM Orders O, Order_Lines L, Order_Events E, Order_Line_Events LE
  •  WHERE O.Customer_ID = :B1
  •    AND O.Order_ID = L.Order_ID
  •    AND O.Order_ID = E.Order_ID
  •    AND E.Event_Type = 'LATE_SHIPMENT'
  •    AND L.Line_ID = LE.Line_ID
  • ORDER BY O.Due_Date;
52
Problem 9
  • SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, L.Item_ID
  •  FROM Orders O, Order_Lines L, Order_Events E, Order_Line_Events LE
  • WHERE O.Customer_ID = :B1
  •   AND O.Order_ID = L.Order_ID
  •   AND O.Order_ID = E.Order_ID
  •   AND E.Event_Type = 'LATE_SHIPMENT'
  •   AND L.Line_ID = LE.Line_ID
  • ORDER BY O.Due_Date;


  • Note Order_Events and Order_Line_Events, with line events presumably serving as details for the order events.
53
Problem 9
  • SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, L.Item_ID
  •  FROM Orders O, Order_Lines L, Order_Events E, Order_Line_Events LE
  • WHERE O.Customer_ID = :B1
  •   AND O.Order_ID = L.Order_ID
  •   AND O.Order_ID = E.Order_ID
  •   AND E.Event_Type = 'LATE_SHIPMENT'
  •   AND L.Line_ID = LE.Line_ID
  • ORDER BY O.Due_Date;


  • Order_Events and Order_Line_Events both join to the same order, in any row this query returns, but do they pertain to the same event?!?
  • This query implies significant built-in assumptions:
54
Problem 9
  • SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, L.Item_ID
  •  FROM Orders O, Order_Lines L, Order_Events E, Order_Line_Events LE
  • WHERE O.Customer_ID = :B1
  •   AND O.Order_ID = L.Order_ID
  •   AND O.Order_ID = E.Order_ID
  •   AND E.Event_Type = 'LATE_SHIPMENT'
  •   AND L.Line_ID = LE.Line_ID
  • ORDER BY O.Due_Date;


  • Assumptions (These may be valid, today, but tomorrow?):
  • There is at most one event of type 'LATE_SHIPMENT' per order.
  • There are no line event details for any event types other than the 'LATE_SHIPMENT' event.
  • There is at most one order with any given due date per customer.
55
Fix for Problem 9
  • SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, E.Order_Event_ID, L.Line_ID, L.Item_ID
  •  FROM Orders O, Order_Lines L, Order_Events E, Order_Line_Events LE
  • WHERE O.Customer_ID = :B1
  •   AND O.Order_ID = L.Order_ID
  •   AND O.Order_ID = E.Order_ID
  •   AND E.Event_Type = 'LATE_SHIPMENT'
  •   AND E.Order_Event_ID = LE.Order_Event_ID
  •   AND L.Line_ID = LE.Line_ID
  • ORDER BY O.Due_Date, O.Order_ID, E.Order_Event_ID, L.Line_ID;


  • Robust Behavior:
  • Now guarantees one row per late-shipment event line, even under future data conditions.
  • Sorts sensibly, due date first, orders next, order events, next, order lines last.
56
SQL That Doesn’t Grow with the Data
  • Usually requires SQL changes.
  • Often requires schema design changes (hard to make, but even harder to make later!).
  • Seeing the assumptions requires at least a little bit of understanding of the application, but most often no more than you can guess from table and column names!
57
Questions?