Thursday, 11 September 2014

Efficient SQL Statements

EXISTS vs. IN

The EXISTS function searches for the presence of a single row meeting the stated criteria as opposed to the IN statement which looks for all occurrences.
TABLE1 - 1000 rows
TABLE2 - 1000 rows
(A)
SELECT t1.id
FROM   table1 t1
WHERE  t1.code IN (SELECT t2.code 
                   FROM   table2 t2);
(B)
SELECT t1.id
FROM   table1 t1
WHERE  EXISTS (SELECT '1' 
               FROM   table2 t2 
               WHERE  t2.code = t1.code)
For query A, all rows in TABLE2 will be read for every row in TABLE1. The effect will be 1,000,000 rows read from items. In the case of query B, a maximum of 1 row from TABLE2 will be read for each row of TABLE1, thus reducing the processing overhead of the statement.
Rule of thumb:
  • If the majority of the filtering criteria are in the subquery then the IN variation may be more performant.
  • If the majority of the filtering criteria are in the top query then the EXISTS variation may be more performant.
I would suggest they you should try both variants and see which works the best.

INEQUALITIES

If a query uses inequalities (item_no > 100) the optimizer must estimate the number of rows returned before it can decide the best way to retrieve the data. This estimation is prone to errors. If you are aware of the data and it's distribution you can use optimizer hints to encourage or discourage full table scans to improve performance.
If an index is being used for a range scan on the column in question, the performance can be improved by substituting >= for >. In this case, item_no > 100 becomes item_no >= 101. In the first case, a full scan of the index will occur. In the second case, Oracle jumps straight to the first index entry with an item_no of 101 and range scans from this point. For large indexes this may significantly reduce the number of blocks read.

IMPROVING PARSE SPEED

Execution plans for SELECT statements are cached by the server, but unless the exact same statement is repeated the stored execution plan details will not be reused. Even differing spaces in the statement will cause this lookup to fail. Use of bind variables allows you to repeatedly use the same statements whilst changing the WHERE clause criteria. Assuming the statement does not have a cached execution plan it must be parsed before execution. The parse phase for statements can be decreased by efficient use of aliasing. If an alias is not present, the engine must resolve which tables own the specified columns. The following is an example.

Bad Statement Good Statement
SELECT first_name,
       last_name,
       country
FROM   employee,
       countries
WHERE  country_id = id
AND    lastname   = 'HALL';
SELECT e.first_name,
       e.last_name,
       c.country
FROM   employee e,
       countries c
WHERE  e.country_id = c.id
AND    e.last_name  = 'HALL';

No comments:

Post a Comment