EXISTS vs. IN
The EXISTS function searches for the presence of a single row meeting the stated criteria as opposed to the IN statement which looks for all occurrences.TABLE1 - 1000 rows TABLE2 - 1000 rows
(A) SELECT t1.id FROM table1 t1 WHERE t1.code IN (SELECT t2.code FROM table2 t2);
For query A, all rows in TABLE2 will be read for every row in TABLE1. The effect will be 1,000,000 rows read from items. In the case of query B, a maximum of 1 row from TABLE2 will be read for each row of TABLE1, thus reducing the processing overhead of the statement.(B) SELECT t1.id FROM table1 t1 WHERE EXISTS (SELECT '1' FROM table2 t2 WHERE t2.code = t1.code)
Rule of thumb:
- If the majority of the filtering criteria are in the subquery then the
IN
variation may be more performant. - If the majority of the filtering criteria are in the top query then the
EXISTS
variation may be more performant.
INEQUALITIES
If a query uses inequalities (item_no > 100) the optimizer must estimate the number of rows returned before it can decide the best way to retrieve the data. This estimation is prone to errors. If you are aware of the data and it's distribution you can use optimizer hints to encourage or discourage full table scans to improve performance.If an index is being used for a range scan on the column in question, the performance can be improved by substituting >= for >. In this case, item_no > 100 becomes item_no >= 101. In the first case, a full scan of the index will occur. In the second case, Oracle jumps straight to the first index entry with an item_no of 101 and range scans from this point. For large indexes this may significantly reduce the number of blocks read.
IMPROVING PARSE SPEED
Execution plans for SELECT statements are cached by the server, but unless the exact same statement is repeated the stored execution plan details will not be reused. Even differing spaces in the statement will cause this lookup to fail. Use of bind variables allows you to repeatedly use the same statements whilst changing the WHERE clause criteria. Assuming the statement does not have a cached execution plan it must be parsed before execution. The parse phase for statements can be decreased by efficient use of aliasing. If an alias is not present, the engine must resolve which tables own the specified columns. The following is an example.Bad Statement | Good Statement |
---|---|
SELECT first_name, last_name, country FROM employee, countries WHERE country_id = id AND lastname = 'HALL'; |
SELECT e.first_name, e.last_name, c.country FROM employee e, countries c WHERE e.country_id = c.id AND e.last_name = 'HALL'; |