Subquery optimization for IN is not as
effective as for the = operator or for the
IN(
operator.
value_list)
A typical case for poor IN subquery
performance is when the subquery returns a small number of
rows but the outer query returns a large number of rows to be
compared to the subquery result.
The problem is that, for a statement that uses an
IN subquery, the optimizer rewrites it as a
correlated subquery. Consider the following statement that
uses an uncorrelated subquery:
SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
The optimizer rewrites the statement to a correlated subquery:
SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
If the inner and outer queries return
M and N
rows, respectively, the execution time becomes on the order of
O(M×N),
rather than
O(M+N)
as it would be for an uncorrelated subquery.
An implication is that an IN subquery can
be much slower than a query written using an
IN(
operator that lists the same values that the subquery would
return.
value_list)
In general, you cannot modify a table and select from the same table in a subquery. For example, this limitation applies to statements of the following forms:
DELETE FROM t WHERE ... (SELECT ... FROM t ...);
UPDATE t ... WHERE col = (SELECT ... FROM t ...);
{INSERT|REPLACE} INTO t (SELECT ... FROM t ...);
Exception: The preceding prohibition does not apply if you are
using a subquery for the modified table in the
FROM clause. Example:
UPDATE t ... WHERE col = (SELECT * FROM (SELECT ... FROM t...) AS _t ...);
Here the result from the subquery in the
FROM clause is stored as a temporary table,
so the relevant rows in t have already been
selected by the time the update to t takes
place.
Row comparison operations are only partially supported:
For ,
expr [NOT] IN
subqueryexpr can be an
n-tuple (specified using row
constructor syntax) and the subquery can return rows of
n-tuples. The permitted syntax
is therefore more specifically expressed as
row_constructor [NOT]
IN table_subquery
For ,
expr
op {ALL|ANY|SOME}
subqueryexpr must be a scalar value and
the subquery must be a column subquery; it cannot return
multiple-column rows.
In other words, for a subquery that returns rows of
n-tuples, this is supported:
(expr_1, ...,expr_n) [NOT] INtable_subquery
But this is not supported:
(expr_1, ...,expr_n)op{ALL|ANY|SOME}subquery
The reason for supporting row comparisons for
IN but not for the others is that
IN is implemented by rewriting it as a
sequence of =
comparisons and AND operations.
This approach cannot be used for ALL,
ANY, or SOME.
Subqueries in the FROM clause cannot be
correlated subqueries. They are materialized in whole
(evaluated to produce a result set) before evaluating the
outer query, so they cannot be evaluated per row of the outer
query.
MySQL does not support LIMIT in subqueries
for certain subquery operators:
mysql>SELECT * FROM t1->WHERE s1 IN (SELECT s2 FROM t2 ORDER BY s1 LIMIT 1);ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
The optimizer is more mature for joins than for subqueries, so in many cases a statement that uses a subquery can be executed more efficiently if you rewrite it as a join.
An exception occurs for the case where an
IN subquery can be rewritten as a
SELECT
DISTINCT join. Example:
SELECT col FROM t1 WHERE id_col IN (SELECT id_col2 FROM t2 WHERE condition);
That statement can be rewritten as follows:
SELECT DISTINCT col FROM t1, t2 WHERE t1.id_col = t2.id_col AND condition;
But in this case, the join requires an extra
DISTINCT operation and is not more
efficient than the subquery.
MySQL permits a subquery to refer to a stored function that
has data-modifying side effects such as inserting rows into a
table. For example, if f() inserts rows,
the following query can modify data:
SELECT ... WHERE x IN (SELECT f() ...);
This behavior is an extension to the SQL standard. In MySQL,
it can produce indeterminate results because
f() might be executed a different number of
times for different executions of a given query depending on
how the optimizer chooses to handle it.
For statement-based or mixed-format replication, one implication of this indeterminism is that such a query can produce different results on the master and its slaves.