MySQL supports the following JOIN syntaxes
for the table_references part of
SELECT statements and
multiple-table DELETE and
UPDATE statements:
table_references:escaped_table_reference[,escaped_table_reference] ...escaped_table_reference:table_reference| { OJtable_reference}table_reference:table_factor|join_tabletable_factor:tbl_name[PARTITION (partition_names)] [[AS]alias] [index_hint_list] |table_subquery[AS]alias| (table_references)join_table:table_reference[INNER | CROSS] JOINtable_factor[join_condition] |table_referenceSTRAIGHT_JOINtable_factor|table_referenceSTRAIGHT_JOINtable_factorONconditional_expr|table_reference{LEFT|RIGHT} [OUTER] JOINtable_referencejoin_condition|table_referenceNATURAL [{LEFT|RIGHT} [OUTER]] JOINtable_factorjoin_condition: ONconditional_expr| USING (column_list)index_hint_list:index_hint[,index_hint] ...index_hint: USE {INDEX|KEY} [FOR {JOIN|ORDER BY|GROUP BY}] ([index_list]) | IGNORE {INDEX|KEY} [FOR {JOIN|ORDER BY|GROUP BY}] (index_list) | FORCE {INDEX|KEY} [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)index_list:index_name[,index_name] ...
A table reference is also known as a join expression.
In MySQL 5.6.2 and later, a table reference (when it refers to a
partitioned table) may contain a PARTITION
option, including a comma-separated list of partitions,
subpartitions, or both. This option follows the name of the
table and precedes any alias declaration. The effect of this
option is that rows are selected only from the listed partitions
or subpartitions—in other words, any partitions or
subpartitions not named in the list are ignored For more
information, see Section 19.5, “Partition Selection”.
The syntax of table_factor is
extended in comparison with the SQL Standard. The latter accepts
only table_reference, not a list of
them inside a pair of parentheses.
This is a conservative extension if we consider each comma in a
list of table_reference items as
equivalent to an inner join. For example:
SELECT * FROM t1 LEFT JOIN (t2, t3, t4)
ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)
is equivalent to:
SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4)
ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)
In MySQL, JOIN, CROSS
JOIN, and INNER JOIN are syntactic
equivalents (they can replace each other). In standard SQL, they
are not equivalent. INNER JOIN is used with
an ON clause, CROSS JOIN
is used otherwise.
In general, parentheses can be ignored in join expressions containing only inner join operations. MySQL also supports nested joins (see Section 8.2.1.11, “Nested Join Optimization”).
Index hints can be specified to affect how the MySQL optimizer makes use of indexes. For more information, see Section 8.9.3, “Index Hints”.
The following list describes general factors to take into account when writing joins.
A table reference can be aliased using
or
tbl_name AS
alias_nametbl_name alias_name:
SELECT t1.name, t2.salary FROM employee AS t1 INNER JOIN info AS t2 ON t1.name = t2.name; SELECT t1.name, t2.salary FROM employee t1 INNER JOIN info t2 ON t1.name = t2.name;
A table_subquery is also known as
a subquery in the FROM clause. Such
subqueries must include an alias to
give the subquery result a table name. A trivial example
follows; see also Section 13.2.10.8, “Subqueries in the FROM Clause”.
SELECT * FROM (SELECT 1, 2, 3) AS t1;
INNER JOIN and ,
(comma) are semantically equivalent in the absence of a join
condition: both produce a Cartesian product between the
specified tables (that is, each and every row in the first
table is joined to each and every row in the second table).
However, the precedence of the comma operator is less than
that of INNER JOIN, CROSS
JOIN, LEFT JOIN, and so on. If
you mix comma joins with the other join types when there is
a join condition, an error of the form Unknown
column ' may occur. Information about dealing with
this problem is given later in this section.
col_name' in 'on
clause'
The conditional_expr used with
ON is any conditional expression of the
form that can be used in a WHERE clause.
Generally, you should use the ON clause
for conditions that specify how to join tables, and the
WHERE clause to restrict which rows you
want in the result set.
If there is no matching row for the right table in the
ON or USING part in a
LEFT JOIN, a row with all columns set to
NULL is used for the right table. You can
use this fact to find rows in a table that have no
counterpart in another table:
SELECT left_tbl.* FROM left_tbl LEFT JOIN right_tbl ON left_tbl.id = right_tbl.id WHERE right_tbl.id IS NULL;
This example finds all rows in left_tbl
with an id value that is not present in
right_tbl (that is, all rows in
left_tbl with no corresponding row in
right_tbl). See
Section 8.2.1.9, “LEFT JOIN and RIGHT JOIN Optimization”.
The
USING(
clause names a list of columns that must exist in both
tables. If tables column_list)a and
b both contain columns
c1, c2, and
c3, the following join compares
corresponding columns from the two tables:
a LEFT JOIN b USING (c1,c2,c3)
The NATURAL [LEFT] JOIN of two tables is
defined to be semantically equivalent to an INNER
JOIN or a LEFT JOIN with a
USING clause that names all columns that
exist in both tables.
RIGHT JOIN works analogously to
LEFT JOIN. To keep code portable across
databases, it is recommended that you use LEFT
JOIN instead of RIGHT JOIN.
The { OJ ... } syntax shown in the join
syntax description exists only for compatibility with ODBC.
The curly braces in the syntax should be written literally;
they are not metasyntax as used elsewhere in syntax
descriptions.
SELECT left_tbl.*
FROM { OJ left_tbl LEFT OUTER JOIN right_tbl ON left_tbl.id = right_tbl.id }
WHERE right_tbl.id IS NULL;
You can use other types of joins within { OJ ...
}, such as INNER JOIN or
RIGHT OUTER JOIN. This helps with
compatibility with some third-party applications, but is not
official ODBC syntax.
STRAIGHT_JOIN is similar to
JOIN, except that the left table is
always read before the right table. This can be used for
those (few) cases for which the join optimizer puts the
tables in the wrong order.
Some join examples:
SELECT * FROM table1, table2; SELECT * FROM table1 INNER JOIN table2 ON table1.id=table2.id; SELECT * FROM table1 LEFT JOIN table2 ON table1.id=table2.id; SELECT * FROM table1 LEFT JOIN table2 USING (id); SELECT * FROM table1 LEFT JOIN table2 ON table1.id=table2.id LEFT JOIN table3 ON table2.id=table3.id;
Join Processing Changes in MySQL 5.0.12
Natural joins and joins with USING,
including outer join variants, are processed according to the
SQL:2003 standard. The goal was to align the syntax and
semantics of MySQL with respect to NATURAL
JOIN and JOIN ... USING according
to SQL:2003. However, these changes in join processing can
result in different output columns for some joins. Also, some
queries that appeared to work correctly in older versions
(prior to 5.0.12) must be rewritten to comply with the
standard.
These changes have five main aspects:
The way that MySQL determines the result columns of
NATURAL or USING join
operations (and thus the result of the entire
FROM clause).
Expansion of SELECT * and SELECT
into a list
of selected columns.
tbl_name.*
Resolution of column names in NATURAL or
USING joins.
Transformation of NATURAL or
USING joins into JOIN ...
ON.
Resolution of column names in the ON
condition of a JOIN ... ON.
The following list provides more detail about several effects of current join processing versus join processing in older versions. The term “previously” means “prior to MySQL 5.0.12.”
The columns of a NATURAL join or a
USING join may be different from
previously. Specifically, redundant output columns no longer
appear, and the order of columns for SELECT
* expansion may be different from before.
Consider this set of statements:
CREATE TABLE t1 (i INT, j INT); CREATE TABLE t2 (k INT, j INT); INSERT INTO t1 VALUES(1,1); INSERT INTO t2 VALUES(1,1); SELECT * FROM t1 NATURAL JOIN t2; SELECT * FROM t1 JOIN t2 USING (j);
Previously, the statements produced this output:
+------+------+------+------+ | i | j | k | j | +------+------+------+------+ | 1 | 1 | 1 | 1 | +------+------+------+------+ +------+------+------+------+ | i | j | k | j | +------+------+------+------+ | 1 | 1 | 1 | 1 | +------+------+------+------+
In the first SELECT
statement, column j appears in both
tables and thus becomes a join column, so, according to
standard SQL, it should appear only once in the output, not
twice. Similarly, in the second SELECT statement, column
j is named in the
USING clause and should appear only once
in the output, not twice. But in both cases, the redundant
column is not eliminated. Also, the order of the columns is
not correct according to standard SQL.
Now the statements produce this output:
+------+------+------+ | j | i | k | +------+------+------+ | 1 | 1 | 1 | +------+------+------+ +------+------+------+ | j | i | k | +------+------+------+ | 1 | 1 | 1 | +------+------+------+
The redundant column is eliminated and the column order is correct according to standard SQL:
First, coalesced common columns of the two joined tables, in the order in which they occur in the first table
Second, columns unique to the first table, in order in which they occur in that table
Third, columns unique to the second table, in order in which they occur in that table
The single result column that replaces two common columns is
defined using the coalesce operation. That is, for two
t1.a and t2.a the
resulting single join column a is defined
as a = COALESCE(t1.a, t2.a), where:
COALESCE(x, y) = (CASE WHEN V1 IS NOT NULL THEN V1 ELSE V2 END)
If the join operation is any other join, the result columns of the join consists of the concatenation of all columns of the joined tables. This is the same as previously.
A consequence of the definition of coalesced columns is
that, for outer joins, the coalesced column contains the
value of the non-NULL column if one of
the two columns is always NULL. If
neither or both columns are NULL, both
common columns have the same value, so it doesn't matter
which one is chosen as the value of the coalesced column. A
simple way to interpret this is to consider that a coalesced
column of an outer join is represented by the common column
of the inner table of a JOIN. Suppose
that the tables t1(a,b) and
t2(a,c) have the following contents:
t1 t2 ---- ---- 1 x 2 z 2 y 3 w
Then:
mysql> SELECT * FROM t1 NATURAL LEFT JOIN t2;
+------+------+------+
| a | b | c |
+------+------+------+
| 1 | x | NULL |
| 2 | y | z |
+------+------+------+
Here column a contains the values of
t1.a.
mysql> SELECT * FROM t1 NATURAL RIGHT JOIN t2;
+------+------+------+
| a | c | b |
+------+------+------+
| 2 | z | y |
| 3 | w | NULL |
+------+------+------+
Here column a contains the values of
t2.a.
Compare these results to the otherwise equivalent queries
with JOIN ... ON:
mysql> SELECT * FROM t1 LEFT JOIN t2 ON (t1.a = t2.a);
+------+------+------+------+
| a | b | a | c |
+------+------+------+------+
| 1 | x | NULL | NULL |
| 2 | y | 2 | z |
+------+------+------+------+
mysql> SELECT * FROM t1 RIGHT JOIN t2 ON (t1.a = t2.a);
+------+------+------+------+
| a | b | a | c |
+------+------+------+------+
| 2 | y | 2 | z |
| NULL | NULL | 3 | w |
+------+------+------+------+
Previously, a USING clause could be
rewritten as an ON clause that compares
corresponding columns. For example, the following two
clauses were semantically identical:
a LEFT JOIN b USING (c1,c2,c3) a LEFT JOIN b ON a.c1=b.c1 AND a.c2=b.c2 AND a.c3=b.c3
Now the two clauses no longer are quite the same:
With respect to determining which rows satisfy the join condition, both joins remain semantically identical.
With respect to determining which columns to display for
SELECT * expansion, the two joins are
not semantically identical. The USING
join selects the coalesced value of corresponding
columns, whereas the ON join selects
all columns from all tables. For the preceding
USING join, SELECT
* selects these values:
COALESCE(a.c1,b.c1), COALESCE(a.c2,b.c2), COALESCE(a.c3,b.c3)
For the ON join, SELECT
* selects these values:
a.c1, a.c2, a.c3, b.c1, b.c2, b.c3
With an inner join,
COALESCE(a.c1,b.c1) is
the same as either a.c1 or
b.c1 because both columns will have
the same value. With an outer join (such as
LEFT JOIN), one of the two columns
can be NULL. That column will be
omitted from the result.
The evaluation of multi-way natural joins differs in a very
important way that affects the result of
NATURAL or USING joins
and that can require query rewriting. Suppose that you have
three tables t1(a,b),
t2(c,b), and t3(a,c)
that each have one row: t1(1,2),
t2(10,2), and
t3(7,10). Suppose also that you have this
NATURAL JOIN on the three tables:
SELECT ... FROM t1 NATURAL JOIN t2 NATURAL JOIN t3;
Previously, the left operand of the second join was
considered to be t2, whereas it should be
the nested join (t1 NATURAL JOIN t2). As
a result, the columns of t3 are checked
for common columns only in t2, and, if
t3 has common columns with
t1, these columns are not used as
equi-join columns. Thus, previously, the preceding query was
transformed to the following equi-join:
SELECT ... FROM t1, t2, t3 WHERE t1.b = t2.b AND t2.c = t3.c;
That join is missing one more equi-join predicate
(t1.a = t3.a). As a result, it produces
one row, not the empty result that it should. The correct
equivalent query is this:
SELECT ... FROM t1, t2, t3 WHERE t1.b = t2.b AND t2.c = t3.c AND t1.a = t3.a;
If you require the same query result in current versions of MySQL as in older versions, rewrite the natural join as the first equi-join.
Previously, the comma operator (,) and
JOIN both had the same precedence, so the
join expression t1, t2 JOIN t3 was
interpreted as ((t1, t2) JOIN t3). Now
JOIN has higher precedence, so the
expression is interpreted as (t1, (t2 JOIN
t3)). This change affects statements that use an
ON clause, because that clause can refer
only to columns in the operands of the join, and the change
in precedence changes interpretation of what those operands
are.
Example:
CREATE TABLE t1 (i1 INT, j1 INT); CREATE TABLE t2 (i2 INT, j2 INT); CREATE TABLE t3 (i3 INT, j3 INT); INSERT INTO t1 VALUES(1,1); INSERT INTO t2 VALUES(1,1); INSERT INTO t3 VALUES(1,1); SELECT * FROM t1, t2 JOIN t3 ON (t1.i1 = t3.i3);
Previously, the SELECT was
legal due to the implicit grouping of
t1,t2 as (t1,t2). Now
the JOIN takes precedence, so the
operands for the ON clause are
t2 and t3. Because
t1.i1 is not a column in either of the
operands, the result is an Unknown column 't1.i1'
in 'on clause' error. To allow the join to be
processed, group the first two tables explicitly with
parentheses so that the operands for the
ON clause are (t1,t2)
and t3:
SELECT * FROM (t1, t2) JOIN t3 ON (t1.i1 = t3.i3);
Alternatively, avoid the use of the comma operator and use
JOIN instead:
SELECT * FROM t1 JOIN t2 JOIN t3 ON (t1.i1 = t3.i3);
This change also applies to statements that mix the comma
operator with INNER JOIN, CROSS
JOIN, LEFT JOIN, and
RIGHT JOIN, all of which now have higher
precedence than the comma operator.
Previously, the ON clause could refer to
columns in tables named to its right. Now an
ON clause can refer only to its operands.
Example:
CREATE TABLE t1 (i1 INT); CREATE TABLE t2 (i2 INT); CREATE TABLE t3 (i3 INT); SELECT * FROM t1 JOIN t2 ON (i1 = i3) JOIN t3;
Previously, the SELECT
statement was legal. Now the statement fails with an
Unknown column 'i3' in 'on clause' error
because i3 is a column in
t3, which is not an operand of the
ON clause. The statement should be
rewritten as follows:
SELECT * FROM t1 JOIN t2 JOIN t3 ON (i1 = i3);
Resolution of column names in NATURAL or
USING joins is different than previously.
For column names that are outside the
FROM clause, MySQL now handles a superset
of the queries compared to previously. That is, in cases
when MySQL formerly issued an error that some column is
ambiguous, the query now is handled correctly. This is due
to the fact that MySQL now treats the common columns of
NATURAL or USING joins
as a single column, so when a query refers to such columns,
the query compiler does not consider them as ambiguous.
Example:
SELECT * FROM t1 NATURAL JOIN t2 WHERE b > 1;
Previously, this query would produce an error ERROR
1052 (23000): Column 'b' in where clause is
ambiguous. Now the query produces the correct
result:
+------+------+------+ | b | c | y | +------+------+------+ | 4 | 2 | 3 | +------+------+------+
One extension of MySQL compared to the SQL:2003 standard is
that MySQL enables you to qualify the common (coalesced)
columns of NATURAL or
USING joins (just as previously), while
the standard disallows that.