Pre-General Availability Draft: 2017-07-17
A common table expression (CTE) is a named temporary result set that exists within the scope of a single statement and that can be referred to later within that statement, possibly multiple times. The following discussion describes how to write statements that use CTEs.
For information about CTE optimization, see Section 8.2.2.3, “Optimizing Derived Tables, View References, and Common Table Expressions”.
To specify common table expressions, use a
WITH clause that has one or
more comma-separated subclauses, each of which provides a
subquery that produces a result set and a name associated with
the subquery. The following example defines CTEs named
cte1 and cte2 in the
WITH clause, and refers to them
in the top-level SELECT that
follows the WITH clause:
WITH
cte1 AS (SELECT a, b FROM table1),
cte2 AS (SELECT c, d FROM table2)
SELECT b, d FROM cte1 JOIN cte2
WHERE cte1.a = cte2.c;
In the statement containing the
WITH clause, each CTE name can
be referenced to access the corresponding CTE result set.
A CTE name can be referenced in other CTEs, enabling CTEs to be defined based on other CTEs.
A CTE can refer to itself to define a recursive CTE. Common applications of recursive CTEs include series generation and traversal of hierarchical or tree-structured data.
Common table expressions are an optional part of the syntax
for DML statements. They are defined using a
WITH clause:
with_clause:
WITH [RECURSIVE]
cte_name [(col_name [, col_name] ...)] AS (subquery)
[, cte_name [(col_name [, col_name] ...)] AS (subquery)] ...
cte_name names a single common
table expression and can be used as a table reference in the
statement containing the WITH
clause.
The subquery part of AS
( is called the
“subquery of the CTE” and is what produces the
CTE result set. The parentheses following
subquery)AS are required.
A common table expression is recursive if its subquery refers
to its own name. The RECURSIVE keyword must
be included if any CTE in the
WITH clause is recursive. For
more information, see
Recursive Common Table Expressions.
Determination of column names for a given CTE occurs as follows:
If a parenthesized list of names follows the CTE name, those names are the column names:
WITH cte (col1, col2) AS ( SELECT 1, 2 UNION ALL SELECT 3, 4 ) SELECT col1, col2 FROM cte;The number of names in the list must be the same as the number of columns in the result set.
Otherwise, the column names come from the select list of the first
SELECTwithin theAS (part:subquery)WITH cte AS ( SELECT 1 AS col1, 2 AS col2 UNION ALL SELECT 3, 4 ) SELECT col1, col2 FROM cte;
A WITH clause is permitted in
these contexts:
At the beginning of
SELECT,UPDATE, andDELETEstatements.WITH ... SELECT ... WITH ... UPDATE ... WITH ... DELETE ...At the beginning of subqueries (including derived table subqueries):
SELECT ... WHERE id IN (WITH ... SELECT ...) ... SELECT * FROM (WITH ... SELECT ...) AS dt ...Immediately preceding
SELECTfor statements that include aSELECTstatement:INSERT ... WITH ... SELECT ... REPLACE ... WITH ... SELECT ... CREATE TABLE ... WITH ... SELECT ... CREATE VIEW ... WITH ... SELECT ... DECLARE CURSOR ... WITH ... SELECT ... EXPLAIN ... WITH ... SELECT ...
Only one WITH clause is
permitted at the same level.
WITH followed by
WITH at the same level is not
permitted, so this is illegal:
WITH cte1 AS (...) WITH cte2 AS (...) SELECT ...
However, a statement can contain multiple
WITH clauses if they occur at
different levels:
WITH cte1 AS (SELECT 1)
SELECT * FROM (WITH cte2 AS (SELECT 2) SELECT * FROM cte2 JOIN cte1) AS dt;
A WITH clause can define one or
more common table expressions, but each CTE name must be
unique to the clause. This is illegal:
WITH cte1 AS (...), cte1 AS (...) SELECT ...To make the statement legal, define the CTEs with unique names:
WITH cte1 AS (...), cte2 AS (...) SELECT ...A CTE can refer to itself or to other CTEs:
A self-referencing CTE is recursive.
A CTE can refer to CTEs defined earlier in the same
WITHclause, but not those defined later.This constraint rules out mutually-recursive CTEs, where
cte1referencescte2andcte2referencescte1because one of those references must be to a CTE defined later.A CTE in a given query block can refer to CTEs defined in query blocks at a more outer level, but not CTEs defined in query blocks at a more inner level.
For resolving references to objects with the same names,
derived tables hide CTEs; and CTEs hide base tables,
TEMPORARY tables, and views. Name
resolution occurs by searching for objects in the same query
block, then proceeding to outer blocks in turn while no object
with the name is found.
A CTE cannot contain outer references. As with derived tables, which also prohibit outer references, this is a MySQL restriction, not a restriction of the SQL standard. For additional syntax considerations specific to recursive CTEs, see Recursive Common Table Expressions.
A recursive common table expression is one having a subquery that refers to its own name. For example:
WITH RECURSIVE cte (n) AS
(
SELECT 1
UNION ALL
SELECT n + 1 FROM cte WHERE n < 5
)
SELECT * FROM cte;When executed, the statement produces this result, a single column containing a simple linear sequence:
+------+
| n |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+------+A recursive CTE has this structure:
The
WITHclause must begin withWITH RECURSIVEif any CTE in theWITHclause refers to itself. (If no CTE refers to itself,RECURSIVEis permitted but not required.)If you forget
RECURSIVEfor a recursive CTE, this error is a likely result:ERROR 1146 (42S02): Table 'cte_name' doesn't existThe recursive CTE subquery has two parts, separated by
UNION [ALL]orUNION DISTINCT:SELECT ... -- return initial row set UNION ALL SELECT ... -- return additional row setsThe first
SELECTproduces the initial row or rows for the CTE and does not refer to the CTE name. The secondSELECTproduces additional rows and recurses by referring to the CTE name in itsFROMclause. Recursion ends when this part produces no new rows. Thus, a recursive CTE consists of a nonrecursiveSELECTpart followed by a recursiveSELECTpart.Either
SELECTpart can itself be a union of multipleSELECTstatements.The types of the CTE result columns are inferred from the column types of the nonrecursive
SELECTpart only, and the columns are all nullable. For type determination, the recursiveSELECTpart is ignored.If the nonrecursive and recursive parts are separated by
UNION DISTINCT, duplicate rows are eliminated. This is useful for queries that perform transitive closures, to avoid infinite loops.Each iteration of the recursive part operates only on the rows produced by the previous iteration. If the recursive part has multiple query blocks, they all operate for each iteration on the rows produced by the previous iteration, and their results are concatenated for the next iteration.
The recursive CTE subquery shown earlier has this nonrecursive part that retrieves a single row to produce the initial row set:
SELECT 1The CTE subquery also has this recursive part:
SELECT n + 1 FROM cte WHERE n < 5
At each iteration, that SELECT
produces a row with a new value one greater than the value of
n from the previous row set. The first
iteration operates on the initial row set
(1) and produces 1+1=2;
the second iteration operates on the first iteration's row set
(2) and produces 2+1=3;
and so forth. This continues until recursion ends, when
n is no longer less than 5.
If the recursive part of a CTE produces wider values for a column than the nonrecursive part, it may be necessary to widen the column in the nonrecursive part to avoid data truncation. Consider this statement:
WITH RECURSIVE cte AS
(
SELECT 1 AS n, 'abc' AS str
UNION ALL
SELECT n + 1, CONCAT(str, str) FROM cte WHERE n < 3
)
SELECT * FROM cte;In nonstrict SQL mode, the statement produces this output:
+------+------+
| n | str |
+------+------+
| 1 | abc |
| 2 | abc |
| 3 | abc |
+------+------+
The str column values are all
'abc' because the nonrecursive
SELECT determines the column
widths. Consequently, the wider str values
produced by the recursive
SELECT are truncated.
In strict SQL mode, the statement produces an error:
ERROR 1406 (22001): Data too long for column 'str' at row 1
To address this issue, use
CAST() in the nonrecursive
SELECT to make the
str column wider:
WITH RECURSIVE cte AS
(
SELECT 1 AS n, CAST('abc' AS CHAR(20)) AS str
UNION ALL
SELECT n + 1, CONCAT(str, str) FROM cte WHERE n < 3
)
SELECT * FROM cte;Now the statement produces this result, without truncation:
+------+--------------+
| n | str |
+------+--------------+
| 1 | abc |
| 2 | abcabc |
| 3 | abcabcabcabc |
+------+--------------+Columns are accessed by name, not position, which means that columns in the recursive part can access columns in the nonrecursive part that have a different position, as this CTE illustrates:
WITH RECURSIVE cte AS
(
SELECT 1 AS n, 1 AS p, -1 AS q
UNION ALL
SELECT n + 1, q * 2, p * 2 FROM cte WHERE n < 5
)
SELECT * FROM cte;
Because p in one row is derived from
q in the previous row, and vice versa, the
positive and negative values values swap positions in each
successive row of the output:
+------+------+------+
| n | p | q |
+------+------+------+
| 1 | 1 | -1 |
| 2 | -2 | 2 |
| 3 | 4 | -4 |
| 4 | -8 | 8 |
| 5 | 16 | -16 |
+------+------+------+Some syntax constraints apply within recursive CTE subqueries:
The recursive
SELECTpart must not contain these constructs:Aggregate functions such as
SUM()GROUP BYORDER BYLIMITDISTINCT
This constraint does not apply to the nonrecursive
SELECTpart of a recursive CTE. The prohibition onDISTINCTapplies only toUNIONmembers;UNION DISTINCTis permitted.The recursive
SELECTpart must reference the CTE only once and only in itsFROMclause, not in any subquery. It can reference tables other than the CTE and join them with the CTE. If used in a join like this, the CTE must not be on the right side of aLEFT JOIN.
These constraints come from the SQL standard, other than the
MySQL-specific exclusions of ORDER BY,
LIMIT, and DISTINCT.
It is important for recursive CTEs that the recursive
SELECT part include a condition
to terminate recursion. Otherwise, recursion terminates only
if some kind of resource exhaustion occurs. As a development
technique to guard against a runaway recursive CTE, you can
force termination by placing a limit on execution time:
The
max_execution_timesystem variable enforces an execution timeout for allSELECTstatements executed within the current session.The
MAX_EXECUTION_TIMEoptimizer hint enforces a per-query execution timeout for theSELECTstatement in which it appears.
Suppose that a recursive CTE is mistakenly written with no recursion execution termination condition:
WITH RECURSIVE cte (n) AS
(
SELECT 1
UNION ALL
SELECT n + 1 FROM cte
)
SELECT * FROM cte;To guard against infinite recursion, set a per-session timeout by executing a statement like this prior to executing the CTE statement:
SET max_execution_time = 1000; -- impose one second timeoutAlternatively, include an optimizer hint within the CTE statement itself:
WITH RECURSIVE cte (n) AS
(
SELECT 1
UNION ALL
SELECT n + 1 FROM cte
)
SELECT /*+ MAX_EXECUTION_TIME(1000) */ * FROM cte;
If a recursive query without an execution time limit enters an
infinite loop, you can terminate it from another session using
KILL QUERY.
Within the session itself, the client program used to run the
query might provide a way to kill the query. For example, in
mysql, typing Control+C
interrupts the current statement.
For recursive CTEs, EXPLAIN
output rows for recursive
SELECT parts display
Recursive in the Extra
column.
Cost estimates displayed by
EXPLAIN represent cost per
iteration, which might differ considerably from total cost.
The optimizer cannot predict the number of iterations because
it cannot predict when the WHERE clause
will become false.
CTE actual cost may also be affected by result set size. A CTE that produces many rows may require an internal temporary table large enough to be converted from in-memory to on-disk format and may suffer a performance penalty. If so, increasing the permitted in-memory temporary table size may improve performance; see Section 8.4.4, “Internal Temporary Table Use in MySQL”.
As mentioned previously, recursive common table expressions (CTEs) are frequently used for series generation and traversing hierarchical or tree-structured data. This section shows some simple examples of these techniques.
Fibonacci Series Generation
A Fibonacci series begins with the two numbers 0 and 1 (or 1
and 1) and each number after that is the sum of the previous
two numbers. A recursive common table expression can generate
a Fibonacci series if each row produced by the recursive
SELECT has access to the two
previous numbers from the series. The following CTE generates
a 10-number series using 0 and 1 as the first two numbers:
WITH RECURSIVE fibonacci (n, fib_n, next_fib_n) AS
(
SELECT 1, 0, 1
UNION ALL
SELECT n + 1, next_fib_n, fib_n + next_fib_n
FROM fibonacci WHERE n < 10
)
SELECT * FROM fibonacci;The CTE produces this result:
+------+-------+------------+
| n | fib_n | next_fib_n |
+------+-------+------------+
| 1 | 0 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 2 |
| 4 | 2 | 3 |
| 5 | 3 | 5 |
| 6 | 5 | 8 |
| 7 | 8 | 13 |
| 8 | 13 | 21 |
| 9 | 21 | 34 |
| 10 | 34 | 55 |
+------+-------+------------+How the CTE works:
nis a display column to indicate that the row contains then-th Fibonacci number. For example, the 8th Fibonacci number is 13.The
fib_ncolumn displays Fibonacci numbern.The
next_fib_ncolumn displays the next Fibonacci number after numbern. This column provides the next series value to the next row, so that row can produce the sum of the two previous series values in itsfib_ncolumn.Recursion ends when
nreaches 10. This is an arbitrary choice, to limit the output to a small set of rows.
The preceding output shows the entire CTE result. To select
just part of it, add an appropriate WHERE
clause to the top-level SELECT.
For example, to select the 8th Fibonacci number, do this:
mysql> WITH RECURSIVE fibonacci ...
...
SELECT fib_n FROM fibonacci WHERE n = 8;
+-------+
| fib_n |
+-------+
| 13 |
+-------+Date Series Generation
A common table expression can generate a series of successive dates, which is useful for generating summaries that include a row for all dates in the series, including dates not represented in the summarized data.
Suppose that a table of sales numbers contains these rows:
mysql> SELECT * FROM sales ORDER BY date, price;
+------------+--------+
| date | price |
+------------+--------+
| 2017-01-03 | 100.00 |
| 2017-01-03 | 200.00 |
| 2017-01-06 | 50.00 |
| 2017-01-08 | 10.00 |
| 2017-01-08 | 20.00 |
| 2017-01-08 | 150.00 |
| 2017-01-10 | 5.00 |
+------------+--------+This query summarizes the sales per day:
mysql> SELECT date, SUM(price) AS sum_price
FROM sales
GROUP BY date
ORDER BY date;
+------------+-----------+
| date | sum_price |
+------------+-----------+
| 2017-01-03 | 300.00 |
| 2017-01-06 | 50.00 |
| 2017-01-08 | 180.00 |
| 2017-01-10 | 5.00 |
+------------+-----------+
However, that result contains “holes” for dates
not represented in the range of dates spanned by the table. A
result that represents all dates in the range can be produces
using a recursive CTE to generate that set of dates, joined
with a LEFT JOIN to the sales data.
Here is the CTE to generate the date range series:
WITH RECURSIVE dates (date) AS
(
SELECT MIN(date) FROM sales
UNION ALL
SELECT date + INTERVAL 1 DAY FROM dates
WHERE date + INTERVAL 1 DAY <= (SELECT MAX(date) FROM sales)
)
SELECT * FROM dates;The CTE produces this result:
+------------+
| date |
+------------+
| 2017-01-03 |
| 2017-01-04 |
| 2017-01-05 |
| 2017-01-06 |
| 2017-01-07 |
| 2017-01-08 |
| 2017-01-09 |
| 2017-01-10 |
+------------+How the CTE works:
Joining the CTE with a LEFT JOIN against
the sales table produces the sales summary
with a row for each date in the range:
WITH RECURSIVE dates (date) AS
(
SELECT MIN(date) FROM sales
UNION ALL
SELECT date + INTERVAL 1 DAY FROM dates
WHERE date + INTERVAL 1 DAY <= (SELECT MAX(date) FROM sales)
)
SELECT dates.date, COALESCE(SUM(price), 0) AS sum_price
FROM dates LEFT JOIN sales ON dates.date = sales.date
GROUP BY dates.date
ORDER BY dates.date;The output looks like this:
+------------+-----------+
| date | sum_price |
+------------+-----------+
| 2017-01-03 | 300.00 |
| 2017-01-04 | 0.00 |
| 2017-01-05 | 0.00 |
| 2017-01-06 | 50.00 |
| 2017-01-07 | 0.00 |
| 2017-01-08 | 180.00 |
| 2017-01-09 | 0.00 |
| 2017-01-10 | 5.00 |
+------------+-----------+Some points to note:
Are the queries inefficient, particularly the one with the
MAX()subquery executed for each row in the recursiveSELECT? Checking withEXPLAINshows that the subqueries are optimized away for efficiency.The use of
COALESCE()avoids displayingNULLin thesum_pricecolumn on days for which no sales data occur in thesalestable.
Hierarchical Data Traversal
Recursive common table expressions are useful for traversing
data that forms a hierarchy. Consider these statements that
create a small data set that shows, for each employee in a
company, the employee name and ID number, and the ID of the
employee's manager. The top-level employee (the CEO), has a
manager ID of NULL (no manager).
CREATE TABLE employees (
id INT PRIMARY KEY NOT NULL,
name VARCHAR(100) NOT NULL,
manager_id INT NULL,
INDEX (manager_id),
FOREIGN KEY (manager_id) REFERENCES EMPLOYEES (id)
);
INSERT INTO employees VALUES
(333, "Yasmina", NULL), # Yasmina is the CEO (manager_id is NULL)
(198, "John", 333), # John has ID 198 and reports to 333 (Yasmina)
(692, "Tarek", 333),
(29, "Pedro", 198),
(4610, "Sarah", 29),
(72, "Pierre", 29),
(123, "Adil", 692);The resulting data set looks like this:
mysql> SELECT * FROM employees ORDER BY id;
+------+---------+------------+
| id | name | manager_id |
+------+---------+------------+
| 29 | Pedro | 198 |
| 72 | Pierre | 29 |
| 123 | Adil | 692 |
| 198 | John | 333 |
| 333 | Yasmina | NULL |
| 692 | Tarek | 333 |
| 4610 | Sarah | 29 |
+------+---------+------------+To produce the organizational chart with the management chain for each employee (that is, the path from CEO to employee), use a recursive CTE:
WITH RECURSIVE employee_paths (id, name, path) AS
(
SELECT id, name, CAST(id AS CHAR(200))
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, CONCAT(ep.path, ',', e.id)
FROM employee_paths AS ep JOIN employees AS e
ON ep.id = e.manager_id
)
SELECT * FROM employee_paths ORDER BY path;The CTE produces this output:
+------+---------+-----------------+
| id | name | path |
+------+---------+-----------------+
| 333 | Yasmina | 333 |
| 198 | John | 333,198 |
| 29 | Pedro | 333,198,29 |
| 4610 | Sarah | 333,198,29,4610 |
| 72 | Pierre | 333,198,29,72 |
| 692 | Tarek | 333,692 |
| 123 | Adil | 333,692,123 |
+------+---------+-----------------+How the CTE works:
The nonrecursive
SELECTproduces the row for the CEO (the row with aNULLmanager ID).The
pathcolumn is widened toCHAR(200)to ensure that there is room for the longerpathvalues produced by the recursiveSELECT.Each row produced by the recursive
SELECTfinds all employees who report directly to an employee produced by a previous row. For each such employee, the row includes the employee ID and name, and the employee management chain. The chain is the manager's chain, with the employee ID added to the end.Recursion ends when employees have no others who report to them.
To find the path for a specific employee or employees, add a
WHERE clause to the top-level
SELECT. For example, to display
the results for Tarek and Sarah, modify that
SELECT like this:
mysql> WITH RECURSIVE ...
...
SELECT * FROM employees_extended
WHERE id IN (692, 4610)
ORDER BY path;
+------+-------+-----------------+
| id | name | path |
+------+-------+-----------------+
| 4610 | Sarah | 333,198,29,4610 |
| 692 | Tarek | 333,692 |
+------+-------+-----------------+
Common table expressions (CTEs) are similar to derived tables in some ways:
Both constructs are named.
Both constructs exist for the scope of a single statement.
Because of these similarities, CTEs and derived tables often can be used interchangeably. As a trivial example, these statements are equivalent:
WITH cte AS (SELECT 1) SELECT * FROM cte;
SELECT * FROM (SELECT 1) AS dt;However, CTEs have some advantages over derived tables:
A derived table can be referenced only a single time within a query. A CTE can be referenced multiple times. To use multiple instances of a derived table result, you must derive the result multiple times.
A CTE can be self-referencing (recursive).
One CTE can refer to another.
A CTE may be easier to read when its definition appears at the beginning of the statement rather than embedded within it.
CTEs are similar to tables created with
CREATE
[TEMPORARY] TABLE but need not be defined or dropped
explicitly. For a CTE, you need no privileges to create
tables.