Online DDL improves several aspects of MySQL operation, such as performance, concurrency, availability, and scalability:
Because queries and DML operations on the table can proceed while the DDL is in progress, applications that access the table are more responsive. Reduced locking and waiting for other resources throughout the MySQL server leads to greater scalability, even for operations not involving the table being altered.
For in-place operations, by avoiding the disk I/O and CPU cycles to rebuild the table, you minimize the overall load on the database and maintain good performance and high throughput during the DDL operation.
For in-place operations, because less data is read into the buffer pool than if all the data was copied, you avoid purging frequently accessed data from memory, which formerly could cause a temporary performance dip after a DDL operation.
If an online operation requires temporary sort files,
InnoDB creates them in the temporary file
directory, not the directory containing the original table. If
this directory is not large enough to hold such files, you may
need to set the tmpdir system
variable to a different directory. For more information, see
Section B.5.3.5, “Where MySQL Stores Temporary Files”.
While an InnoDB table is being changed by a DDL operation, the
table may or may not be
locked, depending on the
internal workings of that operation and the
LOCK clause of the ALTER
TABLE statement. By default, MySQL uses as little
locking as possible during a DDL operation; you specify the clause
either to make the locking more restrictive than it normally would
be (thus limiting concurrent DML, or DML and queries), or to
ensure that some expected degree of locking is allowed for an
operation. If the LOCK clause specifies a level
of locking that is not available for that specific kind of DDL
operation, such as LOCK=SHARED or
LOCK=NONE while creating or dropping a primary
key, the clause works like an assertion, causing the statement to
fail with an error. The following list shows the different
possibilities for the LOCK clause, from the
most permissive to the most restrictive:
For DDL operations with LOCK=NONE, both
queries and concurrent DML are allowed. This clause makes the
ALTER TABLE fail if the kind of
DDL operation cannot be performed with the requested type of
locking, so specify LOCK=NONE if keeping
the table fully available is vital and it is OK to cancel the
DDL if that is not possible. For example, you might use this
clause in DDLs for tables involving customer signups or
purchases, to avoid making those tables unavailable by
mistakenly issuing an expensive ALTER
TABLE statement.
For DDL operations with LOCK=SHARED, any
writes to the table (that is, DML operations) are blocked, but
the data in the table can be read. This clause makes the
ALTER TABLE fail if the kind of
DDL operation cannot be performed with the requested type of
locking, so specify LOCK=SHARED if keeping
the table available for queries is vital and it is OK to
cancel the DDL if that is not possible. For example, you might
use this clause in DDLs for tables in a data warehouse, where
it is OK to delay data load operations until the DDL is
finished, but queries cannot be delayed for long periods.
For DDL operations with LOCK=DEFAULT, or
with the LOCK clause omitted, MySQL uses
the lowest level of locking that is available for that kind of
operation, allowing concurrent queries, DML, or both wherever
possible. This is the setting to use when making pre-planned,
pre-tested changes that you know will not cause any
availability problems based on the workload for that table.
For DDL operations with LOCK=EXCLUSIVE,
both queries and DML operations are blocked. This clause makes
the ALTER TABLE fail if the
kind of DDL operation cannot be performed with the requested
type of locking, so specify LOCK=EXCLUSIVE
if the primary concern is finishing the DDL in the shortest
time possible, and it is OK to make applications wait when
they try to access the table. You might also use
LOCK=EXCLUSIVE if the server is supposed to
be idle, to avoid unexpected accesses to the table.
Depending on the internal workings of the online DDL operation and
the LOCK clause of the
ALTER TABLE statement, an
online DDL operation may wait for currently executing transactions
that are accessing the table to
commit or
roll back before completing
because exclusive access to the table is required for a brief time
during the initial and final phases of the DDL operation. These
online DDL statements also wait for table-accessing transactions
started while the DDL is in progress to commit or roll back before
completing. Consequently, in the case of long running transactions
performing inserts, updates, deletes, or SELECT ... FOR
UPDATE on the table, an online
ALTER TABLE operation can time out
as it waits for exclusive access to the table. DML operations that
are run on the table while the online ALTER
TABLE operation waits for an exclusive table lock may
also be blocked.
Because there is some processing work involved with recording the changes made by concurrent DML operations, then applying those changes at the end, an online DDL operation could take longer overall than the old-style mechanism that blocks table access from other sessions. The reduction in raw performance is balanced against better responsiveness for applications that use the table. When evaluating the ideal techniques for changing table structure, consider end-user perception of performance, based on factors such as load times for web pages.
A newly created InnoDB secondary index contains only the committed
data in the table at the time the CREATE
INDEX or ALTER TABLE
statement finishes executing. It does not contain any uncommitted
values, old versions of values, or values marked for deletion but
not yet removed from the old index.
The raw performance of an online DDL operation is largely determined by whether the operation is performed in-place, or requires copying and rebuilding the entire table. See Table 14.8, “Summary of Online Status for DDL Operations” to see what kinds of operations can be performed in-place, and any requirements for avoiding table-copy operations.
The performance speedup from in-place DDL applies to operations on secondary indexes, not to the primary key index. The rows of an InnoDB table are stored in a clustered index organized based on the primary key, forming what some database systems call an “index-organized table”. Because the table structure is closely tied to the primary key, redefining the primary key still requires copying the data.
When an operation on the primary key uses
ALGORITHM=INPLACE, even though the data is
still copied, it is more efficient than using
ALGORITHM=COPY because:
No undo logging or associated redo logging is required for
ALGORITHM=INPLACE. These operations add
overhead to DDL statements that use
ALGORITHM=COPY.
The secondary index entries are pre-sorted, and so can be loaded in order.
The change buffer is not used, because there are no random-access inserts into the secondary indexes.
To judge the relative performance of online DDL operations, you
can run such operations on a big InnoDB table
using current and earlier versions of MySQL. You can also run all
the performance tests under the latest MySQL version, simulating
the previous DDL behavior for the “before” results,
by setting the old_alter_table
system variable. Issue the statement set
old_alter_table=1 in the session, and measure DDL
performance to record the “before” figures. Then
set old_alter_table=0 to re-enable the newer,
faster behavior, and run the DDL operations again to record the
“after” figures.
For a basic idea of whether a DDL operation does its changes in-place or performs a table copy, look at the “rows affected” value displayed after the command finishes. For example, here are lines you might see after doing different types of DDL operations:
Changing the default value of a column (super-fast, does not affect the table data at all):
Query OK, 0 rows affected (0.07 sec)
Adding an index (takes time, but 0 rows
affected shows that the table is not copied):
Query OK, 0 rows affected (21.42 sec)
Changing the data type of a column (takes substantial time and does require rebuilding all the rows of the table):
Query OK, 1671168 rows affected (1 min 35.54 sec)
For example, before running a DDL operation on a big table, you might check whether the operation will be fast or slow as follows:
Clone the table structure.
Populate the cloned table with a tiny amount of data.
Run the DDL operation on the cloned table.
Check whether the “rows affected” value is zero or not. A non-zero value means the operation will require rebuilding the entire table, which might require special planning. For example, you might do the DDL operation during a period of scheduled downtime, or on each replication slave server one at a time.
For a deeper understanding of the reduction in MySQL processing,
examine the performance_schema and
INFORMATION_SCHEMA tables related to
InnoDB before and after DDL operations, to see
the number of physical reads, writes, memory allocations, and so
on.