The persistent optimizer statistics feature improves plan stability by storing statistics to disk and making them persistent across server restarts so that the optimizer is more likely to make consistent choices each time for a given query.
Optimizer statistics are persisted to disk when
innodb_stats_persistent=ON or
when individual tables are created or altered with
STATS_PERSISTENT=1.
innodb_stats_persistent is
enabled by default as of MySQL 5.6.6.
Formerly, optimizer statistics were cleared on each server restart and after some other operations, and recomputed on the next table access. Consequently, different estimates could be produced when recalculating statistics, leading to different choices in query execution plans and thus variations in query performance.
Persistent statistics are stored in the
mysql.innodb_table_stats and
mysql.innodb_index_stats tables, as described
in Section 14.6.11.1.5, “InnoDB Persistent Statistics Tables”.
To revert to using non-persistent optimizer statistics, you can
modify tables using an ALTER TABLE
statement. For related information, see
Section 14.6.11.2, “Configuring Non-Persistent Optimizer Statistics Parameters”
tbl_name STATS_PERSISTENT=0
The innodb_stats_auto_recalc
configuration option, which is enabled by default, determines
whether statistics are calculated automatically whenever a
table undergoes substantial changes (to more than 10% of the
rows). You can also configure automatic statistics
recalculation for individual tables using a
STATS_AUTO_RECALC clause in a
CREATE TABLE or
ALTER TABLE statement.
innodb_stats_auto_recalc is
enabled by default.
Because of the asynchronous nature of automatic statistics
recalculation (which occurs in the background), statistics may
not be recalculated instantly after running a DML operation
that affects more than 10% of a table, even when
innodb_stats_auto_recalc is
enabled. In some cases, statistics recalculation may be
delayed by a few seconds. If up-to-date statistics are
required immediately after changing significant portions of a
table, run ANALYZE
TABLE to initiate a synchronous (foreground)
recalculation of statistics.
If innodb_stats_auto_recalc
is disabled, ensure the accuracy of optimizer statistics by
issuing the ANALYZE TABLE
statement for each applicable table after making substantial
changes to indexed columns. You might run this statement in
your setup scripts after representative data has been loaded
into the table, and run it periodically after DML operations
significantly change the contents of indexed columns, or on a
schedule at times of low activity. When a new index is added
to an existing table, index statistics are calculated and
added to the innodb_index_stats table
regardless of the value of
innodb_stats_auto_recalc.
To ensure statistics are gathered when a new index is
created, either enable the
innodb_stats_auto_recalc
option, or run ANALYZE TABLE
after creating each new index when the persistent statistics
mode is enabled.
innodb_stats_persistent,
innodb_stats_auto_recalc, and
innodb_stats_persistent_sample_pages
are global configuration options. To override these
system-wide settings and configure optimizer statistics
parameters for individual tables, you can define
STATS_PERSISTENT,
STATS_AUTO_RECALC, and
STATS_SAMPLE_PAGES clauses in
CREATE TABLE or
ALTER TABLE statements.
STATS_PERSISTENT specifies whether to
enable
persistent
statistics for an InnoDB table.
The value DEFAULT causes the persistent
statistics setting for the table to be determined by the
innodb_stats_persistent
configuration option. The value 1
enables persistent statistics for the table, while the
value 0 turns off this feature. After
enabling persistent statistics through a CREATE
TABLE or ALTER TABLE
statement, issue an ANALYZE
TABLE statement to calculate the statistics,
after loading representative data into the table.
STATS_AUTO_RECALC specifies whether to
automatically recalculate
persistent
statistics for an InnoDB table.
The value DEFAULT causes the persistent
statistics setting for the table to be determined by the
innodb_stats_auto_recalc
configuration option. The value 1
causes statistics to be recalculated when 10% of the data
in the table has changed. The value 0
prevents automatic recalculation for this table; with this
setting, issue an ANALYZE
TABLE statement to recalculate the statistics
after making substantial changes to the table.
STATS_SAMPLE_PAGES specifies the number
of index pages to sample when estimating cardinality and
other statistics for an indexed column, such as those
calculated by ANALYZE
TABLE.
All three clauses are specified in the following
CREATE TABLE example:
CREATE TABLE `t1` ( `id` int(8) NOT NULL auto_increment, `data` varchar(255), `date` datetime, PRIMARY KEY (`id`), INDEX `DATE_IX` (`date`) ) ENGINE=InnoDB, STATS_PERSISTENT=1, STATS_AUTO_RECALC=1, STATS_SAMPLE_PAGES=25;
The MySQL query optimizer uses estimated
statistics about key
distributions to choose the indexes for an execution plan,
based on the relative
selectivity of the
index. Operations such as ANALYZE
TABLE cause InnoDB to sample
random pages from each index on a table to estimate the
cardinality of the
index. (This technique is known as
random dives.)
To give you control over the quality of the statistics
estimate (and thus better information for the query
optimizer), you can change the number of sampled pages using
the parameter
innodb_stats_persistent_sample_pages,
which can be set at runtime.
innodb_stats_persistent_sample_pages
has a default value of 20. As a general guideline, consider
modifying this parameter when encountering the following
issues:
Statistics are not accurate enough and the
optimizer chooses suboptimal plans, as shown by
EXPLAIN output. The
accuracy of statistics can be checked by comparing the
actual cardinality of an index (as returned by running
SELECT
DISTINCT on the index columns) with the
estimates provided in the
mysql.innodb_index_stats persistent
statistics table.
If it is determined that statistics are not accurate
enough, the value of
innodb_stats_persistent_sample_pages
should be increased until the statistics estimates are
sufficiently accurate. Increasing
innodb_stats_persistent_sample_pages
too much, however, could cause
ANALYZE TABLE to run
slowly.
ANALYZE TABLE is
too slow. In this case
innodb_stats_persistent_sample_pages
should be decreased until ANALYZE
TABLE execution time is acceptable. Decreasing
the value too much, however, could lead to the first
problem of inaccurate statistics and suboptimal query
execution plans.
If a balance cannot be achieved between accurate
statistics and ANALYZE
TABLE execution time, consider decreasing the
number of indexed columns in the table or limiting the
number of partitions to reduce
ANALYZE TABLE complexity.
The number of columns in the table's primary key is also
important to consider, as primary key columns are appended
to each non-unique index.
For related information, see Section 14.6.11.3, “Estimating ANALYZE TABLE Complexity for InnoDB Tables”.
By default, InnoDB reads uncommitted data
when calculating statistics. In the case of an uncommitted
transaction that deletes rows from a table,
InnoDB excludes records that are
delete-marked when calculating row estimates and index
statistics, which can lead to non-optimal execution plans for
other transactions that are operating on the table
concurrently using a transaction isolation level other than
READ UNCOMMITTED. To avoid
this scenario,
innodb_stats_include_delete_marked
can be enabled to ensure that InnoDB
includes delete-marked records when calculating persistent
optimizer statistics.
When
innodb_stats_include_delete_marked
is enabled, ANALYZE TABLE
considers delete-marked records when recalculating statistics.
innodb_stats_include_delete_marked
is a global setting that affects all InnoDB
tables, and it is only applicable to persistent optimizer
statistics.
innodb_stats_include_delete_marked
was introduced in MySQL 5.6.34.
The persistent statistics feature relies on the internally
managed tables in the mysql database, named
innodb_table_stats and
innodb_index_stats. These tables are set up
automatically in all install, upgrade, and build-from-source
procedures.
Table 14.3 Columns of innodb_table_stats
| Column name | Description |
|---|---|
database_name | Database name |
table_name | Table name, partition name, or subpartition name |
last_update | A timestamp indicating the last time that InnoDB
updated this row |
n_rows | The number of rows in the table |
clustered_index_size | The size of the primary index, in pages |
sum_of_other_index_sizes | The total size of other (non-primary) indexes, in pages |
Table 14.4 Columns of innodb_index_stats
| Column name | Description |
|---|---|
database_name | Database name |
table_name | Table name, partition name, or subpartition name |
index_name | Index name |
last_update | A timestamp indicating the last time that InnoDB
updated this row |
stat_name | The name of the statistic, whose value is reported in the
stat_value column |
stat_value | The value of the statistic that is named in stat_name
column |
sample_size | The number of pages sampled for the estimate provided in the
stat_value column |
stat_description | Description of the statistic that is named in the
stat_name column |
Both the innodb_table_stats and
innodb_index_stats tables include a
last_update column showing when
InnoDB last updated index statistics, as
shown in the following example:
mysql> select * from innodb_table_stats \G
*************************** 1. row ***************************
database_name: sakila
table_name: actor
last_update: 2014-05-28 16:16:44
n_rows: 200
clustered_index_size: 1
sum_of_other_index_sizes: 1
...mysql> select * from innodb_index_stats \G
*************************** 1. row ***************************
database_name: sakila
table_name: actor
index_name: PRIMARY
last_update: 2014-05-28 16:16:44
stat_name: n_diff_pfx01
stat_value: 200
sample_size: 1
...
The innodb_table_stats and
innodb_index_stats tables are ordinary
tables and can be updated manually. The ability to update
statistics manually makes it possible to force a specific
query optimization plan or test alternative plans without
modifying the database. If you manually update statistics,
issue the FLUSH TABLE
command to make
MySQL reload the updated statistics.
tbl_name
The innodb_table_stats table contains one
row per table. The data collected is demonstrated in the
following example.
Table t1 contains a primary index (columns
a, b) secondary index
(columns c, d), and
unique index (columns e,
f):
CREATE TABLE t1 ( a INT, b INT, c INT, d INT, e INT, f INT, PRIMARY KEY (a, b), KEY i1 (c, d), UNIQUE KEY i2uniq (e, f) ) ENGINE=INNODB;
After inserting five rows of sample data, the table appears as follows:
mysql> SELECT * FROM t1; +---+---+------+------+------+------+ | a | b | c | d | e | f | +---+---+------+------+------+------+ | 1 | 1 | 10 | 11 | 100 | 101 | | 1 | 2 | 10 | 11 | 200 | 102 | | 1 | 3 | 10 | 11 | 100 | 103 | | 1 | 4 | 10 | 12 | 200 | 104 | | 1 | 5 | 10 | 12 | 100 | 105 | +---+---+------+------+------+------+
To immediately update statistics, run
ANALYZE TABLE (if
innodb_stats_auto_recalc is
enabled, statistics are updated automatically within a few
seconds assuming that the 10% threshold for changed table rows
is reached):
mysql> ANALYZE TABLE t1; +---------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +---------+---------+----------+----------+ | test.t1 | analyze | status | OK | +---------+---------+----------+----------+
Table statistics for table t1 show the last
time InnoDB updated the table statistics
(2014-03-14 14:36:34), the number of rows
in the table (5), the clustered index size
(1 page), and the combined size of the
other indexes (2 pages).
mysql> SELECT * FROM mysql.innodb_table_stats WHERE table_name like 't1'\G
*************************** 1. row ***************************
database_name: test
table_name: t1
last_update: 2014-03-14 14:36:34
n_rows: 5
clustered_index_size: 1
sum_of_other_index_sizes: 2
The innodb_index_stats table contains
multiple rows for each index. Each row in the
innodb_index_stats table provides data
related to a particular index statistic which is named in the
stat_name column and described in the
stat_description column. For example:
mysql> SELECT index_name, stat_name, stat_value, stat_description
-> FROM mysql.innodb_index_stats WHERE table_name like 't1';
+------------+--------------+------------+-----------------------------------+
| index_name | stat_name | stat_value | stat_description |
+------------+--------------+------------+-----------------------------------+
| PRIMARY | n_diff_pfx01 | 1 | a |
| PRIMARY | n_diff_pfx02 | 5 | a,b |
| PRIMARY | n_leaf_pages | 1 | Number of leaf pages in the index |
| PRIMARY | size | 1 | Number of pages in the index |
| i1 | n_diff_pfx01 | 1 | c |
| i1 | n_diff_pfx02 | 2 | c,d |
| i1 | n_diff_pfx03 | 2 | c,d,a |
| i1 | n_diff_pfx04 | 5 | c,d,a,b |
| i1 | n_leaf_pages | 1 | Number of leaf pages in the index |
| i1 | size | 1 | Number of pages in the index |
| i2uniq | n_diff_pfx01 | 2 | e |
| i2uniq | n_diff_pfx02 | 5 | e,f |
| i2uniq | n_leaf_pages | 1 | Number of leaf pages in the index |
| i2uniq | size | 1 | Number of pages in the index |
+------------+--------------+------------+-----------------------------------+
The stat_name column shows the following
types of statistics:
size: Where
stat_name=size, the
stat_value column displays the total
number of pages in the index.
n_leaf_pages: Where
stat_name=n_leaf_pages,
the stat_value column displays the
number of leaf pages in the index.
n_diff_pfx:
Where
NNstat_name=n_diff_pfx01,
the stat_value column displays the
number of distinct values in the first column of the
index. Where
stat_name=n_diff_pfx02,
the stat_value column displays the
number of distinct values in the first two columns of the
index, and so on. Additionally, where
stat_name=n_diff_pfx,
the NNstat_description column shows a
comma separated list of the index columns that are
counted.
To further illustrate the
n_diff_pfx
statistic, which provides cardinality data, consider the
NNt1 table example. As shown below, the
t1 table is created with a primary index
(columns a, b), a
secondary index (columns c,
d), and a unique index (columns
e, f):
CREATE TABLE t1 ( a INT, b INT, c INT, d INT, e INT, f INT, PRIMARY KEY (a, b), KEY i1 (c, d), UNIQUE KEY i2uniq (e, f) ) ENGINE=INNODB;
After inserting five rows of sample data, the table appears as follows:
mysql> SELECT * FROM t1; +---+---+------+------+------+------+ | a | b | c | d | e | f | +---+---+------+------+------+------+ | 1 | 1 | 10 | 11 | 100 | 101 | | 1 | 2 | 10 | 11 | 200 | 102 | | 1 | 3 | 10 | 11 | 100 | 103 | | 1 | 4 | 10 | 12 | 200 | 104 | | 1 | 5 | 10 | 12 | 100 | 105 | +---+---+------+------+------+------+
When you query the index_name,
stat_name, stat_value,
and stat_description where
stat_name LIKE 'n_diff%', the following
result set is returned:
mysql> SELECT index_name, stat_name, stat_value, stat_description
-> FROM mysql.innodb_index_stats
-> WHERE table_name like 't1' AND stat_name LIKE 'n_diff%';
+------------+--------------+------------+------------------+
| index_name | stat_name | stat_value | stat_description |
+------------+--------------+------------+------------------+
| PRIMARY | n_diff_pfx01 | 1 | a |
| PRIMARY | n_diff_pfx02 | 5 | a,b |
| i1 | n_diff_pfx01 | 1 | c |
| i1 | n_diff_pfx02 | 2 | c,d |
| i1 | n_diff_pfx03 | 2 | c,d,a |
| i1 | n_diff_pfx04 | 5 | c,d,a,b |
| i2uniq | n_diff_pfx01 | 2 | e |
| i2uniq | n_diff_pfx02 | 5 | e,f |
+------------+--------------+------------+------------------+
For the PRIMARY index, there are two
n_diff% rows. The number of rows is equal
to the number of columns in the index.
For non-unique indexes, InnoDB appends
the columns of the primary key.
Where
index_name=PRIMARY
and
stat_name=n_diff_pfx01,
the stat_value is 1,
which indicates that there is a single distinct value in
the first column of the index (column
a). The number of distinct values in
column a is confirmed by viewing the
data in column a in table
t1, in which there is a single distinct
value (1). The counted column
(a) is shown in the
stat_description column of the result
set.
Where
index_name=PRIMARY
and
stat_name=n_diff_pfx02,
the stat_value is 5,
which indicates that there are five distinct values in the
two columns of the index (a,b). The
number of distinct values in columns a
and b is confirmed by viewing the data
in columns a and b
in table t1, in which there are five
distinct values: (1,1),
(1,2), (1,3),
(1,4) and (1,5). The
counted columns (a,b) are shown in the
stat_description column of the result
set.
For the secondary index (i1), there are
four n_diff% rows. Only two columns are
defined for the secondary index (c,d) but
there are four n_diff% rows for the
secondary index because InnoDB suffixes all
non-unique indexes with the primary key. As a result, there
are four n_diff% rows instead of two to
account for the both the secondary index columns
(c,d) and the primary key columns
(a,b).
Where index_name=i1
and
stat_name=n_diff_pfx01,
the stat_value is 1,
which indicates that there is a single distinct value in
the first column of the index (column
c). The number of distinct values in
column c is confirmed by viewing the
data in column c in table
t1, in which there is a single distinct
value: (10). The counted column
(c) is shown in the
stat_description column of the result
set.
Where index_name=i1
and
stat_name=n_diff_pfx02,
the stat_value is 2,
which indicates that there are two distinct values in the
first two columns of the index (c,d).
The number of distinct values in columns
c an d is confirmed
by viewing the data in columns c and
d in table t1, in
which there are two distinct values:
(10,11) and (10,12).
The counted columns (c,d) are shown in
the stat_description column of the
result set.
Where index_name=i1
and
stat_name=n_diff_pfx03,
the stat_value is 2,
which indicates that there are two distinct values in the
first three columns of the index
(c,d,a). The number of distinct values
in columns c, d, and
a is confirmed by viewing the data in
column c, d, and
a in table t1, in
which there are two distinct values:
(10,11,1) and
(10,12,1). The counted columns
(c,d,a) are shown in the
stat_description column of the result
set.
Where index_name=i1
and
stat_name=n_diff_pfx04,
the stat_value is 5,
which indicates that there are five distinct values in the
four columns of the index (c,d,a,b).
The number of distinct values in columns
c, d,
a and b is confirmed
by viewing the data in columns c,
d, a, and
b in table t1, in
which there are five distinct values:
(10,11,1,1),
(10,11,1,2),
(10,11,1,3),
(10,12,1,4) and
(10,12,1,5). The counted columns
(c,d,a,b) are shown in the
stat_description column of the result
set.
For the unique index (i2uniq), there are
two n_diff% rows.
Where
index_name=i2uniq
and
stat_name=n_diff_pfx01,
the stat_value is 2,
which indicates that there are two distinct values in the
first column of the index (column e).
The number of distinct values in column
e is confirmed by viewing the data in
column e in table
t1, in which there are two distinct
values: (100) and
(200). The counted column
(e) is shown in the
stat_description column of the result
set.
Where
index_name=i2uniq
and
stat_name=n_diff_pfx02,
the stat_value is 5,
which indicates that there are five distinct values in the
two columns of the index (e,f). The
number of distinct values in columns e
and f is confirmed by viewing the data
in columns e and f
in table t1, in which there are five
distinct values: (100,101),
(200,102),
(100,103), (200,104)
and (100,105). The counted columns
(e,f) are shown in the
stat_description column of the result
set.
The size of indexes for tables, partitions, or subpartitions
can be retrieved using the
innodb_index_stats table. In the following
example, index sizes are retrieved for table
t1. For a definition of table
t1 and corresponding index statistics, see
Section 14.6.11.1.6, “InnoDB Persistent Statistics Tables Example”.
mysql> SELECT SUM(stat_value) pages, index_name,
-> SUM(stat_value)*@@innodb_page_size size
-> FROM mysql.innodb_index_stats WHERE table_name='t1'
-> AND stat_name = 'size' GROUP BY index_name;
+-------+------------+-------+
| pages | index_name | size |
+-------+------------+-------+
| 1 | PRIMARY | 16384 |
| 1 | i1 | 16384 |
| 1 | i2uniq | 16384 |
+-------+------------+-------+
For partitions or subpartitions, the same query with a
modified WHERE clause can be used to
retrieve index sizes. For example, the following query
retrieves index sizes for partitions of table
t1:
mysql> SELECT SUM(stat_value) pages, index_name,
-> SUM(stat_value)*@@innodb_page_size size
-> FROM mysql.innodb_index_stats WHERE table_name like 't1#P%'
-> AND stat_name = 'size' GROUP BY index_name;