10.1.9.8 Converting Between 3-Byte and 4-Byte Unicode Character Sets

This section describes issues that you may face when converting from the utf8 character set to the utf8mb4 character set, or vice versa.

Note

The discussion here focuses primarily on converting between utf8 and utf8mb4, but similar principles apply to converting between the ucs2 character set and character sets such as utf16 or utf32.

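The utf8 and utf8mb4 character sets differ as follows:

utf8 supports only characters in the Basic Multilingual Plane (BMP). utf8mb4 additionally supports supplementary characters that lie outside the BMP.

utf8 uses a maximum of three bytes per character. utf8mb4 uses a maximum of four bytes per character.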

One advantage of converting from utf8 to utf8mb4 is that it enables applications to use supplementary characters. One tradeoff is that it may increase data storage space requirements.
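The storage tradeoff can be seen by comparing character counts with byte counts. The following is a minimal sketch, assuming a server that supports utf8mb4. The two hexadecimal literals are illustrative choices, not values from this section: the UTF-8 encodings of U+20AC (a BMP character) and U+1F600 (a supplementary character).

```sql
-- X'E282AC' encodes U+20AC EURO SIGN (BMP character).
-- X'F09F9880' encodes U+1F600 (supplementary character, outside the BMP).
SELECT
  CHAR_LENGTH(_utf8mb4 X'E282AC')   AS bmp_chars,   -- 1
  LENGTH(_utf8mb4 X'E282AC')        AS bmp_bytes,   -- 3
  CHAR_LENGTH(_utf8mb4 X'F09F9880') AS supp_chars,  -- 1
  LENGTH(_utf8mb4 X'F09F9880')      AS supp_bytes;  -- 4
```

Both values are a single character, but only the supplementary character needs the fourth byte that utf8 cannot provide.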

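In most respects, converting from utf8 to utf8mb4 should present few problems. These are the primary potential areas of incompatibility:

For the variable-length character data types (VARCHAR and the TEXT types), the maximum permitted length in characters is less for utf8mb4 columns than for utf8 columns.

For all character data types (CHAR, VARCHAR, and the TEXT types), the maximum number of characters that can be indexed is less for utf8mb4 columns than for utf8 columns.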

Consequently, to convert tables from utf8 to utf8mb4, it may be necessary to change some column or index definitions.

Tables can be converted from utf8 to utf8mb4 by using ALTER TABLE. Suppose that a table was originally defined as follows:

CREATE TABLE t1 (
  col1 CHAR(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  col2 CHAR(10) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
) CHARACTER SET utf8;

The following statement converts t1 to use utf8mb4:

ALTER TABLE t1
  DEFAULT CHARACTER SET utf8mb4,
  MODIFY col1 CHAR(10)
    CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  MODIFY col2 CHAR(10)
    CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL;
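After the conversion, the result can be checked against the data dictionary. This is a sketch that assumes t1 lives in the current default database. It uses the standard INFORMATION_SCHEMA.COLUMNS table and is not a required step.

```sql
-- List the character set and collation now recorded for each column of t1.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 't1'
ORDER BY ORDINAL_POSITION;
```

If the ALTER TABLE statement succeeded, col1 should be reported with utf8mb4_unicode_ci and col2 with utf8mb4_bin. SHOW CREATE TABLE t1 gives the same information in statement form.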

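In terms of table content, conversion from utf8 to utf8mb4 presents no problems:

For a BMP character, utf8 and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.

For a supplementary character, utf8 cannot store the character at all, whereas utf8mb4 requires four bytes to store it. Because existing utf8 columns therefore contain no supplementary characters, no characters need to be converted and no data is lost.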

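In terms of table structure, the catch when converting from utf8 to utf8mb4 is that the maximum length of a column or index key is unchanged in terms of bytes. It is therefore smaller in terms of characters, because the maximum length of a character is four bytes instead of three. For the CHAR, VARCHAR, and TEXT data types, watch for these issues when converting your MySQL tables:

Check all definitions of utf8 columns and make sure that they do not exceed the maximum length for the storage engine.

Check all indexes on utf8 columns and make sure that they do not exceed the maximum length for the storage engine.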

If the preceding conditions apply, you must either reduce the defined length of columns or indexes, or continue to use utf8 rather than utf8mb4.
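To locate columns and indexes worth reviewing, the data dictionary can be queried before converting. The queries below are a rough sketch, not an exhaustive check. They assume the tables are in the current default database, they match both utf8 and utf8mb3 because some server versions report the 3-byte character set under the latter name, and the computed byte counts are worst-case estimates to compare against your storage engine's documented limits.

```sql
-- CHAR/VARCHAR columns that currently use the 3-byte character set,
-- with their worst-case size in bytes at four bytes per character.
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE,
       CHARACTER_MAXIMUM_LENGTH     AS max_chars,
       CHARACTER_MAXIMUM_LENGTH * 4 AS max_bytes_utf8mb4
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND CHARACTER_SET_NAME IN ('utf8', 'utf8mb3')
  AND DATA_TYPE IN ('char', 'varchar')
ORDER BY max_bytes_utf8mb4 DESC;

-- Indexed parts of those columns. SUB_PART is the prefix length in
-- characters, or NULL when the whole column is indexed.
SELECT s.TABLE_NAME, s.INDEX_NAME, s.COLUMN_NAME,
       COALESCE(s.SUB_PART, c.CHARACTER_MAXIMUM_LENGTH)     AS indexed_chars,
       COALESCE(s.SUB_PART, c.CHARACTER_MAXIMUM_LENGTH) * 4 AS indexed_bytes_utf8mb4
FROM INFORMATION_SCHEMA.STATISTICS AS s
JOIN INFORMATION_SCHEMA.COLUMNS AS c
  ON  c.TABLE_SCHEMA = s.TABLE_SCHEMA
  AND c.TABLE_NAME   = s.TABLE_NAME
  AND c.COLUMN_NAME  = s.COLUMN_NAME
WHERE s.TABLE_SCHEMA = DATABASE()
  AND c.CHARACTER_SET_NAME IN ('utf8', 'utf8mb3')
ORDER BY indexed_bytes_utf8mb4 DESC;
```

The second query reports each indexed column separately. For a multiple-column index, the per-column byte counts must be added together before comparing against the key length limit.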

Here are some examples where structural changes may be needed:
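As an illustration of the arithmetic involved, the sketch below works out how many characters fit in a fixed byte budget and then shortens an index prefix before converting. It rests on two assumptions: a 255-byte capacity for TINYTEXT, and a 767-byte index key limit, which applies to InnoDB tables using the COMPACT or REDUNDANT row format and may be higher in other configurations. The table t2 and the index idx_name are hypothetical names used only for this example.

```sql
-- Characters that fit in a fixed number of bytes at 3 and 4 bytes each.
SELECT 255 DIV 3 AS tinytext_chars_utf8,     -- 85
       255 DIV 4 AS tinytext_chars_utf8mb4,  -- 63
       767 DIV 3 AS index_chars_utf8,        -- 255
       767 DIV 4 AS index_chars_utf8mb4;     -- 191

-- Hypothetical table with a fully indexed 255-character utf8 column.
CREATE TABLE t2 (
  name VARCHAR(255) CHARACTER SET utf8 NOT NULL,
  INDEX idx_name (name)
) CHARACTER SET utf8;

-- Step 1: shorten the index to a 191-character prefix so that the key
-- still fits in 767 bytes at four bytes per character.
ALTER TABLE t2
  DROP INDEX idx_name,
  ADD INDEX idx_name (name(191));

-- Step 2: convert the table default and the column to utf8mb4.
ALTER TABLE t2
  DEFAULT CHARACTER SET utf8mb4,
  MODIFY name VARCHAR(255) CHARACTER SET utf8mb4 NOT NULL;
```

A prefix index no longer covers the full column value, so whether to shorten the index or shorten the column itself depends on how the application uses it.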

The preceding types of changes are most likely to be required only if you have very long columns or indexes. Otherwise, you should be able to convert your tables from utf8 to utf8mb4 without problems, using ALTER TABLE as described previously.

The following items summarize other potential areas of incompatibility:
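One area that commonly needs attention after a conversion is the connection character set, because clients that still connect with utf8 cannot send or receive supplementary characters even when the columns use utf8mb4. The statements below are a minimal sketch of switching a session and inspecting the result. Whether your connector issues the equivalent automatically is an assumption to verify for your client library.

```sql
-- Use the 4-byte character set for the client, connection, and results.
SET NAMES 'utf8mb4';

-- Inspect the session character set variables that SET NAMES affects.
SHOW VARIABLES LIKE 'character_set_%';
```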

If you have converted to utf8mb4, utf16, utf16le, or utf32, and then decide to convert back to utf8 or ucs2 (for example, to downgrade to an older version of MySQL), these considerations apply:
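Before converting back, it can help to check whether any supplementary characters have been stored since the upgrade, because utf8 cannot represent them. The query below is a sketch against the t1 table used earlier in this section. It relies on the assumption that CONVERT() replaces each character that utf8 cannot represent with a single-byte question mark, so any value containing a 4-byte character changes length when converted.

```sql
-- Count rows whose values would not survive conversion to utf8 unchanged.
SELECT COUNT(*) AS rows_with_supplementary
FROM t1
WHERE LENGTH(col1) <> LENGTH(CONVERT(col1 USING utf8))
   OR LENGTH(col2) <> LENGTH(CONVERT(col2 USING utf8));
```

A result of 0 suggests that the columns hold only BMP characters. Any other result identifies data that would be altered or rejected by the conversion and should be reviewed first.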