The character set named utf8 uses a maximum
of three bytes per character and contains only Basic Multilingual Plane (BMP) characters.
The utf8mb4 character set uses a maximum of
four bytes per character and supports supplementary characters:
For a BMP character, utf8 and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.
For a supplementary character, utf8 cannot store the character at all, whereas utf8mb4 requires four bytes to store it. Because utf8 cannot store the character at all, you have no supplementary characters in utf8 columns and need not worry about converting characters or losing data when upgrading utf8 data from older versions of MySQL.
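The byte lengths described above can be checked outside MySQL. The following is a minimal Python sketch, not MySQL code: it models each character set only by its per-character byte limit, and the names UTF8_MAX_BYTES, UTF8MB4_MAX_BYTES, utf8_byte_length, and fits are hypothetical helpers introduced for illustration.

```python
# Illustrative only: models the two character sets by their per-character
# byte limits. It does not connect to or query a MySQL server.
UTF8_MAX_BYTES = 3      # MySQL utf8: BMP characters only
UTF8MB4_MAX_BYTES = 4   # MySQL utf8mb4: BMP plus supplementary characters


def utf8_byte_length(ch: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of one character."""
    return len(ch.encode("utf-8"))


def fits(ch: str, max_bytes: int) -> bool:
    """Return True if the character's encoding fits within max_bytes."""
    return utf8_byte_length(ch) <= max_bytes


# Three BMP characters (U+0041, U+00E9, U+20AC) and one supplementary
# character (U+1F600).
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(
        f"U+{ord(ch):04X}",
        utf8_byte_length(ch),
        fits(ch, UTF8_MAX_BYTES),
        fits(ch, UTF8MB4_MAX_BYTES),
    )
```

Under these assumptions, the BMP characters need one to three bytes and fit both limits, while the supplementary character needs four bytes and fits only the utf8mb4 limit.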
utf8mb4 is a superset of
utf8, so for an operation such as the
following concatenation, the result has character set
utf8mb4 and the collation of
utf8mb4_col:
SELECT CONCAT(utf8_col, utf8mb4_col);
Similarly, the following comparison in the
WHERE clause works according to the
collation of utf8mb4_col:
SELECT * FROM utf8_tbl, utf8mb4_tbl WHERE utf8_tbl.utf8_col = utf8mb4_tbl.utf8mb4_col;
Tip: To save space with
utf8mb4, use
VARCHAR instead of
CHAR. Otherwise, MySQL must
reserve four bytes for each character in a CHAR
CHARACTER SET utf8mb4 column because that is the
maximum possible length. For example, MySQL must reserve 40
bytes for a CHAR(10) CHARACTER SET utf8mb4
column.
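The arithmetic behind this tip can be sketched in a few lines of Python. This is an illustration of the worst-case calculation only: char_reserved_bytes and encoded_bytes are hypothetical helper names, and actual on-disk storage depends on the storage engine and row format.

```python
# Illustrative arithmetic only; not a model of any particular storage engine.
UTF8MB4_MAX_BYTES = 4  # maximum bytes per character in utf8mb4


def char_reserved_bytes(declared_length: int,
                        max_bytes_per_char: int = UTF8MB4_MAX_BYTES) -> int:
    """Worst-case bytes for CHAR(declared_length): every character at the maximum."""
    return declared_length * max_bytes_per_char


def encoded_bytes(value: str) -> int:
    """Bytes the value itself occupies when encoded as UTF-8."""
    return len(value.encode("utf-8"))


print(char_reserved_bytes(10))  # CHAR(10) CHARACTER SET utf8mb4 -> 40
print(encoded_bytes("hello"))   # five single-byte characters -> 5
```

A VARCHAR column stores roughly the encoded bytes of each value plus a short length prefix, which is why a short value such as "hello" takes far less space there than the 40 bytes reserved for CHAR(10).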