MySQL collation names follow these conventions:
A collation name starts with the name of the character set
with which it is associated, generally followed by one or
more suffixes indicating other collation characteristics.
For example, utf8_general_ci and
latin1_swedish_ci are collations for
the utf8 and latin1
character sets, respectively. The
binary character set has a single
collation, also named binary, with no
suffixes.
A language-specific collation includes a language name.
For example, utf8_turkish_ci and
utf8_hungarian_ci sort characters for
the utf8 character set using the rules
of Turkish and Hungarian, respectively.
Collation suffixes indicate whether a collation is case sensitive, accent sensitive, or binary. The following table shows the suffixes used to indicate these characteristics.
Table 10.1 Collation Case Sensitivity Suffixes
| Suffix | Meaning |
| _ai | Accent insensitive |
| _as | Accent sensitive |
| _ci | Case insensitive |
| _cs | Case sensitive |
| _bin | Binary |
For nonbinary collation names that do not specify accent
sensitivity, it is determined by case sensitivity. If a
collation name does not contain _ai or
_as, _ci in the name
implies _ai and _cs
in the name implies _as. For example,
latin1_general_ci is explicitly case
insensitive and implicitly accent insensitive, and
latin1_general_cs is explicitly case
sensitive and implicitly accent sensitive.
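The suffix rules above can be sketched as a small name decoder. This is an illustrative Python sketch only, not part of MySQL: the function name describe_collation and the returned dictionary layout are assumptions made for the example, and it inspects nothing but the collation name itself.

```python
# Illustrative sketch: decode the sensitivity suffixes of a collation name
# using the naming rules described above. describe_collation is a
# hypothetical helper, not a MySQL API.

def describe_collation(name):
    """Return a dict describing the sensitivity encoded in a collation name."""
    parts = name.split("_")

    if name == "binary" or parts[-1] == "bin":
        # The suffix table lists binary separately from case and accent
        # sensitivity, so both flags are left unset here.
        return {"binary": True, "case_sensitive": None, "accent_sensitive": None}

    case = None
    accent = None
    for part in parts[1:]:  # skip the character set prefix
        if part == "ci":
            case = False
        elif part == "cs":
            case = True
        elif part == "ai":
            accent = False
        elif part == "as":
            accent = True

    # No explicit _ai or _as: accent sensitivity follows case sensitivity.
    if accent is None:
        accent = case

    return {"binary": False, "case_sensitive": case, "accent_sensitive": accent}
```

Under these assumptions, describe_collation("latin1_general_ci") reports a collation that is neither case nor accent sensitive, and describe_collation("latin1_general_cs") reports one that is both.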
For the binary collation of the
binary character set, comparisons are
based on numeric byte values. For the
_bin collation of a nonbinary character
set, comparisons are based on numeric character code
values, which differ from byte values for multibyte
characters. For more information, see
Section 10.1.8.5, “The binary Collation Compared to _bin Collations”.
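The difference between ordering by byte values and ordering by character code values can be sketched in Python. This is an illustration of the principle only and does not model any other aspect of how MySQL compares strings. The UTF-16LE encoding is an assumption chosen because the two orderings visibly disagree there (in UTF-8 they happen to coincide), and the helper names byte_order_key and code_point_key are hypothetical.

```python
# Illustrative sketch: byte-value ordering versus character-code ordering.
# UTF-16LE is an assumption chosen for the demonstration; the helper names
# are hypothetical and not part of MySQL.

def byte_order_key(s, encoding="utf-16-le"):
    """Sort key that compares the encoded bytes of the string."""
    return s.encode(encoding)

def code_point_key(s):
    """Sort key that compares the numeric code of each character."""
    return [ord(ch) for ch in s]

# U+0100 encodes to bytes 00 01 and U+00FF to bytes FF 00 in UTF-16LE.
words = ["\u0100", "\u00ff"]

by_bytes = sorted(words, key=byte_order_key)        # U+0100 first
by_code_points = sorted(words, key=code_point_key)  # U+00FF first
```

The same two strings sort in opposite orders, which is the kind of discrepancy that separates a byte-based comparison from a character-code comparison.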
For Unicode character sets, collation names may include a version number to indicate the version of the Unicode Collation Algorithm (UCA) on which the collation is based. UCA-based collations without a version number in the name use the version-4.0.0 UCA weight keys. For example:
utf8_unicode_520_ci is based on UCA
5.2.0 weight keys
(http://www.unicode.org/Public/UCA/5.2.0/allkeys.txt).
utf8_unicode_ci (with no version
named) is based on UCA 4.0.0 weight keys
(http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt).
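As a rough sketch of the version convention, the following Python function reads the UCA version out of names shaped like the two examples above. It is not a MySQL API. The function name uca_version is hypothetical, and the rule that each digit maps to one version component (520 to 5.2.0) is an assumption inferred from the single example given here. Names that do not contain _unicode_ are not handled, even where the collation is UCA based.

```python
import re

# Illustrative sketch: infer the UCA version from a collation name of the
# form <charset>_unicode[_<digits>]_<suffix>. uca_version is a hypothetical
# helper, and the one-digit-per-component mapping is an assumption.

DEFAULT_UCA_VERSION = "4.0.0"  # used when the name carries no version number

def uca_version(collation_name):
    """Return the UCA version implied by the name, or None if not recognized."""
    match = re.search(r"_unicode(?:_(\d+))?_", collation_name)
    if match is None:
        return None  # not a name form handled by this sketch
    digits = match.group(1)
    if digits is None:
        return DEFAULT_UCA_VERSION
    return ".".join(digits)  # "520" becomes "5.2.0"
```

With these assumptions, uca_version("utf8_unicode_520_ci") yields "5.2.0" and uca_version("utf8_unicode_ci") falls back to "4.0.0".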
For Unicode character sets, the
xxx_general_mysql500_ci
collations preserve the pre-5.1.24 ordering of the
original xxx_general_ci
collations and permit upgrades for tables created before
MySQL 5.1.24. For more information, see
Section 2.11.3, “Checking Whether Tables or Indexes Must Be Rebuilt”, and
Section 2.11.4, “Rebuilding or Repairing Tables or Indexes”.