Every character string literal has a character set and a collation.
For the simple statement SELECT
', the string has
the connection default character set and collation defined by
the string'character_set_connection
and collation_connection
system variables.
A character string literal may have an optional character set
introducer and COLLATE clause, to designate
it as a string that uses a particular character set and
collation:
[_charset_name]'string' [COLLATEcollation_name]
Character set introducers and the COLLATE
clause are implemented according to standard SQL
specifications.
Examples:
SELECT 'abc'; SELECT _latin1'abc'; SELECT _binary'abc'; SELECT _utf8'abc' COLLATE utf8_danish_ci;
The
_
expression is formally called an
introducer. It tells the parser,
“the string that follows uses character set
charset_namecharset_name.” An introducer
does not change the string to the introducer character set
like CONVERT() would do. It
does not change the string value, although padding may occur.
The introducer is just a signal.
MySQL determines the character set and collation of a character string literal in the following manner:
If both _charset_name and
COLLATE
are
specified, character set
collation_namecharset_name and collation
collation_name are used.
collation_name must be a
permitted collation for
charset_name.
If _charset_name is specified
but COLLATE is not specified, character
set charset_name and its
default collation are used. To see the default collation
for each character set, use the SHOW
CHARACTER SET statement.
If _charset_name is not
specified but COLLATE
is
specified, the connection default character set given by
the
collation_namecharacter_set_connection
system variable and collation
collation_name are used.
collation_name must be a
permitted collation for the connection default character
set.
Otherwise (neither
_charset_name nor
COLLATE
is
specified), the connection default character set and
collation given by the
collation_namecharacter_set_connection
and collation_connection
system variables are used.
Examples:
A nonbinary string with latin1
character set and latin1_german1_ci
collation:
SELECT _latin1'Müller' COLLATE latin1_german1_ci;
A nonbinary string with utf8 character
set and its default collation (that is,
utf8_general_ci):
SELECT _utf8'Müller';
A binary string with binary character
set and its default collation (that is,
binary):
SELECT _binary'Müller';
A nonbinary string with the connection default character
set and utf8_general_ci collation
(fails if the connection character set is not
utf8):
SELECT 'Müller' COLLATE utf8_general_ci;
A string with the connection default character set and collation:
SELECT 'Müller';
An introducer indicates the character set for the following
string, but does not change how the parser performs escape
processing within the string. Escapes are always interpreted
by the parser according to the character set given by
character_set_connection.
The following examples show that escape processing occurs
using
character_set_connection even
in the presence of an introducer. The examples use
SET NAMES (which changes
character_set_connection, as
discussed in Section 10.1.4, “Connection Character Sets and Collations”), and
display the resulting strings using the
HEX() function so that the
exact string contents can be seen.
Example 1:
mysql>SET NAMES latin1;mysql>SELECT HEX('à\n'), HEX(_sjis'à\n');+------------+-----------------+ | HEX('à\n') | HEX(_sjis'à\n') | +------------+-----------------+ | E00A | E00A | +------------+-----------------+
Here, à (hexadecimal value
E0) is followed by \n,
the escape sequence for newline. The escape sequence is
interpreted using the
character_set_connection
value of latin1 to produce a literal
newline (hexadecimal value 0A). This
happens even for the second string. That is, the
_sjis introducer does not affect the
parser's escape processing.
Example 2:
mysql>SET NAMES sjis;mysql>SELECT HEX('à\n'), HEX(_latin1'à\n');+------------+-------------------+ | HEX('à\n') | HEX(_latin1'à\n') | +------------+-------------------+ | E05C6E | E05C6E | +------------+-------------------+
Here,
character_set_connection is
sjis, a character set in which the sequence
of à followed by
\ (hexadecimal values 05
and 5C) is a valid multibyte character.
Hence, the first two bytes of the string are interpreted as a
single sjis character, and the
\ is not interpreted as an escape
character. The following n (hexadecimal
value 6E) is not interpreted as part of an
escape sequence. This is true even for the second string; the
_latin1 introducer does not affect escape
processing.