Several character set and collation system variables relate to a client's interaction with the server. Some of these have been mentioned in earlier sections:
The server character set and collation are the values of the
character_set_server and
collation_server system
variables.
The character set and collation of the default database are
the values of the
character_set_database and
collation_database system
variables.
Additional character set and collation system variables are involved in handling traffic for the connection between a client and the server. Every client has connection-related character set and collation system variables.
A “connection” is what you make when you connect to the server. The client sends SQL statements, such as queries, over the connection to the server. The server sends responses, such as result sets or error messages, over the connection back to the client. This leads to several questions about character set and collation handling for client connections, each of which can be answered in terms of system variables:
What character set is the statement in when it leaves the client?
The server takes the
character_set_client system
variable to be the character set in which statements are
sent by the client.
What character set should the server translate a statement to after receiving it?
For this, the server uses the
character_set_connection
and collation_connection
system variables. It converts statements sent by the client
from character_set_client
to character_set_connection
(except for string literals that have an introducer such as
_latin1 or _utf8).
collation_connection is
important for comparisons of literal strings. For
comparisons of strings with column values,
collation_connection does
not matter because columns have their own collation, which
has a higher collation precedence.
What character set should the server translate to before shipping result sets or error messages back to the client?
The character_set_results
system variable indicates the character set in which the
server returns query results to the client. This includes
result data such as column values, and result metadata such
as column names and error messages.
Clients can fine-tune the settings for these variables, or depend on the defaults (in which case, you can skip the rest of this section). If you do not use the defaults, you must change the character settings for each connection to the server.
Two statements affect the connection-related character set variables as a group:
SET NAMES '
charset_name'
[COLLATE
'collation_name']
SET NAMES indicates what
character set the client will use to send SQL statements to
the server. Thus, SET NAMES 'cp1251'
tells the server, “future incoming messages from this
client are in character set
cp1251.” It also specifies the
character set that the server should use for sending results
back to the client. (For example, it indicates what
character set to use for column values if you use a
SELECT statement.)
A SET NAMES
'
statement is equivalent to these three statements:
charset_name'
SET character_set_client =charset_name; SET character_set_results =charset_name; SET character_set_connection =charset_name;
Setting
character_set_connection to
charset_name also implicitly sets
collation_connection to the
default collation for
charset_name. It is unnecessary
to set that collation explicitly. To specify a particular
collation, use the optional COLLATE
clause:
SET NAMES 'charset_name' COLLATE 'collation_name'
SET CHARACTER SET
charset_name
SET CHARACTER SET is similar
to SET NAMES but sets
character_set_connection
and collation_connection to
character_set_database and
collation_database. A
SET
CHARACTER SET
statement
is equivalent to these three statements:
charset_name
SET character_set_client =charset_name; SET character_set_results =charset_name; SET collation_connection = @@collation_database;
Setting
collation_connection also
implicitly sets
character_set_connection to
the character set associated with the collation (equivalent
to executing SET character_set_connection =
@@character_set_database). It is unnecessary to
set
character_set_connection
explicitly.
ucs2, utf16, and
utf32 cannot be used as a client character
set, which means that they do not work for
SET NAMES or
SET CHARACTER SET.
The MySQL client programs mysql,
mysqladmin, mysqlcheck,
mysqlimport, and mysqlshow
determine the default character set to use as follows:
In the absence of other information, the programs use the
compiled-in default character set, usually
latin1.
The programs can autodetect which character set to use based
on the operating system setting, such as the value of the
LANG or LC_ALL locale
environment variable on Unix systems or the code page
setting on Windows systems. For systems on which the locale
is available from the OS, the client uses it to set the
default character set rather than using the compiled-in
default. For example, setting LANG to
ru_RU.KOI8-R causes the
koi8r character set to be used. Thus,
users can configure the locale in their environment for use
by MySQL clients.
The OS character set is mapped to the closest MySQL
character set if there is no exact match. If the client does
not support the matching character set, it uses the
compiled-in default. For example, ucs2 is
not supported as a connection character set.
C applications can use character set autodetection based on
the OS setting by invoking
mysql_options() as follows
before connecting to the server:
mysql_options(mysql,
MYSQL_SET_CHARSET_NAME,
MYSQL_AUTODETECT_CHARSET_NAME);
The programs support a
--default-character-set
option, which enables users to specify the character set
explicitly to override whatever default the client otherwise
determines.
Before MySQL 5.5, in the absence of other
information, the MySQL client programs used the compiled-in
default character set, usually latin1. An
implication of this difference is that if your environment is
configured to use a non-latin1 locale,
MySQL client programs will use a different connection
character set than previously, as though you had issued an
implicit SET NAMES statement.
If the previous behavior is required, start the client with
the --default-character-set=latin1 option.
When a client connects to the server, it sends the name of the
character set that it wants to use. The server uses the name to
set the character_set_client,
character_set_results, and
character_set_connection system
variables. In effect, the server performs a
SET NAMES operation using the
character set name.
With the mysql client, to use a character set
different from the default, you could explicitly execute
SET NAMES every time you start
up. To accomplish the same result more easily, add the
--default-character-set option
setting to your mysql command line or in your
option file. For example, the following option file setting
changes the three connection-related character set variables set
to koi8r each time you invoke
mysql:
[mysql] default-character-set=koi8r
If you are using the mysql client with
auto-reconnect enabled (which is not recommended), it is
preferable to use the charset command rather
than SET NAMES. For example:
mysql> charset utf8
Charset changed
The charset command issues a
SET NAMES statement, and also
changes the default character set that mysql
uses when it reconnects after the connection has dropped.
Example: Suppose that column1 is defined as
CHAR(5) CHARACTER SET latin2. If you do not
say SET NAMES or
SET CHARACTER SET, then for
SELECT column1 FROM t, the server sends back
all the values for column1 using the
character set that the client specified when it connected. On
the other hand, if you say SET NAMES 'latin1'
or SET CHARACTER SET latin1 before issuing
the SELECT statement, the server
converts the latin2 values to
latin1 just before sending results back.
Conversion may be lossy if there are characters that are not in
both character sets.
If you want the server to perform no conversion of result sets
or error messages, set
character_set_results to
NULL or binary:
SET character_set_results = NULL;
To see the values of the character set and collation system variables that apply to your connection, use these statements:
SHOW VARIABLES LIKE 'character_set%'; SHOW VARIABLES LIKE 'collation%';
You must also consider the environment within which your MySQL applications execute. See Section 10.1.5, “Configuring Application Character Set and Collation”.
For more information about character sets and error messages, see Section 10.1.6, “Error Message Character Set”.