Pre-General Availability Draft: 2017-07-17
The BINARY and VARBINARY
types are similar to CHAR and
VARCHAR, except that they contain
binary strings rather than nonbinary strings. That is, they
contain byte strings rather than character strings. This means
they have the binary character set and
collation, and comparison and sorting are based on the numeric
values of the bytes in the values.
The permissible maximum length is the same for
BINARY and VARBINARY as it
is for CHAR and
VARCHAR, except that the length
for BINARY and VARBINARY
is a length in bytes rather than in characters.
The BINARY and VARBINARY
data types are distinct from the CHAR BINARY
and VARCHAR BINARY data types. For the latter
types, the BINARY attribute does not cause
the column to be treated as a binary string column. Instead, it
causes the binary (_bin) collation for the
column character set to be used, and the column itself contains
nonbinary character strings rather than binary byte strings. For
example, CHAR(5) BINARY is treated as
CHAR(5) CHARACTER SET utf8mb4 COLLATE
utf8mb4_bin, assuming that the default character set
is utf8mb4. This differs from
BINARY(5), which stores 5-bytes binary
strings that have the binary character set
and collation. For information about differences between binary
strings and binary collations for nonbinary strings, see
Section 10.1.8.5, “The binary Collation Compared to _bin Collations”.
If strict SQL mode is not enabled and you assign a value to a
BINARY or VARBINARY column
that exceeds the column's maximum length, the value is truncated
to fit and a warning is generated. For cases of truncation, you
can cause an error to occur (rather than a warning) and suppress
insertion of the value by using strict SQL mode. See
Section 5.1.8, “Server SQL Modes”.
When BINARY values are stored, they are
right-padded with the pad value to the specified length. The pad
value is 0x00 (the zero byte). Values are
right-padded with 0x00 on insert, and no
trailing bytes are removed on select. All bytes are significant
in comparisons, including ORDER BY and
DISTINCT operations. 0x00
bytes and spaces are different in comparisons, with
0x00 < space.
Example: For a BINARY(3) column,
'a ' becomes
'a \0' when inserted.
'a\0' becomes 'a\0\0' when
inserted. Both inserted values remain unchanged when selected.
For VARBINARY, there is no padding on insert
and no bytes are stripped on select. All bytes are significant
in comparisons, including ORDER BY and
DISTINCT operations. 0x00
bytes and spaces are different in comparisons, with
0x00 < space.
For those cases where trailing pad bytes are stripped or
comparisons ignore them, if a column has an index that requires
unique values, inserting into the column values that differ only
in number of trailing pad bytes will result in a duplicate-key
error. For example, if a table contains 'a',
an attempt to store 'a\0' causes a
duplicate-key error.
You should consider the preceding padding and stripping
characteristics carefully if you plan to use the
BINARY data type for storing binary data and
you require that the value retrieved be exactly the same as the
value stored. The following example illustrates how
0x00-padding of BINARY
values affects column value comparisons:
mysql> CREATE TABLE t (c BINARY(3));
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO t SET c = 'a';
Query OK, 1 row affected (0.01 sec)
mysql> SELECT HEX(c), c = 'a', c = 'a\0\0' from t;
+--------+---------+-------------+
| HEX(c) | c = 'a' | c = 'a\0\0' |
+--------+---------+-------------+
| 610000 | 0 | 1 |
+--------+---------+-------------+
1 row in set (0.09 sec)
If the value retrieved must be the same as the value specified
for storage with no padding, it might be preferable to use
VARBINARY or one of the
BLOB data types instead.
Thus, the operating system, with its standard character set, defines how characters are converted into binary values. Only there is no MySQL character set definition involved.
Examples:
mysql> SELECT x'4D7953514C';
-> 'MySQL'
mysql> SELECT 0x4D7953514C;
-> 'MySQL'
When you use char types the server takes care of things so that anyone will read the same text that was inserted, regardless of everyone's local encoding settings.
In the end, I guess I am wondering if it would be more efficient to index the reverse of the UUID, or leave it as it is?
So the options I'm interested in are as follows (assumption is that a primary field of BINARY(16) is populated with the function):
CREATE FUNCTION id() RETURNS binary(16) RETURN unhex(REPLACE(UUID(),'-',''));
or
CREATE FUNCTION id() RETURNS binary(16) RETURN unhex(reverse(REPLACE(UUID(),'-','')));