Table 12.9 String Regular Expression Operators
| Name | Description |
|---|---|
NOT REGEXP | Negation of REGEXP |
REGEXP | Pattern matching using regular expressions |
RLIKE | Synonym for REGEXP |
A regular expression is a powerful way of specifying a pattern for a complex search.
MySQL uses Henry Spencer's implementation of regular
expressions, which is aimed at conformance with POSIX 1003.2.
MySQL uses the extended version to support pattern-matching
operations performed with the
REGEXP operator in SQL statements.
This section summarizes, with examples, the special characters
and constructs that can be used in MySQL for
REGEXP operations. It does not
contain all the details that can be found in Henry Spencer's
regex(7) manual page. That manual page is
included in MySQL source distributions, in the
regex.7 file under the
regex directory. See also
Section 3.3.4.7, “Pattern Matching”.
,
expr
NOT REGEXP patexpr
NOT RLIKE pat
This is the same as NOT
(.
expr REGEXP
pat)
,
expr
REGEXP patexpr
RLIKE pat
Performs a pattern match of a string expression
expr against a pattern
pat. The pattern can be an
extended regular expression, the syntax for which is
discussed later in this section. Returns
1 if expr
matches pat; otherwise it returns
0. If either
expr or
pat is NULL,
the result is NULL.
RLIKE is a
synonym for REGEXP, provided
for mSQL compatibility.
The pattern need not be a literal string. For example, it can be specified as a string expression or table column.
Because MySQL uses the C escape syntax in strings (for
example, \n to represent the newline
character), you must double any \ that
you use in your REGEXP
strings.
REGEXP is not case sensitive,
except when used with binary strings.
mysql>SELECT 'Monty!' REGEXP '.*';-> 1 mysql>SELECT 'new*\n*line' REGEXP 'new\\*.\\*line';-> 1 mysql>SELECT 'a' REGEXP 'A', 'a' REGEXP BINARY 'A';-> 1 0 mysql>SELECT 'a' REGEXP '^[a-d]';-> 1
REGEXP and
RLIKE use
the character set and collations of the arguments when
deciding the type of a character and performing the
comparison. If the arguments have different character sets
or collations, coercibility rules apply as described in
Section 10.1.8.4, “Collation of Expressions”.
The REGEXP and
RLIKE
operators work in byte-wise fashion, so they are not
multibyte safe and may produce unexpected results with
multibyte character sets. In addition, these operators
compare characters by their byte values and accented
characters may not compare as equal even if a given
collation treats them as equal.
A regular expression describes a set of strings. The simplest
regular expression is one that has no special characters in it.
For example, the regular expression hello
matches hello and nothing else.
Nontrivial regular expressions use certain special constructs so
that they can match more than one string. For example, the
regular expression hello|word matches either
the string hello or the string
word.
As a more complex example, the regular expression
B[an]*s matches any of the strings
Bananas, Baaaaas,
Bs, and any other string starting with a
B, ending with an s, and
containing any number of a or
n characters in between.
A regular expression for the REGEXP
operator may use any of the following special characters and
constructs:
^
Match the beginning of a string.
mysql>SELECT 'fo\nfo' REGEXP '^fo$';-> 0 mysql>SELECT 'fofo' REGEXP '^fo';-> 1
$
Match the end of a string.
mysql>SELECT 'fo\no' REGEXP '^fo\no$';-> 1 mysql>SELECT 'fo\no' REGEXP '^fo$';-> 0
.
Match any character (including carriage return and newline).
mysql>SELECT 'fofo' REGEXP '^f.*$';-> 1 mysql>SELECT 'fo\r\nfo' REGEXP '^f.*$';-> 1
a*
Match any sequence of zero or more a
characters.
mysql>SELECT 'Ban' REGEXP '^Ba*n';-> 1 mysql>SELECT 'Baaan' REGEXP '^Ba*n';-> 1 mysql>SELECT 'Bn' REGEXP '^Ba*n';-> 1
a+
Match any sequence of one or more a
characters.
mysql>SELECT 'Ban' REGEXP '^Ba+n';-> 1 mysql>SELECT 'Bn' REGEXP '^Ba+n';-> 0
a?
Match either zero or one a character.
mysql>SELECT 'Bn' REGEXP '^Ba?n';-> 1 mysql>SELECT 'Ban' REGEXP '^Ba?n';-> 1 mysql>SELECT 'Baan' REGEXP '^Ba?n';-> 0
de|abc
Match either of the sequences de or
abc.
mysql>SELECT 'pi' REGEXP 'pi|apa';-> 1 mysql>SELECT 'axe' REGEXP 'pi|apa';-> 0 mysql>SELECT 'apa' REGEXP 'pi|apa';-> 1 mysql>SELECT 'apa' REGEXP '^(pi|apa)$';-> 1 mysql>SELECT 'pi' REGEXP '^(pi|apa)$';-> 1 mysql>SELECT 'pix' REGEXP '^(pi|apa)$';-> 0
(abc)*
Match zero or more instances of the sequence
abc.
mysql>SELECT 'pi' REGEXP '^(pi)*$';-> 1 mysql>SELECT 'pip' REGEXP '^(pi)*$';-> 0 mysql>SELECT 'pipi' REGEXP '^(pi)*$';-> 1
{1}, {2,3}
{n} or {m,n} notation
provides a more general way of writing regular expressions
that match many occurrences of the previous atom (or
“piece”) of the pattern. m
and n are integers.
a*
Can be written as a{0,}.
a+
Can be written as a{1,}.
a?
Can be written as a{0,1}.
To be more precise, a{n} matches exactly
n instances of a.
a{n,} matches n or
more instances of a.
a{m,n} matches m
through n instances of
a, inclusive.
m and n must be in the
range from 0 to
RE_DUP_MAX (default 255), inclusive. If
both m and n are
given, m must be less than or equal to
n.
mysql>SELECT 'abcde' REGEXP 'a[bcd]{2}e';-> 0 mysql>SELECT 'abcde' REGEXP 'a[bcd]{3}e';-> 1 mysql>SELECT 'abcde' REGEXP 'a[bcd]{1,10}e';-> 1
[a-dX], [^a-dX]
Matches any character that is (or is not, if ^ is used)
either a, b,
c, d or
X. A - character
between two other characters forms a range that matches all
characters from the first character to the second. For
example, [0-9] matches any decimal digit.
To include a literal ] character, it must
immediately follow the opening bracket [.
To include a literal - character, it must
be written first or last. Any character that does not have a
defined special meaning inside a [] pair
matches only itself.
mysql>SELECT 'aXbc' REGEXP '[a-dXYZ]';-> 1 mysql>SELECT 'aXbc' REGEXP '^[a-dXYZ]$';-> 0 mysql>SELECT 'aXbc' REGEXP '^[a-dXYZ]+$';-> 1 mysql>SELECT 'aXbc' REGEXP '^[^a-dXYZ]+$';-> 0 mysql>SELECT 'gheis' REGEXP '^[^a-dXYZ]+$';-> 1 mysql>SELECT 'gheisa' REGEXP '^[^a-dXYZ]+$';-> 0
[.characters.]
Within a bracket expression (written using
[ and ]), matches the
sequence of characters of that collating element.
characters is either a single character
or a character name like newline. The
following table lists the permissible character names.
The following table shows the permissible character names and the characters that they match. For characters given as numeric values, the values are represented in octal.
| Name | Character | Name | Character |
|---|---|---|---|
NUL | 0 | SOH | 001 |
STX | 002 | ETX | 003 |
EOT | 004 | ENQ | 005 |
ACK | 006 | BEL | 007 |
alert | 007 | BS | 010 |
backspace | '\b' | HT | 011 |
tab | '\t' | LF | 012 |
newline | '\n' | VT | 013 |
vertical-tab | '\v' | FF | 014 |
form-feed | '\f' | CR | 015 |
carriage-return | '\r' | SO | 016 |
SI | 017 | DLE | 020 |
DC1 | 021 | DC2 | 022 |
DC3 | 023 | DC4 | 024 |
NAK | 025 | SYN | 026 |
ETB | 027 | CAN | 030 |
EM | 031 | SUB | 032 |
ESC | 033 | IS4 | 034 |
FS | 034 | IS3 | 035 |
GS | 035 | IS2 | 036 |
RS | 036 | IS1 | 037 |
US | 037 | space | ' ' |
exclamation-mark | '!' | quotation-mark | '"' |
number-sign | '#' | dollar-sign | '$' |
percent-sign | '%' | ampersand | '&' |
apostrophe | '\'' | left-parenthesis | '(' |
right-parenthesis | ')' | asterisk | '*' |
plus-sign | '+' | comma | ',' |
hyphen | '-' | hyphen-minus | '-' |
period | '.' | full-stop | '.' |
slash | '/' | solidus | '/' |
zero | '0' | one | '1' |
two | '2' | three | '3' |
four | '4' | five | '5' |
six | '6' | seven | '7' |
eight | '8' | nine | '9' |
colon | ':' | semicolon | ';' |
less-than-sign | '<' | equals-sign | '=' |
greater-than-sign | '>' | question-mark | '?' |
commercial-at | '@' | left-square-bracket | '[' |
backslash | '\\' | reverse-solidus | '\\' |
right-square-bracket | ']' | circumflex | '^' |
circumflex-accent | '^' | underscore | '_' |
low-line | '_' | grave-accent | '`' |
left-brace | '{' | left-curly-bracket | '{' |
vertical-line | '|' | right-brace | '}' |
right-curly-bracket | '}' | tilde | '~' |
DEL | 177 |
mysql>SELECT '~' REGEXP '[[.~.]]';-> 1 mysql>SELECT '~' REGEXP '[[.tilde.]]';-> 1
[=character_class=]
Within a bracket expression (written using
[ and ]),
[=character_class=] represents an
equivalence class. It matches all characters with the same
collation value, including itself. For example, if
o and (+) are the
members of an equivalence class, [[=o=]],
[[=(+)=]], and [o(+)]
are all synonymous. An equivalence class may not be used as
an endpoint of a range.
[:character_class:]
Within a bracket expression (written using
[ and ]),
[:character_class:] represents a
character class that matches all characters belonging to
that class. The following table lists the standard class
names. These names stand for the character classes defined
in the ctype(3) manual page. A particular
locale may provide other class names. A character class may
not be used as an endpoint of a range.
| Character Class Name | Meaning |
|---|---|
alnum | Alphanumeric characters |
alpha | Alphabetic characters |
blank | Whitespace characters |
cntrl | Control characters |
digit | Digit characters |
graph | Graphic characters |
lower | Lowercase alphabetic characters |
print | Graphic or space characters |
punct | Punctuation characters |
space | Space, tab, newline, and carriage return |
upper | Uppercase alphabetic characters |
xdigit | Hexadecimal digit characters |
mysql>SELECT 'justalnums' REGEXP '[[:alnum:]]+';-> 1 mysql>SELECT '!!' REGEXP '[[:alnum:]]+';-> 0
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the
beginning and end of words, respectively. A word is a
sequence of word characters that is not preceded by or
followed by word characters. A word character is an
alphanumeric character in the alnum class
or an underscore (_).
mysql>SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]';-> 1 mysql>SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]';-> 0
To use a literal instance of a special character in a regular
expression, precede it by two backslash (\) characters. The
MySQL parser interprets one of the backslashes, and the regular
expression library interprets the other. For example, to match
the string 1+2 that contains the special
+ character, only the last of the following
regular expressions is the correct one:
mysql>SELECT '1+2' REGEXP '1+2';-> 0 mysql>SELECT '1+2' REGEXP '1\+2';-> 0 mysql>SELECT '1+2' REGEXP '1\\+2';-> 1