The accepted full regular expression syntax is described here:
Characters
unicodeChar Matches any identical Unicode character
\ Used to quote a meta-character (like '*')
\\ Matches a single '\' character
\0nnn Matches a given octal character
\xhh Matches a given 8-bit hexadecimal character
\uhhhh Matches a given 16-bit hexadecimal character
\t Matches an ASCII tab character
\n Matches an ASCII newline character
\r Matches an ASCII carriage return character
\f Matches an ASCII form feed character
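The escapes above can be tried with the JDK's own java.util.regex package, which also accepts \xhh, \uhhhh, \t and a quoted meta-character. This is a sketch under the assumption that the JDK behaves like this engine for these forms; the two engines are not identical (the JDK, for instance, writes octal as \0n, \0nn or \0mnn). Note that in Java source every regex backslash is doubled inside the string literal.

```java
import java.util.regex.Pattern;

// Sketch using java.util.regex as a stand-in engine (assumption: same behavior for these escapes).
final class EscapeDemo {
    static boolean hexByte(String s)    { return Pattern.matches("\\x41", s); }   // 8-bit hex 41 = 'A'
    static boolean hex16(String s)      { return Pattern.matches("\\u0041", s); } // 16-bit hex 0041 = 'A'
    static boolean octal(String s)      { return Pattern.matches("\\0101", s); }  // octal 101 = 'A'
    static boolean quotedStar(String s) { return Pattern.matches("a\\*", s); }    // quoted '*' is literal
    static boolean tab(String s)        { return Pattern.matches("a\\tb", s); }   // 'a', tab, 'b'

    public static void main(String[] args) {
        System.out.println(hexByte("A"));     // true
        System.out.println(hex16("A"));       // true
        System.out.println(octal("A"));       // true
        System.out.println(quotedStar("a*")); // true
        System.out.println(quotedStar("aa")); // false
        System.out.println(tab("a\tb"));      // true
    }
}
```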
Character Classes
[abc] Simple character class
[a-zA-Z] Character class with ranges
[^abc] Negated character class
NOTE: Incomplete ranges are interpreted as "starting from zero" (\u0000) or "ending with the last character" (\uFFFF).
For example, [-a] is the same as [\u0000-a], [a-] is the same as [a-\uFFFF], and [-] means "all characters".
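The three basic class forms behave the same way in the JDK's java.util.regex, which this sketch uses as a stand-in (an assumption, not this engine's API). The incomplete-range rule in the NOTE does not carry over: in java.util.regex a leading or trailing '-' is a literal hyphen, so [-a] matches only '-' or 'a', whereas under the NOTE it would also match '0'. Patterns relying on open-ended ranges are therefore not portable to the JDK.

```java
import java.util.regex.Pattern;

// Sketch using java.util.regex as a stand-in engine.
final class CharClassDemo {
    static boolean simple(String s)  { return Pattern.matches("[abc]", s); }
    static boolean ranges(String s)  { return Pattern.matches("[a-zA-Z]+", s); }
    static boolean negated(String s) { return Pattern.matches("[^abc]", s); }
    // In java.util.regex a leading '-' is a literal hyphen, not an open range,
    // so this class matches only '-' or 'a'.
    static boolean leadingHyphen(String s) { return Pattern.matches("[-a]", s); }

    public static void main(String[] args) {
        System.out.println(simple("b"));          // true
        System.out.println(ranges("HelloWorld")); // true
        System.out.println(ranges("Hello1"));     // false
        System.out.println(negated("d"));         // true
        System.out.println(negated("a"));         // false
        System.out.println(leadingHyphen("-"));   // true
        System.out.println(leadingHyphen("0"));   // false in the JDK
    }
}
```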
Standard POSIX Character Classes
[:alnum:] Alphanumeric characters.
[:alpha:] Alphabetic characters.
[:blank:] Space and tab characters.
[:cntrl:] Control characters.
[:digit:] Numeric characters.
[:graph:] Characters that are printable and also visible (a space is printable but not visible, while an 'a' is both).
[:lower:] Lower-case alphabetic characters.
[:print:] Printable characters (characters that are not control characters).
[:punct:] Punctuation characters (characters that are not letters, digits, control characters, or space characters).
[:space:] Space characters (such as space, tab, and form feed, to name a few).
[:upper:] Upper-case alphabetic characters.
[:xdigit:] Characters that are hexadecimal digits.
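For comparison, java.util.regex spells these POSIX classes as \p{Alnum}, \p{Graph}, \p{Punct}, \p{XDigit} and so on, and by default they cover US-ASCII only. The sketch below assumes those JDK classes are close enough to the [:name:] forms above to illustrate them; it is not this engine's syntax.

```java
import java.util.regex.Pattern;

// Sketch: JDK \p{...} POSIX classes used as stand-ins for [:name:] (assumption).
final class PosixDemo {
    static boolean alnum(String s)  { return Pattern.matches("\\p{Alnum}+", s); }  // like [:alnum:]
    static boolean graph(String s)  { return Pattern.matches("\\p{Graph}+", s); }  // like [:graph:]
    static boolean punct(String s)  { return Pattern.matches("\\p{Punct}+", s); }  // like [:punct:]
    static boolean xdigit(String s) { return Pattern.matches("\\p{XDigit}+", s); } // like [:xdigit:]

    public static void main(String[] args) {
        System.out.println(alnum("abc123"));    // true
        System.out.println(alnum("abc 123"));   // false: space is not alphanumeric
        System.out.println(graph("a b"));       // false: space is printable but not visible
        System.out.println(punct("!?,"));       // true
        System.out.println(xdigit("DeadBeef")); // true
    }
}
```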
Non-standard POSIX-style Character Classes
[:javastart:] Start of a Java identifier
[:javapart:] Part of a Java identifier
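A pattern such as [:javastart:][:javapart:]* is the natural way to match a whole Java identifier. The sketch below expresses the same check directly with Character.isJavaIdentifierStart and Character.isJavaIdentifierPart; the helper name isJavaIdentifier is hypothetical, and the assumption that these two classes follow exactly those JDK methods is not confirmed by this page.

```java
// Hypothetical helper approximating [:javastart:][:javapart:]*,
// assuming the classes follow Character.isJavaIdentifierStart/Part.
final class JavaIdentDemo {
    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) return false;
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) return false;
        }
        return true; // note: reserved words such as "class" are not rejected
    }

    public static void main(String[] args) {
        System.out.println(isJavaIdentifier("_count")); // true
        System.out.println(isJavaIdentifier("$x1"));    // true
        System.out.println(isJavaIdentifier("1abc"));   // false: digit cannot start
        System.out.println(isJavaIdentifier("my-var")); // false: '-' is not an identifier part
    }
}
```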
Predefined Classes
. Matches any character other than newline
\w Matches a "word" character (alphanumeric plus "_")
\W Matches a non-word character
\s Matches a whitespace character
\S Matches a non-whitespace character
\d Matches a digit character
\D Matches a non-digit character
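The predefined classes can be exercised with java.util.regex, assuming it agrees with this engine on these shorthands (in the JDK, \w is [a-zA-Z_0-9] and '.' excludes line terminators unless the DOTALL flag is set). A minimal sketch:

```java
import java.util.regex.Pattern;

// Sketch using java.util.regex as a stand-in engine.
final class PredefinedDemo {
    static boolean word(String s)   { return Pattern.matches("\\w+", s); }
    static boolean digits(String s) { return Pattern.matches("\\d+", s); }
    static boolean spaces(String s) { return Pattern.matches("\\s+", s); }
    // Without the DOTALL flag, '.' does not match a newline.
    static boolean dotAcross(String s) { return Pattern.matches("a.c", s); }

    public static void main(String[] args) {
        System.out.println(word("snake_case_1")); // true
        System.out.println(word("kebab-case"));   // false: '-' is not a word character
        System.out.println(digits("2024"));       // true
        System.out.println(spaces(" \t\n"));      // true
        System.out.println(dotAcross("abc"));     // true
        System.out.println(dotAcross("a\nc"));    // false
    }
}
```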
Boundary Matchers
^ Matches only at the beginning of a line
$ Matches only at the end of a line
\b Matches only at a word boundary
\B Matches only at a non-word boundary
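The boundary matchers are easiest to see by collecting every match. This sketch uses java.util.regex, where (unlike the table above) '^' and '$' match at line boundaries only when the MULTILINE flag is set, so the flag is passed explicitly; that this reproduces this engine's line behavior is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch using java.util.regex as a stand-in engine.
final class BoundaryDemo {
    // '^' at the start of every line (JDK needs MULTILINE for this).
    static final Pattern FIRST_WORD = Pattern.compile("^\\w+", Pattern.MULTILINE);
    // \b on both sides: "cat" only as a whole word.
    static final Pattern WHOLE_CAT = Pattern.compile("\\bcat\\b");
    // \B before: "cat" only when preceded by another word character.
    static final Pattern INNER_CAT = Pattern.compile("\\Bcat");

    // Collects the text of every match of p in input.
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) out.add(m.group());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll(FIRST_WORD, "one two\nthree four"));  // [one, three]
        System.out.println(findAll(WHOLE_CAT, "cat concat cats cat.").size()); // 2
        System.out.println(findAll(INNER_CAT, "cat concat").size());     // 1
    }
}
```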
Greedy Closures
A* Matches A 0 or more times (greedy)
A+ Matches A 1 or more times (greedy)
A? Matches A 0 or 1 times (greedy)
A{n} Matches A exactly n times (greedy)
A{n,} Matches A at least n times (greedy)
A{n,m} Matches A at least n but not more than m times (greedy)
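The counted closures can be checked against whole strings. This minimal sketch uses java.util.regex, assuming it treats ?, {n}, {n,} and {n,m} the same way as this engine:

```java
import java.util.regex.Pattern;

// Sketch using java.util.regex as a stand-in engine.
final class CountedDemo {
    static boolean exactly3(String s)  { return Pattern.matches("a{3}", s); }    // A{n}
    static boolean atLeast2(String s)  { return Pattern.matches("a{2,}", s); }   // A{n,}
    static boolean twoToFour(String s) { return Pattern.matches("a{2,4}", s); }  // A{n,m}
    static boolean optionalU(String s) { return Pattern.matches("colou?r", s); } // A?

    public static void main(String[] args) {
        System.out.println(exactly3("aaa"));     // true
        System.out.println(exactly3("aaaa"));    // false
        System.out.println(atLeast2("a"));       // false
        System.out.println(atLeast2("aaaaa"));   // true
        System.out.println(twoToFour("aaaaa"));  // false
        System.out.println(optionalU("color"));  // true
        System.out.println(optionalU("colour")); // true
    }
}
```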
Reluctant Closures
A*? Matches A 0 or more times (reluctant)
A+? Matches A 1 or more times (reluctant)
A?? Matches A 0 or 1 times (reluctant)
Logical Operators
AB Matches A followed by B
A|B Matches either A or B
(A) Used for subexpression grouping
(?:A) Used for subexpression clustering (like grouping, but creates no backreference)
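The difference between grouping and clustering shows up in the group numbering: only (A) receives a number that a backreference can use. A minimal sketch with java.util.regex, assuming it numbers groups the same way this engine does:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch using java.util.regex as a stand-in engine.
final class GroupingDemo {
    // (\d+) groups are capturing; (?:ab|cd) only clusters the alternation,
    // so it gets no group number.
    static final Pattern P = Pattern.compile("(\\d+)-(?:ab|cd)-(\\d+)");

    // Returns the matcher if the whole input matches, otherwise null.
    static Matcher parse(String s) {
        Matcher m = P.matcher(s);
        return m.matches() ? m : null;
    }

    public static void main(String[] args) {
        Matcher m = parse("12-cd-34");
        System.out.println(m.groupCount());            // 2
        System.out.println(m.group(1));                // 12
        System.out.println(m.group(2));                // 34
        System.out.println(parse("12-ef-34") == null); // true
    }
}
```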
Backreferences
Backreferences are written differently in a regular expression and in a replacement string.
In a regular expression:
\1 Backreference to 1st parenthesized subexpression
\2 Backreference to 2nd parenthesized subexpression
\3 Backreference to 3rd parenthesized subexpression
\4 Backreference to 4th parenthesized subexpression
\5 Backreference to 5th parenthesized subexpression
\6 Backreference to 6th parenthesized subexpression
\7 Backreference to 7th parenthesized subexpression
\8 Backreference to 8th parenthesized subexpression
\9 Backreference to 9th parenthesized subexpression
In a replacement string:
$1 Backreference to 1st parenthesized group from the search string
$2 Backreference to 2nd parenthesized group from the search string
$3 Backreference to 3rd parenthesized group from the search string
$4 Backreference to 4th parenthesized group from the search string
$5 Backreference to 5th parenthesized group from the search string
$6 Backreference to 6th parenthesized group from the search string
$7 Backreference to 7th parenthesized group from the search string
$8 Backreference to 8th parenthesized group from the search string
$9 Backreference to 9th parenthesized group from the search string
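Both kinds of backreference can be shown with java.util.regex, which uses the same \1 and $1 notation; treating it as equivalent to this engine is an assumption. The first method uses \1 inside the pattern to find a repeated character, and the second uses $1 and $2 in the replacement string to swap two words.

```java
import java.util.regex.Pattern;

// Sketch using java.util.regex as a stand-in engine.
final class BackrefDemo {
    // \1 in the pattern: a word character immediately followed by the same character.
    static boolean hasDoubledLetter(String s) {
        return Pattern.compile("(\\w)\\1").matcher(s).find();
    }

    // $1 and $2 in the replacement string: swap two space-separated words.
    static String swapWords(String s) {
        return s.replaceAll("(\\w+) (\\w+)", "$2 $1");
    }

    public static void main(String[] args) {
        System.out.println(hasDoubledLetter("balloon")); // true
        System.out.println(hasDoubledLetter("abc"));     // false
        System.out.println(swapWords("hello world"));    // world hello
    }
}
```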
All closure operators (+, *, ?, {m,n}) are greedy by default, meaning that they match as much of the string as possible without causing the overall match to fail. To make a closure reluctant (non-greedy), follow it with a '?'. A reluctant closure matches as little of the string as possible when finding matches. {m,n} closures do not currently support reluctant matching.
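The effect of greediness is clearest when the same input is searched with a greedy and a reluctant closure. This sketch uses java.util.regex, assuming it treats * and *? like this engine; note that the JDK, unlike this engine, also accepts reluctant {m,n}? closures, so such patterns are not portable back to this engine.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch using java.util.regex as a stand-in engine.
final class GreedyDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("<.*>", "<a><b>"));  // <a><b>  (greedy: as much as possible)
        System.out.println(firstMatch("<.*?>", "<a><b>")); // <a>     (reluctant: as little as possible)
        System.out.println(firstMatch("a+", "aaa"));       // aaa
        System.out.println(firstMatch("a+?", "aaa"));      // a
    }
}
```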
Line terminators
A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators:
A newline (line feed) character ('\n'),
A carriage-return character followed immediately by a newline character ("\r\n"),
A standalone carriage-return character ('\r'),
A next-line character ('\u0085'),
A line-separator character ('\u2028'), or
A paragraph-separator character ('\u2029').
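The list above can be turned into a line-splitting pattern. The sketch below uses java.util.regex with an explicit alternation rather than the JDK's \R, because \R also treats vertical tab and form feed as line breaks, which this list does not. The two-character "\r\n" alternative must come first so that it counts as one terminator rather than two. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Sketch using java.util.regex; names are hypothetical.
final class LineSplitDemo {
    // CR LF is tried first, then any single-character terminator from the list.
    static final Pattern TERMINATOR =
        Pattern.compile("\\r\\n|[\\n\\r\\u0085\\u2028\\u2029]");

    static List<String> lines(String text) {
        return Arrays.asList(TERMINATOR.split(text));
    }

    // Sample text using every terminator once (built from code points to keep the source ASCII).
    static String sample() {
        return "a\nb\r\nc\rd" + (char) 0x85 + "e" + (char) 0x2028 + "f" + (char) 0x2029 + "g";
    }

    public static void main(String[] args) {
        System.out.println(lines(sample())); // [a, b, c, d, e, f, g]
    }
}
```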