The full accepted regular expression syntax is described below.

Characters

unicodeChar          Matches any identical unicode character
\                    Used to quote a meta-character (like '*')
\\                   Matches a single '\' character
\0nnn                Matches a given octal character
\xhh                 Matches a given 8-bit hexadecimal character
\uhhhh               Matches a given 16-bit hexadecimal character
\t                   Matches an ASCII tab character
\n                   Matches an ASCII newline character
\r                   Matches an ASCII return character
\f                   Matches an ASCII form feed character
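
The escapes above can be tried out with a short sketch. It assumes java.util.regex as a stand-in for the engine documented on this page, which may differ in details; the class and method names are illustrative only.

```java
import java.util.regex.Pattern;

// Sketch only: java.util.regex is assumed here, not the documented engine.
class EscapeDemo {
    // True if the regex matches the entire input.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Each Java string literal doubles the backslash so that the
        // regex engine receives a single one.
        System.out.println(matches("\\x41", "A"));      // 8-bit hexadecimal escape
        System.out.println(matches("\\u0041", "A"));    // 16-bit hexadecimal escape
        System.out.println(matches("\\0101", "A"));     // octal escape (101 octal = 65)
        System.out.println(matches("a\\*b", "a*b"));    // quoted meta-character
        System.out.println(matches("a\\\\b", "a\\b"));  // single literal backslash
        System.out.println(matches("a\\tb", "a\tb"));   // tab character
    }
}
```

All six calls print true under java.util.regex.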

Character Classes

[abc]                Simple character class
[a-zA-Z]             Character class with ranges
[^abc]               Negated character class

NOTE: An incomplete range is interpreted as starting from the first character or ending with the last character.
For example, [-a] is the same as [\u0000-a], [a-] is the same as [a-\uFFFF], and [-] means "all characters".
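
A minimal sketch of the three character class forms, assuming java.util.regex rather than the engine documented here. The incomplete-range shorthand is specific to the documented engine (java.util.regex treats a leading or trailing '-' as a literal hyphen), so the sketch spells out the explicit range instead. Class and method names are illustrative.

```java
import java.util.regex.Pattern;

// Sketch only: java.util.regex is assumed here, not the documented engine.
class CharClassDemo {
    // True if the regex matches the entire input.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[abc]", "b"));           // simple class
        System.out.println(matches("[a-zA-Z]+", "Regex"));   // class with ranges
        System.out.println(matches("[^abc]", "d"));          // negated class
        System.out.println(matches("[^abc]", "a"));          // 'a' is excluded
        // Explicit form of the range the note writes as [-a]:
        System.out.println(matches("[\\u0000-a]", "A"));     // 'A' sorts before 'a'
        System.out.println(matches("[\\u0000-a]", "b"));     // 'b' sorts after 'a'
    }
}
```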

Standard POSIX Character Classes

[:alnum:]            Alphanumeric characters.
[:alpha:]            Alphabetic characters.
[:blank:]            Space and tab characters.
[:cntrl:]            Control characters.
[:digit:]            Numeric characters.
[:graph:]            Characters that are printable and are also visible.
                         (A space is printable but not visible, while an
                         'a' is both.)
[:lower:]            Lower-case alphabetic characters.
[:print:]            Printable characters (characters that are not
                         control characters.)
[:punct:]            Punctuation characters (characters that are not letters,
                         digits, control characters, or space characters).
[:space:]            Space characters (such as space, tab, and formfeed,
                         to name a few).
[:upper:]            Upper-case alphabetic characters.
[:xdigit:]           Characters that are hexadecimal digits.

Non-standard POSIX-style Character Classes

[:javastart:]        Start of a Java identifier
[:javapart:]         Part of a Java identifier
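
To illustrate what "start" and "part" of a Java identifier mean, the following sketch uses the standard Character.isJavaIdentifierStart and Character.isJavaIdentifierPart methods. That the two classes above correspond to these methods is an assumption, not something this page states; the class and method names in the sketch are illustrative.

```java
// Sketch only: assumes [:javastart:] and [:javapart:] correspond to the
// java.lang.Character methods used below.
class JavaIdentifierDemo {
    // True if s starts with an identifier-start character and every
    // following character is an identifier-part character.
    // Reserved words are not excluded by this check.
    static boolean isIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("_count1")); // true
        System.out.println(isIdentifier("1count"));  // false: digit cannot start
        System.out.println(isIdentifier("my-var"));  // false: '-' is not a part
    }
}
```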

Predefined Classes

.         Matches any character other than newline
\w        Matches a "word" character (alphanumeric plus "_")
\W        Matches a non-word character
\s        Matches a whitespace character
\S        Matches a non-whitespace character
\d        Matches a digit character
\D        Matches a non-digit character
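
The predefined classes can be exercised with a small helper that collects every match. This is a sketch assuming java.util.regex, whose predefined classes behave the same way for these inputs; the documented engine may differ. The helper name findAll is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: java.util.regex is assumed here, not the documented engine.
class PredefinedClassDemo {
    // Returns every non-overlapping match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "order 66, aisle 7")); // [66, 7]
        System.out.println(findAll("\\w+", "snake_case, two"));   // [snake_case, two]
        System.out.println(findAll("\\S+", "a  b\tc"));           // [a, b, c]
        System.out.println(findAll(".", "a\nb"));                 // [a, b]
    }
}
```

The last call shows that '.' skips the newline between the two letters.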

Boundary Matchers

^         Matches only at the beginning of a line
$         Matches only at the end of a line
\b        Matches only at a word boundary
\B        Matches only at a non-word boundary
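
A sketch of the boundary matchers, assuming java.util.regex. In that library '^' and '$' match at line boundaries only when Pattern.MULTILINE is set, so the flag is passed explicitly; whether the documented engine needs an equivalent option is not stated here. The helper name findAll is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: java.util.regex is assumed here, not the documented engine.
class BoundaryDemo {
    // Returns every non-overlapping match of regex in input, in order.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "alpha beta\ngamma delta";
        // First and last word of each line.
        System.out.println(findAll("^\\w+", Pattern.MULTILINE, text)); // [alpha, gamma]
        System.out.println(findAll("\\w+$", Pattern.MULTILINE, text)); // [beta, delta]

        String cats = "cat concat cat.";
        // \b requires a word boundary on both sides, so "concat" is skipped.
        System.out.println(findAll("\\bcat\\b", 0, cats)); // [cat, cat]
        // \B requires a non-boundary, so only the "cat" inside "concat" matches.
        System.out.println(findAll("\\Bcat", 0, cats));    // [cat]
    }
}
```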

Greedy Closures

A*        Matches A 0 or more times (greedy)
A+        Matches A 1 or more times (greedy)
A?        Matches A 0 or 1 times (greedy)
A{n}      Matches A exactly n times (greedy)
A{n,}     Matches A at least n times (greedy)
A{n,m}    Matches A at least n but not more than m times (greedy)

Reluctant Closures

A*?       Matches A 0 or more times (reluctant)
A+?       Matches A 1 or more times (reluctant)
A??       Matches A 0 or 1 times (reluctant)

Logical Operators

AB        Matches A followed by B
A|B       Matches either A or B
(A)       Used for subexpression grouping
(?:A)     Used for subexpression clustering (just like grouping, but
          without backreferences)
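
The difference between grouping and clustering shows up in which subexpressions are captured. A minimal sketch, assuming java.util.regex as a stand-in for the documented engine; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: java.util.regex is assumed here, not the documented engine.
class GroupingDemo {
    // Returns the text captured by each group, or an empty list
    // if the regex does not match the whole input.
    static List<String> captured(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // All three parts are grouped with (A), so all three are captured.
        System.out.println(captured("(\\d+)-(\\d+)-(\\d+)", "12-34-56"));   // [12, 34, 56]
        // The middle part uses (?:A), so it is matched but not captured.
        System.out.println(captured("(\\d+)-(?:\\d+)-(\\d+)", "12-34-56")); // [12, 56]
        // A|B inside a group: either alternative is accepted.
        System.out.println(captured("gr(a|e)y", "grey"));                   // [e]
    }
}
```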

Backreferences
Backreferences are written differently in a regular expression and in a replacement string.

In a regular expression:
\1   Backreference to 1st parenthesized subexpression
\2   Backreference to 2nd parenthesized subexpression
\3   Backreference to 3rd parenthesized subexpression
\4   Backreference to 4th parenthesized subexpression
\5   Backreference to 5th parenthesized subexpression
\6   Backreference to 6th parenthesized subexpression
\7   Backreference to 7th parenthesized subexpression
\8   Backreference to 8th parenthesized subexpression
\9   Backreference to 9th parenthesized subexpression

In a replacement string:
$1   Backreference to 1st parenthesized group from the search string
$2   Backreference to 2nd parenthesized group from the search string
$3   Backreference to 3rd parenthesized group from the search string
$4   Backreference to 4th parenthesized group from the search string
$5   Backreference to 5th parenthesized group from the search string
$6   Backreference to 6th parenthesized group from the search string
$7   Backreference to 7th parenthesized group from the search string
$8   Backreference to 8th parenthesized group from the search string
$9   Backreference to 9th parenthesized group from the search string
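
Both backreference forms can be sketched side by side. The example assumes java.util.regex, which happens to use the same \1 and $1 notation; the documented engine may differ in details. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

// Sketch only: java.util.regex is assumed here, not the documented engine.
class BackreferenceDemo {
    // \1 inside the regex must repeat whatever group 1 captured,
    // so this finds a word immediately followed by itself.
    static boolean hasRepeatedWord(String input) {
        return Pattern.compile("\\b(\\w+) \\1\\b").matcher(input).find();
    }

    // $1 and $2 inside the replacement refer to the groups captured
    // by the search expression; here they are swapped.
    static String swapNames(String input) {
        return input.replaceAll("(\\w+), (\\w+)", "$2 $1");
    }

    public static void main(String[] args) {
        System.out.println(hasRepeatedWord("it is is fine")); // true
        System.out.println(hasRepeatedWord("this is fine"));  // false
        System.out.println(swapNames("Doe, John"));           // John Doe
    }
}
```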

All closure operators (+, *, ?, {m,n}) are greedy by default, meaning that they match as many elements of the string as possible without causing the overall match to fail. To make a closure reluctant (non-greedy), follow it with a '?'. A reluctant closure matches as few elements of the string as possible when finding matches. {m,n} closures do not currently support reluctance.
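
The difference between greedy and reluctant closures is easiest to see on the same input. A minimal sketch, assuming java.util.regex as a stand-in for the documented engine; the class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: java.util.regex is assumed here, not the documented engine.
class GreedyVsReluctantDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String tags = "<a><b>";
        // Greedy: .+ takes as much as it can while still ending on '>'.
        System.out.println(firstMatch("<.+>", tags));  // <a><b>
        // Reluctant: .+? stops at the first '>' that completes the match.
        System.out.println(firstMatch("<.+?>", tags)); // <a>
        // A+? needs at least one 'a' and takes no more than that.
        System.out.println(firstMatch("a+?", "aaa"));  // a
        // A*? is satisfied by zero repetitions, so the match is empty.
        System.out.println(firstMatch("a*?", "aaa").isEmpty()); // true
    }
}
```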

Line terminators
A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators:

A newline (line feed) character ('\n'),
A carriage-return character followed immediately by a newline character ("\r\n"),
A standalone carriage-return character ('\r'),
A next-line character ('\u0085'),
A line-separator character ('\u2028'), or
A paragraph-separator character ('\u2029').
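
The list above can be sketched as a single pattern that splits text into lines. This assumes java.util.regex; the pattern and the names LINE_TERMINATOR and splitLines are illustrative, not part of the documented engine. The two-character sequence is listed first so that "\r\n" counts as one terminator rather than two.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: java.util.regex is assumed here, not the documented engine.
class LineTerminatorDemo {
    // "\r\n" first, then each single-character terminator from the list.
    static final Pattern LINE_TERMINATOR =
            Pattern.compile("\\r\\n|[\\n\\r\\u0085\\u2028\\u2029]");

    // Splits text at every line terminator; the limit of -1 keeps
    // trailing empty lines instead of discarding them.
    static List<String> splitLines(String text) {
        return Arrays.asList(LINE_TERMINATOR.split(text, -1));
    }

    public static void main(String[] args) {
        String text = "a\nb\r\nc\rd\u0085e\u2028f\u2029g";
        System.out.println(splitLines(text));          // [a, b, c, d, e, f, g]
        System.out.println(splitLines("x\r\ny").size()); // 2
    }
}
```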