The full accepted regular expression syntax is described below:

Characters

unicodeChar          Matches any identical unicode character
\                    Used to quote a meta-character (like '*')
\\                   Matches a single '\' character
\0nnn                Matches a given octal character
\xhh                 Matches a given 8-bit hexadecimal character
\uhhhh               Matches a given 16-bit hexadecimal character
\t                   Matches an ASCII tab character
\n                   Matches an ASCII newline character
\r                   Matches an ASCII return character
\f                   Matches an ASCII form feed character

Character Classes

[abc]                Simple character class
[a-zA-Z]             Character class with ranges
[^abc]               Negated character class

NOTE: An incomplete range is interpreted as "starts from zero" or "ends with last character".
For example, [-a] is the same as [\u0000-a], [a-] is the same as [a-\uFFFF], and [-] means "all characters".
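
The incomplete-range rule can be sketched as a small rewrite step. The example below uses Python's re module as a stand-in engine, which is an assumption: Python itself treats [-a] as a literal '-' or 'a', so the hypothetical helper expand_class_body makes the implied endpoints explicit first. It only handles simple bodies with a single range.

```python
import re

def expand_class_body(body):
    # Hypothetical helper: make the implied endpoint of an incomplete
    # range explicit, following the NOTE above.
    if body == "-":
        return "\\u0000-\\uFFFF"        # "all characters"
    if body.startswith("-"):
        return "\\u0000-" + body[1:]    # "starts from zero"
    if body.endswith("-"):
        return body[:-1] + "-\\uFFFF"   # "ends with last character"
    return body

up_to_a = re.compile("[" + expand_class_body("-a") + "]")
from_a = re.compile("[" + expand_class_body("a-") + "]")
anything = re.compile("[" + expand_class_body("-") + "]")
```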

Standard POSIX Character Classes

[:alnum:]            Alphanumeric characters.
[:alpha:]            Alphabetic characters.
[:blank:]            Space and tab characters.
[:cntrl:]            Control characters.
[:digit:]            Numeric characters.
[:graph:]            Characters that are printable and are also visible.
                         (A space is printable, but not visible, while an
                         'a' is both.)
[:lower:]            Lower-case alphabetic characters.
[:print:]            Printable characters (characters that are not
                         control characters.)
[:punct:]            Punctuation characters (characters that are not letters,
                         digits, control characters, or space characters).
[:space:]            Space characters (such as space, tab, and formfeed,
                         to name a few).
[:upper:]            Upper-case alphabetic characters.
[:xdigit:]           Characters that are hexadecimal digits.

Non-standard POSIX-style Character Classes

[:javastart:]        Start of a Java identifier
[:javapart:]         Part of a Java identifier

Predefined Classes

.         Matches any character other than newline
\w        Matches a "word" character (alphanumeric plus "_")
\W        Matches a non-word character
\s        Matches a whitespace character
\S        Matches a non-whitespace character
\d        Matches a digit character
\D        Matches a non-digit character
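
As a rough illustration of the predefined classes, the sketch below uses Python's re module as a stand-in engine. That choice is an assumption: the engine documented here may treat non-ASCII input differently, so the re.ASCII flag is used to keep \w, \d, and \s to their ASCII meanings.

```python
import re

sample = "id_42 = 7\tok"

# \w+ collects runs of word characters (letters, digits, underscore).
words = re.findall(r"\w+", sample, flags=re.ASCII)

# \d+ collects runs of digits; \s matches single whitespace characters.
digits = re.findall(r"\d+", sample, flags=re.ASCII)
spaces = re.findall(r"\s", sample, flags=re.ASCII)

# . matches any character other than newline.
dots = re.findall(r".", "a\nb")
```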

Boundary Matchers

^         Matches only at the beginning of a line
$         Matches only at the end of a line
\b        Matches only at a word boundary
\B        Matches only at a non-word boundary
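
The boundary matchers can be tried out with a short sketch. Python's re module is used here as a stand-in (an assumption); in Python, ^ and $ are line-oriented only when the MULTILINE flag is set, so the flag is passed to mirror the table above.

```python
import re

text = "cat concat\ncatalog cat"

# ^ and $ anchor at the beginning and end of each line.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

# \bcat\b matches "cat" only as a whole word.
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \Bcat matches "cat" only where it does not begin a word.
inner = [m.start() for m in re.finditer(r"\Bcat", text)]
```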

Greedy Closures

A*        Matches A 0 or more times (greedy)
A+        Matches A 1 or more times (greedy)
A?        Matches A 0 or 1 times (greedy)
A{n}      Matches A exactly n times (greedy)
A{n,}     Matches A at least n times (greedy)
A{n,m}    Matches A at least n but not more than m times (greedy)

Reluctant Closures

A*?       Matches A 0 or more times (reluctant)
A+?       Matches A 1 or more times (reluctant)
A??       Matches A 0 or 1 times (reluctant)

Logical Operators

AB        Matches A followed by B
A|B       Matches either A or B
(A)       Used for subexpression grouping
(?:A)     Used for subexpression clustering (just like grouping but
              no backrefs)
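
The difference between grouping and clustering shows up in how groups are numbered. A minimal sketch, assuming Python's re module as a stand-in engine:

```python
import re

# (A) captures its match; (?:A) only clusters and creates no numbered group.
capturing = re.compile(r"(ab|cd)+(x)")
clustering = re.compile(r"(?:ab|cd)+(x)")

m1 = capturing.fullmatch("abcdx")
m2 = clustering.fullmatch("abcdx")

# A repeated capturing group keeps the text of its last repetition.
groups_capturing = m1.groups()
# With clustering, (x) is the only numbered group.
groups_clustering = m2.groups()
```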

Backreferences
The backreference syntax differs between a regular expression and a replacement string.

In a regular expression:
\1    Backreference to 1st parenthesized subexpression
\2    Backreference to 2nd parenthesized subexpression
\3    Backreference to 3rd parenthesized subexpression
\4    Backreference to 4th parenthesized subexpression
\5    Backreference to 5th parenthesized subexpression
\6    Backreference to 6th parenthesized subexpression
\7    Backreference to 7th parenthesized subexpression
\8    Backreference to 8th parenthesized subexpression
\9    Backreference to 9th parenthesized subexpression
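
A common use of a backreference inside a pattern is finding a repeated word. The sketch below assumes Python's re module as a stand-in engine; the \1 syntax is the same as in the table above.

```python
import re

# (\w+) captures a word; \1 requires the same text to appear again.
doubled = re.compile(r"\b(\w+) \1\b")

hits = [m.group(1) for m in doubled.finditer("it is is what it it was")]
```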

In a replacement string:
$1    Backreference to 1st parenthesized group from the search string
$2    Backreference to 2nd parenthesized group from the search string
$3    Backreference to 3rd parenthesized group from the search string
$4    Backreference to 4th parenthesized group from the search string
$5    Backreference to 5th parenthesized group from the search string
$6    Backreference to 6th parenthesized group from the search string
$7    Backreference to 7th parenthesized group from the search string
$8    Backreference to 8th parenthesized group from the search string
$9    Backreference to 9th parenthesized group from the search string
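
Replacement-string backreferences can be sketched as follows. Python's re module, used here as a stand-in (an assumption), writes group references as \g<n> rather than $n, so the hypothetical helper dollar_to_python translates the $1..$9 form first.

```python
import re

def dollar_to_python(replacement):
    # Hypothetical helper: translate $1..$9 into Python's \g<1>..\g<9>.
    # Backslashes are doubled first so they stay literal in the template.
    escaped = replacement.replace("\\", "\\\\")
    return re.sub(r"\$([1-9])", r"\\g<\1>", escaped)

# Swap "key=value" into "value=key" using $2 and $1.
template = dollar_to_python("$2=$1")
swapped = re.sub(r"(\w+)=(\w+)", template, "host=example port=80")
```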

All closure operators (+, *, ?, {n,m}) are greedy by default, meaning that they match as many elements of the string as possible without causing the overall match to fail. To make a closure reluctant (non-greedy), follow it with a '?'. A reluctant closure matches as few elements of the string as possible when finding matches. {n,m} closures do not currently support reluctant matching.
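
The difference between greedy and reluctant closures is easiest to see on input with several possible end points. A minimal sketch, assuming Python's re module as a stand-in engine:

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* extends to the last "</b>" that still lets the match succeed.
greedy = re.findall(r"<b>.*</b>", html)

# Reluctant: .*? stops at the first "</b>" it can reach.
reluctant = re.findall(r"<b>.*?</b>", html)
```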

Line terminators
A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators:

A newline (line feed) character ('\n'),
A carriage-return character followed immediately by a newline character ("\r\n"),
A standalone carriage-return character ('\r'),
A next-line character ('\u0085'),
A line-separator character ('\u2028'), or
A paragraph-separator character ('\u2029').
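
The list of terminators can be expressed as a single splitting pattern. The sketch below assumes Python's re module as a stand-in engine, and assumes the next-line, line-separator, and paragraph-separator characters are U+0085, U+2028, and U+2029; split_lines is a hypothetical helper name.

```python
import re

# One alternative per recognized terminator. "\r\n" comes first so the
# pair is consumed as one terminator rather than two.
LINE_TERMINATOR = re.compile("\r\n|[\n\r\u0085\u2028\u2029]")

def split_lines(text):
    # Split the input at every recognized line terminator.
    return LINE_TERMINATOR.split(text)

lines = split_lines("a\r\nb\rc\nd\u0085e\u2028f\u2029g")
```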



