Wednesday, April 2, 2014

Regular Expressions Examples



These are my notes on regular expressions. The links below are where I got my information. Remember that regex engines differ, so the same expression can behave differently from one engine to another.





Regular Expression Tutorial

_________________________________________________________________________________

Examples


IPv4 address (one line)
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
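
As a quick check of the IPv4 pattern, here is a minimal Python sketch. The names OCTET, IPV4 and is_ipv4 are my own for illustration. The pattern has no ^ or $ anchors, so the sketch uses re.fullmatch to require that the whole string is an address; a plain search can find a valid-looking address inside an invalid one.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (leading zeros are accepted).
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IPV4 = re.compile(OCTET + r"\." + OCTET + r"\." + OCTET + r"\." + OCTET)

def is_ipv4(text):
    """Return True only if the whole string is a dotted-quad IPv4 address."""
    # fullmatch makes the pattern cover the entire string, which the
    # pattern alone does not enforce because it has no anchors.
    return IPV4.fullmatch(text) is not None

print(is_ipv4("192.168.1.1"))              # True
print(is_ipv4("256.1.1.1"))                # False
# Unanchored search still finds "56.1.1.1" inside the invalid address.
print(IPV4.search("256.1.1.1").group(0))   # 56.1.1.1
```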

character whitelist for Linux directory path inputs (the path must end with /)
^[a-zA-Z0-9\.._/ ]*/$
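
A small Python sketch of how this whitelist behaves, assuming it is used to check a directory path typed by a user. The function name is_allowed_dir_path is hypothetical. Note that the whitelist only restricts characters: a path such as ../../etc/ still passes, so it is not a defense against directory traversal on its own.

```python
import re

# The whitelist from the notes: letters, digits, dot, underscore, slash
# and space, followed by a mandatory trailing slash.
PATH_WHITELIST = re.compile(r"^[a-zA-Z0-9\.._/ ]*/$")

def is_allowed_dir_path(path):
    """Return True if the path uses only whitelisted characters and ends with /."""
    return PATH_WHITELIST.fullmatch(path) is not None

print(is_allowed_dir_path("/var/log/"))     # True
print(is_allowed_dir_path("/var/log"))      # False, no trailing slash
print(is_allowed_dir_path("/tmp/$(id)/"))   # False, $ ( ) are not whitelisted
print(is_allowed_dir_path("../../etc/"))    # True, traversal is not blocked
```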

email address (the pattern is written with uppercase letters only, so turn on case-insensitive matching)
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,5}\b
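
The email pattern lists only A-Z, so it needs the engine's case-insensitive option to match ordinary lowercase addresses. A minimal Python sketch, with sample addresses I made up for illustration:

```python
import re

PATTERN = r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,5}\b"
EMAIL = re.compile(PATTERN, re.IGNORECASE)

text = "Contact alice@example.com or Bob.Smith@mail.example.org today."

# Without the flag the uppercase-only classes find nothing here.
print(re.findall(PATTERN, text))   # []
print(EMAIL.findall(text))         # ['alice@example.com', 'Bob.Smith@mail.example.org']

# {2,5} caps the last label at five letters, so longer ones are rejected.
print(EMAIL.search("user@example.museum"))   # None
```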

US phone numbers: 7 or 10 digits, with an optional +1 or 1 prefix and an optional extension; delimiters are spaces, dashes, or periods
^(?:(?:\+?1\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})(?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+))?$
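
The phone pattern has five capture groups: area code in parentheses, area code without parentheses, exchange, line number, and extension. The sketch below splits the same pattern across several string literals only for readability and reads those groups back. The function name parse_us_phone and the dictionary keys are my own choices.

```python
import re

PHONE = re.compile(
    r"^(?:(?:\+?1\s*(?:[.-]\s*)?)?"                                # optional 1 / +1
    r"(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)"  # group 1: (area)
    r"|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))"             # group 2: area
    r"\s*(?:[.-]\s*)?)?"                                           # delimiter
    r"([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})"                  # group 3: exchange
    r"\s*(?:[.-]\s*)?"                                             # delimiter
    r"([0-9]{4})"                                                  # group 4: line number
    r"(?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+))?$"                # group 5: extension
)

def parse_us_phone(text):
    """Return the parts of a US phone number, or None if it does not match."""
    m = PHONE.match(text)
    if m is None:
        return None
    return {
        "area": m.group(1) or m.group(2),   # whichever area-code form matched
        "exchange": m.group(3),
        "line": m.group(4),
        "extension": m.group(5),
    }

print(parse_us_phone("(212) 555-1234 ext. 89"))
print(parse_us_phone("555-1234"))
print(parse_us_phone("123-4567"))   # None, an exchange cannot start with 1
```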

matches any three-character string ending with "at", like "hat" and "cat".
.at

matches "hat" and "cat".
[hc]at

matches all strings matched by .at except "bat".
[^b]at

matches all strings matched by .at other than "hat" and "cat".
[^hc]at

matches "hat" and "cat", but only at the beginning of the string or line.
^[hc]at

matches "hat" and "cat", but only at the end of the string or line.
[hc]at$

matches any single character surrounded by "[" and "]" since the brackets are escaped, for example: "[a]" and "[b]".
\[.\]
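
The small patterns above can be tried out in Python like this. The word list and the helper whole_word_matches are made up for the demonstration.

```python
import re

words = ["hat", "cat", "bat", "rat", "at"]

def whole_word_matches(pattern):
    """Return the sample words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

print(whole_word_matches(r".at"))      # ['hat', 'cat', 'bat', 'rat']
print(whole_word_matches(r"[hc]at"))   # ['hat', 'cat']
print(whole_word_matches(r"[^b]at"))   # ['hat', 'cat', 'rat']
print(whole_word_matches(r"[^hc]at"))  # ['bat', 'rat']

# re.search scans the whole string, so ^ and $ decide where a hit may sit.
print(bool(re.search(r"^[hc]at", "cat food")))   # True
print(bool(re.search(r"^[hc]at", "my cat")))     # False
print(bool(re.search(r"[hc]at$", "my cat")))     # True
print(bool(re.search(r"[hc]at$", "cat food")))   # False

# Escaped brackets are literal; the dot between them is any one character.
print(re.findall(r"\[.\]", "x[a] y[b] z[cd]"))   # ['[a]', '[b]']
```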

match all characters between 2 strings
example = This is not a good sentence.
(?<=This is)(.*)(?=sentence)

I used lookbehind (?<=) and lookahead (?=) so that "This is" and "sentence" are
not included in the match, but this depends on your use case.
You can also simply write
This is(.*)sentence
and read the text in between from capture group 1.

If the text between the two strings can span several lines, activate the
"dotall" mode of your regex engine so that the . also matches newlines. How you
do this depends on your regex engine.

The next thing to decide is whether to use .* or .*?. The first one is greedy
and will match up to the last "sentence" in your string; the second one is lazy
and will match only up to the next "sentence" in your string.
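
Here is how the points above look in Python, as one possible engine. In Python the dotall mode is the re.DOTALL flag; other engines spell it differently. The multi-line and two-"sentence" sample strings are my own.

```python
import re

example = "This is not a good sentence."

# Lookbehind and lookahead keep the two marker strings out of the match.
print(repr(re.search(r"(?<=This is)(.*)(?=sentence)", example).group(0)))  # ' not a good '

# Without lookarounds, the text in between is in capture group 1.
print(repr(re.search(r"This is(.*)sentence", example).group(1)))           # ' not a good '

# By default the dot stops at a newline, so a multi-line gap needs DOTALL.
multi = "This is line one\nand line two of the sentence"
print(re.search(r"(?<=This is)(.*)(?=sentence)", multi))                   # None
print(repr(re.search(r"(?<=This is)(.*)(?=sentence)", multi, re.DOTALL).group(0)))

# Greedy runs to the last "sentence", lazy stops at the next one.
twice = "This is one sentence and another sentence."
print(repr(re.search(r"This is(.*)sentence", twice).group(1)))    # ' one sentence and another '
print(repr(re.search(r"This is(.*?)sentence", twice).group(1)))   # ' one '
```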

______________________________________________________________________________

Metacharacter Summary


Characters other than . $ ^ { [ ( | ) ] } * + ? \ match themselves.

\     escape char
.     any single character
.*    any sequence of characters, including none
?     The preceding item is optional and matched at most once.
*     The preceding item will be matched zero or more times.
+     The preceding item will be matched one or more times.
{n}   The preceding item is matched exactly n times.
{n,}  The preceding item is matched n or more times.
{,m}  The preceding item is matched at most m times.
{n,m} The preceding item is matched at least n times, but not more than m times.
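
To see the quantifiers side by side, this sketch applies each one to the letter a and lists which strings of zero to four a's match in full. The helper name full_matches is hypothetical. Python accepts the {,m} form; other engines may not, so check yours before relying on it.

```python
import re

candidates = ["", "a", "aa", "aaa", "aaaa"]

def full_matches(pattern):
    """Return the candidate strings that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

print(full_matches(r"a?"))      # ['', 'a']
print(full_matches(r"a*"))      # ['', 'a', 'aa', 'aaa', 'aaaa']
print(full_matches(r"a+"))      # ['a', 'aa', 'aaa', 'aaaa']
print(full_matches(r"a{3}"))    # ['aaa']
print(full_matches(r"a{2,}"))   # ['aa', 'aaa', 'aaaa']
print(full_matches(r"a{,2}"))   # ['', 'a', 'aa']
print(full_matches(r"a{2,3}"))  # ['aa', 'aaa']
```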

________________________________________________________________________________

Metacharacter Description


Characters other than . $ ^ { [ ( | ) ] } * + ? \ match themselves.

. Matches any single character (many applications exclude newlines, and exactly
   which characters are considered newlines is flavor-, character-encoding-,
   and platform-specific, but it is safe to assume that the line feed character
   is included). Within POSIX bracket expressions, the dot character matches a
   literal dot. For example, a.c matches "abc", etc., but [a.c] matches
   only "a", ".", or "c".
 
[ ] A bracket expression. Matches a single character that is contained within
   the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a
   range which matches any lowercase letter from "a" to "z". These forms can be
   mixed: [abcx-z] matches "a", "b", "c", "x", "y", or "z", as does [a-cx-z].
 
The - character is treated as a literal character if it is the last or the first
   (after the ^) character within the brackets: [abc-], [-abc]. Note that
   backslash escapes are not allowed. The ] character can be included in a
   bracket expression if it is the first (after the ^) character: []abc].

[^ ] Matches a single character that is not contained within the brackets.
   For example, [^abc] matches any character other than "a", "b", or "c".
   [^a-z] matches any single character that is not a lowercase letter
   from "a" to "z". Likewise, literal characters and ranges can be mixed.
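
The bracket expression rules can be checked one character at a time. This sketch uses Python's re module, which follows the same placement rules for - and ] shown above. Unlike POSIX bracket expressions, Python also allows backslash escapes inside brackets. The sample string and the helper chars_matching are my own.

```python
import re

sample = "abcxyz-].qA5"

def chars_matching(pattern):
    """Return the characters of the sample that the bracket expression accepts."""
    return "".join(c for c in sample if re.fullmatch(pattern, c))

print(chars_matching(r"[abcx-z]"))  # abcxyz
print(chars_matching(r"[a-cx-z]"))  # abcxyz
print(chars_matching(r"[abc-]"))    # abc-   (a trailing - is literal)
print(chars_matching(r"[-abc]"))    # abc-   (a leading - is literal)
print(chars_matching(r"[]abc]"))    # abc]   (a leading ] is literal)
print(chars_matching(r"[a.c]"))     # ac.    (the dot is literal inside brackets)
print(chars_matching(r"[^abc]"))    # xyz-].qA5
print(chars_matching(r"[^a-z]"))    # -].A5
```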

^ Matches the starting position within the string. In line-based tools, it
   matches the starting position of any line.
 
$ Matches the ending position of the string or the position just before a
   string-ending newline. In line-based tools, it matches the ending position
   of any line.
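
Python is not a line-based tool, so ^ and $ refer to the whole string unless the re.MULTILINE flag is set. The sketch below shows both behaviors, plus the "just before a string-ending newline" case. The sample text is made up.

```python
import re

text = "cat nap\nhat box\nold hat\nblack cat"

# Default: ^ is the start of the string, $ is the end of the string.
print(re.findall(r"^[hc]at", text))                 # ['cat']
print(re.findall(r"[hc]at$", text))                 # ['cat']

# MULTILINE: ^ and $ also match at the start and end of every line.
print(re.findall(r"^[hc]at", text, re.MULTILINE))   # ['cat', 'hat']
print(re.findall(r"[hc]at$", text, re.MULTILINE))   # ['hat', 'cat']

# $ also matches just before a newline that ends the string; \Z does not.
print(bool(re.search(r"cat$", "black cat\n")))      # True
print(bool(re.search(r"cat\Z", "black cat\n")))     # False
```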

BRE: \( \)
ERE: ( ) Defines a marked subexpression. The string matched within the
   parentheses can be recalled later (see the next entry, \n). A marked
   subexpression is also called a block or capturing group.

\n Matches what the nth marked subexpression matched, where n is a digit
   from 1 to 9. This construct is theoretically irregular and was not adopted
   in the POSIX ERE syntax. Some tools allow referencing more than nine
   capturing groups.
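
A short Python sketch of a marked subexpression being recalled with \1. The two patterns, doubled and same_start, are my own examples: the first finds a word typed twice in a row, the second requires both words to start with the same letter.

```python
import re

# \1 must repeat exactly what group 1 matched, not just the same pattern.
doubled = re.compile(r"\b(\w+) \1\b")
m = doubled.search("this is is a test")
print(m.group(0))   # is is
print(m.group(1))   # is

same_start = re.compile(r"([hc])at \1at")
print(bool(same_start.fullmatch("hat hat")))   # True
print(bool(same_start.fullmatch("hat cat")))   # False
```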
 
* Matches the preceding element zero or more times. For example,
   ab*c matches "ac", "abc", "abbbc", etc.
   [xyz]* matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on.
   (ab)* matches "", "ab", "abab", "ababab", and so on.
 
BRE: \{m,n\}
ERE: {m,n} Matches the preceding element at least m and not more than n times.
   For example, a{3,5} matches only "aaa", "aaaa", and "aaaaa". This is not
   found in a few older instances of regular expressions.

? Matches the preceding element zero or one time. For example,
   ba? matches "b" or "ba".
 
+ Matches the preceding element one or more times. For example,
   ba+ matches "ba", "baa", "baaa", and so on.
 
| The choice (also known as alternation or set union) operator matches
   either the expression before or the expression after the operator. For example,
   abc|def matches "abc" or "def".
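
The example strings given for *, {m,n}, ?, + and | above can be checked mechanically. This is a minimal Python sketch using ERE-style syntax, which is what Python's re module accepts; the dictionary name examples is hypothetical.

```python
import re

# Each pattern is paired with the strings the descriptions say it matches.
examples = {
    r"ab*c": ["ac", "abc", "abbbc"],
    r"[xyz]*": ["", "x", "y", "z", "zx", "zyx", "xyzzy"],
    r"(ab)*": ["", "ab", "abab", "ababab"],
    r"a{3,5}": ["aaa", "aaaa", "aaaaa"],
    r"ba?": ["b", "ba"],
    r"ba+": ["ba", "baa", "baaa"],
    r"abc|def": ["abc", "def"],
}

for pattern, strings in examples.items():
    for s in strings:
        # fullmatch checks that the pattern accounts for the entire string.
        assert re.fullmatch(pattern, s), (pattern, s)

# A few strings that fall outside the stated limits.
print(re.fullmatch(r"a{3,5}", "aa"))       # None, too few
print(re.fullmatch(r"a{3,5}", "aaaaaa"))   # None, too many
print(re.fullmatch(r"ba+", "b"))           # None, + needs at least one a
```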





____________________________________________________________________________

Thanks for reading my post. You can find me at any of the links below.
