v1.31, 2006.05.01 by Robert Giordano
[agk] matches any one a, g, or k
[a-z] matches any one character from a to z
[^z] matches any character other than z
[\\(\\)] matches ( or ) (the doubled escape slash is needed only when the pattern is written as a javascript string; in a /regex/ literal, [\(\)] is enough)
. any character except \n
\w any word character, same as [a-zA-Z0-9_]
\W any non-word character
\s any whitespace character, same as [ \t\n\r\f\v]
\S any non-whitespace character
\d any digit
\D any non-digit
\/ literal /
\\ literal \
\. literal .
\* literal *
\+ literal +
\? literal ?
\| literal |
\( literal (
\) literal )
\[ literal [
\] literal ]
\- the - must be escaped inside brackets: [a-z0-9 _.\-\?!]
{n,m} match previous item n to m times
{n,} match previous item n or more times
{n} match exactly n times
? match zero or one time, same as {0,1}; placed after + or * it makes them "lazy"
+ match one or more
* match zero or more
| or
(x|y) match x or y, inclusive (all x and y will be replaced)
( ) grouping and reference
\1 reference to first grouping, used in the expression
$1 reference to first grouping, used in the replacement string
$$ literal $ used in the replacement string
^ anchor to the beginning of the string
$ anchor to the end of the string
\b match a word boundary (does not include the boundary)
\B match a non-word boundary (does not include the boundary)
q(?=u) match q only before u (does not match the u)
q(?!u) match q except before u
i case-insensitive search, used like /expression/i
g global replacement, used like /expression/g
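A few entries from the list above can be tried directly. This is only a minimal sketch: the sample strings and variable names are made up for illustration. It compares a regex literal with a pattern built from a string (where the escape slash is doubled), then shows \1, $1 and $$ in use.

```javascript
// A regex literal needs one backslash; a string passed to RegExp needs two,
// because the string parser consumes one of them first.
const parenLiteral = /[\(\)]/g;
const parenFromString = new RegExp("[\\(\\)]", "g");
const parenSample = "f(x) and g(y)"; // hypothetical sample text

console.log(parenSample.replace(parenLiteral, "|"));    // f|x| and g|y|
console.log(parenSample.replace(parenFromString, "|")); // f|x| and g|y|
console.log(parenLiteral.source === parenFromString.source); // true

// \1 refers to the first group inside the expression,
// $1 refers to it inside the replacement string.
const doubledLetter = "hello".replace(/(\w)\1/, "[$1$1]");
console.log(doubledLetter); // he[ll]o

// $$ produces a literal $ in the replacement string.
const withDollar = "price 5".replace(/(\d+)/, "$$$1");
console.log(withDollar); // price $5
```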
/\b(gr[ae]y)/ig,"GRay"
Replaces any "gray" or "grey" with "GRay".
First, \b is used to prevent replacement of "gray" within the word
"stingray". Having no \b at the end allows replacement of "gray" in "grayish".
Next, [ae] allows for alternate spellings. Finally, i and
g specify a case-insensitive and global search and replace.
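A quick way to check this behavior is to run the expression through String.replace. The sample sentence below is made up for the sketch.

```javascript
// \b at the start only: "stingray" is skipped, "greyish" is changed.
const grayPattern = /\b(gr[ae]y)/ig;
const graySample = "Gray stingray, greyish grey"; // hypothetical input

const grayOutput = graySample.replace(grayPattern, "GRay");
console.log(grayOutput); // GRay stingray, GRayish GRay
```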
/\b(cat|dog)\b/ig,"pet"
Replaces any "cat" or "dog" with "pet".
A \b is used at both the beginning and end so that only the words "cat" or "dog"
by themselves will be replaced. The | is inclusive, meaning that if both words
appear in a string, they will both be replaced. If the g at the end is left
out, only the first occurrence of either word will be replaced.
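The effect of the g flag can be seen by running the same expression with and without it. The sentence is an invented example.

```javascript
const petSample = "My cat chased the dog past a catalog."; // hypothetical input

// With g: every standalone "cat" or "dog" is replaced; "catalog" is untouched.
const allPets = petSample.replace(/\b(cat|dog)\b/ig, "pet");
console.log(allPets); // My pet chased the pet past a catalog.

// Without g: only the first match is replaced.
const firstPet = petSample.replace(/\b(cat|dog)\b/i, "pet");
console.log(firstPet); // My pet chased the dog past a catalog.
```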
/(a|e|i|o|u)/ig,"[$1]" or
/([aeiou])/ig,"[$1]"
These two expressions put square brackets around every vowel in a string. $1
in the replacement string is replaced with what was found in the parentheses. The first example
uses | as "OR" and the second example uses bracket notation.
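Both forms should give the same result, which a short sketch can confirm. The input word is arbitrary.

```javascript
const vowelSample = "Regex"; // hypothetical input

// Alternation and bracket notation capture the same single vowel in $1.
const viaAlternation = vowelSample.replace(/(a|e|i|o|u)/ig, "[$1]");
const viaBrackets = vowelSample.replace(/([aeiou])/ig, "[$1]");

console.log(viaAlternation); // R[e]g[e]x
console.log(viaBrackets);    // R[e]g[e]x
```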
/feb(ruary)?\s*(\d+)\s*\w*/i,"Feb $2"
Change a number of different input styles to a single style.
Looking at the expression from left to right, feb(ruary)? will match feb or
february; \s* matches 0 or more spaces; (\d+) matches 1 or more digits;
\s* matches 0 or more spaces; \w* matches 0 or more word characters.
The entire match is replaced with "Feb " plus the contents of the second set of parentheses,
which is the digits, (\d+).
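To see the normalization at work, the expression can be mapped over a few input styles. The inputs below are invented for the sketch.

```javascript
const febPattern = /feb(ruary)?\s*(\d+)\s*\w*/i;
const febInputs = ["february 14th", "Feb14", "FEB  3rd"]; // hypothetical inputs

// $2 is the digit group; the optional (ruary) group is $1 and is discarded.
const febOutputs = febInputs.map((s) => s.replace(febPattern, "Feb $2"));
console.log(febOutputs.join(" | ")); // Feb 14 | Feb 14 | Feb 3
```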
/( ){2,}/g," "
Replaces 2 or more consecutive spaces with a single space.
/^\s+|\s+$/g,""
Trim white space from both ends of a string. First, the ^ and $ anchor the
search to both ends of the string. Next, one or more whitespace chars at the
beginning OR one or more whitespace chars at the end are replaced with "".
The | means inclusive OR, that is any or all is replaced. Whitespace
characters in the middle are not affected.
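The two whitespace expressions above combine naturally: trim the ends first, then collapse runs of spaces. A minimal sketch with a made-up input string:

```javascript
const messy = "  too    many   spaces \t\n"; // hypothetical input

// Step 1: remove whitespace at either end; the middle is left alone.
const trimmedEnds = messy.replace(/^\s+|\s+$/g, "");
console.log(JSON.stringify(trimmedEnds)); // "too    many   spaces"

// Step 2: replace 2 or more consecutive spaces with one space.
const singleSpaced = trimmedEnds.replace(/( ){2,}/g, " ");
console.log(JSON.stringify(singleSpaced)); // "too many spaces"
```

Current JavaScript engines also provide a built-in trim() method on strings, which covers the first step without a regular expression.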
/\b(is)(?!\s+not)\b/ig,"REALLY $1"
*** Default Example ***
Replaces "is" with "REALLY is", unless it says "is not" and also keeps the
case of "is". The expression starts with \b so "is" in the middle of words
like "lavish" is not replaced. Parentheses around the search term (is)
allows it to be used in the replacement string so its case is preserved.
(?!\s+not) means, do not match if what has matched so far is followed by 1 or
more spaces and "not". Finally \b ensures that "is" in the beginning of a
word like "Istanbul" is not replaced.
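The following sketch runs the expression against one sentence that exercises each case described above. The sentence is invented for the purpose.

```javascript
const isPattern = /\b(is)(?!\s+not)\b/ig;
const isSample = "This is good. It is not bad. Is Istanbul lavish?"; // hypothetical input

// "is" inside "This" and "lavish" fails the leading \b,
// "is not" fails the negative lookahead,
// "Is" inside "Istanbul" fails the trailing \b.
const isOutput = isSample.replace(isPattern, "REALLY $1");
console.log(isOutput); // This REALLY is good. It is not bad. REALLY Is Istanbul lavish?
```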
/(.+)((\r?\n|\r)\1)+\b/ig,"$1"
Removes consecutive duplicate lines from a list. The (.+) grabs a line of text and the
parentheses save it for a reference. The (\r?\n|\r) grabs the line separator,
either \r\n, \n, or \r. Next, \1 references the first line and so
((\r?\n|\r)\1)+
matches 1 or more subsequent lines that match the first line. Notice that in
Javascript, a reference within the expression is \1 while a reference in the
replacement string is $1. The \b prevents "street" and "streets"
from being seen as the same word.
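A small sketch of the expression applied to a list with repeated neighbours. The list is made up. Note that only lines that repeat directly after one another are collapsed.

```javascript
const dedupePattern = /(.+)((\r?\n|\r)\1)+\b/ig;

// Hypothetical list, joined with \n as the line separator.
const listWithRepeats = ["apple", "apple", "banana", "cherry", "cherry", "cherry"].join("\n");

// Each run of identical adjacent lines is replaced by its first line ($1).
const dedupedList = listWithRepeats.replace(dedupePattern, "$1");
console.log(dedupedList); // prints apple, banana, cherry on separate lines
```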
/\b(\w{4})\s?(\b|(\W))/ig,"$3"
Removes all four letter words and a trailing space if present. First, \b finds a
word boundary. Next, (\w{4}) finds 4 letter words. The parentheses save the word if
needed later in the replacement string. \s? finds an optional trailing space.
(\b|(\W)) finds the end of the word. This is necessary or the first 4 letters of
longer words would be found. The word can end in \b OR \W and the
\W is saved and used in the replacement string so we don't lose punctuation
at the end of sentences. $3 is the third set of parentheses.
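Here is a rough sketch of the expression on an invented sentence. In JavaScript, $3 becomes an empty string whenever the (\W) branch was not the one that matched, which is the case for every word in this input.

```javascript
const fourLetterPattern = /\b(\w{4})\s?(\b|(\W))/ig;
const fourSample = "The quick fox will jump over lazy dogs."; // hypothetical input

// "will ", "jump ", "over ", "lazy " and "dogs" are removed;
// the final period survives because \b matched in front of it.
const fourOutput = fourSample.replace(fourLetterPattern, "$3");
console.log(JSON.stringify(fourOutput)); // "The quick fox ."
```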
/(\S)(\1){6,}/ig,"$1$1$1"
Any character appearing more than 6 times in a row is replaced with exactly 3 of those
characters. (\S) finds any non-whitespace character and saves it for a reference.
(\1){6,} finds 6 or more of the first character. 7 or more of the same character in
a row is replaced with $1$1$1. For compatibility with IE 5.2 Macintosh,
\1 must be in parentheses when used with {6,}.
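The threshold is easy to verify with generated strings: a run of 7 or more is shortened to 3, while a run of exactly 6 is left alone. The inputs are made up for the sketch.

```javascript
const repeatPattern = /(\S)(\1){6,}/ig;

// Hypothetical input: W, eight o's, w, nine exclamation marks.
const loudText = "W" + "o".repeat(8) + "w" + "!".repeat(9);
const calmText = loudText.replace(repeatPattern, "$1$1$1");
console.log(calmText); // Wooow!!!

// Exactly six in a row does not match, because (\1){6,} needs six MORE after the first.
const sixDashes = "-".repeat(6);
const sixDashesAfter = sixDashes.replace(repeatPattern, "$1$1$1");
console.log(sixDashesAfter); // ------
```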
/\s*([{};+=-\\(\\)])\s*/g,"$1"
BUG ???
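This expression should remove white space on either side of {};+=-(). If you enter
"abc DEF ghi jkl" as input, you'll get "abcDEFghi jkl" as output. The cause is the - between
= and \\: inside brackets it defines a range from = to \, and that range includes the capital
letters A to Z. The doubled escape characters are not needed in a regex literal either; they
are only required when the pattern is written as a javascript string. Escape the - or put it last
in the brackets, as in /\s*([{};+=-])\s*/g,"$1" followed by
/\s*(\(|\))\s*/g,"$1" as a separate command. Notice
how ( and ) need only a single escape when used outside brackets, as in the previous example.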
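The output described above can be reproduced and traced to the hyphen: inside brackets, =-\\ reads as a range from = to \, and that range covers A to Z. In the minimal sketch below, fixedPattern is a hypothetical one-pass alternative (hyphen escaped, parentheses left bare inside the brackets), not the two-step version suggested above.

```javascript
const buggyPattern = /\s*([{};+=-\\(\\)])\s*/g;  // =-\\ is a range: = > ? @ A-Z [ \
const fixedPattern = /\s*([{};+=\-()])\s*/g;     // hypothetical fix: hyphen escaped

const plainWords = "abc DEF ghi jkl";
const codeLine = "a = ( b + c ) ;"; // hypothetical input

console.log(plainWords.replace(buggyPattern, "$1")); // abcDEFghi jkl
console.log(plainWords.replace(fixedPattern, "$1")); // abc DEF ghi jkl
console.log(codeLine.replace(fixedPattern, "$1"));   // a=(b+c);

// Character codes show why capitals fall inside the range.
console.log("=".charCodeAt(0), "A".charCodeAt(0), "Z".charCodeAt(0), "\\".charCodeAt(0)); // 61 65 90 92
```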
/[^a-z]*([a-z]+)[^a-z]*/ig,"$1\n"
Create a word list from a document. Case is ignored and preserved. The next step would
be to sort the list, then use another regexp to remove duplicates.
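A minimal sketch of the word-list step on an invented sentence. Splitting the result into an array is an extra step added here for convenience, not part of the expression itself.

```javascript
const wordListPattern = /[^a-z]*([a-z]+)[^a-z]*/ig;
const docSample = "The cat and the hat."; // hypothetical input

// Each run of letters is kept and followed by a newline; everything else is dropped.
const wordList = docSample.replace(wordListPattern, "$1\n");

// Drop the final newline and split into an array for sorting later.
const wordArray = wordList.trim().split("\n");
console.log(wordArray.join(",")); // The,cat,and,the,hat
```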
/((\w+\s+){5})[\s\S]*/i,"$1"
Trims a string to only the first 5 words. ((\w+\s+){5}) finds the first 5 words
and the outer parentheses save them for later. [\s\S]* finds the rest of the
string, which is not saved and so disappears. Note: If you are using the .search
method instead of .replace, you only need (\w+\s+){5}.
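The sketch below shows both uses on made-up strings. One detail worth noticing is that the trailing whitespace after the fifth word is part of $1, so it stays in the result.

```javascript
const firstFivePattern = /((\w+\s+){5})[\s\S]*/i;
const longSentence = "one two three four five six seven"; // hypothetical input

// $1 holds five words, each followed by its whitespace.
const firstFive = longSentence.replace(firstFivePattern, "$1");
console.log(JSON.stringify(firstFive)); // "one two three four five "

// With .search only the shorter expression is needed; it returns the match index or -1.
const foundAt = longSentence.search(/(\w+\s+){5}/);
const notFound = "only three words".search(/(\w+\s+){5}/);
console.log(foundAt, notFound); // 0 -1
```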
Greedy vs. Lazy
If you want to remove pairs of html <span> tags and any text between them, you might be tempted
to use an expression like this:
/<span[^>]*>[\s\S]+<\/span>/ig,""
Greedy
The above expression works fine for a single pair of tags, but if you have something like:
See <span>tall</span> trees in the <span>large</span> park.
You'll get:
See park.
That's because the + is "greedy": it grabs as much text as it can, then backs up from the end of
the string only until the rest of the expression matches.
Everything from the FIRST span tag to the LAST span tag is removed. To make the + "lazy",
add a ? after it like so:
/<span[^>]*>[\s\S]+?<\/span>/ig,""
Lazy but Slow
This solution is not the most efficient because a lazy + grows the match one character at a time and
tests for </span> after every step, so extra cpu cycles are lost. An even better solution is to use a
negated character class like this:
/<span[^>]*>[^>]+<\/span>/ig,""
Lazy and Fast!
(The only time this breaks is if there is a > between the two tags.)
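The three variants can be compared side by side on the sample sentence. This is only a sketch; note that the space on each side of a removed tag pair remains, so the real output contains double spaces.

```javascript
const spanSample = "See <span>tall</span> trees in the <span>large</span> park.";

// Greedy: one match running from the first <span> to the last </span>.
const greedyResult = spanSample.replace(/<span[^>]*>[\s\S]+<\/span>/ig, "");
console.log(JSON.stringify(greedyResult)); // "See  park."

// Lazy: each tag pair is matched separately.
const lazyResult = spanSample.replace(/<span[^>]*>[\s\S]+?<\/span>/ig, "");
console.log(JSON.stringify(lazyResult)); // "See  trees in the  park."

// Negated character class: same result here, since no > appears between the tags.
const negatedResult = spanSample.replace(/<span[^>]*>[^>]+<\/span>/ig, "");
console.log(negatedResult === lazyResult); // true
```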
26 Jan 2006 1:53pm
"Find all words containing ANY of the given letters:
/(\b\w*[vwxyz]+\w*\b)/ig,"($1)""
08 Nov 2006 7:16am
"If you wish to use the above with PHP then use ereg_replace function see below.
$thecaption = "Test (10)";
$pattern = ' [(][0-9]+[)]';
$replacement = '';
$thecaption = ereg_replace($pattern, $replacement, $thecaption);
echo $thecaption; //will return "Test""
13 Feb 2007 12:23am
"Marvellous! I used to search many sites, but i have never shown any interest in giving any feedback... I really would like to appreciate this work. Simple, easy to learn and get practised with... Wonderfull Job. Keep it up!"
23 May 2007 3:19pm
"Thank YOU! New to javascript... your site is a lifesaver. (It's so obnoxious having to load code, load the page, test the page, then go back and try to figure out where my errors are - your site is a permanent bookmark for me!)"
12 Jun 2007 9:28am
"I stumbled upon this tool and loved it so much I added it to del.icio.us. When I went to search my bookmarks I realized it wasn't there. Stupid me. I forgot to tag it regular expression instead i tagged it regexp which never crossed my mine. To put it short, i love this tool. All the others are too complicated to wrap my head around. the fact that you provide examples is genius! Keep up the good work!"
18 Aug 2007 4:05pm
"Nice tool! Thanks for making it available! :)"
30 Aug 2007 9:34am
"Great tool, I can get a lot out of this!"
25 Oct 2007 11:31pm
"i totally used this tool to help me do my homework!!! thanks"
14 Feb 2008 5:22am
"This tool is invaluable for testing and practicing regular expressions necessary to our daily operations. Thank you."
17 Feb 2008 10:59pm
"Hey, This is really useful article on java regular expression. I am new to this part. I was confused for $1, $2. And this article solved my doubt. Thanks once again. Regards, Deepali"
14 Apr 2008 10:44pm
"Many many thanks for sharing this useful tool."
21 Dec 2008 7:33pm
"Do you have any examples using the "exec" method in AS3? Trying to use it, but it's only showing the match in the textfield output one time, as opposed to the traced output. Just wondering..."
08 Jan 2009 2:01pm
"really useful info all over the site... readable compact tutorials in cheatsheet style... goes into bookmarks as we speak.. peace"
21 Feb 2009 5:43am
"you know what?? i use this for almost 2 years and running"