Regular Expressions
1. Online Testing
https://www.tutorialspoint.com/compile_java_online.php
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    // Allows letters, digits, and the listed punctuation; the whole input must match.
    private static final String REGEX = "^[a-zA-Z0-9!@#$&()\\-`_.+,/\"]*$";
    private static final String INPUT = "vt_user";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT); // get a matcher object
        if (m.matches()) System.out.println("Match");
        else System.out.println("Not Match");
    }
}
2. Notation
There are many different syntaxes for regular expressions, but in general you will see that:
- Most characters stand for themselves
- Certain characters, called metacharacters, have special meaning and must be escaped (usually with \) if you want to use them as characters. In most syntaxes the metacharacters are: ( ) [ ] { } ^ $ . \ ? * + |
- Within square brackets, you only have to escape (1) an initial ^, (2) a non-initial or non-final -, (3) a non-initial ], and (4) a \.
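The escaping rules above can be tried out with a small Java sketch. The class name EscapeDemo, the helper fullMatch, and the sample strings are made up for illustration. Keep in mind that inside a Java string literal every backslash has to be doubled, so the regex \. is written "\\.".

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: true if the WHOLE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // An unescaped dot is a metacharacter and matches any character.
        System.out.println(fullMatch("a.c", "abc"));       // true
        // An escaped dot matches only a literal dot.
        System.out.println(fullMatch("a\\.c", "abc"));     // false
        System.out.println(fullMatch("a\\.c", "a.c"));     // true

        // Without escaping, + and ? act as quantifiers, not as characters.
        System.out.println(fullMatch("1+1=2?", "1+1=2?")); // false
        // Pattern.quote escapes an entire string so it is matched literally.
        System.out.println(fullMatch(Pattern.quote("1+1=2?"), "1+1=2?")); // true

        // Inside square brackets most metacharacters stand for themselves.
        System.out.println(fullMatch("[.+*?]", "+"));      // true
        // A final hyphen and a non-initial caret need no escaping.
        System.out.println(fullMatch("[a-z-]", "-"));      // true
        System.out.println(fullMatch("[a^]", "^"));        // true
    }
}
```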
3. Using Regular Expressions
Many languages allow programmers to define regexes and then use them to:
- Validate that a piece of text (or a portion of that text) matches some pattern
- Find fragments of some text that match some pattern
- Extract fragments of some text
- Replace fragments of text with other text
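As a rough illustration of these four uses, here is a minimal Java sketch built on java.util.regex. The date pattern, the class name RegexUses, and the helper method names are assumptions chosen for the example; the pattern only checks the digit layout yyyy-mm-dd, not whether the date is real.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexUses {
    // Hypothetical pattern: three digit groups shaped like yyyy-mm-dd.
    static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Validate: the whole text must match the pattern.
    static boolean isDate(String text) {
        return DATE.matcher(text).matches();
    }

    // Find: collect every fragment of the text that matches.
    static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Extract: return the year (capturing group 1) of the first date, or null.
    static String firstYear(String text) {
        Matcher m = DATE.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Replace: rewrite every date as dd/mm/yyyy using group references.
    static String toDayFirst(String text) {
        return DATE.matcher(text).replaceAll("$3/$2/$1");
    }

    public static void main(String[] args) {
        String text = "Released 2021-03-15, patched 2021-04-02.";
        System.out.println(isDate("2021-03-15")); // true
        System.out.println(isDate(text));         // false
        System.out.println(findDates(text));      // [2021-03-15, 2021-04-02]
        System.out.println(firstYear(text));      // 2021
        System.out.println(toDayFirst(text));     // Released 15/03/2021, patched 02/04/2021.
    }
}
```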
4. Basic Examples
Rather than start with technical details, we’ll start with a bunch of examples.
Regex                   Matches any string that
hello                   contains {hello}
gray|grey               contains {gray, grey}
gr(a|e)y                contains {gray, grey}
gr[ae]y                 contains {gray, grey}
b[aeiou]bble            contains {babble, bebble, bibble, bobble, bubble}
[b-chm-pP]at|ot         contains {bat, cat, hat, mat, nat, oat, pat, Pat, ot}
colou?r                 contains {color, colour}
rege(x(es)?|xps?)       contains {regex, regexes, regexp, regexps}
go*gle                  contains {ggle, gogle, google, gooogle, goooogle, ...}
go+gle                  contains {gogle, google, gooogle, goooogle, ...}
g(oog)+le               contains {google, googoogle, googoogoogle, googoogoogoogle, ...}
z{3}                    contains {zzz}
z{3,6}                  contains {zzz, zzzz, zzzzz, zzzzzz}
z{3,}                   contains {zzz, zzzz, zzzzz, ...}
[Bb]rainf\*\*k          contains {Brainf**k, brainf**k}
\d                      contains {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
\d{5}(-\d{4})?          contains a United States zip code
1\d{10}                 contains an 11-digit string starting with a 1
[2-9]|[12]\d|3[0-6]     contains an integer in the range 2..36 inclusive
Hello\nworld            contains Hello followed by a newline followed by world
mi.....ft               contains a nine-character (sub)string beginning with mi and ending with ft. (Note: depending on context, the dot stands either for "any character at all" or "any character except a newline".) Each dot is allowed to match a different character, so both microsoft and minecraft will match.
\d+(\.\d\d)?            contains a non-negative integer or a decimal number with exactly two digits after the decimal point
[^i*&2@]                contains any character other than an i, asterisk, ampersand, 2, or at-sign
//[^\r\n]*[\r\n]        contains a Java or C# slash-slash comment
^dog                    begins with "dog"
dog$                    ends with "dog"
^dog$                   is exactly "dog"
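The word "contains" in the examples above corresponds to searching for a match anywhere in the string. In Java this is what Matcher.find() does, while Matcher.matches() requires the whole string to match. The following sketch checks a few of the examples; the class name BasicExamples, the helper names, and the sample inputs are made up for illustration.

```java
import java.util.regex.Pattern;

class BasicExamples {
    // "Contains" semantics: the regex may match anywhere in the input.
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // "Is exactly" semantics: the regex must match the whole input.
    static boolean isExactly(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(contains("colou?r", "my favourite colour"));       // true
        System.out.println(contains("gr[ae]y", "a grey cat"));                // true
        System.out.println(contains("go*gle", "ggle"));                       // true
        System.out.println(contains("go+gle", "ggle"));                       // false
        System.out.println(contains("mi.....ft", "minecraft"));               // true
        System.out.println(contains("\\d{5}(-\\d{4})?", "Zip: 90210-1234"));  // true

        // Anchors tie the match to the start or the end of the string.
        System.out.println(contains("^dog", "hotdog"));                       // false
        System.out.println(contains("dog$", "hotdog"));                       // true

        // Without anchors, 37 still "contains" a number in 2..36 (the digit 3).
        System.out.println(contains("[2-9]|[12]\\d|3[0-6]", "37"));           // true
        System.out.println(isExactly("[2-9]|[12]\\d|3[0-6]", "37"));          // false
        System.out.println(isExactly("[2-9]|[12]\\d|3[0-6]", "36"));          // true
    }
}
```

The last three lines show why a validation regex usually needs ^ and $ (or a whole-string match): a "contains" check accepts any input that merely includes a valid fragment.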
5. Grammar
- Anchors — ^ and $
^The matches any string that starts with The
end$ matches a string that ends with end
^The end$ exact string match (starts and ends with The end)
roar matches any string that has the text roar in it
- Quantifiers — * + ? and {}
abc* matches a string that has ab followed by zero or more c
abc+ matches a string that has ab followed by one or more c
abc? matches a string that has ab followed by zero or one c
abc{2} matches a string that has ab followed by 2 c
abc{2,} matches a string that has ab followed by 2 or more c
abc{2,5} matches a string that has ab followed by 2 up to 5 c
a(bc)* matches a string that has a followed by zero or more copies of the sequence bc
a(bc){2,5} matches a string that has a followed by 2 up to 5 copies of the sequence bc
- OR operator — | or []
a(b|c) matches a string that has a followed by b or c (and captures b or c)
a[bc] same as previous, but without capturing b or c
- Character classes — \d \w \s and .
\d matches a single character that is a digit
\w matches a word character (alphanumeric character plus underscore)
\s matches a whitespace character (includes tabs and line breaks)
. matches any character (depending on the engine and its options, possibly excluding a newline)
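The grammar entries above can be checked with a short Java sketch. The class name GrammarDemo, the helpers firstMatch and groupCount, and the sample inputs are assumptions made for the example. The quantifier results also show that Java's quantifiers take as many repetitions as they can by default.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GrammarDemo {
    // Hypothetical helper: text of the first match of regex in input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Hypothetical helper: number of capturing groups in the regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Anchors
        System.out.println(firstMatch("^The", "The end"));          // The
        System.out.println(firstMatch("end$", "The end"));          // end

        // Quantifiers apply to the preceding character or group.
        System.out.println(firstMatch("abc*", "abccc!"));           // abccc
        System.out.println(firstMatch("abc?", "abccc!"));           // abc
        System.out.println(firstMatch("abc{2}", "abccc!"));         // abcc
        System.out.println(firstMatch("abc+", "ab"));               // null
        System.out.println(firstMatch("a(bc){2,5}", "abcbcbc"));    // abcbcbc

        // a(b|c) captures the alternative; a[bc] does not capture.
        System.out.println(groupCount("a(b|c)"));                   // 1
        System.out.println(groupCount("a[bc]"));                    // 0

        // Character classes: digit, word character, whitespace, any character.
        System.out.println(firstMatch("\\d\\w\\s.", "x7_ !"));      // 7_ !
    }
}
```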
6. Common Examples
"^[a-zA-Z0-9!@#$&()\\-`.+,/\"]*$"
- Valid email (simplified from RFC 5322)
\b[\w.!#$%&'*+\/=?^`{|}~-]+@[\w-]+(?:\.[\w-]+)*\b
Matches an email address like john.doe@my-domain.com inside text
valid-email@email.com
not!valid@#email.com
- Username (simple)
Minimum length of 3, maximum length of 16, composed of lowercase letters, digits, underscores, or dashes.
/^[a-z0-9_-]{3,16}$/
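- Strong password
At least 6 characters, with at least one uppercase letter, one lowercase letter, one digit, and one special character.
(?=^.{6,}$)((?=.*\w)(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[|!"$%&\/\(\)\?\^\'\\\+\-\*]))^.*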
my valid 2passwoRd+
my missing number passwoRd+
MY MISSING LOWERCASE LETTER 2PASSWORD+
my missing uppercase letter 2passwod+
my missing special character 2passwoRd
not valid password
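To try the email, username, and password patterns from this section in Java, a sketch along the following lines should work. The class name CommonPatterns and the helper names are made up for illustration, the surrounding / delimiters of the username regex are dropped because Java does not use them, and every backslash is doubled for the Java string literals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommonPatterns {
    static final Pattern EMAIL = Pattern.compile(
        "\\b[\\w.!#$%&'*+\\/=?^`{|}~-]+@[\\w-]+(?:\\.[\\w-]+)*\\b");

    static final Pattern USERNAME = Pattern.compile("^[a-z0-9_-]{3,16}$");

    // Lookaheads: length >= 6, a word character, upper case, lower case,
    // a digit, and one of the listed special characters.
    static final Pattern PASSWORD = Pattern.compile(
        "(?=^.{6,}$)((?=.*\\w)(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])"
        + "(?=.*[|!\"$%&\\/\\(\\)\\?\\^\\'\\\\\\+\\-\\*]))^.*");

    // Returns the first email-like fragment found inside text, or null.
    static String firstEmail(String text) {
        Matcher m = EMAIL.matcher(text);
        return m.find() ? m.group() : null;
    }

    static boolean isUsername(String s) {
        return USERNAME.matcher(s).matches();
    }

    static boolean isStrongPassword(String s) {
        return PASSWORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(firstEmail("Contact john.doe@my-domain.com today.")); // john.doe@my-domain.com
        System.out.println(firstEmail("not!valid@#email.com"));                  // null

        System.out.println(isUsername("vt_user"));  // true
        System.out.println(isUsername("ab"));       // false (too short)
        System.out.println(isUsername("Vt_User"));  // false (upper case not allowed)

        System.out.println(isStrongPassword("my valid 2passwoRd+"));                     // true
        System.out.println(isStrongPassword("my missing number passwoRd+"));             // false
        System.out.println(isStrongPassword("my missing special character 2passwoRd"));  // false
    }
}
```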
7. References
https://cs.lmu.edu/~ray/notes/regex/
https://medium.com/factory-mind/regex-tutorial-a-simple-cheatsheet-by-examples-649dc1c3f285