Giang Trung
3 min read · Aug 14, 2020

Regular Expressions

1. Online Testing

https://www.tutorialspoint.com/compile_java_online.php

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

    // Letters, digits, and a fixed set of punctuation; the hyphen is escaped inside the brackets
    private static final String REGEX = "^[a-zA-Z0-9!@#$&()\\-`_.+,/\"]*$";
    private static final String INPUT = "vt_user";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT); // get a matcher object

        // matches() succeeds only if the entire input matches the pattern
        if (m.matches()) System.out.println("Match ");
        else System.out.println("Not Match ");
    }
}

2. Notation

There are many different syntaxes for regular expressions, but in general you will see that:

  • Most characters stand for themselves
  • Certain characters, called metacharacters, have special meaning and must be escaped (usually with \) if you want to use them as literal characters. In most syntaxes the metacharacters are: ( ) [ ] { } ^ $ . \ ? * + |
  • Within square brackets, you only have to escape (1) an initial ^, (2) a non-initial or non-final -, (3) a non-initial ], and (4) a \.
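As a small illustration of the escaping rules above, here is a sketch in Java. The class name EscapeDemo and the helper fullMatch are made up for this example; Pattern.matches and Pattern.quote are standard java.util.regex calls. Note that a backslash has to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

public class EscapeDemo {

    // Hypothetical helper: true if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped, + is a quantifier: "1+1=2" means one or more 1s, then "1=2".
        System.out.println(fullMatch("1+1=2", "1+1=2"));   // false
        System.out.println(fullMatch("1+1=2", "111=2"));   // true

        // Escaped with a backslash, + stands for itself.
        System.out.println(fullMatch("1\\+1=2", "1+1=2")); // true

        // Pattern.quote escapes an entire literal string at once.
        System.out.println(fullMatch(Pattern.quote("1+1=2"), "1+1=2")); // true

        // Inside square brackets, most metacharacters need no escaping.
        System.out.println(fullMatch("[.+*?]", "+"));      // true
    }
}
```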

3. Using Regular Expressions

Many languages allow programmers to define regexes and then use them to:

  • Validate that a piece of text (or a portion of that text) matches some pattern
  • Find fragments of some text that match some pattern
  • Extract fragments of some text
  • Replace fragments of text with other text
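The four uses listed above could look roughly like this in Java. This is only a sketch: the class name RegexUses, its method names, and the sample sentence are invented for illustration, and the zip code pattern is borrowed from the examples in the next section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexUses {

    // United States zip code: five digits, optionally followed by -dddd.
    private static final Pattern ZIP = Pattern.compile("\\d{5}(-\\d{4})?");

    // Validate: the whole text must match the pattern.
    static boolean isZip(String text) {
        return ZIP.matcher(text).matches();
    }

    // Find and extract: collect every fragment that matches.
    static List<String> extractZips(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = ZIP.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Replace: substitute every matching fragment with other text.
    static String maskZips(String text) {
        return ZIP.matcher(text).replaceAll("#####");
    }

    public static void main(String[] args) {
        String text = "Ship to 90210 or 10001-1234.";
        System.out.println(isZip("90210"));     // true
        System.out.println(isZip("9021"));      // false
        System.out.println(extractZips(text));  // [90210, 10001-1234]
        System.out.println(maskZips(text));     // Ship to ##### or #####.
    }
}
```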

4. Basic Examples

Rather than start with technical details, we’ll start with a bunch of examples.

Regex                 Matches any string that
hello                 contains {hello}
gray|grey             contains {gray, grey}
gr(a|e)y              contains {gray, grey}
gr[ae]y               contains {gray, grey}
b[aeiou]bble          contains {babble, bebble, bibble, bobble, bubble}
[b-chm-pP]at|ot       contains {bat, cat, hat, mat, nat, oat, pat, Pat, ot}
colou?r               contains {color, colour}
rege(x(es)?|xps?)     contains {regex, regexes, regexp, regexps}
go*gle                contains {ggle, gogle, google, gooogle, goooogle, ...}
go+gle                contains {gogle, google, gooogle, goooogle, ...}
g(oog)+le             contains {google, googoogle, googoogoogle, googoogoogoogle, ...}
z{3}                  contains {zzz}
z{3,6}                contains {zzz, zzzz, zzzzz, zzzzzz}
z{3,}                 contains {zzz, zzzz, zzzzz, ...}
[Bb]rainf\*\*k        contains {Brainf**k, brainf**k}
\d                    contains {0,1,2,3,4,5,6,7,8,9}
\d{5}(-\d{4})?        contains a United States zip code
1\d{10}               contains an 11-digit string starting with a 1
[2-9]|[12]\d|3[0-6]   contains an integer in the range 2..36 inclusive
Hello\nworld          contains Hello followed by a newline followed by world
mi.....ft             contains a nine-character (sub)string beginning with mi and ending with ft (Note: depending on context, the dot stands either for "any character at all" or "any character except a newline".) Each dot is allowed to match a different character, so both microsoft and minecraft will match.
\d+(\.\d\d)?          contains a positive integer or a floating point number with exactly two characters after the decimal point.
[^i*&2@]              contains any character other than an i, asterisk, ampersand, 2, or at-sign.
//[^\r\n]*[\r\n]      contains a Java or C# slash-slash comment
^dog                  begins with "dog"
dog$                  ends with "dog"
^dog$                 is exactly "dog"
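The examples above say "contains", which in Java corresponds to Matcher.find() (some substring matches) rather than Matcher.matches() (the whole input matches). A minimal sketch that checks a few rows, using a hypothetical helper named contains:

```java
import java.util.regex.Pattern;

public class ContainsDemo {

    // Hypothetical helper: true if some substring of the input matches the regex.
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(contains("colou?r", "my favourite colour")); // true
        System.out.println(contains("gr[ae]y", "a greyhound"));         // true
        System.out.println(contains("go+gle", "ggle"));                 // false
        System.out.println(contains("z{3,6}", "zz"));                   // false
        System.out.println(contains("mi.....ft", "minecraft"));         // true
        System.out.println(contains("dog$", "hotdog"));                 // true
        System.out.println(contains("^dog$", "hotdog"));                // false
    }
}
```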

5. Grammar

  • Anchors — ^ and $
^The        matches any string that starts with The
end$        matches any string that ends with end
^The end$   exact string match (starts and ends with The end)
roar        matches any string that has the text roar in it
  • Quantifiers — * + ? and {}
abc*        matches a string that has ab followed by zero or more c
abc+        matches a string that has ab followed by one or more c
abc?        matches a string that has ab followed by zero or one c
abc{2}      matches a string that has ab followed by 2 c
abc{2,}     matches a string that has ab followed by 2 or more c
abc{2,5}    matches a string that has ab followed by 2 up to 5 c
a(bc)*      matches a string that has a followed by zero or more copies of the sequence bc
a(bc){2,5}  matches a string that has a followed by 2 up to 5 copies of the sequence bc
  • OR operator — | or []
a(b|c)      matches a string that has a followed by b or c (and captures b or c)
a[bc]       same as previous, but without capturing b or c
  • Character classes — \d \w \s and .
\d          matches a single character that is a digit
\w          matches a word character (alphanumeric character plus underscore)
\s          matches a whitespace character (includes tabs and line breaks)
.           matches any character
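To see the grammar rules above in action, the following sketch prints what each pattern actually matches. The class GrammarDemo and the helper firstMatch are illustrative names, not part of any library; the sample inputs are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrammarDemo {

    // Hypothetical helper: the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Quantifiers are greedy: they take as many repetitions as they are allowed.
        System.out.println(firstMatch("abc*", "abccc"));        // abccc
        System.out.println(firstMatch("abc?", "abccc"));        // abc
        System.out.println(firstMatch("abc{2,5}", "abcccccc")); // abccccc
        System.out.println(firstMatch("a(bc)*", "abcbcb"));     // abcbc

        // a(b|c) captures the chosen alternative in group 1.
        Matcher m = Pattern.compile("a(b|c)").matcher("xacx");
        if (m.find()) {
            System.out.println(m.group(1));                     // c
        }

        // Character classes: a digit, then whitespace, then a word character.
        System.out.println(firstMatch("\\d\\s\\w", "x 7 _y"));  // 7 _
    }
}
```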

6. Common Examples

"^[a-zA-Z0-9!@#$&()\\-`.+,/\"]*$"

  • Email

\b[\w.!#$%&'*+\/=?^`{|}~-]+@[\w-]+(?:\.[\w-]+)*\b

Matches an email address like john.doe@my-domain.com inside text
valid-email@email.com
not!valid@#email.com
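A rough Java sketch of using the email pattern above to pull addresses out of text. The class name EmailFinder and the method findEmails are assumptions made for this example; the pattern is the one shown above, with backslashes doubled for the Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailFinder {

    // Same pattern as in the text, written as a Java string literal.
    private static final Pattern EMAIL = Pattern.compile(
            "\\b[\\w.!#$%&'*+\\/=?^`{|}~-]+@[\\w-]+(?:\\.[\\w-]+)*\\b");

    // Hypothetical helper: every email-like fragment found in the text.
    static List<String> findEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findEmails("Write to john.doe@my-domain.com today"));
        // [john.doe@my-domain.com]
        System.out.println(findEmails("valid-email@email.com")); // [valid-email@email.com]
        System.out.println(findEmails("not!valid@#email.com"));  // []
    }
}
```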

  • Username (simple)

Between 3 and 16 characters, composed of lowercase letters, digits, underscores, or hyphens.

/^[a-z0-9_-]{3,16}$/
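In Java the username check might be sketched as follows. The surrounding slashes in the pattern above are JavaScript-style delimiters and are dropped here; the class name UsernameCheck and the method isValid are hypothetical.

```java
import java.util.regex.Pattern;

public class UsernameCheck {

    // 3 to 16 characters: lowercase letters, digits, underscore, or hyphen.
    private static final Pattern USERNAME = Pattern.compile("^[a-z0-9_-]{3,16}$");

    // Hypothetical helper: true if the whole candidate is a valid username.
    static boolean isValid(String candidate) {
        return USERNAME.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("vt_user"));              // true
        System.out.println(isValid("ab"));                   // false (too short)
        System.out.println(isValid("Vt_User"));              // false (uppercase letters)
        System.out.println(isValid("a-very-long-username")); // false (20 characters)
    }
}
```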

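  • Password

At least 6 characters, with at least one uppercase letter, one lowercase letter, one digit, and one special character.

(?=^.{6,}$)((?=.*\w)(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[|!"$%&\/\(\)\?\^\'\\\+\-\*]))^.*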

my valid 2passwoRd+
my missing number passwoRd+
MY MISSING LOWERCASE LETTER 2PASSWORD+
my missing uppercase letter 2passwod+
my missing special character 2passwoRd
not valid password
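The sample strings above can be run against the password pattern with a short Java program. This is a sketch under assumptions: the class name PasswordCheck and the method isStrong are invented, and the pattern is transcribed with straight quotes and doubled backslashes for the Java string literal. Each lookahead (?=...) checks one requirement without consuming any characters.

```java
import java.util.regex.Pattern;

public class PasswordCheck {

    // Lookaheads: length >= 6, an uppercase letter, a lowercase letter, a digit, a special character.
    private static final Pattern STRONG = Pattern.compile(
            "(?=^.{6,}$)((?=.*\\w)(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])"
            + "(?=.*[|!\"$%&\\/\\(\\)\\?\\^\\'\\\\\\+\\-\\*]))^.*");

    // Hypothetical helper: true if the candidate satisfies every lookahead.
    static boolean isStrong(String candidate) {
        return STRONG.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "my valid 2passwoRd+",
            "my missing number passwoRd+",
            "MY MISSING LOWERCASE LETTER 2PASSWORD+",
            "my missing uppercase letter 2passwod+",
            "my missing special character 2passwoRd",
            "not valid password"
        };
        // Only the first sample should print true.
        for (String s : samples) {
            System.out.println(isStrong(s) + "  " + s);
        }
    }
}
```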

7. References

https://cs.lmu.edu/~ray/notes/regex/

https://medium.com/factory-mind/regex-tutorial-a-simple-cheatsheet-by-examples-649dc1c3f285
