
Regular expression

Regular expression (regex) introduction.

Regular Expression Editor, with links to other editors

A pattern like [^abc] matches any single character that is not in the set abc; as the first character inside the brackets,  ^  means not.

[abc]  matches exactly one character: a, b or c.

So [ ]  always stands for a single character position.

The set can be listed explicitly, or defined by a range or a character class:

[a-c] or [a-h]
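As a quick check of the bracket rules above, here is a minimal sketch in Python. The use of Python's re module is an assumption of this example, and the sample strings are made up for illustration.

```python
import re

text = "a cab, 42 bad dog"

# [abc] matches one character that is a, b or c
in_set = re.findall(r"[abc]", text)

# [^abc] matches one character that is NOT a, b or c
not_in_set = re.findall(r"[^abc]", "cabbage")

# [a-c] is the range form of the same set
in_range = re.findall(r"[a-c]", text)

print(in_set)      # ['a', 'c', 'a', 'b', 'b', 'a']
print(not_in_set)  # ['g', 'e']
print(in_range == in_set)  # True
```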

A quantifier controls how many times something may repeat:  ?, *, +, {n}, {n,m}. Each one applies to the preceding subpattern (a character, a set or a group).

?   zero or one occurrence of the preceding subpattern

*   zero or more occurrences of the preceding subpattern

+   one or more occurrences of the preceding subpattern

{n}   exactly n occurrences

{n,m}   from n to m occurrences

To check a five-digit postal code it is possible to use something like  [0-9]{5}
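The quantifiers above can be tried out with a short Python sketch. The helper name is_postal_code and the test words are made up for this example; re.fullmatch is used so that the whole string has to match, not just a part of it.

```python
import re

def is_postal_code(value: str) -> bool:
    # Exactly five digits, nothing before or after
    return re.fullmatch(r"[0-9]{5}", value) is not None

# ? : zero or one "u"
optional = [w for w in ("color", "colour", "colouur") if re.fullmatch(r"colou?r", w)]
# * : zero or more "b"
star = [w for w in ("ac", "abc", "abbbc") if re.fullmatch(r"ab*c", w)]
# + : one or more "b"
plus = [w for w in ("ac", "abc", "abbbc") if re.fullmatch(r"ab+c", w)]
# {n,m} : two or three "b"
bounded = [w for w in ("abc", "abbc", "abbbc", "abbbbc") if re.fullmatch(r"ab{2,3}c", w)]

print(is_postal_code("20100"), is_postal_code("2010"))  # True False
print(optional)  # ['color', 'colour']
print(star)      # ['ac', 'abc', 'abbbc']
print(plus)      # ['abc', 'abbbc']
print(bounded)   # ['abbc', 'abbbc']
```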

There are some predefined character classes (aliases) in regular expressions.

So the postal code can be represented by \d{5}

.    the dot matches any character except the newline by default (the newline sequence can differ between platforms), like [^\n]

\d   (d)igit, equivalent to [0-9]
\D   non-digit, equivalent to [^0-9]  (that is, [^\d])

\w   “word characters” (letters, digits and underscore), equivalent to [a-zA-Z0-9_]
\W   “non-word characters”  (that is, [^\w])

\s    whitespace characters such as space, tab, newline and carriage return, [ \t\r\n]
\S    non-whitespace characters  (that is, [^\s])

.*    matches any sequence of characters (the empty string included)
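A small Python sketch of these shorthand classes, with a made-up sample string. One caveat that is specific to Python and not part of the list above: for ordinary (str) patterns, \d, \w and \s also accept non-ASCII digits, letters and spaces unless the re.ASCII flag is given.

```python
import re

sample = "Room_12 opens at 9\tam"

digits = re.findall(r"\d+", sample)         # runs of digits
words = re.findall(r"\w+", sample)          # runs of word characters
spaces = re.findall(r"\s", sample)          # single whitespace characters
non_digits = re.findall(r"\D+", "ab12cd")   # runs of non-digits
lines = re.findall(r".+", "one\ntwo")       # the dot stops at the newline

print(digits)      # ['12', '9']
print(words)       # ['Room_12', 'opens', 'at', '9', 'am']
print(spaces)      # [' ', ' ', ' ', '\t']
print(non_digits)  # ['ab', 'cd']
print(lines)       # ['one', 'two']
```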


\    the backslash escapes a non-alphanumeric character so that it is taken literally, such as  .,  /  or  ?
so  \.,  \/,  \?  and likewise  \+,  \*
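To see why escaping matters, here is a minimal Python sketch. The file names are made up for this example.

```python
import re

# Unescaped dot: any character is accepted, so "file-txt" also matches
loose = re.fullmatch(r"file.txt", "file-txt") is not None

# Escaped dot: only a literal "." is accepted
strict_bad = re.fullmatch(r"file\.txt", "file-txt") is not None
strict_ok = re.fullmatch(r"file\.txt", "file.txt") is not None

# Escaped ?, + and * are ordinary characters
literals = re.findall(r"\?|\+|\*", "1+2*3=?")

print(loose, strict_bad, strict_ok)  # True False True
print(literals)                      # ['+', '*', '?']
```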

Grouping, subpattern

To check a web domain name, something like   hpc2.eurotech.com

(“The characters allowed in a label are a subset of the ASCII character set, and includes the characters a through z, A through Z, digits 0 through 9, and the hyphen. This rule is known as the LDH rule (letters, digits, hyphen). Domain names are interpreted in case-independent manner. Labels may not start or end with a hyphen”  from Wikipedia)

So it is possible to use




with fewer characters
but  \w  includes _, which is not permitted



with even fewer characters  (but  \w  includes _, which is not permitted)
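A minimal Python sketch of the LDH rule quoted above. The names LABEL, DOMAIN and looks_like_domain are made up for this example, and the pattern is only one possible reading of the rule, not necessarily the pattern used elsewhere on this page. It lists the allowed characters explicitly instead of using \w, so the underscore is rejected. It assumes at least two labels and does not check length limits.

```python
import re

# One label: starts and ends with a letter or digit, hyphens only inside
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"

# One or more "label." parts followed by a final label
DOMAIN = re.compile(rf"(?:{LABEL}\.)+{LABEL}")

def looks_like_domain(name: str) -> bool:
    return DOMAIN.fullmatch(name) is not None

print(looks_like_domain("hpc2.eurotech.com"))        # True
print(looks_like_domain("-bad.example.com"))         # False
print(looks_like_domain("under_score.example.com"))  # False
```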

( )  groups part of a pattern into a subpattern

(.*)       matches any sequence of characters (the empty string included)

([^/]+)   matches any non-empty sequence of characters that contains no /

^  means NOT only as the first character inside [ ] brackets. Outside brackets it is an anchor that marks the beginning of the string (or line).

$   marks the end of the string (or line)

so  ^m   matches a string that begins with m,    m$   a string that ends with m,    and   m   an m anywhere

\b is the word boundary,   e.g.   \b(max|min)\b

(?!...)      when ?! appears as the first two characters within parens, that denotes what is known as a “non-capturing negative lookahead.”

(?i)    ignore case
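The anchors, the word boundary and the ignore-case flag can be checked with a short Python sketch. The sample words are made up for this example; negative lookahead is left out here.

```python
import re

# ^ anchors at the start, $ at the end
begins_with_m = re.search(r"^m", "max") is not None
ends_with_m = re.search(r"m$", "maximum") is not None
max_ends_with_m = re.search(r"m$", "max") is not None

# \b keeps "max" and "min" from matching inside longer words
whole_words = re.findall(r"\b(max|min)\b", "max maximum min admin")

# (?i) at the start of the pattern turns on case-insensitive matching
ignore_case = re.fullmatch(r"(?i)regex", "RegEx") is not None

print(begins_with_m, ends_with_m, max_ends_with_m)  # True True False
print(whole_words)   # ['max', 'min']
print(ignore_case)   # True
```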

or  operator 


So domain names are restricted to com, info or biz

|   works like the logical operator OR
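A minimal Python sketch of alternation used to restrict the top-level domain. The pattern and the names TLD and allowed_tld are assumptions of this example, not necessarily the pattern this section refers to.

```python
import re

# A literal dot, then one of three alternatives, then end of string
TLD = re.compile(r"\.(com|info|biz)$")

def allowed_tld(host: str) -> bool:
    return TLD.search(host) is not None

print(allowed_tld("eurotech.com"))       # True
print(allowed_tld("example.info"))       # True
print(allowed_tld("example.org"))        # False
print(allowed_tld("example.community"))  # False, because of the $ anchor
```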

Pay attention to greediness

* is a greedy operator: it matches as much as it can. So to match only a single tag we have to use the lazy form

<.*?>   not   <.*>
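The difference between the greedy and the lazy form shows up clearly on a line with several tags. A minimal Python sketch, with a made-up HTML fragment:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last ">" on the line
greedy = re.findall(r"<.*>", html)

# Lazy: .*? stops at the first ">" it can
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```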

\Q ... \E   quotes a substring: everything between \Q and \E is taken literally

\Qabc$xyz\E   matches   abc$xyz
\Qabc\$xyz\E   matches   abc\$xyz
\Qabc\E\$\Qxyz\E   matches   abc$xyz
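Python's re module does not support \Q ... \E. A rough counterpart there is re.escape, which backslash-escapes the special characters of a literal string. A minimal sketch, with a made-up input string:

```python
import re

literal = "abc$xyz"

# re.escape puts a backslash in front of the "$"
pattern = re.escape(literal)
found = re.search(pattern, "price: abc$xyz") is not None

# Without escaping, "$" is an end-of-string anchor and nothing matches
unescaped = re.search(literal, "price: abc$xyz") is not None

print(pattern)    # abc\$xyz
print(found)      # True
print(unescaped)  # False
```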

Useful summary and examples applied to urls

^   beginning-of-line (except inside [ ] brackets, where a leading ^ means not)

$   end-of-line

(https?)     () is a capturing group.   The pattern matches http or https, because  ?  means zero or one ‘s’

[^/]+     one or more characters, none of which are slash.

[^\.]+     one or more characters, none of which are dot

[^/\.]+\.jpg   matches one or more characters, none of which are dot or slash, followed by .jpg

^/products/?$      matches  /products or /products/  – with or without a trailing slash

\.html?    matches either .htm or .html
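Some of the URL patterns above, tried in a short Python sketch. The paths and file names are made up for this example, and \w+ is added in front of \.html? only to capture the whole file name.

```python
import re

paths = ("/products", "/products/", "/products/1")
products = [p for p in paths if re.search(r"^/products/?$", p)]

# \.html? accepts both .htm and .html
pages = re.findall(r"\w+\.html?", "index.htm about.html notes.txt")

# One or more characters that are neither slash nor dot, then .jpg
images = re.findall(r"[^/\.]+\.jpg", "/img/cat.jpg /img/dog.png")

print(products)  # ['/products', '/products/']
print(pages)     # ['index.htm', 'about.html']
print(images)    # ['cat.jpg']
```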

^/(?!index\.aspx)(.*)$    any URL path that does not begin with /index.aspx

^/([^/]+)/([^/]*)(?<!\.aspx)$       The (?<!\.aspx) that ends the pattern is a non-capturing negative look-behind subpattern, which says “the string must not end in .aspx”.
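Both lookaround patterns can be checked with a short Python sketch. The names NOT_INDEX and NOT_ASPX and the sample paths are made up for this example. Python accepts this look-behind because \.aspx has a fixed length.

```python
import re

NOT_INDEX = re.compile(r"^/(?!index\.aspx)(.*)$")
NOT_ASPX = re.compile(r"^/([^/]+)/([^/]*)(?<!\.aspx)$")

# Negative lookahead: rejected only when the path starts with /index.aspx
print(NOT_INDEX.search("/index.aspx"))           # None
print(NOT_INDEX.search("/about.aspx").group(1))  # about.aspx

# Negative look-behind: rejected when the string ends in .aspx
print(NOT_ASPX.search("/shop/item.html").groups())  # ('shop', 'item.html')
print(NOT_ASPX.search("/shop/item.aspx"))           # None
```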

((?:en|fi)[0-9]{2})   The sequence ?: – when it appears as the first two characters within parens – makes the group a non-capturing group.
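The effect of ?: is easiest to see by comparing the groups that come back. A minimal Python sketch, with a made-up input string:

```python
import re

text = "locale=fi42"

# Inner group is non-capturing: only the outer group is reported
non_capturing = re.search(r"((?:en|fi)[0-9]{2})", text).groups()

# Without ?: the inner group is reported as well
capturing = re.search(r"((en|fi)[0-9]{2})", text).groups()

print(non_capturing)  # ('fi42',)
print(capturing)      # ('fi42', 'fi')
```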

^(https?://www\.siagri\.net/)     This pattern can be used in a RewriteCond applied against HTTP_REFERER to prevent image leeching (hotlinking).

^(?!https?://www\.siagri\.net/)  This matches the opposite of the prior example: anything that begins with neither http://www.siagri.net/ nor https://www.siagri.net/ . (Editor's note, IIRF: this pattern can be used in a RewriteCond applied to the HTTP_HOST to rewrite if the hostname is NOT matched by the pattern.)

^(?!www\.)([^\.]+)\.siagri\.net    matches any hostname in the siagri.net domain, except for ‘www.siagri.net’. The dot after www is escaped so that it matches only a literal dot.
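A minimal Python sketch of the hostname pattern, with the dot after www escaped. The names HOST and subdomain and the sample hostnames are made up for this example.

```python
import re

HOST = re.compile(r"^(?!www\.)([^\.]+)\.siagri\.net")

def subdomain(host: str):
    # Return the first label, or None when the host is excluded
    m = HOST.search(host)
    return m.group(1) if m else None

print(subdomain("blog.siagri.net"))  # blog
print(subdomain("www.siagri.net"))   # None
print(subdomain("www2.siagri.net"))  # www2
```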

(https?)://([^/]+)(/([^\?]+(\?(.*))?)?)?      https://eurotech.com/      three groups match    ->    1 https ;  2 eurotech.com;  3 /

(https?)://([^/]+)(/([^\?]+(\?(.*))?)?)?      http://eurotech.com/a/b/c.aspx?p1=foo      six groups match    ->    1 http ;  2 eurotech.com;   3 /a/b/c.aspx?p1=foo;  4 a/b/c.aspx?p1=foo;    5 ?p1=foo;  6 p1=foo;
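The group numbering can be verified with a short Python sketch. The name URL is made up for this example. In Python, groups that take no part in the match are reported as None.

```python
import re

URL = re.compile(r"(https?)://([^/]+)(/([^\?]+(\?(.*))?)?)?")

short = URL.match("https://eurotech.com/").groups()
full = URL.match("http://eurotech.com/a/b/c.aspx?p1=foo").groups()

print(short)
# ('https', 'eurotech.com', '/', None, None, None)
print(full)
# ('http', 'eurotech.com', '/a/b/c.aspx?p1=foo', 'a/b/c.aspx?p1=foo', '?p1=foo', 'p1=foo')
```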

Examples of regular expressions, from “Example of Regular Expression”

IP Address Regexp


MAC Address Regexp


Domain Name Regexp


Windows File Name Regexp


Float Number Regexp


Roman Number Regexp


Date in format yyyy-MM-dd

(19|20)\d\d([- /.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])
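A minimal Python sketch of the date pattern. The names DATE and is_date are made up for this example. The backreference \2 forces the second separator to be the same as the first. The pattern checks the shape only, so a date such as 2023-02-31 is still accepted.

```python
import re

DATE = re.compile(r"(19|20)\d\d([- /.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])")

def is_date(text: str) -> bool:
    return DATE.fullmatch(text) is not None

print(is_date("2024-02-29"))  # True
print(is_date("2024/02/29"))  # True
print(is_date("2024-02/29"))  # False, separators differ
print(is_date("2024-13-01"))  # False, month out of range
print(is_date("2023-02-31"))  # True, days per month are not checked
```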

