Email regular expressions

Below are some regexes for validating an email address:

1. The W3C regex for the email input type

<input type="email" placeholder="Enter your email" />

^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$
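As a minimal sketch of how this pattern could be applied outside the browser, the snippet below compiles it with Python's re module. The helper name looks_like_email is hypothetical, and the check is purely syntactic: it says nothing about whether the mailbox exists.

```python
import re

# The W3C pattern quoted above, compiled once for reuse.
W3C_EMAIL = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$"
)

def looks_like_email(address: str) -> bool:
    """Hypothetical helper: True if the whole string matches the pattern."""
    # fullmatch is used because, in Python, $ alone also matches just
    # before a trailing newline.
    return W3C_EMAIL.fullmatch(address) is not None

print(looks_like_email("user.name+tag@example.com"))  # True
print(looks_like_email("user@localhost"))             # True: a dot in the domain is not required
print(looks_like_email("user@@example.com"))          # False
```

Note that the pattern accepts a domain without any dot, so a stricter application may want an additional check.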

2. An old email regex that lists the accepted top-level domains and does not accept uppercase letters

"^((([a-z]|[0-9]|!|#|$|%|&|'|\*|\+|\-|/|=|\?|\^|_|`|\{|\||\}|~)+(\.([a-z]|[0-9]|!|#|$|%|&|'|\*|\+|\-|/|=|\?|\^|_|`|\{|\||\}|~)+)*)@((((([a-z]|[0-9])([a-z]|[0-9]|\-){0,61}([a-z]|[0-9])\.))*([a-z]|[0-9])([a-z]|[0-9]|\-){0,61}([a-z]|[0-9])\.(af|ax|al|dz|as|ad|ao|ai|aq|ag|ar|am|aw|au|at|az|bs|bh|bd|bb|by|be|bz|bj|bm|bt|bo|ba|bw|bv|br|io|bn|bg|bf|bi|kh|cm|ca|cv|ky|cf|td|cl|cn|cx|cc|co|km|cg|cd|ck|cr|ci|hr|cu|cy|cz|dk|dj|dm|do|ec|eg|sv|gq|er|ee|et|fk|fo|fj|fi|fr|gf|pf|tf|ga|gm|ge|de|gh|gi|gr|gl|gd|gp|gu|gt| gg|gn|gw|gy|ht|hm|va|hn|hk|hu|is|in|id|ir|iq|ie|im|il|it|jm|jp|je|jo|kz|ke|ki|kp|kr|kw|kg|la|lv|lb|ls|lr|ly|li|lt|lu|mo|mk|mg|mw|my|mv|ml|mt|mh|mq|mr|mu|yt|mx|fm|md|mc|mn|ms|ma|mz|mm|na|nr|np|nl|an|nc|nz|ni|ne|ng|nu|nf|mp|no|om|pk|pw|ps|pa|pg|py|pe|ph|pn|pl|pt|pr|qa|re|ro|ru|rw|sh|kn|lc|pm|vc|ws|sm|st|sa|sn|cs|sc|sl|sg|sk|si|sb|so|za|gs|es|lk|sd|sr|sj|sz|se|ch|sy|tw|tj|tz|th|tl|tg|tk|to|tt|tn|tr|tm|tc|tv|ug|ua|ae|gb|uk|us|um|uy|uz|vu|ve|vn|vg|vi|wf|eh|ye|zm|zw|cat|com|edu|gov|int|mil|net|org|biz|info|name|pro|aero|coop|museum|arpa))|(((([0-9]){1,3}\.){3}([0-9]){1,3}))|(\[((([0-9]){1,3}\.){3}([0-9]){1,3})\])))$"

Regexes for the email field in specific languages (C++, Python, …) can be found at http://emailregex.com/

JavaScript example

/^(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/

 

Several regex validators are available online, for example:

Regular Expression Editor, with links to other editors

Introduction to regex

A pattern like [^abc] matches any single character that is not in the set abc; inside brackets, a leading ^ means NOT.

Conversely, [abc] matches only a, b, or c.

So [ ] is a one-character window.

The set can be listed explicitly or defined by a range:

[a-c] or [a-h]

Quantifiers control how many times something repeats: ?, *, +, {n}, {n,m}. Each one applies to the preceding atom (a character, a class, or a group).

?   zero or one occurrence of the preceding atom

*   zero or more occurrences of the preceding atom

+   one or more occurrences of the preceding atom

{n}   exactly n occurrences

{n,m}   from n to m occurrences
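The quantifiers listed above can be checked with a short, self-contained sketch in Python's re module. The sample patterns and strings are illustrative choices, not taken from the original notes.

```python
import re

# Each entry: (pattern, strings that should match in full, strings that should not).
cases = [
    (r"colou?r", ["color", "colour"], ["colouur"]),       # ?     zero or one "u"
    (r"ab*c",    ["ac", "abc", "abbbc"], ["a"]),          # *     zero or more "b"
    (r"ab+c",    ["abc", "abbbc"], ["ac"]),               # +     one or more "b"
    (r"a{3}",    ["aaa"], ["aa", "aaaa"]),                # {n}   exactly n
    (r"a{2,4}",  ["aa", "aaa", "aaaa"], ["a", "aaaaa"]),  # {n,m} from n to m
]

for pattern, good, bad in cases:
    assert all(re.fullmatch(pattern, s) for s in good)
    assert not any(re.fullmatch(pattern, s) for s in bad)

print("all quantifier checks passed")
```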

To check a five-digit postal code, it is possible to use something like [0-9]{5}

Regular expressions also provide predefined character classes (shorthand aliases).

So the postal code can be written as \d{5}

.   the dot matches any character except the newline by default (the newline sequence can differ between platforms); roughly [^\n]

\d   (d)igit, equivalent to [0-9]
\D   non-digit, equivalent to [^0-9] (that is, [^\d])

\w   "word characters" (letters, digits, and underscore), equivalent to [a-zA-Z0-9_]
\W   "non-word characters" (that is, [^\w])

\s   whitespace characters such as space, tab, newline, and carriage return, e.g. [ \t\r\n]
\S   non-whitespace characters (that is, [^\s])

.*   matches any sequence of characters (the empty string included)
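A brief sketch of these shorthand classes in Python's re module follows. The sample strings are made up for illustration. One caveat: on Python text strings, \d, \w and \s are Unicode-aware by default, and the re.ASCII flag restricts them to the ASCII definitions given above.

```python
import re

postal = re.compile(r"\d{5}", flags=re.ASCII)

print(bool(postal.fullmatch("33100")))  # True
print(bool(postal.fullmatch("3310")))   # False: only four digits
print(bool(postal.fullmatch("3310a")))  # False: "a" is not a digit

# \w+ collects runs of letters, digits and underscores.
print(re.findall(r"\w+", "max_value = 42", flags=re.ASCII))  # ['max_value', '42']

# \s+ splits on any run of whitespace (space, tab, newline).
print(re.split(r"\s+", "a b\tc\nd"))  # ['a', 'b', 'c', 'd']

# \D+ picks out the non-digit runs.
print(re.findall(r"\D+", "2024-01-15"))  # ['-', '-']

# .* also matches the empty string.
print(re.fullmatch(r".*", "") is not None)  # True
```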

Escaping

\   escapes a non-alphanumeric character such as ., / or ? so that it is taken literally
so \., \/, \? and likewise \+, \*

Special characters that must be escaped in a regular expression:
POSIX extended (ERE)
.^$*+?()[{\|
POSIX basic (BRE)
.^$*[\

Grouping, subpattern

To check a web domain such as hpc2.eurotech.com

("The characters allowed in a label are a subset of the ASCII character set, and includes the characters a through z, A through Z, digits 0 through 9, and the hyphen. This rule is known as the LDH rule (letters, digits, hyphen). Domain names are interpreted in case-independent manner. Labels may not start or end with a hyphen", from Wikipedia)

it is possible to use

^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}$

or

^([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,6}$

which is shorter,
but \w includes _, which is not permitted

or

^(\w[-.\w]*\w\.\w{2,9})$

which is shorter still (but, again, \w includes _, which is not permitted)
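To make the trade-off concrete, here is a small sketch in Python that compares a strict LDH pattern with the shortest \w-based one. The strict pattern uses the {0,61} length bound, which together with the first and last character caps a label at the 63-character DNS limit. The host names are invented test values.

```python
import re

# Strict LDH pattern: each label starts and ends with a letter or digit.
STRICT = re.compile(r"^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}$")
# Shorter pattern based on \w, which also lets "_" through.
LOOSE = re.compile(r"^(\w[-.\w]*\w\.\w{2,9})$")

for host in ["hpc2.eurotech.com", "my_host.example.com", "-bad.example.com"]:
    print(host, bool(STRICT.match(host)), bool(LOOSE.match(host)))

# hpc2.eurotech.com True True
# my_host.example.com False True
# -bad.example.com False False
```

The second line of output shows the cost of the shorter pattern: the underscore is accepted even though the LDH rule forbids it.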

( )   groups part of a pattern into a subpattern

(.*)   matches any sequence of characters (the empty string included)

([^/]+)   matches any non-empty sequence of characters that contains no /

^   inside brackets [ ] (as the first character) means NOT; outside brackets it is an anchor that matches the start of the string

$   matches the end of the string

so ^m matches a string that begins with m, m$ a string that ends with m, and a bare m matches an m anywhere

\b   is the word boundary, e.g. \b(max|min)\b

?!   when ?! appears as the first two characters within parentheses, it denotes a "non-capturing negative lookahead"

(?i)   ignore case
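The anchors, the word boundary, the negative lookahead and the ignore-case flag can be tried out with the following sketch in Python's re module. The sample strings are illustrative only.

```python
import re

# ^ and $ as anchors
print(bool(re.search(r"^m", "max")))      # True: begins with m
print(bool(re.search(r"m$", "maximum")))  # True: ends with m
print(bool(re.search(r"m$", "max")))      # False

# \b keeps "max" and "min" from matching inside longer words
print(re.findall(r"\b(max|min)\b", "max maximum min admin"))  # ['max', 'min']

# (?i) at the start of the pattern makes the whole match case-insensitive
print(bool(re.search(r"(?i)\bmax\b", "MAX value")))  # True

# (?!...) rejects a position without consuming any characters
print(re.findall(r"\b(?!max\b)\w+", "max min avg"))  # ['min', 'avg']
```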

The OR operator

^(\w[-.\w]*\w\.(com|info|biz))

So domain names are restricted to com, info, or biz

|   works like the logical operator OR

Watch out for greediness

* is a greedy operator, so to match a single tag we have to use

<.*?>   not   <.*>
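The difference is easy to see on a short HTML fragment. This is a sketch in Python's re module; the fragment is invented, and the third pattern, a negated class, is a common alternative that is not part of the original notes.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last ">" on the line, swallowing everything.
print(re.findall(r"<.*>", html))
# ['<b>bold</b> and <i>italic</i>']

# Lazy: .*? stops at the first ">" that lets the match succeed.
print(re.findall(r"<.*?>", html))
# ['<b>', '</b>', '<i>', '</i>']

# Alternative: a negated class cannot run past ">" in the first place.
print(re.findall(r"<[^>]*>", html))
# ['<b>', '</b>', '<i>', '</i>']
```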

\Q ... \E   quotes a substring, so that everything between them is taken literally

pattern                 matches
\Qabc$xyz\E             abc$xyz
\Qabc\$xyz\E            abc\$xyz
\Qabc\E\$\Qxyz\E        abc$xyz
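Not every engine supports \Q ... \E; Python's re module, for one, does not. There the same effect can be had by escaping the literal text with re.escape before building the pattern, as in this sketch (the literal abc$xyz is the one from the table above):

```python
import re

literal = "abc$xyz"
pattern = re.escape(literal)  # backslash-escapes the regex metacharacters

print(pattern)                                 # abc\$xyz
print(bool(re.fullmatch(pattern, "abc$xyz")))  # True

# Without escaping, $ is an end-of-string anchor, so nothing can follow it.
print(bool(re.search(r"abc$xyz", "abc$xyz")))  # False
```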

Useful summary and examples applied to URLs

^   beginning of line (except inside brackets, where it means NOT)

$   end of line

(https?)   ( ) is a capturing group. The pattern matches http or https, because ? means zero or one occurrence of the preceding s

[^/]+     one or more characters, none of which are slash.

[^\.]+     one or more characters, none of which are dot

[^/\.]+\.jpg   matches one or more characters, none of which are dot or slash, followed by .jpg

^/products/?$      matches  /products or /products/  – with or without a trailing slash

\.html?    matches either .htm or .html

^/(?!index\.aspx)(.*)$    matches any URL path that does not begin with /index.aspx

^/([^/]+)/([^/]*)(?<!\.aspx)$       The (?<!\.aspx) that ends the pattern is a non-capturing negative look-behind subpattern, which says "the string must not end in .aspx".
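Both lookaround patterns can be exercised with a short sketch in Python's re module, which supports negative lookahead and fixed-width negative look-behind. The names NOT_INDEX and NOT_ASPX and the sample paths are made up for illustration.

```python
import re

NOT_INDEX = re.compile(r"^/(?!index\.aspx)(.*)$")
NOT_ASPX = re.compile(r"^/([^/]+)/([^/]*)(?<!\.aspx)$")

# Negative lookahead: reject paths that start with /index.aspx
print(bool(NOT_INDEX.match("/index.aspx")))        # False
print(NOT_INDEX.match("/products/list").group(1))  # products/list

# Negative look-behind: reject two-segment paths that end in .aspx
print(bool(NOT_ASPX.match("/shop/item.aspx")))     # False
print(NOT_ASPX.match("/shop/item.html").groups())  # ('shop', 'item.html')
```

Neither lookaround adds a group of its own: the groups printed above come only from the ordinary parentheses.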

((?:en|fi)[0-9]{2})   The sequence ?:, when it appears as the first two characters within parentheses, makes the group a non-capturing group.

^(https?://www\.siagri\.net/)     This pattern can be used in a RewriteCond applied against HTTP_REFERER to prevent image leeching (hotlinking).

^(?!https?://www\.siagri\.net/)  This matches the opposite of the prior example: anything that begins with neither http://www.siagri.net/ nor https://www.siagri.net/. (Editor's note, IIRF: this pattern can be used in a RewriteCond applied to the HTTP_HOST to rewrite if the hostname is NOT matched by the pattern.)

^(?!www\.)([^\.]+)\.siagri\.net    matches any hostname in the siagri.net domain, except for 'www.siagri.net'.

(https?)://([^/]+)(/([^\?]+(\?(.*))?)?)?      https://eurotech.com/      three groups    ->    1 https;  2 eurotech.com;  3 /

(https?)://([^/]+)(/([^\?]+(\?(.*))?)?)?      http://eurotech.com/a/b/c.aspx?p1=foo      six groups    ->    1 http;  2 eurotech.com;  3 /a/b/c.aspx?p1=foo;  4 a/b/c.aspx?p1=foo;  5 ?p1=foo;  6 p1=foo
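The group numbering can be verified with a sketch in Python's re module. Groups that do not take part in a match are reported as None, which is why the first URL yields only three filled groups.

```python
import re

URL = re.compile(r"(https?)://([^/]+)(/([^\?]+(\?(.*))?)?)?")

print(URL.match("https://eurotech.com/").groups())
# ('https', 'eurotech.com', '/', None, None, None)

print(URL.match("http://eurotech.com/a/b/c.aspx?p1=foo").groups())
# ('http', 'eurotech.com', '/a/b/c.aspx?p1=foo', 'a/b/c.aspx?p1=foo', '?p1=foo', 'p1=foo')
```

Groups are numbered by the position of their opening parenthesis, left to right, which is why the nested groups 4, 5 and 6 overlap with group 3.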

Examples of regular expressions, from "Example of Regular Expression"

IP Address Regexp

\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
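A quick sketch of this pattern in Python's re module shows both uses: validating a whole string and pulling addresses out of free text. The sample addresses are arbitrary.

```python
import re

# The pattern above, split over two string literals for readability.
IPV4 = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
)

print(bool(IPV4.fullmatch("192.168.1.254")))  # True
print(bool(IPV4.fullmatch("256.1.1.1")))      # False: 256 is out of range

# All groups are non-capturing, so findall returns the full matches.
print(IPV4.findall("ping 10.0.0.1 then 172.16.254.3"))
# ['10.0.0.1', '172.16.254.3']
```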

MAC Address Regexp

^([0-9a-fA-F][0-9a-fA-F]:){5}([0-9a-fA-F][0-9a-fA-F])$

Domain Name Regexp

^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}$

Windows File Name Regexp

(?i)^(?!^(PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(\..+)?$)[^\\\./:\*\?\"<>\|][^\\/:\*\?\"<>\|]{0,254}$

Float Number Regexp

[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?

Roman Number Regexp

^(?i:(?=[MDCLXVI])((M{0,3})((C[DM])|(D?C{0,3}))?((X[LC])|(L?XX{0,2})|L)?((I[VX])|(V?(II{0,2}))|V)?))$

Date in format yyyy-MM-dd

(19|20)\d\d([- /.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])
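The \2 in this pattern is a backreference to the separator captured by the second group, so both separators must be the same character. The sketch below, in Python, demonstrates that. It also shows a limit of the pattern: it checks the shape of the date, not the calendar. The helper is_real_date is hypothetical and adds a calendar check with datetime.date.

```python
import re
from datetime import date

DATE = re.compile(r"(19|20)\d\d([- /.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])")

print(bool(DATE.fullmatch("2024-01-15")))  # True
print(bool(DATE.fullmatch("2024-01/15")))  # False: \2 requires the same separator twice
print(bool(DATE.fullmatch("2024-13-15")))  # False: there is no month 13
print(bool(DATE.fullmatch("2023-02-31")))  # True: the regex does not know month lengths

def is_real_date(text: str) -> bool:
    """Hypothetical helper: regex shape check, then a calendar check."""
    m = DATE.fullmatch(text)
    if m is None:
        return False
    year, month, day = int(text[:4]), int(m.group(3)), int(m.group(4))
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True

print(is_real_date("2023-02-31"))  # False
print(is_real_date("2024-02-29"))  # True: 2024 is a leap year
```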

ASP.NET email validation expression

<asp:RegularExpressionValidator ID="RegularExpressionValidator1" Display="Dynamic"
ControlToValidate="email" ValidationExpression=
"^((([a-z]|[0-9]|!|#|$|%|&|'|\*|\+|\-|/|=|\?|\^|_|`|\{|\||\}|~)+(\.([a-z]|[0-9]|!|#|$|%|&|'|\*|\+|\-|/|=|\?|\^|_|`|\{|\||\}|~)+)*)@((((([a-z]|[0-9])([a-z]|[0-9]|\-){0,61}([a-z]|[0-9])\.))*([a-z]|[0-9])([a-z]|[0-9]|\-){0,61}([a-z]|[0-9])\.(af|ax|al|dz|as|ad|ao|ai|aq|ag|ar|am|aw|au|at|az|bs|bh|bd|bb|by|be|bz|bj|bm|bt|bo|ba|bw|bv|br|io|bn|bg|bf|bi|kh|cm|ca|cv|ky|cf|td|cl|cn|cx|cc|co|km|cg|cd|ck|cr|ci|hr|cu|cy|cz|dk|dj|dm|do|ec|eg|sv|gq|er|ee|et|fk|fo|fj|fi|fr|gf|pf|tf|ga|gm|ge|de|gh|gi|gr|gl|gd|gp|gu|gt| gg|gn|gw|gy|ht|hm|va|hn|hk|hu|is|in|id|ir|iq|ie|im|il|it|jm|jp|je|jo|kz|ke|ki|kp|kr|kw|kg|la|lv|lb|ls|lr|ly|li|lt|lu|mo|mk|mg|mw|my|mv|ml|mt|mh|mq|mr|mu|yt|mx|fm|md|mc|mn|ms|ma|mz|mm|na|nr|np|nl|an|nc|nz|ni|ne|ng|nu|nf|mp|no|om|pk|pw|ps|pa|pg|py|pe|ph|pn|pl|pt|pr|qa|re|ro|ru|rw|sh|kn|lc|pm|vc|ws|sm|st|sa|sn|cs|sc|sl|sg|sk|si|sb|so|za|gs|es|lk|sd|sr|sj|sz|se|ch|sy|tw|tj|tz|th|tl|tg|tk|to|tt|tn|tr|tm|tc|tv|ug|ua|ae|gb|us|um|uy|uz|vu|ve|vn|vg|vi|wf|eh|ye|zm|zw|cat|com|edu|gov|int|mil|net|org|biz|info|name|pro|aero|coop|museum|arpa))|(((([0-9]){1,3}\.){3}([0-9]){1,3}))|(\[((([0-9]){1,3}\.){3}([0-9]){1,3})\])))$"
EnableClientScript="True" ErrorMessage="This is not a valid email!" runat="server" />