WinBatch® Technical Support Forum

All Things WinBatch => WinBatch => Topic started by: stanl on November 06, 2017, 10:39:50 AM

Title: Regex For Email Validation
Post by: stanl on November 06, 2017, 10:39:50 AM
Attached is a test which uses .NET/CLR regex and a pattern I grabbed from C#. I have read a few comments that regex is not really the way to go for email address validation. I am going to be looking at a variety of regex patterns to extract emails, phone numbers, and sales IDs from user notes (queried into memo fields). I would appreciate anyone testing the pattern and reporting any valid email that the script calls invalid.
Title: Re: Regex For Email Validation
Post by: kdmoyers on November 09, 2017, 08:52:28 AM
I think that plus signs are valid in an email address, like

fred+sales@company.com

The suggested regex flags them as bad.
Title: Re: Regex For Email Validation
Post by: stanl on November 10, 2017, 06:06:19 AM
Thanks Kirby.
Title: Re: Regex For Email Validation
Post by: chrislegarth on November 14, 2017, 10:46:37 AM
Here is a pattern that I use and found on the Internet.

pattern = `^(?("")("".+?(?<!\\)""@)|(([0-9a-z]((\.(?!\.))|[-!#\$%%&'\*\+/=\?\^`: '`\{\}\|~\w])*)(?<=[0-9a-z])@))(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$'

It's concatenated (with the : operator) because the pattern itself contains the ` character, which is also the string delimiter used for the first part.
Title: Re: Regex For Email Validation
Post by: JTaylor on November 14, 2017, 11:21:39 AM
This may be the same thing...but in any event comes from:

https://html.spec.whatwg.org/multipage/input.html#e-mail-state-(type=email)

Jim

A valid e-mail address is a string that matches the email production of the following ABNF, the character set for which is Unicode. This ABNF implements the extensions described in RFC 1123. [ABNF] [RFC5322] [RFC1034] [RFC1123]

email         = 1*( atext / "." ) "@" label *( "." label )
label         = let-dig [ [ ldh-str ] let-dig ]  ; limited to a length of 63 characters by RFC 1034 section 3.5
atext         = < as defined in RFC 5322 section 3.2.3 >
let-dig       = < as defined in RFC 1034 section 3.5 >
ldh-str       = < as defined in RFC 1034 section 3.5 >

This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

The following JavaScript- and Perl-compatible regular expression is an implementation of the above definition.

/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
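For anyone who wants to try the WHATWG pattern quickly before wiring it into a WinBatch/CLR script, here is a minimal sketch in Python (not WinBatch). The helper name is_valid_email and the sample addresses are my own assumptions for illustration; only the pattern itself comes from the spec text quoted above. It uses fullmatch because in Python a bare $ would otherwise also accept a trailing newline.

```python
import re

# Pattern taken from the WHATWG text quoted above, without the JavaScript
# /.../ delimiters. The "\/" escape is only needed inside a JavaScript
# regex literal, so it is written as a plain "/" here.
WHATWG_EMAIL = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)


def is_valid_email(address: str) -> bool:
    """Hypothetical helper: True if the whole string matches the pattern."""
    return WHATWG_EMAIL.fullmatch(address) is not None


if __name__ == "__main__":
    # Sample addresses are illustrative only.
    samples = [
        "fred+sales@company.com",  # plus sign in the local part
        "user@localhost",          # single-label domain, no dot
        "a..b@example.com",        # consecutive dots in the local part
        "user@-bad.com",           # label starting with a hyphen
        "fred sales@company.com",  # space in the local part
    ]
    for address in samples:
        print(address, is_valid_email(address))
```

With this pattern the plus-sign address Kirby mentioned is accepted. Note that it is deliberately looser than RFC 5322 in places: it also accepts a domain with no dot and a local part with consecutive dots, so it may still need tightening depending on what the notes data looks like.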