What IS a valid email, TRULY? Experts please. :)


whammy
01-09-2003, 12:08 AM
Ok... this thread is really meant for the experts, who not only know regular expressions, but also are experienced with email addresses, etc.

Now, I've tried many many many complicated regular expressions to validate email addresses, and no matter how complex, (or simple, usually), so far, without fail, they all fail on perfectly valid email addresses.

Before posting your regular expression, check first: does it allow ['\+] before the @ sign? If not, it fails on valid email addresses. I am a bit exasperated that apparently nobody can come up with a truly valid regex for validating email addresses, so at the moment I'm using a modified version of the built-in ASP.NET email validation regex, because the original didn't allow for ' before the @ (which IS valid, amongst other characters).

Since I keep running into stupid characters (that apparently no one knows about, and the RFC is unintelligible to me) like a ['\+] that ARE allowed in the first section, I updated it to this:

\w+[^@\s]*@\w+([-.]\w+)*\.[a-zA-Z]{2,}

If an email address can indeed START with a hyphen (as in another post, [\w-]+), I need to update this yet again to be even looser.

Can a valid email address start with a hyphen as the first character? Hotmail, for instance, will accept only \w before the @, however it won't accept an underscore as the first character of an email address.

Is there ANYONE out there that REALLY knows what a valid email address is, or can translate the appropriate RFC into plain English? I have yet to test a complex email regular expression that is not just...

[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}

...that also doesn't fail against a valid email address (even if unlikely)... if all else fails, I will start using the one above!

TIA...
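
To make the difference between the two patterns in the post above concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself uses ASP.NET and JavaScript). The sample addresses are made up for this example, and whether the leading-hyphen one is actually valid is exactly the open question in the post. The sketch only shows where the two patterns disagree.

```python
import re

# Both patterns are copied verbatim from the post above.
# fullmatch() anchors them at both ends, which the post does not do explicitly.
STRICTER = re.compile(r"\w+[^@\s]*@\w+([-.]\w+)*\.[a-zA-Z]{2,}")
LOOSE = re.compile(r"[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}")


def accepts(pattern, address):
    """Return True if the whole address matches the pattern."""
    return pattern.fullmatch(address) is not None


# Hypothetical sample addresses, not taken from the thread.
SAMPLES = [
    "o'brien+list@example.com",  # apostrophe and plus before the @
    "-hyphen@example.com",       # hyphen as the very first character
    "no-at-sign.example.com",    # no @ at all
]

for address in SAMPLES:
    print(address, accepts(STRICTER, address), accepts(LOOSE, address))
```

The stricter pattern insists on a word character first, so it is the only one of the two that rejects the leading-hyphen sample.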

jkd
01-09-2003, 01:19 AM
I don't suppose you've seen my email RegExp used in some of JSK's scripts?

/^([\w-]+(?:\.[\w-]+)*)@((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$/i

About as accurate as I can get it... I've also tried direct validation against domain extensions... but they can change, and the already long regexp suddenly becomes humongous.

I've decided a valid email consists of any number of alphanumeric characters, hyphens, and periods, followed by an @, a subdomain of unlimited length, then a domain of up to 67 alphanumeric characters/hyphens, followed by a suffix of 2 to 6 letters, with an optional two-letter postsuffix (.co.uk, for example).

If you exec() the above regexp, it pulls out the part before the @, the domain name, and the domain suffix into match results.

Some of the information for domain names was researched. Older domains allow up to 67 characters; most new ones don't. A domain must also start with an alphanumeric character, not a hyphen.
For the other stuff I made educated assumptions by looking at common practice, current instances, etc.
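
As a rough illustration of the exec() behaviour described above, here is the same pattern in Python, where match().groups() plays the role of the JavaScript match result. The function name split_address and the sample addresses are made up for this sketch. re.ASCII is added as an assumption so that \w behaves like it does in JavaScript.

```python
import re

# The pattern from the post, minus the JavaScript /.../i delimiters.
# re.IGNORECASE stands in for the trailing "i" flag.
EMAIL = re.compile(
    r"^([\w-]+(?:\.[\w-]+)*)@((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$",
    re.IGNORECASE | re.ASCII,
)


def split_address(address):
    """Return (local part, domain, suffix), or None if the pattern rejects it."""
    match = EMAIL.match(address)
    return match.groups() if match else None


print(split_address("john.smith@mail.example.com"))
print(split_address("john.o'brien@example.com"))
```

The first call yields the three captured parts. The second returns None, because an apostrophe is not in the character class used before the @.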

whammy
01-09-2003, 01:23 AM
I don't bother with domain extensions, I use \.[a-zA-Z]{2,} ... Actually though, your regex was the very first one I tried, quite a while ago, and it failed on something like:

John.O'Brien@something.com

Which is a valid email address.

Which was part of the reason I started using looser email regular expressions, and I've tried many other fairly complicated regex's that also fail for some reason or another.

I know there is no way to truly validate an email address without "sniffing" it using a server-side script, but I would like to make a nice, LOOSE (but yet valid) script that rejects invalid email addresses, while allowing others to pass (even if they are not a real email address), that's a little tighter than .+@.+\.[a-zA-Z]{2,}

BTW, the reason I know that they fail is I tested them all on an intranet where Customer Service reps receive hundreds or thousands of emails a day, and they generally have to log those email addresses and validate them against whatever regular expression I am using at the time. Believe me, I hear about it if a valid email address doesn't pass (and I check to make sure they aren't mistyping it). :D

I suppose I'm best off sticking with a really loose regex until I can find the "holy grail" of email validation. :)

jkd
01-09-2003, 01:35 AM
Apostrophes are legal? I was unaware of that. If I knew what was legal and what was not, I wouldn't mind busting out my crazy regexp skills again ;)...

Am I wrong in assuming that whatever comes before the @ *must* be a valid *nix username? Or am I off base on that? I wouldn't mind some official specification which clears it up.... if anyone has a link.

whammy
01-09-2003, 01:58 AM
That's my whole dilemma...

I thought I knew what a valid email address was (that it *had* to be a valid *nix username, at least - according to other knowledgeable web developers I know), and that apparently isn't so, since I actually emailed the guy (with the single quote in the email address) to apologize about the situation, and received an email back. :)

I have read the appropriate RFC (don't remember which one, 822?) and it reads like Greek to me, although from what I read it explains in detail what the headers, etc. should look like but nowhere does it explain what a valid email address is... and for the life of me, I can't find ANY resource anywhere that does! I've been trying to find something like that for months.

So I'm trying to figure out, what really *is* a valid email address? Apparently none of the complicated regex's I've seen are correct. :D

I think for now (judging by my luck looking for some resource for this) I will totally "loosen" my regex just to be safe...

joeframbach
01-09-2003, 02:25 AM
i don't know regexps but i have another dilemma for you.
my school issues email addies for the admins. an example, not real, for instance:
jframbach@westallegheny.k12.pa.us
so... hm.

whammy
01-09-2003, 11:42 PM
Actually, an email address like that should validate just fine using any of the regex's I've used. :)

Shecky
01-10-2003, 03:23 AM
i'm not sure if it would interest you but CDYNE.com (http://www.cdyne.com/) has a great free email verifier that actually checks with the email server, then returns response codes that tell you if the email is valid.

This requires ASP to work.

jkd
01-10-2003, 04:30 AM
Along that line:
http://boogietools.com/RFC/rfc821.txt

It seems all you need to do is connect to an SMTP server, and do:

VRFY user

Where you want to check user@domain.com... seems like it would be easy enough to do in PHP, though I probably misinterpreted the document.

I wouldn't mind writing my own version of this... if anyone knows of an open-source validator written in any language, I wouldn't mind taking a look at it and seeing where it actually does connect and run the verification.

jkd
01-10-2003, 04:38 AM
Actually, it seems to be very easily done, as long as you know the address of the mail server (which isn't necessarily the same as the domain in the address):

220 mssm.org running Eudora Internet Mail Server 2.2.2
HELO
250 mssm.org hello (12.149.226.203)
VRFY davisj
250 Jason Davis <davisj@mssm.org>
VRFY someonenotreal
550 5.1.1 user not known


:)
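
The exchange above could be scripted roughly as follows. This is a sketch in Python rather than PHP or ASP, using the standard smtplib module. The function names and the mapping from reply codes to verdicts are assumptions made for illustration. Only 250 and 550 appear in the transcript, and a server may refuse or ignore VRFY altogether, so any other code is treated as inconclusive.

```python
import smtplib


def parse_reply(line):
    """Split a reply line such as '550 5.1.1 user not known' into (code, text)."""
    code, _, text = line.partition(" ")
    return int(code), text


def classify_vrfy(code):
    """Rough verdict for a VRFY reply code (this mapping is an assumption)."""
    if code in (250, 251):
        return "exists"
    if code == 550:
        return "not found"
    return "inconclusive"  # e.g. the server declines to answer VRFY


def vrfy(mail_host, user, timeout=10):
    """Ask mail_host about user. Needs network access, so it is not called here."""
    with smtplib.SMTP(mail_host, 25, timeout=timeout) as server:
        server.helo()
        code, message = server.verify(user)  # sends "VRFY user"
    return classify_vrfy(code), message.decode(errors="replace")


# The two replies from the transcript above, classified offline.
for line in ("250 Jason Davis <davisj@mssm.org>", "550 5.1.1 user not known"):
    code, text = parse_reply(line)
    print(code, classify_vrfy(code), text)
```

As the post says, mail_host has to be the actual mail server, which is not necessarily the domain in the address.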

whammy
01-10-2003, 11:44 PM
Yeah, actually I already have some sniffer code to use to verify an email address like that in ASP - I was just looking to make either a more accurate email regex (or a looser, but still correct regex) to not only use in combination with a sniffer (if the email address passes the regex test) but for instance, to also be able to use in javascript client-side.

Ökii
01-11-2003, 12:05 PM
Just to sling a clanger element into the above regexs, numeric IP based email addresses are also perfectly valid.

J.O'Brien@127.0.0.1

maybe

^(([^<>;()[\]\\.,;:@"]+(\.[^<>()[\]\\.,;:@"]+)*)|(".+"))@((([a-z]([-a-z0-9]*[a-z0-9])?)|(#[0-9]+)|(\[((([01]?[0-9]{0,2})|(2(([0-4][0-9])|(5[0-5]))))\.){3}(([01]?[0-9]{0,2})|(2(([0-4][0-9])|(5[0-5]))))\]))\.)*(([a-z]([-a-z0-9]*[a-z0-9])?)|(#[0-9]+)|(\[((([01]?[0-9]{0,2})|(2(([0-4][0-9])|(5[0-5]))))\.){3}(([01]?[0-9]{0,2})|(2(([0-4][0-9])|(5[0-5]))))\]))$

which should match email addresses of the format specified by RFC 821 (see the BNF notation at http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc0821.html#page-30)

care of: http://www.regxlib.com/Default.aspx
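
One detail worth checking against the RFC 821 grammar linked above: there a numeric domain element is written in square brackets ("[" dotnum "]"), and the regex follows that with its \[ ... \] branches. The sketch below, in Python, only tells a bracketed IPv4 literal apart from a host name. The function name domain_kind is hypothetical, and it does not validate the local part or host names at all.

```python
import ipaddress


def domain_kind(address):
    """Classify the part after the last @ as 'ip-literal', 'name' or 'invalid'."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return "invalid"
    if domain.startswith("[") and domain.endswith("]"):
        try:
            # Raises a ValueError subclass for anything that is not a dotted quad.
            ipaddress.IPv4Address(domain[1:-1])
        except ValueError:
            return "invalid"
        return "ip-literal"
    # Anything unbracketed is treated as a host name and not checked further.
    return "name"


print(domain_kind("J.O'Brien@[127.0.0.1]"))
print(domain_kind("J.O'Brien@something.com"))
print(domain_kind("J.O'Brien@[999.0.0.1]"))
```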

scroots
01-11-2003, 08:01 PM
before anyone starts work, would it not be a good idea to make a list of valid and invalid email addresses, as everyone is posting left, right and center with examples.
If a list is drawn up, it can then be seen what the common elements are, etc.

scroots
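
A list like that could double as a small test table. The sketch below, in Python, runs two of the patterns posted earlier in the thread against it. The valid addresses are the ones that appear in the thread (one of them was posted as a not-real example of a format). The invalid ones are made up for illustration and assumed to be invalid. The name misclassified is hypothetical.

```python
import re

# Addresses that appear in the thread and are treated there as valid.
VALID = [
    "John.O'Brien@something.com",
    "jframbach@westallegheny.k12.pa.us",
    "davisj@mssm.org",
]

# Made-up addresses assumed to be invalid.
INVALID = [
    "no-at-sign.example.com",
    "two@@example.com",
    "space in@example.com",
]

# Two patterns copied from earlier posts.
PATTERNS = {
    "loose": r"[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}",
    "jkd": r"^([\w-]+(?:\.[\w-]+)*)@((?:[\w-]+\.)*\w[\w-]{0,66})"
           r"\.([a-z]{2,6}(?:\.[a-z]{2})?)$",
}


def misclassified(pattern, valid=VALID, invalid=INVALID):
    """Return (address, problem) pairs the pattern gets wrong."""
    rx = re.compile(pattern, re.IGNORECASE | re.ASCII)
    wrong = [(a, "rejected but valid") for a in valid if not rx.fullmatch(a)]
    wrong += [(a, "accepted but invalid") for a in invalid if rx.fullmatch(a)]
    return wrong


for name, pattern in PATTERNS.items():
    print(name, misclassified(pattern))
```

New addresses from either side of the argument can then be appended to the two lists and every candidate pattern re-run against them.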

ShMiL
01-12-2003, 11:22 AM
Originally posted by whammy
Yeah, actually I already have some sniffer code to use to verify an email address like that in ASP - I was just looking to make either a more accurate email regex (or a looser, but still correct regex) to not only use in combination with a sniffer (if the email address passes the regex test) but for instance, to also be able to use in javascript client-side.
Can you post that ASP code to check the existence of the email via SMTP?
(THANKS)

whammy
01-12-2003, 05:08 PM
http://coveryourasp.com/ValidateEmail.asp

Should work for you. I can't post the code I have, as it's not handy.

ShMiL
01-12-2003, 05:22 PM
thanks