
View Full Version : URL validation with regular expression



dominicall
11-05-2002, 01:37 PM
Hi all - am looking for an answer to a particular problem...

I have a form on my site that allows visitors to provide the URL of the site... Am looking for a reg exp for validating that the URL is correctly formed...

Anyone able to help.

Thanks

Dominic :D

beetle
11-05-2002, 03:25 PM
Can you provide the rules for a correctly formed URL?

What URLs are you going to accept?

www.codingforums.com
http://www.codingforums.com/
codingforums.com
http://www.codingforums.com/newreply.php?s=&action=newreply&threadid=9241

All are valid URLs. Which do you plan on accepting?

dominicall
11-05-2002, 03:37 PM
Oops - sorry beetle.... should have been clearer.

Am only looking to validate anything http, i.e.

http://www.codingforums.com
http://someotherprefix.codingforums.com

or with a querystring, i.e.

http://www.codingforums.com?anything_here

Am not so worried about ppl putting the http:// in front of their entered address as I can check and add if necessary... have been struggling to design the correct pattern match for the rest...

I am getting better with reg exps but this one has me stumped... what I'm looking to do is...


function isValidURL(url) {
    if (!/regularexpressionpatternforurlhere/.test(url)) {
        return false;
    }
    return true;
}

Hope that makes it clearer.

Dominic :confused:

beetle
11-05-2002, 04:03 PM
Well, here's a first draft...

/^(?:http:\/\/)?(?:[\w-]+\.)+[a-z]{2,6}$/i

Breakdown:

^
match at beginning of string

(?:http:\/\/)?
optional match of the http://

(?:[\w-]+\.)+
At least one group of accepted characters and a period

[a-z]{2,6}$
match domain at end of string.
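
To see what this first draft accepts, it can be run against the sample URLs posted earlier in the thread. This is only a rough sketch: the `samples` array and the variable names are illustrative, not part of anyone's site code. Because the pattern ends with `$` right after the TLD, anything with a trailing slash, path, or querystring is rejected.

```javascript
// beetle's first-draft pattern, copied from the post above
var urlPattern = /^(?:http:\/\/)?(?:[\w-]+\.)+[a-z]{2,6}$/i;

// Sample inputs taken from earlier posts (the array itself is illustrative)
var samples = [
  "www.codingforums.com",
  "http://www.codingforums.com",
  "http://someotherprefix.codingforums.com",
  "codingforums.com",
  "http://www.codingforums.com/",
  "http://www.codingforums.com?anything_here"
];

// Test each sample and print the outcome
var results = samples.map(function (url) {
  return urlPattern.test(url);
});
samples.forEach(function (url, i) {
  console.log(url + " -> " + results[i]);
});
```

The first four samples pass; the last two (trailing slash and querystring) fail, so the pattern would need extending to cover the querystring case asked about.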

dominicall
11-05-2002, 04:26 PM
Cool - thanks beetle - works perfectly....

Funnily enough, after I posted my message I found this...

/(((https?)|(ftp)):\/\/([\-\w]+\.)+\w{2,3}(\/[%\-\w]+(\.\w{2,})?)*(([\w\-\.\?\\/+@&#;`~=%!]*)(\.\w{2,})?)*\/?)/i

which also validates with a querystring, so have cut it down to...

/(http:\/\/([\-\w]+\.)+\w{2,3}(\/[%\-\w]+(\.\w{2,})?)*(([\w\-\.\?\\/+@&#;`~=%!]*)(\.\w{2,})?)*\/?)/i

to validate http:// only

It doesn't check for max number of characters for the TLD - so I'm going to try to decipher it tonight - which should confuse me some - LOL. Am not sure if I'll work it out though :confused:

Thanks

Dominic :D
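
A small sketch of how the trimmed http-only pattern behaves, written as a complete regex literal. The test strings and variable names are made up for illustration. One thing worth noticing is that the pattern has no `^` or `$` anchors, so it only checks that a URL-like piece appears somewhere in the input.

```javascript
// dominicall's trimmed http-only pattern as a complete regex literal
var httpPattern = /(http:\/\/([\-\w]+\.)+\w{2,3}(\/[%\-\w]+(\.\w{2,})?)*(([\w\-\.\?\\/+@&#;`~=%!]*)(\.\w{2,})?)*\/?)/i;

// A URL with a querystring is accepted
var withQuery = httpPattern.test("http://www.codingforums.com?anything_here");

// An ftp:// address is rejected, since the literal "http://" never appears
var ftpOnly = httpPattern.test("ftp://www.codingforums.com");

// No anchors: a URL embedded in other text still passes
var embedded = httpPattern.test("junk http://www.codingforums.com <junk>");

console.log(withQuery, ftpOnly, embedded);
```

If the whole field has to be a URL and nothing else, wrapping the pattern in `^` and `$` would be the usual next step.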

mordred
11-05-2002, 04:35 PM
How useful is checking for a maximum TLD length? Some months ago, ICANN added .museum to the available TLDs. The effect: most older regexps for validating URLs needed an update, and that only happened if the problem was communicated and easy to resolve (since the intent behind a regexp is difficult to see from the syntax).

Because we don't know what TLDs might pop up in the nearer future, I would refrain from trying to set a maximum length at the moment.

Just my 2 cents.
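
To make the point concrete, here is a rough comparison of a capped and an uncapped TLD length. Both patterns are hypothetical variants of beetle's first draft (with http:// made mandatory), and `example.museum` is just a made-up host.

```javascript
// Hypothetical variant with the TLD capped at 3 letters
var cappedTld = /^http:\/\/(?:[\w-]+\.)+[a-z]{2,3}$/i;

// Hypothetical variant with a minimum of 2 letters and no maximum
var openTld = /^http:\/\/(?:[\w-]+\.)+[a-z]{2,}$/i;

// Made-up host using the .museum TLD
var museumUrl = "http://example.museum";

var cappedResult = cappedTld.test(museumUrl); // rejected: "museum" is 6 letters
var openResult = openTld.test(museumUrl);     // accepted

console.log(cappedResult, openResult);
```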

dominicall
11-05-2002, 04:37 PM
Good point mordred... who knows what ICANN are going to do next...

Dominic :D

dominicall
11-05-2002, 07:39 PM
Huh.... what's going on...

Am incorporating the URL check into my site at home but first need to check whether the submitted url starts with http:// ... so I wrote this just to see if I was getting it right...


function isValidURL(url) {
    if (!/^((http)|(HTTP)):\/\//.test(url)); {
        alert ("oops - not OK");
        checkurl.url.focus();
        checkurl.url.select();
        return false;
    }
    alert ("everything OK");
    return true;
}

So why does it return false even when I start the url with http:// or HTTP://

Very confused :confused:

Dominic

mordred
11-05-2002, 09:46 PM
Because one semicolon is in the wrong place. The semicolon right after the condition ends the if statement there, so the block you intended as the if body is always carried out and the function returns false.



if (!/^((http)|(HTTP)):\/\//.test(url)); {


Nothing wrong with the RegExp as far as I can see, though.
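
A stripped-down sketch of the effect, with the alerts and form calls left out so it runs anywhere. The function names `startsWithHttpBuggy` and `startsWithHttpFixed` are made up for this illustration.

```javascript
// Buggy: the ";" after the condition is an empty statement that ends the if,
// so the braces that follow are a plain block that always runs.
function startsWithHttpBuggy(url) {
  if (!/^((http)|(HTTP)):\/\//.test(url)); {
    return false;
  }
  return true; // never reached
}

// Fixed: without the stray ";" the block belongs to the if again.
function startsWithHttpFixed(url) {
  if (!/^((http)|(HTTP)):\/\//.test(url)) {
    return false;
  }
  return true;
}

console.log(startsWithHttpBuggy("http://www.codingforums.com")); // false
console.log(startsWithHttpFixed("http://www.codingforums.com")); // true
```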

dominicall
11-05-2002, 09:59 PM
Hmmm..... thanks mordred....

I took out the semicolon (the one right after the if condition) and it still returns false even if I use a URL starting with http://


function isValidURL(url) {
    if (!/^((http)|(HTTP)):\/\//.test(url)) {
        alert ("oops - not OK");
        checkurl.url.focus();
        checkurl.url.select();
        return false;
    }
    alert ("everything OK");
    return true;
}

Any ideas????

Dominic :confused:

beetle
11-05-2002, 10:02 PM
just do this

if (!/^http:\/\//i.test(url)) {

See the i right after the closing slash? It's called a pattern modifier, and the i modifier means case-insensitive matching, or ignore case.

The other common modifier is g, which signifies global matching: find or replace every match instead of only the first one.

If you look back, you can see that I used the i modifier in my first post :D
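
A short sketch of both modifiers in action. The sample strings and variable names are only for illustration.

```javascript
// Without i, an upper-case scheme does not match
var caseSensitive = /^http:\/\//.test("HTTP://www.codingforums.com");

// With i, upper and lower case are treated the same
var caseInsensitive = /^http:\/\//i.test("HTTP://www.codingforums.com");

// Without g, replace() only touches the first match
var firstOnly = "www.codingforums.com".replace(/\./, " ");

// With g, replace() touches every match
var allDots = "www.codingforums.com".replace(/\./g, " ");

console.log(caseSensitive, caseInsensitive); // false true
console.log(firstOnly); // www codingforums.com
console.log(allDots);   // www codingforums com
```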

dominicall
11-05-2002, 10:11 PM
LOL - that's what I did in the first place but changed it to be verbose.... still returning the error though

Have attached as a text file - VERY strange....

Dominic :(

whammy
11-05-2002, 11:28 PM
Did you check out the regular expression I'm using to validate URLs?

http://www.solidscripts.com/displayscript.asp?sid=10

I got it from http://www.regexlib.com/ .

Seems to work pretty good... I haven't run into any problems with it yet, at any rate. :)

Owl
11-06-2002, 12:07 AM
Hi dominicall,

<form name="checkurl" onsubmit="return isValidURL(this.url.value)">

( ) ( )
>>V
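
The attached file is not part of this archive, so what the original form actually passed to the function is unknown. One plausible cause, and this is only an assumption, is that the function received something other than the field's string value. `test()` converts its argument to a string first, so an object never looks like a URL. The sketch below uses a plain object, `fakeField`, as a stand-in for a form field so it can run outside a browser.

```javascript
var httpPrefix = /^http:\/\//i;

// Stand-in for a form field (hypothetical; a real field is a DOM element)
var fakeField = { value: "http://www.codingforums.com" };

// Passing the object: it is converted to "[object Object]", which does not match
var passedObject = httpPrefix.test(fakeField);

// Passing the string value, as in this.url.value: matches
var passedValue = httpPrefix.test(fakeField.value);

console.log(passedObject, passedValue); // false true
```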

whammy
11-06-2002, 12:33 AM
No offense Owl, but where's the javascript function you're calling (isValidUrl())?

Also, when you're using server-side scripting, it's a lot more reliable than javascript, since you don't have to rely on the client having javascript enabled on their browser.

:)

dominicall
11-06-2002, 12:43 AM
Thanks Owl... that worked....

Re: the client side vs server side validation - I probably go over the top but actually do both... client side since it's instant and gives ppl the chance to change the form before submission and then server side for those without javascript.

All I need to do now is work out how to add the http:// to the front of the url if it hasn't been added by the user...

LOL

Dominic :rolleyes:
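
One possible way to add the missing prefix, reusing the case-insensitive check from earlier in the thread. The function name `addHttpPrefix` is made up, and trimming surrounding whitespace first is an extra assumption that was not asked for.

```javascript
// Hypothetical helper: returns the URL with http:// in front if it was missing
function addHttpPrefix(url) {
  // Assumption: strip leading/trailing whitespace before checking
  var trimmed = url.replace(/^\s+|\s+$/g, "");

  // Already starts with http:// (any case): leave it alone
  if (/^http:\/\//i.test(trimmed)) {
    return trimmed;
  }
  return "http://" + trimmed;
}

console.log(addHttpPrefix("www.codingforums.com"));
console.log(addHttpPrefix("HTTP://www.codingforums.com"));
```

The result can then be handed to whichever full URL pattern is chosen for the real validation.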

whammy
11-06-2002, 01:06 AM
Take a look at the regex I'm using, I think it works pretty good, but I can't guarantee it's totally correct, since I didn't write it. It hasn't given me any grief anyway.

As for the javascript thing, my bad... I didn't realize you were using both methods of validation. In my opinion it's great to do that, as long as you are also validating on the server-side since you can catch errors client-side before they are ever posted to the server.

P.S. dominicall, have you gotten the message about your functions that I posted? I can save you a lot of grief since most of them are unnecessary. ;)

dominicall
11-06-2002, 01:13 AM
Yeah - got your message... sent you a reply...

Send me a msg with your thoughts - look forward to it.

Dominic :D

whammy
11-07-2002, 12:19 AM
Well, you can just use ASP to see if the input is http://:


<%
' Returns True when str starts with http://, https:// or ftp://
Function BeginsWithHTTP(byVal str)
    Dim bghRegEx
    Set bghRegEx = New RegExp
    bghRegEx.Pattern = "^(https?|ftp):\/\/.*$"
    bghRegEx.IgnoreCase = True
    BeginsWithHTTP = bghRegEx.Test(str)
End Function

Response.Write(BeginsWithHTTP("http://www.blah.com") & "<br />" & vbCrLf)
Response.Write(BeginsWithHTTP("https://www.blah.com") & "<br />" & vbCrLf)
Response.Write(BeginsWithHTTP("ftp://www.blah.com") & "<br />" & vbCrLf)
Response.Write(BeginsWithHTTP("www.blah.com") & "<br />" & vbCrLf)
%>


I just tested that and it seems to work. You should also be able to use that regular expression in javascript like this:

if (!/^(https?|ftp):\/\/.*$/i.test(yourstring)) {
    alert("Oh my gosh! You didn't type a valid URL!");
}

;)
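
For comparison with the ASP test above, here is a rough JavaScript version run against the same four inputs. The function name `beginsWithScheme` is made up, and `console.log` stands in for `alert` so it runs outside a browser. Note that the pattern only checks the prefix: anything at all, including nothing, may follow the `://`.

```javascript
// Hypothetical wrapper around the same pattern
function beginsWithScheme(str) {
  return /^(https?|ftp):\/\/.*$/i.test(str);
}

// The same four inputs as the ASP example
var inputs = [
  "http://www.blah.com",
  "https://www.blah.com",
  "ftp://www.blah.com",
  "www.blah.com"
];

var schemeResults = inputs.map(beginsWithScheme);
console.log(schemeResults); // [ true, true, true, false ]

// Prefix-only check: a bare scheme with nothing after it also passes
var bareScheme = beginsWithScheme("http://");
console.log(bareScheme); // true
```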

As for your other functions, I have yet to reply to your email since I haven't had time. But I will show you the right way to do it. I did learn something valuable from the functions that you posted though, so it's all good! :)

beetle
11-07-2002, 12:46 AM
Originally posted by whammy
"^((http|https|ftp):\/\/).*$"

Wouldn't this work?
"^(https?|ftp):\/\/.*$"

whammy
11-07-2002, 12:47 AM
Yeah it would, and it's more elegant. I typed that up rather quickly after looking at some other posts... good catch, though :).

I'll fix my above posts to include that in the regex's.

Although if you want to get nitpicky, we could split that apart even more with pipes, parentheses and question marks... but then it would be more code instead of less, so this is still the best way assuming dominicall's string is not part of a larger picture. ;)
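
A quick sanity check that the longer and shorter forms of the pattern agree. The input list and variable names are arbitrary, and agreeing on a handful of samples is evidence rather than proof.

```javascript
// The original longer form and the shorter form suggested by beetle
var longForm = /^((http|https|ftp):\/\/).*$/i;
var shortForm = /^(https?|ftp):\/\/.*$/i;

// Arbitrary sample inputs, including a few that should fail
var checkInputs = [
  "http://www.blah.com",
  "https://www.blah.com",
  "FTP://www.blah.com",
  "httpss://www.blah.com",
  "www.blah.com",
  ""
];

// True only if both patterns give the same answer for every input
var sameEverywhere = checkInputs.every(function (s) {
  return longForm.test(s) === shortForm.test(s);
});

console.log(sameEverywhere); // true
```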


