Validate UK phone number with reg exp
dominicall
10-19-2002, 03:21 PM
Arrrgggghhhhhhhhh - help.... - LOL
Now I've got that over with, hope someone can help.
Am just learning regular expressions and am trying to validate UK format telephone numbers (have had an extensive look on the web and only examples I can find are US tests). Anyway, I want to learn to do it myself.
UK phone number rules are:
Must start with a 0 in position 1
Next 2 to 4 characters must be a number
Next character can be a space or a dash
Next 6 to 9 characters can be a number, space or dash only
Anyway, as I say, am new to regular expressions and trying to learn so I tried to write the expression to fit the rules above - see below:
^[0]{1}\d{2,4}\s|-{1}\d|\s|-{6,9}$
My logic (for what it's worth - LOL) tells me this should work, but it doesn't - it accepts letters as well - which I really don't understand yet.
Anyway - hope someone can help.
Thanks
Dominic :confused:
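A possible explanation for the letters getting through, sketched in JavaScript: "|" has the lowest precedence in a regular expression, so the pattern above is read as four separate alternatives rather than one sequence. The helper name matchingAlternatives is made up for illustration.

```javascript
// The pattern exactly as posted above.
const original = /^[0]{1}\d{2,4}\s|-{1}\d|\s|-{6,9}$/;

// "|" binds more loosely than everything around it, so the engine reads
// the pattern as these four independent alternatives.
const alternatives = [
  /^[0]{1}\d{2,4}\s/, // 0, then 2-4 digits, then one whitespace character
  /-{1}\d/,           // a dash followed by a digit, anywhere in the string
  /\s/,               // any single whitespace character, anywhere
  /-{6,9}$/,          // a run of six or more dashes at the end of the string
];

// Hypothetical helper: lists the source of each alternative that matches.
function matchingAlternatives(input) {
  return alternatives.filter((re) => re.test(input)).map((re) => re.source);
}

console.log(original.test("hello world"));        // true
console.log(matchingAlternatives("hello world")); // only the lone \s alternative
console.log(original.test("helloworld"));         // false
```

So any input containing a space passes, letters or not. Grouping the choices in a character class such as [ -] removes the stray alternation.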
dominicall
10-19-2002, 03:32 PM
Hmmm.... think I figured it out, but would be good if someone could just verify this for me... I think it works???
^[0]{1}\d{2,4}([\s|-]){1}([\d|\s|-]){6,9}$
Let me know
Thanks
Dominic :confused: or :D ... not quite sure
Given your description of the pattern, I think:
^0\d{2,4}[ -][\d -]{6,9}$
Is correct.
dominicall
10-19-2002, 07:38 PM
Thanks jkd
A tidier, more concise version of my re-write.
Am getting there - slowly - LOL
Dominic :)
whammy
10-19-2002, 10:31 PM
I wanted to see what I came up with without looking... and I came up with:
^0\d{2,4}[\s-][\d\s-]{6,9}$
Which only differs from jkd's in the fact that he has a literal space instead of \s :)
So I like his better:
^0\d{2,4}[ -][\d -]{6,9}$
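To make the difference between the two versions concrete, here is a small JavaScript sketch that runs both patterns over a few inputs. The sample strings are chosen for illustration; the one containing a tab is the case where the two patterns disagree.

```javascript
// jkd's version: a literal space inside the character classes.
const withLiteralSpace = /^0\d{2,4}[ -][\d -]{6,9}$/;
// whammy's version: \s, which also matches tabs and other whitespace.
const withWhitespaceClass = /^0\d{2,4}[\s-][\d\s-]{6,9}$/;

const samples = [
  "020 1234 5678",
  "020-1234-5678",
  "01200 123456",
  "020\t1234 5678", // tab instead of a space after the dialling code
  "0abc 123456",    // letters where digits are required
];

for (const sample of samples) {
  console.log(
    JSON.stringify(sample),
    withLiteralSpace.test(sample),
    withWhitespaceClass.test(sample)
  );
}
```

Only the tab-separated sample differs: the literal-space pattern rejects it and the \s pattern accepts it.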
Nightfire
10-19-2002, 10:43 PM
Just a quick question, does the script make sure there are 2 - 4 numbers in the area code? or have I misread it, coz the maximum is 5
I don't know anything about reg exp, that's why I asked :p
whammy
10-20-2002, 04:48 AM
Actually, that one says a 0 followed by 2-4 numbers. Which does equal 5 in total, 3 at the least...
joh6nn
10-20-2002, 06:14 AM
just wanted to throw this in real quick:
http://evolt.org/article/Usable_Forms_for_an_international_audience/4090/15118/index.html
dominicall
10-20-2002, 06:26 AM
Update...
I'd been doing some more work to tidy things up with the reg exp when I saw Nightfire's question.
Unfortunately, unlike US numbers, there is no absolute standard for UK phone numbers - all the below are valid numbers:
020 1234 5678 or could be 020-1234-5678
01200 123456
01234 567 8901
That's why after checking for a 0 in position 1 the expression then checks for 2 to 4 numbers. I'm allowing either spaces or dashes only as separators.
Anyway, I've updated the expression to:
^0\d{2,4}[ -]{1}[\d]{3}[\d -]{1}[\d -]{1}[\d]{1,4}$
...which is a bit more robust, but still allows someone to enter an invalid number such as:
01234 12345678
... which is a pain in the butt...
Since I'm still learning I'm forgiving myself this right now because it still does enough of a check to prevent people just putting lots of dashes, spaces or letters as garbage.
Unless someone can think of a way of doing it all in 1 reg exp then I'm going to break it up and check length of dialling code (1st part of number) in step one, then check the length of the second (and 3rd if appropriate) parts of the number to give a definite answer.
Will post the final solution up here since am sure, like me, there are plenty of people that will need to validate UK numbers.
:thumbsup:
Dominic
dominicall
10-20-2002, 07:01 AM
Hi all
Thought I'd go to the BT site to see how they validate the numbers (BT being the major carrier in the UK).
Anyway, laughingly they ask for the number to be input without any spaces - I suppose that's one solution.
I know that's not exactly what the forum is for but it made me chuckle so I thought I'd pass it on.
:)
Dominic
Philip M
10-20-2002, 11:57 AM
This may not be entirely apposite, but why do you want to validate the phone number?
If you want to ensure that the user submits A phone number,
(as opposed to blanks, rude words etc.) then all you need is to weed out anything but numbers, spaces and hyphens.
It is always annoying when forms insist on a particular input format:-
020-7234-5678 or 020 7234 5678 or 0207-234-5678
usually without making this clear in advance. Why not allow any input format and then simply weed out everything but numbers to leave 02012345678?
But if you are trying to harvest phone numbers for some marketing purpose, and make it compulsory to include a phone number, then the user can (and if he is sensible does) submit a made up number (or the number of someone he dislikes!) For myself I would never give out my real (ex-directory) number.
Also, validating to a UK format prevents overseas people from using the form. This is a common problem with USA order forms which often seem to imagine that phone numbers (or postcodes) across the world correspond with the format of US ones.
Another validation problem is forms that reject names or addresses with unusual characters in them - Mr O'Hara, Smith & Sons Inc., "My House Name", Westward Ho!, John Smith (1985) Ltd. and so forth.
In like manner, an email address can be valid syntax but still wrongly spelled etc.
In short, excessively restrictive form validation can turn out to be too clever by half and a pain in the undercarriage for the user. I am speaking more as a user than as a programmer.
whammy
10-20-2002, 03:52 PM
You're right - for instance, for the USA and CANADA I match the following:
1 (123) 456-7890
(123) 456-7890
*123*.456.7890
1234567890
1-123-456-7890
etc., since you never know what users are going to enter. IF the phone number matches the above, I format it for display using a regular expression:
fpRegEx.Pattern = "^\D*(1)?\D*([1-9]\d{2})\D*(\d{3})\D*(\d{4})\D*$"
FormatPhoneNumber = fpRegEx.Replace(sPhone,"($2) $3-$4")
which ends up nicely as:
(123) 456-7890
However, since pure numbers are cleaner in the database, and that's what most clients want, I simply strip everything but numbers when writing it to a database:
enRegEx.Pattern = "\D"
enRegEx.Global = True
ExtractNumbers = enRegEx.Replace(str,"")
I just use the honor system if someone NOT in the USA or CANADA enters a phone number (there are way too many different foreign formats to validate and, as mentioned in the article John posted above, not validating prevents frustration for users whose phone number doesn't follow that format), and use the above method to leave only the numbers entered.
Also, to make it really clear, I might display an error like:
* Invalid phone number for USA (just in case they accidentally entered the wrong country).
The above is VBScript, but the same regular expressions can be used in JavaScript. :)
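For anyone wanting the JavaScript equivalent, here is a rough port of the two VBScript snippets above. The regular expressions are unchanged; the function names are only suggestions, and returning null for a non-matching number is a choice made for this sketch (the VBScript Replace would hand back the input unchanged).

```javascript
// Same pattern as fpRegEx.Pattern above.
const NANP_PATTERN = /^\D*(1)?\D*([1-9]\d{2})\D*(\d{3})\D*(\d{4})\D*$/;

// Formats a matching USA/Canada number as "(123) 456-7890".
// Returns null when the input does not match (a choice made for this sketch).
function formatPhoneNumber(phone) {
  if (!NANP_PATTERN.test(phone)) return null;
  return phone.replace(NANP_PATTERN, "($2) $3-$4");
}

// Same idea as enRegEx with Global = True: strip everything but digits.
function extractNumbers(str) {
  return str.replace(/\D/g, "");
}

const inputs = [
  "1 (123) 456-7890",
  "(123) 456-7890",
  "*123*.456.7890",
  "1234567890",
  "1-123-456-7890",
];

for (const input of inputs) {
  console.log(formatPhoneNumber(input)); // each prints (123) 456-7890
}
console.log(extractNumbers("(123) 456-7890")); // 1234567890
```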
For dealing with various forms of non-numeric input in things that can be represented solely by numbers, I typically strip all of the non-digits out first:
phone.replace(/\D/g, '')
Now I can check the length, check groups of numbers, etc. Generally makes your life easier.
dominicall
10-20-2002, 04:25 PM
Thanks for your comments Philip.
Just to clear up a few points, since I obviously didn't make myself quite clear enough on a couple of points.
The numbers are being collected from client companies who will pay to use the service on the site I am developing - not from the general public.
I currently work in the Direct Marketing industry and am a strict follower of the UK Data Protection Act. Indeed, even though I am a few months yet from launching my business, one of my first acts after registering the business was to register with the Information Commissioner (formerly the Data Protection Registrar). I, like most people I am sure, hate un-requested email and/or telephone calls.
My need for validating the telephone numbers was to ensure that I didn't receive absolute rubbish from clients - unlikely I know given that they're paying for the service but a useful check.
I had also come to the conclusion that the best course was to remove anything that isn't a number and just check that the phone number fulfils the minimum and maximum number of digits allowed in a UK phone number.
This of course still allows clients to put in unreal numbers, but some validation is better than none at all.
From an international perspective, at the launch of the business, and for at least the first six months of trading, my only clients will be UK clients. Thereafter, I will probably widen the net to include overseas clients. Then I'll have a look at the new validation requirements and re-work then.
If nothing else, given that I'm learning regular expressions it has been a useful exercise in getting my head around how to use reg expressions for validation and what a useful tool they are.
Certainly don't mean to flame you and apologies if that's how it appears. Your points are very valid and as I said, I'd reached very much the same conclusion myself.
Dominic :)
ensign
10-20-2011, 12:43 AM
UK phone number rules are:
Must start with a 0 in position 1
Next 2 to 4 characters must be a number
Next character can be a space or a dash
Next 6 to 9 characters can be a number, space or dash only
Your specifications for UK telephone numbers are wide of reality by a fair margin.
UK numbers start with one or other of two things:
- the +44 country code and a space, or
- the 0 national-dialling trunk-code.
Not counting the +44 or the 0, the rest of the number usually has either 9 or 10 digits (apart from two special cases with only 7 digits: 0800 1111 and 0845 46 4X).
After the +44 or 0, the area code follows and usually has between 2 and 5 digits; e.g. 29, 118, 161, 1486, 13873.
The subscriber number comes last and usually has between 4 and 8 digits. If the subscriber number has 7 or 8 digits, it is split with a space placed 4 digits from the end.
Longer area codes are paired with shorter subscriber numbers, and vice versa, so that there is a total of 9 or 10 digits.
all the below are valid numbers:
020 1234 5678 or could be 020-1234-5678
01200 123456
01234 567 8901
Subscriber numbers that begin with a 0 or 1 are NDO or National Dialling Only numbers.
Ordinary subscriber numbers begin only with the digits 2 to 9.
NDO numbers require the area code to be dialled even when called from within the same area.
For 01 and 02 numbers the area code should be in parentheses when expressed in national format (except for NDO numbers).
The last example in your list has too many digits.
The following list covers the range of formats in use in the UK from 1999/2000 onwards (note: 03 numbers were introduced a little later):
10 digit NSNs
(013873) xxxxx
(015242) xxxxx
(015394) xxxxx
(015395) xxxxx
(015396) xxxxx
(016973) xxxxx
(016974) xxxxx
(016977) xxxxx
(017683) xxxxx
(017684) xxxxx
(017687) xxxxx
(019467) xxxxx
(011x) xxx xxxx
(01x1) xxx xxxx
(01xxx) xxxxxx
(02x) xxxx xxxx
03xx xxx xxxx
055 xxxx xxxx
056 xxxx xxxx
070 xxxx xxxx
07624 xxxxxx
076 xxxx xxxx
07xxx xxxxxx
08xx xxx xxxx
09xx xxx xxxx
9 digit NSNs
(016977) 2xxx
(016977) 3xxx
(01204) xxxxx
(01208) xxxxx
(01254) xxxxx
(01276) xxxxx
(01297) xxxxx
(01298) xxxxx
(01363) xxxxx
(01364) xxxxx
(01384) xxxxx
(01386) xxxxx
(01404) xxxxx
(01420) xxxxx
(01460) xxxxx
(01461) xxxxx
(01480) xxxxx
(01488) xxxxx
(01524) xxxxx
(01527) xxxxx
(01562) xxxxx
(01566) xxxxx
(01606) xxxxx
(01629) xxxxx
(01635) xxxxx
(01647) xxxxx
(01659) xxxxx
(01695) xxxxx
(01726) xxxxx
(01744) xxxxx
(01750) xxxxx
(01768) xxxxx
(01827) xxxxx
(01837) xxxxx
(01884) xxxxx
(01900) xxxxx
(01905) xxxxx
(01935) xxxxx
(01946) xxxxx
(01949) xxxxx
(01963) xxxxx
(01995) xxxxx
0500 xxxxxx
0800 xxxxxx
7 digit NSNs
0800 1111
0845 46 4x
The above list covers ALL UK number ranges and their required format. NSN is National Significant Number and is all of the digits after the +44 country code or the 0 trunk code.
Valid formats for geographic numbers include 2+8, 3+7, 4+6, 4+5, 5+5 and 5+4 (and 0+10 for NDO numbers).
Non-geographic numbers mostly use 0+10 format, but some 0800 numbers and all 0500 numbers use 0+9 format.
NDO numbers are National Dialling Only. These have been around for several decades and need the area code to be dialled even when called locally from within the same area. These numbers are used for alarm systems, computer communication systems and other lines that are not dialled for voice calls. They are also used as the termination point for non-geographic numbers. NDO numbers are not supposed to be advertised nor directly called by subscribers. NDO numbers are always 0+10 format. The subscriber number always begins with a 0 or 1.
Philip M
10-20-2011, 07:50 AM
Ensign - So what? This is a very ancient thread, long finished.
ensign
10-20-2011, 08:08 AM
This thread ranks on the first page of Google results with the original incorrect information for the UK. We're just fixing an application that was seemingly built to the specification detailed by the OP in the first post.
Obtaining the correct information for the UK has been very difficult, especially since there appear to be errors in the official lists. After a lot of digging to find the right stuff, we thought that it would be a good idea to share the correct information for the benefit of future coders.
A task that was marked as "trivial - 0.5 hours" actually took a whole day to research and fix. Number formats used in various places in and around Cumbria took the most time to get completely right.
Philip M
10-20-2011, 09:04 AM
Ensign - OK, but see my remarks in Post #11. Excessive validation may turn round and bite you - as indeed it seems to have done already. It ought to be enough to check that
a) only digits are entered (strip non-digits for checking) - or allow only digits, spaces and hyphens in the input box (onkeyup).
b) first digit is 0
c) total digits are 10 or 11.
The biggest risk is that the user simply gets the number plain wrong - transposes digits etc.
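Checks a) to c) could be sketched in JavaScript roughly as follows. The function name is made up, and the sketch takes the "digits, spaces and hyphens only" reading of a).

```javascript
// Minimal sketch of checks a) to c); the function name is hypothetical.
function passesBasicUkCheck(input) {
  // a) Only digits, spaces and hyphens may appear in the raw input.
  if (/[^\d -]/.test(input)) return false;

  // Strip the separators so only digits remain for the length checks.
  const digits = input.replace(/\D/g, "");

  // b) First digit is 0, and c) total digits are 10 or 11.
  return digits.startsWith("0") && (digits.length === 10 || digits.length === 11);
}

console.log(passesBasicUkCheck("020 7234 5678"));    // true
console.log(passesBasicUkCheck("016977 2345"));      // true
console.log(passesBasicUkCheck("+44 20 7234 5678")); // false
console.log(passesBasicUkCheck("0123 ------"));      // false
```

Note that a check this simple turns away the two 7-digit special cases mentioned earlier in the thread (0800 1111 and 0845 46 4X), as well as anything typed with a +44 prefix.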
ensign
10-20-2011, 09:32 AM
Fair point about validation. The final form of validation we used was to allow almost any input format but to check the number range for validity and to also check the number length.
This is what we did:
- allow a + sign, spaces, hyphens, periods and parentheses anywhere the user wanted to type them,
- strip off leading +44, (+44), (44), 0, (0), +44 (0) or whatever user had typed,
- remove spaces and punctuation,
- check the remaining length is 9 or 10 digits, rejecting incorrect length,
- check the number range is valid, rejecting numbers not on the list on the previous page (such as 04xxxxxxxxx etc).
The leading zero is added back on and the number is stored as a 10 or 11 character string.
The format list on the previous page is used when printing the number. It clearly shows the area code and subscriber number as two separate entities.
The number is echoed back to the customer in what we believe to be the correct format. If they don't agree with that, then they can alter it and whatever they amend it to will be stored "as is", with a flag to show the customer amended it. This allows people to enter foreign numbers if they need to (recently a Spanish +34 mobile number for example). This is very rare but it would be a UI failure to not accept other stuff.
When the database returns a 10 or 11 digit number beginning with a 0, it is formatted as per the list on the previous page when printed. If the "amended" flag is set, then the number is printed "as is" with the text "(unverified)" shown immediately after.
Hope the information is useful to anyone trying to code for UK telephone numbers!
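The steps listed above might look something like this in JavaScript. This is only a sketch under stated assumptions, not the code actually used: the function name is invented, and the range check is reduced to the first digit of the NSN (1, 2, 3, 5, 7, 8 or 9, going by the format list earlier in the thread) instead of the full list.

```javascript
// Sketch of the normalisation steps; name and range check are assumptions.
function normaliseUkNumber(input) {
  // Allow digits plus +, whitespace, hyphens, periods and parentheses anywhere.
  if (!/^[\d+\s().-]+$/.test(input)) return null;

  // Remove spaces and punctuation, leaving digits only.
  const digits = input.replace(/\D/g, "");

  // Strip a leading 44 country code and/or a leading 0 trunk code.
  // This covers +44, (+44), (44), 0, (0) and +44 (0) once punctuation is gone.
  const nsn = digits.replace(/^(?:44)?0?/, "");

  // The remaining NSN must be 9 or 10 digits long.
  if (nsn.length !== 9 && nsn.length !== 10) return null;

  // Coarse stand-in for the range check: reject ranges such as 04 and 06.
  if (!/^[1235789]/.test(nsn)) return null;

  // Add the leading zero back and store a 10 or 11 character string.
  return "0" + nsn;
}

console.log(normaliseUkNumber("+44 (0)20 7234 5678")); // 02072345678
console.log(normaliseUkNumber("(016977) 2345"));       // 0169772345
console.log(normaliseUkNumber("0412 345 6789"));       // null
```

A foreign number such as the Spanish +34 example fails the length check here, which is where the "customer amended" route described above would take over.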
jimbo99
10-22-2011, 10:01 AM
Ensign:
I too need a working PHP (not javascript) regex that allows users to enter telephone numbers on a form - both UK and International - that may include +, spaces, dashes and (). Number length can vary considerably so is there a regex that does this and prevents code injection using server side checking? I do not want a complex strip-out as above in post #19.
This thread may be old but the problem of code injection is getting worse and users still need some input flexibility.
ensign
10-28-2011, 11:26 AM
What do you want the RegEx to do? Does it just need to check the telephone number has the right number of digits, or does it need to be able to check the number range is valid too? For example, with some extra work it could reject a number with the 0171 area code as non-valid while accepting 0161 as a valid UK area code.
When storing the numbers, it is vital to do so in a consistent format, preferably also with the country code. When area codes change within a particular country, as they do from time to time, you need a simple way to be able to update those numbers in your database. If you store numbers in a variety of formats, imagine trying to follow the instruction "From the 1st of next month, numbers in country X are changing such that area code 1 changes to 21, area code 3 changes to 33 and area code 55 changes to 75".
Philip M
11-02-2011, 08:54 AM
Ensign:
I too need a working PHP (not javascript) regex ......
So you should post in the PHP forum.
jassi.singh
11-03-2011, 10:06 AM
Hi,
Check this link: http://regexlib.com/REDetails.aspx?regexp_id=495&AspxAutoDetectCookieSupport=1
ensign
11-08-2011, 10:55 PM
The fact that the RegEx pattern begins 020[7,8] is ample proof that the pattern is totally inadequate.
It fails to match London numbers in the (020) 3 range as well as numbers in the 023, 024, 028 and 029 area codes.
Refer to the list at the bottom of page 1 of this thread for the full list of UK formats.
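A quick JavaScript illustration of the problem, using only the opening fragment quoted above (the rest of the linked pattern is not reproduced here). Inside a character class the comma is a literal character, not a separator. The alternative fragment is just a sketch built from the area codes named in this post.

```javascript
// Only the opening fragment quoted above; the full linked pattern is not shown.
const quotedFragment = /^020[7,8]/;

console.log(quotedFragment.test("0207")); // true
console.log(quotedFragment.test("0208")); // true
console.log(quotedFragment.test("020,")); // true: the comma is a literal in the class
console.log(quotedFragment.test("0203")); // false: the (020) 3 range is missed
console.log(quotedFragment.test("0239")); // false: 023 is missed, as are 024, 028 and 029

// Sketch of a fragment covering the 02x area codes named above (020, 023, 024, 028, 029).
const areaCodeFragment = /^02[03489]/;
console.log(["0203", "0207", "0239", "0247", "0289", "0292"].every((s) => areaCodeFragment.test(s)));
```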