tagnu 08-26-2008, 01:18 PM Hi all,
One more regex question.
I'd like to retrieve the original URLs of sites from Yahoo search results.
For example:
1. www.example.com
2. subdomain.example1.com (subdomain)
If I go for this expression:
http://*[^/]*
I'll get only http://uk.wrs.yahoo.com/ from both links.
But how do I retrieve the highlighted sites (www.example.com and subdomain.example1.com)?
http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A54S7HAx.;_ylu=X3oDMTEwZTl2dThqBHNlYwNzcgRwb3MDNwRjb2xvA2luMl9pbnRsBHZ0aWQD/
SIG=11kp4e70q/EXP=1219835696/**http%3A//www.example.com/index.html
http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A5YS7HAx.;_ylu=X3oDMTEwaXAxMWJuBHNlYwNzcgRwb3MDNQRjb2xvA2luMl9pbnRsBHZ0aWQD/
SIG=11r0pjgq9/EXP=1219835696/**http%3A//subdomain.example1.com/index.html
Thank you
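To see why the first attempt stops at the Yahoo host, here is a minimal sketch. The link is a shortened stand-in for the ones above (the token segments are made up), and the pattern is the same expression written as a JavaScript literal with the slashes escaped.

```javascript
// Shortened stand-in for the redirect links in the post (token segments are made up).
var link = "http://uk.wrs.yahoo.com/_ylt=abc/SIG=123/EXP=456/**http%3A//www.example.com/index.html";

// http://*[^/]* as a JS literal: "http:/", then zero or more "/",
// then a run of characters that are not "/".
var pattern = /http:\/\/*[^\/]*/g;
var matches = link.match(pattern);

// Only the wrapper host matches: the target's colon is encoded as %3A,
// so the literal text "http:" never appears a second time.
console.log(matches);
```

The single match is the redirect host, which is why the target site has to be found by looking for the encoded form instead.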
Philip M 08-26-2008, 01:57 PM This should move you forward:-
<script type = "text/javascript">
var a = "http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A54S7HAx.;_ylu=X3oDMTEwZTl2dThqBHNlYwNzcgRwb3MDNwRjb2xvA2luMl9pbnRsBHZ0aWQD/" + "SIG=11kp4e70q/EXP=1219835696/**http%3A//www.example.com/index.html"
var b = "http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A5YS7HAx.;_ylu=X3oDMTEwaXAxMWJuBHNlYwNzcgRwb3MDNQRjb2xvA2luMl9pbnRsBHZ0aWQD/" + "SIG=11r0pjgq9/EXP=1219835696/**http%3A//subdomain.example1.com/index.html"
// Match from the encoded "http%3A" to the end of the string, then decode the colon.
var x = a.match(/(http%3A.+)/);
x[0] = x[0].replace(/%3A/, ":");
alert(x[0]);
var y = b.match(/(http%3A.+)/);
y[0] = y[0].replace(/%3A/, ":");
alert(y[0]);
</script>
With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead.
abduraooft 08-26-2008, 02:51 PM "With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead." Lol, I thought the above code was for something else.
Cranford 08-26-2008, 02:54 PM <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Any Title</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<script type="text/javascript">
var nStr1 = "http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A54S7HAx.;_ylu=X3oDMTEwZTl2dThqBHNlYwNzcgRwb3MDNwRjb2xvA2luMl9pbnRsBHZ0aWQD/SIG=11kp4e70q/EXP=1219835696/**http%3A//www.example.com/index.html";
var nStr2 = "http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A5YS7HAx.;_ylu=X3oDMTEwaXAxMWJuBHNlYwNzcgRwb3MDNQRjb2xvA2luMl9pbnRsBHZ0aWQD/SIG=11r0pjgq9/EXP=1219835696/**http%3A//subdomain.example1.com/index.html";
function getDomain(urlStr){
var nDomain = urlStr.substring(urlStr.lastIndexOf('//')+2,urlStr.lastIndexOf('/'));
return nDomain;
}
function init(){
alert(getDomain(nStr1));
alert(getDomain(nStr2));
}
onload = init;
</script>
</head>
<body>
</body>
</html>
Philip M 08-26-2008, 03:02 PM Another example showing that there are more ways than one of killing a cat.
tagnu 08-29-2008, 01:13 PM Thank you Philip, the code works fine.
But in my case, the target sometimes appears as 'http://example.com' rather than 'http%3A//example.com'.
I was really looking for an expression that would accommodate both cases.
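One way to cover both forms in a single expression is an alternation on the separator. This is only a sketch: `targetOrigin` is a hypothetical helper name, the links are shortened stand-ins, and it assumes the target always follows the `**` marker seen in the sample links.

```javascript
// Hypothetical helper: returns scheme + host of the target inside a redirect link,
// or null when the link does not contain "**http...".
function targetOrigin(link) {
  // (?::|%3A) accepts either a literal colon or its percent-encoded form.
  var m = link.match(/\*\*(https?)(?::|%3A)\/\/([^\/]+)/i);
  return m ? m[1] + "://" + m[2] : null;
}

// Shortened stand-ins for the links in the thread (token segments are made up).
var encoded = "http://uk.wrs.yahoo.com/_ylt=abc/SIG=123/EXP=456/**http%3A//www.example.com/index.html";
var plain = "http://uk.wrs.yahoo.com/_ylt=abc/SIG=123/EXP=456/**http://subdomain.example1.com/index.html";

console.log(targetOrigin(encoded));
console.log(targetOrigin(plain));
```

Anchoring on `**` keeps the Yahoo host out of the result, so no second pass is needed to discard it.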
tagnu 08-29-2008, 01:18 PM
Cranford, thank you for the effort; your snippet suits my needs and I'm going with it for now. But I'm still curious whether there's a regex for this.
Prepending 'http://' to nDomain makes it easier to use the result as a URL attribute on any HTML element.
Without 'http://', the browser treats the value as a relative URL and resolves it against the current page. In that case, you'll get output like
'http://uk.wrs.yahoo.com/example.com'
function getDomain(urlStr){
var nDomain = urlStr.substring(urlStr.lastIndexOf('//')+2,urlStr.lastIndexOf('/'));
return 'http://' + nDomain;
}
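The relative-URL behaviour described above can be reproduced outside a page with the standard `URL` constructor, which resolves its first argument against a base. This is a sketch under the assumption that the current page is http://uk.wrs.yahoo.com/; the `URL` API postdates this thread but is available in current browsers and in Node.

```javascript
// Assumed address of the page that contains the link.
var page = "http://uk.wrs.yahoo.com/";

// No scheme: the value is treated as a relative path under the current page.
var withoutScheme = new URL("example.com", page).href;

// With a scheme: the value is absolute, so the base is ignored.
var withScheme = new URL("http://example.com", page).href;

console.log(withoutScheme);
console.log(withScheme);
```

The first result points back into the Yahoo host, the second at example.com itself, which is the reason for returning 'http://' + nDomain.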
P.S. I'm learning regex with Expresso (http://www.ultrapico.com/ExpressoDownload.htm); I'll update this post as soon as I find a good regex.
Philip M 08-30-2008, 07:43 AM
<script type = "text/javascript">
var a = "http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A54S7HAx.;_ylu=X3oDMTEwZTl2dThqBHNlYwNzcgRwb3MDNwRjb2xvA2luMl9pbnRsBHZ0aWQD/" + "SIG=11kp4e70q/EXP=1219835696/**http%3A//www.example.com/index.html"
var b = "http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A5YS7HAx.;_ylu=X3oDMTEwaXAxMWJuBHNlYwNzcgRwb3MDNQRjb2xvA2luMl9pbnRsBHZ0aWQD/" + "SIG=11r0pjgq9/EXP=1219835696/**http%3A//subdomain.example1.com/index.html"
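// Decode the first %3A so "http:" and "http%3A" targets look the same.
a = a.replace(/%3A/, ":");
// Require one character before "http" so the leading Yahoo URL is skipped;
// the capture group x[1] then holds the target URL.
var x = a.match(/.(http.+)/);
alert(x[1]);
b = b.replace(/%3A/, ":");
var y = b.match(/.(http.+)/);
alert(y[1]);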
</script>
Taking the liberty of modifying Cranford's solution:-
function getDomain(urlStr){
var nDomain = urlStr.substring(urlStr.lastIndexOf('**')+2,urlStr.lastIndexOf('/'));
return nDomain;
}
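With the `**` variant the returned value keeps the scheme, but the colon is still percent-encoded. A possible follow-up, sketched here with the hypothetical name `getTarget` and a shortened stand-in link, is to run the result through the built-in `decodeURIComponent`.

```javascript
// Hypothetical variant of getDomain: slice from just after "**" up to the last "/",
// then decode percent-escapes such as %3A.
function getTarget(urlStr) {
  var start = urlStr.lastIndexOf("**") + 2;
  var end = urlStr.lastIndexOf("/");
  return decodeURIComponent(urlStr.substring(start, end));
}

// Shortened stand-in for the links in the thread (token segments are made up).
var sample = "http://uk.wrs.yahoo.com/_ylt=abc/SIG=123/EXP=456/**http%3A//www.example.com/index.html";
console.log(getTarget(sample));
```

One caveat: this relies on the target having a path after the host. If the link ends at the host name, the last "/" is one of the "//" pair and the result is cut short.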
You can test your regular expressions at: http://www.claughton.clara.net/regextester.html
tagnu 08-30-2008, 04:19 PM Thank you!
Got the regex http(.){1,4}\/\/[^\/]*/g
Description:
http followed by
(.){1,4} any characters, min 1 and max 4 (to match : and %3A, and also the extra s of https),
\/\/ then // (escaped, so \/\/)
[^\/]* any run of characters other than / (escaped, so \/)
/g return all occurrences of the match
var urlStr = "http://in.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A54S7HAx.;_ylu=X3oDMTEwZTl2dThqBHNlYwNzcgRwb3MDNwRjb2xvA2luMl9pbnRsBHZ0aWQD/SIG=11kp4e70q/EXP=1219835696/**http%3A//www.thesdf.org/index.html"
var res = urlStr.match(/http(.){1,4}\/\/[^\/]*/g) || []; // match() returns null when nothing matches
document.write("count:" + res.length + "<br />");
for (var i = 0; i < res.length; i++)
document.write(res[i] + "<br />");
P.S. Don't forget the /g.
With the g flag, match() returns an array of all matches; without it, the array holds only the first match (plus any captured groups). Either way, it returns null if no match is found.
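The difference is easy to check on a small string. This is a minimal sketch; the text and the a.example / b.example hosts are made up for illustration.

```javascript
var text = "http://a.example/x and http://b.example/y";

// With g: every full match, capture groups are dropped.
var all = text.match(/http:\/\/([^\/ ]+)/g);

// Without g: the first match only, with the capture group at index 1.
var first = text.match(/http:\/\/([^\/ ]+)/);

// No match at all: null, with or without g.
var none = "no links here".match(/http:\/\/([^\/ ]+)/g);

console.log(all, first[0], first[1], none);
```

Because of the null case, it is safer to guard the result (for example with `|| []`) before reading `.length`.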
I'm learning!
Helpful resources: http://www.javascriptkit.com/javatutors/redev3.shtml
Philip M 08-30-2008, 04:30 PM Thank you!
Got the regex http(.){1,3}\/\/[^\/]*/g
Description:
http followed by
(.){1,3} any characters, min 1 and max 3 (to match : and %3A),
\/\/ and // (escaped so \/\/)
[^\/]* any character except / (escaped so \/)
/g return all occurrences
To be picky, that does not work for https:// when the colon is encoded (https%3A// needs four characters: s%3A).
So make it http(.){1,4}\/\/[^\/]*/g
tagnu 08-30-2008, 07:20 PM
That's true! Thanks for pointing that out.
So a better regex is
http(.){1,4}\/\/[^\/]*/g
Updated the previous post too.
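Note that with /g this expression also matches the Yahoo redirect host at the start of the link, so the target is the last entry of the result rather than the first. A minimal sketch of picking it out, using a shortened stand-in link with made-up token segments:

```javascript
// Shortened stand-in for the links in the thread (token segments are made up).
var urlStr = "http://uk.wrs.yahoo.com/_ylt=abc/SIG=123/EXP=456/**http%3A//www.example.com/index.html";

// The regex from the post; fall back to an empty array when nothing matches.
var res = urlStr.match(/http(.){1,4}\/\/[^\/]*/g) || [];

// res[0] is the redirect host; the last entry is the target, still with %3A.
var target = res.length ? res[res.length - 1].replace(/%3A/i, ":") : null;

console.log(res.length, target);
```

Taking the last match is an assumption that the target is always the final URL in the link, as it is in the samples above.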