
I can not seem to extract this domain from this url



jeddi
03-23-2010, 12:59 PM
Hello,

I am trying to use parse_url() to extract the domain from
the url.

This is the function I am using:



function GetDomain($url) {
$nowww = trim(ereg_replace('www\.','',$url));
$domain = parse_url($nowww);
if(!empty($domain["host"])) {
return $domain["host"];
}
else {
return "No DOM"; // $domain["path"];
}
}

$m_url = trim($sal_page_lin);
$m_url = ereg_replace('http:\/\/','',$m_url);
$m_dom = GetDomain($m_url);

write_log("$cnt ) INSERTED to c2s_urls: Id: $prod->id, URL: $m_url DOM: $m_dom\r\n");



The output from my log gives:


1 ) INSERTED to c2s_urls: Id: 8988, URL: www.700inaday.com DOM: No DOM
2 ) INSERTED to c2s_urls: Id: 9223, URL: www.selfemployedcourier.com/forAffiliates.html DOM: No DOM
3 ) INSERTED to c2s_urls: Id: 9698, URL: www.howtomarketexperts.com DOM: No DOM

So for some reason my GetDomain() function isn't working.

Can anyone see what I have done wrong ?


Thanks.



.

angst
03-23-2010, 01:17 PM
What are you trying to get? Just .com?
Using host will only return www., but you're already removing that.

jeddi
03-23-2010, 01:52 PM
Yes,

I am trying to get the root domain.

i.e. in the examples given I would expect this output:

1 ) INSERTED to c2s_urls: Id: 8988, URL: www.700inaday.com DOM: 700inaday.com
2 ) INSERTED to c2s_urls: Id: 9223, URL: www.selfemployedcourier.com/forAffiliates.html DOM: selfemployedcourier.com
3 ) INSERTED to c2s_urls: Id: 9698, URL: www.howtomarketexperts.com DOM: howtomarketexperts.com


And from something like this:

justin.howard.stars.net/profiles/march23.php?id=ght466&help=yes

I would expect:
stars.net

So is there an attribute I can use to get the root domain ?


Thanks.


.
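
For what it's worth, parse_url() has no component for the root domain; 'host' gives the full host name, subdomains included. A rough sketch, assuming the root domain is always the last two dot-separated labels (naiveRootDomain is a made-up name, not a PHP function, and this assumption breaks for suffixes such as .co.uk, where something like the Public Suffix List would be needed):

```php
<?php
// Hypothetical helper: keep only the last two labels of a host name.
// Assumption: the registrable domain is exactly two labels long.
function naiveRootDomain($host) {
    $labels = explode('.', strtolower(trim($host)));
    if (count($labels) <= 2) {
        return implode('.', $labels);
    }
    return implode('.', array_slice($labels, -2));
}

$parts = parse_url('http://justin.howard.stars.net/profiles/march23.php?id=ght466&help=yes');
echo naiveRootDomain($parts['host']), "\n"; // stars.net
```

Note that the URL needs its scheme here, otherwise 'host' is not set.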

angst
03-23-2010, 02:02 PM
Try this:



function GetDomain($url) {
$nowww = trim(ereg_replace('www.','',$url));
$domain = parse_url($nowww);
if(!empty($domain["path"])) {
return $domain["path"];
}
else {
return "No DOM"; // $domain["path"];
}
}

MattF
03-23-2010, 02:19 PM
$nowww = trim(ereg_replace('www.','',$url));


Ereg functions are deprecated. A simple str_replace would work.



$nowww = trim(str_replace('www.', '', $url));

Fou-Lu
03-23-2010, 03:28 PM
Hmm, just looking at this one on the API:


Note: This function doesn't work with relative URLs.


If you do a var_dump on $domain inside GetDomain, my assumption is that it will come up false. If so, correct this by removing the following line:


$m_url = ereg_replace('http:\/\/','',$m_url);


And changing the function as follows (though I suspect this will still work without the alterations, since www doesn't factor into creating an absolute URL):


function GetDomain($url) {
$domain = parse_url($url);
if(!empty($domain["path"])) {
return trim(str_replace('www.', '', $domain["path"]));
}
else {
return "No DOM"; // $domain["path"];
}
}


Post back results.
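
A quick way to check that assumption is to dump both forms side by side. This is only a sketch using one of the URLs from the log; as far as I know, the scheme-less string is not rejected with false, it simply ends up in 'path':

```php
<?php
// Compare parse_url() output with and without the scheme.
$withScheme    = parse_url('http://www.700inaday.com');
$withoutScheme = parse_url('www.700inaday.com');

var_dump($withScheme);    // 'scheme' => 'http', 'host' => 'www.700inaday.com'
var_dump($withoutScheme); // only 'path' => 'www.700inaday.com', no 'host'
```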

jeddi
03-23-2010, 04:44 PM
Ran function as:


function GetDomain($N_url) {
$domain = parse_url($N_url);
if(!empty($domain["path"])) {
return trim(str_replace('www.', '', $domain["path"]));
}
else {
return "No DOM"; // $domain["path"];
}
}


$N_url = trim($sal_page_lin);
$m_dom = GetDomain($N_url);
$m_url = ereg_replace('http:\/\/','',$N_url);




Results:

1 ) Id: 8988, URL: www.700inaday.com DOM: No DOM
2 ) Id: 9223, URL: www.selfemployedcourier.com/forAffiliates.html DOM: /forAffiliates.html
3 ) Id: 9698, URL: www.howtomarketexperts.com DOM: No DOM
4 ) Id: 9704, URL: www.4inonesystem.com/welcome.html DOM: /welcome.html
5 ) Id: 556, URL: www.lcd-monitor-repair.com DOM: lcd-monitor-repair.com

As we can see it is not working :(

PS .

I changed the position of this line:
$m_url = ereg_replace('http:\/\/','',$N_url);
because I want to store the url without the http: as I add it
back in after I read it from the database in another script.

Maybe I should change that to :
$m_url = trim(str_replace('http://', '', $url));
would that work without escaping the "/" ?

Thanks for advice.



.
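
On the escaping question: str_replace() does a plain substring replacement rather than a regular-expression match, so the "/" needs no escaping. A small sketch, with $url set to a sample value from the log; the second variant is an optional alternative that only strips a scheme at the very start of the string:

```php
<?php
$url = 'http://www.700inaday.com'; // sample value taken from the log output

// Plain substring replacement: no escaping of "/" required.
$stored = trim(str_replace('http://', '', $url));
echo $stored, "\n"; // www.700inaday.com

// Alternative: remove "http://" only when it is the prefix of the string.
$storedPrefixOnly = preg_replace('#^http://#i', '', trim($url));
echo $storedPrefixOnly, "\n"; // www.700inaday.com
```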

angst
03-23-2010, 04:50 PM
that function that I posted worked, I've tested it.

Fou-Lu
03-23-2010, 04:56 PM
Hah, sorry, you're using the wrong field. Path is for anything after the / and before the ?. So what you want is 'host', not 'path'. Try making those changes to see if that's what you want.


I'm a little concerned about those results for number 5 though, something doesn't jibe there....
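
To make the host/path split concrete, here is a small sketch that prints each component for one of the URLs from the log. The "?ref=1" query string is an assumption added purely to show where 'query' starts:

```php
<?php
$parts = parse_url('http://www.selfemployedcourier.com/forAffiliates.html?ref=1');

echo $parts['scheme'], "\n"; // http
echo $parts['host'], "\n";   // www.selfemployedcourier.com
echo $parts['path'], "\n";   // /forAffiliates.html
echo $parts['query'], "\n";  // ref=1
```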

jeddi
03-23-2010, 05:42 PM
Hi,

Thanks for the input.

This is the function I used:


function GetDomain($N_url) {
$domain = parse_url($N_url);
if(!empty($domain["host"])) {
return trim(str_replace('www.', '', $domain["host"]));
}
else {
return "No DOM"; // $domain["path"];
}
}

$N_url = trim($sal_page_lin);
$m_dom = GetDomain($N_url);

write_log("$cnt ) Id: $prod->id, URL: $N_url DOM: $m_dom\r\n");


More of them are getting a DOM but some are not
and I can not see why.

Results:

1 ) Id: 8988, URL: http://www.700inaday.com DOM: 700inaday.com
2 ) Id: 9223, URL: http://www.selfemployedcourier.com/forAffiliates.html DOM: selfemployedcourier.com
3 ) Id: 9698, URL: http://www.howtomarketexperts.com DOM: howtomarketexperts.com
4 ) Id: 9704, URL: http://www.4inonesystem.com/welcome.html DOM: 4inonesystem.com
5 ) Id: 556, URL: www.lcd-monitor-repair.com DOM: No DOM
11 ) Id: 3504, URL: www.timelesshealth.net/kefir/grains.html DOM: No DOM
12 ) Id: 441, URL: www.crazy-tattoo-designs.com/insane_tattoo_collection_pay_now.htm DOM: No DOM
18 ) Id: 2647, URL: www.timelesshealth.net/ebook DOM: No DOM

Ah har.

So it seems that some of my URLs do not have the "http://" and these
are the ones that are failing.

So I guess I need to add an if statement ?

if( substr($N_url,0,7) != 'http://' ) $N_url = 'http://'.$N_url;

Should that do the trick ?


.
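
That check should cover the URLs in the log, though as written it is case-sensitive and would also prepend "http://" to an "https://" URL. A slightly more general sketch, where addScheme is a made-up helper name and the regular expression is one possible way to detect an existing scheme:

```php
<?php
// Hypothetical helper: prepend "http://" only when no scheme is present.
function addScheme($url) {
    $url = trim($url);
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    return $url;
}

function GetDomain($url) {
    $parts = parse_url(addScheme($url));
    if (!empty($parts['host'])) {
        // Strip only a leading "www." rather than every occurrence.
        return preg_replace('/^www\./i', '', $parts['host']);
    }
    return 'No DOM';
}

echo GetDomain('www.lcd-monitor-repair.com'), "\n";               // lcd-monitor-repair.com
echo GetDomain('http://www.700inaday.com'), "\n";                 // 700inaday.com
echo GetDomain('www.timelesshealth.net/kefir/grains.html'), "\n"; // timelesshealth.net
```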

Fou-Lu
03-23-2010, 06:58 PM
Interestingness. Seems like a bug if you ask me, but I found this in the user notes:


nirazuelos at gmail dot com
09-Oct-2009 09:45
Hello, for some odd reason, parse_url returns the host (ex. example.com) as the path when no scheme is provided in the input url. So I've written a quick function to get the real host:


<?php
function getHost($Address) {
$parseUrl = parse_url(trim($Address));
return trim($parseUrl[host] ? $parseUrl[host] : array_shift(explode('/', $parseUrl[path], 2)));
}

getHost("example.com"); // Gives example.com
getHost("http://example.com"); // Gives example.com
getHost("www.example.com"); // Gives www.example.com
getHost("http://example.com/xyz"); // Gives example.com
?>

You could try anything! It gives the host (including the subdomain if exists).

Hope it helped you.


Seems like that matches your issue!


Also, you'll want to add the apostrophes within the index access. Can't believe that someone made a user-note that wasn't checked for errors O.o

jeddi
03-23-2010, 07:52 PM
Also, you'll want to add the apostrophes within the index access.

Not quite sure what you mean by that ???


.

Fou-Lu
03-23-2010, 07:57 PM
Not quite sure what you mean by that ???


.

$parseUrl[host] should be $parseUrl['host'] or $parseUrl["host"]. Otherwise, it will trigger a notice when attempting to access the constant host prior to casting it to the string 'host'. Doesn't stop processing, but will be problematic should 'host' ever become a defined constant.
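
For reference, a sketch of the user-note function with the keys quoted. It also stores the explode() result in a variable first, since passing a function result straight to array_shift() can raise an "Only variables should be passed by reference" notice. The variable names are my own; the behaviour is meant to match the note:

```php
<?php
function getHost($address) {
    $parts = parse_url(trim($address));
    if (!empty($parts['host'])) {
        return $parts['host'];
    }
    // No scheme given: the host ends up at the start of 'path'.
    $path = isset($parts['path']) ? $parts['path'] : '';
    $segments = explode('/', $path, 2);
    return trim($segments[0]);
}

echo getHost('example.com'), "\n";            // example.com
echo getHost('http://example.com'), "\n";     // example.com
echo getHost('www.example.com'), "\n";        // www.example.com
echo getHost('http://example.com/xyz'), "\n"; // example.com
```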


