
View Full Version : What is best way to turn local link into full url?



jeddi
10-24-2009, 01:32 PM
I am using curl and DOMDocument
to extract the links from my website.

This is my script:


<?php
require("my_functions.php");

$target_url = "http://www.support-focus.com/customer-service-software.html";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

echo "<br>Starting<br>Target_url: $target_url";

// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$page = curl_exec($ch);
if (!$page) {
    echo "<br />cURL error number: " . curl_errno($ch);
    echo "<br />cURL error: " . curl_error($ch);
    exit;
}

// parse the HTML into a DOMDocument
$doc = new DOMDocument();
@$doc->loadHTML($page); // suppress warnings from imperfect real-world HTML

//echo $doc->saveHTML();

$links = $doc->getElementsByTagName('a'); // find the <a> elements
foreach ($links as $link) // visit each link in turn
{
    echo "Section Attribute :-> " . $link->getAttribute('href') . "<br>";
}
?>

As you can see the target page is this one:

customer service software (http://www.support-focus.com/customer-service-software.html)

and the output is:

Starting
Target_url: http://www.support-focus.com/customer-service-software.html

Section Attribute :-> index.php
Section Attribute :-> works.php
Section Attribute :-> pricing.php
Section Attribute :-> special.php
Section Attribute :-> contact.php
Section Attribute :-> login.php
Section Attribute :-> Customer-Service-Software.php
Section Attribute :-> articles.php
Section Attribute :-> Why-Get-An-Internet-Security-Seal.php
Section Attribute :-> The-Fantastic-Return-on-Investment-from-Trust-Seals.php
Section Attribute :-> Turn-Browsers-Into-Buyers-Increase-Your-Sales-Conversion.php
Section Attribute :-> Selecting-The-Best-Trust-Seal-To-Boost-Your-Sales-Conversions.php
Section Attribute :-> Give-Great-Customer-Service-And-Get-A-Trust-Seal-to-Prove-It.php
Section Attribute :-> Customer-Service-Software-Solutions-For-Online-Business.php
Section Attribute :-> 73-Per-Cent-Of-Buyers-Abort-Their-Purchases-How-To-Change-It.php
Section Attribute :-> Why-Are-Your-Visitors-Not-Buying-Your-Products.php
Section Attribute :-> http://www.support-focus.com/index.php
Section Attribute :-> http://www.support-focus.com/special.php
Section Attribute :-> terms.php
Section Attribute :-> privacy.php
Section Attribute :-> earnings_disclaimer.php
Section Attribute :-> articles.php

This works quite well, but some of the links are local (relative) and some are full URLs.

Given the code I am already using, what is the best way to show
all these links as complete URLs?

Is there a DOMDocument method to do this?

I also want to extract and store the website address,
i.e. just the "www.support-focus.com" part.
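For the second part of the question, a minimal sketch using PHP's built-in parse_url() (no DOMDocument involvement) can pull out just the host:

```php
<?php
// Sketch: extract the host and scheme from the target URL with
// parse_url(). The $host / $base variable names are illustrative only.
$target_url = "http://www.support-focus.com/customer-service-software.html";

$parts = parse_url($target_url);
$host  = $parts['host'];                    // "www.support-focus.com"
$base  = $parts['scheme'] . "://" . $host;  // "http://www.support-focus.com"

echo $host . "\n";
echo $base . "\n";
?>
```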

tomws
10-24-2009, 05:37 PM
All your code is doing is pulling hrefs. Since those links are not coded as full URLs, you'll need to add that yourself. Perhaps try something like a preg_match that looks for http://. If it's not there, concatenate the domain with the href result.
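A minimal sketch of that suggestion, assuming the site's base URL is already known; resolve_href() is a hypothetical helper name, not part of the original script:

```php
<?php
// Sketch: if the href doesn't already start with http:// or https://,
// prepend the site's base URL. resolve_href() is a hypothetical helper.
function resolve_href($base, $href) {
    if (preg_match('#^https?://#i', $href)) {
        return $href; // already a full URL, leave it alone
    }
    return rtrim($base, '/') . '/' . ltrim($href, '/');
}

$base = "http://www.support-focus.com";
echo resolve_href($base, "index.php") . "\n";   // prepends the base
echo resolve_href($base, "http://www.support-focus.com/special.php") . "\n"; // unchanged
?>
```

Note this simple concatenation assumes the relative hrefs are rooted at the domain; paths like "../page.php" would need extra handling.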

jeddi
10-24-2009, 08:21 PM
Yes,
I realize that I can do it with a preg_match.

It could also be done with strpos and substr - but it would be a bit messy.

But I just thought that if there is something in the DOM class that can do the job then it may be quicker and more efficient.

So in this case, is preg_match the most efficient method?
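For this particular check a regex isn't strictly required: a plain strncasecmp() comparison can spot an absolute URL just as well, and it avoids the regex engine entirely. A sketch, with is_absolute_url() as a hypothetical helper name:

```php
<?php
// Sketch: test whether an href is already a full URL with a plain
// string comparison instead of preg_match. is_absolute_url() is a
// hypothetical helper, not part of the original script.
function is_absolute_url($href) {
    return strncasecmp($href, "http://", 7) === 0
        || strncasecmp($href, "https://", 8) === 0;
}

foreach (array("index.php", "http://www.support-focus.com/special.php") as $href) {
    echo $href . " => " . (is_absolute_url($href) ? "absolute" : "relative") . "\n";
}
?>
```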



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum