What is the best way to turn a local link into a full URL?

10-24-2009, 01:32 PM
I am using cURL and DOMDocument to extract the links from my website.

This is my script:


<?php
$target_url = "http://www.support-focus.com/customer-service-software.html";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

echo "<br>Starting<br>Target_url: $target_url";

// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$page = curl_exec($ch);
if (!$page) {
    echo "<br />cURL error number:" . curl_errno($ch);
    echo "<br />cURL error:" . curl_error($ch);
    exit;
}
curl_close($ch);

// parse the html into a DOMDocument
$doc = new DOMDocument();
@$doc->loadHTML($page); // @ silences warnings caused by imperfect real-world HTML

//echo $doc->saveHTML(); // debug: dump the parsed page

$params = $doc->getElementsByTagName('a'); // find all <a> elements
foreach ($params as $param) { // go through each link one by one
    echo "Section Attribute :-> " . $param->getAttribute('href') . "<br>"; // print its href
}



As you can see, the target page is this one:

customer service software (http://www.support-focus.com/customer-service-software.html)

and the output is:

Target_url: http://www.support-focus.com/customer-service-software.html

Section Attribute :-> index.php
Section Attribute :-> works.php
Section Attribute :-> pricing.php
Section Attribute :-> special.php
Section Attribute :-> contact.php
Section Attribute :-> login.php
Section Attribute :-> Customer-Service-Software.php
Section Attribute :-> articles.php
Section Attribute :-> Why-Get-An-Internet-Security-Seal.php
Section Attribute :-> The-Fantastic-Return-on-Investment-from-Trust-Seals.php
Section Attribute :-> Turn-Browsers-Into-Buyers-Increase-Your-Sales-Conversion.php
Section Attribute :-> Selecting-The-Best-Trust-Seal-To-Boost-Your-Sales-Conversions.php
Section Attribute :-> Give-Great-Customer-Service-And-Get-A-Trust-Seal-to-Prove-It.php
Section Attribute :-> Customer-Service-Software-Solutions-For-Online-Business.php
Section Attribute :-> 73-Per-Cent-Of-Buyers-Abort-Their-Purchases-How-To-Change-It.php
Section Attribute :-> Why-Are-Your-Visitors-Not-Buying-Your-Products.php
Section Attribute :-> http://www.support-focus.com/index.php
Section Attribute :-> http://www.support-focus.com/special.php
Section Attribute :-> terms.php
Section Attribute :-> privacy.php
Section Attribute :-> earnings_disclaimer.php
Section Attribute :-> articles.php

It works quite well, but some of the links are relative (local) and some are full URLs.

Given the code I am already using, what is the best way to get all of these links as complete URLs?

Is there a DOMDocument method that does this?

I also want to extract and store the website address, i.e. just the "www.support-focus.com" part.
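For the second part, PHP's built-in parse_url() (a plain string function, not a DOMDocument method) already splits a URL into its components. A minimal sketch, reusing the target URL from the script:

```php
<?php
// parse_url() splits a URL into parts; the second argument asks for one part only.
$target_url = "http://www.support-focus.com/customer-service-software.html";

$host   = parse_url($target_url, PHP_URL_HOST);   // "www.support-focus.com"
$scheme = parse_url($target_url, PHP_URL_SCHEME); // "http"

echo "Host: $host<br>";
echo "Site root: $scheme://$host/<br>";
```

The "site root" string built here is also what you would prepend to a relative link on a page that lives in the root directory.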

10-24-2009, 05:37 PM
All your code is doing is pulling hrefs. Since those links are not coded as full URLs, you'll need to add that yourself. Perhaps try something like a preg_match that looks for http://. If it's not there, concatenate the domain with the href result.
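The suggestion above could look like this. A minimal sketch; the helper name to_full_url_naive() is hypothetical, and it assumes every relative href is relative to the site root. That happens to hold here only because customer-service-software.html sits in the root directory; on a page inside a subdirectory, relative links are resolved against that subdirectory instead.

```php
<?php
// Hypothetical helper: prepend the site root to any href that does not
// already start with "http://" or "https://".
function to_full_url_naive($href, $site_root)
{
    if (preg_match('#^https?://#i', $href)) {
        return $href; // already a full URL
    }
    // Assumes $href is relative to the site root (true for this page only).
    return rtrim($site_root, '/') . '/' . ltrim($href, '/');
}

$site_root = 'http://www.support-focus.com/';
foreach (['index.php', 'http://www.support-focus.com/special.php', 'terms.php'] as $href) {
    echo to_full_url_naive($href, $site_root) . "<br>";
}
```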

10-24-2009, 08:21 PM
I realize that I can do it with a preg_match.

It could also be done with strpos and substr, but that would be a bit messy.
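The strpos()/substr() route does not have to be messy. A minimal sketch with the hypothetical helper name to_full_url_strpos(); it also uses parse_url() and strrpos(), and resolves a relative href against the directory of the page rather than just the domain, so it keeps working for pages in subdirectories. It deliberately does not handle "../", "//host", "?query" or "#fragment" hrefs.

```php
<?php
// Hypothetical helper: no regex, just strpos()/strrpos()/substr() plus parse_url().
// Limitations: "../", "//host", "?query" and "#fragment" hrefs are not handled.
function to_full_url_strpos($href, $page_url)
{
    // Already a full URL? (only http and https are checked)
    if (strpos($href, 'http://') === 0 || strpos($href, 'https://') === 0) {
        return $href;
    }
    $parts  = parse_url($page_url);
    $origin = $parts['scheme'] . '://' . $parts['host'];

    // Root-relative, e.g. "/index.php": append directly to the origin.
    if (strpos($href, '/') === 0) {
        return $origin . $href;
    }

    // Otherwise keep the page path up to and including its last "/".
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

echo to_full_url_strpos('terms.php', 'http://www.support-focus.com/customer-service-software.html');
// prints http://www.support-focus.com/terms.php
```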

But I thought that if something in the DOM classes could do the job, it might be quicker and more efficient.

So in this case, is a preg_match the most efficient method?
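On efficiency: the "does it already start with a scheme?" test is cheap whichever function performs it, so for a page with a few dozen links the choice between preg_match() and strpos() is unlikely to matter. The harder part is correctness, because hrefs can also look like ../page.php, //host/page.php, ?page=2 or #top. DOMDocument does not appear to offer a method that resolves these, so here is a sketch of relative-URL resolution loosely following RFC 3986, section 5.2. The names resolve_url() and remove_dot_segments() are hypothetical, and it assumes the base is an absolute http(s) URL.

```php
<?php
// Sketch of relative-URL resolution, loosely following RFC 3986 section 5.2.
// Hypothetical helper names. Assumes $base is an absolute http(s) URL;
// user info (user:pass@) in the base is dropped for brevity.

// Remove "." and ".." segments from an absolute path ("/a/b/../c" -> "/a/c").
function remove_dot_segments($path)
{
    $segments = explode('/', $path);
    $out = [];
    foreach ($segments as $seg) {
        if ($seg === '.') {
            continue;
        }
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out); // go up one directory, never above the root
            }
            continue;
        }
        $out[] = $seg;
    }
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $out[] = ''; // "/a/b/.." names the directory "/a/", so keep the trailing slash
    }
    return implode('/', $out);
}

function resolve_url($href, $base)
{
    $href = trim($href);

    // Already absolute: any scheme such as http:, https:, mailto:.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href;
    }

    $b = parse_url($base);
    // Protocol-relative: "//host/path" inherits only the scheme.
    if (substr($href, 0, 2) === '//') {
        return $b['scheme'] . ':' . $href;
    }

    $origin    = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath  = isset($b['path']) ? $b['path'] : '/';
    $baseQuery = isset($b['query']) ? '?' . $b['query'] : '';

    // Empty href or fragment only: same document, same query.
    if ($href === '' || $href[0] === '#') {
        return $origin . $basePath . $baseQuery . $href;
    }
    // Query only: same path, new query.
    if ($href[0] === '?') {
        return $origin . $basePath . $href;
    }

    // Split the href into its path and any trailing ?query / #fragment.
    $n    = strcspn($href, '?#');
    $path = substr($href, 0, $n);
    $rest = substr($href, $n);

    if ($path[0] !== '/') {
        // Directory-relative: append to the base path up to its last "/".
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $path;
    }
    return $origin . remove_dot_segments($path) . $rest;
}
```

In the original loop, the echo would print resolve_url($param->getAttribute('href'), $target_url) instead of the raw href. Two things are worth checking on real pages: because CURLOPT_FOLLOWLOCATION is enabled, the page may have been served from a different URL, which curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) reports (call it before curl_close()); and if the page contains a `<base href="...">` element, browsers resolve its links against that URL instead.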