View Full Version : New to classes, how do I use this class ?



jeddi
10-22-2009, 12:59 PM
Hi,

I want to use this class to extract links from my site
but I am not sure how to use it.

I need to pass the url to the class.

This is the class



<?php
class Reader
{
    var $buf;
    var $ix;
    var $list;

    function Reader()
    {
        $this->list = array();
    }

    function grab($site)
    {
        $this->buf = file_get_contents($site);
        $this->buf = strtolower($this->buf);
        $len = strlen($this->buf);
        $start = 0;

        while( $start < $len )
        {
            $start = strpos($this->buf, "<a href="http:", $start );
            if( $start == false )
                break;

            $start = $start + 1;
            $end = strpos($this->buf, "</a>", $start );

            $ln = $this->getSection($this->buf, $start, $end );
            $fln = "<" . $ln;

            array_push($this->list, $fln);

            $start = $end+1;
        }
    }

    function getSection( $buf, $start, $end )
    {
        if( $start > strlen($buf))
            return false;

        if( $end > strlen($buf))
            return false;

        for( $i=$start; $i<$end; $i++ )
        {
            $result .= $buf[$i];
        }

        return $result;
    }

    //get array contents
    function results()
    {
        return $this->list;
    }
}

?>

I assume that I start off by instantiating a new object:



<?php
require("Reader.class.php");

$reader = new Reader();
$url = "http://www.my-site.com/";



From here what do I do ?

Fou-Lu
10-22-2009, 01:18 PM
This line needs to be fixed first:


$start = strpos($this->buf, "<a href="http:", $start );

it should be:


$start = strpos($this->buf, "<a href=\"http:", $start );


Then it's used like this:


<?php
require("Reader.class.php");

$reader = new Reader();
$url = "http://www.my-site.com/";
$reader->grab($url);
print_r($reader->results());


$reader->results() will return an array. This is a PHP4-style class and could probably be done more easily with pattern matching, but this should work.
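
For what it's worth, here is a rough sketch of what a pattern-matching version might look like in PHP 5 syntax. The class name LinkReader and the parse() method are made up for illustration and are not part of the original class. It keeps the same restriction as the original (href must come straight after "<a " and the URL must start with http:), but it collects only the URLs rather than the whole anchor fragment.

```php
<?php
// Hypothetical PHP 5 rewrite of Reader; the name LinkReader and the
// parse() method are illustrative, not part of the original class.
class LinkReader
{
    private $list = array();

    // Collects the URL of every <a href="http:..."> found in $html.
    public function parse($html)
    {
        // Same restriction as the original: href must directly follow "<a ".
        if (preg_match_all('/<a href="(http:[^"]*)"/i', $html, $matches)) {
            $this->list = array_merge($this->list, $matches[1]);
        }
    }

    // Fetches $site and parses it, like the original grab().
    public function grab($site)
    {
        $html = file_get_contents($site);
        if ($html !== false) {
            $this->parse($html);
        }
    }

    // Returns the collected URLs as an array.
    public function results()
    {
        return $this->list;
    }
}

// Parse a literal string here so the sketch runs without network access.
$reader = new LinkReader();
$reader->parse('<a href="http://www.my-site.com/page.html">Page</a>');
print_r($reader->results());
```

Splitting parse() from grab() is only a convenience so the matching can be tried on a string before pointing it at a live site.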

jeddi
10-22-2009, 01:58 PM
Thanks for your reply.

I have been reading about cURL and it seems that using
cURL may be faster or "better" than using file_get_contents($site).

So if I want to update this whole script then I guess that
I can start off with:


<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.my-site.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$page = curl_exec($ch);
curl_close($ch);

if (preg_match_all("/<a href=\"http:\"(.*?)\".*?>(.*?)<\/a>/i",$page,$matches) ) {
    print_r($matches);
}
?>


I am not sure that I have done the pattern correctly.

Also is it OK to use the string output $page in this way?

Thanks for helping.


When I ran this script I got zero output :(

Fou-Lu
10-23-2009, 12:00 AM
Yes, using $page that way is correct, since you've set the CURLOPT_RETURNTRANSFER option. Error check it too, using:


if (false !== ($page = curl_exec($ch)))
{
    // We got a result
}
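
Put together, the fetch plus the error check might look something like the sketch below. The function name fetchPage() and the timeout values are assumptions for illustration; only the curl_* calls themselves come from the cURL extension.

```php
<?php
// Hypothetical helper: fetchPage() is an illustrative name, not from the thread.
function fetchPage($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body as a string
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // assumed timeout, in seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);       // assumed overall limit, in seconds

    $page = curl_exec($ch);
    if ($page === false) {
        // curl_error() describes why the transfer failed
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL request failed: ' . $error);
    }

    curl_close($ch);
    return $page;
}
```

You would then call $page = fetchPage('http://www.my-site.com'); inside a try/catch and run the pattern on $page only if no exception was thrown.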


As for your pattern, that depends on what you're trying to capture and what you're matching against. As written, the \" right after http: requires a literal quote at that position, so it will not match a normal link, which would explain the empty output. With that removed, it will take any external link (or internal, if you've included the entire URL in it). However, it requires that the href attribute be the first attribute, and it will not handle SSL (https) links.
I'm not a super pattern matcher, so it would take time for me to be satisfied with a pattern, but what you could match instead is just the href attribute, and only where it is preceded by an <a> tag, using a lookbehind assertion.
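
As a starting point only, here are two untuned sketches of those ideas. The sample $page string and the variable names are made up for illustration. The first pattern is a variant of the posted one that also accepts https and lets other attributes come before href; the second shows the lookbehind idea. PCRE lookbehinds have to be fixed length, so that one still only finds links whose first attribute is href.

```php
<?php
// Made-up sample input, standing in for the string returned by curl_exec().
$page = '<a href="http://www.my-site.com/a.html">First</a>'
      . '<a class="nav" href="https://www.my-site.com/b.html" id="b">Second</a>';

// Variant of the posted pattern: no stray quote after "http:", optional "s"
// for SSL, and attributes allowed before and after href.
$full = '/<a\s[^>]*?href="(https?:[^"]*)"[^>]*>(.*?)<\/a>/is';
preg_match_all($full, $page, $matches);
// $matches[1] holds the URLs, $matches[2] the link text.

// Lookbehind variant: match only the href attribute, provided "<a" plus one
// whitespace character sits directly before it.
$behind = '/(?<=<a\s)href="(https?:[^"]*)"/i';
preg_match_all($behind, $page, $firstAttr);

print_r($matches[1]);
print_r($firstAttr[1]);
```

Neither pattern copes with single-quoted or unquoted href values, so treat them as a base to refine against your own markup.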


