View Full Version : Simplify code?

03-01-2009, 07:22 AM
This is my first attempt at scraping a webpage for specific data. I got the job done; however, my code is probably way off. Can anyone give me some pointers on how to improve it?

class guild_exp
{
    public $ch;     // holds our cURL handle
    public $html;   // holds the resulting HTML data
    public $binary; // set to 1 for binary transfers
    public $url;    // holds the URL to be downloaded

    function __construct()
    {
        $this->html = "";
        $this->binary = 0;
        $this->url = "";
    }

    function fetchPage($url)
    {
        $this->url = $url;
        if (!empty($this->url)) {
            $this->ch = curl_init(); // open a cURL handle
            curl_setopt($this->ch, CURLOPT_RETURNTRANSFER, 1); // return the data instead of printing it
            curl_setopt($this->ch, CURLOPT_URL, $this->url); // set the URL to download
            curl_setopt($this->ch, CURLOPT_FOLLOWLOCATION, false); // do not follow redirects
            curl_setopt($this->ch, CURLOPT_BINARYTRANSFER, $this->binary); // tells cURL whether the data is binary
            $this->html = curl_exec($this->ch); // pull the webpage from the internet
            curl_close($this->ch); // close the connection
        }
    }

    function parse_array($beg_tag, $close_tag)
    {
        preg_match_all("($beg_tag.*$close_tag)siU", $this->html, $matching_data);
        return $matching_data[0];
    }
}

$myspider = new guild_exp();
// fetch the page before parsing it
$myspider->fetchPage('http://realmwar.warhammeronline.com/realmwar/GuildInfo.war?id=657&server=168');
$exparray = $myspider->parse_array("<DIV", "</DIV>");

// the 39th div holds text of the form "label: current/needed<..."
$experience = explode(':', $exparray[38]);
$divexp = explode('/', $experience[1]);
$finalexp = explode('<', $divexp[1]);
$currentexp = trim($divexp[0]);
$neededexp = trim($finalexp[0]);
echo "Needed Experience: $neededexp<br />\n";
echo "Current Experience: $currentexp<br />\n";
$total = round($currentexp / $neededexp * 100, 2); // share of the needed experience earned so far
echo "Percentage Complete: $total%";


03-01-2009, 09:20 AM
Erm, this is a solution but there are some things I should mention.
a) It doesn't use cURL as I couldn't be bothered to include that.
b) It uses regular expressions, although many would say you should use an HTML parser instead.
c) It is not in class form.


$html = file_get_contents('http://realmwar.warhammeronline.com/realmwar/GuildInfo.war?id=657&server=168');

// only read $match if the pattern was actually found
if (preg_match('#<div class="guild-progress-desc">Guild Rank: (\d+)/(\d+)</div>#i', $html, $match)) {
    $current = $match[1]; // value before the slash
    $needed = $match[2];  // value after the slash
    $percent = round($current / $needed * 100);

    echo "Needed Experience: {$needed}<br>\n";
    echo "Current Experience: {$current}<br>\n";
    echo "Percentage Complete: {$percent}%<br>\n";
}

Oh, as for pointers: most people would probably tell you to use an HTML parser rather than regular expressions and explode / split functions. They're probably right, but HTML mark-up is often broken, so I suggest using regular expressions.
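
For comparison, here is a minimal sketch of the HTML-parser route using PHP's bundled DOM extension. The sample markup and the function name extractGuildProgress are made up for illustration; only the guild-progress-desc class and the "Guild Rank: a/b" text come from the snippet above, and the real page may well be structured differently.

```php
<?php
// Sketch only: read "Guild Rank: a/b" from a div using DOMDocument + DOMXPath.
// extractGuildProgress() is a hypothetical helper, not part of any library.
function extractGuildProgress(string $html): ?array
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world markup is often invalid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@class="guild-progress-desc"]');
    if ($nodes === false || $nodes->length === 0) {
        return null; // no such div in the document
    }

    // The regex now only has to handle plain text, not markup.
    if (!preg_match('#(\d+)\s*/\s*(\d+)#', $nodes->item(0)->textContent, $m)) {
        return null;
    }
    return [(int) $m[1], (int) $m[2]];
}

// Assumed sample markup, loosely modelled on the pattern above.
$sample = '<html><body><div class="guild-progress-desc">Guild Rank: 12/40</div></body></html>';
$result = extractGuildProgress($sample);
```

The parser copes with unclosed or badly nested tags on its own, which is the usual argument for it; the regex is left with the easy job of splitting the two numbers.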

03-01-2009, 07:05 PM
Certainly a lot simpler. Preg_match is something I am unfamiliar with as I am new to arrays and such.

Thanks for your help.

Is there a good place to look up code to add so the script only runs if X amount of time has expired since it last run?
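
One common way to do this is to keep a timestamp file and compare its age against the current time on every run. A rough sketch, assuming a writable temp directory; the function name shouldRun, the stamp-file name, and the one-hour interval are placeholders, not anything from the script above.

```php
<?php
// Sketch only: let a task run at most once every $interval seconds.
// shouldRun() and the stamp file are hypothetical names for illustration.
function shouldRun(string $stampFile, int $interval): bool
{
    clearstatcache(true, $stampFile); // filemtime() results are cached within a request
    if (is_file($stampFile) && (time() - filemtime($stampFile)) < $interval) {
        return false; // the last run was too recent
    }
    touch($stampFile); // record this run via the file's modification time
    return true;
}

$stampFile = sys_get_temp_dir() . '/guild_exp_last_run.stamp'; // assumed location
if (shouldRun($stampFile, 3600)) {
    // fetch and parse the page here
}
```

If the script is triggered by page views rather than a scheduler, the parsed values would also need to be cached somewhere so they can be shown on the runs that are skipped.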

03-02-2009, 02:07 AM
I have run into a problem. Using the following code I get an error.


$html = file_get_contents('http://realmwar.warhammeronline.com/realmwar/GuildInfo.war?id=657&server=168');

preg_match('#<div id="rank" class="rankbar-full" style="width:(\d+)24(\d+)%" onmousemove="hoverFollow(event,(this.id+'-hover'));" onmouseout="hoverFollow(event,(this.id+'-hover'));"></div>#i', $html, $match);

echo "$match[1]";

How do I get this done, since I get an error with the -hover? At least that's what I think is causing the error.
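
The error comes from PHP itself rather than the regex engine: the pattern is wrapped in single quotes, so the first ' in '-hover' ends the string early. Separately, ( ) . and + are regex metacharacters and have to be escaped to match literally. A sketch of both fixes, using preg_quote() for the literal part; the simplified width pattern, the sample line, and the value 24 are invented for the demonstration.

```php
<?php
// The literal text to match. Inside a single-quoted PHP string,
// every single quote has to be written as \'.
$literal = 'onmousemove="hoverFollow(event,(this.id+\'-hover\'));"';

// preg_quote() escapes regex metacharacters such as ( ) . + and,
// given as the second argument, the # delimiter.
$pattern = '#style="width:(\d+)%" ' . preg_quote($literal, '#') . '#i';

// Invented sample line; the real page may differ.
$sample = '<div id="rank" class="rankbar-full" style="width:24%" '
        . 'onmousemove="hoverFollow(event,(this.id+\'-hover\'));"></div>';

if (preg_match($pattern, $sample, $match)) {
    echo $match[1], "\n";
}
```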

03-02-2009, 02:49 AM

$html = file_get_contents('http://realmwar.warhammeronline.com/realmwar/GuildInfo.war?id=657&server=168');

preg_match('#<div id="rank" class="rankbar-rank"[^>]*>(\d+)</div>#i', $html, $match);

echo $match[1];
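
A note on why the shorter pattern works: [^>]* consumes whatever attributes remain up to the end of the opening tag, so their contents never have to be spelled out. A small sketch on invented sample markup, which also checks the return value of preg_match() before reading $match[1]; the attributes and the rank value 24 are made up.

```php
<?php
// Invented sample line: everything after class="rankbar-rank" is skipped by [^>]*.
$sample = '<div id="rank" class="rankbar-rank" style="width:60%" onmouseout="hide(this.id)">24</div>';

$pattern = '#<div id="rank" class="rankbar-rank"[^>]*>(\d+)</div>#i';

// preg_match() returns 1 on a match, 0 on no match, and false on error.
$rank = preg_match($pattern, $sample, $match) === 1 ? (int) $match[1] : null;

echo $rank === null ? "Rank not found\n" : "Rank: {$rank}\n";
```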

03-04-2009, 04:15 AM
Thanks again for your help. I'll definitely have to find a good regex resource.