Parsing a webpage, regex issue



david56connor
05-31-2011, 03:15 AM
Hi, I have a webpage that I need to parse some information from. The information is in a table with 4 different fields.

This is the HTML of the page:


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Account Listing</title>
<link rel="STYLESHEET" type="text/css" href="pop.css">
<link rel="STYLESHEET" type="text/css" href="account.css">
</head>

<body style="color:#FFFFFF;" bgcolor="#000000" bottommargin="0" leftmargin="0" rightmargin="0" topmargin="0">
<div style="font-family:Verdana,sans-serif;font-size:8pt;padding:5px;">
You have <b>1260.65</b> points<br>

<b>15/25</b> characters on this server
</div>
<table width="437" border="0" cellspacing="0" cellpadding="0">
<tr>
<td align="center"><font color="#666666" face="Verdana, Arial, Helvetica, sans-serif" size="1"><b>PP</b></font></td>
<td><font color="#666666" face="Verdana, Arial, Helvetica, sans-serif" size="1"><b>Name</b></font></td>
<td align="center"><font color="#666666" face="Verdana, Arial, Helvetica, sans-serif" size="1"><b>Level</b></font></td>

<td><font color="#666666" face="Verdana, Arial, Helvetica, sans-serif" size="1"><b>Crew</b></font></td>
<td></td>
</tr>
<tr>
<td align="center" style="background-color:#333333">
<img src="images/ppnostar.jpg" border="0">
</td>
<td style="background-color:#333333"><font color="#FFFF00" face="Verdana, Arial, Helvetica, sans-serif" size="1">

<b>Pimpa</b>
</font></td>
<td align="center" style="background-color:#333333"><font color="#FFFFFF" face="Verdana, Arial, Helvetica, sans-serif" size="1">
<b>75</b>
</font></td>
<td style="background-color:#333333"><font color="#999999" face="Verdana, Arial, Helvetica, sans-serif" size="1">
<b> •Ou†war Immor†als•</b>

</font></td>
<td style="background-color:#333333">
<a target="_top" href="http://sigil.***********/world.php?suid=2198627&serverid=1"><font color="#00FF00" face="Verdana, Arial, Helvetica, sans-serif" size="1"><b>PLAY!</b></font></a>
</td>
</tr>
<tr>
<td align="center" style="background-color:#000000">
<img src="images/ppnostar.jpg" border="0">

</td>
<td style="background-color:#000000"><font color="#FFFF00" face="Verdana, Arial, Helvetica, sans-serif" size="1">
<b>Eag1e</b>
</font></td>
<td align="center" style="background-color:#000000"><font color="#FFFFFF" face="Verdana, Arial, Helvetica, sans-serif" size="1">
<b>72</b>
</font></td>
<td style="background-color:#000000"><font color="#999999" face="Verdana, Arial, Helvetica, sans-serif" size="1">

<b>• Freedom Figh†ers •</b>
</font></td>
<td style="background-color:#000000">
<a target="_top" href="http://sigil.***********/world.php?suid=2236250&serverid=1"><font color="#00FF00" face="Verdana, Arial, Helvetica, sans-serif" size="1"><b>PLAY!</b></font></a>
</td>
</tr>
<tr>
<td align="center" style="background-color:#000000">

<img src="images/ppnostar.jpg" border="0">
</td>
<td style="background-color:#000000"><font color="#FFFF00" face="Verdana, Arial, Helvetica, sans-serif" size="1">
<b>K1NGBILLY</b>
</font></td>
<td align="center" style="background-color:#000000"><font color="#FFFFFF" face="Verdana, Arial, Helvetica, sans-serif" size="1">
<b>66</b>
</font></td>

<td style="background-color:#000000"><font color="#999999" face="Verdana, Arial, Helvetica, sans-serif" size="1">
<b>• Freedom Figh†ers •</b>
</font></td>
<td style="background-color:#000000">
<a target="_top" href="http://sigil.***********/world.php?suid=2462246&serverid=1"><font color="#00FF00" face="Verdana, Arial, Helvetica, sans-serif" size="1"><b>PLAY!</b></font></a>
</td>
</tr>


A friend wrote a regex for this a while ago, and it still works in my VB.NET application. However, I have tried to convert it to PHP and have had no success.

Here is the working regex that was used in the VB app:

</tr>.*?<tr>.*?<td.*?>.*?<td.*?>.*?<b>(?'Name'.+?)</b>.*?<td.*?>.*?<b>(?'Level'\d+?)</b>.*?<td.*?>.*?<b>(?'Crew'.*?)</b>.*?<td.*?>.*?<a .*?href=.*?suid=(?'CharacterId'\d+?)&.*?</td>

I also don't know regex very well, so I can't figure out what to change to make it work in PHP!

Any help converting this would be appreciated.

Thanks,
David.

okcmom
05-31-2011, 06:01 AM
I'm not much of a programmer, but I may be able to help. Please send me the URL you need to fetch the data from and I will see what I can do.

I can't promise you anything; as I said, I'm not much of a programmer.

david56connor
05-31-2011, 12:05 PM
The page is password protected, which is why I posted a sample of it. I managed to understand everything the regular expression was doing in VB, but I don't know how to use the groups in PHP. That will be my task for the next few hours :)

gvre
05-31-2011, 02:10 PM
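Try the following:


$data = "Page data....."; // replace with the HTML of the page
// Modifiers: s lets "." match newlines too, i makes the match case-insensitive
$pattern = "#</tr>.*?<tr>.*?<td.*?>.*?<td.*?>.*?<b>(?'Name'.+?)</b>.*?<td.*?>.*?<b>(?'Level'\d+?)</b>.*?<td.*?>.*?<b>(?'Crew'.*?)</b>.*?<td.*?>.*?<a .*?href=.*?suid=(?'CharacterId'\d+?)&.*?</td>#si";
// preg_match() stops at the first match and fills $matches with its groups
preg_match($pattern, $data, $matches);
print_r($matches);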
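
On using the groups in PHP: PCRE accepts the same (?'Name'...) syntax, and each named group should show up in $matches under its name as well as under a numeric index. A minimal sketch, assuming a cut-down, hypothetical one-row sample in place of the real page (the crew name and the shortened markup are made up for illustration):

```php
<?php
// Hypothetical, simplified sample: one header row plus one data row.
$data = <<<'EOT'
<tr><td><b>PP</b></td></tr>
<tr>
<td align="center"><img src="images/ppnostar.jpg" border="0"></td>
<td><font color="#FFFF00"><b>Pimpa</b></font></td>
<td align="center"><font color="#FFFFFF"><b>75</b></font></td>
<td><font color="#999999"><b>Some Crew</b></font></td>
<td><a target="_top" href="world.php?suid=2198627&serverid=1"><b>PLAY!</b></a></td>
</tr>
EOT;

// Same pattern as in the thread; "#" is the delimiter, "si" are the modifiers.
$pattern = "#</tr>.*?<tr>.*?<td.*?>.*?<td.*?>.*?<b>(?'Name'.+?)</b>.*?<td.*?>.*?<b>(?'Level'\d+?)</b>.*?<td.*?>.*?<b>(?'Crew'.*?)</b>.*?<td.*?>.*?<a .*?href=.*?suid=(?'CharacterId'\d+?)&.*?</td>#si";

// preg_match() returns 1 if the pattern matched, 0 if it did not.
$found = preg_match($pattern, $data, $matches);

if ($found === 1) {
    // Named groups are read by name, like keys of an associative array.
    echo "Name: ", $matches['Name'], "\n";
    echo "Level: ", $matches['Level'], "\n";
    echo "Crew: ", $matches['Crew'], "\n";
    echo "Character id: ", $matches['CharacterId'], "\n";
}
```

All captured values are strings, so cast Level or CharacterId with (int) if a number is needed.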

david56connor
05-31-2011, 02:51 PM
It works! Thanks very much!

Now to figure out how I will iterate over the results :)
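
For the iteration, preg_match_all() with the PREG_SET_ORDER flag looks like a reasonable starting point, since it returns one array per matched row. A minimal sketch, assuming a hypothetical two-row sample with simplified markup (the crew names and the $rows and $count variable names are made up for illustration):

```php
<?php
// Hypothetical, simplified sample: one header row plus two data rows.
$data = <<<'EOT'
<tr><td><b>PP</b></td></tr>
<tr>
<td><img src="images/ppnostar.jpg" border="0"></td>
<td><b>Pimpa</b></td>
<td><b>75</b></td>
<td><b>Crew One</b></td>
<td><a target="_top" href="world.php?suid=2198627&serverid=1"><b>PLAY!</b></a></td>
</tr>
<tr>
<td><img src="images/ppnostar.jpg" border="0"></td>
<td><b>Eag1e</b></td>
<td><b>72</b></td>
<td><b>Crew Two</b></td>
<td><a target="_top" href="world.php?suid=2236250&serverid=1"><b>PLAY!</b></a></td>
</tr>
EOT;

// Same pattern as in the thread.
$pattern = "#</tr>.*?<tr>.*?<td.*?>.*?<td.*?>.*?<b>(?'Name'.+?)</b>.*?<td.*?>.*?<b>(?'Level'\d+?)</b>.*?<td.*?>.*?<b>(?'Crew'.*?)</b>.*?<td.*?>.*?<a .*?href=.*?suid=(?'CharacterId'\d+?)&.*?</td>#si";

// PREG_SET_ORDER groups the results per match: $rows[0] is the first
// character, $rows[1] the second, and so on. The return value is the
// number of full matches found.
$count = preg_match_all($pattern, $data, $rows, PREG_SET_ORDER);

foreach ($rows as $row) {
    printf(
        "%s (level %d, crew %s, id %s)\n",
        $row['Name'],
        $row['Level'],
        trim($row['Crew']),
        $row['CharacterId']
    );
}
```

Each match begins at a closing </tr>, so the pattern relies on every data row being preceded by another row, as the header row does for the first character in the posted sample.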


