...

Need help with REGEX to capture data in all html table cells



V@no
10-09-2009, 03:38 PM
Hello!

I've been struggling for some time now trying to figure out how to capture the data between ALL <td></td> tags, and it must be sorted by <tr></tr> (meaning I need to know in which <tr> each <td> is located).
As a working example I'm using this code:

<?php
function highlight($array)
{
foreach($array as $key => $val)
if (is_array($val))
$array[$key] = highlight($val);
else
$array[$key] = '<span style="color:red;font-weight:normal;">'.(preg_match("#^[\n\r]#", $val) ? "" : "\n").htmlspecialchars($val).'</span>';
return $array;
}
$text = '
<table>
<tr>
<td>tb1-tr1-td1</td>
<td>tb1-tr1-td2</td>
<td>
<span>
tb1-tr1-td3
<span>
</td>
<td>
tb1-tr1-td4
</td>
</tr>
<tr>
<td>tb1-tr2-td1</td>
<td><span>tb1-tr2-td2</span></td>
<td>
tb1-tr2-td3
</td>
</tr>
</table>
<table>
<tr>
<td>tb2-tr1-td1</td>
<td><span>tb2-tr1-td2</span></td>
<td>
tb2-tr1-td3
</td>
</tr>
</table>
';

preg_match_all('#<table>(\s*<tr>(\s*<td>(.*)</td>\s*)+</tr>\s*)+</table>#sU', $text, $result);

echo "<pre><b>";
print_r(highlight($result));
echo "</b></pre>";
?>

It returns only the data from the last <tr> and the last <td></td> of each <table>. What am I doing wrong?


Thank you.
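
A likely cause, sketched on a cut-down string rather than the full table markup above: in PCRE, a capturing group that sits inside a repetition such as (...)+ is overwritten on every pass, so only the text from its final iteration is left in the result. The $sample string and the simplified pattern below are assumptions made for illustration only.

```php
<?php
// Cut-down illustration (not the original table markup): a capturing group
// inside a repetition is overwritten on each pass of the repetition.
$sample = '<td>a</td><td>b</td><td>c</td>';

// The outer group repeats three times; groups 1 and 2 are reused each time.
preg_match('#^(<td>(.*?)</td>)+$#', $sample, $m);

echo $m[1], "\n"; // the last cell including its tags
echo $m[2], "\n"; // the last cell's content; "a" and "b" are gone
```

This is why nesting repeated groups for <tr> and <td> cannot return every cell from a single match; each cell needs to be a match of its own.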

MattF
10-09-2009, 05:23 PM
Untested.



preg_match_all('#<td>(.+?)</td>#', $text, $result);

V@no
10-10-2009, 02:27 AM
Thanks for the reply. I missed one important thing in my explanation: besides the data in <td></td>, I also need to know in which <tr></tr> that data is located (added to my post).
I can do it in 3 steps, with 3 different regexes, but there must be a way to do it in one step with one regex...

MattF
10-10-2009, 06:16 PM
Your description for each row is within <td></td> tags (working from your code above), so it is already captured into that array with that regex. Process the array data in whatever manner is necessary to provide that info.
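
If the row information has to come from the markup itself rather than from the cell text, one possible approach is a single pattern that matches either an opening <table>, an opening <tr>, or a complete <td>...</td>, with a counter bumped while walking the matches. This is only a sketch: cells_by_row() is a hypothetical helper name, and it assumes PHP 7 or later, tags without attributes, and no nested tables.

```php
<?php
// Hypothetical helper (not from the thread): groups cell contents as
// $rows[tableIndex][rowIndex][] = cell content.
// Assumes tags carry no attributes and tables are not nested.
function cells_by_row(string $html): array
{
    // One pattern, three alternatives: an opening <table>, an opening <tr>,
    // or a complete <td>...</td>. The s modifier lets . span newlines.
    preg_match_all('#<(table|tr)>|<td>(.*?)</td>#s', $html, $tokens, PREG_SET_ORDER);

    $rows  = [];
    $table = -1;
    $row   = -1;
    foreach ($tokens as $t) {
        if ($t[1] === 'table') {
            $table++;      // a new table starts
            $row = -1;     // row numbering restarts inside it
        } elseif ($t[1] === 'tr') {
            $row++;        // a new row starts
        } elseif ($table >= 0 && $row >= 0) {
            // A cell: file it under the current table and row.
            $rows[$table][$row][] = trim($t[2] ?? '');
        }
    }
    return $rows;
}

$html = '<table><tr><td>a</td><td> b </td></tr><tr><td>c</td></tr></table>'
      . '<table><tr><td>d</td></tr></table>';
print_r(cells_by_row($html));
```

Because every cell is a separate match, nothing is overwritten, and the position of each cell in the nested array says which table and row it came from.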

V@no
10-10-2009, 06:56 PM
um....it captures only one <td> per table...

MattF
10-11-2009, 03:14 AM
Quote (V@no): um....it captures only one <td> per table...

Nope, it captures them all. Whether you are processing them correctly is a different thing. (I've removed the HTML formatting and just put a print_r to display the complete array so that you can see what info you *actually* capture, rather than what you're displaying).



<?php

$text = '
<table>
<tr>
<td>tb1-tr1-td1</td>
<td>tb1-tr1-td2</td>
<td>
<span>
tb1-tr1-td3
<span>
</td>
<td>
tb1-tr1-td4
</td>
</tr>
<tr>
<td>tb1-tr2-td1</td>
<td><span>tb1-tr2-td2</span></td>
<td>
tb1-tr2-td3
</td>
</tr>
</table>
<table>
<tr>
<td>tb2-tr1-td1</td>
<td><span>tb2-tr1-td2</span></td>
<td>
tb2-tr1-td3
</td>
</tr>
</table>
';

$text = preg_replace('#\r\n|\n#', '', $text);
preg_match_all('#<td>(.+?)</td>#', $text, $result);

print_r($result);

?>



Edit: If the content of a single cell is spread across multiple lines, you also need to remove the newlines from the $text string for the regex to capture it, because . does not match a newline by default. That's done by the preg_replace line I've added above.
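
As an alternative to stripping the newlines first, the s (dot-all) pattern modifier makes . match newlines too. A small sketch on a shortened, made-up $text string (not the full sample above):

```php
<?php
// Shortened sample: one single-line cell and one multi-line cell.
$text = "<td>one</td>\n<td>\ntwo\n</td>";

// Without s, . stops at a newline, so the multi-line cell is missed.
preg_match_all('#<td>(.+?)</td>#', $text, $plain);

// With s, . also matches newlines; trim() tidies each captured cell.
preg_match_all('#<td>(.+?)</td>#s', $text, $dotall);
$cells = array_map('trim', $dotall[1]);

print_r($plain[1]); // only the single-line cell
print_r($cells);    // both cells
```

Either route works for this sample; the modifier simply leaves the original string untouched.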


