preg_match_all Expression


maddoxnm
10-09-2002, 09:32 PM
Can someone please help me figure out how to get the correct pattern for preg_match_all? I have tried various times: one attempt grabbed the first result and then everything below it, and another grabbed all the results but stored them in the first array slot instead of in separate ones.

Here is the html I'm trying to grab:

<li xmlns=""><a href="http://www.searchability.com/" class="clsResultTitle">SearchAbility - Guides to Specialized Search Engines</a><table border="0" cellpadding="0" cellspacing="0" width="560"><tr><td>Find a list of guides to thousands of search engines covering hundreds of subjects. Browse by alphabetical listings and read the company profile.<br><span class="clsResultURL"><i>www.searchability.com</i></span><br><img src="/images/spacer.gif" height="8" width="1"><br></td></tr></table></li>

I want to grab the URL from the first a href, the name of the site from that link's text, and then the description. Stopping at the first <br> after the description would be fine.

mordred
10-09-2002, 11:12 PM
Seems like you forgot to include the code you toyed with, but anyway, here's a quick hack that shows one way to deal with the problem:


// Non-capturing groups (?:...) skip over markup; the three capturing groups
// pick up the URL, the site name and the description, in that order.
$regex = '/(?:<a href=")(.+?)(?:".+?>)(.+?)(?:<\/a.+<td>)(.+?)(?:<br>.+)/';
preg_match_all($regex, $str, $matches, PREG_SET_ORDER);

// Initialise first so $results is defined even when nothing matches.
$results = array();

foreach ($matches as $match) {
    $results[] = array(
        "url" => $match[1],
        "site" => $match[2],
        "description" => $match[3]
    );
}

var_dump($results);
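
The PREG_SET_ORDER flag is probably what cures the "everything in the first array slot" symptom from the question. Without it, preg_match_all groups its results by capture group rather than by match. Here is a minimal sketch of the difference; the input string, the simplified pattern and the variable names are made up for illustration and are not the real page:

```php
<?php
// Hypothetical miniature input: two links, each followed by a short description.
$html = '<a href="http://a.example/">Site A</a><td>First description<br>'
      . "\n"
      . '<a href="http://b.example/">Site B</a><td>Second description<br>';

// Simplified pattern (an assumption for this sketch, not the one used above).
$pattern = '/<a href="(.+?)">(.+?)<\/a><td>(.+?)<br>/';

// Default flag (PREG_PATTERN_ORDER): one array per capture group.
// $byGroup[1] holds every URL, $byGroup[2] every name, $byGroup[3] every description.
preg_match_all($pattern, $html, $byGroup);

// PREG_SET_ORDER: one array per match.
// $bySet[1] holds the full text, URL, name and description of the second match.
preg_match_all($pattern, $html, $bySet, PREG_SET_ORDER);

print_r($byGroup[1]); // both URLs
print_r($bySet[1]);   // everything belonging to the second match
```

Either layout carries the same data. PREG_SET_ORDER is just the more convenient one when each result should become its own record, as in the loop above.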


I tested this successfully on this sample data:


$str = '<li xmlns=""><a href="http://www.searchability.com/ask" class="clsResultTitle">SearchAbility - Guides to Specialized Search Engines</a><table border="0" cellpadding="0" cellspacing="0" width="560"><tr><td>Find a list of guides to thousands of search engines covering hundreds of subjects. Browse by alphabetical listings and read the company profile.<br><span class="clsResultURL"><i>www.searchability.com</i></span><br><img src="/images/spacer.gif" height="8" width="1"><br></td></tr></table></li>

blabla

<li xmlns=""><a href="http://www.codinforums.com/" class="clsResultTitle">Coding coding</a><table border="0" cellpadding="0" cellspacing="0" width="560"><tr><td>Yadda! Yadda, yadda - yadda. Yadda? Yadda.<br><span class="clsResultURL"><i>www.searchability.com</i></span><br><img src="/images/spacer.gif" height="8" width="1"><br></td></tr></table></li>
';


Note: I did my best while submitting this code, but the forum software insists on inserting spaces at random positions within the $regex string. I have no idea why it does this, but apart from the single space in "a href", no blanks should appear there. Remove any others before trying out the code.
If it still doesn't work for you, I'll repost the code in a less readable fashion.
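
As for the other symptom in the question (the first result is found, then everything below it gets swallowed), a likely cause is a greedy quantifier. This is a guess, since the original attempt was never posted. A small sketch with a made-up input string shows how `.+` and `.+?` behave differently:

```php
<?php
// Hypothetical input: two list items on a single line.
$html = '<li><a href="http://a.example/">A</a></li>'
      . '<li><a href="http://b.example/">B</a></li>';

// Greedy: .+ takes as much as it can, so the capture runs from the first URL
// all the way to the LAST "> on the line and the two links merge into one match.
preg_match_all('/<a href="(.+)">/', $html, $greedy);

// Lazy: .+? takes as little as it can, so each capture stops at the FIRST ">.
preg_match_all('/<a href="(.+?)">/', $html, $lazy);

echo count($greedy[1]), "\n"; // 1
echo count($lazy[1]), "\n";   // 2
```

That is why the pattern above uses `.+?` for every captured group.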