XPath Values



graham23s
03-29-2010, 01:14 AM
Hi Guys,

I'm having trouble extracting certain pieces of data from a web page using DOM.

HTML:



<div align="center" id="content"><link href="css/style.css" rel="stylesheet" type="text/css"><style type="text/css">
.style1hhh {color: #FF0000}
</style><acronym title="Affiliate Info: Pays % on Level1 Seller Accepts PAYPAL "><table width="95%" border="0" align="center" cellpadding="0" cellspacing="3"><tr align="left"><td colspan="2" class="acat" >How to produce methane gas from manure <font color="#567faf"> $35.00</font></td>
</tr><tr align="left"><td width="26">&nbsp;</td>
<td class="subtitle_s"><em><font color="#333333"><span>Produce methane gas at home for cooking, heating, and making electricity.</></font></em></td>
</tr><tr align="left"><td>&nbsp;</td>
<td><a href=a.page.php?id=41008&u=sspence >Promote</a> | <a href=http://site.com/r/41008/XXXXX/ target=_blank>Visit site</a><acronym> | [ APS: <span onClick="window.open('aps.php','','width=500, height=300');" style="color:#0000FF;
text-decoration:underline; cursor:pointer;">0.81</span>* ]</acronym></td>
</tr><tr><td colspan="2"><hr size="1" noshade></td></tr></table></acronym>


<link href="css/style.css" rel="stylesheet" type="text/css"><style type="text/css">
.style1hhh {color: #FF0000}
</style><acronym title="Affiliate Info: Pays % on Level1 Seller Accepts PAYPAL "><table width="95%" border="0" align="center" cellpadding="0" cellspacing="3"><tr align="left"><td colspan="2" class="acat" >Profitable Recipes e-Book Package <font color="#567faf"> $7.00</font></td>
</tr><tr align="left"><td width="26">&nbsp;</td>
<td class="subtitle_s"><em><font color="#333333"><span class="subtitle_s" onmouseover="DivSetVisible(true,'description2', 500);" onmouseout="DivSetVisible(false, 'description2', 500);"> Instantly OWN Master Resale Rights To The Hottest 100% Profitable Cooking E-books On The Web! Every item in this monster collection comes complete with individual sales pages. Am...</span><div id='description2' style='position:absolute; width:500px; padding:4px; display:none; z-index:100; font-family:Verdana, Arial, Helvetica, sans-serif; font-size:13px;font-weight:normal;' class="cattable"> Instantly OWN Master Resale Rights To The Hottest 100% Profitable Cooking E-books On The Web! Every item in this monster collection comes complete with individual sales pages. Amazing Collection of Fast Selling cooking e-books That People Will Be Literally Throwing Money At You To Buy From Your Web Site.</div></font></em></td>
</tr><tr align="left"><td>&nbsp;</td>
<td><a href=a.page.php?id=80024&u=revenue >Promote</a> | <a href=http://site.com/r/80024/XXXXX/ target=_blank>Visit site</a><acronym> | [ APS: <span onClick="window.open('aps.php','','width=500, height=300');" style="color:#0000FF;
text-decoration:underline; cursor:pointer;">0.59</span>* ]</acronym></td>
</tr><tr><td colspan="2"><hr size="1" noshade></td></tr></table></acronym>
<link href="css/style.css" rel="stylesheet" type="text/css"><style type="text/css">
.style1hhh {color: #FF0000}


</style><acronym title="Affiliate Info: Pays % on Level1 Seller Accepts PAYPAL "><table width="95%" border="0" align="center" cellpadding="0" cellspacing="3"><tr align="left"><td colspan="2" class="acat" >Cookin' Kids <font color="#567faf"> $17.00</font></td>
</tr><tr align="left"><td width="26">&nbsp;</td>
<td class="subtitle_s"><em><font color="#333333"><span class="subtitle_s" onmouseover="DivSetVisible(true,'description3', 500);" onmouseout="DivSetVisible(false, 'description3', 500);">Cookin' Kids ebook is for kids who like to cook! Very original and unique ebook with themes, recipes, fun facts, games, jokes, cooking definitions, safety info, and more. It also ...</span><div id='description3' style='position:absolute; width:500px; padding:4px; display:none; z-index:100; font-family:Verdana, Arial, Helvetica, sans-serif; font-size:13px;font-weight:normal;' class="cattable">Cookin' Kids ebook is for kids who like to cook! Very original and unique ebook with themes, recipes, fun facts, games, jokes, cooking definitions, safety info, and more. It also makes a great present for your favorite kid!</div></font></em></td>
</tr><tr align="left"><td>&nbsp;</td>
<td><a href=a.page.php?id=77957&u=Margret >Promote</a> | <a href=http://site.com/r/77957/XXXXX/ target=_blank>Visit site</a><acronym> | [ APS: <span onClick="window.open('aps.php','','width=500, height=300');" style="color:#0000FF;
text-decoration:underline; cursor:pointer;">0.15</span>* ]</acronym></td>
</tr><tr><td colspan="2"><hr size="1" noshade></td></tr></table></acronym>
<link href="css/style.css" rel="stylesheet" type="text/css"><style type="text/css">
.style1hhh {color: #FF0000}


</style><acronym title="Affiliate Info: Pays % on Level1 Seller Accepts PAYPAL "><table width="95%" border="0" align="center" cellpadding="0" cellspacing="3"><tr align="left"><td colspan="2" class="acat" >Guide to Organic Cooking! - The Healthy Way of Living! - eBook only <font color="#567faf"> $19.97</font></td>
</tr><tr align="left"><td width="26">&nbsp;</td>
<td class="subtitle_s"><em><font color="#333333"><span class="subtitle_s" onmouseover="DivSetVisible(true,'description4', 500);" onmouseout="DivSetVisible(false, 'description4', 500);">Pays 70% - Health, Hobby and Fitness Guide about Organic Cooking including Shopping and Gardening Tips and Recipes. If you want to cook and eat healthier and do your part to protec...</span><div id='description4' style='position:absolute; width:500px; padding:4px; display:none; z-index:100; font-family:Verdana, Arial, Helvetica, sans-serif; font-size:13px;font-weight:normal;' class="cattable">Pays 70% - Health, Hobby and Fitness Guide about Organic Cooking including Shopping and Gardening Tips and Recipes. If you want to cook and eat healthier and do your part to protect your family and help the environment... or you are interested in growing your own organic foods in your garden... then this eBook was written just for you. High quality 98 page PDF eBook for immediate download.</div></font></em></td>
</tr><tr align="left"><td>&nbsp;</td>
<td><a href=a.page.php?id=57416&u=dts >Promote</a> | <a href=http://site.com/r/57416/XXXXX/ target=_blank>Visit site</a><acronym> | [ APS: <span onClick="window.open('aps.php','','width=500, height=300');" style="color:#0000FF;
text-decoration:underline; cursor:pointer;">0.04</span>* ]</acronym></td>
</tr><tr><td colspan="2"><hr size="1" noshade></td></tr></table></acronym>
</div>



So far I have:



// parse the html into a DOMDocument
$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[@class='acat']");
//$results = $dom->getElementsByTagName('a');
///html/body/div[3]/center/table/tbody/tr/td/table/tbody/tr/td[2]/div[2]/table/tbody/tr/td
///html/body/div[3]/center/table/tbody/tr/td/table/tbody/tr/td[2]/div[2]/table[2]/tbody/tr/td
///html/body/div[3]/center/table/tbody/tr/td/table/tbody/tr/td[2]/div[2]/table[3]/tbody/tr/td
//$results = $xpath->query("/html/body/div/center/table/tr/td/table/tr/td/div[@id='content']/table/tr/td");
foreach ($results as $result) {

    // Title
    $title = $result->nodeValue;

    print $title;
    print "<br /><br />";

}



With $results = $xpath->query("//*[@class='acat']"); I get the title (which is correct). If I change it to $results = $xpath->query("//*[@class='subtitle_s']"); I get the description (also correct).

But I can't seem to retrieve both at the same time.

Any help would be appreciated.

Thanks guys

Graham

Dormilich
03-29-2010, 12:41 PM
That’s right, you can’t, simply because the XPath query returns a single flat list containing all matching nodes (and you process that list item by item).
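
One possible way around this, as a rough sketch only: query for each product table first, then run further queries relative to that table by passing it as the context node (the second argument of DOMXPath::query). The sample markup and the $items array are made up for illustration and heavily trimmed from the page posted above; the real page is messier, so the expressions may need adjusting.

```php
<?php
// Hypothetical, trimmed-down sample modelled on the markup posted above.
$html = <<<'HTML'
<div id="content">
<table><tr><td class="acat">First title <font> $35.00</font></td></tr>
<tr><td class="subtitle_s"><em><span>First description.</span></em></td></tr></table>
<table><tr><td class="acat">Second title <font> $7.00</font></td></tr>
<tr><td class="subtitle_s"><em><span class="subtitle_s">Second description.</span></em></td></tr></table>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings about sloppy HTML quiet
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$items = array();

// One iteration per product table. The inner expressions start with "."
// so they are evaluated relative to $table, not the whole document.
foreach ($xpath->query("//div[@id='content']//table") as $table) {
    $title = $xpath->query(".//td[@class='acat']/text()", $table)->item(0);
    $price = $xpath->query(".//td[@class='acat']/font", $table)->item(0);
    $desc  = $xpath->query(".//td[@class='subtitle_s']//span", $table)->item(0);

    $items[] = array(
        'title' => $title ? trim($title->nodeValue) : '',
        'price' => $price ? trim($price->nodeValue) : '',
        'desc'  => $desc  ? trim($desc->nodeValue)  : '',
    );
}

foreach ($items as $item) {
    print $item['title'] . ' (' . $item['price'] . '): ' . $item['desc'] . "\n";
}
```

Each pass through the outer loop then has the title, price and description of one product together, instead of three separate flat lists that have to be matched up afterwards.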

graham23s
03-29-2010, 10:44 PM
Hi Dorm,

Ah, because I specifically look for $xpath->query("//*[@class='acat']"); it returns only those node values, so I would need to go back up the tree a bit? I have tried finding the exact XPath using Firebug, but from what I have read, a lot of people say it gives you the wrong path and adds a tbody tag. Is there a fairly accurate way to find the correct XPaths?

cheers mate

Graham
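
On the tbody question, a small sketch that may help, under the assumption (as far as I know) that the libxml parser behind DOMDocument does not insert a tbody element the way a browser does. That would explain why a path copied from Firebug finds nothing. The one-line $html sample is invented for illustration. Anchoring on an id or class with // sidesteps the problem, and DOMNode::getNodePath() prints the path as DOMDocument itself sees it.

```php
<?php
// Hypothetical minimal sample: a table written without <tbody>, as on the page above.
$html = '<div id="content"><table><tr><td class="acat">Title</td></tr></table></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Browser-style path including the tbody that the browser added on its own.
$withTbody = $xpath->query("//div[@id='content']/table/tbody/tr/td");

// The same path with tbody left out, matching the source as it was written.
$withoutTbody = $xpath->query("//div[@id='content']/table/tr/td");

// Anchored on id and class, so the exact nesting in between does not matter.
$anchored = $xpath->query("//div[@id='content']//td[@class='acat']");

print "with tbody: " . $withTbody->length . "\n";
print "without tbody: " . $withoutTbody->length . "\n";
print "anchored: " . $anchored->length . "\n";

// Path of the matched node according to DOMDocument, handy for comparing with Firebug.
$path = $anchored->item(0)->getNodePath();
print $path . "\n";
```

If the first count comes back as zero while the other two find the cell, the tbody in the Firebug path is the culprit.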


