graham23s
03-29-2010, 01:14 AM
Hi Guys,
I'm having trouble extracting certain pieces of data from a webpage using PHP's DOM extension (DOMDocument and DOMXPath).
HTML:
<div align="center" id="content"><link href="css/style.css" rel="stylesheet" type="text/css"><style type="text/css">
.style1hhh {color: #FF0000}
</style><acronym title="Affiliate Info: Pays % on Level1 Seller Accepts PAYPAL "><table width="95%" border="0" align="center" cellpadding="0" cellspacing="3"><tr align="left"><td colspan="2" class="acat" >How to produce methane gas from manure <font color="#567faf"> $35.00</font></td>
</tr><tr align="left"><td width="26"> </td>
<td class="subtitle_s"><em><font color="#333333"><span>Produce methane gas at home for cooking, heating, and making electricity.</></font></em></td>
</tr><tr align="left"><td> </td>
<td><a href=a.page.php?id=41008&u=sspence >Promote</a> | <a href=http://site.com/r/41008/XXXXX/ target=_blank>Visit site</a><acronym> | [ APS: <span onClick="window.open('aps.php','','width=500, height=300');" style="color:#0000FF;
text-decoration:underline; cursor:pointer;">0.81</span>* ]</acronym></td>
</tr><tr><td colspan="2"><hr size="1" noshade></td></tr></table></acronym>
<link href="css/style.css" rel="stylesheet" type="text/css"><style type="text/css">
.style1hhh {color: #FF0000}
</style><acronym title="Affiliate Info: Pays % on Level1 Seller Accepts PAYPAL "><table width="95%" border="0" align="center" cellpadding="0" cellspacing="3"><tr align="left"><td colspan="2" class="acat" >Profitable Recipes e-Book Package <font color="#567faf"> $7.00</font></td>
</tr><tr align="left"><td width="26"> </td>
<td class="subtitle_s"><em><font color="#333333"><span class="subtitle_s" onmouseover="DivSetVisible(true,'description2', 500);" onmouseout="DivSetVisible(false, 'description2', 500);"> Instantly OWN Master Resale Rights To The Hottest 100% Profitable Cooking E-books On The Web! Every item in this monster collection comes complete with individual sales pages. Am...</span><div id='description2' style='position:absolute; width:500px; padding:4px; display:none; z-index:100; font-family:Verdana, Arial, Helvetica, sans-serif; font-size:13px;font-weight:normal;' class="cattable"> Instantly OWN Master Resale Rights To The Hottest 100% Profitable Cooking E-books On The Web! Every item in this monster collection comes complete with individual sales pages. Amazing Collection of Fast Selling cooking e-books That People Will Be Literally Throwing Money At You To Buy From Your Web Site.</div></font></em></td>
</tr><tr align="left"><td> </td>
<td><a href=a.page.php?id=80024&u=revenue >Promote</a> | <a href=http://site.com/r/80024/XXXXX/ target=_blank>Visit site</a><acronym> | [ APS: <span onClick="window.open('aps.php','','width=500, height=300');" style="color:#0000FF;
text-decoration:underline; cursor:pointer;">0.59</span>* ]</acronym></td>
</tr><tr><td colspan="2"><hr size="1" noshade></td></tr></table></acronym>
<link href="css/style.css" rel="stylesheet" type="text/css"><style type="text/css">
.style1hhh {color: #FF0000}
</style><acronym title="Affiliate Info: Pays % on Level1 Seller Accepts PAYPAL "><table width="95%" border="0" align="center" cellpadding="0" cellspacing="3"><tr align="left"><td colspan="2" class="acat" >Cookin' Kids <font color="#567faf"> $17.00</font></td>
</tr><tr align="left"><td width="26"> </td>
<td class="subtitle_s"><em><font color="#333333"><span class="subtitle_s" onmouseover="DivSetVisible(true,'description3', 500);" onmouseout="DivSetVisible(false, 'description3', 500);">Cookin' Kids ebook is for kids who like to cook! Very original and unique ebook with themes, recipes, fun facts, games, jokes, cooking definitions, safety info, and more. It also ...</span><div id='description3' style='position:absolute; width:500px; padding:4px; display:none; z-index:100; font-family:Verdana, Arial, Helvetica, sans-serif; font-size:13px;font-weight:normal;' class="cattable">Cookin' Kids ebook is for kids who like to cook! Very original and unique ebook with themes, recipes, fun facts, games, jokes, cooking definitions, safety info, and more. It also makes a great present for your favorite kid!</div></font></em></td>
</tr><tr align="left"><td> </td>
<td><a href=a.page.php?id=77957&u=Margret >Promote</a> | <a href=http://site.com/r/77957/XXXXX/ target=_blank>Visit site</a><acronym> | [ APS: <span onClick="window.open('aps.php','','width=500, height=300');" style="color:#0000FF;
text-decoration:underline; cursor:pointer;">0.15</span>* ]</acronym></td>
</tr><tr><td colspan="2"><hr size="1" noshade></td></tr></table></acronym>
<link href="css/style.css" rel="stylesheet" type="text/css"><style type="text/css">
.style1hhh {color: #FF0000}
</style><acronym title="Affiliate Info: Pays % on Level1 Seller Accepts PAYPAL "><table width="95%" border="0" align="center" cellpadding="0" cellspacing="3"><tr align="left"><td colspan="2" class="acat" >Guide to Organic Cooking! - The Healthy Way of Living! - eBook only <font color="#567faf"> $19.97</font></td>
</tr><tr align="left"><td width="26"> </td>
<td class="subtitle_s"><em><font color="#333333"><span class="subtitle_s" onmouseover="DivSetVisible(true,'description4', 500);" onmouseout="DivSetVisible(false, 'description4', 500);">Pays 70% - Health, Hobby and Fitness Guide about Organic Cooking including Shopping and Gardening Tips and Recipes. If you want to cook and eat healthier and do your part to protec...</span><div id='description4' style='position:absolute; width:500px; padding:4px; display:none; z-index:100; font-family:Verdana, Arial, Helvetica, sans-serif; font-size:13px;font-weight:normal;' class="cattable">Pays 70% - Health, Hobby and Fitness Guide about Organic Cooking including Shopping and Gardening Tips and Recipes. If you want to cook and eat healthier and do your part to protect your family and help the environment... or you are interested in growing your own organic foods in your garden... then this eBook was written just for you. High quality 98 page PDF eBook for immediate download.</div></font></em></td>
</tr><tr align="left"><td> </td>
<td><a href=a.page.php?id=57416&u=dts >Promote</a> | <a href=http://site.com/r/57416/XXXXX/ target=_blank>Visit site</a><acronym> | [ APS: <span onClick="window.open('aps.php','','width=500, height=300');" style="color:#0000FF;
text-decoration:underline; cursor:pointer;">0.04</span>* ]</acronym></td>
</tr><tr><td colspan="2"><hr size="1" noshade></td></tr></table></acronym>
</div>
So far I have:
// parse the html into a DOMDocument
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[@class='acat']");
//$results = $xpath->getElementsByTagName('a');
///html/body/div[3]/center/table/tbody/tr/td/table/tbody/tr/td[2]/div[2]/table/tbody/tr/td
///html/body/div[3]/center/table/tbody/tr/td/table/tbody/tr/td[2]/div[2]/table[2]/tbody/tr/td
///html/body/div[3]/center/table/tbody/tr/td/table/tbody/tr/td[2]/div[2]/table[3]/tbody/tr/td
//$results = $xpath->query("/html/body/div/center/table/tr/td/table/tr/td/div[@id='content']/table/tr/td");
foreach ($results as $result) {
    // Title
    $title = $result->nodeValue;
    print $title;
    print "<br /><br />";
}
If I change $results = $xpath->query("//*[@class='acat']"); to $results = $xpath->query("//*[@class='subtitle_s']");
the first query returns the titles (which is correct) and the second returns the descriptions (also correct), but each query only gives me one of the two.
I can't seem to retrieve both the title and the description for each item at the same time.
Any help would be appreciated.
Thanks guys,
Graham
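
One possible way to get both values per item, offered as a rough sketch rather than a definitive fix: loop over each product table first, then run two relative XPath queries (note the leading ".") with that table as the context node. The $html below is a shortened, made-up stand-in for the real page, and the $items array and its 'title' / 'description' keys are names invented for this illustration.

```php
<?php
// The real page is not valid HTML (unquoted attributes, a stray "</>"),
// so collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);

// Assumption: a trimmed-down sample with the same class names as the page.
$html = <<<'HTML'
<div id="content">
<table><tr><td colspan="2" class="acat">First title <font color="#567faf"> $35.00</font></td></tr>
<tr><td class="subtitle_s"><em><span>First description.</span></em></td></tr></table>
<table><tr><td colspan="2" class="acat">Second title <font color="#567faf"> $7.00</font></td></tr>
<tr><td class="subtitle_s"><em><span class="subtitle_s">Second description.</span></em></td></tr></table>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$items = [];
// One iteration per product table inside the content div.
foreach ($xpath->query("//div[@id='content']//table") as $table) {
    // Relative queries: only look inside the current table.
    $titleNode = $xpath->query(".//td[@class='acat']", $table)->item(0);
    $descNode  = $xpath->query(".//td[@class='subtitle_s']", $table)->item(0);
    if ($titleNode === null || $descNode === null) {
        continue; // not a product block
    }
    $items[] = [
        'title'       => trim($titleNode->nodeValue),
        'description' => trim($descNode->nodeValue),
    ];
}

foreach ($items as $item) {
    print $item['title'] . "<br />" . $item['description'] . "<br /><br />";
}
```

Two caveats when running this against the full page: the title cell's nodeValue also includes the price from the nested font tag, and the description cell contains both the truncated span and the hidden full-text div, so nodeValue returns the two concatenated. Narrowing the second query to ".//td[@class='subtitle_s']//span" or ".//td[@class='subtitle_s']//div" should pick one of them. Also note that "//*[@class='subtitle_s']" matches the inner span as well as the td, which is why the sketch restricts the match to td elements.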