Fetching the contents of a div tag by id


cdcool1
05-27-2008, 02:59 PM
I've got a variable containing an XHTML page and I'm looking to retrieve the contents of a specific div tag by its id.

My web server is running PHP 4.

I've tried two approaches so far. The first was via regular expressions:

preg_match('/<div id="body">(.*?)<\/div>/i', $archivePage, $matches);

But this never found any matches, even though there is definitely a div tag with id="body". I suspected that it wasn't using the whole page string because of the " and ' characters in the page, so I tried addslashes($archivePage), but to no avail.


My second approach is via XML, but I haven't really got anywhere after following the example found here: http://www.w3schools.com/PHP/php_xml_dom.asp

<?php
$xmlDoc = new DOMDocument();
$xmlDoc->load("note.xml");

print $xmlDoc->saveXML();
?>

It returned: Warning: domdocument() expects at least 1 parameter, 0 given



Am I heading in the right direction with either of these approaches? I'm really new to PHP, so I'm still finding my feet a bit. I think the regular expression is close, but I just can't get any content.

Thanks

Inigoesdr
05-27-2008, 03:01 PM
Either way should work, but the regex is probably easier.

GuessWho
05-27-2008, 03:02 PM
Read The Manual (http://www.php.net/manual/en/class.domdocument.php)

cdcool1
05-27-2008, 03:14 PM
Either way should work, but the regex is probably easier.

Yes, those are my thoughts too. I just can't seem to get any content. Any ideas?

Read The Manual (http://www.php.net/manual/en/class.domdocument.php)

Most useful :rolleyes:

_Aerospace_Eng_
05-27-2008, 03:34 PM
Most useful :rolleyes:

It actually was. The manual for DOMDocument tells you that you need at least 1 parameter. This is the version number of the XML you want to use. Usually this is version 1.
http://www.php.net/manual/en/domdocument.construct.php

Also note that DOMDocument works only in PHP 5+.
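
For readers who are on PHP 5 or later, the DOM route could look roughly like the sketch below. It is only an illustration under stated assumptions: the sample markup is made up, $archivePage stands in for the page string from the first post, and the XPath lookup is one possible way to find the div, not something taken from this thread. It will not run on PHP 4.

```php
<?php
// Hypothetical sample page; in the thread this would be $archivePage.
$archivePage = '<html><body>'
    . '<div id="body">Testing <b>blah</b> blah</div>'
    . '<div id="content">Testing 2</div>'
    . '</body></html>';

$doc = new DOMDocument();
// loadHTML() tolerates imperfect markup; @ hides its parser warnings.
@$doc->loadHTML($archivePage);

// Look the div up by its id attribute.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="body"]');

$inner = '';
if ($nodes->length > 0) {
    // Serialize each child node to rebuild the div's inner markup.
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $doc->saveXML($child);
    }
}
echo $inner;
```

If only the text is needed, without any inner tags, $nodes->item(0)->textContent would be enough.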

cdcool1
05-27-2008, 03:39 PM
I had already tried every example on that page and all of them came up with errors. The cause is that I'm using PHP 4. I figured that since it recognises the DOMDocument function it would be available to use, not just throw up errors for everything.

Given that I'm on PHP 4, should I be looking at the regular expression route, or is there another way to use DOM in PHP 4? Once I have the document object I'll be fine, I think.

Also, is there any obvious reason that the regular expression doesn't work?

_Aerospace_Eng_
05-27-2008, 05:20 PM
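Is the page on your server or is it an external file? Something like this should work:
<?php
$filename = 'test.html';
$raw_file = file_get_contents($filename);
// Strip line breaks and tabs so that "." can match across the whole div.
// ("\s" is not an escape sequence in a PHP string, so it is not listed here.)
$arr_remove = array("\r", "\n", "\t");
$archivePage = str_replace($arr_remove, '', $raw_file);
// $matches[0] is the whole <div>...</div>; $matches[1] is only its contents.
if (preg_match('/<div[^>]*id="body">(.*?)<\\/div>/i', $archivePage, $matches)) {
    echo $matches[1];
}
?>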
test.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>

<body>
<div id="body">Testing blah blah blah</div>
<div id="content">Testing 2</div>
</body>
</html>

cdcool1
05-28-2008, 08:53 AM
Thanks for your reply. I'd managed to solve it by adding the 's' modifier, which makes the dot match newline characters. That seems to be a similar fix (just a different route) to what you've suggested.

So for future reference, this worked:

preg_match('/<div id="body">(.*?) <\/div>/si', $archivePage, $matches);
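
To illustrate why the 's' modifier makes the difference, here is a minimal sketch. The page fragment is made up for the example, and the pattern is a slightly loosened variant of the ones in this thread (it allows other attributes around the id), so treat it as an assumption and not as the exact code that was run.

```php
<?php
// Hypothetical page fragment with line breaks inside the target div.
$archivePage = "<div id=\"body\">\nTesting blah\nblah blah\n</div>\n"
    . "<div id=\"content\">Testing 2</div>";

$pattern = '<div[^>]*id="body"[^>]*>(.*?)<\/div>';

// Without "s", "." does not match a newline, so nothing is found.
$withoutS = preg_match('/' . $pattern . '/i', $archivePage, $m1);

// With "s", "." also matches newlines, so the capture can span lines.
$withS = preg_match('/' . $pattern . '/si', $archivePage, $m2);

// $m2[1] holds only the div's contents; trim() drops the outer line breaks.
$contents = $withS ? trim($m2[1]) : '';

echo "matches without s: $withoutS\n";
echo "matches with s: $withS\n";
echo $contents, "\n";
```

One limitation to keep in mind: the lazy (.*?) stops at the first closing div tag, so if the target div contains nested divs the capture will be cut short.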