Can't parse PHP using DOM or the older xml parser functions!

02-18-2008, 05:25 PM
Hi all,

I am trying to read large PHP files into XML documents and am having no luck!

Both the XML Parser functions (http://uk2.php.net/manual/en/ref.xml.php) and the DOM functions (http://uk2.php.net/manual/en/ref.dom.php) throw errors when the PI data contains illegal XML chars (e.g. "&"), or even quoted XML fragments (e.g. "<?xml"). I am aware of the limitation that "?>" tags cannot be quoted, but the manual doesn't say that other tags can't be quoted!

I thought the PI data handler in either case was supposed to be able to deal with this kind of thing!

I can get round this with a few hacks (i.e. manually chopping out and then re-inserting everything within <?php ?> tags), but is there a way to force the XML functionality to behave properly without using CDATA? (Which I shouldn't have to use anyway if I have a PI handler registered!).

02-19-2008, 01:41 AM
Can you post the XML you are using, and the PHP you are using to read it?

02-19-2008, 09:43 AM

Yep, there are two blocks of code that I've tried.

Firstly, here's the PHP to be read in by the PI handler:


$string = "Hello World!";
$ref =& $string;

echo ('<?xml version="1.0" encoding="utf-8" ?'.'>'.
' <mytag />'.

And here's the code:

DOM Built-in (filename domparser.php):

$doc= new DOMDocument();

Generates the following warning: Warning: DOMDocument::loadXML(): Start tag expected, '<' not found in Entity, line: 1 in domparser.php on line 2

The other way uses the XML functions and registers a PI handler with an instantiated xml parser resource. Here's the gist of it:

$xparser = xml_parser_create();

xml_parser_set_option($xparser, XML_OPTION_CASE_FOLDING, FALSE);

$handstat = array();
$handstat[] = xml_set_object($xparser, $this);
$handstat[] = xml_set_element_handler($xparser, "tag_start", "tag_end");
$handstat[] = xml_set_character_data_handler($xparser, "tag_data");
$handstat[] = xml_set_default_handler($xparser, "tag_default");
$handstat[] = xml_set_processing_instruction_handler($xparser, "tag_pi");
$handstat[] = xml_set_external_entity_ref_handler ($xparser, "tag_entref");
$handstat[] = xml_set_notation_decl_handler($xparser, "tag_notdec");

foreach ($handstat as $retn)
    if ($retn === FALSE)
        throw new Exception("Handler registration failure: ".implode(":", $handstat));

$status = xml_parse($xparser, $xmlstr, TRUE);

if ($status === 0)
    $errmsg = ("XML: error in file '".$this->m_uri."' at line ".xml_get_current_line_number($xparser).": ".xml_error_string(xml_get_error_code($xparser)));


Now, with tobeparsed.php as it is, this aborts completely within xml_parse (i.e. the PI handler function is never called). However, if I use '<'.'?xml ... in the echo statement, the file is parsed correctly. The DOM method still throws a warning, though.

I have got round this now with a couple of hacks, but it would be nice to get the internal calls to the PI handler(s) to work correctly!
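
For anyone wanting to try the same kind of hack, here is a minimal sketch of the chop-out-and-re-insert idea, not the exact code I used. It assumes PHP 5.3 or later (for the closure) and that no "?>" appears inside the PHP code itself. The function name extract_php_blocks, the placeholder target "php-block", and the sample <root> document are all made up for illustration.

```php
<?php
// Hypothetical helper: replace every <?php ... ?> block with a numbered
// placeholder PI and remember the original code in $blocks.
function extract_php_blocks($source, array &$blocks)
{
    $blocks = array();
    return preg_replace_callback(
        '/<\?php\b(.*?)\?>/s',
        function (array $m) use (&$blocks) {
            $blocks[] = $m[1];
            // The placeholder carries only the index of the stored block.
            return '<?php-block ' . (count($blocks) - 1) . '?>';
        },
        $source
    );
}

// Illustrative input: a PI whose data contains "&" and a quoted "<?xml".
$source = '<root><?php $ref =& $string; echo "<?xml"; ?><mytag/></root>';

$blocks = array();
$safe   = extract_php_blocks($source, $blocks);

$doc = new DOMDocument();
$doc->loadXML($safe);

// Walk the placeholder PIs and look the original PHP back up by index.
$restored = array();
$xpath    = new DOMXPath($doc);
foreach ($xpath->query('//processing-instruction("php-block")') as $pi) {
    $restored[] = $blocks[(int) trim($pi->data)];
}
```

The parser only ever sees the short placeholder, so whatever is inside the PHP block never reaches it; the original code is recovered afterwards from $blocks.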