
Can't parse PHP using DOM or the older xml parser functions!



mlse
02-18-2008, 06:25 PM
Hi all,

I am trying to read large PHP files into XML documents and am having no luck!

Both the XML Parser functions (http://uk2.php.net/manual/en/ref.xml.php) and the DOM functions (http://uk2.php.net/manual/en/ref.dom.php) throw errors when the PI data contains characters that are illegal in XML text (e.g. "&"), or even quoted XML fragments (e.g. "<?xml"). I am aware of the limitation that "?>" cannot appear inside a quoted string, but the manual doesn't say that other tags can't be quoted!

I thought the PI data handler in either case was supposed to be able to deal with this kind of thing!

I can get round this with a few hacks (i.e. manually chopping out and then re-inserting everything within <?php ?> tags), but is there a way to force the XML functionality to behave properly without using CDATA? (Which I shouldn't have to use anyway if I have a PI handler registered!).
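The chop-out-and-re-insert workaround mentioned above could look roughly like the sketch below. This is a minimal illustration under assumptions, not the code actually used: the function names `mask_php_blocks` and `restore_php_blocks`, the `<?phpblock N?>` placeholder and the `<page>` sample document are all hypothetical. Note that the regular expression stops at the first "?>" it sees, so it shares the known limitation about quoted "?>" sequences.

```php
<?php
// Replace every PHP block with a short placeholder PI and remember the original text.
// The names and the placeholder format are made up for this sketch.
function mask_php_blocks(string $src): array
{
    $blocks = [];
    $masked = preg_replace_callback(
        '/<\?php\b.*?\?>/s',
        function (array $m) use (&$blocks): string {
            $blocks[] = $m[0];
            return '<?phpblock ' . (count($blocks) - 1) . '?>';
        },
        $src
    );
    return [$masked, $blocks];
}

// Put the remembered PHP blocks back in place of the placeholders.
function restore_php_blocks(string $xml, array $blocks): string
{
    return preg_replace_callback(
        '/<\?phpblock (\d+)\s*\?>/',
        function (array $m) use ($blocks): string {
            return $blocks[(int) $m[1]] ?? $m[0];
        },
        $xml
    );
}

// Hypothetical sample input: an XML page with one embedded PHP block.
$php = <<<'PHP'
<?php
$ref =& $string;
echo ('<?xml version="1.0" encoding="utf-8" ?'.'>');
?>
PHP;
$src = '<page><title>A &amp; B</title>' . $php . '</page>';

[$masked, $blocks] = mask_php_blocks($src);

$doc = new DOMDocument();
$loaded = $doc->loadXML($masked);

// Serialise the root element again and swap the PHP back in.
$restored = restore_php_blocks($doc->saveXML($doc->documentElement), $blocks);
```

Using a processing instruction as the placeholder keeps the position of each block in the tree, so the blocks can be restored after the document has been manipulated.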

Ultragames
02-19-2008, 02:41 AM
Can you post the XML you are using, and the PHP you are using to read it?

mlse
02-19-2008, 10:43 AM
Hi,

Yep, there are two blocks of code that I've tried.

Firstly, here's the PHP to be read in by the PI handler:



<?php

$string = "Hello World!";
$ref =& $string;

echo ('<?xml version="1.0" encoding="utf-8" ?'.'>'.
'<mydocument>'.
' <mytag />'.
'</mydocument>');
?>


And here's the code:

DOM Built-in (filename domparser.php):


$doc= new DOMDocument();
$doc->loadXML(file_get_contents("tobeparsed.php"));


Generates the following warning: Warning: DOMDocument::loadXML(): Start tag expected, '<' not found in Entity, line: 1 in domparser.php on line 2
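One thing worth checking for this warning: a well-formed XML document needs exactly one root element, and tobeparsed.php on its own is a single processing instruction with none. The sketch below is an illustration under assumptions rather than a confirmed diagnosis. It wraps similar PHP in a hypothetical `<wrapper>` element (not part of the original file) and reads the block back as a processing-instruction node through DOMXPath; the variable names are also made up.

```php
<?php
// PHP block similar to tobeparsed.php, kept verbatim in a nowdoc.
$php = <<<'PHP'
<?php
$string = "Hello World!";
$ref =& $string;
echo ('<?xml version="1.0" encoding="utf-8" ?'.'>'.'<mydocument />');
?>
PHP;

// Assumption: a made-up <wrapper> root element gives the document a single root.
$doc = new DOMDocument();
$loaded = $doc->loadXML('<wrapper>' . $php . '</wrapper>');

// Collect the data of every PI whose target is "php".
$xpath = new DOMXPath($doc);
$blocks = [];
foreach ($xpath->query('//processing-instruction("php")') as $pi) {
    $blocks[] = $pi->data;
}
```

On a libxml2-based build this should load without the warning, and `$blocks[0]` should hold the PHP source including the "&" and the quoted "<?xml" fragment, since PI data may contain anything except the closing delimiter.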

The other way uses the XML functions and registers a PI handler with an instantiated xml parser resource. Here's the gist of it:



$xparser = xml_parser_create();

xml_parser_set_option($xparser, XML_OPTION_CASE_FOLDING, FALSE);

$handstat = array();
$handstat[] = xml_set_object($xparser, $this);
$handstat[] = xml_set_element_handler($xparser, "tag_start", "tag_end");
$handstat[] = xml_set_character_data_handler($xparser, "tag_data");
$handstat[] = xml_set_default_handler($xparser, "tag_default");
$handstat[] = xml_set_processing_instruction_handler($xparser, "tag_pi");
$handstat[] = xml_set_external_entity_ref_handler ($xparser, "tag_entref");
$handstat[] = xml_set_notation_decl_handler($xparser, "tag_notdec");

foreach ($handstat as $retn)
{
    if ($retn === FALSE)
    {
        xml_parser_free($xparser);
        throw new Exception("Handler registration failure: ".implode(":", $handstat));
    }
}

$status = xml_parse($xparser, $xmlstr, TRUE);

if ($status === 0)
{
    $errmsg = "XML: error in file '".$this->m_uri."' at line ".xml_get_current_line_number($xparser).": ".xml_error_string(xml_get_error_code($xparser));
}

xml_parser_free($xparser);


With tobeparsed.php as it is, this aborts completely inside xml_parse, i.e. the PI handler function is never called. However, if I use '<'.'?xml ... in the echo statement, the file is parsed correctly. The DOM method still throws a warning though.
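For comparison, here is a stripped-down sketch of the handler approach that may help narrow the problem down. It makes several assumptions that are not in the code above: the input is wrapped in a hypothetical `<wrapper>` root element, only the PI handler is registered, and a closure is used instead of xml_set_object with method names.

```php
<?php
// PHP block to be delivered to the PI handler, kept verbatim in a nowdoc.
$php = <<<'PHP'
<?php
$ref =& $string;
echo ('<?xml version="1.0" encoding="utf-8" ?'.'>'.'<mytag />');
?>
PHP;

// Assumption: a made-up root element so the input is a complete document.
$xmlstr = '<wrapper>' . $php . '</wrapper>';

$seen = [];
$xparser = xml_parser_create();

// The handler receives the parser, the PI target and the raw PI data.
xml_set_processing_instruction_handler(
    $xparser,
    function ($parser, string $target, string $data) use (&$seen): void {
        $seen[] = ['target' => $target, 'data' => $data];
    }
);

$status = xml_parse($xparser, $xmlstr, true);
$error = ($status === 1) ? '' : xml_error_string(xml_get_error_code($xparser));
// On PHP versions before 8, call xml_parser_free($xparser) at this point.
```

If `$status` is 1 and `$seen` holds one entry with target "php", the PI handler itself copes with "&" and the quoted fragment, which would point to the missing root element rather than the PI data as the cause of the abort.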

I have got round this now with a couple of hacks, but it would be nice to get the internal calls to the PI handler(s) to work correctly!


