Question about SimpleXML


Jojorax
04-09-2004, 01:35 PM
Using the SimpleXML functions with PHP5 RC1, I've found a behaviour that is strange (to me); I'm not sure of its origin, whether it's PHP's fault or intrinsic to XML(??).
The problem is the following: if the XML file contains characters like àèìòù (typical in languages like Italian), the simplexml_load_file function fails and returns NULL.
An emergency solution to avoid this is to replace the characters manually:

$file = file( $path );
$str = implode( '', $file );
$str = str_replace( 'à', 'a', $str );
$xml = simplexml_load_string( $str );

but it's absolutely ugly, and this way I can't restore the characters after the file is loaded.

Does anyone have solutions/suggestions? I really need them (and probably, along with me, all the Italian developers)

Thanks

firepages
04-09-2004, 02:45 PM
I think it's XML itself; perhaps a more informed answer can be found in the XML forum. For ALL XML (PHP-parsed or otherwise) I use the appropriate entity, though I assume there must be another way around this

e.g. à = &#224;

<edit>moving to XML for a better answer</edit>

bcarl314
04-09-2004, 02:45 PM
Perhaps changing the character set to UTF-8 will solve the problem???
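
A minimal sketch of that idea, assuming the file is really stored as ISO-8859-1 (Latin-1), carries no encoding declaration of its own, and that the mbstring extension is available. The helper name load_latin1_xml and the sample document are made up for illustration:

```php
<?php
// Hypothetical helper: assumes the input bytes are ISO-8859-1 and that the
// XML prolog does not declare a different encoding.
function load_latin1_xml($raw)
{
    // Re-encode the raw bytes as UTF-8, which is what an XML parser
    // assumes when no encoding is declared.
    $utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
    return simplexml_load_string($utf8);
}

// "\xE0" is the single Latin-1 byte for à; on its own it is not valid UTF-8.
$raw = "<?xml version='1.0'?><word>citt\xE0</word>";
$xml = load_latin1_xml($raw);

// The element text comes back as UTF-8.
echo (string) $xml, "\n";
```

This keeps the accented characters instead of replacing them, but it only helps if the guess about the source encoding is right.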

Jojorax
04-09-2004, 03:08 PM
UTF-8 encoding doesn't seem to change things.
Using the explicit entity representation of à, the file opens without failure, even if a wrong character then comes out. Maybe an appropriate character encoding could help, but my problem is that I have no control over what is inside the file: I must import a fairly big (>400K) externally provided XML file into my db.
Is there a way to say "take whatever is in this tag as is!"?

Alex Vincent
04-10-2004, 12:07 AM
Hm, I'm not familiar with PHP's handling of XML (thanks, firepages :rolleyes: ), but I'd like to see the XML source that demonstrates the problem. Honestly, there may not be an easy solution to this. I've never heard of SimpleXML.

firepages
04-10-2004, 02:24 AM
<?xml version='1.0'?>
<suspect>
<no_problemo>El Act&#211;r</no_problemo>
<problemo>El ActÓr</problemo>
</suspect>


Ignoring PHP for a second, the above causes an error in IE, whilst Moz shows the tree but the content of <problemo /> is printed as '?'; the entity itself works fine.

So how would this normally be handled? Entities, or doctype, or ?
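
One possible answer, sketched under the assumption that the raw Ó in the sample is a single ISO-8859-1 byte: declaring the encoding in the XML prolog lets the parser read that byte. The variable names below are illustrative only:

```php
<?php
// Same document body twice; "\xD3" is the ISO-8859-1 byte for Ó.
$body = "<suspect><problemo>El Act\xD3r</problemo></suspect>";

// Collect parse errors quietly instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// No encoding declared: the parser assumes UTF-8 and the lone byte is invalid.
$undeclared = simplexml_load_string("<?xml version='1.0'?>" . $body);
var_dump($undeclared === false);

// Encoding declared: the same bytes are read as Latin-1.
$declared = simplexml_load_string(
    "<?xml version='1.0' encoding='ISO-8859-1'?>" . $body
);

// SimpleXML hands the text back as UTF-8, whatever the source encoding was.
echo $declared->problemo, "\n";
```

If the file comes from outside and its prolog cannot be edited by hand, the declaration would have to be added or the bytes converted before parsing.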

.....................................................

Alex, take a looksee at http://www.php.net/simplexml; you will like it, and PHP dudes will like it for sure, as it takes any valid XML and turns it into a PHP object with access to the DOM and XPath queries.

It's a league ahead of the existing PHP XML parser functions; you need PHP5 to play with it though.

It finally makes the use of, say, XML config/data files realistic in PHP, as SimpleXML carries far less overhead (and typing ;) ) than the existing XML parser methodology in PHP ... I shall still use serialized objects for pure PHP stuff, but anything that has to be human readable .... ~
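
For a rough idea of what that looks like, here is a small sketch that reads a config-style document; the element and attribute names (config, db, host, name, debug) are invented for the example:

```php
<?php
// A made-up config document, parsed straight into an object tree.
$config = simplexml_load_string(
    "<?xml version='1.0'?>
<config>
  <db host='localhost'><name>shop</name></db>
  <debug>0</debug>
</config>"
);

// Child elements are reached as properties, attributes as array keys.
$dbName = (string) $config->db->name;
$dbHost = (string) $config->db['host'];
echo $dbName, ' on ', $dbHost, "\n";

// xpath() returns an array of matching elements.
$names = $config->xpath('//db/name');
foreach ($names as $node) {
    echo (string) $node, "\n";
}
```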