
View Full Version : file get contents save file as xml document



tau9
03-18-2009, 01:17 PM
Hi,

Is it possible to use file_get_contents to retrieve a webpage and save it as an XML document? I know it's possible to save it as a txt doc.

If it can't be done using file_get_contents, any ideas on how?

Thanks in advance

Fou-Lu
03-19-2009, 02:00 AM
The only difference between text files and XML files is the extension (in Windows anyway). I'm serious.
Now, to be a well-formed XML document, it needs a single root element, and every element has to be properly nested and closed. It should also start with the XML declaration (for example <?xml version="1.0" encoding="utf-8"?>), although XML 1.0 does not strictly require it. That still doesn't make it 'valid': a valid document also follows the ruleset defined in a DTD or schema document, which may or may not be available for use (if none is available, well-formed is as correct as it gets).
This is why XHTML is becoming so popular. XHTML has a DTD, and a valid XHTML page can be interpreted as an XML document instead of an HTML document. The only difference is that until recently, most people left out the <?xml...?> declaration. You'd need to check with a client-side coder to be certain, but I believe this was because the declaration throws IE into quirks mode.
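
To make the "properly nested" requirement concrete, here is a minimal sketch that asks PHP's DOM extension whether a string parses as XML. The function name isWellFormedXml is made up for this example, and it only checks that the string parses, not that it is valid against a DTD or schema.

```php
<?php
// Hypothetical helper (not part of PHP): true when $xml parses as XML.
function isWellFormedXml(string $xml): bool
{
    if ($xml === '') {
        return false; // an empty string is not a document
    }
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml); // false when the parser rejects the input
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

// Properly nested, with a declaration.
var_dump(isWellFormedXml('<?xml version="1.0" encoding="utf-8"?><a><b/></a>'));
// <b> is never closed, so the parser rejects it.
var_dump(isWellFormedXml('<a><b></a>'));
```

Checking against a DTD would be a separate step (DOMDocument has a validate method for that), which this sketch leaves out.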

file_get_contents may be used, as long as the fopen wrappers for remote URLs are enabled (the allow_url_fopen setting in php.ini). I believe they are generally enabled. If they are not, you need to use something like curl or ftp to get this information.
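
A rough sketch of that fallback, assuming the cURL extension is installed when the URL wrappers are off. The function name fetchPage is made up for this example, and error handling is kept to a minimum.

```php
<?php
// Hypothetical helper: fetch a URL with file_get_contents when the URL
// wrappers are enabled, otherwise fall back to the cURL extension.
function fetchPage(string $url)
{
    if (ini_get('allow_url_fopen')) {
        return file_get_contents($url); // string on success, false on failure
    }
    if (!function_exists('curl_init')) {
        return false; // neither mechanism is available
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    return curl_exec($ch); // string on success, false on failure
}
```

You would call it as $html = fetchPage('http://example.com/'); and check the result for false before using it.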

Does that help?

oesxyl
03-19-2009, 03:43 AM
file_get_contents returns a string, so you can pass it to DOMDocument::loadHTML:
http://www.php.net/manual/en/domdocument.loadhtml.php

Then you can save it as XML using DOMDocument::save:
http://www.php.net/manual/en/domdocument.save.php
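
Put together, the two calls could look something like the sketch below. The function name saveHtmlAsXml and the output file name page.xml are made up for the example, and an inline string stands in for the fetched page so it runs without network access.

```php
<?php
// Hypothetical helper: parse an HTML string and write it out as XML.
function saveHtmlAsXml(string $html, string $path): bool
{
    if ($html === '') {
        return false; // nothing to parse
    }
    // Real-world HTML often triggers parser warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return false;
    }
    // save() returns the number of bytes written, or false on failure.
    return $doc->save($path) !== false;
}

// In practice $html would come from file_get_contents('http://...').
$html = '<html><body><p>Hello<br>world</p></body></html>';
saveHtmlAsXml($html, sys_get_temp_dir() . '/page.xml');
```

The saved file starts with an XML declaration that DOM writes for you.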

best regards

Fou-Lu
03-19-2009, 04:09 AM

Hey, that's a good idea. This way you don't have to worry about checking for the opening XML declaration and adding it if it doesn't exist. DOM will take care of that for you!
That assumes PHP5+ though, which by this point I hope most sites are using >.<

tau9
03-19-2009, 01:53 PM
Thanks again

I have managed to save an HTML document into a text file and then retrieve elements by using getElementsByTagName.

But how do I get valid <a href= links out of the text file?

Is this possible, or will I have to save the file in another format?

Thanks in advance

Fou-Lu
03-19-2009, 09:18 PM
You can use the getElementsByTagName method on the DOMDocument object. I'm assuming you went the DOM route.
The DOM documentation is here: http://php.ca/manual/en/book.dom.php
It returns a DOMNodeList, so you can use a foreach or for loop to go through the items. On each item, you can read the href attribute (a DOMAttr), for example with getAttribute('href').
You should also be able to use XPath to get what you're looking for. DOM is quite complex.
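
A minimal sketch of the getElementsByTagName route, assuming the page is already loaded into a DOMDocument. The function name extractLinks and the sample markup are made up for the example.

```php
<?php
// Hypothetical helper: collect the href of every <a> element that has one.
function extractLinks(DOMDocument $doc): array
{
    $links = array();
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // Skip named anchors such as <a name="top">, which have no href.
        if ($anchor->hasAttribute('href')) {
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><a href="/one">1</a><a name="top">x</a><a href="http://example.com/">2</a></body></html>');
print_r(extractLinks($doc));
```

The returned values are whatever the page put in href, so relative links like /one still need to be resolved against the page's URL if you want absolute ones.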

oesxyl
03-20-2009, 09:34 AM
As Fou-Lu said, you can also use XPath:

Extract:


$xpath = new DOMXPath($doc); // $doc is your DOMDocument instance
$query = "//a/@href"; // only the href attributes, from every <a> element in the document
// query() returns false when the expression is malformed, so check the result before using it
$urls = $xpath->query($query);


Use it (an example):


if ($urls !== false) {
    // $urls is a DOMNodeList of DOMAttr nodes, one per href attribute
    foreach ($urls as $href) {
        echo $href->nodeValue, PHP_EOL; // print each URL on its own line
    }
}


best regards


