  1. #1
    New Coder
    Join Date
    Jan 2009
    Posts
    48
    Thanks
    28
    Thanked 0 Times in 0 Posts

    file get contents save file as xml document

    Hi,

    Is it possible to use file_get_contents to retrieve a webpage and save it as an XML document? I know it's possible to save it as a txt doc.

    If it can't be done using file_get_contents, any ideas on how?

    Thanks in advance

  • #2
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,979
    Thanks
    4
    Thanked 2,659 Times in 2,628 Posts
    The only difference between text files and xml files is the extension (in Windows anyway). I'm serious.
    Now, to be a well-formed XML document, it has to be properly nested under a single root element, and it should start with the XML declaration (for example <?xml version="1.0" encoding="utf-8"?>, which is recommended rather than strictly required in XML 1.0). This still doesn't make it 'valid'. What would make it valid is following a ruleset defined in a DTD or schema document, which may or may not be available (if none is available, well-formedness is all there is to check).
    This is part of why XHTML is becoming so popular. An XHTML page declares a DTD, and when it is valid it can be processed as an XML document instead of an HTML document. The only catch is that until recently, most people left out the <?xml...?> declaration. You'd need to check with a client-side coder to be certain, but I believe this had something to do with it throwing IE into quirks mode.
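
    If you want to test the "properly nested" part rather than eyeball it, something like the following should do. It is only a rough sketch: the helper name is_well_formed_xml is made up for illustration, and it checks nesting only, not a DTD or schema.

```php
<?php
// Hypothetical helper (not from the thread): true if $text parses as well-formed XML.
function is_well_formed_xml($text)
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($text);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

$good = '<?xml version="1.0" encoding="utf-8"?><page><title>Hi</title></page>';
$bad  = '<?xml version="1.0" encoding="utf-8"?><page><title>Hi</page></title>';

var_dump(is_well_formed_xml($good)); // tags properly nested
var_dump(is_well_formed_xml($bad));  // tags closed in the wrong order
```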

    file_get_contents can be used as long as the fopen wrappers are enabled for remote URLs (the allow_url_fopen setting in php.ini). I believe this is generally enabled. If it is not, you need to use something like cURL or FTP to get the page.
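
    A minimal sketch of the fetch-and-save step, under a few assumptions: save_page is a made-up helper name, the demo reads a local stand-in file so it runs without network access, and the cURL fallback is only marked with a comment rather than implemented.

```php
<?php
// Hypothetical helper (not from the thread): copy $source to $target unchanged.
// Returns the number of bytes written, or false on failure.
function save_page($source, $target)
{
    $isRemote = (bool) preg_match('#^https?://#i', $source);
    $wrappersOn = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
    if ($isRemote && !$wrappersOn) {
        // Remote fopen wrappers are off: this is where cURL would be used instead.
        return false;
    }
    $contents = file_get_contents($source);
    if ($contents === false) {
        return false;
    }
    // The .xml extension is just a name; the bytes are written as-is.
    return file_put_contents($target, $contents);
}

// Demo with a local stand-in for the webpage (assumed content, for illustration).
$page = '<html><body><p>Hello</p></body></html>';
$source = tempnam(sys_get_temp_dir(), 'src');
file_put_contents($source, $page);
$target = $source . '.xml';

$bytes = save_page($source, $target);
echo $bytes === false ? "fetch or save failed\n" : "saved $bytes bytes to $target\n";
```

    With allow_url_fopen on, passing a URL as the first argument should work the same way. Note that this only renames the page, it does not turn HTML into well-formed XML.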

    Does that help?
    PHP Code:
    header('HTTP/1.1 420 Enhance Your Calm'); 

  • #3
    Master Coder
    Join Date
    Dec 2007
    Posts
    6,682
    Thanks
    436
    Thanked 890 Times in 879 Posts
    Originally posted by tau9:
    Hi,

    Is it possible to use file_get_contents to retrieve a webpage and save it as an XML document? I know it's possible to save it as a txt doc.

    If it can't be done using file_get_contents, any ideas on how?

    Thanks in advance
    file_get_contents returns a string, so you can load it with DOMDocument::loadHTML:
    http://www.php.net/manual/en/domdocument.loadhtml.php

    then you can save it as XML using DOMDocument::save:
    http://www.php.net/manual/en/domdocument.save.php
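
    Roughly, the two steps could look like this. It is a sketch only: the $html string stands in for whatever file_get_contents returned, and the output file name is an arbitrary choice.

```php
<?php
// Stand-in for the string file_get_contents() would return (assumed content).
$html = '<html><head><title>Demo</title></head><body><p>Hello<br>world</p></body></html>';

$doc = new DOMDocument();
// Real-world HTML is rarely clean, so keep libxml's complaints out of the output.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// saveXML() returns the serialized document as a string.
$xml = $doc->saveXML();
echo $xml;

// save() writes the document to the given file name (arbitrary name here).
$target = sys_get_temp_dir() . '/page.xml';
$doc->save($target);
```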

    best regards

  • #4
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,979
    Thanks
    4
    Thanked 2,659 Times in 2,628 Posts
    Originally posted by oesxyl:
    file_get_contents returns a string, so you can load it with DOMDocument::loadHTML:
    http://www.php.net/manual/en/domdocument.loadhtml.php

    then you can save it as XML using DOMDocument::save:
    http://www.php.net/manual/en/domdocument.save.php

    best regards
    Hey, that's a good idea. This way you don't have to worry about checking for the opening XML declaration and adding it if it doesn't exist. DOM will take care of that for you!
    That assumes PHP5+ though, which by this point I hope most sites are using >.<

  • #5
    New Coder
    Join Date
    Jan 2009
    Posts
    48
    Thanks
    28
    Thanked 0 Times in 0 Posts

    active links

    Thanks again

    I have managed to save an HTML document into a text file and then retrieve elements using getElementsByTagName.

    But how do I get valid <a href="..."> links out of the text file?

    Is this possible, or will I have to save the file in another format?

    Thanks in advance

  • #6
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,979
    Thanks
    4
    Thanked 2,659 Times in 2,628 Posts
    You can use the getElementsByTagName method on the DOMDocument object. I'm assuming you went the DOM route.
    The DOM documentation is here: http://php.ca/manual/en/book.dom.php
    It will return a DOMNodeList, so you can use a foreach or for loop on it to retrieve the items. On each item, you can access the DOMAttr for the href attribute.
    You should also be able to use XPath to get what you're looking for. DOM is quite complex.
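
    A quick sketch of the getElementsByTagName loop described above. The sample $html is made up, and it reads the attribute value with getAttribute() (a plain string) rather than going through the DOMAttr node, which seemed simpler for this purpose.

```php
<?php
// Made-up sample markup: two real links and one named anchor without an href.
$html = '<html><body><a href="http://www.example.com/one">One</a> '
      . '<a name="top">no href</a> <a href="/two">Two</a></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$links = array();
foreach ($doc->getElementsByTagName('a') as $anchor) {
    // Named anchors have no href attribute, so skip them.
    if ($anchor->hasAttribute('href')) {
        $links[] = $anchor->getAttribute('href');
    }
}
print_r($links);
```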

  • #7
    Master Coder
    Join Date
    Dec 2007
    Posts
    6,682
    Thanks
    436
    Thanked 890 Times in 879 Posts
    as Fou-Lu said, you can also use xpath:

    extract:
    PHP Code:
    $xpath = new DOMXPath($doc); // $doc is your DOMDocument instance
    $query = "*//a/@href"; // only href attributes from any a tag
    // you must check that $xpath is valid before using it, I omit this here!!!
    $urls = $xpath->query($query);
    use it (an example):
    PHP Code:
    if ($urls) { // query() returns false if the expression is malformed
        $len = $urls->length;
        for ($i = 0; $i < $len; $i++) {
            echo $urls->item($i)->nodeValue, "\n"; // one URL per line
        }
    }

    best regards

