  1. #1
    Regular Coder
    Join Date
    Feb 2005
    Posts
    663
    Thanks
    5
    Thanked 14 Times in 14 Posts

    How do you parse XML data?

    I'm trying to learn how to parse some xml for further use to possibly create some charts, and am pretty lost at how to proceed. The plan is to parse the xml into a new format to be used for charting.

    I've started searching the boards, but still feel overwhelmed at this point.

    I can view the xml information by visiting the URL directly and appending &xml to it. For example, Report 77 is:

    http://501.synsport.com/index.php?id=77&xml

    In searching the boards, I tried to start small with the code below, but I'm getting a number of errors right from the start.

    PHP Code:
    $xmlstr = file_get_contents('http://501.synsport.com/index.php?id=77&xml'); // read your file

    $xml = new SimpleXMLElement($xmlstr); 
    And the first error along with a bunch of others is:
    Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: Entity: line 12: parser error : Entity 'nbsp' not defined in C:\wamp\www\xml parse\test.php on line 5

  • #2
    Senior Coder TheShaner's Avatar
    Join Date
    Sep 2005
    Location
    Orlando, FL
    Posts
    1,126
    Thanks
    2
    Thanked 40 Times in 40 Posts
    I believe the problem is that the file you're pointing to is not a real XML file. Go to that link again and do a View Source. That's the contents the XML parser is attempting to read, which you can see is not a proper XML file. It's XML contents written out for the web browser to display.

    Instead, try pointing to a real XML file:
    http://www.w3schools.com/XML/cd_catalog.xml

    -Shane

  • #3
    Senior Coder CFMaBiSmAd's Avatar
    Join Date
    Oct 2006
    Location
    Denver, Colorado USA
    Posts
    3,026
    Thanks
    2
    Thanked 315 Times in 307 Posts
    Edit: Basically the same as above ^^^

    Do a View Source of that page and you will find out why. It is not actually an XML document. It is an HTML page that has dumped XML data between <pre> </pre> tags. I'm not sure whether the <pre> tags did it or they are using htmlentities on it, but the < and > are actually & lt; and & gt; (without the spaces).
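    A quick way to confirm this from PHP, rather than from View Source, is to let libxml collect its errors and look at what comes back. This is only a sketch: the $html string is a made-up stand-in for the real page, not its actual contents.

```php
<?php
// Hypothetical stand-in for what the page returns: HTML with escaped XML inside <pre>.
$html = '<html><body><p>Report&nbsp;77</p>'
      . '<pre>&lt;Synsport report="77"/&gt;</pre></body></html>';

// Collect libxml errors in a buffer instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$xml = simplexml_load_string($html);

$messages = array();
if ($xml === false) {
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
}

// True when the markup is stored as &lt; ... &gt; entities rather than real tags.
$looksEscaped = strpos($html, '&lt;') !== false;

echo $xml === false ? "Not parseable as XML\n" : "Parsed OK\n";
echo implode("\n", $messages), "\n";
echo $looksEscaped ? "Contains escaped markup\n" : "No escaped markup found\n";
```

    If the parse fails and the payload contains &lt; sequences, the XML most likely needs to be extracted and decoded before SimpleXML can read it.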
    If you are learning PHP, developing PHP code, or debugging PHP code, do yourself a favor and check your web server log for errors and/or turn on full PHP error reporting in php.ini or in a .htaccess file to get PHP to help you.

  • #4
    Regular Coder
    Join Date
    Feb 2005
    Posts
    663
    Thanks
    5
    Thanked 14 Times in 14 Posts
    Thanks for the help guys.

    So, in looking at the source data, I need to convert the htmlentities back to their characters (< and >, etc.) before continuing.

    Now can I do that with file_get_contents? Or would I be better off using fsockopen to read the file?

    I've been trying to use fsockopen, but I keep having problems with the url including http:// even though allow_url_fopen is set to On in my php.ini settings.
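    On the fsockopen question, a likely cause is that fsockopen() takes a bare hostname and a port rather than a full URL, and as far as I know allow_url_fopen only governs the URL wrappers used by functions such as file_get_contents(). A minimal sketch, assuming a plain HTTP/1.0 GET is enough; build_get_request() is a hypothetical helper, and the network part is left commented out so it runs offline:

```php
<?php
// Split a URL into the pieces fsockopen() needs and build the request by hand.
function build_get_request($url)
{
    $parts = parse_url($url);
    $path  = isset($parts['path']) ? $parts['path'] : '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }

    $request  = "GET $path HTTP/1.0\r\n";
    $request .= "Host: {$parts['host']}\r\n";
    $request .= "Connection: close\r\n\r\n";

    return array($parts['host'], $request);
}

list($host, $request) = build_get_request('http://501.synsport.com/index.php?id=77&xml');
echo $host, "\n", $request;

// Network part (not executed here):
// $fp = fsockopen($host, 80, $errno, $errstr, 10);
// if ($fp) {
//     fwrite($fp, $request);
//     $response = stream_get_contents($fp); // raw headers + body
//     fclose($fp);
// }
```

    Since this site also requires cookies, file_get_contents() and a raw socket would both need the cookie handling done by hand, so curl is probably less work.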

  • #5
    Senior Coder CFMaBiSmAd's Avatar
    Join Date
    Oct 2006
    Location
    Denver, Colorado USA
    Posts
    3,026
    Thanks
    2
    Thanked 315 Times in 307 Posts
    I would use preg_match() to get everything between the <pre> </pre> tags, then use html_entity_decode() to get it to a usable form to supply to the simple XML function.

    Edit: you will also find by echoing the result of the file_get_contents() that you must be using cookies to access that page. A browser can, but a php script would need to use curl with cookies.

    Specifically, you will get -

    Cookie Scan Error
    Synsport uses cookies to identify returning league owners so they don't need to log in every time they visit. We previously provided an alternative, but now we require browsers to accept cookies, and we detect that this browser is not accepting cookies. If you wish to continue, your options are to either configure this browser to accept cookies from Synsport or use another browser on your computer that is already configured to accept cookies.
    Q1. Why did you disable the alternative identification method of passing the session identifier through the URL?
    It's because of the evil spidering robots. Long ago we realized that with our millions of valid URLs, the Google bots were eating up tons of bandwidth and loading the servers, so we changed the robot.txt instructions to tell Google and Yahoo and the other search engines not to index our site, and they respect our wishes. However, there are many evil bots out there scanning sites for private information and email addresses, looking to send spam to those accounts later. They obviously ignore our wishes to be left alone by robots. Many of these spiders won't accept cookies, so by requiring them, we cut out a large percentage of these intruders. The ones that present valid cookies can be tracked by our sessions, so we can ban those that are hitting the servers at a rate faster than 1 page per second, which is many times faster a legitimate user can browse. It's a war against the spammers, and the dumb bots will get stuck on this page.
    Q2. I use Internet Explorer 7.0. How do I enable cookies this browser?
    Choose “Internet Options” from the Tools menu in IE 7.0
    Click on the “Privacy” tab.
    Click the “Default” button (or manually slide the bar down to “Medium”) under “Settings”.
    Click “OK“.
    Q3. I use Firefox 2+. How do I enable cookies this browser?
    From the Tools Menu, click Options
    From the Options window, click Privacy
    Under Cookies check Accept cookies from sites
    Click OK
    Q4. I use Opera 9+. How do I enable cookies this browser?
    From the Tools Menu, click Quick Preferences
    Check "Enable Cookies"
    Q5. I use Macintosh Safari 2+. How do I enable cookies this browser?
    From the Safari menu, select Preferences
    Select Securities
    Select Accept Cookies
    Check "Only from sites you navigate to"
    Last edited by CFMaBiSmAd; 04-30-2009 at 05:47 PM.

  • #6
    Regular Coder
    Join Date
    Feb 2005
    Posts
    663
    Thanks
    5
    Thanked 14 Times in 14 Posts
    Thanks again for the help, and I'm making some progress. Is using preg_match the best way to find the beginning and ending <pre> tags to get the xml data only?

    Here's where I am now
    PHP Code:
    /* STEP 1. Create a cookie file */
    $ckfile = tempnam("/tmp", "CURLCOOKIE");

    /* STEP 2. Visit the homepage to set the cookie properly */
    $ch = curl_init("http://501.synsport.com/index.php");
    curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($ch);

    /* STEP 3. Visit the report page, sending the stored cookie back */
    $ch = curl_init("http://501.synsport.com/index.php?id=77&xml");
    curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $output = curl_exec($ch);

    // preg_match() patterns need delimiters; the "/" inside </pre> must be escaped
    if (preg_match('/<pre>/i', $output)) {
        echo "Found Start <br/>";
    }

    if (preg_match('/<\/pre>/i', $output)) {
        echo "Found End <br/>";
    }

    $a = html_entity_decode($output);
    echo $a;
    Last edited by ptmuldoon; 04-30-2009 at 06:48 PM.

  • #7
    Senior Coder CFMaBiSmAd's Avatar
    Join Date
    Oct 2006
    Location
    Denver, Colorado USA
    Posts
    3,026
    Thanks
    2
    Thanked 315 Times in 307 Posts
    Yes (the function is something I found, seems to work as expected) -

    PHP Code:
    function between_tags($string, $tagname)
    {
        $pattern = "/<$tagname>(.*)<\/$tagname>/is";
        preg_match($pattern, $string, $matches);
        return $matches[1];
    }

    ...
    your code ...

    $output = curl_exec($ch);
    $output = between_tags($output, 'pre');
    $output = html_entity_decode($output, ENT_QUOTES);
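    To try the extract-then-decode steps without hitting the site, here is a minimal offline sketch. The $page string is invented sample data standing in for the curl output, and extract_between_tags() is a hypothetical variant of between_tags() that returns an empty string when nothing matches:

```php
<?php
// Same idea as between_tags(), with a guard for the "no match" case.
function extract_between_tags($string, $tagname)
{
    $pattern = "/<$tagname>(.*)<\/$tagname>/is";
    return preg_match($pattern, $string, $matches) ? $matches[1] : '';
}

// Invented stand-in for the fetched page: escaped XML inside <pre> tags.
$page = '<html><body><pre>' . "\n"
      . '&lt;?xml version=&quot;1.0&quot; encoding=&quot;ISO-8859-1&quot;?&gt;' . "\n"
      . '&lt;Synsport report=&quot;77&quot;&gt;'
      . '&lt;var name=&quot;WEEK&quot;&gt;1&lt;/var&gt;'
      . '&lt;/Synsport&gt;' . "\n"
      . '</pre></body></html>';

$escaped = extract_between_tags($page, 'pre');
$decoded = html_entity_decode($escaped, ENT_QUOTES);

// trim() matters: the XML declaration must be the very first thing in the string.
$xml = new SimpleXMLElement(trim($decoded));

echo 'report ', $xml['report'], ', week ', $xml->var, "\n";
```

    Checking for an empty return value before calling new SimpleXMLElement() avoids an exception when the page layout changes or the cookie step fails.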

  • #8
    Regular Coder
    Join Date
    Feb 2005
    Posts
    663
    Thanks
    5
    Thanked 14 Times in 14 Posts
    OK, back at this some more today. Now that I've learned how to use curl a little, I've moved my testing offline with some sample data.xml from the original file, trying to parse/process the data.

    What I can't figure out is how to parse some of the xml data that has spaces and quotes (" ") used.

    Sample XML data from file
    Code:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <Synsport report="77">
      <sample>
        <var>TEST 1 AREA</var>
        <var name="STAGE_STATUS">Week</var>
      </sample>
      <sample>
        <var>TEST IN HERE</var>
        <var name="STAGE_STATUS">Week</var>
      </sample>
    </Synsport>
    And current code to read and parse the file.
    PHP Code:
    $xmlstr = file_get_contents('data.xml'); // read your file

    $xml = new SimpleXMLElement($xmlstr);

    foreach ($xml->sample as $sample) {
        $vartest = $sample->var;
        echo $vartest . '<br/>';

        // How do I include <var name="STAGE_STATUS"> ???
    }


  • #9
    Senior Coder TheShaner's Avatar
    Join Date
    Sep 2005
    Location
    Orlando, FL
    Posts
    1,126
    Thanks
    2
    Thanked 40 Times in 40 Posts
    You would get the attributes for the tag using:
    PHP Code:
    $sample->var->attributes(); 
    http://us.php.net/manual/en/function...attributes.php

    You should probably read over this whole section here. It'll explain just about everything you need to know in order to parse XML using SimpleXML:
    http://us.php.net/manual/en/book.simplexml.php

    -Shane

  • #10
    Senior Coder CFMaBiSmAd's Avatar
    Join Date
    Oct 2006
    Location
    Denver, Colorado USA
    Posts
    3,026
    Thanks
    2
    Thanked 315 Times in 307 Posts
    Here is how you access the name attribute directly -
    PHP Code:
    $attrib = $xml->sample[0]->var[1]->attributes();
    echo $attrib['name'];
    It would take some experimenting to incorporate that into your existing code.
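    One way that experimenting might turn out, as a sketch rather than the only approach: loop over every var in a sample, read the name through attributes(), and cast the element itself to a string to get its text. The sample data is made up to resemble the file discussed above.

```php
<?php
// Made-up sample resembling the data in this thread.
$data = '<Synsport report="77">
  <sample>
    <var name="DESC">TEST 1 AREA</var>
    <var name="WEEK">Week 1</var>
    <var name="SCORE">88</var>
  </sample>
</Synsport>';

$xml = new SimpleXMLElement($data);

$lines = array();
foreach ($xml->sample[0]->var as $var) {
    $attrib = $var->attributes();
    $name   = (string) $attrib['name']; // $var['name'] is a shorter way to read the same attribute
    $value  = (string) $var;            // casting the element gives its text content
    $lines[] = "$name = $value";
}

echo implode("\n", $lines), "\n";
```

    The (string) casts matter when the values are stored or compared later, because SimpleXML otherwise hands back SimpleXMLElement objects.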

  • #11
    Regular Coder
    Join Date
    Feb 2005
    Posts
    663
    Thanks
    5
    Thanked 14 Times in 14 Posts
    I feel I should know this better than I do, yet I can't seem to figure out how you echo out both the attribute name and its value. I think/feel once I grasp that I should be able to begin manipulating the data for graphing and chart presentation (probably with xml/swf charts).

    Updated sample xml and test code, combined into one file for easier testing.
    PHP Code:
    $data = '<?xml version="1.0" encoding="ISO-8859-1" ?>
    <Synsport report="77">
      <sample>
        <var name="DESC">TEST 1 AREA</var>
        <var name="WEEK">Week 1</var>
        <var name="SCORE">88</var>
      </sample>
      <sample>
        <var name="DESC">TEST 2 AREA</var>
        <var name="WEEK">Week 2</var>
        <var name="SCORE">96</var>
      </sample>
    </Synsport>';

    $xml = new SimpleXMLElement($data);

    foreach ( $xml->sample as $sample ) {
        //How do you echo out the attribute name and value for each ???
        
        $attrib = $xml->sample->var->attributes();
        $value = $sample->var;
        echo $attrib . '= ' . $value . '<br/>';
        
    }

  • #12
    Regular Coder
    Join Date
    Feb 2005
    Posts
    663
    Thanks
    5
    Thanked 14 Times in 14 Posts
    Still learning here with parsing some xml data

    Now why would the below only loop and give me the first set of sample data, Week 1 only, and not continue and give me Week 2 as well? It will loop, but it shows me Week 1 data twice.

    Is there an easier way than placing foreach statements inside foreach statements?

    PHP Code:
    $data = '<?xml version="1.0" encoding="ISO-8859-1" ?>
    <Synsport report="77">
      <sample>
        <var name="DESC">TEST 1 AREA</var>
        <var name="WEEK">Week 1</var>
        <var name="SCORE">88</var>
      </sample>
      <sample>
        <var name="DESC">TEST 2 AREA</var>
        <var name="WEEK">Week 2</var>
        <var name="SCORE">96</var>
      </sample>
    </Synsport>';

    $xml = new SimpleXMLElement($data);

    foreach ( $xml->sample as $sample ){    
        
        foreach ($xml->sample->var as $var) {
               echo $var['name'] . ' is ' . $var .'<br/>';
        }
    }

  • #13
    Senior Coder CFMaBiSmAd's Avatar
    Join Date
    Oct 2006
    Location
    Denver, Colorado USA
    Posts
    3,026
    Thanks
    2
    Thanked 315 Times in 307 Posts
    Because your inner foreach() loop is not using what you think it is. It should be -
    foreach ($sample->var as $var) {
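    The reason is that $xml->sample->var, with no index on sample, always refers to the first sample, whereas $sample is the element for the current pass of the outer loop. A sketch of the corrected loop, which also collects each sample into an array keyed by the name attribute (the array layout is my own assumption about what a charting step might want):

```php
<?php
$data = '<Synsport report="77">
  <sample>
    <var name="DESC">TEST 1 AREA</var>
    <var name="WEEK">Week 1</var>
    <var name="SCORE">88</var>
  </sample>
  <sample>
    <var name="DESC">TEST 2 AREA</var>
    <var name="WEEK">Week 2</var>
    <var name="SCORE">96</var>
  </sample>
</Synsport>';

$xml = new SimpleXMLElement($data);

$rows = array();
foreach ($xml->sample as $sample) {
    $row = array();
    // Iterate the vars of the CURRENT sample, not $xml->sample->var.
    foreach ($sample->var as $var) {
        $row[(string) $var['name']] = (string) $var;
    }
    $rows[] = $row;
}

print_r($rows);
```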

  • #14
    Regular Coder
    Join Date
    Feb 2005
    Posts
    663
    Thanks
    5
    Thanked 14 Times in 14 Posts
    Sweet, thanks, and that makes perfect sense now that you pointed it out.

    Now...continuing down the path. I assume it's more complex to dig deeper into the xml data as additional layers/children are added?

    Looking back at the original xml file that I will be starting with here: http://501.synsport.com/index.php?id=77&xml

    You can see that deeper into the xml (middle of the xml data), you have something like this
    Code:
    <block level="1" name="ScoringGrid">
        <row level="2" number="0">
          <block level="3" name="Header">
            <row level="4" number="0">
              <var name="WEEK">1</var>
            </row>
            <row level="4" number="1">
              <var name="WEEK">2</var>
            </row>
            <row level="4" number="2">
              <var name="WEEK">3</var>
            </row>
            <row level="4" number="3">
              <var name="WEEK">4</var>
            </row>
    ......... Continuing on.
    I'm unsure how the levels work. Would I need to keep nesting additional foreach loops inside each other to get at the data I want to pull out?

    I hope to eventually pull out enough data into a new xml file for eventual graphing.

    FYI........This is all for fun for my fantasy-football league. Just something to keep learning with

  • #15
    Senior Coder CFMaBiSmAd's Avatar
    Join Date
    Oct 2006
    Location
    Denver, Colorado USA
    Posts
    3,026
    Thanks
    2
    Thanked 315 Times in 307 Posts
    If you use print_r on the object (at any level) it makes it easier to see how to access the available data and attributes -

    echo "<pre>", print_r($xml,true), "</pre>";

    or

    echo "<pre>", print_r($xml->attributes(),true), "</pre>";
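    Once print_r has shown the shape, the nested block/row levels can be walked either with one foreach per level or with a single xpath() call. This is a sketch against a shortened copy of the ScoringGrid fragment; the closing tags were added by hand so it is well-formed, which is an assumption about how the real file continues.

```php
<?php
// Shortened copy of the fragment; closing tags added by hand (assumption).
$data = '<block level="1" name="ScoringGrid">
  <row level="2" number="0">
    <block level="3" name="Header">
      <row level="4" number="0"><var name="WEEK">1</var></row>
      <row level="4" number="1"><var name="WEEK">2</var></row>
      <row level="4" number="2"><var name="WEEK">3</var></row>
    </block>
  </row>
</block>';

$xml = new SimpleXMLElement($data);

// Option 1: one foreach per level, mirroring block -> row -> block -> row -> var.
$weeksByLoop = array();
foreach ($xml->row as $outerRow) {
    foreach ($outerRow->block as $block) {
        foreach ($block->row as $row) {
            $weeksByLoop[] = (string) $row->var;
        }
    }
}

// Option 2: a single XPath expression that selects the same var elements.
$weeksByXpath = array();
foreach ($xml->xpath('//block[@name="Header"]/row/var[@name="WEEK"]') as $var) {
    $weeksByXpath[] = (string) $var;
}

print_r($weeksByLoop);
print_r($weeksByXpath);
```

    The XPath version tends to stay readable as the nesting gets deeper, while the loops make it easier to pick up attributes such as number at each level along the way.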