  1. #1
    Regular Coder
    Join Date
    Nov 2010
    Location
    Washington DC
    Posts
    338
    Thanks
    22
    Thanked 1 Time in 1 Post

    Parsing Data From Html Using Xpath

    Hello,

    I need a little assistance with parsing data from an HTML document using XPath. Please indulge me for a moment as I explain my problem and then my question:

    The HTML I am parsing is below. Specifically, I would like to extract all href attributes, starting from the second <tr>, from HTML that is formatted as follows:

    PHP Code:
    <tbody><tr class="clubHeaderRow">
            <td>Club Name <font size="-1">(ExpDate)</font></td>
            <td align="center">Location and Days/Hours</td>
            <td>Contacts</td>
        </tr>
        <!-- PARSING STARTS FROM HERE --><tr><td class="clubRow"><a href="http://www.bumpernets.com">BumperNets, Inc.</a><br><font size="-1">(7/31/2014)</font></td>
            <td align="center" valign="middle" width="500"><table width="98%" border="1" cellpadding="2">
                <tbody><tr>
                    <td width="60%" class="clubRowEven">Riverchase Galleria Mall<br>2000 Riverchase Galleria Ste 179 &amp; 181A<br>Birmingham, AL  35244<br>205-987-2222<br><br><u>Directions</u>:<br>Toll Free Number 1-800-366-7664</td>
                    <td width="40%" class="clubRowEven">Open 7 days a week, Mon-Thurs 10AM to 9PM, Fri-Sat 10AM to 10PM and Sunday 11AM to 6PM
                        Tournaments Every Friday 7:30 PM to 10:00 PM</td>
                </tr>
            </tbody></table>
            </td>
            <td class="clubRow"><a href="mailto:homer.bumpernets@gmail.com">Homer Brown</a><br>205-987-2222</td>
        </tr>
        <tr>
            <td bgcolor="whitesmoke" class="clubRow"><a href="http://www.nattc.com">North Alabama Table Tennis Club</a><br><font size="-1">(1/31/2015)</font></td>
            <td align="center" valign="middle" width="500" bgcolor="whitesmoke"><table width="98%" border="1" cellpadding="2">
                <tbody><tr>
                    <td width="60%" class="clubRowEven">Aquadome Recreation Center<br>1202 5th Ave SW<br>Decatur, AL  35601</td>
                    <td width="40%" class="clubRowEven">Tuesday 6:00 9:00PM</td>
                </tr>
                <tr>
                    <td width="60%" class="clubRowOdd">Brahan Springs RecCenter<br>3770 Ivy St.<br>Huntsville, AL  35805<br>256-883-3710</td>
                    <td width="40%" class="clubRowOdd">Winter Wed 6 9 PM
                        Spring/Summer Thur 6 9 PM</td>
                </tr>
            </tbody></table>
            </td>
            <td bgcolor="whitesmoke" class="clubRow"><a href="mailto:crpatton@hiwaay.net">Chip Patton</a><br>256-772-7359</td>
        </tr>
        <tr>
            <td class="clubRow"><a href="http://neatt.weebly.com/">North East Alabama Table Tennis</a><br><font size="-1">(7/31/2014)</font></td>
            <td align="center" valign="middle" width="500"><table width="98%" border="1" cellpadding="2">
                <tbody><tr>
                    <td width="60%" class="clubRowEven">Anniston Army Depot GymBldg 206<br>7 Frankford Ave.<br>Anniston, AL  36201<br>256-235-6385<br><br><u>Directions</u>:<br>Call 256-235-6385</td>
                    <td width="40%" class="clubRowEven">Tues 5:00 to 9:00PM</td>
                </tr>
            </tbody></table>
            </td>
            <td class="clubRow"><a href="mailto:238mike@bellsouth.net">Mike Harris</a><br>256-689-8603</td>
        </tr>
    </tbody>
    The XPath code is as follows:

    PHP Code:
    $urlArr = array();
    $clssname = "clubRow";

    $anchors = $xpath->query("//table/tr/td[@class='$clssname']//a");
    foreach ($anchors as $a) {
        // $urlArr[] = $a->nodeValue." - ".$a->getAttribute("href")."<br/>";
        $urlArr[] = $a->getAttribute("href")."<br/>";
    }
    In its current form, the output is:

    PHP Code:
    Array
    (
        [0] => http://www.bumpernets.com
        [1] => mailto:homer.bumpernets@gmail.com
        [2] => http://www.nattc.com
        [3] => mailto:crpatton@hiwaay.net
        [4] => http://neatt.weebly.com/
        [5] => mailto:238mike@bellsouth.net
    )

    **My question is how to structure the array to look like the following:**

    PHP Code:
    Array
    (
        [0] => http://www.bumpernets.com
        [1] => mailto:homer.bumpernets@gmail.com
    )
    Array
    (
        [0] => http://www.nattc.com
        [1] => mailto:crpatton@hiwaay.net
    )
    Array
    (
        [0] => http://neatt.weebly.com/
        [1] => mailto:238mike@bellsouth.net
    )

    Basically, the links from the data cells (<td>) within each <tr>, beginning with the second row, should be grouped into their own array.
    I would appreciate any thoughts on this.

    Best
    Mossa

  • #2
    God Emperor Fou-Lu
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,987
    Thanks
    4
    Thanked 2,660 Times in 2,629 Posts
    Iterate at the td level and then query for the hrefs inside each iteration instead of querying them all at once. Alternatively, add ids to the rows and use those ids to create the separate dimensions. Either approach will let you build a multi-dimensional array. The table as posted isn't correct (as opposed to well formed, which can still load); it is missing the structural elements needed to complete it.
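    For anyone finding this thread later, the first suggestion could look roughly like the sketch below. It is only an illustration under stated assumptions, not tested against the real page: it loops over the rows (one level above the td) so that each row's links land in their own array, and the inner query relies on the optional context-node argument of DOMXPath::query(). The trimmed HTML string and the names $groups and $links are made up for the example; $clssname follows the first post.

```php
<?php
// Hypothetical, trimmed-down stand-in for the club table in the first post.
$html = '<table>'
      . '<tr class="clubHeaderRow"><td>Club Name</td><td>Contacts</td></tr>'
      . '<tr>'
      . '<td class="clubRow"><a href="http://www.bumpernets.com">BumperNets</a></td>'
      . '<td class="clubRow"><a href="mailto:homer.bumpernets@gmail.com">Homer Brown</a></td>'
      . '</tr>'
      . '<tr>'
      . '<td class="clubRow"><a href="http://www.nattc.com">NATTC</a></td>'
      . '<td class="clubRow"><a href="mailto:crpatton@hiwaay.net">Chip Patton</a></td>'
      . '</tr>'
      . '</table>';

libxml_use_internal_errors(true);   // collect markup warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$clssname = "clubRow";
$groups = array();

// Outer query: every row that has at least one matching cell as a direct child.
// The header row has no td with class "clubRow", so it is skipped.
foreach ($xpath->query("//tr[td[@class='$clssname']]") as $row) {
    $links = array();
    // Inner query is relative (leading dot) and evaluated against $row only.
    foreach ($xpath->query("./td[@class='$clssname']//a", $row) as $a) {
        $links[] = $a->getAttribute("href");
    }
    $groups[] = $links;
}

print_r($groups);
```

    Because the grouping comes from the row itself, a row with one link or three links still gets its own array, which a fixed-size split of the flat list cannot guarantee.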
    PHP Code:
    header('HTTP/1.1 420 Enhance Your Calm'); 

  • #3
    Regular Coder low tech
    Join Date
    Dec 2009
    Posts
    851
    Thanks
    172
    Thanked 93 Times in 93 Posts
    how to structure the array to look like the following

    Just an idea



    PHP Code:
    $group = array_chunk($urlArr, 2);

    var_dump($group);


    //output similar
    array(3) {
    [0]=>
    array(2) {
    [0]=>
    string(25) "http://www.bumpernets.com"
    [1]=>
    string(33) "mailto:homer.bumpernets@gmail.com"
    }
    [1]=>
    array(2) {
    [0]=>
    string(20) "http://www.nattc.com"
    [1]=>
    string(26) "mailto:crpatton@hiwaay.net"
    }
    [2]=>
    array(2) {
    [0]=>
    string(24) "http://neatt.weebly.com/"
    [1]=>
    string(28) "mailto:238mike@bellsouth.net"
    }
    }
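    A hedged note on the idea above: array_chunk() simply cuts the flat list into consecutive slices, so the pairing is only right while every row contributes exactly two links. The sample list below deliberately leaves out the last mailto: entry to show the effect; the $incomplete check is a hypothetical addition, not part of the suggestion itself.

```php
<?php
// Flat list in the shape built by the XPath loop in the first post
// (without the trailing "<br/>"), with the final mailto: entry missing.
$urlArr = array(
    "http://www.bumpernets.com",
    "mailto:homer.bumpernets@gmail.com",
    "http://www.nattc.com",
    "mailto:crpatton@hiwaay.net",
    "http://neatt.weebly.com/",
);

// Split into consecutive groups of two; the last group may be shorter.
$group = array_chunk($urlArr, 2);

// Hypothetical sanity check: record the index of any group that is not a full pair.
$incomplete = array();
foreach ($group as $i => $pair) {
    if (count($pair) !== 2) {
        $incomplete[] = $i;
    }
}

var_dump($incomplete);
```

    If a link were missing from the middle of the list rather than the end, every later pair would shift by one without any group being short, so this check only catches the simplest case.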


    PHP Code:
    echo $group[0][0]."<br />";
    echo $group[0][1]."<br />";
    echo $group[1][0]."<br />";
    echo $group[1][1]."<br />";
    echo $group[2][0]."<br />";
    echo $group[2][1]."<br />";
    //output
    http://www.bumpernets.com
    mailto:homer.bumpernets@gmail.com
    http://www.nattc.com
    mailto:crpatton@hiwaay.net
    http://neatt.weebly.com/
    mailto:238mike@bellsouth.net
    "The greatest revenge is to accomplish what others say you cannot do."
    ~ Unknown

    I used to be indecisive, but now I'm not so sure.

  • #4
    Regular Coder
    Join Date
    Nov 2010
    Location
    Washington DC
    Posts
    338
    Thanks
    22
    Thanked 1 Time in 1 Post
    I appreciate the responses to this post.

    Low Tech, your suggested idea and sample code seem to be close to what I am trying to achieve; however, I need a small modification to the output.
    Using array_chunk($urlArr, 2), I am getting the following output:
    PHP Code:
    Array
    (
        [0] => Array
            (
                [0] => http://www.bumpernets.com
                [1] => mailto:homer.bumpernets@gmail.com
            )

        [1] => Array
            (
                [0] => http://www.nattc.com
                [1] => mailto:crpatton@hiwaay.net
            )

        [2] => Array
            (
                [0] => http://neatt.weebly.com/
                [1] => mailto:238mike@bellsouth.net
            )

    )


    I would like the output to be as follows:
    PHP Code:
    Array
    (
        [0] => http://www.bumpernets.com
        [1] => mailto:homer.bumpernets@gmail.com
    )

    Array
    (
        [0] => http://www.nattc.com
        [1] => mailto:crpatton@hiwaay.net
    )

    Array
    (
        [0] => http://neatt.weebly.com/
        [1] => mailto:238mike@bellsouth.net
    )

    Any thoughts on how to do the modification?

    Thanks
    Last edited by mbarandao; 04-12-2014 at 07:58 AM.

  • #5
    Regular Coder low tech
    Join Date
    Dec 2009
    Posts
    851
    Thanks
    172
    Thanked 93 Times in 93 Posts
    All I can think of is (not sure if it's useful tbh)


    PHP Code:
    $group = array_chunk($urlArr, 2);

    foreach ($group as $arr) {
        print_r($arr);
    }

    Output:

    PHP Code:
    Array
    (
        [0] => http://www.bumpernets.com
        [1] => mailto:homer.bumpernets@gmail.com
    )
    Array
    (
        [0] => http://www.nattc.com
        [1] => mailto:crpatton@hiwaay.net
    )
    Array
    (
        [0] => http://neatt.weebly.com/
        [1] => mailto:238mike@bellsouth.net
    )
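    Building on the loop above, one possible next step (a sketch only, under the assumption that each pair is a site link followed by a mailto: link) is to unpack each pair into named values with list() instead of printing it. The "site" and "email" keys and the $clubs array are hypothetical names chosen for the example.

```php
<?php
// Same pairs that array_chunk($urlArr, 2) yields for the three clubs in this thread.
$group = array(
    array("http://www.bumpernets.com", "mailto:homer.bumpernets@gmail.com"),
    array("http://www.nattc.com", "mailto:crpatton@hiwaay.net"),
    array("http://neatt.weebly.com/", "mailto:238mike@bellsouth.net"),
);

$clubs = array();
foreach ($group as $pair) {
    // Unpack the two-element array into named variables.
    list($site, $contact) = $pair;
    $clubs[] = array(
        "site"  => $site,
        // Drop the leading "mailto:" to keep just the address.
        "email" => substr($contact, strlen("mailto:")),
    );
}

print_r($clubs);
```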
    Last edited by low tech; 04-12-2014 at 08:13 AM.
    "The greatest revenge is to accomplish what others say you cannot do."
    ~ Unknown

    I used to be indecisive, but now I'm not so sure.

  • Users who have thanked low tech for this post:

    mbarandao (04-12-2014)

  • #6
    Regular Coder
    Join Date
    Nov 2010
    Location
    Washington DC
    Posts
    338
    Thanks
    22
    Thanked 1 Time in 1 Post
    Low Tech,

    Thank you! That gives me a starting point. It looks close to what I need, and I will build from there.

    Thanks again!

