  1. #1
    New Coder
    Join Date
    Jan 2003
    Posts
    39
    Thanks
    0
    Thanked 4 Times in 4 Posts

    Need help with REGEX to capture data in all html table cells

    Hello!

    I've been struggling for some time now trying to figure out how to capture the data between ALL <td></td> tags, and it must be grouped by <tr></tr> (meaning I need to know which <tr> each <td> is located in).
    As a working example I'm using this code:
    PHP Code:
    <?php
    function highlight($array)
    {
      foreach ($array as $key => $val)
        if (is_array($val))
          $array[$key] = highlight($val);
        else
          $array[$key] = '<span style="color:red;font-weight:normal;">'.(preg_match("#^[\n\r]#", $val) ? "" : "\n").htmlspecialchars($val).'</span>';
      return $array;
    }
    $text = '
    <table>
      <tr>
        <td>tb1-tr1-td1</td>
        <td>tb1-tr1-td2</td>
        <td>
          <span>
            tb1-tr1-td3
          <span>
        </td>
        <td>
          tb1-tr1-td4
        </td>
      </tr>
      <tr>
        <td>tb1-tr2-td1</td>
        <td><span>tb1-tr2-td2</span></td>
        <td>
          tb1-tr2-td3
        </td>
      </tr>
    </table>
    <table>
      <tr>
        <td>tb2-tr1-td1</td>
        <td><span>tb2-tr1-td2</span></td>
        <td>
          tb2-tr1-td3
        </td>
      </tr>
    </table>
    ';

    preg_match_all('#<table>(\s*<tr>(\s*<td>(.*)</td>\s*)+</tr>\s*)+</table>#sU', $text, $result);

    echo "<pre><b>";
    print_r(highlight($result));
    echo "</b></pre>";
    ?>
    It returns only the data from the last <td></td> of the last <tr> in each <table>... what am I doing wrong?


    Thank you.
    Last edited by V@no; 10-10-2009 at 01:28 AM.
    V@no.
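
A note on the symptom described above: in PCRE, a capturing group that sits inside a repeated group is overwritten on every repetition, so only the last iteration's text is kept. The following is a minimal sketch of that behaviour using a made-up three-cell string (`$subject` is an assumption for illustration, not the data from the post):

```php
<?php
// Illustrative input only; not the table from the original post.
$subject = '<td>a</td><td>b</td><td>c</td>';

// (.*?) sits inside a repeated group, so each pass overwrites group 1.
preg_match('#(?:<td>(.*?)</td>)+#', $subject, $m);
echo $m[1], "\n";

// Matching one cell per match keeps every cell instead.
preg_match_all('#<td>(.*?)</td>#', $subject, $all);
print_r($all[1]);
```

Here `$m[1]` ends up holding only `c`, while `$all[1]` holds all three cells. This would explain why the nested `(...)+` pattern reports one <td> per table.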

  • #2
    Senior Coder
    Join Date
    Jul 2009
    Location
    South Yorkshire, England
    Posts
    2,318
    Thanks
    6
    Thanked 304 Times in 303 Posts
    Untested.

    Code:
    preg_match_all('#<td>(.+?)</td>#', $text, $result);

  • #3
    New Coder
    Join Date
    Jan 2003
    Posts
    39
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks for the reply. I missed one important thing in my explanation: besides the data in <td></td>, I also need to know which <tr></tr> that data is located in (added to my post).
    I can do it in 3 steps with 3 different regexes, but there must be a way to do it in one step with one regex...
    V@no.

  • #4
    Senior Coder
    Join Date
    Jul 2009
    Location
    South Yorkshire, England
    Posts
    2,318
    Thanks
    6
    Thanked 304 Times in 303 Posts
    The description of each row is inside the <td></td> tags (going by your code above), so it is already captured into that array by that regex. Process the array data however you need to in order to extract that info.
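
If the cell text did not happen to name its row, one way to keep the row grouping is a two-pass match: first each <tr>, then the <td> cells inside it. This is only a sketch under some assumptions: `cellsByRow` is a hypothetical helper name, the tags carry no attributes, tables are not nested, and PHP 7 or later is available for the type declarations.

```php
<?php
// Hypothetical helper: returns one array of cell strings per <tr>.
function cellsByRow(string $html): array
{
    $rows = [];
    // Pass 1: the inside of every <tr>...</tr>; "s" lets "." span newlines.
    preg_match_all('#<tr>(.*?)</tr>#s', $html, $trMatches);
    foreach ($trMatches[1] as $rowHtml) {
        // Pass 2: every cell inside this row only.
        preg_match_all('#<td>(.*?)</td>#s', $rowHtml, $tdMatches);
        // Drop nested tags and surrounding whitespace from each cell.
        $rows[] = array_map(function ($cell) {
            return trim(strip_tags($cell));
        }, $tdMatches[1]);
    }
    return $rows;
}

// Shortened sample in the style of the table from the first post.
$html = '
<table>
  <tr>
    <td>tb1-tr1-td1</td>
    <td><span>tb1-tr1-td2</span></td>
  </tr>
  <tr>
    <td>
      tb1-tr2-td1
    </td>
  </tr>
</table>';

$rows = cellsByRow($html);
print_r($rows);
```

The index into `$rows` is the row number across all tables in the input; a third outer pass over `<table>` would be needed to separate tables as well.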

  • #5
    New Coder
    Join Date
    Jan 2003
    Posts
    39
    Thanks
    0
    Thanked 4 Times in 4 Posts
    um....it captures only one <td> per table...
    V@no.

  • #6
    Senior Coder
    Join Date
    Jul 2009
    Location
    South Yorkshire, England
    Posts
    2,318
    Thanks
    6
    Thanked 304 Times in 303 Posts
    Quote: Originally posted by V@no
    um....it captures only one <td> per table...
    Nope, it captures them all. Whether you are processing them correctly is a different thing. (I've removed the HTML formatting and just put a print_r to display the complete array so that you can see what info you *actually* capture, rather than what you're displaying).

    Code:
    <?php
    
    $text = '
    <table>
      <tr>
        <td>tb1-tr1-td1</td>
        <td>tb1-tr1-td2</td>
        <td>
          <span>
            tb1-tr1-td3
          <span>
        </td>
        <td>
          tb1-tr1-td4
        </td>
      </tr>
      <tr>
        <td>tb1-tr2-td1</td>
        <td><span>tb1-tr2-td2</span></td>
        <td>
          tb1-tr2-td3
        </td>
      </tr>
    </table>
    <table>
      <tr>
        <td>tb2-tr1-td1</td>
        <td><span>tb2-tr1-td2</span></td>
        <td>
          tb2-tr1-td3
        </td>
      </tr>
    </table>
    ';
    
    $text = preg_replace('#\r\n|\n#', '', $text);
    preg_match_all('#<td>(.+?)</td>#', $text, $result);
    
    print_r($result);
    
    ?>

    Edit: If the data for a single cell is spread across multiple lines, you also need to remove the newlines from the $text string for the regex to capture it, because . does not match a newline by default. That's done by the preg_replace line I've added above.
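
An alternative to stripping the newlines first, sketched here as a swapped-in technique rather than what the post above does, is the `s` pattern modifier, which makes `.` match newlines too. The `$text` below is a small made-up sample, not the full table:

```php
<?php
// Illustrative input: one single-line cell and one multi-line cell.
$text = "<td>one</td>\n<td>\n  two\n</td>";

// Without "s", "." stops at a newline, so the multi-line cell is missed.
preg_match_all('#<td>(.+?)</td>#', $text, $plain);

// With "s", "." also matches newlines; trim() tidies the captured whitespace.
preg_match_all('#<td>(.+?)</td>#s', $text, $dotall);
$cells = array_map('trim', $dotall[1]);

print_r($plain[1]);
print_r($cells);
```

The first pattern finds only the single-line cell, while the second finds both. Keeping the newlines in `$text` can be handy if the original layout matters later on.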
    Last edited by MattF; 10-11-2009 at 02:21 AM.

