Old 01-14-2011, 11:01 PM   PM User | #1
ryantakers
New Coder

 
Join Date: May 2009
Posts: 29
Converting table in aspx file to a csv file

I raised a thread in the SQL forum about this, only to be told that it has to be done via PHP; there is no simple way to do it in SQL.

I have done a fair bit of work getting the tags out and the commas in, and I am now almost there. I have one final problem, however: I am getting an extra blank line between each entry.

PHP Code:
<?php
$fd = fopen("ClubReportMembership.aspx", "r");
$content = fread($fd, filesize("ClubReportMembership.aspx"));
fclose($fd);
// Strip the wrapper tags, the header row and the HTML-encoded spaces
$search = array(
    '<div>',
    '<table class="noborder" cellspacing="2" cellpadding="6" border="0" id="ctl00_cphMain_gvMembership">',
    '<tr class="head">',
    '<th scope="col">MemberNumberMTBA</th><th scope="col">MemberNumberIMBA</th><th scope="col">First Name</th><th scope="col">Surname</th><th scope="col">Date Of Birth</th><th scope="col">Gender</th><th scope="col">Address Line 1</th><th scope="col">Address Line 2</th><th scope="col">Suburb</th><th scope="col">State</th><th scope="col">Postcode</th><th scope="col">Citizenship</th><th scope="col">Telephone</th><th scope="col">Mobile</th><th scope="col">Email</th><th scope="col">Membership Type</th><th scope="col">Membership Start Date</th><th scope="col">Membership Stop Date</th><th scope="col">Processing Date</th>',
    '</tr><tr class="data">',
    '</tr>',
    '</table>',
    '</div>',
    '&nbsp;'
);
$replace = "";
$modified = str_replace($search, $replace, $content);
// Turn the boundary between two cells into a comma
$search = array('</td><td>');
$replace = ",";
$modified = str_replace($search, $replace, $modified);
$modified = strip_tags($modified);
$modified = trim($modified);
$f = fopen("test3.csv", "w");
fwrite($f, $modified);
fclose($f);
?>
The aspx file looks like this:
Code:
<div>
	<table class="noborder" cellspacing="2" cellpadding="6" border="0" id="ctl00_cphMain_gvMembership">
		<tr class="head">
			<th scope="col">MemberNumberMTBA</th><th scope="col">MemberNumberIMBA</th><th scope="col">First Name</th><th scope="col">Surname</th><th scope="col">Date Of Birth</th><th scope="col">Gender</th><th scope="col">Address Line 1</th><th scope="col">Address Line 2</th><th scope="col">Suburb</th><th scope="col">State</th><th scope="col">Postcode</th><th scope="col">Citizenship</th><th scope="col">Telephone</th><th scope="col">Mobile</th><th scope="col">Email</th><th scope="col">Membership Type</th><th scope="col">Membership Start Date</th><th scope="col">Membership Stop Date</th><th scope="col">Processing Date</th>
		</tr><tr class="data">
			<td>44181</td><td>&nbsp;</td><td>editedout</td><td>editedout</td><td>2editedout</td><td>Male</td><td>editedout</td><td>&nbsp;</td><td>Bittern</td><td>VIC</td><td>3918</td><td>Australia</td><td>editedout</td><td>&nbsp;</td><td>editedout</td><td>Junior Membership</td><td>15/12/2010 12:00:00 AM</td><td>15/12/2011 12:00:00 AM</td><td>15/12/2010 11:04:43 PM</td>
		</tr><tr class="altData">
....Continues
I'm so close to getting this section done! Any help would be immensely appreciated.

Last edited by ryantakers; 01-15-2011 at 09:06 PM..
Old 01-15-2011, 03:45 AM   PM User | #2
venegal
Gütkodierer


 
Join Date: Apr 2009
Posts: 2,127
I can see that you have put some work into this and that it's something you just have to get done any way you can, so I'll provide you with a solution a bit later on.

Allow me to ramble for a bit though: HTML has an inherent structure that allows it to be quite easily parsed. It happens all the time. Your browser does that. And there are quite a few PHP classes that do just that – parse HTML, so you can easily extract any data buried in whatever mess of tags you have to work with. Possibly someone is linking you to one of those classes right now while I'm typing this (probably not though, since this is not the type of question people tend to jump on).
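As a rough illustration of the parser route, here is a minimal sketch using PHP's bundled DOM extension together with `fputcsv()`. The function name `tableToCsv` is made up for this example, and the sketch assumes PHP 7 or later, plain ASCII cell contents, and that every data row uses `<td>` cells while the header row uses only `<th>` cells, as in the page shown in the first post.

```php
<?php
// Hypothetical helper (not from the thread): turns the data rows of an
// HTML table into CSV text using DOMDocument instead of string replacing.
function tableToCsv(string $html): string
{
    $doc = new DOMDocument();
    // Collect parser warnings about imperfect markup instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Write the CSV into an in-memory stream so it can be returned as a string.
    $out = fopen('php://memory', 'w+');
    foreach ($doc->getElementsByTagName('tr') as $row) {
        $cells = [];
        foreach ($row->getElementsByTagName('td') as $cell) {
            // &nbsp; is decoded to U+00A0; drop it so those cells come out empty.
            $cells[] = trim(str_replace("\u{00A0}", '', $cell->textContent));
        }
        // The header row has only <th> cells, so $cells stays empty and is skipped.
        if ($cells) {
            fputcsv($out, $cells, ',', '"', '\\');
        }
    }
    rewind($out);
    $csv = stream_get_contents($out);
    fclose($out);
    return $csv;
}
```

Something like `file_put_contents("test3.csv", tableToCsv(file_get_contents("ClubReportMembership.aspx")));` would then write the file. Because `fputcsv()` does the quoting, a value that itself contains a comma should not break the column layout.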

That said, if you're feeling masochistic, you can do it by just replacing stuff. Here's how I would do it:

PHP Code:
// This is just your original input, with a few table rows added to show that it's doing what it's supposed to do
$content = <<<EOD
    <div>
        <table class="noborder" cellspacing="2" cellpadding="6" border="0" id="ctl00_cphMain_gvMembership">
            <tr class="head">
                <th scope="col">MemberNumberMTBA</th><th scope="col">MemberNumberIMBA</th><th scope="col">First Name</th><th scope="col">Surname</th><th scope="col">Date Of Birth</th><th scope="col">Gender</th><th scope="col">Address Line 1</th><th scope="col">Address Line 2</th><th scope="col">Suburb</th><th scope="col">State</th><th scope="col">Postcode</th><th scope="col">Citizenship</th><th scope="col">Telephone</th><th scope="col">Mobile</th><th scope="col">Email</th><th scope="col">Membership Type</th><th scope="col">Membership Start Date</th><th scope="col">Membership Stop Date</th><th scope="col">Processing Date</th>
            </tr><tr class="data">
                <td>44181</td><td>&nbsp;</td><td>editedout</td><td>editedout</td><td>2editedout</td><td>Male</td><td>editedout</td><td>&nbsp;</td><td>Bittern</td><td>VIC</td><td>3918</td><td>Australia</td><td>editedout</td><td>&nbsp;</td><td>editedout</td><td>Junior Membership</td><td>15/12/2010 12:00:00 AM</td><td>15/12/2011 12:00:00 AM</td><td>15/12/2010 11:04:43 PM</td>
            </tr><tr class="data">
                <td>44181</td><td>&nbsp;</td><td>editedout</td><td>editedout</td><td>2editedout</td><td>Male</td><td>editedout</td><td>&nbsp;</td><td>Bittern</td><td>VIC</td><td>3918</td><td>Australia</td><td>editedout</td><td>&nbsp;</td><td>editedout</td><td>Junior Membership</td><td>15/12/2010 12:00:00 AM</td><td>15/12/2011 12:00:00 AM</td><td>15/12/2010 11:04:43 PM</td>
            </tr><tr class="data">
                <td>44181</td><td>&nbsp;</td><td>editedout</td><td>editedout</td><td>2editedout</td><td>Male</td><td>editedout</td><td>&nbsp;</td><td>Bittern</td><td>VIC</td><td>3918</td><td>Australia</td><td>editedout</td><td>&nbsp;</td><td>editedout</td><td>Junior Membership</td><td>15/12/2010 12:00:00 AM</td><td>15/12/2011 12:00:00 AM</td><td>15/12/2010 11:04:43 PM</td>
            </tr>
        </table>
    </div>
EOD;

// Remove the whole table head
$content = preg_replace('#<tr class="head">.*?</tr>#s', '', $content);
// Remove the HTML-encoded whitespace
$content = preg_replace('#&nbsp;#', '', $content);
// Remove the whitespace
$content = preg_replace('#\s#', '', $content);
// Replace the end of each table row with a line feed (replace the last </td> as well, so there won't be any superfluous commas after the last cell in a row)
$content = preg_replace('#</td>\s*</tr>#', "\r\n", $content);
// Replace the end of each cell with a comma
$content = preg_replace('#</td>#', ',', $content);
// Remove any tags that are still in there
$content = strip_tags($content);
// Print out CSV
echo $content;
// Rejoice 
Old 01-15-2011, 11:12 AM   PM User | #3
ryantakers
Thanks very much Venegal.

It works brilliantly, however it is stripping out too much! Allow me to explain the problem:

Some of the fields have a space in them, for instance '24 myhouse road', which needs to remain. I notice that if I comment out the line:
PHP Code:
$content = preg_replace('#\s#', '', $content);
then the spaces are back, however, so is the extra line between entries.

Any tips?

Thanks again.
Old 01-15-2011, 02:28 PM   PM User | #4
venegal
You're completely right, of course, I'm sorry. That whitespace within data fields has to stay in there. I altered the code a bit so it does:

PHP Code:
// Remove the whole table head
$content = preg_replace('#<tr class="head">.*?</tr>#s', '', $content);
// Remove the HTML-encoded whitespace
$content = preg_replace('#&nbsp;#', '', $content);
// Conserve whitespace within data cells
$content = preg_replace_callback('#<td>.*?</td>#', function ($data) {
    return preg_replace('#\s#', '@@CONSERVED_WHITESPACE@@', $data[0]);
}, $content);
// Remove unconserved whitespace
$content = preg_replace('#\s+#', '', $content);
// Rebuild conserved whitespace
$content = preg_replace('#@@CONSERVED_WHITESPACE@@#', ' ', $content);
// Replace the end of each table row with a line feed (replace the last </td> as well, so there won't be any superfluous commas after the last cell in a row)
$content = preg_replace('#</td>\s*</tr>#', "\r\n", $content);
// Replace the end of each cell with a comma
$content = preg_replace('#</td>#', ',', $content);
// Remove any tags that are still in there
$content = strip_tags($content);
// Print out CSV
echo $content;
// Rejoice 
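A possibly shorter variant of the same replace-based idea, offered only as a sketch: in the page from the first post, the unwanted tabs and line breaks all sit between a closing `>` and the next `<`, so removing whitespace only in that position leaves the spaces inside cells alone without needing a placeholder. The function name `rowsToCsv` is hypothetical, and the sketch assumes PHP 7 or later and that no cell consists solely of whitespace that needs to be kept.

```php
<?php
// Hypothetical helper: strips whitespace only where it sits between two
// tags, so a value such as "24 myhouse road" keeps its spaces.
function rowsToCsv(string $content): string
{
    // Remove the whole table head
    $content = preg_replace('#<tr class="head">.*?</tr>#s', '', $content);
    // Remove the HTML-encoded whitespace
    $content = str_replace('&nbsp;', '', $content);
    // Indentation and line breaks live between '>' and '<'; drop only those
    $content = preg_replace('#>\s+<#', '><', $content);
    // End of row becomes a line break (this also consumes the last </td>)
    $content = str_replace('</td></tr>', "\r\n", $content);
    // End of every remaining cell becomes a comma
    $content = str_replace('</td>', ',', $content);
    // Remove any tags that are still in there, plus the trailing line break
    return trim(strip_tags($content));
}
```

Calling `echo rowsToCsv($content);` in place of the step-by-step code should give one line per table row, with no blank lines in between.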
Old 01-15-2011, 09:06 PM   PM User | #5
ryantakers
Fantastic! Thanks for all your help!