View Full Version : Extracting info from HTML


Halli
02-17-2007, 01:26 PM
Hi all,

I need some help writing a PHP script. Basically, all I want to do is write a script that extracts data from an HTML table and stores it in a MySQL database on my server.

An example of what I mean can be found at this link:

http://www.sportinglife.com/football/premiership/table/table.html

Any help with this would be much appreciated. Also, if there are any tutorials available online that could help teach me how to do it, that would be great.

Cheers

Nightfire
02-17-2007, 04:53 PM
What do you have so far?

ACJavascript
02-17-2007, 04:59 PM
Here's how I normally do it - it depends on speed and the site, though.


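// CONNECT TO YOUR DB HERE

$myfile = file_get_contents("file.html");

// "Start Field" and "End Field" stand for whatever text brackets the table in the page source
$start = strpos($myfile, "Start Field");
$finish = strpos($myfile, "End Field");

// strpos() returns false when a marker is missing, so check before doing the arithmetic
if ($start === false || $finish === false) {
    die("Could not find the table markers");
}

$length = $finish - $start;

$string = substr($myfile, $start, $length);

// Pull apart the fields
$column = explode("<td>", $string);

// or however you would like to rip apart that table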


I normally use this when I just need to pull a table or some data out of a site. You will have to use explode and str_replace to extract the data you need. :)

Mhtml
02-18-2007, 02:13 AM
You can always use DOM (http://www.w3.org/DOM/) via PHP's DOM extension (http://www.php.net/dom).
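As a rough illustration of the DOM route, here is a minimal sketch using DOMDocument from PHP's DOM extension. The sample markup, the figures in it, and the extractRows() helper are made up for the example; the real page's HTML may be laid out differently.

```php
<?php
// Hypothetical sample markup standing in for the real league table page.
$html = <<<HTML
<table>
<tr><td align="left"><a href="/teams/man-utd">Man Utd</a></td><td>29</td><td>12</td></tr>
<tr><td align="left"><a href="/teams/chelsea">Chelsea</a></td><td>29</td><td>11</td></tr>
</table>
HTML;

// Hypothetical helper: returns one array of trimmed cell texts per table row.
function extractRows(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            // textContent flattens nested tags such as the <a> around the team name.
            $cells[] = trim($td->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

$rows = extractRows($html);
print_r($rows);
```

Each entry in $rows is then a plain array such as ("Man Utd", "29", "12"), which is a convenient shape to hand to a database insert.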

Fou-Lu
02-18-2007, 04:12 AM
I would agree with Mhtml on the DOM.
Assuming you are writing your own tables, I'd recommend looking more into an XHTML-compliant method, and skipping the tables. If you are reading from an already written table, cross your fingers and hope that it is well structured.

mlseim
02-18-2007, 10:38 PM
I'm guessing that they get all of their scores from a sports service.
They then build the table using a script.

You could find out how they get their information and do the same thing.
I don't think it's too ethical to grab their HTML and take the scores.

You have a tough project ahead of you to parse the HTML and find
all of those scores.

aedrin
02-19-2007, 07:04 PM
mlseim wrote: "You have a tough project ahead of you to parse the HTML and find all of those scores."

If the pattern the source HTML uses is constant, this shouldn't prove to be much of a problem.

Mhtml's suggestion is a pretty good one, although you have to depend on the correctness of the HTML.

Exploding would work too, especially for grid-like formats (no overlapping columns/rows).
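A minimal sketch of the explode approach on a simple grid, assuming every row ends in </tr> and every cell ends in </td>. The sample markup and the explodeTable() helper are invented for illustration.

```php
<?php
// Hypothetical sample markup: a plain grid with no rowspan/colspan.
$html = '<table><tr><td>Man Utd</td><td>29</td></tr>'
      . '<tr><td>Chelsea</td><td>29</td></tr></table>';

// Hypothetical helper: splits on closing tags, then strips whatever markup is left.
function explodeTable(string $html): array
{
    $rows = [];
    foreach (explode('</tr>', $html) as $rowHtml) {
        $cells = [];
        foreach (explode('</td>', $rowHtml) as $cellHtml) {
            $text = trim(strip_tags($cellHtml));
            // Caveat: skipping empty pieces also drops genuinely empty cells,
            // which would shift the columns of that row.
            if ($text !== '') {
                $cells[] = $text;
            }
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

$table = explodeTable($html);
print_r($table);
```

This only holds up while the layout stays a regular grid; merged cells or nested tables would need the DOM approach instead.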

timgolding
02-19-2007, 10:26 PM
I wrote a similar script the other day for a football league table and just used simple file handling in PHP.

Halli
03-08-2007, 06:25 AM
I have a regular expression here which extracts the data out of the table:

<tr>\n<td align="left">\n<a[^>]+>(\w+)</a>\n</td>\n<td[^>]+>(\d+)</td>\n<tr/>

What I'm having trouble with now is getting this into an array, or a format that reads ("Man Utd, 29, 12, 1, 1, etc, etc"), so I can upload it to a MySQL database.

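<?php

$file = file_get_contents("http://www.sportinglife.com/football/premiership/table/table.html");

// Single-quoted so the inner double quotes need no escaping;
// # is used as the regex delimiter because the pattern itself contains "/"
$extract = '#<tr>\n<td align="left">\n<a[^>]+>(\w+)</a>\n</td>\n<td[^>]+>(\d+)</td>\n<tr/>#';

preg_match_all($extract, $file, $match);

// $match is an array, so echo would only print "Array"
print_r($match);

?>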

Am I on the right track here? I'm kind of new to PHP, so any help is appreciated.

Cheers :)
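For turning the matches into rows like "Man Utd, 29, 12", one possible sketch is below. It assumes markup shaped like the pattern in this post; the sample HTML, the figures, and the variable names are made up, and the real page may differ. Note that ([^<]+) is used for the team name because (\w+) stops at the space in "Man Utd".

```php
<?php
// Hypothetical sample markup modelled on the pattern above.
$file = <<<HTML
<tr>
<td align="left">
<a href="/teams/man-utd">Man Utd</a>
</td>
<td align="center">29</td>
<td align="center">12</td>
</tr>
<tr>
<td align="left">
<a href="/teams/chelsea">Chelsea</a>
</td>
<td align="center">29</td>
<td align="center">11</td>
</tr>
HTML;

// One pattern captures the team name plus the rest of the row;
// a second pattern pulls the numeric cells out of that remainder.
$rowPattern  = '#<tr>\s*<td align="left">\s*<a[^>]+>([^<]+)</a>\s*</td>(.*?)</tr>#s';
$cellPattern = '#<td[^>]*>\s*(\d+)\s*</td>#';

$rows = [];
// PREG_SET_ORDER groups the captures per match, i.e. one entry per table row.
preg_match_all($rowPattern, $file, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    preg_match_all($cellPattern, $m[2], $cells);
    $rows[] = array_merge([trim($m[1])], $cells[1]);
}

$lines = [];
foreach ($rows as $row) {
    $lines[] = implode(', ', $row);
}
echo implode("\n", $lines), "\n";
```

With the sample above this prints "Man Utd, 29, 12" and "Chelsea, 29, 11". Each $row array could then be bound to a prepared INSERT statement rather than pasted into the SQL string.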