Parse/Scrape table information from another site



jsquadrilla
11-16-2011, 04:58 PM
Hello,

I won't go into why or what for, but I need to grab data from a website and put it into a MySQL table so I can do whatever I need with it.

I Googled around and found that you can do it using Simple HTML DOM. The problem is, no matter what I try, nothing seems to work.



include "simple/simple_html_dom.php";

$html = file_get_html('http://www.nhl.com/ice/standings.htm?season=20112012&type=LEA');
$es = $html->find('data standings League');

echo($es);


This is outputting "Array".

I realize this isn't much code, but I haven't gotten anywhere with the tutorials I found on the net. The table's class is "data standings league".

Basically, I want to grab the data in the NHL.com standings for the league. Then I'd like to put that info into my own MySQL table. Ideally, I'd like to automate this process to run every day (although this can wait).

Any help is appreciated!

tangoforce
11-16-2011, 05:24 PM
This is outputting "Array".

Then you need to var_dump() the result and see what the array contains. The function is probably returning several things in an array including the information you want.

mlseim
11-16-2011, 05:30 PM
Wow ... that is going to be a tough one.
I know you want to do this for free, but find out how much it would
cost to get an account with them, and possibly ... they may have an
API or XML file that contains the data. Even if it costs some money
each month, it might be worth it. At least ask them about it.

Parsing their HTML is not going to be easy, but technically, it could be
done with enough scripting and enough time to debug it.



jsquadrilla
11-16-2011, 06:04 PM
Then you need to var_dump() the result and see what the array contains. The function is probably returning several things in an array including the information you want.

Now it says "array(0) { } Array"


Can it really be that hard? I know barely anything about scraping, but if the page structure is constant (the table name stays the same, etc.), it shouldn't be too difficult to figure out. Frustratingly, Google Docs does exactly what I want with its "importhtml" function. It puts the table into a spreadsheet perfectly, so there has to be a way (if only I knew how importhtml worked...)
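A note on the empty array above: as far as I can tell, Simple HTML DOM reads 'data standings League' as a chain of tag names (a League tag inside a standings tag inside a data tag), not as a class, so nothing matches. Class selectors in that library use CSS-style dot notation, and find() without an index returns an array, which is why echo prints "Array". The sketch below shows the same idea with PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM. The HTML string is a made-up stand-in for the NHL page, not its real markup.

```php
<?php
// Hypothetical stand-in for the NHL page; only the class attribute matters here.
$html = '<html><body>
<table class="nav"><tr><td>menu</td></tr></table>
<table class="data standings League"><tr><td>Pittsburgh</td><td>25</td></tr></table>
</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // real pages are rarely valid HTML; keep warnings quiet
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Match a <table> whose class attribute contains the whole word "standings".
$query  = '//table[contains(concat(" ", normalize-space(@class), " "), " standings ")]';
$tables = $xpath->query($query);

// query() returns a list, much like find() returns an array:
// pick one element out of it before trying to print anything.
$table = $tables->item(0);

echo $tables->length, " table(s) matched\n";
echo trim($table->textContent), "\n";
```

Matching on a single class word rather than the full "data standings League" string should keep the query working if the site reorders or adds classes, though that is a guess about how the page might change.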

jsquadrilla
11-16-2011, 06:26 PM
Minor success!



function get_data($url)
{
    $ch = curl_init();
    $timeout = 5; // seconds to wait while connecting
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the page as a string instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch); // false on failure
    curl_close($ch);
    return $data;
}

$returned_content = get_data('http://www.nhl.com/ice/standings.htm?season=20112012&type=LEA');

echo $returned_content;


When I run this, I get an exact copy of the page served from my own URL, which is perfect. Now I just need a way to extract what I need from this page, which is the next step and where I'm completely lost.
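One possible next step, sketched with PHP's built-in DOMDocument rather than Simple HTML DOM: load the string that get_data() returns and walk the table rows, using the header row as array keys. The sample HTML, the column names, and the function name table_to_rows() are all invented for illustration; the real page will have different columns and probably more than one table.

```php
<?php
// Hypothetical sample standing in for the string returned by get_data().
$returned_content = '<table class="data standings League">
<tr><th>Team</th><th>GP</th><th>P</th></tr>
<tr><td><a href="/pit">Pittsburgh</a></td><td>19</td><td>25</td></tr>
<tr><td><a href="/phi">Philadelphia</a></td><td>17</td><td>23</td></tr>
</table>';

// Turns every data row into an array keyed by the header cells.
// Note: this looks at every <tr> in the document, so on a real page
// you would first narrow down to the one table you care about.
function table_to_rows(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $headers = [];
    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $node) {
            // Skip whitespace text nodes; keep only header and data cells.
            if ($node->nodeName === 'th' || $node->nodeName === 'td') {
                $cells[] = trim($node->textContent);   // text only, no tags
            }
        }
        if ($tr->getElementsByTagName('th')->length > 0) {
            $headers = $cells;                          // header row
        } elseif ($headers && count($cells) === count($headers)) {
            $rows[] = array_combine($headers, $cells);  // data row
        }
    }
    return $rows;
}

$rows = table_to_rows($returned_content);
print_r($rows);
```

Each element of $rows is then a plain array such as Team, GP and P mapped to strings, which is a convenient shape for a later database insert.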

Adee
11-16-2011, 08:10 PM

regular expressions. have fun with that!

jsquadrilla
11-16-2011, 08:19 PM
regular expressions. have fun with that!

Any idea where I can start?

Using Simple HTML DOM, my code is only:



$html = file_get_html('http://www.nhl.com/ice/standings.htm?season=20112012&type=LEA');
$e = $html->find("table", 2);
echo $e;


It displays the table, but it still has hrefs, classes, etc. I want to strip all of that out and keep just the data. I'm just looking for a kick in the right direction on how to do that.
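For the stripping part, here is a rough sketch of the regular-expression route mentioned earlier, combined with strip_tags(). It assumes the table HTML is already in a string; the fragment below is invented to resemble what the echo prints, not copied from NHL.com. Regex on HTML is fragile (nested tables would break it), so treat this as a starting point only.

```php
<?php
// Hypothetical fragment, similar in shape to what `echo $e` prints.
$table = '<table class="data"><tr class="odd">
<td class="team"><a href="/ice/team.htm?id=5">Pittsburgh</a></td>
<td class="num">25</td></tr></table>';

// Capture the inner HTML of every <td>.
// "s" lets "." span newlines, "i" ignores case, ".*?" stops at the first </td>.
preg_match_all('#<td\b[^>]*>(.*?)</td>#si', $table, $matches);

// strip_tags() drops the <a href=...> wrappers and any other markup;
// html_entity_decode() turns entities such as &amp; back into characters.
$cells = array_map(function ($cell) {
    return trim(html_entity_decode(strip_tags($cell), ENT_QUOTES, 'UTF-8'));
}, $matches[1]);

print_r($cells);
```

If Simple HTML DOM is already loaded, its elements also expose a plaintext property that serves a similar purpose, so the regex may not be needed at all.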

Adee
11-16-2011, 08:46 PM
Is this what you're trying to do?
http://rs-downfall.com/scripts/cf/nhl.php

jsquadrilla
11-16-2011, 08:56 PM
Is this what you're trying to do?
http://rs-downfall.com/scripts/cf/nhl.php

That's what I've got so far (minus the weird characters that show up on yours).

Basically, I want to put that information, as-is, into a database.

But, I believe I'd need to strip it down before I can.
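Once the rows are reduced to plain values, getting them into a database could look roughly like the sketch below, which uses PDO prepared statements. The table name standings, its columns, the sample rows, and the function name save_standings() are all assumptions made for illustration. An in-memory SQLite database is used only so the example runs without a server; for MySQL only the connection line should need to change.

```php
<?php
// Hypothetical rows, already stripped down to plain values.
$rows = [
    ['team' => 'Pittsburgh',   'gp' => 19, 'points' => 25],
    ['team' => 'Philadelphia', 'gp' => 17, 'points' => 23],
];

// Inserts each row through one prepared statement and
// returns the number of rows written.
function save_standings(PDO $pdo, array $rows): int
{
    $stmt = $pdo->prepare(
        'INSERT INTO standings (team, gp, points) VALUES (:team, :gp, :points)'
    );
    $written = 0;
    foreach ($rows as $row) {
        $stmt->execute([
            ':team'   => $row['team'],
            ':gp'     => (int) $row['gp'],
            ':points' => (int) $row['points'],
        ]);
        $written += $stmt->rowCount();
    }
    return $written;
}

// For MySQL the connection would look something like this (placeholder credentials):
//   $pdo = new PDO('mysql:host=localhost;dbname=hockey;charset=utf8', 'user', 'pass');
// SQLite in memory is used here only so the sketch runs anywhere.
$written = null;
if (extension_loaded('pdo_sqlite')) {
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE standings (team TEXT, gp INTEGER, points INTEGER)');
    $written = save_standings($pdo, $rows);
    echo "rows written: $written\n";
}
```

Prepared statements keep scraped text from being interpreted as SQL, which matters here because the values come from someone else's page.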


