Old 11-16-2011, 04:58 PM   PM User | #1
jsquadrilla
Parse/Scrape table information from another site

Hello,

I won't go into why, but I need to grab data from a website and insert it into a MySQL table so I can work with it.

I Googled around and found that you can do it using Simple HTML DOM. The problem is that no matter what I try, nothing seems to work.

Code:
include "simple/simple_html_dom.php";
	
$html = file_get_html('http://www.nhl.com/ice/standings.htm?season=20112012&type=LEA');
$es = $html->find('data standings League');

echo($es);
This is outputting "Array".

I realize this isn't much code, but I haven't gotten anywhere with the tutorials I found on the net. The table's class is "data standings league".

Basically, I want to grab the data in the NHL.com standings for the league and put it into my own MySQL table. Ideally, I'd like to automate this process every day (although this can wait).

Any help is appreciated!

Last edited by jsquadrilla; 11-16-2011 at 05:01 PM..
Old 11-16-2011, 05:24 PM   PM User | #2
tangoforce
Quote:
Originally Posted by jsquadrilla View Post
This is outputting "Array".
Then you need to var_dump() the result and see what the array contains. The function is probably returning several things in an array including the information you want.
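A minimal sketch of what that inspection might look like. The $matches array below is a hypothetical stand-in for whatever find() returned, not real output from the page.

```php
<?php
// Hypothetical stand-in for the array of matches that find() returns.
$matches = array('first match', 'second match');

// echo $matches; would print only the word "Array", because echo
// expects a string and cannot display an array's contents.

// var_dump() shows the structure: element count, keys, and values.
var_dump($matches);

if (count($matches) === 0) {
    // An empty array means nothing matched.
    echo "No elements matched\n";
}

// Loop over the array to print each element on its own line.
$lines = array();
foreach ($matches as $index => $match) {
    $lines[] = $index . ': ' . $match;
}
echo implode("\n", $lines) . "\n";
```

If var_dump() reports array(0), the problem is in the lookup itself rather than in how the result is printed.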
Old 11-16-2011, 05:30 PM   PM User | #3
mlseim
Wow, that is going to be a tough one.
I know you want to do this for free, but find out how much it would
cost to get an account with them; they may have an
API or XML file that contains the data. Even if it costs some money
each month, it might be worth it. At least ask them about it.

Parsing their HTML is not going to be easy, but technically it could be
done with enough scripting and enough time to debug it.
Old 11-16-2011, 06:04 PM   PM User | #4
jsquadrilla
Quote:
Originally Posted by tangoforce View Post
Then you need to var_dump() the result and see what the array contains. The function is probably returning several things in an array including the information you want.
Now it says "array(0) { } Array"
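An empty array(0) suggests the selector matched nothing. One likely reason (an assumption about how the library parses selectors) is that the space-separated words are read as nested tag names rather than class names. As an alternative sketch, PHP's built-in DOMDocument and DOMXPath can select a table by one of its class tokens. The sample markup below is hypothetical and much simpler than the real page.

```php
<?php
// Hypothetical sample markup; the real page's structure is an assumption.
$html = '<html><body>'
      . '<table class="nav"><tr><td>menu</td></tr></table>'
      . '<table class="data standings League">'
      . '<tr><td>Pittsburgh</td><td>25</td></tr>'
      . '</table>'
      . '</body></html>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid, so collect parser warnings quietly.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match any <table> whose class attribute contains the token "standings".
$query = "//table[contains(concat(' ', normalize-space(@class), ' '), ' standings ')]";
$tables = $xpath->query($query);

echo $tables->length . " table(s) matched\n";

// Read back the full class attribute of the first match, if there is one.
$firstClass = $tables->length > 0 ? $tables->item(0)->getAttribute('class') : null;
echo $firstClass . "\n";
```

Matching on a single class token keeps the query working even if the other classes on the table change.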

Quote:
Originally Posted by mlseim View Post
Wow, that is going to be a tough one.
I know you want to do this for free, but find out how much it would
cost to get an account with them; they may have an
API or XML file that contains the data. Even if it costs some money
each month, it might be worth it. At least ask them about it.

Parsing their HTML is not going to be easy, but technically it could be
done with enough scripting and enough time to debug it.
It can't be that hard, can it? I know next to nothing about scraping, but if the page structure is constant (the table name stays the same, etc.), it shouldn't be too difficult to figure out. Google Docs already does exactly what I want with its "importhtml" function: it puts the table into a spreadsheet perfectly, so there has to be a way (if only I knew how importhtml worked...).
Old 11-16-2011, 06:26 PM   PM User | #5
jsquadrilla
Minor success!

Code:
function get_data($url)
{
  $ch = curl_init();
  $timeout = 5;
  curl_setopt($ch,CURLOPT_URL,$url);
  curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
  curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
  $data = curl_exec($ch);
  curl_close($ch);
  return $data;
}

$returned_content = get_data('http://www.nhl.com/ice/standings.htm?season=20112012&type=LEA');

echo $returned_content;
When I run this, I get an exact copy of the page served from my own URL, which is perfect. The next step is extracting what I need from this page, and that's where I'm completely lost.
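One possible first step toward extraction is finding out which table on the page holds the data. The sketch below uses PHP's built-in DOMDocument (not Simple HTML DOM) to list every table in an HTML string. The function name describe_tables and the sample markup are hypothetical; in practice the string would be the value returned by get_data().

```php
<?php
/**
 * List every <table> in an HTML string with its position,
 * class attribute, and number of rows.
 */
function describe_tables($html)
{
    $doc = new DOMDocument();
    // Suppress warnings about invalid markup, which real pages often contain.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $tables = $doc->getElementsByTagName('table');
    $summary = array();
    for ($i = 0; $i < $tables->length; $i++) {
        $table = $tables->item($i);
        $summary[] = array(
            'index' => $i,
            'class' => $table->getAttribute('class'),
            // Counts all <tr> descendants, including rows of nested tables.
            'rows'  => $table->getElementsByTagName('tr')->length,
        );
    }
    return $summary;
}

// Hypothetical sample standing in for the fetched page.
$sample = '<table class="nav"><tr><td>menu</td></tr></table>'
        . '<table class="data standings League">'
        . '<tr><td>A</td></tr><tr><td>B</td></tr>'
        . '</table>';

$summary = describe_tables($sample);
print_r($summary);
```

The index and class values in the output show which table to target in the next step.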
Old 11-16-2011, 08:10 PM   PM User | #6
Adee
Regular expressions. Have fun with that!
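A rough sketch of what the regular-expression route could look like, using preg_match_all() and strip_tags(). The sample markup and the function name rows_from_html are hypothetical, and patterns like these are brittle: they assume well-formed, non-nested rows and cells, which a real page may not guarantee.

```php
<?php
/**
 * Pull the text of every table cell out of an HTML string, row by row,
 * using regular expressions.
 */
function rows_from_html($html)
{
    $rows = array();
    // Capture the contents of each <tr>...</tr> pair.
    preg_match_all('#<tr\b[^>]*>(.*?)</tr>#is', $html, $rowMatches);
    foreach ($rowMatches[1] as $rowHtml) {
        // Capture the contents of each <td> or <th> cell in the row.
        preg_match_all('#<t[dh]\b[^>]*>(.*?)</t[dh]>#is', $rowHtml, $cellMatches);
        $cells = array();
        foreach ($cellMatches[1] as $cellHtml) {
            // Drop any remaining tags (links, spans) and decode entities.
            $cells[] = trim(html_entity_decode(strip_tags($cellHtml)));
        }
        if (count($cells) > 0) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Hypothetical sample markup.
$sample = '<table><tr><th>Team</th><th>PTS</th></tr>'
        . '<tr><td><a href="/pit">Pittsburgh</a></td><td class="pts">25</td></tr></table>';

$rows = rows_from_html($sample);
print_r($rows);
```

This works on every row in the string, so it would need to be run on just the standings table's markup, not the whole page.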
Old 11-16-2011, 08:19 PM   PM User | #7
jsquadrilla
Quote:
Originally Posted by Adee View Post
Regular expressions. Have fun with that!
Any idea where I can start?

Using Simple HTML DOM, my code is just:

Code:
$html = file_get_html('http://www.nhl.com/ice/standings.htm?season=20112012&type=LEA');
$e = $html->find("table", 2);
echo $e;
It displays the table, but the output still has hrefs, classes, etc. I want to strip all of that and keep just the data. I'm just looking for a push in the right direction on how to do that.
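One way to strip the markup is to read only the text of each cell. Simple HTML DOM elements expose a plaintext property for this, as far as its documentation describes. The sketch below instead uses PHP's built-in DOMDocument, where textContent plays the same role. The function name table_to_rows and the sample markup are hypothetical; the zero-based table index mirrors the find("table", 2) call.

```php
<?php
/**
 * Return the text of one table as an array of rows,
 * each row being an array of trimmed cell strings.
 */
function table_to_rows($html, $tableIndex)
{
    $doc = new DOMDocument();
    // Suppress warnings about invalid markup.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // item() returns null when the index is out of range.
    $table = $doc->getElementsByTagName('table')->item($tableIndex);
    if ($table === null) {
        return array();
    }

    $xpath = new DOMXPath($doc);
    $rows = array();
    foreach ($xpath->query('.//tr', $table) as $tr) {
        $cells = array();
        foreach ($xpath->query('./th | ./td', $tr) as $cell) {
            // textContent drops tags and attributes, keeping only the text.
            $cells[] = trim($cell->textContent);
        }
        if (count($cells) > 0) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Hypothetical sample markup with links and classes to strip.
$sample = '<table class="data"><tr><th>Team</th><th>PTS</th></tr>'
        . '<tr><td><a href="/pit">Pittsburgh</a></td><td class="pts">25</td></tr></table>';

$rows = table_to_rows($sample, 0);
print_r($rows);
```

The result is a plain nested array, which is a convenient shape for inserting into a database later.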

Last edited by jsquadrilla; 11-16-2011 at 08:22 PM..
Old 11-16-2011, 08:46 PM   PM User | #8
Adee
Is this what you're trying to do?
http://rs-downfall.com/scripts/cf/nhl.php
Old 11-16-2011, 08:56 PM   PM User | #9
jsquadrilla
Quote:
Originally Posted by Adee View Post
Is this what you're trying to do?
http://rs-downfall.com/scripts/cf/nhl.php
That's what I've got so far (minus the weird character that shows up on yours).

Basically, I want to put that information, as-is, into a database.

But I believe I'd need to strip it down before I can.
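Once the data is reduced to plain rows, inserting it could look roughly like the sketch below, which uses PDO with a prepared statement. The table name standings, its team and points columns, the function name save_standings, and the connection details are all hypothetical; the real columns depend on which fields of the standings table are kept.

```php
<?php
/**
 * Insert scraped rows with a prepared statement and return how many were saved.
 * Assumes a hypothetical table: standings (team, points).
 * Each row is expected as array(team name, points).
 */
function save_standings(PDO $pdo, array $rows)
{
    // Make database errors throw exceptions instead of failing silently.
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $stmt = $pdo->prepare(
        'INSERT INTO standings (team, points) VALUES (:team, :points)'
    );

    $saved = 0;
    $pdo->beginTransaction();
    try {
        foreach ($rows as $row) {
            // Bound parameters keep scraped text from being run as SQL.
            $stmt->execute(array(
                ':team'   => $row[0],
                ':points' => (int) $row[1],
            ));
            $saved++;
        }
        $pdo->commit();
    } catch (Exception $e) {
        // Undo the partial insert, then let the caller see the error.
        $pdo->rollBack();
        throw $e;
    }
    return $saved;
}

// Example usage; host, database name, and credentials are placeholders.
// $pdo = new PDO('mysql:host=localhost;dbname=hockey;charset=utf8', 'user', 'password');
// echo save_standings($pdo, array(array('Pittsburgh', '25')));
```

Wrapping the inserts in a transaction means a failed run leaves the table unchanged rather than half-filled.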
All times are GMT +1. The time now is 03:56 PM.

