How to extract data from a web page using PHP and show it in HTML?

05-19-2004, 11:07 AM
Can anyone show me an example that uses PHP (not VBS; I hope to run it on
Linux) to extract certain predefined lines (actually news) from a page on
another site and then show them on my own web page in an HTML table?

News site: http://www.rthk.org.hk/rthk/news/expressnews/ (sorry, it's in Big5)

I want to show dynamic news info, like the utility at http://www.samurize.com does.
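Since the page above is Big5-encoded, its text generally needs converting before it is shown on a UTF-8 page. A minimal sketch, assuming PHP's iconv extension is enabled (it is in most builds); `big5_to_utf8` is a hypothetical helper name, not part of PHP:

```php
<?php
// Hypothetical helper: convert Big5-encoded bytes (as fetched from a
// Traditional Chinese news page) into UTF-8 for display on a UTF-8 page.
// iconv() returns false (and raises a notice) if the input is not valid Big5.
function big5_to_utf8(string $big5)
{
    return iconv('BIG5', 'UTF-8', $big5);
}
```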

05-19-2004, 11:37 AM
This is the JavaScript forum, not the PHP forum. Wait for a mod to move it there.

05-19-2004, 11:39 AM

Wrongly posted.....


05-19-2004, 05:47 PM
I would use something like cURL or Snoopy. Either will get you the HTML source; then you can use your own methods, such as regular expressions, to grab the data you want.
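A minimal sketch of the regular-expression step, once the HTML source has been fetched. The markup pattern (each headline as a link inside an `<li>`) is an assumption for illustration only, not the actual layout of the site above; `extract_headlines` is a hypothetical name:

```php
<?php
// Hypothetical helper: pull headline text out of fetched HTML with a regular
// expression. The default pattern ASSUMES each headline is a link inside an
// <li> element; adapt it to the real markup of the page you are scraping.
function extract_headlines(string $html, string $pattern = '#<li\b[^>]*>\s*<a\b[^>]*>(.*?)</a>#is'): array
{
    $headlines = [];
    if (preg_match_all($pattern, $html, $matches)) {
        foreach ($matches[1] as $raw) {
            // Strip nested tags, decode entities such as &amp;, collapse whitespace.
            $text = trim(preg_replace('/\s+/', ' ', html_entity_decode(strip_tags($raw))));
            if ($text !== '') {
                $headlines[] = $text;
            }
        }
    }
    return $headlines;
}
```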

I was going to do something like this at one time, but then I realized that my solution was highly dependent on their web layout. If they changed ONE little thing, it could very possibly break my code.

But yeah, do some research on cURL and Snoopy and I'm sure you'll be on the track you want to be on. Then if you have any specific questions/problems, post them in the PHP forum.

Good luck,

05-20-2004, 02:36 AM

Do both cURL and Snoopy need to be installed on Linux? Can't they be used on Windows?

05-20-2004, 07:29 AM
I've installed cURL on Windows. I don't know about Snoopy, but you should be able to.


Let me know if you can't figure it out.
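One way to avoid depending on cURL being installed on a given platform is to check for the extension at run time and fall back to PHP's built-in URL wrappers. A sketch, assuming `allow_url_fopen` is enabled for the fallback path; `fetch_page` is a hypothetical helper:

```php
<?php
// Hypothetical helper: fetch a URL with the cURL extension when it is loaded,
// otherwise fall back to file_get_contents(), which needs allow_url_fopen=On
// for http:// URLs. Returns the response body, or false on failure.
function fetch_page(string $url, int $timeout = 10)
{
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        $body = curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 for non-HTTP URLs such as file://
        // The handle is released when $ch goes out of scope.
        return ($body === false || $status >= 400) ? false : $body;
    }

    // No cURL: use the http:// stream wrapper with a timeout.
    $context = stream_context_create(['http' => ['timeout' => $timeout]]);
    return @file_get_contents($url, false, $context);
}
```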


05-20-2004, 09:14 AM

But I wonder: can't a PHP script do the extraction by itself, without installing any software?

Sorry, I am a newbie to PHP. :o

05-20-2004, 05:26 PM
It can, but it's not easy; that's why cURL and Snoopy were written (libraries that encapsulate this).

I remember coming across links on how to do it, but it's not the easiest thing to do. Perhaps someone else can suggest how to do it without cURL or Snoopy. My suggestion would be to look at cURL or Snoopy's source code, but that might not be the easiest thing for you either!

I would say get one of them, look at some online tutorials, and you should be on your way.
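For the curious, the do-it-yourself route looks roughly like this: open a TCP socket with `fsockopen()`, send the HTTP request by hand, and split the headers from the body. It is a simplified illustration in the spirit of a raw-PHP helper class, not Snoopy's actual code (no HTTPS, redirects, or proxies; it asks for HTTP/1.0 so the server should not use chunked encoding). Both function names are hypothetical:

```php
<?php
// Hypothetical bare-bones HTTP GET using only core PHP socket functions.
function raw_http_get(string $host, string $path = '/', int $port = 80, int $timeout = 10)
{
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return false; // $errno / $errstr describe the connection failure
    }
    stream_set_timeout($fp, $timeout);
    fwrite($fp, "GET $path HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");

    $response = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        // Stop on read errors or when the server stops sending within the timeout.
        if ($chunk === false || stream_get_meta_data($fp)['timed_out']) {
            break;
        }
        $response .= $chunk;
    }
    fclose($fp);
    return split_http_response($response);
}

// Split a raw HTTP response into status code, header block, and body.
function split_http_response(string $response): array
{
    $parts = explode("\r\n\r\n", $response, 2);
    $status = preg_match('#^HTTP/\d\.\d\s+(\d{3})#', $parts[0], $m) ? (int) $m[1] : 0;
    return [
        'status'  => $status,
        'headers' => $parts[0],
        'body'    => $parts[1] ?? '',
    ];
}
```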


05-21-2004, 04:00 AM
I have tried a single new.php that reads all the HTML source (text) from a web page, without cURL or Snoopy (which I don't want to use); I only use fopen and fgets.
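The fopen()/fgets() approach can be sketched as follows, reading the page line by line and keeping only lines that contain a marker string. The marker is an assumption about the page's markup, `read_matching_lines` is a hypothetical name, and `allow_url_fopen` must be On for http:// sources:

```php
<?php
// Hypothetical helper: read a page (URL or local file) line by line with
// fopen()/fgets() and keep only lines containing $marker, without the
// trailing newline. Returns an empty array if the source cannot be opened.
function read_matching_lines(string $source, string $marker): array
{
    $fp = @fopen($source, 'r');
    if ($fp === false) {
        return [];
    }
    $lines = [];
    while (($line = fgets($fp)) !== false) {
        if (strpos($line, $marker) !== false) {
            $lines[] = rtrim($line, "\r\n");
        }
    }
    fclose($fp);
    return $lines;
}
```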

I still have some questions:

1) My site's front page is index.html and contains JavaScript and Java applets.
How do I call new.php and get the returned data into my index.html?
2) If I change the front page to index.php (which fetches the data from the
site, extracts it into an array, etc.), how can PHP handle my original
JavaScript and Java applets?

I hope someone can advise with a slightly more detailed sample script. ;)

Thanks very much.
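A sketch of the PHP side for question 2, assuming the headlines have already been extracted into an array of UTF-8 strings; `render_news_table` and the `news` CSS class are hypothetical:

```php
<?php
// Hypothetical helper: turn an array of headline strings into an HTML table.
// htmlspecialchars() escapes <, >, & and quotes so fetched text cannot inject
// markup into your page. Input is assumed to be UTF-8 already.
function render_news_table(array $headlines): string
{
    $rows = '';
    foreach ($headlines as $headline) {
        $rows .= '<tr><td>' . htmlspecialchars($headline, ENT_QUOTES, 'UTF-8') . "</td></tr>\n";
    }
    return "<table class=\"news\">\n" . $rows . "</table>\n";
}
```

In index.php, everything outside `<?php ... ?>` tags, including existing `<script>` and `<applet>` markup, is sent to the browser unchanged, so the JavaScript and applets keep working; only the spot where the table should appear needs something like `<?php echo render_news_table($headlines); ?>`. A plain index.html is normally not run through PHP, so keeping it would mean pulling in the PHP output another way (for example an `<iframe>` pointing at new.php) or configuring the server to parse .html files with PHP.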

05-21-2004, 11:45 AM
Snoopy (http://snoopy.sourceforge.net) is not a server library/DLL but simply a script that uses raw PHP functions to do the fetching; it's a helper class for exactly what you want to do.

If you want to fetch HTTPS, Snoopy requires cURL; otherwise it should work as is (on any platform). Check out the sample in the download.
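A minimal sketch of calling Snoopy along the lines of the sample in its download, assuming Snoopy.class.php is on the include path. The wrapper `fetch_with_snoopy` is hypothetical; `fetch()`, `$results`, `$error` and `$curl_path` are the Snoopy members as used in its README of that era, so verify them against the version you download:

```php
<?php
// Hypothetical wrapper around the Snoopy class. Returns the page source,
// or false if Snoopy cannot be found or the fetch fails.
function fetch_with_snoopy(string $url, string $classFile = 'Snoopy.class.php')
{
    if (!class_exists('Snoopy')) {
        // Look the file up on the include path before requiring it.
        $path = stream_resolve_include_path($classFile);
        if ($path === false) {
            return false; // Snoopy is not installed where we expected it
        }
        require_once $path;
    }
    $snoopy = new Snoopy();
    // Assumption: for https:// URLs older Snoopy releases run the external
    // curl binary; adjust if curl is not at Snoopy's default location.
    // $snoopy->curl_path = '/usr/bin/curl';
    if (!$snoopy->fetch($url)) {
        error_log('Snoopy error: ' . $snoopy->error);
        return false;
    }
    return $snoopy->results;
}
```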