Feeding from WEBSITES of my choice...

04-26-2004, 03:38 PM

I hope this is the appropriate place to put this, because PHP can do anything. If this is not an appropriate board, then the ADMINs have full right to KILL :( my posting.
Anyway, I want to write a script in PHP so that I can post news or other info on my website from different web sources, like CNN, BBC, etc., and I should also have control over its layout.
I have seen many paid services like moreover.com, but I would like to do something of my own because I am learning PHP, and I would like any DIRECTIONS ON HOW TO START. All I am asking for is a NAVIGATOR. I will be the driver, I will have my CAR, and I will put in my GAS. :) My car is running on PHP + MySQL.

Appreciate it.
[ FIGHTCLUB is my food for soul....... :thumbsup: ]

04-26-2004, 05:22 PM
As you can guess, we are not going to tell you how to grab copyrighted content: not only could it get both you and us into strife, it's also bad juju in general.

Luckily ;), many sites with content worth borrowing have ways and means to let you get at their headlines/content easily and legally. Normally these are RSS/RDF XML feeds. If you like PHP so much, pop over to PHP.net, as they have their headlines available in XML format and probably some pointers to common RSS/RDF parsing routines (or google for `PHP RDF RSS parser`).
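
To make the RSS suggestion above concrete, here is a minimal sketch of pulling headlines out of a feed with PHP's SimpleXML extension (PHP 5 and later). It assumes a plain RSS 2.0 layout (channel, then item); RDF/RSS 1.0 feeds are structured differently and would need adjusting. The sample feed and the function name parse_rss_items are made up for illustration; a real script would load the XML from the feed URL the site publishes.

```php
<?php
// Hypothetical sample feed; a real script would fetch the XML from the site's published feed URL.
$sampleFeed = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>http://www.example.com/first</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://www.example.com/second</link>
    </item>
  </channel>
</rss>
XML;

// Returns an array of ['title' => ..., 'link' => ...], one entry per <item> in an RSS 2.0 feed.
// parse_rss_items is a hypothetical helper name, not part of PHP.
function parse_rss_items($xml)
{
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return array(); // the XML was not well-formed
    }
    $items = array();
    foreach ($feed->channel->item as $item) {
        $items[] = array(
            'title' => (string) $item->title,
            'link'  => (string) $item->link,
        );
    }
    return $items;
}

// Print the headlines as a list; escaping the values keeps the layout under your control.
print "<ul>\n";
foreach (parse_rss_items($sampleFeed) as $item) {
    print '<li><a href="' . htmlspecialchars($item['link']) . '">'
        . htmlspecialchars($item['title']) . "</a></li>\n";
}
print "</ul>\n";
```

Because the parsing sits in its own function, the HTML around each headline can be changed without touching the feed handling.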

04-26-2004, 08:13 PM
Well! Google and Yahoo have news from different web sources.

04-26-2004, 08:46 PM
Google and Yahoo probably have permission to use the sources, because a big company like either of them could get into a lot of trouble if they didn't...

04-27-2004, 12:25 AM
Similarly, I would like to be able to have the hockey scores updated on my site without having to enter them myself (as in getting them from another site)...
Of course, I doubt the hockey scores are copyrighted...
So is there a way to get the http://www.rds.ca source and grab the score for the Tampa Bay vs. Montreal series?

04-27-2004, 12:52 AM
try http://mikenew.net/mini-fetch.php

04-27-2004, 02:57 AM
will do, thanks :)

04-27-2004, 03:00 AM
Oh, but I have to pay for that. I want to learn how to do it myself...

04-27-2004, 03:34 AM
Just to follow up on the copyright issue: if you're using this for personal reasons, there is the often forgotten "fair-use" clause in copyright law. As long as you don't try to make money off the information, you have some latitude.

I personally have a site that grabs headlines (it parses HTML for h1-h3 tags and reads RSS feeds) and puts them on my "personal information" page. Then I have it password protected. Sure, it's not super secure, but if anyone came crying to me about no permission to use blah blah blah, I'd throw the fair use right back at them.

At least that's the case in the US. Not sure about the rest of the world.

Oh, and one caveat: I don't "break in" to other sites to grab content, I simply read the existing public information. It's really no more than an automatic browser of sorts. :D
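
The h1-h3 headline grabbing mentioned in this post could be sketched roughly like this. It is only one way to do it, using PHP 5's DOM extension rather than whatever the poster's own script does, and the sample HTML and the function name extract_headlines are invented for the example.

```php
<?php
// Hypothetical page source; a real script would read the HTML of a page you are allowed to use.
$sampleHtml = '<html><body>'
    . '<h1>Top story</h1><p>Some text</p>'
    . '<h2>Second <b>story</b></h2>'
    . '<h4>Ignored heading</h4>'
    . '<h3>Third story</h3>'
    . '</body></html>';

// Returns the text of every h1, h2 and h3 element, in document order.
// extract_headlines is a hypothetical helper name, not part of PHP.
function extract_headlines($html)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect libxml's parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $headlines = array();
    foreach ($xpath->query('//*[self::h1 or self::h2 or self::h3]') as $node) {
        // textContent drops nested markup such as <b>, leaving only the text.
        $headlines[] = trim($node->textContent);
    }
    return $headlines;
}

print_r(extract_headlines($sampleHtml));
```

Letting a parser walk the tags tends to be more forgiving of messy markup than matching the raw text by hand.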

04-27-2004, 04:12 AM
Are you saying I wouldn't be able to get the hockey scores and put em on my website automatically?

But anyways, is there a way to do it? If there is, where should I start?
btw, I agree about the copyright thing, but some things just can't be copyrightable...

04-27-2004, 02:15 PM
Well, it depends. If the site has an RSS feed, you'll need to parse that. Otherwise, you'll need to putz with regular expressions and parse their HTML page. VERY SLOW, but it does the job.

Here's some code I use:

$fh = fopen("http://www.yoursite.com/page.html","r");
print "<ul class=\"contentTxt\">";
while(($data = fgets($fh)) !== false) {
    //strip tabs and line returns along with other markup I don't care about
    $data = preg_replace("/(\r|\n|\t|<b>|<\/b>|<img.*?>)*/","",$data);
    //grab the link and text inside <font size="+2"> tags
    preg_match_all("/<font size=\"\+2\"><a href=\"(.*?)\">(.*?)<\/font>/",$data,$matches);
    for($i=0; $i<count($matches[1]);$i++) {
        print "<li><a href=\"http://ssa.usps.gov/redir.php?url=http://blue.usps.gov/news/link/".$matches[1][$i]."\">".$matches[2][$i]."</a></li>";
    }
}
print "</ul>";
fclose($fh);

04-27-2004, 11:08 PM
Hmm, so basically read the file and regexp your way to the desired part... makes sense. A little complex though, anyways.

$data = preg_replace("/(\r|\n|\t|<b>|<\/b>|<img.*?>)*/","",$data);
what is this line checking for exactly?

04-27-2004, 11:12 PM
That line checks for and removes:

\r (return)
\n (newline)
\t (tab)
<b></b> Tags
<img /> tags
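
To see the effect of that line on its own, here is a small sketch that runs it against one made-up line of HTML. The trailing * from the original pattern is left out here, which is an adjustment rather than the poster's code: preg_replace() already replaces every occurrence it finds, so the result should be the same.

```php
<?php
// Hypothetical line of HTML, similar to what fgets() would return from the page.
$line = "\t<font size=\"+2\"><b>Big</b> news <img src=\"dot.gif\"></font>\r\n";

// Remove returns, newlines, tabs, <b> and </b> tags, and whole <img ...> tags.
$clean = preg_replace("/\r|\n|\t|<b>|<\/b>|<img.*?>/", "", $line);

// Prints: <font size="+2">Big news </font>
print $clean;
```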

04-27-2004, 11:19 PM

preg_match_all("/<font size=\"\+2\"><a href=\"(.*?)\">(.*?)<\/font>/",$data,$matches);
I'm guessing this line takes all occurrences of:
<font size="+2">
<a href="any text">
any text
in $data
and stores them in an array ($matches)

Am I guessing correctly?

04-27-2004, 11:23 PM
yup, you got it

04-28-2004, 12:37 AM
So $matches[3] could return, for example, "<a href="http://www.google.ca/">"?
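
For reference, a quick sketch of how preg_match_all() lays out $matches in its default mode (PREG_PATTERN_ORDER), using the pattern from this thread. The input markup is made up. With two sets of parentheses in the pattern, the array should have indexes 0 to 2 only, so there would be no $matches[3].

```php
<?php
// Hypothetical input containing two headline links.
$data = '<font size="+2"><a href="a.html">First</a></font>'
      . '<font size="+2"><a href="b.html">Second</a></font>';

preg_match_all("/<font size=\"\+2\"><a href=\"(.*?)\">(.*?)<\/font>/", $data, $matches);

// $matches[0] holds each full match (the whole <font ...>...</font> text),
// $matches[1] holds the first (...) group: the href values,
// $matches[2] holds the second (...) group: everything up to </font>,
//             which for this sample markup still includes the closing </a>.
print_r($matches);
```

So the URL of the first link is $matches[1][0], and its text is $matches[2][0].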