
View Full Version : Feeding from WEBSITES of my choice...



blyzard
04-26-2004, 02:38 PM
Hi!

I hope this is the appropriate place for this, because PHP can do anything. If this is not an appropriate board, then the admins have every right to kill :( my post.
Anyway, I want to write a script in PHP so that I can post news or other info on my website from different web sources, like CNN, BBC, etc., and I should also have control over its layout.
I have seen many paid services like moreover.com, but I would like to build something of my own because I am learning PHP, and I would like any directions on how to start. All I am asking for is a navigator. I will be the driver, I will have my car, and I will put in my gas. :) My car is running on PHP+MySQL.

Appreciate it.
bLyZ
[ FIGHTCLUB is my food for soul....... :thumbsup: ]

firepages
04-26-2004, 04:22 PM
As you can guess, we are not going to tell you how to grab copyrighted content: not only could it get both you and us into strife, it's also bad juju in general.

Luckily ;), many sites with content worth borrowing have ways and means to let you get at their headlines/content easily and legally. Normally these are RSS/RDF XML feeds. If you like PHP so much, pop over to PHP.net, as they have their headlines available in XML format and probably some pointers to common RSS/RDF parsing routines (or google for `PHP RDF RSS parser`).
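
To illustrate the feed approach, here is a minimal sketch of reading headlines from an RSS 2.0 feed. It assumes a PHP version with the SimpleXML extension, and the feed text, the example.com URLs, and the parse_rss_items() helper are all made up for the example. A real script would fetch the XML from the publisher's feed URL, and RDF (RSS 1.0) feeds are laid out differently and would need their own handling.

```php
<?php
// Hypothetical sample feed; a real one would be downloaded from the site's feed URL.
$rss = <<<XML
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second &amp; last headline</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: returns one array('title' => ..., 'link' => ...) per <item>.
function parse_rss_items($xml)
{
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return array(); // not well-formed XML
    }
    $items = array();
    foreach ($feed->channel->item as $item) {
        $items[] = array(
            'title' => (string) $item->title,
            'link'  => (string) $item->link,
        );
    }
    return $items;
}

// Print the headlines as a list; the layout is entirely under your control.
$items = parse_rss_items($rss);
print "<ul>\n";
foreach ($items as $item) {
    print '<li><a href="' . htmlspecialchars($item['link']) . '">'
        . htmlspecialchars($item['title']) . "</a></li>\n";
}
print "</ul>\n";
```

Because the parser hands back plain arrays, the same items could just as well be stored in MySQL and rendered later.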

blyzard
04-26-2004, 07:13 PM
Well! Google and Yahoo have news from different web sources.
:confused:

missing-score
04-26-2004, 07:46 PM
Google and Yahoo probably have permission to use the sources, because a big company like either of them could get into a lot of trouble if they didn't.

shlagish
04-26-2004, 11:25 PM
Similarly, I would like to have the hockey scores updated on my site without having to enter them myself (as in getting them from another site).
Of course, I doubt the hockey scores are copyrighted.
So is there a way to get the http://www.rds.ca source and grab the score for the Tampa Bay-Montreal series?

boeing747fp
04-26-2004, 11:52 PM
try http://mikenew.net/mini-fetch.php

shlagish
04-27-2004, 01:57 AM
will do, thanks :)

shlagish
04-27-2004, 02:00 AM
oh, but I have to pay for that, I want to learn how to do it myself...

bcarl314
04-27-2004, 02:34 AM
Just to follow up on the copyright issue: if you're using this for personal reasons, there is the often forgotten "fair use" clause in copyright law. As long as you don't try to make money off the information, you have some latitude.

I personally have a site that grabs headlines (it parses HTML for h1-h3 tags and reads RSS feeds) and puts them on my "personal information" page, which I have password protected. Sure, it's not super secure, but if anyone came crying to me about having no permission to use blah blah blah, I'd throw fair use right back at them.

At least that's the case in the US. Not sure about the rest of the world.

Oh and one caveat, I don't "Break in" to other sites to grab content, simply read the existing public information. It's really no more than an automatic browser of sorts. :D

shlagish
04-27-2004, 03:12 AM
Are you saying I wouldn't be able to get the hockey scores and put em on my website automatically?

But anyways, is there a way to do it? If there is, where should I start?
Thanks
btw, I agree about the copyright thing, but some things just can't be copyrightable.

bcarl314
04-27-2004, 01:15 PM
Well, it depends. If the site has an RSS feed, you'll need to parse that. Otherwise, you'll need to putz around with regular expressions and parse their HTML page. It's VERY SLOW, but it does the job.

Here's some code I use:



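<?php
// open the remote page for reading (needs allow_url_fopen enabled)
$fh = fopen("http://www.yoursite.com/page.html","r");
if ($fh === false) {
    die("Could not open the remote page.");
}
print "<ul class=\"contentTxt\">";
$d="";
// compare against false so a line containing only "0" does not end the loop early
while(($data = fgets($fh)) !== false) {
    //strip tabs and line returns along with other markup I don't care about
    $data = preg_replace("/(\r|\n|\t|<b>|<\/b>|<img.*?>)*/","",$data);
    $d.=$data;
}
fclose($fh);
//grab the link target and link text inside each <font size="+2"> tag
preg_match_all("/<font size=\"\+2\"><a href=\"(.*?)\">(.*?)<\/font>/",$d,$matches);
for($i=0; $i<count($matches[1]);$i++) {
    //the second group runs up to </font> and so still holds the closing </a>; strip it
    print "<li><a href=\"http://ssa.usps.gov/redir.php?url=http://blue.usps.gov/news/link/".$matches[1][$i]."\">".strip_tags($matches[2][$i])."</a></li>";
}
print "</ul>";
?>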
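
As an alternative to regular expressions, headlines in h1-h3 tags can also be pulled out with an HTML parser. The following is only a sketch, assuming a PHP version that ships the DOM extension (DOMDocument). The sample markup and the extract_headlines() helper are invented for illustration and are not the code running on anyone's site in this thread.

```php
<?php
// Hypothetical sample page standing in for HTML fetched from a remote site.
$html = '<html><body><h1>Top story</h1><p>Text</p>'
      . '<h2><a href="/a.html">Second story</a></h2><h4>Ignored</h4></body></html>';

// Hypothetical helper: returns the text of every <h1>, <h2> and <h3>,
// grouped by tag name (all h1 first, then h2, then h3).
function extract_headlines($html)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so parser warnings are suppressed here.
    @$doc->loadHTML($html);
    $headlines = array();
    foreach (array('h1', 'h2', 'h3') as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $headlines[] = trim($node->textContent);
        }
    }
    return $headlines;
}

$headlines = extract_headlines($html);
foreach ($headlines as $headline) {
    print htmlspecialchars($headline) . "\n";
}
```

A parser tolerates attribute order, extra whitespace, and nested tags, which tend to break hand-written patterns when the source site changes its layout.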

shlagish
04-27-2004, 10:08 PM
hmm, so basically read the file and regexp your way to the desired part... makes sense. A little complex though.

$data = preg_replace("/(\r|\n|\t|<b>|<\/b>|<img.*?>)*/","",$data);
what is this line checking for exactly?

missing-score
04-27-2004, 10:12 PM
That line checks for and removes:

\r (return)
\n (newline)
\t (tab)
<b></b> Tags
<img /> tags
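
A quick way to see this for yourself is to run the same pattern over a single line. The sample string below is made up for the demonstration.

```php
<?php
// Hypothetical sample line of HTML with a tab, bold tags, an image and a line ending.
$data = "\t<b>Habs win</b> <img src=\"puck.gif\">4-3\r\n";

// Same pattern as in the script: each listed piece is replaced with nothing.
$clean = preg_replace("/(\r|\n|\t|<b>|<\/b>|<img.*?>)*/", "", $data);

// Only the text and the single space between the tags are left.
var_dump($clean);
```

Note that the text between the tags is kept; only the tags themselves and the whitespace characters are dropped.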

shlagish
04-27-2004, 10:19 PM
thanks:


preg_match_all("/<font size=\"\+2\"><a href=\"(.*?)\">(.*?)<\/font>/",$d,$matches);
I'm guessing this line takes every occurrence of:
<font size="+2">
<a href="any text">
any text
</font>
in $d
and stores it in an array ($matches)

Am I guessing correctly?

missing-score
04-27-2004, 10:23 PM
Yup, you got it.

shlagish
04-27-2004, 11:37 PM
so $matches[3] could return, for example: "<a href="http://www.google.ca/">"
?
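
For reference, here is a small sketch of what preg_match_all() puts into $matches with that pattern, using its default ordering. The sample markup is made up. With two pairs of parentheses in the pattern there are only three entries: $matches[0] holds the full matched strings, $matches[1] the first group, and $matches[2] the second group, so $matches[3] is never set.

```php
<?php
// Hypothetical sample markup in the shape the pattern expects.
$d = '<font size="+2"><a href="story1.html">First story</a></font>'
   . '<font size="+2"><a href="story2.html">Second story</a></font>';

preg_match_all("/<font size=\"\+2\"><a href=\"(.*?)\">(.*?)<\/font>/", $d, $matches);

// $matches[0]: each complete match, from <font ...> through </font>
// $matches[1]: first group, the href value
// $matches[2]: second group, everything up to </font> (this includes the </a>)
print_r($matches);

// There is no fourth entry because the pattern has only two groups.
var_dump(isset($matches[3]));
```

The full tag text, such as the whole opening font and anchor tags, is only available as part of the complete match in $matches[0].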


