finding href URLs


boeing747fp
08-16-2005, 05:25 PM
I'm trying to make a script that finds all <a href=""> tags and follows the links. However, I've run into a problem where people put just filenames, not full URLs, inside href="". Is there a way to search a string and see whether http://$domain exists in it, so that a full remote URL like http://remotesite.com/filename.ext in the href="" is left intact, and then follow the new value?

raf
08-16-2005, 11:55 PM
I don't really understand why you don't just take the value from the href. What does it matter whether they added the http:// or not? You can first check whether the http:// was included, and if not, just add it.
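
A minimal sketch of that check, assuming a hypothetical helper name absolutize_href() and a placeholder domain example.com (neither comes from this thread). It only tests for a leading http:// or https:// and otherwise prepends the domain:

```php
<?php
// Hypothetical helper: return $href unchanged if it is already a full URL,
// otherwise prefix it with the domain being crawled.
function absolutize_href($href, $domain)
{
    // Already starts with http:// or https://? Leave it alone.
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }
    // Bare filename or path: prepend the domain (assumed site root).
    return 'http://' . $domain . '/' . ltrim($href, '/');
}

echo absolutize_href('page2.html', 'example.com'), "\n";
echo absolutize_href('http://remotesite.com/filename.ext', 'example.com'), "\n";
```

This treats every relative link as relative to the site root, which is a simplification: a real crawler would resolve it against the directory of the page it was found on, and would also need to handle things like ../ segments, mailto: links and #fragments.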

Also, what do you mean by 'follow'? Like a crawler?

boeing747fp
08-18-2005, 02:24 AM
Yes, a crawler. I'm trying to make something similar to http://www.phpdig.net

anshul
08-18-2005, 09:44 PM
An ambitious target you've got there.

PHP's PCRE functions will help you.
First read the file into your PHP script as plain text, using file_get_contents() or similar.
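
As a rough illustration of the PCRE approach, here is a sketch that pulls href values out of an HTML string with preg_match_all(). The function name extract_hrefs() and the sample markup are made up for the example; in a crawler the string would come from something like file_get_contents($url).

```php
<?php
// Hypothetical helper: return every href value found in <a ...> tags.
function extract_hrefs($html)
{
    // Match <a ... href="value"> or href='value'.
    // Group 1 is the opening quote, group 2 is the href value.
    $pattern = '#<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1#is';
    preg_match_all($pattern, $html, $matches);
    return $matches[2];
}

// Sample markup standing in for a fetched page.
$html = '<p><a href="page2.html">next</a> '
      . '<a class="x" href=\'http://remotesite.com/filename.ext\'>remote</a></p>';

foreach (extract_hrefs($html) as $href) {
    echo $href, "\n";
}
```

A regular expression like this is only an approximation of HTML parsing: it skips unquoted href values and can be fooled by unusual markup, so treat it as a starting point rather than a complete link extractor.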