finding href URLs
boeing747fp
08-16-2005, 05:25 PM
I'm trying to make a script that finds all <a href=""> tags and follows the links. However, I've run into a problem when people put just filenames, not full URLs, inside href="". Is there a way to search the string and see whether http://$domain exists in it, without breaking the case where someone did put a full remote URL like http://remotesite.com/filename.ext in the href="", and then follow the new value?
I don't really understand why you don't just take the value from the href. What does it matter whether they added the http:// or not? You can first check whether the http:// was included, and if not, just add it (together with the domain, and the directory of the page the link was found on).
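A minimal sketch of that check in PHP, assuming the crawler knows the URL of the page each href was found on. The names resolve_href() and same_host() are hypothetical; parse_url(), preg_match() and strcasecmp() are standard PHP functions. same_host() compares the parsed host instead of searching for the substring "http://$domain", because a substring search would also match something like http://example.com.evil.net.

```php
<?php
// Hypothetical helper: turn an href value into an absolute URL,
// given the URL of the page the link was found on ($base).
function resolve_href(string $base, string $href): string
{
    // Already a full URL (http:// or https://): keep it unchanged.
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($parts['host'] ?? '')
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative link such as "//cdn.example.com/x.js".
    if (substr($href, 0, 2) === '//') {
        return $scheme . ':' . $href;
    }

    // Root-relative link such as "/docs/page.html".
    if (substr($href, 0, 1) === '/') {
        return $origin . $href;
    }

    // Plain filename such as "page.html": resolve it against the
    // directory part of the current page's path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

// Hypothetical helper: true when $url is on the same host as $base,
// so the crawler can decide whether to stay on the original site.
function same_host(string $base, string $url): bool
{
    return strcasecmp(parse_url($base, PHP_URL_HOST) ?? '',
                      parse_url($url, PHP_URL_HOST) ?? '') === 0;
}
```

For example, resolve_href('http://example.com/dir/page.html', 'other.html') gives http://example.com/dir/other.html, while a full URL such as http://remotesite.com/filename.ext comes back unchanged. This sketch does not collapse "../" segments or handle hrefs that are only "?query" or "#fragment", and a crawler would want to skip mailto: and javascript: links before calling it.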
Also, what do you mean by 'follow'? Like a crawler?
boeing747fp
08-18-2005, 02:24 AM
Yes, a crawler. I'm trying to make something similar to http://www.phpdig.net
anshul
08-18-2005, 09:44 PM
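That's an ambitious target you've got!
PHP's PCRE functions (preg_match_all() and friends) will help you.
First get the page's text source with file_get_contents() or similar, then run the PCRE functions over it. Don't use include() for this: it executes the file as PHP instead of returning its text.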
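A minimal sketch of the PCRE step, assuming the page source is already in a string (for example from file_get_contents(), which needs allow_url_fopen enabled to read remote URLs). extract_hrefs() is a hypothetical name; preg_match_all(), array_unique() and array_values() are standard PHP functions.

```php
<?php
// Hypothetical helper: collect every href="..." or href='...' value
// on <a> tags in an HTML string, using preg_match_all().
function extract_hrefs(string $html): array
{
    // "<a", whitespace, optionally other attributes ending in
    // whitespace, then href = "value" or 'value'. Case-insensitive (i);
    // "s" lets the value span line breaks. The \s before "href"
    // keeps attributes like data-href from matching.
    $pattern = '/<a\s+(?:[^>]*?\s)?href\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $html, $matches);

    // Group 2 holds the text between the quotes; drop duplicates
    // and reindex the array.
    return array_values(array_unique($matches[2]));
}

// Usage (network access assumed):
// $html  = file_get_contents('http://www.example.com/');
// $links = extract_hrefs($html);
```

A regex like this misses unquoted values (href=page.html) and can be fooled by markup inside comments or scripts; PHP's DOMDocument::loadHTML() with getElementsByTagName('a') is a sturdier alternative if that matters.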