
View Full Version : Need to create a bot



mattcuckston
03-05-2010, 03:33 PM
Hi all,

Sorry if this is in the wrong section, but I know basic PHP, so if possible I'd like to make the script I need in this language.

We have a program which is being used by a number of websites. It's basically just a link through to our site, but each one is different. However, it will always contain 'mysite.com'.

I have a list of domain names (approx 7,000) and I need to check if any of these contain the link to our site.

Is there a way I can make this in PHP, or can someone recommend how I do this?

Many thanks!

Matt :D

JAY6390
03-05-2010, 04:21 PM
Sure you can. You just need to load each site individually, check for the URL, and save the report to a file, the screen, or a database. With that many domains you will need to increase the default time limit the script can run, or it will more than likely stop short of your 7,000 domains.
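For example, to lift the limit you can put this at the top of the script (a minimal sketch; 0 means no limit at all):

<?php
set_time_limit(0); // allow the script to run longer than the default 30 seconds
// or, equivalently: ini_set('max_execution_time', 0);
?>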

mattcuckston
03-05-2010, 04:32 PM
Hi,

Thanks for your response. Any idea how I would even start a script to open each page and search for the link?

Sorry... like I say, I'm limited on PHP knowledge.

Thanks!

Matt

JAY6390
03-05-2010, 04:36 PM
to load page data -> file_get_contents() (http://php.net/manual/en/function.file-get-contents.php) or fopen() (http://php.net/manual/en/function.fopen.php), fread() (http://php.net/manual/en/function.fread.php), fclose() (http://php.net/manual/en/function.fclose.php)
looping -> foreach (http://php.net/manual/en/control-structures.foreach.php)
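Put together, something along these lines should get you started (an untested sketch; 'mysite.com', domains.txt and the variable names are just placeholders):

<?php
set_time_limit(0); // lift the execution time limit for a long run

$domains = file('domains.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES); // one domain per line

foreach ($domains as $domain) {
    $html = @file_get_contents('http://' . $domain); // @ hides warnings for dead sites
    if ($html !== false && stripos($html, 'mysite.com') !== false) {
        echo $domain . " - link found<br>\n";
    } else {
        echo $domain . " - link NOT found<br>\n";
    }
}
?>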

mattcuckston
03-05-2010, 04:53 PM
That's great, thanks for your help. I'll give this a try, and if I encounter any problems, I'm sure someone will be able to help.

Thanks!

Matt

mattcuckston
03-10-2010, 10:16 PM
Hi,

I'm wondering if someone can help me. Jay was very kind to give me some references, but I think this is too advanced for me.

Can someone point me in the right direction of how this script would work, exactly?

Many thanks!

mlseim
03-10-2010, 10:34 PM
Will your link always appear on the main page of those 7,000 sites?
To crawl through each site (recursively, through all pages) would take a huge amount of server power and time.

mattcuckston
03-10-2010, 10:35 PM
Yes, 99% of them will appear on the main front page, so I don't need it to crawl, as I don't have to worry about that margin of error.

Many thanks!

Matt

Azzaboi
03-10-2010, 11:49 PM
<?php
$recip = 'http://www.domain.com'; // this is the reciprocal url... that EXACTLY must match
$filename = 'links1.txt'; // file with sites where your link is supposed to be, 1 per line
$found = 0;
$notfound = 0;

function backlinkCheck($siteurl, $recip) {
    if ($arrText = file($siteurl)) {
        $text = '';
        for ($i = 0; $i < count($arrText); $i++) {
            $text = $text . $arrText[$i];
        }
        if (eregi($recip, $text)) {
            fclose($fd);
        } else {
            return false; // set false if backlink is missing
            fclose($fd);
        }
    }
    return false;
}

echo '<h2>Link Checker</h2>';
echo '<p>This will check if the text '.$recip.' is found on the webpages</p><hr>';

$file_contents = file($filename);
for ($i = 0; $i < sizeof($file_contents); $i++) {
    $siteurl = trim($file_contents[$i]);
    if (backlinkCheck($siteurl, $recip)) {
        echo '<p>Backlink was <b>FOUND</b> on: '.$siteurl."</p>\n\n";
        $found++;
    } else {
        echo '<p>Backlink was <b>NOT FOUND</b> on: '.$siteurl."</p>\n\n";
        $notfound++;
    }
}

echo 'Total Found '.$found.'<br>';
echo 'Total Not Found '.$notfound.'<br>';
echo 'Total Links Checked '.($notfound+$found).'<br>';
?>

mattcuckston
03-11-2010, 12:14 AM
Hi,

That's amazing, thank you. However, I have just tried it and I get some errors appearing.

It comes up with


Warning: file() [function.file]: URL file-access is disabled in the server configuration in /home/accounts/public_html/test1.php on line 15

Warning: file(http://www.booking.com) [function.file]: failed to open stream: no suitable wrapper could be found in /home/accounts/public_html/test1.php on line 15

On line 15 I have
if ($arrText = file($siteurl)){

Any ideas?

Thanks!

Azzaboi
03-11-2010, 12:27 AM
URL file-access is disabled in the server configuration

It's in your php.ini, but I don't recommend changing it.
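If you did want to, it's a single directive in php.ini:

allow_url_fopen = On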

Hosts running anything past PHP 4 often turn allow_url_fopen OFF by default due to security concerns, mostly around cross-site scripting (XSS) attacks. In some cases, malicious users have even enslaved a server into a spam-email-sending nightmare, all without the administrator noticing.

Could you use relative file paths instead and cut out the domain name altogether?

The function 'eregi' might have been deprecated in favour of 'preg_match'?
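If so, the check could be swapped for something like this (a sketch; preg_quote() escapes the dots and slashes in the URL):

if (preg_match('/' . preg_quote($recip, '/') . '/i', $text)) {
    // backlink found
}

Or skip regex entirely with stripos($text, $recip) !== false.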

mattcuckston
03-11-2010, 01:05 AM
Hi,

Okay, I've temporarily activated "allow_url_fopen", however it still doesn't seem to be working.

I now get an error reading

Warning: fclose(): supplied argument is not a valid stream resource in /home/accounts/public_html/test1.php on line 20

I just did a simple test, searching for http://news.bbc.co.uk on http://news.bbc.co.uk, but it says it found no references.

Thanks in advance!

Azzaboi
03-11-2010, 07:19 PM
For the backlinkCheck function:


if (eregi($recip, $text)) {
    return true;
} else {
    return false; // set false if backlink is missing
}

I'm not the best PHP coder, still new. What I was trying to do was use fclose() to close an open file pointer before returning from the function. Maybe someone else can provide more advanced code; it was just a quick example to get you started.
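For what it's worth, file() reads the whole page into an array and doesn't leave a file pointer open (that's only needed with fopen()), so the fclose() calls can simply be dropped. A cleaned-up sketch of the whole function, using stripos() instead of eregi():

function backlinkCheck($siteurl, $recip) {
    $text = @file_get_contents($siteurl); // false if the page can't be loaded
    return ($text !== false && stripos($text, $recip) !== false);
}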

_Aerospace_Eng_
03-11-2010, 07:22 PM
For more compatibility among servers, and without having to change the ini file, you should probably use cURL. It does much the same thing as file_get_contents(). There are many examples of how to use cURL out there; you just need to search.
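For instance, a minimal sketch (the URL and search string are placeholders):

<?php
$ch = curl_init('http://www.example.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page as a string instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // don't hang on dead sites
$html = curl_exec($ch);
curl_close($ch);

if ($html !== false && stripos($html, 'mysite.com') !== false) {
    echo 'Link found';
}
?>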


