Googlebot is annoying the hell out of me



jcx1028
11-13-2012, 05:12 PM
Note: This question may not technically belong under HTML & CSS, but I wasn't sure where else to stick it, because it's kind of a general -- yet complicated -- question. Here goes...

We have a website that is made up of automotive listings. Feel free to check it out if you'd like. It's called theusedcarplace.com (http://www.theusedcarplace.com). The site was constructed based upon a script created by a company called Flynax. Aside from some bugs and broken links that needed to be fixed during the first month or two, the script has basically done what we need it to do. Specifically, a number of those broken links were caused by the script trying to load URLs beginning with www.www. (instead of just one www.). I managed to track down the majority of these after I set up a little PHP script to email me an error report each time a user attempted to load a non-existent file or page.

In a possibly unrelated sidebar, I finally got Apache's rewrite module (mod_rewrite) to work on our site last week. The script developers had said all along that all I had to do was click one little button in the Admin section, but I could never get it to work -- it would just result in every link breaking -- until I went in and cleaned up the .htaccess file, putting everything on individual lines. That fixed the problem, and the Apache rewrite is now functional.

Since (but possibly unrelated to) that change, I've been getting thousands upon thousands of 404 Error Report emails each day. All of them look something like this...

On Tue Nov 13 2012 11:02:54 am CST, 66.249.73.44 tried to load :

http://www.www.theusedcarplace.com/files/11-2012/ad6735/large_1352016416136719927.jpg

User Agent = Googlebot-Image/1.0

I don't see anything on the site or in the sitemap that has the www.www. prefix, and as near as I can determine, the specific files that Googlebot is looking for have never existed.

Can anyone give me an idea -- short of telling Googlebot to not index our site -- of how to get these emails to stop? Will Googlebot eventually realize that these files don't exist, or is there something broken in the structure of our site? Did my htaccess tinkering cause this, or is it just a coincidence?

Any information at all is greatly appreciated.

Thanks!

VIPStephan
11-13-2012, 05:41 PM
"Can anyone give me an idea -- short of telling Googlebot to not index our site -- of how to get these emails to stop?"

Remove your little PHP script that e-mails you the error reports.
You could also sign up for Google Webmaster Tools or a similar statistics tool that shows you which pages/links are broken and where the bot came from. Also, you can set up a robots.txt file to tell Googlebot what it may and may not crawl. And in Webmaster Tools you can also enter any URLs that are supposed to be removed from Google’s index.

And if you post your .htaccess file I’m sure someone savvy can tell you something about it, too.
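
To see how a crawler would read a robots.txt rule before publishing it, Python's standard urllib.robotparser can evaluate rules offline. This is only a sketch: the /admin/ path and the rules themselves are hypothetical, not taken from the site. Keep in mind that robots.txt is fetched per hostname, so a request to the www.www. host would consult that host's own robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; /admin/ is an assumed path.
EXAMPLE_RULES = """\
User-agent: Googlebot
Disallow: /admin/

User-agent: *
Disallow:
"""


def build_parser(rules_text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


parser = build_parser(EXAMPLE_RULES)

# True means the named crawler is allowed to request the URL.
admin_allowed = parser.can_fetch(
    "Googlebot", "http://www.theusedcarplace.com/admin/login"
)
image_allowed = parser.can_fetch(
    "Googlebot", "http://www.theusedcarplace.com/files/11-2012/ad6735/photo.jpg"
)
```

Note that a Disallow rule only asks well-behaved crawlers to stay away; it does not remove URLs that are already indexed.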

Igal-Incapsula
11-15-2012, 08:18 AM
You probably have something that links to these resources.
(perhaps some URL you are unaware of or some script that generates links by mistake)
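
One way to hunt for such a link is to scan the HTML of a page (or a saved copy of it) for href and src attributes whose hostname starts with www.www. The following is a rough sketch in Python using only the standard library; the function and class names are made up for this example, and it only catches absolute URLs that appear literally in the markup.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class DoubledWwwFinder(HTMLParser):
    """Collects (tag, url) pairs whose hostname starts with 'www.www.'."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("href", "src") or not value:
                continue
            try:
                host = urlsplit(value).hostname or ""
            except ValueError:
                # Malformed URL; skip it rather than abort the scan.
                continue
            if host.startswith("www.www."):
                self.hits.append((tag, value))


def find_doubled_www_links(html):
    """Return every href/src in the markup that points at a www.www. host."""
    finder = DoubledWwwFinder()
    finder.feed(html)
    finder.close()
    return finder.hits
```

Running this over the listing pages and the sitemap-linked pages would show whether the script still emits the doubled prefix anywhere; links built by JavaScript at runtime would not be seen by this approach.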

You can find out more by setting up a Google Webmaster Tools account. After a day or two you should start seeing some crawl errors, and for each one you will also be able to see the page that links to the nonexistent URL.

If you don't get any crawl errors, then you should check whether it's really Google that is visiting you or some kind of impersonator. (Spam bots and SEO monitoring tools love using Google user-agents to bypass security and/or get a "Google-view" of the site.)

For this you can use the Botopedia Googlebot IP verification tool (http://www.botopedia.org/user-agent-list/search-bots/googlebot).

Just put the visiting IP in the search box and it will tell you whether it's valid for Googlebot or not.
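
The same kind of check can also be done by hand with DNS: reverse-resolve the visiting IP, confirm the hostname ends in googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. Below is a minimal Python sketch of that idea. The function name is made up for this example, and the lookup functions are passed in as parameters so the logic can be tried without network access; this is not how the Botopedia tool itself works, which the thread does not describe.

```python
import socket

# Domains a genuine Googlebot hostname is expected to end with.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(
    ip,
    reverse_lookup=socket.gethostbyaddr,
    forward_lookup=socket.gethostbyname_ex,
):
    """Reverse-then-forward DNS check for an IPv4 address claiming to be Googlebot."""
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False

    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # gethostbyname_ex returns (hostname, aliases, addresses).
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False

    # The forward lookup must point back at the original IP.
    return ip in addresses
```

Calling is_verified_googlebot("66.249.73.44") with the default arguments performs real DNS lookups, so the result depends on the network it runs from. A user-agent string alone proves nothing, since anyone can send "Googlebot-Image/1.0".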


