  1. #1
    New Coder
    Join Date
    Sep 2008
    Posts
    31
    Thanks
    4
    Thanked 0 Times in 0 Posts

    Googlebot is annoying the hell out of me

    Note: This question may not technically belong under HTML & CSS, but I wasn't sure where else to put it, because it's a general -- yet complicated -- question. Here goes...

    We have a website that is made up of automotive listings. Feel free to check it out if you'd like. It's called theusedcarplace.com. The site was constructed based upon a script created by a company called Flynax. Aside from some bugs and broken links that needed to be fixed during the first month or two, the script has basically done what we need it to do. Specifically, a number of those broken links were caused by the script trying to load URLs beginning with www.www. (instead of just one www.). I managed to track down the majority of these after I set up a little PHP script to email me an error report each time a user attempted to load a non-existent file or page.

    In a possibly unrelated sidebar, I finally got Apache's mod_rewrite to work on our site last week. The script developers had said all along that all I had to do was click one little button in the Admin section, but I could never get it to work -- it would just result in every link breaking -- until I went in and cleaned up the .htaccess file, putting every directive on its own line. That fixed the problem, and URL rewriting is now functional.

    Since (but possibly unrelated to) that change, I've been getting thousands upon thousands of 404 Error Report emails each day. All of them look something like this...

    On Tue Nov 13 2012 11:02:54 am CST, 66.249.73.44 tried to load :

    http://www.www.theusedcarplace.com/f...6136719927.jpg

    User Agent = Googlebot-Image/1.0


    I don't see anything on the site or in the sitemap that has the www.www. prefix, and as near as I can determine, the specific files that Googlebot is looking for have never existed.

    Can anyone give me an idea -- short of telling Googlebot not to index our site -- of how to get these emails to stop? Will Googlebot eventually realize that these files don't exist, or is there something broken in the structure of our site? Did my .htaccess tinkering cause this, or is it just a coincidence?

    Any information at all is greatly appreciated.

    Thanks!

  • #2
    The fat guy next door VIPStephan
    Join Date
    Jan 2006
    Location
    Halle (Saale), Germany
    Posts
    8,635
    Thanks
    6
    Thanked 1,003 Times in 976 Posts
    Quote, originally posted by jcx1028:
    Can anyone give me an idea -- short of telling Googlebot to not index our site -- of how to get these emails to stop?
    Remove your little PHP script that e-mails you the error reports.
    You could also sign up for Google Webmaster Tools or a similar statistics tool that shows you which pages/links are broken and where the bot came from. Also, you can set up a robots.txt file to tell Googlebot what it may and may not crawl. And in Webmaster Tools you can also enter any URLs that are supposed to be removed from Google’s index.

    And if you post your .htaccess file I’m sure someone savvy can tell you something about it, too.
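    If removing the reporter entirely feels too drastic, a middle ground is to keep it but skip the e-mail for requests that are known noise. The sketch below shows only the decision logic, in Python rather than the PHP the site uses; the function name `should_email_404` and the `CRAWLER_MARKERS` list are made up for illustration and are not part of any existing script.

```python
from urllib.parse import urlsplit

# Hypothetical list of crawler user-agent substrings; extend as needed.
CRAWLER_MARKERS = ("googlebot", "bingbot")


def should_email_404(url: str, user_agent: str) -> bool:
    """Return True if a 404 for this request is worth an e-mail report."""
    host = (urlsplit(url).hostname or "").lower()
    # A doubled prefix such as www.www.example.com is the known bad pattern here.
    if host.startswith("www.www."):
        return False
    # Skip anything that identifies itself as one of the listed crawlers.
    ua = user_agent.lower()
    return not any(marker in ua for marker in CRAWLER_MARKERS)
```

    Filtering on the user agent hides crawler-triggered 404s from the inbox only; logging them to a file instead of dropping them would still let you look for the source of the bad links later.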

  • #3
    New to the CF scene
    Join Date
    Nov 2012
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts
    You probably have something that links to these resources.
    (perhaps some URL you are unaware of or some script that generates links by mistake)
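    One way to hunt for such a link yourself is to scan the HTML your pages actually emit for any href or src whose host starts with www.www. The following is a rough Python sketch using only the standard library; `LinkCollector` and `find_doubled_www` are illustrative names, and fetching the pages to feed into it is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects href/src attribute values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_doubled_www(html, page_url):
    """Return absolute URLs found in `html` whose host starts with 'www.www.'."""
    collector = LinkCollector()
    collector.feed(html)
    bad = []
    for link in collector.links:
        # Resolve relative links against the page they appeared on.
        absolute = urljoin(page_url, link)
        host = (urlsplit(absolute).hostname or "").lower()
        if host.startswith("www.www."):
            bad.append(absolute)
    return bad
```

    This only sees links present in the markup it is given, so links built by JavaScript or listed only in a sitemap would need a separate check.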

    You can find out more by setting up a Google Webmaster Tools account. After a day or two you should start seeing crawl errors, and for each one you will also be able to see the page that links to the nonexistent URL.

    If you don't get any crawl errors, then you should check whether it's really Google that is visiting you or some kind of impersonator. (Spam bots and SEO monitoring tools love using Google user agents to bypass security and/or get a "Google view" of the site.)

    For this you can use the Botopedia Googlebot IP verification tool.

    Just put the visiting IP in the search box and it will tell you whether it's valid for Googlebot or not.
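    The same check can be done by hand with a reverse DNS lookup followed by a forward lookup, which is the method Google describes for verifying its crawlers. Here is a minimal Python sketch of that idea; the function name `is_real_googlebot` is made up, the resolver parameters exist only so the logic can be exercised without network access, and the forward lookup used here covers IPv4 addresses only.

```python
import socket


def is_real_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check for an address claiming to be Googlebot."""
    try:
        # Step 1: reverse lookup gives the host name registered for the IP.
        hostname = reverse(ip)[0]
    except OSError:
        return False
    # Step 2: the name must sit under googlebot.com or google.com.
    if not hostname.lower().rstrip(".").endswith((".googlebot.com", ".google.com")):
        return False
    try:
        # Step 3: the forward lookup of that name must return the same IP.
        addresses = forward(hostname)[2]
    except OSError:
        return False
    return ip in addresses
```

    Calling `is_real_googlebot("66.249.73.44")` with the default resolvers performs live DNS queries, so the answer depends on what DNS returns at the time you run it.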

