  1. #1
    New Coder
    Join Date
    Mar 2010
    Posts
    11
    Thanks
    1
    Thanked 0 Times in 0 Posts

    robots.txt is this wrong?

    Hi,

    I launched a website about ten days ago and Google hasn't picked it up yet, even though I've got three or four fairly reasonable links from pages with a high PageRank, and those links were picked up overnight.

    I'm wondering if the robots.txt file is stopping them from crawling the site. I'd really appreciate it if someone who knows more than me could take a look at the file:

    Code:
    # robots.txt
    
    User-agent: Googlebot
    Disallow: 
    User-agent: googlebot-image
    Disallow: 
    User-agent: googlebot-mobile
    Disallow: 
    User-agent: MSNBot
    Disallow: 
    User-agent: Slurp
    Disallow: 
    User-agent: Teoma
    Disallow: 
    User-agent: Gigabot
    Disallow: 
    User-agent: Robozilla
    Disallow: 
    User-agent: yahoo-mmcrawler
    Disallow: 
    User-agent: psbot
    Disallow: 
    User-agent: yahoo-blogs/v3.9
    Disallow: 
    User-agent: *
    Disallow: /
    thanks in advance all.

  • #2
    Senior Coder
    Join Date
    Jul 2009
    Location
    South Yorkshire, England
    Posts
    2,318
    Thanks
    6
    Thanked 304 Times in 303 Posts

  • #3
    The Apostate Apostropartheid's Avatar
    Join Date
    Oct 2007
    Posts
    3,215
    Thanks
    16
    Thanked 265 Times in 263 Posts
    Why do you need a robots.txt file again...?

  • #4
    Master Coder Excavator's Avatar
    Join Date
    Dec 2006
    Location
    Alaska
    Posts
    9,675
    Thanks
    22
    Thanked 1,827 Times in 1,811 Posts
    Hello adamclark,
    Isn't that last line stopping all bots from visiting? That / should not be there I think...
    Code:
    User-agent: *
    Disallow: /
    Apostropartheid is right though... why have a robots.txt at all if you aren't wanting to stop the bots from crawling you?

  • #5
    Senior Coder
    Join Date
    Jul 2009
    Location
    South Yorkshire, England
    Posts
    2,318
    Thanks
    6
    Thanked 304 Times in 303 Posts
    Quote Originally Posted by Excavator View Post
    Isn't that last line stopping all bots from visiting?
    Not quite. It blocks only the bots that aren't explicitly named in the list above it.
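One way to check this reading is to feed the file to a parser. The sketch below uses Python's standard `urllib.robotparser` on a shortened copy of the file from post #1. The `allowed` helper, the `example.com` URL and the `SomeOtherBot` name are made up for illustration, and this only shows how Python's parser reads the file, which is not necessarily how Googlebot itself does.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the file from post #1: two named bots plus the wildcard.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:
User-agent: MSNBot
Disallow:
User-agent: *
Disallow: /
"""


def allowed(agent, url="http://example.com/page.html"):
    """Return True if this parser thinks `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


for agent in ("Googlebot", "MSNBot", "SomeOtherBot"):
    print(agent, allowed(agent))
```

With this parser the named bots come back as allowed and the unnamed one as blocked, which matches the "only bots not in the list" reading.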

  • #6
    New Coder
    Join Date
    Nov 2009
    Location
    Phoenix
    Posts
    17
    Thanks
    1
    Thanked 1 Time in 1 Post
    The robots.txt file only stops spiders that pay attention to the robots.txt file. Its use by bots is voluntary and it will never stop a bad bot. There is nothing on a server that will force a bot to obey this file.

    Code:
    User-agent: *
    Disallow: /
    This may prevent some spiders from indexing the site because it is the last directive and is wild-carded. It depends upon how the spider interprets the file. I've been doing SEO work since 1997 and I would not consider using a robots.txt file that looks like that. The results are unpredictable.

    There is an error with the format of the file. In the robots.txt file, a record is delineated by a blank line. Each directive starting with "User-agent" begins a separate record and should therefore be separated from the prior record by a blank line. You should not combine all of the directives into a single record. You can have as many Disallow statements as you wish in each record.
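As a rough illustration of the blank-line rule, here is a small Python sketch that takes a file written without separators and inserts a blank line before each new record. The function name `separate_records` is made up for this example, and it assumes a record always starts with one or more `User-agent` lines followed by its rules.

```python
def separate_records(text):
    """Insert a blank line before each User-agent line that follows a rule."""
    out = []
    previous_was_rule = False
    for raw in text.splitlines():
        line = raw.strip()
        is_agent = line.lower().startswith("user-agent:")
        # A User-agent line right after a rule starts a new record.
        if is_agent and previous_was_rule:
            out.append("")
        out.append(line)
        if not line:
            # An existing blank line already closes the record.
            previous_was_rule = False
        elif not line.startswith("#"):
            # Comments leave the state unchanged; anything else updates it.
            previous_was_rule = not is_agent
    return "\n".join(out) + "\n"


original = "User-agent: Googlebot\nDisallow:\nUser-agent: *\nDisallow: /\n"
print(separate_records(original))
```

Running it on a file that already has the blank lines should leave it unchanged, so it is safe to apply twice.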

    The most basic format for this file is simply the following single record directive:

    Code:
    User-agent: *
    Disallow:
    This invites all spiders to visit the site and does not block any files or directories.
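If you want to confirm that the two-line file really blocks nothing, Python's standard `urllib.robotparser` can be used as a quick check. This is only a sketch: the bot name and the `example.com` URLs are placeholders, and it shows how this one parser reads the file rather than how any particular search engine does.

```python
from urllib.robotparser import RobotFileParser

# The basic "allow everything" file: an empty Disallow blocks nothing.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow:"])

# Any bot name and any path should be reported as fetchable.
home_ok = parser.can_fetch("AnyBot", "http://example.com/")
deep_ok = parser.can_fetch("AnyBot", "http://example.com/private/page.html")
print(home_ok, deep_ok)
```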

    If you want Google to visit a new site, try setting up a couple of links to the site. Sometimes Google will not index a site until it finds a path from another site.

    Another trick to get Google to visit is to use a pinging service, such as pingler.com.
    Last edited by TopDogger; 03-25-2010 at 02:20 AM.

  • #7
    New to the CF scene
    Join Date
    Mar 2010
    Posts
    9
    Thanks
    0
    Thanked 1 Time in 1 Post
    Don't use a robots.txt file at all unless you want to hide something from Google... and let there be happiness )))

  • #8
    New Coder
    Join Date
    Mar 2010
    Posts
    11
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Thanks for your replies, everyone.

    Maybe I'll just remove the robots.txt file altogether. I don't have anything to hide on my website.

    It's just that I went on a webmaster course recently and the instructor recommended using one to prevent all but the big names from crawling the site.

    He said there was a small minority of robots that did not visit your site with good intentions.

    I already have a few links from other sites pointing to mine. These links were picked up very quickly (overnight), but for some reason the bots haven't followed them to my site (I did pick dofollow sites with a high PageRank to link from).

    Maybe I've been added to the 'to crawl' list!

    So the general consensus seems to be that you don't really need a robots.txt file?

  • #9
    Senior Coder
    Join Date
    Jul 2009
    Location
    South Yorkshire, England
    Posts
    2,318
    Thanks
    6
    Thanked 304 Times in 303 Posts
    Quote Originally Posted by adamclark View Post
    he said there were a small minority of robots that did not visit your site with good intentions
    They'll pay piss all attention to a robots file.


    so the general consensus seems to be that you don't really need a robots file?
    Pretty much.

  • #10
    New Coder
    Join Date
    Mar 2010
    Posts
    11
    Thanks
    1
    Thanked 0 Times in 0 Posts
    fair point, decision made.

    cheers

  • #11
    New Coder
    Join Date
    Nov 2009
    Location
    Phoenix
    Posts
    17
    Thanks
    1
    Thanked 1 Time in 1 Post
    You do not need a robots.txt file, but it may be a good idea to use one. If it's missing, the spider requests will generate 404 errors on your server and spiders can get redirected to an error page. The generally accepted SEO practice is to use a robots.txt file.

    My recommendation is to use the basic robots.txt file, rather than none at all.

    Code:
    User-agent: *
    Disallow:

