
robots.txt is this wrong?



adamclark
03-24-2010, 08:52 PM
Hi,

I launched a website about ten days ago and Google hasn't picked it up yet, even though I've got three or four fairly reasonable links from pages with a high PageRank, which got picked up overnight.

I'm wondering if the robots file is stopping them from crawling the site. I'd really appreciate it if someone who knows more than me could take a look at the robots file:



# robots.txt

User-agent: Googlebot
Disallow:
User-agent: googlebot-image
Disallow:
User-agent: googlebot-mobile
Disallow:
User-agent: MSNBot
Disallow:
User-agent: Slurp
Disallow:
User-agent: Teoma
Disallow:
User-agent: Gigabot
Disallow:
User-agent: Robozilla
Disallow:
User-agent: yahoo-mmcrawler
Disallow:
User-agent: psbot
Disallow:
User-agent: yahoo-blogs/v3.9
Disallow:
User-agent: *
Disallow: /


thanks in advance all.

MattF
03-24-2010, 09:00 PM
http://www.robotstxt.org/

Apostropartheid
03-24-2010, 09:24 PM
Why do you need a robots.txt file again...?

Excavator
03-24-2010, 10:22 PM
Hello adamclark,
Isn't that last line stopping all bots from visiting? That / should not be there I think...

User-agent: *
Disallow: /

Apostropartheid is right though... why have a robots.txt at all if you aren't wanting to stop the bots from crawling you?

MattF
03-24-2010, 10:42 PM
Isn't that last line stopping all bots from visiting?

Not quite. It blocks any bots which aren't explicitly named in the list above it from crawling the site.
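
For anyone who wants to check this rather than take it on trust, here is a rough sketch using Python's standard-library urllib.robotparser. The file below is a shortened copy of the one posted (same layout, fewer bots), and example.com and SomeOtherBot are made-up placeholders. Bear in mind this only shows how Python's parser reads the file; other spiders may interpret it differently.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the posted file: same layout, fewer named bots.
POSTED_ROBOTS_TXT = """\
# robots.txt

User-agent: Googlebot
Disallow:
User-agent: MSNBot
Disallow:
User-agent: Slurp
Disallow:
User-agent: *
Disallow: /
"""


def can_crawl(agent: str, url: str = "http://example.com/") -> bool:
    """Ask Python's parser whether `agent` may fetch `url` under the rules above."""
    parser = RobotFileParser()
    parser.parse(POSTED_ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # "SomeOtherBot" is a hypothetical crawler that is not named in the file.
    for agent in ("Googlebot", "MSNBot", "SomeOtherBot"):
        print(agent, can_crawl(agent))
```

With this parser the named bots should come back as allowed and the unnamed one as blocked, which matches the reading above.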

TopDogger
03-25-2010, 02:09 AM
The robots.txt file only stops spiders that pay attention to the robots.txt file. Its use by bots is voluntary and it will never stop a bad bot. There is nothing on a server that will force a bot to obey this file.



User-agent: *
Disallow: /


This may prevent some spiders from indexing the site because it is the last directive and is wild-carded. It depends upon how the spider interprets the file. I've been doing SEO work since 1997 and I would not consider using a robots.txt file that looks like that. The results are unpredictable.

There is an error with the format of the file. In the robots.txt file, records are separated by a blank line. Each group starting with "User-agent" is a separate record and should therefore be separated from the previous record by a blank line. You should not run all of the directives together into a single block. You can have as many Disallow statements as you wish in each record.
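
As a rough illustration of that blank-line rule, the Python sketch below rewrites a robots.txt so that every User-agent line that directly follows a Disallow (or Allow) line gets a blank line in front of it. The function name separate_records is made up for this example, and it only handles the simple layout seen in this thread.

```python
def separate_records(text: str) -> str:
    """Put a blank line before any User-agent line that directly follows a rule line."""
    out = []
    seen_rule = False  # True once the current record has a Disallow/Allow line
    for raw in text.splitlines():
        line = raw.strip()
        field = line.split(":", 1)[0].strip().lower()
        if not line:
            # An existing blank line already ends the current record.
            seen_rule = False
        elif field == "user-agent":
            if seen_rule and out and out[-1] != "":
                out.append("")  # start a new record
            seen_rule = False
        elif field in ("disallow", "allow"):
            seen_rule = True
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    run_together = "User-agent: Googlebot\nDisallow:\nUser-agent: *\nDisallow: /\n"
    print(separate_records(run_together))
```

Running it on a file that is already separated should leave it unchanged, so it is safe to apply more than once.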

The most basic format for this file is simply the following single record directive:



User-agent: *
Disallow:


This invites all spiders to visit the site and does not block any files or directories.
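
The difference between an empty Disallow and "Disallow: /" is easy to miss, so here is a small Python sketch that compares the two with the standard-library urllib.robotparser. AnyBot and /index.html are placeholders, and the result only reflects how Python's parser reads the rules.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty value: nothing is blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # "/" blocks every path on the site


def allowed(robots_txt: str, path: str = "/index.html", agent: str = "AnyBot") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print("empty Disallow:", allowed(ALLOW_ALL))
    print("Disallow: /   :", allowed(BLOCK_ALL))
```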

If you want Google to visit a new site, try setting up a couple of links to the site. Sometimes Google will not index a site until it finds a path from another site.

Another trick to get Google to visit is to use a pinging service, such as pingler.com.

timsoulo
03-25-2010, 02:19 PM
don't use a robots.txt at all unless you want to hide something from Google... and let there be happiness :))))

adamclark
03-25-2010, 06:51 PM
thanks for your replies everyone.

maybe i'll just remove the robots file altogether. I don't have anything to hide on my website.

it's just that i went on a webmaster course recently and he recommended using one to prevent all but the big names from crawling the site.

he said there were a small minority of robots that did not visit your site with good intentions

i already have a few links from other sites pointing to mine, these links were picked up very quickly (overnight), but for some reason the bots haven't followed them to my site (i did pick dofollow, high PageRank sites to link from)

maybe i've been added to the 'to crawl' list!

so the general consensus seems to be that you don't really need a robots file?

MattF
03-25-2010, 06:55 PM
he said there were a small minority of robots that did not visit your site with good intentions

They'll pay piss all attention to a robots file.




so the general consensus seems to be that you don't really need a robots file?

Pretty much.

adamclark
03-25-2010, 07:17 PM
fair point, decision made.

cheers

TopDogger
03-26-2010, 04:26 AM
You do not need a robots.txt file, but it may be a good idea to use one. If it's missing, the spider requests will generate 404 errors on your server and spiders can get redirected to an error page. The generally accepted SEO practice is to use a robots.txt file.

My recommendation is to use the basic robots.txt file, rather than none at all.



User-agent: *
Disallow:


