I wonder if anyone can tell me what code I can add to web pages I don't want to be indexed by web spiders. I have heard that this is possible, but I don't know how to do it exactly. Can anyone help me?

<meta name="robots" content="noindex,nofollow" />

or to have the links followed without having the page indexed:

<meta name="robots" content="noindex,follow" />

I'm curious why some people want to avoid robots. What's the downside?


The META robots tag is almost entirely useless/disused. Very few robots today even look for it. The only reliable way to control spiders is to put a robots.txt file in the root of your site and give explicit instructions for all bots to not visit certain files/areas on your site, or explicit instructions for only certain bots.

For example, the attached file is what I've used on shadowstorm for years. It keeps the well-known bad spiders from indexing the site. Of course, I could write a bot that specifically ignores the robots.txt orders, and so could anyone, so it only blocks the common siphons/harvesters and naughty spiders.
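
To give an idea of what such a file looks like, here is a minimal sketch in JavaScript that prints robots.txt rules. The helper name buildRobotsTxt, the bot name ExampleHarvester, and the /party/ path are all made up for illustration; this is not the actual shadowstorm file, just the general shape of one.

```javascript
// Hypothetical helper: builds robots.txt text from plain data.
// The bot name and path used below are placeholders, not a vetted blocklist.
function buildRobotsTxt(blockedBots, privatePaths) {
  const lines = [];
  // One record per unwanted bot: "Disallow: /" asks it to stay out of the whole site.
  for (const bot of blockedBots) {
    lines.push("User-agent: " + bot, "Disallow: /", "");
  }
  // Catch-all record for every other bot: keep it out of the listed paths only.
  lines.push("User-agent: *");
  if (privatePaths.length === 0) {
    lines.push("Disallow:"); // an empty Disallow means nothing is off limits
  }
  for (const path of privatePaths) {
    lines.push("Disallow: " + path);
  }
  return lines.join("\n") + "\n";
}

// Prints the text you would save as robots.txt in the site root.
console.log(buildRobotsTxt(["ExampleHarvester"], ["/party/"]));
```

Keep in mind that these rules are only a request. A bot that chooses to ignore robots.txt will still fetch whatever it likes.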


And apple: you block certain bots/spiders that you know do bad things, like email harvesting or content siphoning. You can also block certain search engines from indexing certain areas of your site (for example, block Google from viewing any area of your site that is not built specifically for best Google results, though this method is not as efficient any longer, considering improved spider redirects).

Now I guess I'd like to know the upside of being crawled.


Do these spiders start at your IP and follow links? I have a site that has password-protected pages (using JS), but the pages can be accessed by simply typing the address. Can the spiders see these pages or not?

Applesauce: the reason why I don't want the site to be indexed by robots is simple... It is a website about a party and the birthday boy has a kind of 'high profile' profession and doesn't want outsiders to find this kind of information too easily by typing in some keywords...

Crawlers tend to avoid indexing JavaScript-generated content.

In the "roll your own" example from my HTML() bookmarklet library (http://www.angelfire.com/ca/redwards/html__.calendar.html), the text of the page generated by
document.write(('Untitled'.TITLE().HEAD()+'Building HTML pages is easy!'.P().BODY()).HTML()) generally wouldn't be indexed by web crawlers. However, URL literals can usually be recognized inside JavaScript code, unless they are assembled by the script itself.

I often use
"No.Spam(a)Thank.You".split("(a)").join("@") for mailto: links.
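
Here is a rough sketch of how that split/join trick might be wired into a link. The function name deobfuscate and the element id "contact" are hypothetical, chosen only for this example. The point is that the address with "@" never appears as a literal in the page source, so a harvester that only scans the text for address patterns has nothing to match.

```javascript
// Rebuilds an address from an obfuscated literal by swapping the marker for "@".
function deobfuscate(text, marker) {
  return text.split(marker).join("@");
}

// The mailto: URL is assembled at run time rather than written out in the markup.
const href = "mailto:" + deobfuscate("No.Spam(a)Thank.You", "(a)");
console.log(href); // mailto:No.Spam@Thank.You

// In a browser you could then assign it to a link (the id is hypothetical):
// document.getElementById("contact").href = href;
```

A bot that actually runs the script would still see the finished address, so this only stops the simpler harvesters.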