View Full Version : hide from web spiders



wwwolf
01-01-2003, 02:04 PM
Hello,
I wonder if anyone can tell me what code I can add to web pages I don't want to be indexed by web spiders. I have heard that this is possible, but I don't know how to do it exactly. Can anyone help me?
Thanks!
Wolf

PauletteB
01-01-2003, 03:18 PM
<meta name="robots" content="noindex,nofollow" />

or to have the links followed without having the page indexed:

<meta name="robots" content="noindex,follow" />
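
For anyone generating pages from a script, here is a minimal sketch of the same two tags built from a pair of flags. The function name robotsMetaTag is made up for illustration and is not part of any library; the tag itself goes inside the page's head section.

```javascript
// Hypothetical helper (the name is an assumption, not from this thread):
// builds the robots META tag for a page from two boolean choices.
//   index  - may the page itself be indexed?
//   follow - may the links on the page be followed?
function robotsMetaTag(index, follow) {
  const content =
    (index ? "index" : "noindex") + "," + (follow ? "follow" : "nofollow");
  return '<meta name="robots" content="' + content + '" />';
}

// The two variants quoted above:
const hidePageAndLinks = robotsMetaTag(false, false);
const hidePageKeepLinks = robotsMetaTag(false, true);
```

The first call produces the noindex,nofollow tag and the second the noindex,follow tag.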

wwwolf
01-01-2003, 03:32 PM
Great! Thanks!
Wolf

applesauce
01-02-2003, 06:40 PM
I'm curious why some people want to avoid robots. What's the downside?

thanks

Feyd
01-02-2003, 07:01 PM
The META robots tag is almost entirely useless/disused. Very few robots today even look for it. The only reliable way to control spiders is to use a robots.txt file that gives explicit instructions, either to all bots or only to certain bots, not to visit certain files/areas on your site.

For example, the attached file is what I've used on shadowstorm for years. It blocks bad spiders from indexing the site (of course, I could write a bot that specifically ignores the robots.txt orders, and so could anyone, so it only blocks the common siphons/harvesters and naughty spiders that actually obey the file).

http://www.robotstxt.org/wc/robots.html
http://www.searchengineworld.com/robots/robots_tutorial.htm

And apple... you block certain bots/spiders that you know do bad things, like email harvesting or content siphoning. You can also block certain search engines from indexing certain areas of your site (for example, block Google from viewing any area of your site that is not built specifically for best Google results, though this method is not as efficient any longer, considering improved spider redirects).
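
To make the robots.txt idea concrete, here is a rough sketch of how a polite crawler might read the file before fetching a page. This is not the attached shadowstorm file and not a complete implementation of the standard: it handles only User-agent and Disallow lines with simple prefix matching, and the function names, the bot names, and the /party/ path are all assumptions made up for the example.

```javascript
// Parse robots.txt text into groups of the form
// { agents: [...], disallow: [...] }. Simplified sketch only.
function parseRobotsTxt(text) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    if (line === "") continue;
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines share one group of rules.
      if (!lastWasAgent) {
        current = { agents: [], disallow: [] };
        groups.push(current);
      }
      if (value !== "") current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      // An empty Disallow value means "nothing is blocked", so skip it.
      if (field === "disallow" && current !== null && value !== "") {
        current.disallow.push(value);
      }
      lastWasAgent = false;
    }
  }
  return groups;
}

// True if the named bot may fetch the path under the parsed rules.
function isAllowed(groups, botName, path) {
  const name = botName.toLowerCase();
  // Prefer a group that names this bot; otherwise use the "*" group.
  const group =
    groups.find((g) => g.agents.some((a) => a !== "*" && name.includes(a))) ||
    groups.find((g) => g.agents.includes("*"));
  if (!group) return true; // no matching rules: everything is allowed
  return !group.disallow.some((prefix) => path.startsWith(prefix));
}

// Illustrative rules: shut one harvester out entirely and keep
// every other bot out of a single directory.
const exampleRobotsTxt = [
  "# example rules, made up for this sketch",
  "User-agent: EmailSiphon",
  "Disallow: /",
  "",
  "User-agent: *",
  "Disallow: /party/",
].join("\n");

const rules = parseRobotsTxt(exampleRobotsTxt);
const harvesterMayFetch = isAllowed(rules, "EmailSiphon", "/index.html");
const othersMayFetchParty = isAllowed(rules, "Googlebot", "/party/photos.html");
const othersMayFetchHome = isAllowed(rules, "Googlebot", "/index.html");
```

Note that all the checking happens on the crawler's side, which is why a bot written to ignore the file is not stopped by it.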

applesauce
01-02-2003, 07:03 PM
thanks!

Now I guess I'd like to know the upside of being crawled.

:D

Jacob-Bushnell
01-02-2003, 07:48 PM
Do these spiders start at your IP and follow links? I have a site that has password-protected pages (using JS), but the pages can be accessed by simply typing the address. Can the spiders see these pages or not?

wwwolf
01-02-2003, 08:07 PM
Thanks a lot for all the information!

Applesauce: the reason why I don't want the site to be indexed by robots is simple... It is a website about a party and the birthday boy has a kind of 'high profile' profession and doesn't want outsiders to find this kind of information too easily by typing in some keywords...

applesauce
01-02-2003, 08:09 PM
ah!

ca_redwards
01-02-2003, 10:52 PM
Crawlers tend to avoid indexing JavaScript-generated content.

In the "roll your own" example from my HTML() bookmarklet library (http://www.angelfire.com/ca/redwards/html__.calendar.html), the text of the page generated by
document.write(('Untitled'.TITLE().HEAD()+'Building HTML pages is easy!'.P().BODY()).HTML())
generally wouldn't be indexed by web crawlers. However, URL literals can usually be recognized inside JavaScript code, unless they are assembled by the script itself.

I often use
"No.Spam(a)Thank.You".split("(a)").join("@") for mailto: links.

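A small sketch of the split/join trick above, wrapped in functions so it can be reused. The function names are made up for illustration; the masked string and the "(a)" placeholder are the ones from the post. It only helps against harvesters that scan the raw page source without running scripts.

```javascript
// Rebuild an e-mail address at run time so the literal address never
// appears in the page source. "(a)" stands in for "@".
function deobfuscate(masked, placeholder = "(a)") {
  return masked.split(placeholder).join("@");
}

// Build the mailto: URL from the masked form of the address.
function mailtoHref(masked) {
  return "mailto:" + deobfuscate(masked);
}

const href = mailtoHref("No.Spam(a)Thank.You");
// href is "mailto:No.Spam@Thank.You"
```

In a page, the result would be assigned to a link's href by the script, so the assembled address exists only after the script has run.
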

