  • #1
    New Coder
    Join Date
    Nov 2002

    hide from web spiders

    Hello,
    I wonder if anyone can tell me what code I can add to web pages that I don't want indexed by web spiders. I have heard that this is possible, but I don't know exactly how to do it. Can anyone help me?
    Thanks!
    Wolf

  • #2
    Regular Coder
    Join Date
    Jun 2002
    <meta name="robots" content="noindex,nofollow" />

    or to have the links followed without having the page indexed:

    <meta name="robots" content="noindex,follow" />
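The tag's content attribute is just a comma-separated list of directives. As a rough sketch of how a crawler that honors the tag might read it, assuming only the four classic values (index, noindex, follow, nofollow), case-insensitive matching, a default of index,follow when the tag is absent, and that a "no..." value wins over its opposite (parseRobotsMeta is a hypothetical name):

```javascript
// Hypothetical sketch: interpret the content of <meta name="robots" content="...">.
// Assumptions: only index/noindex/follow/nofollow matter, matching is
// case-insensitive, and the restrictive "no..." value wins if both appear.
function parseRobotsMeta(content) {
  // With no tag (or empty content), the page is indexed and its links followed.
  const result = { index: true, follow: true };
  const directives = (content || "")
    .split(",")
    .map(d => d.trim().toLowerCase());
  if (directives.includes("noindex")) result.index = false;
  if (directives.includes("nofollow")) result.follow = false;
  return result;
}

console.log(JSON.stringify(parseRobotsMeta("noindex,nofollow"))); // {"index":false,"follow":false}
console.log(JSON.stringify(parseRobotsMeta("noindex,follow")));   // {"index":false,"follow":true}
```

Either way, the tag is only a request: it works only for robots that choose to read and obey it.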

  • #3
    New Coder
    Join Date
    Nov 2002
    Great! Thanks!
    Wolf

  • #4
    Regular Coder
    Join Date
    Jun 2002
    Location
    Dallas, Texas
    I'm curious why some people want to avoid robots. What's the downside?

    thanks

  • #5
    Regular Coder (Feyd)
    Join Date
    May 2002
    Location
    Los Angeles, CA
    Maxim: Subvert Society
    The META robots tag is almost entirely useless/disused; very few robots today even look for it. The only reliable way to control spiders is to put a robots.txt file in your site's root directory and give explicit instructions, either telling all bots not to visit certain files/areas of your site, or addressing only certain bots...

    For example, the attached file is what I've used on shadowstorm for years. It blocks bad spiders from indexing the site (of course, I could write a bot that specifically ignores the robots.txt orders, and so could anyone, so in practice it only blocks the common siphons/harvesters and naughty spiders).
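To make the robots.txt approach concrete, here is a minimal sketch of how a well-behaved crawler might apply such a file. It assumes the original robots exclusion rules: records begin with User-agent lines, Disallow values are plain path prefixes, an empty Disallow permits everything, and a record naming the robot takes precedence over the "*" record. The file contents, the bot name ExampleHarvester, and the helpers parseRobotsTxt/isAllowed are hypothetical; this is not the attached file. Note that nothing here is enforced on the server: a bot is blocked only if it runs a check like this itself.

```javascript
// Hypothetical sketch of robots.txt handling under the original exclusion rules:
// User-agent lines start a record, Disallow values are path prefixes,
// and Allow lines or wildcards are not supported.
function parseRobotsTxt(text) {
  const records = [];
  let current = null;
  let lastWasAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments and whitespace
    if (line === "") continue;
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one record.
      if (!current || !lastWasAgent) {
        current = { agents: [], disallow: [] };
        records.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      // An empty Disallow value means "nothing is disallowed".
      if (field === "disallow" && current && value !== "") {
        current.disallow.push(value);
      }
      lastWasAgent = false;
    }
  }
  return records;
}

function isAllowed(records, userAgent, path) {
  const ua = userAgent.toLowerCase();
  // Prefer a record whose name appears in this robot's User-agent string...
  let record = records.find(r =>
    r.agents.some(a => a !== "*" && a !== "" && ua.includes(a)));
  // ...otherwise fall back to the catch-all "*" record.
  if (!record) record = records.find(r => r.agents.includes("*"));
  if (!record) return true;
  return !record.disallow.some(prefix => path.startsWith(prefix));
}

// Hypothetical example file: shut out one harvester, keep everyone out of two areas.
const exampleRobotsTxt = [
  "# Hypothetical example, not the attached file",
  "User-agent: ExampleHarvester",
  "Disallow: /",
  "",
  "User-agent: *",
  "Disallow: /private/",
  "Disallow: /party.html",
].join("\n");

const rules = parseRobotsTxt(exampleRobotsTxt);
console.log(isAllowed(rules, "ExampleHarvester/1.0", "/index.html")); // false
console.log(isAllowed(rules, "Googlebot/2.1", "/index.html"));        // true
console.log(isAllowed(rules, "Googlebot/2.1", "/private/pics.html")); // false
```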

    http://www.robotstxt.org/wc/robots.html
    http://www.searchengineworld.com/rob...s_tutorial.htm

    And Applesauce: you block certain bots/spiders that you know do bad things, like email harvesting or content siphoning. You can also block certain search engines from indexing certain areas of your site (for example, block Google from any area of your site that is not built specifically for the best Google results, though this method is less effective now that spider redirects have improved).
    Attached Files
    Moderator, Perl/CGI Forum
    shadowstorm.net - subvert society

  • #6
    Regular Coder
    Join Date
    Jun 2002
    Location
    Dallas, Texas
    thanks!

    Now I guess I'd like to know the upside of being crawled.


  • #7
    New Coder
    Join Date
    Oct 2002
    Location
    North Idaho
    Do these spiders start at your IP address and follow links? I have a site with password-protected pages (using JS), but the pages can be accessed by simply typing in the address. Can the spiders see these pages or not?
    How do you know the universe is 20,000,000,000 years old, Grandpa? Were you there?

  • #8
    New Coder
    Join Date
    Nov 2002
    Thanks a lot for all the information!

    Applesauce: the reason I don't want the site indexed by robots is simple. It is a website about a party, and the birthday boy has a rather high-profile profession and doesn't want outsiders to find this kind of information too easily by typing in a few keywords.

  • #9
    Regular Coder
    Join Date
    Jun 2002
    Location
    Dallas, Texas
    ah!

  • #10
    Regular Coder
    Join Date
    Dec 2002

    You could protect it with JavaScript...

    Crawlers tend to avoid indexing JavaScript-generated content.

    In my HTML() bookmarklet library's "roll your own" example, the text of the page generated by
    Code:
    document.write(('Untitled'.TITLE().HEAD()+'Building HTML pages is easy!'.P().BODY()).HTML())
    generally wouldn't be indexed by web crawlers. However, URL literals can usually be recognized inside of JavaScript code, unless they are assembled by the script itself.
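The HTML() library itself isn't included in the thread. To try the document.write line above without it, here is a hypothetical stand-in that assumes each uppercase method simply wraps the string in the tag of the same name; the real library may well work differently. It builds the markup as a string and prints it instead of calling document.write, so it also runs outside a browser.

```javascript
// Hypothetical stand-in for the HTML() bookmarklet library's tag methods.
// Assumption: each method wraps the string in the tag of the same name.
for (const tag of ["HTML", "HEAD", "TITLE", "BODY", "P"]) {
  String.prototype[tag] = function () {
    const name = tag.toLowerCase();
    return "<" + name + ">" + this + "</" + name + ">";
  };
}

// Same expression as the post; in a browser it would be passed to document.write().
const page = ("Untitled".TITLE().HEAD() + "Building HTML pages is easy!".P().BODY()).HTML();
console.log(page);
// <html><head><title>Untitled</title></head><body><p>Building HTML pages is easy!</p></body></html>
```

A crawler that reads the raw source without executing scripts sees only the script text, not the generated markup as a page.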

    I often use
    Code:
    "No.Spam(a)Thank.You".split("(a)").join("@")
    for mailto: links.
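The same split/join trick can generate the whole mailto: link at run time, so the plain address never appears in the page source. A minimal sketch, with decodeAddress and mailtoLink as hypothetical helper names; like any script-generated content, it only hides the address from harvesters that read the raw HTML without running JavaScript.

```javascript
// Turns the obfuscated form back into a real address (the trick from the post).
function decodeAddress(obfuscated) {
  return obfuscated.split("(a)").join("@");
}

// Hypothetical helper: builds the mailto: link markup at run time,
// so only the obfuscated string appears in the page source.
function mailtoLink(obfuscated, label) {
  const address = decodeAddress(obfuscated);
  return '<a href="mailto:' + address + '">' + (label || address) + "</a>";
}

// In a browser this string would typically go to document.write().
console.log(mailtoLink("No.Spam(a)Thank.You"));
// <a href="mailto:No.Spam@Thank.You">No.Spam@Thank.You</a>
```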

