  1. #1
    Regular Coder DR.Wong
    Join Date
    Jan 2005
    Posts
    360
    Thanks
    23
    Thanked 1 Time in 1 Post

    Disabling directory indexing: does it stop crawlers too?

    Hi there,

    I've read a lot lately about preventing crawlers from indexing sensitive parts of a website's file structure. Almost every article I've read focuses on robots.txt, which I find unhelpful: listing the directories you don't want indexed in a publicly readable file seems counterintuitive. Those articles also all carry the caveat that "complying with robots.txt is optional and bad bots will index whatever they want". Not one of them mentions disabling directory indexing.

    I've always worked under the assumption that disabling directory indexing will prevent crawlers from being able to index files that aren't linked to in public files. Is this correct?

    For instance, you may have the following directory structure: example.com/frontend and example.com/admin.

    The files in /frontend you don't mind being indexed, since they're the "public face" of your website. The files in /admin, however, you really don't want crawlers to go near. If you've disabled directory indexing in either an .htaccess file or httpd.conf, is it really necessary to even consider a robots.txt file?

    As far as I see it, if the crawler tries to access the directory, it will be faced with a "forbidden" message just like any other user. The only way it would be able to determine what files are in the directory is by studying public files that link to that directory. Since your admin section is (hopefully) behind a login, it won't get access to the files that contain those links.
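    The reasoning above can be sketched as a toy crawl. This is only an illustration under stated assumptions: the SITE dictionary, its paths, and the crawl helper are all hypothetical and stand in for a real server with directory listings disabled. It shows that the crawler learns nothing from the 403 on /admin/, but a file inside that directory is still served if something public links to it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical site: path -> (HTTP status, body). Not a real server.
SITE = {
    "/frontend/": (200, '<a href="/frontend/about.html">About</a>'),
    "/frontend/about.html": (200, "<p>About us</p>"),
    "/admin/": (403, "Forbidden"),  # directory listing disabled
    "/admin/login.php": (200, "<form>login</form>"),
    "/admin/users.php": (200, '<a href="/admin/login.php">Back</a>'),
}


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start, site):
    """Breadth-first crawl that only follows links found in fetched pages."""
    seen, queue, fetched = set(), [start], []
    while queue:
        path = queue.pop(0)
        if path in seen:
            continue
        seen.add(path)
        status, body = site.get(path, (404, ""))
        if status != 200:
            continue  # a 403 or 404 reveals no links
        fetched.append(path)
        parser = LinkParser()
        parser.feed(body)
        for href in parser.links:
            queue.append(urlparse(urljoin(path, href)).path)
    return fetched


print(crawl("/frontend/", SITE))  # only the linked public pages
print(crawl("/admin/", SITE))     # nothing: the listing is forbidden
```

    Note that in this toy model /admin/login.php still answers with 200 when requested directly, so the protection rests entirely on nobody publishing or guessing the URL.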
    -DR.Wong

    Where's the food at?

  • #2
    Super Moderator Inigoesdr
    Join Date
    Mar 2007
    Location
    Florida, USA
    Posts
    3,638
    Thanks
    2
    Thanked 404 Times in 396 Posts
    Quote: Originally posted by DR.Wong
    I've always worked under the assumption that disabling directory indexing will prevent crawlers from being able to index files that aren't linked to in public files. Is this correct?
    Yes, that's correct as far as search engines go. You probably still want to disallow the admin folder in robots.txt, though, because you don't want the contents of the login page showing up in search results.

    Quote: Originally posted by DR.Wong
    If you've disabled directory indexing in either an .htaccess file or httpd.conf, is it really necessary to even consider a robots.txt file?
    It's considered best practice to always have a robots.txt file that excludes any directories you don't want indexed by search engines, even if you disable directory indexing. In your example, "Options -Indexes" in your config only turns off the auto-generated directory listing, so your admin login page can still be indexed and show up in search results if you don't disallow it in robots.txt.
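    As a rough sketch of what that looks like from a well-behaved crawler's side, the snippet below uses Python's standard urllib.robotparser to check URLs against a robots.txt that disallows /admin/. The rules, the example.com URLs, and the allowed helper are assumptions made up for illustration. As the first post notes, compliance is voluntary, so this only models crawlers that choose to obey the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the example.com layout discussed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def allowed(url, user_agent="*"):
    """Return True if a compliant crawler may fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("https://example.com/frontend/index.html"))
print(allowed("https://example.com/admin/login.php"))
```

    With these rules the /frontend URL is permitted and the /admin login page is not, which is what keeps the login page out of search results for crawlers that honor the file.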

  • Users who have thanked Inigoesdr for this post:

    DR.Wong (06-08-2013)

  • #3
    Regular Coder DR.Wong
    Join Date
    Jan 2005
    Posts
    360
    Thanks
    23
    Thanked 1 Time in 1 Post
    Thanks for this!
    -DR.Wong

    Where's the food at?

  • #4
    New to the CF scene
    Join Date
    May 2013
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts
    interesting

