I've read a lot lately about preventing crawlers from indexing sensitive parts of your website's file structure. Almost every article I've read focuses on the use of robots.txt, which I find unhelpful. Listing the directories you don't want indexed in a publicly readable file seems counterintuitive. Those articles are also all qualified by saying that "complying with robots.txt is optional and bad bots will index whatever they want". Not one of them mentions disabling directory indexing.
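To make the concern concrete, a minimal robots.txt along the lines those articles suggest would look something like this (the /admin path is just a placeholder):

```
# robots.txt sits at the site root and is readable by anyone,
# so this line advertises exactly the path you want kept quiet
User-agent: *
Disallow: /admin/
```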
I've always worked under the assumption that disabling directory indexing will prevent crawlers from being able to index files that aren't linked to in public files. Is this correct?
For instance, you may have the following directory structure: example.com/frontend and example.com/admin.
The files in /frontend you don't mind being indexed, since they're the "public face" of your website. The files in /admin, however, you really don't want crawlers to go near. If you've disabled directory indexing in either a .htaccess file or in httpd.conf, is it really necessary to even consider a robots.txt file?
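For clarity, this is the kind of setting I mean, a minimal sketch in Apache syntax (the path is just a placeholder):

```
# httpd.conf version (a .htaccess file inside /admin would contain just the
# single line "Options -Indexes"). With listings disabled, Apache returns
# "403 Forbidden" for the bare directory URL instead of a generated file list.
<Directory "/var/www/example.com/admin">
    Options -Indexes
</Directory>
```

(Whether the .htaccess variant is honoured depends on AllowOverride, of course.)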
As I see it, if a crawler tries to access the directory, it will get a "forbidden" response just like any other user. The only way it could determine what files are in the directory is by finding links to them in publicly accessible pages. Since your admin section is (hopefully) behind a login, the crawler never sees the pages that contain those links.