...

View Full Version : WGET and the OPTIONS Indexes directive



JamesOxford
08-21-2011, 07:07 AM
So I just discovered wget (http://www.gnu.org/s/wget/), and how powerful this tool potentially is. I would like to know how to safeguard against it, if that is at all possible. I am not really sure how it works; I just figured it out, and I am able to recursively download from a couple of my domains. I haven't tested it on my PHP code, just images, so I don't know how the server will actually send the PHP: as PHP code, or as the HTML that the PHP script outputs. Since the transfer happens over HTTP, I think it will just send the HTML markup, but I am not sure.

Will denying Indexes with the Options directive safeguard against wget or do I have to do some more advanced configuration? Help here is appreciated.
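For reference, the kind of recursive download I am talking about is roughly this (example.com just stands in for one of my domains, and the exact flags may vary):

wget -r -np http://example.com/images/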

Inigoesdr
08-21-2011, 10:43 PM
Will denying Indexes with the Options directive safeguard against wget or do I have to do some more advanced configuration? Help here is appreciated.

In general, unless you have an explicit need to list the files, you should disable indexing. Spiders can still crawl your pages to retrieve the images/files you use on them (wget can do this), but if you disable indexes they can't get a list of everything in your folders and follow it recursively. They also can't see the source of your PHP files, because those are parsed by the server when they are requested. An exception would be if you named something .phps, or gave it an extension that is not handled by Apache (like .phpbak, for example).
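To illustrate the idea: only extensions mapped to the PHP handler get parsed, so on an older mod_php setup the mapping looks something like this (the exact directive depends on how PHP is installed), and a .phpbak file would simply be sent back as plain text:

AddHandler application/x-httpd-php .php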

To disable indexes for your site, put this in an .htaccess file in the document root:

Options -Indexes

JamesOxford
08-22-2011, 05:41 AM
Again, thanks for your help. If I disable indexes in an .htaccess file in the root directory, would I be able to override it in a sub-directory or not? There are a couple of places where indexes are convenient.

In directories where I did want indexes, would denying spiders in a robots.txt file and setting a valid-user requirement with basic authentication be sufficient to stop recursive downloads of the entire folder?

Inigoesdr
08-22-2011, 04:07 PM
If I disable indexes in an .htaccess file in the root directory, would I be able to override it in a sub-directory or not?
Yep.
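For example, you could drop something like this in an .htaccess file in the sub-directory you do want listed (this assumes AllowOverride permits the Options directive there):

Options +Indexes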

In directories where I did want indexes, would denying spiders in a robots.txt file and setting a valid-user requirement with basic authentication be sufficient to stop recursive downloads of the entire folder?
No, not really. robots.txt is more of a suggestion and only well-behaved spiders will follow it. You may just end up making it easier for people to find the directories you don't want indexed... so they can index them. That is, if you are worried about bad robots to begin with.
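For example, a typical robots.txt entry looks like this (the /private/ path is just a placeholder); it asks well-behaved crawlers to stay out, but it also tells anyone who reads the file that the directory exists:

User-agent: *
Disallow: /private/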

JamesOxford
08-22-2011, 08:24 PM
At this point it is more of a hypothetical than a true concern. The basic authentication won't stop them? Won't they get a 404 redirect instead of a 200 OK if they tried to access the directory without authenticating?

Inigoesdr
08-22-2011, 09:01 PM
The basic authentication won't stop them? Won't they get a 404 redirect instead of a 200 OK if they tried to access the directory without authenticating?

Whoops, I didn't see that you were adding authentication. That should be sufficient to block recursive indexing. They will get a 403 if they can't authenticate.
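A minimal sketch of that kind of setup in the directory's .htaccess, assuming your password file lives at /path/to/.htpasswd (adjust the path and the realm name to your own setup):

AuthType Basic
AuthName "Restricted"
AuthUserFile /path/to/.htpasswd
Require valid-user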

JamesOxford
08-22-2011, 10:07 PM
404 redirect instead of a 200 OK if they tried to access the directory without authenticating?

I meant 403 :).

Thanks again for all your help.

BTW, how do I add the user I am quoting when I wrap text in {QUOTE}?

Inigoesdr
08-23-2011, 05:54 AM
BTW, how do I add the user I am quoting when I wrap text in {QUOTE}?

The easiest way is to hit the quote button at the bottom of the post, but you can use this format too:

{QUOTE=Username}some text{/QUOTE}


