Old 11-27-2012, 04:39 PM   PM User | #1
Rowsdower!
Senior Coder
Join Date: Oct 2008
Location: Some say it's everything.
Posts: 2,007
Thanks: 5
Thanked 395 Times in 388 Posts
Rewrite robots.txt with robots.php, but block direct access to robots.php itself?

I have the following as part of my .htaccess file:

Code:
RewriteRule ^(.*)robots.txt$ $1/robots.php [L]
This allows me to do some PHP stuff with page metrics before actually sending the robots.txt directives, and it makes the presentation of robots.txt seamless, as expected.

However, today Google threw me a curve ball by requesting robots.php out of the blue. I don't want them (or anyone else) to do that so I need to block direct access to robots.php, issuing a 404. To do that, I tried the following:

Code:
RedirectMatch 404 ^(.*)robots.php$
And it worked...too well. Now my robots.txt returns my 404 page as well.

I presume this is a direct result of the rewrite rule: once robots.txt has been rewritten to robots.php, the rewritten request must be matched by RedirectMatch just the same as a direct GET request for robots.php would be (which I had not intuited).

So is there a way to simultaneously serve robots.php when robots.txt is requested AND issue a 404 error when robots.php is directly requested?
__________________
The object of opening the mind, as of opening the mouth, is to shut it again on something solid. –G.K. Chesterton
See Mediocrity in its Infancy
It's usually a good idea to start out with this at the VERY TOP of your CSS: * {border:0;margin:0;padding:0;}
Seek and you shall find... basically:
validate your markup | view your page cross-browser/cross-platform | free web tutorials | free hosting
Old 11-27-2012, 11:08 PM   PM User | #2
Rowsdower!
UPDATE: I got a bit closer with this:

Code:
# Condition prevents redirect loops (when script is found)
RewriteCond %{ENV:REDIRECT_STATUS} ^$

# Forbid direct access to PHP extension
RewriteRule ^(.*)robots.php$ forbidden [F,L]
I narrowed that down from an answer to a related question on Stack Overflow. With this in place, robots.txt works as desired, and a direct request for robots.php at least gets a 403 Forbidden error.

This is now the full .htaccess that is relevant to this issue:
Code:
RewriteEngine On
# serve robots.php as robots.txt
RewriteRule ^(.*)robots.txt$ $1/robots.php [L]

# and block robots.php (google got smart with me and tried it)
#RedirectMatch 404 ^(.*)robots.php$

# Condition prevents redirect loops (when script is found)
RewriteCond %{ENV:REDIRECT_STATUS} ^$

# Forbid direct access to PHP extension
RewriteRule ^(.*)robots.php$ forbidden [F,L]
Just have to figure out how to get that to be a 404 without breaking robots.txt access again and I'm set. But I'm not very good with .htaccess and I'm out of ideas for today. I'll pick this up again in the morning if nobody else has responded with a fix by then.

If you have any ideas please chime in.
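
For the 403-to-404 step, one option worth trying is mod_rewrite's R flag with a non-redirect status code. This is only a minimal sketch, assuming Apache 2.2 or later and the same .htaccess as above. Per the mod_rewrite documentation, a status outside the 3xx range drops the substitution and stops rewriting as if L had been given. The dots are escaped here so they match a literal "." only:

```apache
RewriteEngine On

# serve robots.php as robots.txt
RewriteRule ^(.*)robots\.txt$ $1/robots.php [L]

# Only act on the original client request; REDIRECT_STATUS is set
# once the robots.txt rule above has rewritten the request internally
RewriteCond %{ENV:REDIRECT_STATUS} ^$

# "-" means no substitution; R=404 sets the response status to 404
RewriteRule ^(.*)robots\.php$ - [R=404,L]
```

With this, a direct request for robots.php should get the site's normal 404 response while robots.txt keeps going through robots.php. It has not been tested against this particular server, so both URLs are worth checking after the change.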

Last edited by Rowsdower!; 11-27-2012 at 11:11 PM..
Old 11-28-2012, 04:38 PM   PM User | #3
Rowsdower!
OK, final update...

After a little sleep things always look better. I think this finally solves it:

Code:
RewriteEngine On

# serve robots.php as robots.txt
	RewriteRule ^(.*)robots\.txt$ $1/robots.php [L]

# NOTE: The "L" flag tells Apache to stop processing further rules in the current pass.
# It MUST be here: without it, the block for robots.php that is set up below would
# also fire on the rewritten request, and "robots.txt" would return a 404.

# and block direct access to robots.php (Google got smart with me and tried it)
	# REDIRECT_STATUS is empty only on the original request, so this condition
	# skips the block (and prevents loops) once robots.txt has been rewritten internally
	RewriteCond %{ENV:REDIRECT_STATUS} ^$

	# Send 404 upon attempted direct access to robots.php.
	# "404" here is a rewrite target, not a status flag: as long as no file
	# with that name exists, Apache answers with a 404.
	RewriteRule ^(.*)robots\.php$ 404
I posted the much more verbose version of my situation here, if it helps anyone:
http://www.rowsdower.org/other/?page...le_in_htaccess
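
A possible alternative to the REDIRECT_STATUS check is to test `%{THE_REQUEST}`, which holds the request line exactly as the client sent it and is not changed by internal rewrites. The following is a sketch rather than a tested configuration. It assumes the .htaccess sits in the document root, and it uses a stricter pattern than the rules above (`(.*/)?` only matches a file named exactly robots.txt or robots.php, optionally inside a subdirectory):

```apache
RewriteEngine On

# Answer 404 when the client itself asked for robots.php.
# THE_REQUEST looks like "GET /robots.php HTTP/1.1" and still shows
# robots.txt after that request has been rewritten internally.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s+/(.*/)?robots\.php[?\s]
RewriteRule ^ - [R=404,L]

# Serve robots.php when robots.txt is requested.
RewriteRule ^(.*/)?robots\.txt$ /$1robots.php [L]
```

Because the condition looks at the original request line, the order of the two rules no longer matters for correctness. One caveat: THE_REQUEST is the raw, undecoded request line, so a percent-encoded variant of the file name would not be caught by this pattern.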