  • #1
    Rowsdower! (Senior Coder)
    Join Date: Oct 2008
    Location: Some say it's everything.
    Posts: 2,027
    Thanks: 5
    Thanked 397 Times in 390 Posts

    Rewrite robots.txt with robots.php, but block direct access to robots.php itself?

    I have the following as part of my .htaccess file:

    Code:
    RewriteRule ^(.*)robots\.txt$ $1/robots.php [L]
    This lets me do some PHP work with page metrics before actually sending the robots.txt directives, and it makes the presentation of robots.txt seamless, as expected.

    However, today Google threw me a curve ball by requesting robots.php out of the blue. I don't want them (or anyone else) to do that so I need to block direct access to robots.php, issuing a 404. To do that, I tried the following:

    Code:
    RedirectMatch 404 ^(.*)robots\.php$
    And it worked...too well. Now my robots.txt returns my 404 page as well.

    I presume this is a direct result of the rewrite rule: the rewritten request for robots.php apparently goes through the same processing as a direct GET request for that page (which I had not intuited).

    So is there a way to simultaneously serve robots.php when robots.txt is requested AND issue a 404 error when robots.php is directly requested?
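    A minimal sketch of one possible approach, assuming the server runs Apache 2.4 (the END flag does not exist in 2.2) and that robots.txt and robots.php both live at the document root. END is documented to stop not only the current round of rewriting (like L) but also any later per-directory (.htaccess) round, which is the second pass that makes a block on robots.php also catch robots.txt:

    Code:
    RewriteEngine On

    # Serve robots.php for robots.txt. END stops this round AND any later
    # .htaccess round, so the 404 rule below never sees the rewritten URL.
    RewriteRule ^robots\.txt$ /robots.php [END]

    # Only reached when a client asks for robots.php directly.
    # R with a non-3xx code drops the substitution ("-") and sends that status.
    RewriteRule ^robots\.php$ - [R=404,L]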
    The object of opening the mind, as of opening the mouth, is to shut it again on something solid. –G.K. Chesterton
    See Mediocrity in its Infancy
    It's usually a good idea to start out with this at the VERY TOP of your CSS: * {border:0;margin:0;padding:0;}
    Seek and you shall find... basically:
    validate your markup | view your page cross-browser/cross-platform | free web tutorials | free hosting

  • #2
    UPDATE: I got a bit closer with this:

    Code:
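    # Only match the client's original request: REDIRECT_STATUS is empty there,
    # but is set on an internal redirect such as the robots.txt rewrite
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    
    # Forbid direct access to robots.php (F sends a 403 and ignores the substitution, hence "-")
    RewriteRule ^(.*)robots\.php$ - [F,L]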
    I narrowed that down from an answer to a related question on Stack Overflow. This gets me to the point where robots.txt works as desired, and robots.php at least returns a 403 Forbidden error.

    Here is everything in my .htaccess that is relevant to this issue:
    Code:
    RewriteEngine On
    # serve robots.php as robots.txt
    RewriteRule ^(.*)robots\.txt$ $1/robots.php [L]
    
    # and block robots.php (google got smart with me and tried it)
    #RedirectMatch 404 ^(.*)robots.php$
    
    # Only match the client's original request: REDIRECT_STATUS is empty there,
    # but is set on an internal redirect such as the robots.txt rewrite above
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    
    # Forbid direct access to robots.php (F sends a 403 and ignores the substitution, hence "-")
    RewriteRule ^(.*)robots\.php$ - [F,L]
    Just have to figure out how to get that to be a 404 without breaking robots.txt access again and I'm set. But I'm not very good with .htaccess and I'm out of ideas for today. I'll pick this up again in the morning if nobody else has responded with a fix by then.

    If you have any ideas please chime in.
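    A minimal sketch of one way to turn that 403 into a 404 while keeping the same rules and order, assuming Apache 2.4: the R flag accepts a non-3xx status such as 404, in which case the substitution is dropped and rewriting stops as if L were set. Only the last rule differs from the .htaccess above:

    Code:
    RewriteEngine On

    # Serve robots.php as robots.txt; L ends this pass so the rule below
    # does not see the rewritten URL on the same pass.
    RewriteRule ^(.*)robots\.txt$ $1/robots.php [L]

    # REDIRECT_STATUS is empty only on the client's original request;
    # after the internal rewrite above it is set, so this rule is skipped.
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    # "-" means no substitution; R=404 answers with a 404 instead of redirecting.
    RewriteRule ^(.*)robots\.php$ - [R=404,L]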
    Last edited by Rowsdower!; 11-27-2012 at 11:11 PM.

  • #3
    OK, final update...

    After a little sleep things always look better. I think this finally solves it:

    Code:
    RewriteEngine On
    
    # serve robots.php as robots.txt
    	RewriteRule ^(.*)robots\.txt$ $1/robots.php [L]
    
    #NOTE: The "L" flag tells Apache to stop processing the remaining rules on this pass.
    #It MUST be here: without it, the robots.php rule below would see the just-rewritten URL on this same pass and return a 404 for "robots.txt". In .htaccess the rewritten request is then run through these rules a second time, and the REDIRECT_STATUS condition below is what lets it through on that pass.
    
    # and block direct access to robots.php (google got smart with me and tried it)
    	# Only match the client's original request: REDIRECT_STATUS is empty there,
    	# but is set on an internal redirect such as the robots.txt rewrite above
    	RewriteCond %{ENV:REDIRECT_STATUS} ^$
    
    	# Send 404 upon attempted access to robots.php: this rewrites to a path named "404",
    	# which (assuming no such file exists and no other rule handles it) Apache answers with a 404
    	RewriteRule ^(.*)robots\.php$ 404
    I posted the much more verbose version of my situation here, if it helps anyone:
    http://www.rowsdower.org/other/?page...le_in_htaccess
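    An alternative sketch, not the approach used above: instead of REDIRECT_STATUS, match on %{THE_REQUEST}, which holds the raw request line the client sent (for example GET /robots.php HTTP/1.1) and is not changed by internal rewrites. The regex assumes robots.php sits at the document root, and R=404 assumes Apache 2.4:

    Code:
    RewriteEngine On

    # Serve robots.php when robots.txt is requested.
    RewriteRule ^robots\.txt$ /robots.php [L]

    # THE_REQUEST is the original request line, so after the internal rewrite
    # it still reads "GET /robots.txt ...", and this condition fails.
    # It only matches when the client literally asked for /robots.php.
    RewriteCond %{THE_REQUEST} \s/+robots\.php[\s?] [NC]
    RewriteRule ^robots\.php$ - [R=404,L]

    The trade-off: this keys on what the client asked for rather than on an environment variable set by the internal redirect, so it does not depend on how many rewrite passes happen.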

