  1. #1
    Regular Coder
    Join Date
    Oct 2002
    Posts
    299

    Robots.txt (no dir) - Meta (index,follow) - *.hta (password)

    Hi,

    I have some questions about search engines and site-copying software.

    Example.

    Suppose I have an index.htm file, a robots.txt file and two directories (dir1 and dir2) in the root directory, like this:

    index.htm
    robots.txt
    dir1
    dir2

    Q1.
    I want search engines to index and follow all files in dir1, but not dir2!

    Am I right to think I should do this:

    index.htm + <meta name="robots" content="index,follow">
    robots.txt + disallow dir2

    Or am I missing something?

    Example.

    Suppose I protect all files in dir2 using *.hta (password).

    Q2.
    Can I include these files in index.htm without being asked for a password? For example:

    In dir2 I have world.gif

    In index.htm I have <img src="dir2/world.gif">

    Or will the user be asked for a password? How does it work: from within the URL?

    If the answer to Q2 is yes:

    Q3.
    I have no *.hta file on the server. Can I make one myself and put it in the root directory?

    Q4.
    Does the *.hta file need to be named .hta (so [0.3])?
    The rest of my .hta questions are answered in wsa's perfect tutorial.

    Q5.
    I know that after a visit all of a site's files are on the visitor's hard disk anyway, and I am not trying to protect anything.
    However, I do not like site-copying software grabbing everything in one go. Can I prevent this? I cannot do anything server-side!


    Thanks for your effort / ideas,
    Jerome

  • #2
    New to the CF scene
    Join Date
    May 2003
    Location
    Mexico
    Posts
    2

    Supposedly...

    Supposedly the robots.txt file permits or prevents robots from searching subdirectories that you do not want them in.
    Don't count on it.
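
    To show what a well-behaved robot would make of such a file, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the crawler name "ExampleBot" and the file dir1/page.htm are assumptions made up for the dir1/dir2 layout in the question. It only shows how the rules are read; a copy program that ignores robots.txt is not stopped by any of this.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the site root: keep every robot out of dir2,
# leave everything else (index.htm, dir1) open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dir2/
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up crawler name; the "*" record applies to it.
print(parser.can_fetch("ExampleBot", "/dir1/page.htm"))   # allowed
print(parser.can_fetch("ExampleBot", "/dir2/world.gif"))  # disallowed
```

    The file is a request, not a lock: it tells polite crawlers what to skip and nothing more.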
    However, if you want to keep robots and unauthorized people OUT of a folder, you can place an .htaccess file in that particular folder (what you called *.hta).
    However, that name won't work...
    If you are serving your site from your own computer and you are on Windows, that is only a minor problem: you can load up Apache and, in the config file, rename .htaccess to whatever you want. That works. If you are posting to a hosted web site, you have to call it .htaccess.
    Place one with restrictions in each folder where you want limited or no access,
    and in each folder that is open to the public, just don't put one.

    c:\program files\apache group\apache2\htdocs\
    no .htaccess
    yes robots.txt
    yes index.html etc. etc.
    yes login.html to grant access to dir2 after password and
    user name confirmed
    c:\program files\apache group\apache2\htdocs\dir1
    no .htaccess
    no robots.txt (robots only read the one in the site root)
    yes index.html etc. etc.

    c:\program files\apache group\apache2\htdocs\dir2 (locked)
    yes .htaccess needed to lock folder
    no robots.txt not needed
    yes index.html etc. etc.
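
    As a rough sketch of what the .htaccess in dir2 could contain, assuming Apache with Basic authentication and a host that allows auth directives in .htaccess (AllowOverride AuthConfig). The realm name and the password file path are placeholders, not values from this thread:

```apache
# .htaccess inside dir2 (placeholder realm name and path)
AuthType Basic
AuthName "Restricted area"
# The password file should sit outside the web-served folders.
AuthUserFile /full/path/to/.htpasswd
Require valid-user
```

    On your own Apache install, the directive that renames the per-folder file is AccessFileName in the main config; on a hosted site you normally cannot change it, which is why the file has to be called .htaccess there.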

