Allow robots to access a page, but not visitors

07-20-2012, 06:17 PM
I'm wondering if there is a method I can use to allow robots/crawlers to visit a page, but forward any visitors that land there (through a search engine or by accident) back to the home page.

I'm making a WordPress custom post type that's only meant to be viewed in a lightbox. The layout, while semantically correct for search engines, looks really bad outside of the lightbox, so while I want the site to be scanned and indexed by search crawlers, I want users to be forwarded to the home page, where they can view the items in the lightboxes as intended.

Anyone have any idea how I can do this? I also understand if it's not a PHP thing but something I can achieve through a robots.txt file, .htaccess, or something else, and I'd appreciate advice in that direction.

07-21-2012, 02:14 AM
Your question is not as straightforward as I first thought it was :) Everything I have thought of so far requires a lot of ifs, buts and maybes... Cookies seem the ideal answer, but I don't know how all robots deal with cookies and JavaScript.

e.g. check for a cookie value on that page and, if it does not exist, redirect to the front page. But you would have to do this via JavaScript rather than PHP, or else the robot (which has no cookie either) will be redirected before it ever sees the page.

you could of course check all the possible robot user-agents (http://www.robotstxt.org/dbexport.html) ... not sure I really want to check all that on every page load though :)
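For what it's worth, here is a rough sketch of that user-agent check. It's written in Python just to show the logic, but the same substring test ports to PHP easily. The token list is a short, incomplete sample I picked for illustration (it is not the robotstxt.org database), and `looks_like_crawler`, `redirect_target` and `HOME_URL` are made-up names. Keep in mind the User-Agent header can be spoofed, so this is a guess, not proof.

```python
# Illustrative, incomplete sample of substrings that appear in the
# User-Agent headers of some well-known crawlers. This list is an
# assumption for the sketch, not the robotstxt.org database.
CRAWLER_TOKENS = (
    "googlebot",
    "bingbot",
    "slurp",
    "duckduckbot",
    "baiduspider",
    "yandexbot",
)

HOME_URL = "/"  # hypothetical redirect target for human visitors


def looks_like_crawler(user_agent):
    """Return True if the User-Agent header contains a known crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def redirect_target(user_agent):
    """Return the URL to redirect to, or None to serve the page as-is."""
    return None if looks_like_crawler(user_agent) else HOME_URL
```

A request with a missing or unrecognised User-Agent gets treated as a human visitor and is sent to the home page, so any crawler not on the list would never see the page.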

07-21-2012, 03:37 PM
Yeah, it's the same logic issue I've run into... how to determine what is a robot and what is a user...

07-23-2012, 04:05 PM
As we're back from the weekend, I gave this another look, and still nothing. I tried to see if there were some settings I could apply for robots via a meta tag or robots.txt, but came up with nothing. I'm guessing there's no 'standard' list of robots I could check requests against or something?

07-23-2012, 11:59 PM
Some robots masquerade as browsers in order to bypass checks that block robots.

Some people have their browser set so it pretends to be a robot so they can see pages the way a search engine does.

The second of these is a lot less likely than the first, but basically there is no real way to tell the difference without applying some form of CAPTCHA (not necessarily a graphic one, but something that distinguishes between what a real person is likely to do and how a robot would behave).

07-24-2012, 12:08 AM
Hm... very good points... time to tell the client we can't do it his way!

07-24-2012, 01:56 AM
you could of course check all the possible robot user-agents (http://www.robotstxt.org/dbexport.html) ... not sure I really want to check all that on every page load though :)
Once checked, you could just write the result to the session, so the full check doesn't have to run on every page load.
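Something like the following is what I mean by caching the result, sketched in Python with a plain dict standing in for the session (in PHP that would be `$_SESSION`). The function name `is_crawler_cached`, the `"is_crawler"` key and the `classify` callback are all made up for illustration; `classify` is whatever user-agent check you end up using.

```python
def is_crawler_cached(session, user_agent, classify):
    """Run classify(user_agent) at most once per session and reuse the result.

    session  -- a dict-like session store (stand-in for PHP's $_SESSION)
    classify -- hypothetical callback that returns a truthy value for crawlers
    """
    if "is_crawler" not in session:
        # First request in this session: do the expensive check and store it.
        session["is_crawler"] = bool(classify(user_agent))
    return session["is_crawler"]


# Usage: the expensive check runs on the first call only.
calls = []


def slow_check(ua):
    calls.append(ua)  # record each time the expensive check really runs
    return "bot" in ua.lower()


session = {}
first = is_crawler_cached(session, "Googlebot/2.1", slow_check)
second = is_crawler_cached(session, "Googlebot/2.1", slow_check)
```

One caveat: this only saves work for clients that send the session cookie back. A client that doesn't would start a fresh session each time and get re-checked on every request.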