
robots.txt - should crawl but not index



abduraooft
10-10-2009, 11:40 AM
Hi all,

I have a search page with a URL like http://mysite.com/search/ to search a set of dynamically created profiles.

I use the same page to display the search results, with pagination. So when someone hits the submit button on this search page (without selecting any options), they reach http://mysite.com/search/page/1/?from=0&to=0 (and they can use the pagination links to get to the other pages too).

I've also added a "browse" link in the footer, which randomly shows different links like
http://mysite.com/search/page/10/?from=0&to=0
http://mysite.com/search/page/20/?from=0&to=0 etc. (where from and to correspond to the drop-downs for selecting the age range)

Each of these search pages has links to various profile pages like
http://mysite.com/profile/1234
http://mysite.com/profile/2143 etc.

My question is: how can I direct search engines to crawl all these search/browse pages without indexing them? I only need the profile pages indexed. (Blocking the search page with "Disallow" and providing a sitemap is not as effective as allowing all the search pages.)

I believe I can use a meta tag like
<meta name="robots" content="noindex"/>, but would that be effective? Or is there any way to write a rule in the robots.txt file to achieve the goal?

Fisher
10-10-2009, 03:51 PM
I don't have an exact answer for you, but it sounds a lot like preventing session IDs from being spidered. The noindex tag should be sufficient.
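
To illustrate why a robots.txt "Disallow" rule does not fit this case, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt rules and the "ExampleBot" user agent are assumptions made up for illustration; the URLs are the ones from the question. It only models what a robots.txt-compliant crawler is allowed to fetch, not how any particular search engine indexes pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the search pages
# (an assumption for illustration, not the poster's actual file).
BLOCKING_RULES = """\
User-agent: *
Disallow: /search/
"""


def can_crawl(rules: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if a robots.txt-compliant crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


search_url = "http://mysite.com/search/page/10/?from=0&to=0"
profile_url = "http://mysite.com/profile/1234"

print(can_crawl(BLOCKING_RULES, search_url))   # False: search page may not be fetched
print(can_crawl(BLOCKING_RULES, profile_url))  # True: profile page is not covered by the rule
print(can_crawl("", search_url))               # True: no rules, so the search page is crawlable
```

With the Disallow rule in place, a compliant crawler never fetches the search pages, so it cannot follow their links to the profiles and would never see a noindex meta tag on them either. Leaving the pages crawlable and marking them noindex keeps the links discoverable.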

You could look into canonical (http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html) and nofollow (http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=96569) links. Canonical links should weed out the duplicate content. From the Google canonical link:
"If your site has identical or vastly similar content that's accessible through multiple URLs, this format provides you with more control over the URL returned in search results."

abduraooft
10-11-2009, 03:31 PM
Thanks, I've added
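<?php
// Emit these tags only on search result pages.
if ($page == 'search' && isset($query_count)) {
    // Keep result pages out of the index; links on them are still followed by default.
    echo '<meta name="robots" content="noindex"/>' . "\n";
    echo '<link rel="canonical" href="http://mysite.com/search/" />' . "\n";
}
?> and waiting ..... :)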

shakir
10-12-2009, 04:16 PM
I think robots.txt and the meta tag do the same thing: hide pages from Googlebot. But if a page is not indexed, there is no chance of it being crawled. So the two are related, i.e. crawled pages get indexed. Correct me if I'm wrong.

realistic
10-13-2009, 01:56 PM
Hi,

As an SEO beginner, I read this discussion and noted down the difficulties and solutions regarding robots.txt for future reference.


