07-23-2009, 12:07 AM | #1
ClancyCat
Help with robots.txt file issue

Hi:

I have a site that has both categories and geographic areas, and provides sorting and filtering by either. This results in some URLs that look like this:

Code:
www.mysite.com/category-1-0/Category Name.html
where "1" = the category number and "0" = ALL geographic areas. This also results in a boatload of virtual URLs (for all 11 areas) that look like this:

Code:
www.mysite.com/category-1-1/Category Name.html
www.mysite.com/category-1-2/Category Name.html
www.mysite.com/category-1-3/Category Name.html
and so forth. The end result is 1500+ URLs I don't want crawled. I've tried multiple disallow schemes, none of which seem to be working. My latest looks like this:

Code:
Disallow: /category-*-1/* 
Disallow: /category-*-2/* 
Disallow: /category-*-3/* 
Disallow: /category-*-4/* 
Disallow: /category-*-5/* 
Disallow: /category-*-6/* 
Disallow: /category-*-7/* 
Disallow: /category-*-8/* 
Disallow: /category-*-9/* 
Disallow: /category-*-10/* 
Disallow: /category-*-11/* 
Disallow: /category-*-12/* 
Disallow: /category-*-13/* 
Disallow: /category-*-14/*
And it is simply NOT working. I've been discussing this with webado2 over at Google's GSoftCrawler discussion group (as part of tweaking my sitemap), but apparently she's also out of ideas as to why this is not working for Googlebot. I know that wildcards may not be accepted by some bots, but I have to get this under control at least partially. Interestingly, this:

Code:
www.mysite/category-15-11/real-estate-and-property/land-for-sale/offer_wanted-all.html

with a disallow of this:

/*/*/*/offer_wanted-all.html
IS working, so it's not my robots.txt file in general that's ferschplutzed.
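
For anyone who wants to check rules like these offline, here is a minimal Python sketch. It assumes the wildcard semantics Google documents for Googlebot (`*` matches any run of characters, a trailing `$` anchors the end of the path, and everything else is a prefix match). The helper names `rule_to_regex` and `is_blocked` are made up for illustration, the sample paths assume the space in "Category Name" is requested as `%20`, and this is not how any real crawler is implemented. Python's built-in `urllib.robotparser` is not used here because, as far as I know, it only does plain prefix matching.

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Turn a robots.txt path pattern into a regex (assumed Google-style rules)."""
    pattern = pattern.strip()          # ignore stray spaces around the rule
    anchored = pattern.endswith("$")   # trailing '$' means "path must end here"
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path: str, rules: list[str]) -> bool:
    """True if any Disallow pattern matches from the start of the path."""
    return any(rule_to_regex(rule).match(path) for rule in rules)

# The Disallow patterns from the post: areas 1 through 14, trailing space included.
area_rules = [f"/category-*-{n}/* " for n in range(1, 15)]
offer_rule = ["/*/*/*/offer_wanted-all.html"]

print(is_blocked("/category-1-1/Category%20Name.html", area_rules))  # True
print(is_blocked("/category-1-0/Category%20Name.html", area_rules))  # False
print(is_blocked(
    "/category-15-11/real-estate-and-property/land-for-sale/offer_wanted-all.html",
    offer_rule,
))  # True
```

Under these assumed semantics the area rules do match the per-area URLs and leave the "all areas" (`-0`) URL alone, so a check like this only confirms what the patterns are meant to do; it does not by itself explain why a given bot ignores them.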

Does ANYONE have any suggestions as to what might work in this situation? (I didn't write the original PHP code, so please don't hose me over the naming conventions, thanks.) I'd appreciate any help. I've perused myriad articles on this but cannot seem to sort out what I'm not getting right.

Thanks,

ClancyCat

Tags
bots, crawlers, disallow, robots.txt, wildcards
