...

web crawlers with dynamic content



kwhubby
10-08-2003, 08:34 AM
How do the web crawlers of today's main search engines handle dynamically generated content? I have a page where the navigation is generated with JavaScript, and I am wondering: do the web crawlers actually pull URLs out of scripts, or only out of links in the actual body?

Kor
10-08-2003, 09:13 AM
Originally posted by kwhubby
How do the web crawlers of the main search engines around today act with dynamically generated content


...they don't, as far as I know. If anyone knows otherwise, I'd be glad to hear it; I'm also interested in the subject.

kwhubby
10-08-2003, 09:17 AM
If they don't pull URLs out of scripts, would it work to also put those URLs as links in a hidden div so the search engine crawls them?

Kor
10-08-2003, 09:53 AM
I don't know... As far as I know, search engines look for words in the text and ignore words in code, so... Maybe having a page generated dynamically by JavaScript is somehow the same as having a CGI-generated page... I mean a virtual page, from the crawler's point of view... I don't know much...

BrainJar
10-08-2003, 05:11 PM
Crawlers will look at what the web server returns. In the case of CGI or any server-side scripting like ASP, the web server executes the code and outputs an HTML page. A crawler will see that output, not the actual code that generated it. In other words, it sees what you see when you load a page in a browser and select "View Source."
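To illustrate the server-side case, here is a minimal sketch in Python (the language is an editorial choice; nothing in the thread uses it). It assumes a hypothetical WSGI app whose navigation list `NAV_LINKS` lives on the server. Calling the app directly returns the same HTML a crawler would download: the links arrive already rendered as ordinary `<a href>` tags, with no trace of the code that built them.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical navigation data; a real site might load this from a database.
NAV_LINKS = [("/index.html", "Home"), ("/products.html", "Products"), ("/contact.html", "Contact")]


def app(environ, start_response):
    """Minimal WSGI app that renders the navigation on the server."""
    items = "".join(f'<li><a href="{url}">{label}</a></li>' for url, label in NAV_LINKS)
    body = f"<html><body><ul>{items}</ul></body></html>".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def fetch_like_a_crawler(wsgi_app):
    """Call the app directly and return the status and HTML a client would receive."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal, valid WSGI environment
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], body.decode("utf-8")


status, html = fetch_like_a_crawler(app)
print(status)
print(html)  # plain HTML with <a href> tags; the Python code is not visible
```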

In the case of client-side scripts, the crawler will only see the raw code embedded in the HTML. I don't know of any that can actually execute the code as the browser does. In fact, that would be rather difficult, since code often runs in response to user-driven events like clicking on links.
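The client-side case can be checked the same way. This sketch assumes a hypothetical page whose menu is built with `document.createElement` and feeds it to Python's standard `html.parser`, which, like the crawlers described above, never runs JavaScript. The parser encounters no `<a>` elements at all; the page paths exist only as raw text inside the `<script>` element.

```python
from html.parser import HTMLParser

# Hypothetical page: the menu is created entirely by client-side script.
PAGE = """<html><body>
<div id="nav"></div>
<script type="text/javascript">
  var nav = document.getElementById("nav");
  var pages = ["/index.html", "/products.html", "/contact.html"];
  for (var i = 0; i < pages.length; i++) {
    var a = document.createElement("a");
    a.href = pages[i];
    a.appendChild(document.createTextNode(pages[i]));
    nav.appendChild(a);
  }
</script>
</body></html>"""


class NonExecutingView(HTMLParser):
    """Records what a parser that never runs scripts sees in the page."""

    def __init__(self):
        super().__init__()
        self.start_tags = []   # every element the parser actually encounters
        self.script_text = []  # raw, unexecuted script source
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.script_text.append(data)


view = NonExecutingView()
view.feed(PAGE)
view.close()
print(view.start_tags)         # ['html', 'body', 'div', 'script']
print("a" in view.start_tags)  # False: no links exist until a browser runs the script
```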

A crawler could conceivably scan the JavaScript code and look for strings that appear to be URLs, but I imagine most don't. As an example, I might have plain text on a page like "http://www.abc.net/test.html", which is a URL but is not an actual link. A search engine crawler should ignore that. Instead, it should only look at URLs within the href attribute of A tags.
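The distinction between a URL that merely appears in text and a real link can be sketched as two extractors. `href_links` only collects `href` values from `<a>` tags, as suggested above; `script_url_guesses` is a crude, hypothetical version of the script-scanning heuristic a crawler could conceivably run. The sample page and both regular expressions are assumptions for illustration, and a real crawler would be far more careful about what it treats as a URL.

```python
import re
from html.parser import HTMLParser

# Hypothetical page: a plain-text URL, one real link, and a path inside script.
PAGE = """<html><body>
<p>Our old site was at http://www.abc.net/test.html</p>
<a href="/about.html">About us</a>
<script type="text/javascript">
  var target = "/products.html";
</script>
</body></html>"""


class HrefExtractor(HTMLParser):
    """Collects href values from <a> tags only; plain text and scripts are ignored."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)


def href_links(html):
    parser = HrefExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical heuristic: quoted strings in script code that look like .htm/.html pages.
URL_LIKE = re.compile(r"""["']((?:https?://|/)[^"'\s]+\.html?)["']""")


def script_url_guesses(html):
    # Crude regex extraction of <script> bodies; good enough for this sketch only.
    scripts = re.findall(r"<script\b[^>]*>(.*?)</script>", html, re.S | re.I)
    return [m for s in scripts for m in URL_LIKE.findall(s)]


print(href_links(PAGE))          # ['/about.html']
print(script_url_guesses(PAGE))  # ['/products.html']
```

The plain-text address in the paragraph is ignored by both functions, while the path in the script is found only by the heuristic scan.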

If you have links generated in client-side script (that don't appear elsewhere on the page in static HTML), you can place those same links inside a <noscript></noscript> tag pair to make them visible to crawlers. That has the added advantage of displaying the links to any users who have scripting turned off in their browsers.
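A sketch of the `<noscript>` fallback, assuming the menu is built by a hypothetical `buildMenu()` script function. The same three links are repeated in static HTML inside `<noscript>`, so a parser that does not run scripts (here Python's `html.parser` with its default settings) finds them as ordinary `<a href>` tags.

```python
from html.parser import HTMLParser

# Hypothetical page: script builds the menu, <noscript> repeats the links
# for crawlers and for visitors with scripting turned off.
PAGE = """<html><body>
<div id="nav"></div>
<script type="text/javascript">
  buildMenu("nav");  // hypothetical client-side menu builder, never run here
</script>
<noscript>
  <a href="/index.html">Home</a>
  <a href="/products.html">Products</a>
  <a href="/contact.html">Contact</a>
</noscript>
</body></html>"""


class HrefExtractor(HTMLParser):
    """Collects href values from <a> tags; script code is never executed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)


parser = HrefExtractor()
parser.feed(PAGE)
parser.close()
print(parser.links)  # ['/index.html', '/products.html', '/contact.html']
```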

oracleguy
10-08-2003, 05:28 PM
Originally posted by BrainJar
If you have links generated in client-side script (that don't appear elsewhere on the page in static HTML), you can place those same links inside a <noscript></noscript> tag pair to make them visible to crawlers. That has the added advantage of displaying the links to any users with scripting turned off on their browser.

That is a good solution.

kwhubby
10-08-2003, 09:09 PM
Ah, thank you! That's a good idea. Now, what about frames? I know some crawlers don't work with them at all and some do, but the question is which ones do.


