How to get the links in a responseText/responseXML?
Hi!
How can I get at the links (as with document.links[0] ...) in content I retrieved through an XMLHttpRequest? I get the error: MyThis.tempObj has no properties. Code:
<html><head>
Greetings, Dieter |
The page you're requesting is not an XML page. It's only natural that responseXML is not available for it.
|
Quote:
Code:
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
That didn't work (probably because I had a syntax error). I got a long, rather ugly error message. Now I have found this example:
Code:
var MyresponseXML = new DOMParser().parseFromString(e.target.responseText, 'text/xml');
Do you think that looks right? I will try it now, of course.
Greetings, Dieter |
Well, that didn't work.
The first error message was "nicht wohlgeformt" (literally "not well-formed" in English). A second error message then followed, of course, saying "MyThis.tempObj.links has no properties". Instead of text/xml I also tried text/html, but that threw an "NS_ERROR_NOT_IMPLEMENTED" exception in the parseFromString function. Could I use setMimeHeader? I read that it only works in IE, though. (I'm using Firefox.)
Thank you already for all your help!
Greetings, Dieter
P.S. I actually started out by extracting the links with regex functions from the source of a loaded iframe, but everyone suggested just using XMLHttpRequest instead. Firefox really doesn't seem to have a simple function for getting the source of a loaded website/window/document/iframe/frame. So now I have tried XMLHttpRequest with the links property, and again it doesn't work. |
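For reference, the regex approach mentioned in the P.S. could look roughly like the sketch below. It works directly on responseText, so it needs no DOM at all. The function name extractHrefs is made up for illustration, and the pattern is only an approximation: it is not a real HTML parser and can be fooled by unusual markup.

```javascript
// Hypothetical helper (not from the thread): collect the href values of
// <a> tags from raw HTML text. A rough approximation, not an HTML parser.
function extractHrefs(html) {
  // Matches <a ... href="..."> with double-quoted, single-quoted,
  // or unquoted attribute values, case-insensitively.
  var pattern = /<a\b[^>]*?\shref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi;
  var hrefs = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    // Exactly one of the three capture groups participates per match.
    if (match[1] !== undefined) {
      hrefs.push(match[1]);
    } else if (match[2] !== undefined) {
      hrefs.push(match[2]);
    } else {
      hrefs.push(match[3]);
    }
  }
  return hrefs;
}
```

With an XMLHttpRequest object named req (an assumed name), this would be called as extractHrefs(req.responseText). Relative URLs come back exactly as written in the page, so they would still have to be resolved against the page address.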
Sadly, to my knowledge there is no easy-to-use HTML parser that generates an HTMLDocument object. DOMParser requires well-formed XML, for instance, and so does [object XMLHttpRequest].responseXML. I can imagine there is a way around it using innerHTML, or using iframe.src='data:text/html,'+encodeURIComponent(source); or something similar. I advise you to try one of those.
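A minimal sketch of the data: URI idea, assuming the page source is already available as a string. The helper name toDataUri is hypothetical. The encodeURIComponent call matters because characters such as #, % and spaces would otherwise cut off or corrupt the URI.

```javascript
// Hypothetical helper (not from the thread): wrap HTML source in a
// data: URI that could be assigned to iframe.src.
function toDataUri(source) {
  // Percent-encode the source so that '#', '%', spaces and similar
  // characters cannot cut off or corrupt the URI.
  return 'data:text/html,' + encodeURIComponent(source);
}
```

In a page this would be used as iframe.src = toDataUri(req.responseText), where iframe and req are assumed names. Whether the script can then read the links of the iframe's document depends on how the browser treats the origin of data: URIs.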
|
Quote:
innerHTML will work, but in my experience so far it is not only slow but will probably also use far too much memory. An iframe should work too, but will likely use too much memory as well, since my project is a search application for my own use that should load at least around 10000 search pages a day. (It should also display around 10000 of the linked pages it finds per day, so I'm already curious how that will affect memory :P ) I need my own application because I want to implement my own analysis functions and tools easily. Thank you very much for your help! Greetings, Dieter |
I could try reading through some of the extensions that extract information from Google.de and other search engines. At least there are some that alter the pages. Let's see, maybe one of them doesn't load the pages in iframes/frames first but handles this somehow differently (faster and with less memory).
|
Quote:
//ff2 only
var s = new XMLSerializer();
var d = document;
var str = s.serializeToString(d);
alert(str);
Instead of d, you can feed it your responseBody. You can also create a hidden div and set its innerHTML to the responseText. Then something like:
var newLinks = hiddenDiv.getElementsByTagName("a");
should work. As long as it is hidden, it shouldn't take that long. But use the spanking new serializer if you can. |
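The hidden div suggestion above could be sketched as follows. The function name linksFromHtml and the doc parameter are assumptions made for illustration; doc is expected to be a browser Document such as window.document. As a small variation, the div is never attached to the page at all, which has the same effect as hiding it.

```javascript
// Hypothetical helper (not from the thread): let the browser parse an
// HTML string via innerHTML and collect the href attribute of each <a>.
// `doc` is assumed to be a browser Document, e.g. window.document.
function linksFromHtml(doc, html) {
  var container = doc.createElement('div'); // never attached, so never rendered
  container.innerHTML = html;               // the browser's HTML parser runs here
  var anchors = container.getElementsByTagName('a');
  var hrefs = [];
  for (var i = 0; i < anchors.length; i++) {
    var href = anchors[i].getAttribute('href');
    if (href !== null) {                    // skip anchors without an href
      hrefs.push(href);
    }
  }
  return hrefs;
}
```

A call would look like linksFromHtml(document, req.responseText), with req being an assumed name for the XMLHttpRequest object. Copying the href strings out right away means the div and its parsed content can be garbage-collected once the function returns, which may help with the memory concern raised earlier.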