Quote:
Originally Posted by liorean
Sadly, there is to my knowledge no easy-to-use parser for HTML that generates an HTMLDocument object. DOMParser requires well-formed XML, for instance, and so does [object XMLHttpRequest].responseXML. I can imagine there is a way to get around it using innerHTML, or using iframe.src='data:text/html,'+encodeURIComponent(source); or something similar. I advise you to try one of those.
I will make one last attempt by searching the Mozilla/Firefox source for the function that parses not-well-formed HTML when a page is loaded in the browser. If I'm lucky, that function can be reached through XPCOM. Otherwise I will go back to regex functions.
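For the regex fallback, a minimal sketch of what I mean could look like the following. It assumes the only goal is to pull link targets out of a search page; the function name extractLinks is made up for illustration, and the pattern is deliberately loose, so it would also pick up attributes such as data-href and is not a real HTML parser.

```javascript
// Hypothetical helper (not part of any library): collect href values from
// anchor tags in HTML source that may not be well formed.
function extractLinks(html) {
  // Matches <a ... href=...> where the value is double-quoted,
  // single-quoted, or unquoted. Case-insensitive, global.
  const pattern = /<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi;
  const links = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    // Exactly one of the three capture groups is set for each match.
    links.push(match[1] || match[2] || match[3] || "");
  }
  return links;
}

// Usage: unclosed tags and mixed quoting do not stop the scan.
const sample = '<p><a href="a.html">A<a class=x HREF=\'b.html\'>B</a><a href=c.html>C';
console.log(extractLinks(sample));
```

Because this only scans the text and never builds a DOM tree, it should stay cheap in memory, at the cost of missing anything that needs real parsing (links written by scripts, hrefs inside comments, and so on).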
innerHTML will work, but in my experience so far it is not only slow but also likely to use far too much memory.
An iframe should work too, but it will probably use too much memory as well, since my project is a search application for my own use that should load at least around 10,000 search pages a day. (It is also supposed to display around 10,000 of the linked pages it finds each day, so I'm already curious how that will affect memory :P )
I need my own application because I want to be able to add my own analysis functions and tools easily.
Thank you very much for your help!
Greetings, Dieter