  1. #1
    New Coder
    Join Date
    May 2006
    Posts
    42
    Thanks
    0
    Thanked 0 Times in 0 Posts

    How to get the links in a responseText/responseXML?

    Hi!
    How can I find the links (as in document.links[0] ...) in content I retrieved through an XMLHttpRequest?
    I get the error: MyThis.tempObj has no properties.

    Code:
    <html><head>
    <script language="javascript" type="text/javascript">
    MyXMLHttpRequest = function(fuURL,fuCallMeOnLoad) {
      var MyThis = this;
      this.status = 0; // see the HTTP response codes (or the XMLHttpRequest status codes)
      this.theURL = fuURL; this.HTMLofURL = ''; this.CallMeOnLoad = fuCallMeOnLoad;
      this.extractedLinks = new Array();
      function onLoad (e)    {
          MyThis.status = 200;
          MyThis.HTMLofURL = e.target.responseText; 
          var tempObj = e.target.responseXML;  //That's not working
              alert(tempObj.links[1]);  // and so this gives an error
    //      MyThis.CallMeOnLoad(MyThis.theURL,MyThis.HTMLofURL);
      }
      this.LoadPage = function() {
        try { netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect"); } catch (e) {   alert("Permission UniversalXPConnect denied."); }
        var r = new XMLHttpRequest();
        r.onload = onLoad;
        r.open ("GET", this.theURL, true);
        r.send (null);
      }
    }
    function Show(t,h) {alert(t + "\n" + h);}
    var Seite = new MyXMLHttpRequest('http://www.google.com',Show);
    Seite.LoadPage();
    </script></head><body> Hello! </body></html>
    I have already tried a few other things, such as a parser, documentElement, ... but somehow responseXML is never there.

    Greetings
    Dieter
    Last edited by DH2006; 06-08-2007 at 02:23 AM. Reason: Forgot the line "Seite.LoadPage();"

  • #2
    Master Coder
    Join Date
    Feb 2003
    Location
    Umeå, Sweden
    Posts
    5,575
    Thanks
    0
    Thanked 83 Times in 74 Posts
    The page you're requesting is not an XML page. It's only natural that responseXML is not available for it.
    liorean <[lio@wg]>
    Articles: RegEx evolt wsabstract , Named Arguments
    Useful Threads: JavaScript Docs & Refs, FAQ - HTML & CSS Docs, FAQ - XML Doc & Refs
    Moz: JavaScript DOM Interfaces MSDN: JScript DHTML KDE: KJS KHTML Opera: Standards

  • #3
    New Coder
    Join Date
    May 2006
    Posts
    42
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote: Originally posted by liorean
    The page you're requesting is not an XML page. It's only natural that responseXML is not available for it.
    I suspected something like that, but I wasn't sure, because the head of the HTML source declares:
    Code:
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
    Therefore I tried something like new DomParser(e.target.responseText) (I can't remember the exact call), which should return a document object that could then be searched.
    That didn't work (probably because I had a syntax error); it produced a long, ugly error message.

    Now I have found this example:
    Code:
    var MyresponseXML = new DOMParser().parseFromString(e.target.responseText, 'text/xml');

    Do you think that looks right? I will try it now, of course.
    Greetings
    Dieter

  • #4
    New Coder
    Join Date
    May 2006
    Posts
    42
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Well, that didn't work.
    The first error message was now "nicht wohlgeformt" (literally, "not well-formed").
    (A second error message then followed, of course: "MyThis.tempObj.links has no properties".)

    Instead of text/xml I also tried text/html, but that threw an "NS_ERROR_NOT_IMPLEMENTED" exception in the parseFromString function.
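
    For comparison, here is a minimal sketch of the same idea for a browser whose DOMParser does accept the "text/html" type. That is an assumption about the target browser; the Firefox build discussed in this thread rejects it, as the exception above shows. The function name linksViaDomParser and the ParserCtor parameter are made up for illustration. The parser is passed in so that the sketch does not depend on a global being present.

```javascript
// Hypothetical helper, not part of the original code in this thread.
// Assumption: ParserCtor is a DOMParser implementation whose
// parseFromString() accepts "text/html" and returns a document
// exposing the usual `links` collection.
function linksViaDomParser(html, ParserCtor) {
  var parser = new ParserCtor();
  var doc = parser.parseFromString(html, "text/html");
  var result = [];
  for (var i = 0; i < doc.links.length; i++) {
    // getAttribute returns the href exactly as written in the markup,
    // without resolving it against any base URL.
    result.push(doc.links[i].getAttribute("href"));
  }
  return result;
}
```

    Inside the onLoad handler from the first post this would be called as linksViaDomParser(e.target.responseText, DOMParser).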

    Could I use setMimeHeader? I read that it only works in IE, though. (I'm using Firefox.)

    Thank you already for all your help!
    Greetings, Dieter
    P.S.
    I actually started by extracting the links with regex functions from the source of a loaded iframe, but everyone suggested using XMLHttpRequest instead. Firefox really doesn't seem to have a simple function for getting the source of a loaded website/window/document/iframe/frame. So now I have tried XMLHttpRequest together with the links property, and again it doesn't work.
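
    The regex route mentioned in the P.S. could look roughly like the sketch below. It works directly on responseText and needs no DOM at all. The function name extractHrefs and the pattern are illustrative assumptions, not code from the project. A regular expression is not a real HTML parser, so this can miss or misread unusual markup (for example, it would also pick up an attribute such as data-href).

```javascript
// Hypothetical helper: collect href values of <a> tags from raw HTML text.
function extractHrefs(html) {
  // Matches <a ... href=VALUE where VALUE is double-quoted,
  // single-quoted, or unquoted. Case-insensitive, global.
  var pattern = /<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi;
  var links = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    // Exactly one of the three capture groups is set per match.
    var href = match[1] !== undefined ? match[1]
             : match[2] !== undefined ? match[2]
             : match[3];
    links.push(href);
  }
  return links;
}
```

    In the code from the first post, extractHrefs(MyThis.HTMLofURL) could be used to fill MyThis.extractedLinks.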

  • #5
    Master Coder
    Join Date
    Feb 2003
    Location
    Umeå, Sweden
    Posts
    5,575
    Thanks
    0
    Thanked 83 Times in 74 Posts
    Sadly, there is to my knowledge no easy-to-use parser for HTML that generates an HTMLDocument object. DOMParser, for instance, requires well-formed XML, and so does [object XMLHttpRequest].responseXML. I can imagine there is a way around it using innerHTML, or using iframe.src='data:text/html,'+encodeURIComponent(source); or something similar. I advise you to try one of those.
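
    A rough sketch of the data: URL variant, to make the suggestion concrete. The helper names buildHtmlDataUrl and showSourceInFrame are invented for this example, and frame is assumed to be an iframe element that already exists in the page. Whether the parent page may afterwards read the frame's document (and its links collection) depends on how the browser treats the origin of data: URLs, so take this as an untested starting point.

```javascript
// Hypothetical helper: wrap fetched HTML source in a data: URL.
function buildHtmlDataUrl(source) {
  // encodeURIComponent escapes characters such as <, >, ", # and spaces
  // so that the markup survives being embedded in a URL.
  return "data:text/html," + encodeURIComponent(source);
}

// Hypothetical helper: point an existing iframe element at the source.
// `frame` is assumed to be an iframe element (or anything with a src property).
function showSourceInFrame(frame, source) {
  frame.src = buildHtmlDataUrl(source);
  return frame;
}
```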

  • #6
    New Coder
    Join Date
    May 2006
    Posts
    42
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote: Originally posted by liorean
    Sadly, there is to my knowledge no easy-to-use parser for HTML that generates an HTMLDocument object. DOMParser, for instance, requires well-formed XML, and so does [object XMLHttpRequest].responseXML. I can imagine there is a way around it using innerHTML, or using iframe.src='data:text/html,'+encodeURIComponent(source); or something similar. I advise you to try one of those.
    I will make one last attempt by searching the Mozilla/Firefox source for the function that parses non-well-formed HTML when pages are loaded in the browser. If I'm lucky, that function can be reached through XPCOM. Otherwise I will go back to regex functions.

    innerHTML will work, but my experience so far is that it is not only slow but will probably also use far too much memory.
    An iframe should work too, but it is likely to use too much memory as well, since my project is a search application for my own use that should load at least around 10000 search pages a day. (It is also supposed to display around 10000 of the linked pages it finds per day, so I'm already curious how that will affect memory :P )
    I need my own application because I want to be able to add my own analysis functions and tools easily.

    Thank you very much for your help!
    Greetings, Dieter

  • #7
    New Coder
    Join Date
    May 2006
    Posts
    42
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I could try reading through some of the extensions that extract information from Google.de and other search engines. At least there are some that alter the pages. Let's see, maybe one of them doesn't load the pages in iframes/frames first but handles this somehow differently (faster and with less memory consumption).

  • #8
    Senior Coder rnd me
    Join Date
    Jun 2007
    Location
    Urbana
    Posts
    4,299
    Thanks
    10
    Thanked 585 Times in 566 Posts
    Quote: Originally posted by liorean
    Sadly, there is to my knowledge no easy to use parser for HTML that generates an HTMLDocument object. DOMParser requires well formed XML for instance, and ...
    cheer up, and turn it into valid XML!

    //ff2 only
    var s = new XMLSerializer();
    var d = document;
    var str = s.serializeToString(d);
    alert(str);


    instead of d, you can feed it a DOM node built from your response (serializeToString expects a node, not a string such as responseText).


    you can also create a hidden div, and set the innerHTML to the responseText.
    then something like:

    var newLinks=hiddenDiv.getElementsByTagName("a");

    should work, as long as it is hidden, it shouldn't take that long.
    but use the spankin new serializer if you can.
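
    Spelled out a bit more, the hidden div idea might look like the sketch below. The function name collectLinkTargets and its doc parameter are made up for this example; in a page you would pass the global document. The div is never attached to the page, so nothing is displayed.

```javascript
// Hypothetical helper: let the browser's own HTML parser do the work
// by assigning the fetched markup to a detached div.
function collectLinkTargets(doc, html) {
  var container = doc.createElement("div"); // never appended to the page
  container.innerHTML = html;               // tolerant HTML parsing happens here
  var anchors = container.getElementsByTagName("a");
  var targets = [];
  for (var i = 0; i < anchors.length; i++) {
    var href = anchors[i].getAttribute("href");
    if (href !== null) {                    // skip <a> tags without an href
      targets.push(href);
    }
  }
  return targets;
}
```

    In the onLoad handler from the first post that would be collectLinkTargets(document, e.target.responseText).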
    Last edited by rnd me; 06-15-2007 at 08:11 AM.

