  1. #1
    Regular Coder
    Join Date
    Jan 2006
    Posts
    243
    Thanks
    14
    Thanked 2 Times in 2 Posts

    grabbing text from page

    While on vacation I've been writing a bunch of javascript/css text effects for fun and I'm planning to look into how I can implement my code inside a url. So now I want to know how I can recognise and grab the text from any random webpage and put it inside of a variable, my script can take it from there.

  • #2
    Supreme Master coder! Philip M's Avatar
    Join Date
    Jun 2002
    Location
    London, England
    Posts
    17,917
    Thanks
    203
    Thanked 2,531 Times in 2,509 Posts
    You cannot do this using Javascript. You need to use a server-side language.


    All advice is supplied packaged by intellectual weight, and not by volume. Contents may settle slightly in transit.

  • #3
    Regular Coder
    Join Date
    Jan 2006
    Posts
    243
    Thanks
    14
    Thanked 2 Times in 2 Posts
    Hm, why would I need a server side language for something local?
    Just out of curiosity, what languages/methods are you thinking of?

    If worst comes to worst, I can filter out all non-text elements, right?
    Or I might check for text-only styles (font-weight etc), also I can have the script check for exceptions that won't work, so it won't throw an error.

    If you're certain there's something preventing me from doing this, please point it out to me. Additional ideas for how I might get it to work are even more welcome!

  • #4
    Regular Coder
    Join Date
    May 2009
    Location
    China
    Posts
    133
    Thanks
    1
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Kirl View Post
    While on vacation I've been writing a bunch of javascript/css text effects for fun and I'm planning to look into how I can implement my code inside a url. So now I want to know how I can recognise and grab the text from any random webpage and put it inside of a variable, my script can take it from there.
    maybe you should try to explain the question in more detail.


  • #5
    Supreme Master coder! Philip M's Avatar
    Join Date
    Jun 2002
    Location
    London, England
    Posts
    17,917
    Thanks
    203
    Thanked 2,531 Times in 2,509 Posts
    Quote Originally Posted by KevinJohnson View Post
    maybe you should try to explain the question in more detail.

    ... and explain why you want to do this.

  • #6
    Regular Coder
    Join Date
    May 2009
    Location
    China
    Posts
    133
    Thanks
    1
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Kirl View Post
    While on vacation I've been writing a bunch of javascript/css text effects for fun and I'm planning to look into how I can implement my code inside a url. So now I want to know how I can recognise and grab the text from any random webpage and put it inside of a variable, my script can take it from there.
    OK, so I just re-read your post to better understand what you're trying to do, and I do see it now.
    As Philip mentioned... "...and why" - lol.
    To answer your question though, it CAN be done. But the solution is not pretty.

    In short, there is only one browser (that I have experience writing extensions for) which can do this.
    Firefox.

    What you'll need to do is write some privileged code, which of course has some limitations and risks.

    Sure, you can grant privileges to do almost anything; however, it will only work if you run it from the local file system (which is what you're doing, I think).

    So what you need to do is surf around on Mozilla's Developer Center (developer.mozilla.org) and look around.

    In more detail, you will need to write a TCP connection function that grabs a web page and saves its content into a string. Then you can do some string filtering (some RegEx stuff) to strip out everything but the text that you want, and use innerHTML property modifications to put that data into your DHTML app (you're writing a DHTML client app, not a web page - LOL).
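
The string-filtering step could be sketched roughly like this. It is only an illustration under stated assumptions: `htmlToText` is a made-up helper name, the regular expressions handle only simple markup, and just a handful of entities are decoded. Regex filtering of HTML is fragile, so treat it as a starting point rather than a robust parser.

```javascript
// Hypothetical helper (not from the thread): reduce an HTML string to its text.
function htmlToText(html) {
  return html
    // Drop script and style blocks together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, " ")
    // Replace every remaining tag with a space so neighbouring words stay apart.
    .replace(/<[^>]+>/g, " ")
    // Decode a few common entities; real pages may contain many more.
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&")
    // Collapse runs of whitespace into single spaces.
    .replace(/\s+/g, " ")
    .trim();
}

// Example call with a small inline page.
const sample =
  "<html><head><style>p{color:red}</style></head>" +
  "<body><h1>Hello</h1><p>big &amp; <b>bold</b> world</p></body></html>";
console.log(htmlToText(sample));
```

The resulting string can then be handed to whatever effect code you already have.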

    *gasps*

    So....the better question is "WHY?"

    LOL

  • #7
    Regular Coder
    Join Date
    Jan 2006
    Posts
    243
    Thanks
    14
    Thanked 2 Times in 2 Posts
    Question in detail:
    How can I put the visible text of any webpage inside of a variable, so that I can apply text effects to it? Say I'd like to implement various text effects on my own webpages without putting all the targeted text inside of special tags; maybe I'll implement the script through PHP, or I'll just manually paste it in the head section (forget I mentioned implementing the code through a URL, if this confuses you). The DOM is able to tell the various element types on a page, right? Are you telling me it is unable to make a distinction between text and images, for example?

    Why?
    Simply for fun. Have you all forgotten how to have fun with programming? You serious coders can be a boring lot sometimes, rather off-putting...

  • #8
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,166
    Thanks
    75
    Thanked 4,338 Times in 4,304 Posts
    The reason you normally can't do this is security. Pure and simple.

    You can't use xmlhttp (or equivalent) to read content from another site.

    And while you could use an <iframe> to load content from another site, JavaScript can't then look inside the <iframe>. The rule that stops it is the "same-origin policy", and you are welcome to google "cross site scripting" for why letting scripts reach across sites is such a bad thing and why modern browsers have aggressively clamped down on it.

    It has nothing whatsoever to do with the KIND of content in the pages. You simply can't get *TO* the pages. At all.

    (Unless you use the Mozilla hack mentioned by Kevin.)

    On the other hand, doing this server-side is easy. PHP/JSP/ASP can all do it. The server-side equivalents of xmlhttp in those various systems don't have the "not in my domain" restrictions of a browser.

    So unless you are fanatical about doing it in JS--and willing to hack it the Mozilla way--why not use a tiny bit of server-side code to help yourself???

    One very very simple thing to do would be to create a server-side "proxy server" for foreign site content. That is, you'd use AJAX (or IFRAME!) to hit a page on your own site with a URL something like:
    Code:
        http://www.mysite.com/proxy.php?www.anothersite.com/somepage.html
    The PHP (or ASP or JSP) proxy page simply loads the given URL using server side code and returns the full source of the page back to your AJAX code or IFRAME. Presto. You can now see the full HTML and do whatever you want to do.
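
The browser side of that proxy idea might look roughly like the sketch below. Several things here are assumptions, not part of the post: the helper names `buildProxyUrl` and `fetchViaProxy` are made up, `fetch` is used as the modern stand-in for xmlhttp, and the target is URL-encoded, so the proxy page would have to decode its query string before loading the page.

```javascript
// Hypothetical helper: build a proxy URL in the shape shown above.
// Encoding the target is an assumption; the proxy page must decode it.
function buildProxyUrl(proxyPage, targetUrl) {
  return proxyPage + "?" + encodeURIComponent(targetUrl);
}

// Hypothetical helper: ask your own server for the foreign page's source.
// The request stays on your own domain, so the browser does not block it.
async function fetchViaProxy(proxyPage, targetUrl) {
  const response = await fetch(buildProxyUrl(proxyPage, targetUrl));
  if (!response.ok) {
    throw new Error("Proxy request failed with status " + response.status);
  }
  return response.text(); // full HTML source as a string
}

// Usage in a page served from www.mysite.com (placeholder names from the post):
// fetchViaProxy("http://www.mysite.com/proxy.php", "www.anothersite.com/somepage.html")
//   .then(function (html) { console.log(html.length); });
```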

    I am studiously ignoring the legal aspects of this. I am *assuming* that of course you have contacted these various sites and gotten permission to use their copyrighted material. And I hope Santa Claus was good to you last month.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #9
    Regular Coder
    Join Date
    Jan 2006
    Posts
    243
    Thanks
    14
    Thanked 2 Times in 2 Posts
    Sorry to labour the question, but I seem to have created a misunderstanding in my first post.

    What if I'd like to implement the script on MY OWN webpage?
    Can I sort text from other elements in my page, without wrapping it inside id containers?
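
For the own-page case, none of the cross-site restrictions apply, and the DOM does distinguish text from everything else: text lives in text nodes (nodeType 3), separate from element nodes such as images. A minimal sketch, assuming a made-up helper name `collectTextNodes` and assuming you want to skip the contents of script and style elements:

```javascript
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE in browsers
const TEXT_NODE = 3;    // same value as Node.TEXT_NODE in browsers
const SKIPPED_ELEMENTS = ["SCRIPT", "STYLE", "NOSCRIPT"];

// Hypothetical helper: walk a DOM subtree and collect its non-empty text nodes.
function collectTextNodes(node, found = []) {
  if (node.nodeType === TEXT_NODE) {
    // Ignore whitespace-only nodes that come from source formatting.
    if (node.nodeValue.trim() !== "") {
      found.push(node);
    }
  } else if (
    node.nodeType === ELEMENT_NODE &&
    !SKIPPED_ELEMENTS.includes(node.nodeName.toUpperCase())
  ) {
    // Elements without text children (images, for example) contribute nothing.
    for (const child of Array.from(node.childNodes)) {
      collectTextNodes(child, found);
    }
  }
  return found;
}

// Usage in a browser, once the page has loaded:
// const allText = collectTextNodes(document.body)
//   .map(function (n) { return n.nodeValue; })
//   .join(" ");
```

Because the function returns the nodes themselves rather than copies of their text, an effect script can modify each node in place (or wrap it in a span) without any id containers in the markup.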

