  1. #1
    New to the CF scene
    Join Date
    Jun 2012
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Image extraction without reloading from src

    Hey there,

    There is a foreign site whose content is not under my control.
    When I open this site, it contains an image.
    Using DOM traversal, I can reliably store the corresponding <img> element in a variable.

    I want that image somehow extracted, without having to intervene manually.

    The goal is for other programs, running under my control on localhost, to process and use the image further.
    I thought of creating a local web server, and letting the JS send me "some data" via post to that web server.

    However, simply reloading from src is NOT possible, since the content changes dynamically (non-idempotent behaviour of the foreign server in response to a get request), and I badly want exactly the same image as displayed in the browser.

    According to w3schools, the <img> tag is just a container.
    That leaves me with the question: How can I access the image itself (programmatically) without reloading it from src?
    Google hasn't turned up a result, I'm new to JS, and nobody I know can give me a hint.

    Just to make my point clear: I'm only talking about that image-extraction part.
    Everything else can be adapted to this (except the foreign server, of course)

    (Posted under DOM because I thought the solution might somehow be DOM-related. Not sure though.)

    osschef

  • #2
    New Coder
    Join Date
    Apr 2012
    Location
    United Kingdom
    Posts
    14
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Could you give more details on how the foreign site generates the image? Can you show an example of the page in question?

    This is probably going to be more complex than just using JavaScript. You may need a server-side language such as PHP to parse the DOM, and cURL to store cookies, send the correct headers, etc.
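
Whatever ends up producing the image data, the local receiving side that the original post describes (a web server on localhost accepting a POST) can stay small. Below is a rough Node.js sketch, not a finished tool. It assumes the browser sends the image as a base64 data URL in the request body; `decodeDataUrl`, `startReceiver`, the port 8080 and the file name `captured.png` are all made-up placeholders.

```javascript
const http = require("http");
const fs = require("fs");

// Split a base64 data URL ("data:image/png;base64,....") into its
// MIME type and the decoded bytes. Throws on anything else.
function decodeDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,([A-Za-z0-9+\/=]*)$/.exec(dataUrl.trim());
  if (!match) {
    throw new Error("not a base64 data URL");
  }
  return { mime: match[1], bytes: Buffer.from(match[2], "base64") };
}

// Hypothetical receiver: accepts a POST whose body is a data URL and
// writes the decoded bytes to outPath. Port and path are placeholders.
function startReceiver(port = 8080, outPath = "captured.png") {
  const server = http.createServer((req, res) => {
    if (req.method !== "POST") {
      res.writeHead(405).end();
      return;
    }
    const chunks = [];
    req.on("data", (chunk) => chunks.push(chunk));
    req.on("end", () => {
      try {
        const body = Buffer.concat(chunks).toString("utf8");
        const { bytes } = decodeDataUrl(body);
        fs.writeFileSync(outPath, bytes);
        res.writeHead(204).end();
      } catch (err) {
        res.writeHead(400).end(String(err.message));
      }
    });
  });
  // Bind to the loopback interface only, so nothing outside this machine can post.
  return server.listen(port, "127.0.0.1");
}
```

Calling `startReceiver()` starts listening. Other local programs can then pick up the written file.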

  • #3
    Regular Coder Lerura's Avatar
    Join Date
    Aug 2005
    Location
    Denmark
    Posts
    911
    Thanks
    0
    Thanked 120 Times in 119 Posts
    The reason the webmaster has set it up this way is probably to prevent you from using or saving their image.
    They probably also hold the copyright on the image.

    There is no way that you will be able to use/save the image.
    And if there was, it would be considered hacking.

    This forum's rules (1.4: No illegal requests) do not allow you to ask for help with illegal activity.

  • #4
    Senior Coder rnd me's Avatar
    Join Date
    Jun 2007
    Location
    Urbana
    Posts
    4,301
    Thanks
    10
    Thanked 586 Times in 567 Posts
    Your question is unclear.
    Use HTML scraping techniques to get the img tag's attributes, then use curl to fetch the binary data from its src.

    If it's on another server that isn't setting access-control headers, use Node.js or Greasemonkey to run your JavaScript.
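
For the "without reloading from src" part, one approach that fits the Greasemonkey suggestion is to draw the already-loaded <img> onto a canvas and export the pixels. This is only a sketch under assumptions: the image must come from the same origin as the page (or be loaded with CORS approval), otherwise the canvas is tainted and `toDataURL` throws a SecurityError. `imageToDataURL`, `sendToLocalhost` and the localhost URL are made-up names for illustration. Note that the result is a re-encoded PNG of the decoded pixels, not the original file bytes.

```javascript
// Copy the pixels of an already-loaded <img> into a canvas and export
// them as a PNG data URL, so no second request goes to the server.
// `doc` is passed in (normally `document`) to keep the function testable.
function imageToDataURL(img, doc) {
  const canvas = doc.createElement("canvas");
  // naturalWidth/naturalHeight are the image's intrinsic pixel dimensions,
  // independent of any CSS scaling on the page.
  canvas.width = img.naturalWidth;
  canvas.height = img.naturalHeight;
  canvas.getContext("2d").drawImage(img, 0, 0);
  // Throws a SecurityError if a cross-origin image has tainted the canvas.
  return canvas.toDataURL("image/png");
}

// Hypothetical sender: POST the data URL to a local receiver.
// The endpoint is a placeholder. "no-cors" makes this fire-and-forget:
// the request is sent, but the response cannot be inspected.
async function sendToLocalhost(dataUrl, endpoint = "http://localhost:8080/image") {
  await fetch(endpoint, {
    method: "POST",
    mode: "no-cors",
    headers: { "Content-Type": "text/plain" },
    body: dataUrl,
  });
}
```

In a userscript this might be called as `sendToLocalhost(imageToDataURL(img, document))` once `img.complete` is true. Depending on the browser, requests from a remote page to localhost may be blocked by mixed-content or private-network rules, so treat the sending half as the less certain part.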