  1. #1 · Regular Coder (Lawrence, Kansas)

    Need some wget help

    I'm trying to make a mirrored copy of a website (that I helped build) using wget (Windows or Unix based, I don't care).

    I believe the main programmer used cookies instead of sessions to control everything. So, when I go to crawl it, I am unable to get all of the pages. I need to get this site crawled by next Wednesday and nothing's working.

    Any wget experts out there that can help me?

    Thanks,

    Eric

  • #2 · Regular Coder (Madison, Indiana, USA)

    wget uses cookies by default; however, it does not store them between invocations.

    If you want to download part of the site, kill wget, and then download the rest later, use the --save-cookies <filename> and --keep-session-cookies options to save the cookies (including session cookies) to a file. Then use --load-cookies <filename> to reload them in the next run.
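    For example, something like this should work (cookies.txt is an arbitrary file name, and http://example.com/ is just a stand-in for your site):

        # First run: crawl the site, saving all cookies to a file,
        # including session cookies that would normally be discarded
        wget --mirror --save-cookies cookies.txt --keep-session-cookies http://example.com/

        # Later run: reload the saved cookies and pick up where you left off
        wget --mirror --load-cookies cookies.txt http://example.com/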

    However, since wget is already using cookies, I suspect that is not the problem you are having.


  • #3 · Regular Coder (Lawrence, Kansas)

    --keep-session-cookies is coming up as an unrecognized option for me in both the Windows and OS X versions of wget. Any ideas?

    Thanks,

    Eric

  • #4 · Regular Coder (Madison, Indiana, USA)

    The only reason you need to save the cookies at all is if you want to download the site in several sessions. If you are just going to start wget and let it run until the entire site is downloaded, you don't really need to save the cookies to a file.
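    In that case, a single uninterrupted run along these lines should do it (again, example.com is just a placeholder):

        # One pass: recursive mirror, rewriting links for local viewing and
        # fetching the images/CSS each page needs to display properly
        wget --mirror --convert-links --page-requisites http://example.com/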

    If --keep-session-cookies is not recognized on Windows or OS X, don't use it; your builds are probably just older versions of wget that predate the option. The man page for the Linux version I'm looking at does list it, so I suppose your other option is to get a Linux machine and use --keep-session-cookies there.
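    You can check what you have with:

        wget --version

    If it's an old build, a newer one's wget --help output should list the option.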

    I don't really think cookies are the cause of your inability to download the entire site. I would suggest you look at the missing pages, find where they are linked from, and figure out what on the referring page caused wget to skip them. One common culprit: wget only follows plain HTML links, so pages reached through JavaScript or form submissions won't be crawled.

    You might also want to consider the -c and -nc options: -c tells wget to continue a partially downloaded file, and -nc tells wget not to clobber (re-download) any existing files.
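    For example, to restart an interrupted crawl without re-fetching pages you already have (note that -nc can't be combined with --mirror, since --mirror turns on timestamping, which conflicts with no-clobber):

        # -r -l inf is the recursive part of --mirror without timestamping;
        # -nc skips any file that already exists locally
        wget -r -l inf -nc http://example.com/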




