11-10-2005, 02:57 PM
I'm trying to make a mirrored copy of a website (that I helped build) using wget (Windows or Unix based, I don't care).
I believe the main programmer used cookies instead of sessions to control everything. So, when I go to crawl it, I am unable to get all of the pages. I need to get this site crawled by next Wednesday and nothing's working.
Any wget experts out there that can help me?
11-10-2005, 05:01 PM
If you want to download part of the site, kill wget, and then download the rest later, use the --save-cookies <filename> and --keep-session-cookies options to save the cookies to a file. Then use --load-cookies <filename> to load them in the next session.
However, since wget is already using cookies, I suspect that is not the problem you are having.
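As a rough sketch of the save-then-load approach described above: the URL and cookie file name are placeholders I made up, and the two function names are only for illustration. Adjust the crawl options to suit the site.

```shell
#!/bin/sh
# Hypothetical values: swap in your own site and cookie file.
SITE_URL="http://www.example.com/"
COOKIE_FILE="cookies.txt"

# First run: crawl and write cookies (including session cookies) to disk.
first_session() {
    wget --recursive --save-cookies "$COOKIE_FILE" --keep-session-cookies "$SITE_URL"
}

# Later run: load the saved cookies, and keep saving so the file stays current.
next_session() {
    wget --recursive --load-cookies "$COOKIE_FILE" \
         --save-cookies "$COOKIE_FILE" --keep-session-cookies "$SITE_URL"
}

# Usage: call first_session, interrupt wget, then call next_session later.
```

Without --keep-session-cookies, --save-cookies skips cookies that have no expiry time, which is why the two options are paired here.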
11-16-2005, 03:25 PM
--keep-session-cookies is coming up as an unrecognized option in both the Windows and OS X versions of wget. Any ideas?
11-17-2005, 06:37 PM
The only reason you need to save the cookies at all is if you want to download the site in several sessions. If you are just going to start wget and let it run until the entire site is downloaded, you don't really need to save the cookies to a file.
If --keep-session-cookies is not recognized on Windoze or OS X, don't use it. The man page on my Linux machine documents the option, so it works there. I suppose your other option is to get a Linux machine and use --keep-session-cookies there.
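One quick way to tell whether a given wget build knows the option is to search its own help text. This is only a sketch, and wget_supports is a helper name I made up. As far as I know the option first appeared in wget 1.10, so an older binary would explain the "unrecognized option" error, but treat that version number as an assumption and check your own build.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the installed wget lists the given
# long option in its --help output.
wget_supports() {
    wget --help 2>&1 | grep -F -q -e "$1"
}

if wget_supports --keep-session-cookies; then
    echo "this wget supports --keep-session-cookies"
else
    echo "this wget does not list --keep-session-cookies; try a newer build"
fi
```

Running wget --version on each machine shows which release you actually have.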
I don't really think that cookies are the cause of your inability to download the entire site. I would suggest you look at the missing pages, find where they are linked from and figure out what elements of the referrer page caused wget to not download the linked pages.
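To find where a missing page is linked from, you could search the files wget did download for the missing page's path. A minimal sketch, assuming a grep that supports -r (GNU and BSD grep do); find_referrers and the example paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list downloaded files that mention a given path.
find_referrers() {
    mirror_dir="$1"   # directory wget wrote the mirror into
    missing="$2"      # path or file name of a page that was not downloaded
    grep -rlF -e "$missing" "$mirror_dir"
}

# Usage: find_referrers www.example.com members/profile.php
```

Open each file it lists and look at how the link is built. wget only follows links it can read out of the markup, so a link assembled by JavaScript or reached through a form submission will not be crawled.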
You might want to consider the -c and -nc options: -c tells wget to continue a previous download, and -nc tells wget not to clobber any existing files.
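Here is a hedged sketch of both options using their long names (--continue and --no-clobber). The URL and function names are placeholders. Note that --mirror turns on timestamping (-N), which wget refuses to combine with -nc, so the recursive crawl below spells out --recursive --level=inf instead.

```shell
#!/bin/sh
# Hypothetical value: swap in your own site.
SITE_URL="http://www.example.com/"

# Re-run a recursive crawl, skipping files that already exist locally.
crawl_no_clobber() {
    wget --recursive --level=inf --no-clobber "$SITE_URL"
}

# Resume a single partially downloaded file; the URL is passed as $1.
resume_file() {
    wget --continue "$1"
}

# Usage:
#   crawl_no_clobber
#   resume_file "http://www.example.com/downloads/big-file.zip"
```

With -nc in a recursive crawl, wget should still read the HTML files already on disk to find further links, so an interrupted crawl can pick up roughly where it stopped.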