virtual server side browser? (interacting with websites automatically)

06-22-2009, 11:23 PM
I am looking for some way to interact with a website from a server side script or application. I basically want to monitor multiple websites for certain information. To do this, I need something that can log into a secure website and navigate a few dynamically generated/ajax controlled pages to get the information I need.
I have basically done this client side in Firefox with Greasemonkey, but I want to run it on my server, update a database, and be able to log into more than one session per website at a time.
I know at least some of what I want to do is done all the time by search engine crawlers and spam bots. The part I am worried about is supporting JavaScript (XMLHttpRequest) and DOM manipulation.
I don't really care what language this is done with, but I would like to keep the secure login sessions open while I need them, so I am thinking Java servlets might be better than, say, Perl or Python.
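For what it's worth, keeping a login session open is mostly a matter of holding onto the site's cookies between requests, and that doesn't require a long-running servlet; here's a rough sketch in Python's standard library (the same idea carries over to Java's cookie handling). The multiple-openers part is how you could get more than one independent session per website:

```python
import http.cookiejar
import urllib.request

def make_session():
    """Build an opener that keeps cookies across requests,
    so a login session stays alive between page fetches."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

# One opener per monitored site gives several independent
# sessions against the same host at the same time.
session_a, jar_a = make_session()
session_b, jar_b = make_session()
```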
Does anybody have any suggestions about how to do this or what I might look at to try to do this?

06-30-2009, 01:05 AM
Funny, I got an email about a reply to my thread, and it was an automated spam message (now deleted). If only I had ways of automating web browsing like the spammers do, I could accomplish the legitimate things I want to accomplish.
While nobody has suggested anything here, I have been doing some research and found some interesting information I might be able to use, though unfortunately nothing as complete as I would like.
At the very lowest level, cURL or libcurl (http://curl.haxx.se/) might do what I want if I can script every interaction that a webpage would do with the server. That might be somewhat nasty when simulating complex ajax-populated selection menus or pulling specific information out of complicated tables.
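To make the "script every interaction" idea concrete, here is a sketch of what that approach looks like (shown with Python's urllib rather than libcurl, but the requests are the same). The URLs and form field names are hypothetical; the real ones would come from watching what the browser actually sends. The upside of this approach is that ajax widgets usually just call a plain endpoint you can hit directly:

```python
import urllib.parse
import urllib.request

LOGIN_URL = "https://example.com/login"      # hypothetical login form target
DATA_URL  = "https://example.com/ajax/data"  # hypothetical ajax endpoint

def build_login_request(username, password):
    # Replicate exactly what the browser's login form posts;
    # the field names here are placeholders for the real ones.
    body = urllib.parse.urlencode(
        {"user": username, "pass": password}).encode("ascii")
    return urllib.request.Request(
        LOGIN_URL, data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"})

def build_ajax_request(params):
    # The ajax-populated menus are just GETs against an endpoint
    # that returns XML/JSON -- no need to simulate the menu itself.
    return urllib.request.Request(
        DATA_URL + "?" + urllib.parse.urlencode(params),
        headers={"X-Requested-With": "XMLHttpRequest"})
```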
I was also looking at some open source web crawlers to see if I could somehow use any parsing engines to help simplify website interaction.
Some of the interesting ones included:
JoBo (http://www.matuschek.net/jobo-menu/)
WebSPHINX (http://www-2.cs.cmu.edu/~rcm/websphinx/)
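Even without a full crawler framework, the parsing side is not too bad on its own; this is a minimal sketch of the kind of table extraction those parsing engines would do for me, using Python's built-in HTML parser (the sample markup is made up):

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of every <td> cell from a page --
    the sort of extraction a crawler's parser would handle."""
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.cells.append(data.strip())

scraper = TableScraper()
scraper.feed("<table><tr><td>Item</td><td>$4.99</td></tr></table>")
```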

Something very promising I found involves using Mozilla's Rhino (JavaScript for Java) to implement a DOM environment: env.js (http://ejohn.org/blog/bringing-the-browser-to-the-server/)
Although this seems promising, I'm not sure whether the HTML parser, or support for some of the other things I need (HTTPS, cookies), is as functional as I'd like, and I don't know how I would go about making it work.

Anyway, I'm still trying to figure out exactly how I am going to make these things work. I keep seeing glaring examples of the result of what I want (spammers, eBay auction sniping, etc.) but I haven't found any practical and efficient way to do it. Maybe I have to do it at a tedious low level, which might not be too bad if I could find some tools to trace all the HTTP/HTTPS interactions. Or maybe I'll integrate a few different things, or find someone who already has.
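On the tracing front, one cheap option (besides browser-side tools like LiveHTTPHeaders or a proxy) is that urllib's handlers can dump every request and response header to stderr when built with debuglevel=1, which is enough to see the exact HTTP/HTTPS conversation a page performs before scripting it:

```python
import urllib.request

def make_tracing_opener():
    # debuglevel=1 makes the handlers print each outgoing request
    # line and incoming response headers to stderr as they happen.
    return urllib.request.build_opener(
        urllib.request.HTTPHandler(debuglevel=1),
        urllib.request.HTTPSHandler(debuglevel=1))

opener = make_tracing_opener()
```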

Any suggestions or comments?

07-08-2009, 11:36 PM
I too am looking for an easy way to do this. So far, the recommendation has been to embed "Gecko", the rendering engine of Firefox, to accomplish this task.

If anybody else has any good ideas, I am all ears.