How to retrieve the full source code automatically?



polvoazul
Oct 20th, 2008, 06:49 AM
Hello! I am writing a program (in Ruby) that collects data from certain websites and stores it in a database.
Although I am fairly familiar with Ruby, I am new to web coding...
My main problem is that I want to retrieve the full HTML source code of websites that use JavaScript.

If you use Firebug (an extension for Firefox), you can see the HTML of the page with all the modifications made by the included JS. I would like to get that into my Ruby program and then into my DB.

Another important thing is the ability to send data to the websites' JS from the program; I also need to learn how to do that.

If anyone can help me with either of these, it would be very helpful.
If you have any questions, please ask!
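A hedged sketch for the second question: when a page's JavaScript sends data somewhere, it usually does so with an ordinary HTTP request, and a program can often make that same request itself without running the script. The sketch below assumes the target accepts a form-encoded POST; `submit_form`, the URL, and the field names are hypothetical, and the real ones would have to be read from the requests the page actually makes.

```ruby
require "net/http"
require "uri"

# Send form-encoded data to a server, the way a page's form or script might.
# The URL and field names are assumptions; take them from the real page's
# requests (for example, as shown by Firebug).
def submit_form(url, fields)
  response = Net::HTTP.post_form(URI(url), fields)
  raise "HTTP #{response.code} for #{url}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end
```

Usage, with a placeholder URL: `submit_form("http://example.com/search", "q" => "ruby")` returns the server's response body as a string.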

ifubad
Oct 20th, 2008, 07:05 AM
What does a server-side language have to do with HTML and CSS?

polvoazul
Oct 20th, 2008, 07:19 AM
"What does a server-side language have to do with HTML and CSS?"

I don't know exactly what a server-side language is, but what I mean is this: I get things on my browser's screen, and I want that data saved in a text file. I simply want to download the source code of the website into a file.

BUT I need the full HTML, the HTML AFTER the JS modifications.
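For reference, downloading the source as the server sends it needs only Ruby's standard library. The sketch below (`save_raw_source` is an illustrative name, not something from the thread) saves that raw HTML to a file. Because no JavaScript is executed, the result is the "before" version, without the script modifications asked about here.

```ruby
require "net/http"
require "uri"

# Fetch the HTML exactly as the server sends it and save it to a file.
# No JavaScript runs here, so changes the page's scripts would make in a
# browser are NOT included in the result.
def save_raw_source(url, path, redirect_limit: 5)
  uri = URI(url)
  redirect_limit.times do
    response = Net::HTTP.get_response(uri)
    case response
    when Net::HTTPSuccess
      File.write(path, response.body)
      return response.body
    when Net::HTTPRedirection
      # Resolve a relative or absolute Location header against the current URI.
      uri = uri + response["location"]
    else
      raise "HTTP #{response.code} for #{uri}"
    end
  end
  raise "too many redirects for #{url}"
end
```

Usage, with a placeholder URL: `save_raw_source("http://example.com/", "page.html")`.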

itsallkizza
Oct 20th, 2008, 07:45 AM
That's not going to be possible, not with any conventional server-side language. Even Google can't do it; inappropriate client-side dynamic content is one way to deceive Googlebot and get blacklisted ;)

That said, everything is possible, somehow. The only way I can think of right now is to write a program in something like Java or C that uses a browser to open URLs, waits a set amount of time for the changes to appear, and /then/ pulls the data.
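The "wait, then pull the data" step can be made less fragile by polling for a condition instead of sleeping for a fixed time. This sketch covers only the waiting logic; `wait_until` and `WaitTimeoutError` are illustrative names, and the block passed to it (for example, one that asks a remote-controlled browser whether a certain element exists yet) is left to whatever browser-control tool is used.

```ruby
# Raised when the condition never becomes truthy within the timeout.
class WaitTimeoutError < StandardError; end

# Call the block repeatedly until it returns a truthy value, then return
# that value. Gives up after `timeout` seconds, sleeping `interval`
# seconds (or less, near the deadline) between attempts.
def wait_until(timeout: 10.0, interval: 0.25)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    raise WaitTimeoutError, "condition not met within #{timeout}s" if remaining <= 0
    sleep([interval, remaining].min)
  end
end
```

Usage: `wait_until(timeout: 5) { page_ready? }`, where `page_ready?` is a hypothetical check supplied by the caller. Polling returns as soon as the page is ready, while a fixed sleep either wastes time or cuts off slow pages.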

Unfortunately my knowledge of Java is limited (though I have a few ideas of how to get started) and my C is almost nonexistent, so I can't help you much there.

You might try surfing the internet for a program that already does this. Caching hard-coded data is easy, but caching client-side dynamic changes is muuuch more difficult.

EDIT: I also wanted to add that the interpretation of JavaScript is browser-specific. Different browsers have different built-in libraries and different JavaScript engines.

polvoazul
Oct 20th, 2008, 08:16 AM
Yeah, what you said does make sense: since the JS modifications are done by the browser, we need the browser open... Thanks, I had not thought of that.

But maybe there is a standalone interpreter or a light, invisible "browser" available...
It wouldn't be very convenient for my program to open FF (HEAVY!!!). But if it is my only option then I don't mind; the only thing is that I would have to make the program control the browser. Does anyone know how to do that?
And wait, isn't the data already on my computer, just waiting to be "shown" in the HTML by the JS? Isn't there a way to get at that core data?
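A sketch of the "core data" idea, assuming the page's script loads its data as JSON over HTTP. That is common but not guaranteed: some pages embed the data in the HTML or compute it in the script. If the request's URL can be found, for example in Firebug's Net panel, the data can be fetched and parsed directly with no browser involved. `fetch_json` is an illustrative name.

```ruby
require "net/http"
require "uri"
require "json"

# Fetch a JSON resource over HTTP and parse it into Ruby hashes/arrays.
# Intended for the data URL a page's script requests, not the page itself.
def fetch_json(url)
  response = Net::HTTP.get_response(URI(url))
  raise "HTTP #{response.code} for #{url}" unless response.is_a?(Net::HTTPSuccess)
  JSON.parse(response.body)
end
```

Usage, with a placeholder URL: `fetch_json("http://example.com/data.json")` returns the parsed structure, ready to be stored in a database.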

[Maybe this is more of a JavaScript topic than HTML & CSS]

gnomeontherun
Oct 20th, 2008, 01:54 PM
This sounds like a job for C++ or some other kind of programming that isn't for the web.