
taking XML data via raw HTML?



alex80
Oct 22nd, 2002, 02:15 AM
Hiya, I was wondering if there is a way to take and format data using XML when the data doesn't come to you as proper XML, but just sits on a website as HTML (e.g. weather data on bbc.co.uk or similar)?

Alex Vincent
Oct 22nd, 2002, 02:38 AM
I don't quite follow you. Can you give me an example of the sort of source code you're talking about?

Which is the content the user calls on and which is the content you want the page to automatically call on?

BrainJar
Oct 22nd, 2002, 05:06 PM
You can if the page you're inputting is valid XHTML or some very simple, well-formed HTML. But if there are any unclosed tags or other errors your parser will likely error out. I don't know of any XML parser that will clean up a document for you.
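
To illustrate, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The two markup strings and the `first_paragraph_text` helper are made up for the example and are not taken from any real site. It shows a strict XML parser accepting well-formed markup and rejecting the same markup with one unclosed `<br>` tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, invented for illustration only.
WELL_FORMED = "<html><body><p class='temp'>Max: 14</p></body></html>"
BROKEN = "<html><body><p class='temp'>Max: 14<br></body></html>"  # <br> and <p> never closed


def first_paragraph_text(markup):
    """Return the text of the first <p>, or None if the markup is not well-formed XML."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # A strict XML parser refuses the whole document on the first error.
        return None
    paragraph = root.find(".//p")
    return paragraph.text if paragraph is not None else None


print(first_paragraph_text(WELL_FORMED))
print(first_paragraph_text(BROKEN))
```

The parser makes no attempt to recover: one stray tag is enough to lose the whole page, which is why this approach only works on XHTML or very tidy HTML.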

You could also try writing your own parser to clean up bad HTML, but that's bound to get complicated considering just how bad HTML can be and still appear readable in a browser.
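
As a rough sketch of the "tolerant parser" idea, Python's standard `html.parser.HTMLParser` reports tags and text as events and does not stop on unclosed tags. The `ClassTextCollector` class, the `extract_by_class` helper, the `temp` class name and the sample markup are all hypothetical, and a real page would need its own selection rules. It does not repair the document; it just pulls out the text that directly follows a start tag carrying a given class.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text that directly follows any start tag with a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.capturing = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # The attribute value may be None (e.g. a bare "class"), so fall back to "".
        classes = (dict(attrs).get("class") or "").split()
        self.capturing = self.wanted_class in classes
        if self.capturing:
            self.chunks.append("")

    def handle_endtag(self, tag):
        # Any end tag stops the capture; we never rely on the matching close existing.
        self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.chunks[-1] += data


def extract_by_class(markup, wanted_class):
    """Return the stripped, non-empty text chunks found for wanted_class."""
    parser = ClassTextCollector(wanted_class)
    parser.feed(markup)
    parser.close()
    return [chunk.strip() for chunk in parser.chunks if chunk.strip()]


# Hypothetical messy markup: unquoted attributes, unclosed <p> and <br>, no </html>.
MESSY = "<html><body><p class=temp>Max: 14<br><p class=temp>Min: 9</body>"

print(extract_by_class(MESSY, "temp"))
```

Because capture ends at the next tag of any kind, nested markup inside the target element would be cut short. That is the sort of complication that piles up quickly once you handle real pages.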

I would caution that you need to be careful scraping data from others' web sites. You could be infringing on their copyrights or terms of use, so be sure to ask permission first.