...

View Full Version : Parse XML - Write to a database --- PLEASE HELP!



maros174
03-02-2010, 02:43 PM
This will kill me, it is taking my whole work day - each day, checking for errors so my visitors don't notice...

BACKGROUND:
I have a financial website that display's Stock Market Data from my country's stock exchange.
Stock Market Data is read from an XML file and written to a database.
Data is refreshed every ten minutes (around 40 times per day), for 200+ stocks.

PROBLEM:
Most of the time everything works fine, but, several times each day, during a refresh, a data for a random stock, usually only one, is read only partialy.
(for example. instead of HighPrice=320.99, to a database is written HighPrice=99, or, instead of DailyLow=4270.01, database contains DailyLow=70.01 )

PROBLEM SCRIPTS:
XML.class.php - parses XML data and places it in a structured PHP array
ZSE.class.php - reads the data and writes it to a database. (I suspect function getMarketTradingData could be an issue (on lines 167-209), everything above just writes to a database)

Potentially important facts:

XML is allways clean of errors.
Time of data collection(refresh) is not important, sometimes the error appears at 10:25, sometimes at 12:35, sometimes at 15:45, etc...
Usually the error is limitied to only one stock out of 200+, rarely two or three of them have errors.
Data is collected for several fields - but only one has an error: High Price, Low Price, Closing Price, Buy Price, Sell Price, Volume, Average Price. But every time the data is read partially it is limited to only one field. For example, sometimes the Closing price has an error (to a database is written 16 instead of 316), sometimes some other price like daily low price or average price.
Dellimiter is not an issue, the data is read partially independent of it. For e.g. sometimes instead of Average Price=24.49 to a database is written only the decimal part 49, sometimes instead of SellPrice=3420 database contains SellPrice=20.

Example:
http://img638.imageshack.us/img638/8433/errortuho.th.png (http://img638.imageshack.us/i/errortuho.png/)

tomws
03-02-2010, 03:23 PM
I don't know what database you're using, but if it and the tables support transactions, you should consider implementing them, if you're not already. Then you can parse-and-insert an usual, but inside a transaction. Before commit, run a comparison between the XML and inserted data. If match, commit the transaction.

If you don't have transactions available, there are probably other ways to do something similar.

I'm not sure what other options are available, but you apparently need to implement some sort of error-checking since you can't trust your code.

tomws
03-02-2010, 03:43 PM
Some other thoughts after looking at the code:

How large is the XML file? Does it significantly vary in size between updates? If variable size, can you find a pattern between the corrupted data and the byte location in the file?

Is the data always consistently formatted? There are never stray characters or spaces in these problem areas?

The getMarketTradingData function seems to pull data without attempting test or format any of it. If you expect a float or integer, you can add extra error-checking with functions like is_float or is_int.

maros174
03-02-2010, 06:19 PM
tomws,
thanks for looking at the code... every little bit helps. This thing is killing me.


I don't know what database you're using, but if it and the tables support transactions, you should consider implementing them, if you're not already...

it's MySQL. I don't know if transactions are implemented or even supported.

Web development company made the site for me, since then, because of the recession, they've gone bust, and now I'm left to my own knowledge which is very limited.


How large is the XML file? Does it significantly vary in size between updates? If variable size, can you find a pattern between the corrupted data and the byte location in the file?

It's around 130 KB. I think it's more or less the same size.

It is online: (http://webservice.zse.hr/delayed/Delayed15MarketData.xml),
and it is refreshed during stock market working hours 10h-16h



Is the data always consistently formatted? There are never stray characters or spaces in these problem areas?

No, it's perfectly clean every time. I checked this.

Nothing distinguishes the area from which only the partial number is "pulled", from thousands of other exactly the same areas from which the number is pulled correctly.

I took copies of the file on several occasions when the parser made the error. I can upload them if they can be helpful.


The getMarketTradingData function seems to pull data without attempting test or format any of it. If you expect a float or integer, you can add extra error-checking with functions like is_float or is_int.

I don't think it would help. The expected number can be float or an integer, both of which is correct.
The data that is only partially parsed from an xml and wrongly written to the database looks something like this:
157 instead of 2157, or 99 instead of 211.99, or 5.33 instead of 445.33

tomws
03-02-2010, 07:08 PM
MySQL can use transactions in InnoDB tables, but not MyISAM. (I think that's still the case.)

Past that, I think I'm not skilled enough to diagnose the problem. The only things I could offer would be adding extra layers of data integrity testing. Perhaps someone with more large-scale experience has some ideas on this.

maros174
03-04-2010, 02:53 PM
Thanks anyway.

I hope someone will be able to help.
I can easily see myself going nuts checking more than a 1000 data points every 10 minutes every working day...



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum