Persisting Data: redundance in event of failure
Are there guidelines to follow that reduce the potential for loss when saving modifications to a data source?
e.g.,
source.xml stores data which is used to configure an object
object interaction alters the data
the altered data is saved as source.xml:
- write current data to temp.xml
- read temp.xml & confirm content against current data
if confirmed,
- rename source.xml ('old.xml'),
- rename temp.xml ('source.xml'),
- delete old.xml
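The steps above can be sketched roughly as follows. This is only an illustration, not a definitive implementation: the function name `save_with_verification` is made up for the example, and it leans on `os.replace` to fold the rename, rename, delete sequence into a single swap (atomic on POSIX systems; behaviour on other platforms should be checked).

```python
import os
import tempfile


def save_with_verification(path, data):
    """Write data to a temp file, verify it, then swap it in for path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the bytes to disk
        # Read the temp file back and confirm it against the current data.
        with open(temp_path, "rb") as handle:
            if handle.read() != data:
                raise IOError("verification failed; original left untouched")
        # Replace the old file with the verified one in a single step.
        os.replace(temp_path, path)
    except BaseException:
        # On any failure, discard the temp file and keep the original as it was.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

Calling `save_with_verification("source.xml", xml_bytes)` each time the data changes keeps the window for loss small; if the write or the check fails, source.xml is never touched.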
The effectiveness of that would seem governed by how often it was performed (i.e., each time data changed vs. at the end of a session), such that failure at the end of a session would result in the loss of all new data.
Perhaps there's a better way.
So, does anyone here work with programs that overwrite files?
If so, are there concerns about losing all data, should something go wrong?
From those, what measures are taken to lessen such a risk?
liorean
05-23-2004, 08:45 PM
Well, I believe a common system is to keep a copy of the original file together with a +/- changelog, as well as writing the changed file itself. At program load you apply everything in the changelog to the copy of the original, compare the result to the file in question, and attempt recovery if they do not match. When the changelog grows beyond a certain threshold, you reverse it and archive the reversed changelog, replace the copy of the original with a copy of the current file, and start a new changelog.
Of course, you only do this for all-important, valuable files. You can see variations on the theme in many commercial version tracking source code managers, for instance.
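A very reduced sketch of that load-time check, assuming the data can be modelled as a dictionary and the changelog as a list of ("+" or "-", key, value) entries. Both function names and the entry format are invented for the illustration; a real system would read these from disk and would need a proper recovery policy.

```python
def apply_changelog(base, changelog):
    """Replay +/- entries on a copy of the base data and return the result."""
    data = dict(base)  # never modify the stored copy of the original
    for op, key, value in changelog:
        if op == "+":
            data[key] = value       # "+" adds or overwrites an entry
        elif op == "-":
            data.pop(key, None)     # "-" removes an entry if present
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return data


def load_with_check(base, changelog, current):
    """Rebuild from base + changelog and compare with the current file's data.

    Returns (data, recovered), where recovered is True if the current data
    did not match and the rebuilt data was used instead.
    """
    rebuilt = apply_changelog(base, changelog)
    if rebuilt == current:
        return current, False
    return rebuilt, True
```

Here "recovery" simply means trusting the rebuilt data over the mismatching file, which is one possible choice among several.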
Would there happen to be a technical name for "changelog"?
Or, more broadly, a name for this programming concept/contingency?
(I'm having a difficult time searching for such. :confused: )
mordred
05-23-2004, 10:06 PM
Search for "diff", "diff algorithm", perhaps "diff version CVS", because the changelog system mentioned above is essentially how versioning systems like CVS handle revisions. They store the changes that make up each revision (the +/- part). A "diff" is both the difference between two revisions and the name of the tool that produces such difference output.
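To get a feel for what such +/- output looks like, here is a small sketch using Python's standard `difflib` module. The XML content and the file labels are made up for the example.

```python
import difflib

# Two hypothetical revisions of a small config file, as lists of lines.
old = ["<config>\n", "  <volume>5</volume>\n", "</config>\n"]
new = ["<config>\n", "  <volume>7</volume>\n", "</config>\n"]

# unified_diff yields header lines, then lines prefixed with
# "-" (removed), "+" (added) or " " (unchanged context).
diff = list(difflib.unified_diff(old, new, fromfile="old.xml", tofile="source.xml"))
print("".join(diff), end="")
```

The lines starting with "-" and "+" are the part a changelog would need to keep.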
Does that help?
Yes. Thank you. :)
The way I have seen programs handle such situations is by making multiple files.
for example,
start with
file0.dat -> file1.dat -> file2.dat -> file3.dat -> file0.dat
They use the timestamp to determine the latest file and overwrite the next one in line on the next save. This lets them roll back to a good state when they detect corruption. Most of them use 10 files.
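A rough sketch of that rotation, assuming four slots named as in the example and relying on file modification times. The helper names are invented here, and note the weakness: if two saves land within the filesystem's timestamp resolution, the "newest" slot can be misjudged, so a real program might store a sequence number instead.

```python
import os

SLOTS = 4  # file0.dat .. file3.dat, as in the example above


def slot_path(directory, index):
    return os.path.join(directory, f"file{index}.dat")


def newest_slot(directory):
    """Return the index of the most recently modified slot, or None if none exist."""
    existing = [i for i in range(SLOTS) if os.path.exists(slot_path(directory, i))]
    if not existing:
        return None
    return max(existing, key=lambda i: os.stat(slot_path(directory, i)).st_mtime_ns)


def save_next(directory, data):
    """Overwrite the slot after the newest one and return its index."""
    latest = newest_slot(directory)
    target = 0 if latest is None else (latest + 1) % SLOTS
    with open(slot_path(directory, target), "wb") as handle:
        handle.write(data)
    return target
```

Rolling back then means walking backwards from `newest_slot` until a file passes whatever integrity check the program uses.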
If data is so critical, you should be using a reliable database.
I would also prefer a db solution.
Then you can use transactions and roll back if not all actions were successful. But I don't think there is any sort of transaction system for filesystem operations.
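For illustration, this is roughly what transaction-and-rollback looks like with Python's built-in `sqlite3` module. The `settings` table and its contents are made up for the example, and the second statement fails on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (name TEXT PRIMARY KEY, value TEXT NOT NULL)")
conn.execute("INSERT INTO settings VALUES ('volume', '5')")
conn.commit()

try:
    # Used as a context manager, the connection commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE settings SET value = '7' WHERE name = 'volume'")
        # Deliberate failure: this row violates the primary key.
        conn.execute("INSERT INTO settings VALUES ('volume', '9')")
except sqlite3.IntegrityError:
    pass  # the UPDATE was rolled back together with the failed INSERT

value = conn.execute("SELECT value FROM settings WHERE name = 'volume'").fetchone()[0]
print(value)  # still the original value, since neither change was committed
```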
liorean
05-23-2004, 11:06 PM
Actually, I believe the system I mentioned was born out of the need for a transaction-like system for files outside of a database. Doing source code management within the database itself would be very inefficient.
But, really, the answer is that what you need depends on what you have to work with. If you have a nice amount of processor resources and a database server handy, then that is probably a good way to go.
This topic was aimed more at learning to handle data correctly than being an immediate necessity.
Good information... exactly what I was after. :thumbsup: