
View Full Version : Large Database Synchronization



ashkan
09-07-2005, 11:06 PM
I have a large database and I need to synchronize it with the main database every day. The problem is that the database is too large, and it would take a few hours to update it completely every night.
Is there any way to reduce the number of records updated and only update the records that have changed?
If anybody has any experience with this, even in a different database, please let me know.
Thanks.
:confused:

fci
09-07-2005, 11:59 PM
I found this but I don't know if this matches what you need:
http://www.mysqldiff.org/

if there is a way to find specific differences, you could write something to dump the rows you want.

ashkan
09-08-2005, 12:57 AM
Thanks for your help. Actually I work with PostgreSQL, so I can't use this tool. You wrote that there is a way to find the differences and dump the rows I want, but I'm not sure which records were updated or created the day before. Do you have any idea how I can find those out?

raf
09-08-2005, 10:55 PM
if i interpret your question correctly, then you're looking for an 'incremental refresh'.

how this is best done depends on the size of your tables + on how much your data changes between the synchronisations.

the easiest solution is to add a timestamp column to your live-db tables. With each synchronization, you record the timestamp of that run. When you run your select for the next sync, you just add a where-clause like 'WHERE timestamplive > timestamp_last_sync'
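
to make that a bit more concrete (just a sketch -- the table and column names here are invented for the example, not taken from your schema), in PostgreSQL it could look roughly like this:

-- add a last-modified column to a live table and keep it current with a trigger
ALTER TABLE orders ADD COLUMN last_modified timestamp NOT NULL DEFAULT now();

CREATE OR REPLACE FUNCTION touch_last_modified() RETURNS trigger AS $$
BEGIN
    NEW.last_modified := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_touch_last_modified
BEFORE INSERT OR UPDATE ON orders
FOR EACH ROW EXECUTE PROCEDURE touch_last_modified();

-- at sync time, select only the rows changed since the previous run
SELECT * FROM orders WHERE last_modified > '2005-09-08 03:00:00';

the literal timestamp in the last query would come from wherever you recorded the time of the previous sync.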

if you have a lot of changing data (as a percentage of the total table), or if you can't lock the live tables during the synchronisation's select, then your best solution will be to buffer all changes during the day (in a table or a textfile) and then process that buffer. This way, your live tables don't need to be locked, and you'd still have a clean cutoff.
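
as a rough idea of what that buffer could look like (again just a sketch with made-up names), you'd keep one small log table next to the big ones and only read that at sync time:

CREATE TABLE change_buffer (
    id         serial PRIMARY KEY,
    tablename  varchar(64) NOT NULL,
    operation  char(1)     NOT NULL,   -- 'I' = insert, 'U' = update, 'D' = delete
    pk_value   varchar(64) NOT NULL,
    changed_at timestamp   NOT NULL DEFAULT now()
);

every insert/update/delete against the live db also writes one row here (from your application code or from a trigger), so the nightly job only has to scan this small table instead of the full tables.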

ashkan
09-09-2005, 11:13 PM
The size of the database is 16G, and it would take a long time if I just add a timestamp column and then search and update my database with it.
The second solution seems really good. But I don't know how to buffer the changes during the day. How can I do that? Would you please let me know? I really appreciate your help.
Thanks,







raf
09-10-2005, 10:04 AM
it of course depends on how your application works and how you handle your db-manipulations.

your main db-manipulations will be inserts, updates and deletes.
let's assume that the first two happen one record at a time --> one query is needed to insert/update each record
the last (deletes) can be done for multiple records in one go --> you just need the collection of primary key values of all records from a table (or tables, if you use cascades) that need to be deleted

so for the inserts and updates, you'll probably store the queries you executed on your live db in a textfile, so that you can just run these queries later against your secondary db.
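
for example (purely illustrative statements, not from your schema), the buffer file could simply be the list of statements you already ran against the live db:

-- changes_2005-09-09.sql
INSERT INTO orders (id, customer_id, total) VALUES (10041, 7, 129.50);
UPDATE customers SET email = 'new@example.com' WHERE id = 7;

at night you'd then replay it against the secondary database, e.g. with:

psql -h secondary-host -d mydb -f changes_2005-09-09.sql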
for the deletes, you best just keep a table like
PK_deletes | tablename | pk_value_delete | timestampcol
PK_deletes is your autonumber
tablename contains the name of the live-db table you deleted a record from
pk_value_delete contains the PK value of the record you deleted from that table
When you then sync the db, you just select all records with a timestamp < your cutoff timestamp and sort them by tablename, so you can build one delete query to remove all deleted records from each table (i don't know if you can do a group_concat in Postgres, but i assume there's an equivalent --> this way you would only get one record per table, with the PKs already turned into a comma-separated string)
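
to make that concrete (only a sketch; newer PostgreSQL versions have string_agg as a rough equivalent of MySQL's GROUP_CONCAT):

CREATE TABLE pk_deletes (
    pk_deletes      serial PRIMARY KEY,
    tablename       varchar(64) NOT NULL,
    pk_value_delete varchar(64) NOT NULL,
    timestampcol    timestamp   NOT NULL DEFAULT now()
);

-- one row per table, with all deleted PKs as a comma-separated string
SELECT tablename,
       string_agg(pk_value_delete, ',') AS deleted_pks
FROM pk_deletes
WHERE timestampcol < '2005-09-10 03:00:00'
GROUP BY tablename;

from that result you can build one DELETE FROM <tablename> WHERE id IN (...) per table and run it against the secondary db.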

of course, you'll need to make sure that your queries are run in the right order (first the inserts, then the updates, then the deletes?)

to buffer these queries and maintain the delete table efficiently, you'll of course best use a middle tier for your db-manipulations, or a few functions that do the db-manipulation on your live db and also create the buffer file or append the queries to it.
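
one way to do those 'wrapper functions' entirely inside the database (again just a sketch; the function and table names are invented) is a plpgsql function that performs the delete on the live table and logs it in the delete table in the same transaction:

CREATE OR REPLACE FUNCTION delete_order(p_id integer) RETURNS void AS $$
BEGIN
    -- delete from the live table
    DELETE FROM orders WHERE id = p_id;
    -- and remember what was deleted, for the nightly sync
    INSERT INTO pk_deletes (tablename, pk_value_delete)
    VALUES ('orders', p_id::text);
END;
$$ LANGUAGE plpgsql;

the application then calls SELECT delete_order(10041); instead of issuing the DELETE directly, and the same idea works for inserts and updates.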


