  1. #1
    New to the CF scene
    Join Date: Aug 2005 | Posts: 9

    Large Database Synchronization

    I have a large database and I need to synchronize it with the main database every day. The problem is that the database is so large that it would take a few hours to update it completely every night.
    Is there any way to reduce the number of records updated, so that only the records that have actually changed are touched?
    If anybody has any experience with this, even on a different database, please let me know.
    Thanks.

  • #2
    fci
    Senior Coder
    Join Date: Aug 2004 | Location: Twin Cities | Posts: 1,345
    I found this but I don't know if this matches what you need:
    http://www.mysqldiff.org/

    if there is a way to find specific differences, you could write something to dump the rows you want.

  • #3
    New to the CF scene
    Join Date: Aug 2005 | Posts: 9
    Thanks for your help. Actually, I work with PostgreSQL, so I can't use this tool. You wrote that there is a way to find the differences and dump the rows I want, but I don't know which records were updated or created the day before. Do you have any idea how I can find those?

  • #4
    raf
    Master Coder
    Join Date: Jul 2002 | Posts: 6,589
    if i interpret your question correctly, then you're looking for an 'incremental refresh'.

    how this is best done depends on the size of your tables + on how much your data changes between the synchronisations.

    the easiest solution is to add a timestamp column to your live-db tables. With each synchronization, you record the timestamp of that run. When you run your select for the next sync, you just add a where-clause like 'WHERE timestamplive > timestamp_last_sync'
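
    (For illustration only, a minimal sketch of that timestamp approach; the orders table, order_id key and cutoff value are made-up assumptions, not raf's exact schema:)

    -- add a last-modified column to the live table (made-up names)
    ALTER TABLE orders ADD COLUMN last_modified timestamp DEFAULT now();

    -- your application (or an update trigger) keeps last_modified current on every change;
    -- at sync time you only pull the rows touched since the previous run:
    SELECT *
      FROM orders
     WHERE last_modified > '2005-08-29 02:00:00';  -- the timestamp recorded at the last sync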

    if you have a lot of changing data (as a percentage of the total table), or if you can't lock the live tables during the synchronisation's select, then your best solution will be to buffer all changes during the day (in a table or textfile) and then process that buffer. This way, your live tables don't need to be locked, and you'd still have a clean cutoff.
    Posting guidelines I use to see if I will spend time to answer your question : http://www.catb.org/~esr/faqs/smart-questions.html

  • #5
    New to the CF scene
    Join Date: Aug 2005 | Posts: 9
    The size of the database is 16 GB, and it would take a long time if I just added a timestamp column and then searched and updated the database based on it.
    The second solution seems really good, but I don't know how to buffer the changes during the day. How can I do that? Would you please let me know? I really appreciate your help.
    Thanks,

  • #6
    raf
    Master Coder
    Join Date: Jul 2002 | Posts: 6,589
    it of course depends on how your application works and how you handle your db-manipulations.

    your main db-manipulations will be inserts, updates and deletes.
    let's assume that the first 2 are done one record at a time --> one query is needed to insert/update each record
    the last (deletes) can be done for multiple records in one go --> you just need the collection of primary key values of all records from a table (or tables, if you use cascades) that need to be deleted

    so for the inserts and updates, you'll probably store the queries you executed on your live db in a textfile so that you can just run those queries later against your secondary db.
    for the deletes, it's best to just keep a table like
    PK_deletes|tablename|pk_value_delete|timestampcol
    PK_deletes is your autonumber
    tablename contains the name of the live-db table you deleted a record from
    pk_value_delete contains the PK value from the table you deleted from
    When you then sync the db, you just select all records with a timestamp < your cutoff timestamp and sort them by tablename, so you can build one delete query to remove all deleted records from one table (i don't know if you can do a groupconcat in Postgres but i assume you can --> this way you would only get one record per table, with the PKs already turned into a comma-separated string)
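
    (For illustration, a rough sketch of such a delete table and the sync-time delete. The table and column names are assumptions mirroring the layout above, and the IN-subquery form sidesteps the groupconcat question entirely:)

    -- delete-log table (names are assumptions, mirroring the layout above)
    CREATE TABLE pk_deletes (
        pk_deletes      serial PRIMARY KEY,      -- the autonumber
        tablename       text      NOT NULL,      -- live-db table the row was deleted from
        pk_value_delete integer   NOT NULL,      -- PK value of the deleted row
        timestampcol    timestamp NOT NULL DEFAULT now()
    );

    -- at sync time: one delete per table, covering everything logged before the cutoff
    DELETE FROM orders
     WHERE order_id IN (SELECT pk_value_delete
                          FROM pk_deletes
                         WHERE tablename = 'orders'
                           AND timestampcol < '2005-08-30 02:00:00');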

    of course, you'll need to make sure that your queries are run in the right order (first the inserts, then the updates, then the deletes?)

    to buffer these queries and maintain the delete table efficiently, you'll of course best use a middle tier for your db-manipulations, or a few functions that do the db-manipulation on your live db + create a buffer file or append the queries to it.
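
    (An alternative sketch, not raf's middle-tier suggestion: PostgreSQL can also maintain the delete log itself with an AFTER DELETE trigger. The function and table names here are hypothetical:)

    -- log every delete from the made-up orders table into pk_deletes
    CREATE OR REPLACE FUNCTION log_order_delete() RETURNS trigger AS $$
    BEGIN
        INSERT INTO pk_deletes (tablename, pk_value_delete)
        VALUES ('orders', OLD.order_id);
        RETURN OLD;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_log_delete
    AFTER DELETE ON orders
    FOR EACH ROW EXECUTE PROCEDURE log_order_delete();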
    Posting guidelines I use to see if I will spend time to answer your question : http://www.catb.org/~esr/faqs/smart-questions.html

