This is a discussion on [OT] Database synchronisation within the PHP Language forums, part of the PHP Programming Forums category; I have a database called LOCAL_DB in a local computer. This DB is managed by third party software so I ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
I have a database called LOCAL_DB in a local computer. This DB is managed
by third party software so I have no control over it. I have another database called INET_DB in an Internet server. I design and maintain this one (it's an extranet written in PHP). I need to write an app that dumps certain fields and tables from LOCAL_DB into INET_DB every night, using a DSL connection, so all changes made in the office during workday are available at the extranet the following day. Both DBMS are different so I’m accessing data through ODBC drivers. The first simple bold algorithm I can think of is: 1) Empty INET_DB 2) Get records from LOCAL_DB 3) For each record : Insert record into INET_DB Its pitfalls are pretty obvious: 1) It will take years to execute (128Kbps, dude) 2) During the update process, the extranet is useless 3) If my app crashes... well, go figure My second attempt would be: 1) Get primary keys from INET_DB 2) Get records from LOCAL_DB 3) For each record: 3.1) If PK exists in INET_DB: Update record in INET_DB 3.2) Else: Insert record into INET_DB 4) Remove records from INET_DB where PK no longer exists in LOCAL_DB Now, will someone stop me before I reinvent the square wheel? I'm not sure this is yet the right approach. I’ve been googling in search of an algorithm but I could only find solutions for very specific situations. I’d appreciate a lot any hint or link. -- -+ http://alvaro.es - Álvaro G. Vicario - Burgos, Spain ++ Mi sitio sobre programación web: http://bits.demogracia.com +- Mi web de humor con rayos UVA: http://www.demogracia.com -- |
|
|||
|
Alvaro G. Vicario wrote:
> I need to write an app that dumps certain fields and tables from LOCAL_DB > into INET_DB every night, using a DSL connection, so all changes made in > the office during workday are available at the extranet the following day. > Both DBMS are different so I'm accessing data through ODBC drivers. > We do a similar thing here at my work. We have a variety of data imports to run and summary tables to build on a nightly basis. Now, as you indicated, you can get as fancy and efficient as you want by comparing PK's and such, but we went with the pure brute-force method you mentioned first. Granted, this is all happening on our local network, so we weren't concerned with data transfer rates, which could turn out to be a problem for you depending on the quantity of data you have to move. As you mentioned, the obvious problem with the brute force method is that between dumping the contents of the table and re-inserting, the table (and anything that references it) is useless. We got around this by using temporary tables to do the processing, then moving them into place seamlessly when finished. For example (in psuedo-mysql, so you'll need to adjust for your DB), assuming 'mytable' is the table you're trying to update: DROP TABLE mytable_new // just in case it still exists CREATE TABLE mytable_new LIKE mytable //do whatever you need to do for inserting into 'mytable_new' DROP TABLE mytable_old // just in case it still exists ALTER TABLE mytable RENAME mytable_old ALTER TABLE mytable_new RENAME mytable DROP TABLE mytable_old As you can see, we create a temporary table just like the original and do our inserts into that one. Then, when finished, move the old one out of the way and put the new one in. Depending on your preference, you can keep the _old table around to have a revolving backup, or just toss it. This has the effect of solving two of your potential issues: 1) the extranet is not 'useless' while the import is taking place because the old data doesn't leave until the new data is ready, and 2) if your import crashes mid-execution, the only thing that should be affected is the _new table and you can just drop it and start the import again later if that happens. And it even might solve the issue of the speed of data transfer, depending on how necessary it is to have yesterday's data by start of business the next day. If that isn't essential, then since the data import is invisible to the end user, it doesn't really matter how long it takes (as long as its still < 24 hours, of course). Hope that gives you some ideas, Moot |
|
|||
|
Alvaro G. Vicario wrote:
> I have a database called LOCAL_DB in a local computer. This DB is managed > by third party software so I have no control over it. I have another > database called INET_DB in an Internet server. I design and maintain this > one (it's an extranet written in PHP). > There's a thing on phpclassess which claims to replicate MySQL schemas (you didn't say what DBMS) but it doesn't understand about indices and doesn't replicate data. > I need to write an app that dumps certain fields and tables from LOCAL_DB > into INET_DB every night, using a DSL connection, so all changes made in > the office during workday are available at the extranet the following day. > Both DBMS are different so I’m accessing data through ODBC drivers. > > The first simple bold algorithm I can think of is: > > 1) Empty INET_DB > 2) Get records from LOCAL_DB > 3) For each record : Insert record into INET_DB > Assuming we're talking MySQL and you are just publishing the data, then the bestway to do it would be to slave the remote DB from the local - but that asssumes you can reconfigure the remote DB and connect across the MySQL port. An alternative approach would be to run rsync - which is supposed to just update the bits of a large file which have changed. But you'd need to to shutdown the DBMS while it ran. Failing that why not just shutdown the DBMS and transfer the files across - it'll probably be a lot quicker than rebuilding the table one row at a time. The best solution though would be a custom script which copies over any changes however you need to make sure that: 1) you never delete data (well you can - you just mark it as inactive then run a housekeeping job to delete things which have been inactive for, say 2 successful update cycles). 2) you need a timestamp and primary key field on every table. (BTW - why bother to check if a record exists before you try and insert it - if its already there and you've got a primary key, the DBM<S will tell you at the time of insertion). HTH C. |
|
|||
|
mootmail-googlegroups@yahoo.com wrote: > Alvaro G. Vicario wrote: > > I need to write an app that dumps certain fields and tables from LOCAL_DB > > into INET_DB every night, using a DSL connection, so all changes made in > > the office during workday are available at the extranet the following day. > > Both DBMS are different so I'm accessing data through ODBC drivers. > > > > As you mentioned, the obvious problem with the brute force method is > that between dumping the contents of the table and re-inserting, the > table (and anything that references it) is useless. We got around this > by using temporary tables to do the processing, then moving them into > place seamlessly when finished. > > For example (in psuedo-mysql, so you'll need to adjust for your DB), > assuming 'mytable' is the table you're trying to update: > DROP TABLE mytable_new // just in case it still exists > CREATE TABLE mytable_new LIKE mytable > //do whatever you need to do for inserting into 'mytable_new' > DROP TABLE mytable_old // just in case it still exists > ALTER TABLE mytable RENAME mytable_old > ALTER TABLE mytable_new RENAME mytable > DROP TABLE mytable_old > > As you can see, we create a temporary table just like the original and > do our inserts into that one. Then, when finished, move the old one > out of the way and put the new one in. Depending on your preference, > you can keep the _old table around to have a revolving backup, or just > toss it. > > This has the effect of solving two of your potential issues: 1) the > extranet is not 'useless' while the import is taking place because the > old data doesn't leave until the new data is ready, and 2) if your > import crashes mid-execution, the only thing that should be affected is > the _new table and you can just drop it and start the import again > later if that happens. And it even might solve the issue of the speed > of data transfer, depending on how necessary it is to have yesterday's > data by start of business the next day. If that isn't essential, then > since the data import is invisible to the end user, it doesn't really > matter how long it takes (as long as its still < 24 hours, of course). > > Hope that gives you some ideas, > Moot This is a good idea, using temporary tables. Since you are somewhat bandwidth limited, it may be more efficient if you dump the database to a text file (either comma seperated or SQL) and load it locally on the remote machine. This way you can compress it before transfering (and the plain text will compress very nicely), and it will execute much faster on the remote machine. Also, building on rsync from another post, another option would be to diff the old dump and the new dump, then send that to the remote. This would be usefull if, even after compression, you are still transfering a lot of data. In either event, you should be able to dump an SQL file and send that. It'd be much faster than reading a row from local, inserting to remote, rinse, repeat. |