[OT] Database synchronisation

This is a discussion on [OT] Database synchronisation within the PHP Language forums, part of the PHP Programming Forums category; I have a database called LOCAL_DB in a local computer. This DB is managed by third party software so I ...


Go Back   Usenet Forums > PHP Programming Forums > PHP Language

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 07-26-2006
Alvaro G. Vicario
 
Posts: n/a
Default [OT] Database synchronisation

I have a database called LOCAL_DB in a local computer. This DB is managed
by third party software so I have no control over it. I have another
database called INET_DB in an Internet server. I design and maintain this
one (it's an extranet written in PHP).

I need to write an app that dumps certain fields and tables from LOCAL_DB
into INET_DB every night, using a DSL connection, so all changes made in
the office during workday are available at the extranet the following day.
Both DBMS are different so I’m accessing data through ODBC drivers.

The first simple bold algorithm I can think of is:

1) Empty INET_DB
2) Get records from LOCAL_DB
3) For each record : Insert record into INET_DB

Its pitfalls are pretty obvious:

1) It will take years to execute (128Kbps, dude)
2) During the update process, the extranet is useless
3) If my app crashes... well, go figure

My second attempt would be:

1) Get primary keys from INET_DB
2) Get records from LOCAL_DB
3) For each record:
3.1) If PK exists in INET_DB: Update record in INET_DB
3.2) Else: Insert record into INET_DB
4) Remove records from INET_DB where PK no longer exists in LOCAL_DB

Now, will someone stop me before I reinvent the square wheel? I'm not sure
this is yet the right approach. I’ve been googling in search of an
algorithm but I could only find solutions for very specific situations. I’d
appreciate a lot any hint or link.


--
-+ http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
++ Mi sitio sobre programación web: http://bits.demogracia.com
+- Mi web de humor con rayos UVA: http://www.demogracia.com
--
Reply With Quote
  #2 (permalink)  
Old 07-26-2006
mootmail-googlegroups@yahoo.com
 
Posts: n/a
Default Re: Database synchronisation

Alvaro G. Vicario wrote:
> I need to write an app that dumps certain fields and tables from LOCAL_DB
> into INET_DB every night, using a DSL connection, so all changes made in
> the office during workday are available at the extranet the following day.
> Both DBMS are different so I'm accessing data through ODBC drivers.
>


We do a similar thing here at my work. We have a variety of data
imports to run and summary tables to build on a nightly basis. Now, as
you indicated, you can get as fancy and efficient as you want by
comparing PK's and such, but we went with the pure brute-force method
you mentioned first. Granted, this is all happening on our local
network, so we weren't concerned with data transfer rates, which could
turn out to be a problem for you depending on the quantity of data you
have to move.

As you mentioned, the obvious problem with the brute force method is
that between dumping the contents of the table and re-inserting, the
table (and anything that references it) is useless. We got around this
by using temporary tables to do the processing, then moving them into
place seamlessly when finished.

For example (in psuedo-mysql, so you'll need to adjust for your DB),
assuming 'mytable' is the table you're trying to update:
DROP TABLE mytable_new // just in case it still exists
CREATE TABLE mytable_new LIKE mytable
//do whatever you need to do for inserting into 'mytable_new'
DROP TABLE mytable_old // just in case it still exists
ALTER TABLE mytable RENAME mytable_old
ALTER TABLE mytable_new RENAME mytable
DROP TABLE mytable_old

As you can see, we create a temporary table just like the original and
do our inserts into that one. Then, when finished, move the old one
out of the way and put the new one in. Depending on your preference,
you can keep the _old table around to have a revolving backup, or just
toss it.

This has the effect of solving two of your potential issues: 1) the
extranet is not 'useless' while the import is taking place because the
old data doesn't leave until the new data is ready, and 2) if your
import crashes mid-execution, the only thing that should be affected is
the _new table and you can just drop it and start the import again
later if that happens. And it even might solve the issue of the speed
of data transfer, depending on how necessary it is to have yesterday's
data by start of business the next day. If that isn't essential, then
since the data import is invisible to the end user, it doesn't really
matter how long it takes (as long as its still < 24 hours, of course).

Hope that gives you some ideas,
Moot

Reply With Quote
  #3 (permalink)  
Old 07-26-2006
Colin McKinnon
 
Posts: n/a
Default Re: [OT] Database synchronisation

Alvaro G. Vicario wrote:

> I have a database called LOCAL_DB in a local computer. This DB is managed
> by third party software so I have no control over it. I have another
> database called INET_DB in an Internet server. I design and maintain this
> one (it's an extranet written in PHP).
>


There's a thing on phpclassess which claims to replicate MySQL schemas (you
didn't say what DBMS) but it doesn't understand about indices and doesn't
replicate data.

> I need to write an app that dumps certain fields and tables from LOCAL_DB
> into INET_DB every night, using a DSL connection, so all changes made in
> the office during workday are available at the extranet the following day.
> Both DBMS are different so I’m accessing data through ODBC drivers.
>
> The first simple bold algorithm I can think of is:
>
> 1) Empty INET_DB
> 2) Get records from LOCAL_DB
> 3) For each record : Insert record into INET_DB
>


Assuming we're talking MySQL and you are just publishing the data, then the
bestway to do it would be to slave the remote DB from the local - but that
asssumes you can reconfigure the remote DB and connect across the MySQL
port.

An alternative approach would be to run rsync - which is supposed to just
update the bits of a large file which have changed. But you'd need to to
shutdown the DBMS while it ran.

Failing that why not just shutdown the DBMS and transfer the files across -
it'll probably be a lot quicker than rebuilding the table one row at a
time.

The best solution though would be a custom script which copies over any
changes however you need to make sure that:

1) you never delete data (well you can - you just mark it as inactive then
run a housekeeping job to delete things which have been inactive for, say 2
successful update cycles).

2) you need a timestamp and primary key field on every table.

(BTW - why bother to check if a record exists before you try and insert it -
if its already there and you've got a primary key, the DBM<S will tell you
at the time of insertion).

HTH

C.

Reply With Quote
  #4 (permalink)  
Old 07-27-2006
Richard Levasseur
 
Posts: n/a
Default Re: Database synchronisation


mootmail-googlegroups@yahoo.com wrote:
> Alvaro G. Vicario wrote:
> > I need to write an app that dumps certain fields and tables from LOCAL_DB
> > into INET_DB every night, using a DSL connection, so all changes made in
> > the office during workday are available at the extranet the following day.
> > Both DBMS are different so I'm accessing data through ODBC drivers.
> >

>
> As you mentioned, the obvious problem with the brute force method is
> that between dumping the contents of the table and re-inserting, the
> table (and anything that references it) is useless. We got around this
> by using temporary tables to do the processing, then moving them into
> place seamlessly when finished.
>
> For example (in psuedo-mysql, so you'll need to adjust for your DB),
> assuming 'mytable' is the table you're trying to update:
> DROP TABLE mytable_new // just in case it still exists
> CREATE TABLE mytable_new LIKE mytable
> //do whatever you need to do for inserting into 'mytable_new'
> DROP TABLE mytable_old // just in case it still exists
> ALTER TABLE mytable RENAME mytable_old
> ALTER TABLE mytable_new RENAME mytable
> DROP TABLE mytable_old
>
> As you can see, we create a temporary table just like the original and
> do our inserts into that one. Then, when finished, move the old one
> out of the way and put the new one in. Depending on your preference,
> you can keep the _old table around to have a revolving backup, or just
> toss it.
>
> This has the effect of solving two of your potential issues: 1) the
> extranet is not 'useless' while the import is taking place because the
> old data doesn't leave until the new data is ready, and 2) if your
> import crashes mid-execution, the only thing that should be affected is
> the _new table and you can just drop it and start the import again
> later if that happens. And it even might solve the issue of the speed
> of data transfer, depending on how necessary it is to have yesterday's
> data by start of business the next day. If that isn't essential, then
> since the data import is invisible to the end user, it doesn't really
> matter how long it takes (as long as its still < 24 hours, of course).
>
> Hope that gives you some ideas,
> Moot



This is a good idea, using temporary tables.

Since you are somewhat bandwidth limited, it may be more efficient if
you dump the database to a text file (either comma seperated or SQL)
and load it locally on the remote machine. This way you can compress
it before transfering (and the plain text will compress very nicely),
and it will execute much faster on the remote machine.

Also, building on rsync from another post, another option would be to
diff the old dump and the new dump, then send that to the remote. This
would be usefull if, even after compression, you are still transfering
a lot of data.

In either event, you should be able to dump an SQL file and send that.
It'd be much faster than reading a row from local, inserting to remote,
rinse, repeat.

Reply With Quote
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT +1. The time now is 07:27 AM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0