Is my understanding of rsync correct? (Linux Networking forums, Linux Forums category)
|
|||
|
Hi,
I am going to run rsync between two machines over a slow network connection. On the source machine there is a large set of directories I need to copy over to machine 2. Here is my plan to do this using rsync:

1. Run zip to pack all the directories into one zip file.
2. Run rsync to copy over the zip file.
3. Unzip the file on machine 2.
4. Create a cron job to do this nightly.

The size of the zip file should be around 30M - 40M, but after the first copy it should be OK, because rsync only copies the changed parts of the zip file.

Do you think this will work?
|
|||
|
On 19 Oct 2006 11:14:52 -0700, linq936@hotmail.com wrote:
> The size of the zip file should be around 30M - 40M, but after the
> 1st copy over, it should be OK because rsync only copies the changed
> parts of the zip file.
>
> Do you see if this works?

It fails.

Grant.
--
http://bugsplatter.mine.nu/
|
|||
|
On Thu, 19 Oct 2006 11:14:52 -0700, linq936 wrote:
> The size of the zip file should be around 30M - 40M, but after the
> 1st copy over, it should be OK because rsync only copies the changed parts
> of the zip file.
>
> Do you see if this works?

The nice thing about rsync is that it lets you copy directories, and keep the copies in sync with the originals, without having to package them first (with zip, tar or whatever). I do not think that the scheme you are proposing would work; but even if it did, it is not a good use of the capabilities of rsync.

I think that you should read the rsync documentation more carefully.
|
|||
|
Frank W. Steiner wrote:
> The nice thing about rsync is that it lets you copy directories, and
> keep the copies in sync with the originals, without having to package them
> (with zip, tar or whatever) before. I do not think that the scheme that
> you are proposing would work; but, even if it did, it is not a good use of
> the capabilities of rsync.
>
> I think that you should read the rsync documentation more carefully.

Thanks for your reply.

The problem is that the directory set is very large and does not sit under one or a few root directories; the directories are spread all over the place. If I run rsync on root directories, I will have to run rsync many, many times, once per root directory.

You say my plan does not work. Could you elaborate on that?

My understanding is this: say I have one file of size 1M. I know rsync divides the file into pieces and runs a checksum on each piece to decide whether it needs to be updated. Say the piece size is 1k; then there are about 1000 pieces for this file. If only 5 pieces have checksums that differ between the source and the destination file, then only those 5 pieces are copied over.

Is this understanding not correct?
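The piece-by-piece comparison described above can be tried out with a small shell sketch. This is a simplified model, not rsync's actual implementation: it compares blocks only at fixed offsets and uses `cksum` as the checksum, whereas rsync itself uses a rolling checksum so it can also recognise blocks that have moved. The file names and the choice of which blocks to modify are made up for illustration.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# Build a 1 MiB "old" file out of 1024 blocks of exactly 1024 bytes each
# (1023 zero-padded digits plus a newline per block).
i=0
while [ "$i" -lt 1024 ]; do
    printf '%01023d\n' "$i"
    i=$((i + 1))
done > old.bin

# The "new" file is a copy with 4 bytes overwritten in 5 blocks.
# conv=notrunc keeps the file length unchanged.
cp old.bin new.bin
for blk in 3 200 512 700 1000; do
    printf 'XXXX' | dd of=new.bin bs=1024 seek="$blk" conv=notrunc 2>/dev/null
done

# Cut both files into 1 KiB pieces (three-letter suffixes: aaa, aab, ...).
split -a 3 -b 1024 old.bin old.part.
split -a 3 -b 1024 new.bin new.part.

# Compare the checksum of each piece with its counterpart.
total=0
changed=0
for f in old.part.*; do
    suffix=${f#old.part.}
    total=$((total + 1))
    if [ "$(cksum < "$f")" != "$(cksum < "new.part.$suffix")" ]; then
        changed=$((changed + 1))
    fi
done

echo "$changed of $total blocks differ and would need to be sent"
```

Because the edits here overwrite bytes in place, every untouched block stays at the same offset. Inserting or deleting bytes would shift all later blocks, which is the case the fixed-offset model above does not handle.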
|
|||
|
On Thu, 19 Oct 2006 11:14:52 -0700, linq936 wrote:
> The size of the zip file should be around 30M - 40M, but after the
> 1st copy over, it should be OK because rsync only copies the changed
> parts of the zip file.
>
> Do you see if this works?

It depends on how large a 'large' file is on your slow network, but you may find that sending one large file has issues: if the transfer stops at 95%, for example, and crashes, you lose all of the data sent and have to send it again.

I use a program I found called 'splitpea'. It splits the large file into chunks. I rsync the small chunks; if the link dies, I only have to resend the chunk that failed. Once all the data is over there, I use splitpea to reassemble the large file.

jack
--
D.A.M. - Mothers Against Dyslexia
see http://www.jacksnodgrass.com for my contact info.
jack - Grapevine/Richardson
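The split-and-reassemble approach can be sketched with the standard `split` and `cat` tools. This is a stand-in, since nothing in the thread says how splitpea works internally; the file names, the 300 KiB test file, and the 64 KiB chunk size are arbitrary choices for illustration.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# A 300 KiB stand-in for the large archive (random bytes).
dd if=/dev/urandom of=big.bin bs=1024 count=300 2>/dev/null

# Split into 64 KiB chunks named chunk.aa, chunk.ab, ...
split -b 65536 big.bin chunk.
chunks=$(( $(ls chunk.* | wc -l) ))

# Each chunk would be transferred on its own at this point,
# so a dropped link costs at most one chunk.

# On the receiving side, put the chunks back together.
# The shell expands chunk.* in sorted order, which matches split's naming.
cat chunk.* > rebuilt.bin

if cmp -s big.bin rebuilt.bin; then rebuilt_ok=yes; else rebuilt_ok=no; fi
echo "$chunks chunks, reassembled copy identical: $rebuilt_ok"
```

rsync also has a `--partial` option that keeps a partially transferred file instead of discarding it, which may cover the same failure case without a separate splitting step.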
|
|||
|
linq936@hotmail.com wrote:
> The problem is the directory set is very large and they are not under
> one or several root directories, actually they spread a lot.
> If I run rsync with root directories, I will have to run rsync many
> many times, each time with a root directory.

Then create a special image of the directories just to rsync.

> You say my plan does not work, could you elaborate on that?
> My understanding is, let us say i have one file, the size is 1M. I know
> rsync divides the file into pieces and run checksum to compare whether
> a piece needs to be updated. Let us say the piece size is 1k, then
> there are 1000 pieces for this file.
> If there are only 5 pieces whose checksum are different between source
> and destination file, then only those 5 pieces are copied over.
> This understanding is not correct?

That is correct. However, how will that help you? The 'zip' function mixes all the file pieces together when it compresses them. Even if a file is unchanged, it will not compress to the same thing in a different context.

For example, suppose you have ten files, all of which contain the letters 'ab'. The first one may not compress, but the other nine may compress to the idea of 'same as the first file'. Now, what if someone changes that first file? All of a sudden, the encoding of all ten files has changed.

This is the norm, not the exception. That is, in a typical zip application, changing one file will affect the encoding of every compressible file after it. (Through a domino effect, basically.)

DS
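The domino effect is easy to observe with a small experiment. The sketch below uses gzip over one concatenated stream as a stand-in (an assumption made for illustration); how far the effect carries in a real zip archive depends on how the archive tool compresses its members. The text content and file names are made up.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# One "file" of repetitive text.
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i of some shared text"
    i=$((i + 1))
done > part.txt

# Stream A: ten copies of that file back to back.
: > a.raw
n=0
while [ "$n" -lt 10 ]; do
    cat part.txt >> a.raw
    n=$((n + 1))
done

# Stream B: the same data with one extra line at the very start.
{ echo "one new line at the very start"; cat a.raw; } > b.raw

# Compress both; -n leaves name and timestamp out of the gzip header.
gzip -n -c a.raw > a.gz
gzip -n -c b.raw > b.gz

size_a=$(( $(wc -c < a.gz) ))
size_b=$(( $(wc -c < b.gz) ))

# Count bytes that differ at the same offset in the two compressed files.
differing=$(( $(cmp -l a.gz b.gz 2>/dev/null | wc -l) ))

if cmp -s a.gz b.gz; then identical=yes; else identical=no; fi
echo "a.gz: $size_a bytes, b.gz: $size_b bytes"
echo "bytes differing at the same offset: $differing"
```

Comparing the last number against the compressed sizes gives a rough idea of how much of the compressed output was disturbed by a one-line change at the front.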
|
|||
|
David Schwartz wrote:
> This is the norm, not the exception. That is, in a typical zip
> application, changing one file will affect the encoding of every
> compressible file after it. (Through a domino effect, basically.)

Thanks, this really makes sense.

Then do you know if tar can work here? Does tar use some algorithm to combine data, or does it simply concatenate?

Or do you have any suggestions for my situation?
|
|||
|
linq936@hotmail.com writes:
> The size of the zip file should be around 30M - 40M, but after the
> 1st copy over, it should be OK because rsync only copies the changed
> parts of the zip file.
> Do you see if this works?

Really terrible idea. Any compression looks through the file to figure out what a good compression scheme is, i.e. even small changes in the file can totally change the zipped file.

Just transfer all the files using rsync. As you say, after the first time only the changes will be transferred.
|
|||
|
linq936@hotmail.com wrote:
> Thanks, this really makes sense.
>
> Then do you know if tar can work here? Does tar use some algorithm to
> combine data or just simply concatenate?
>
> Or you have any suggestion for my situation?

Two points:

1. Why are you afraid of running rsync multiple times, once for each directory?

2. If you add ten small files somewhere in the middle of the tar archive, poor rsync has to work hard to identify where in the archive the modified parts start and where they end. This is not really efficient.

I suggest you write a bash script that looks like this:

    # list your directories here, separated by spaces (no commas)
    for dir in dir1 dir2 dir3 ...
    do
        # trailing slashes: sync the contents of $dir into backup/$dir
        rsync -azv -e "ssh -C" --delete "$dir/" "user@server:backup/$dir/"
    done

Note that if you use ssh you can achieve two goals at once: an encrypted connection and compression (via the -C flag).

If you really wish to run rsync only once (for religious reasons):

    IMAGE=/tmp/image/
    mkdir -p "$IMAGE"
    for i in dir1 dir2 ...
    do
        # this only works if $i and $IMAGE are on the same partition;
        # alternatively you can make symbolic links
        # and tell rsync to follow them
        cp -al "$i" "$IMAGE"
    done
    rsync -azv "$IMAGE" user@server:backup

Mihai

PS: I haven't checked the above scripts; perhaps I got some syntax wrong. Take them only as guidelines.
|
|||
|
linq936@hotmail.com wrote:
> The problem is the directory set is very large and they are not under
> one or several root directories, actually they spread a lot.
> If I run rsync with root directories, I will have to run rsync many
> many times, each time with a root directory.

So what? In your scenario you will have to specify all the directories to zip instead. I don't see how this will make things easier/faster/whatever.

As others have pointed out already: just run rsync on all directories. Maybe you could tell us why you feel this is not a good option.

You may also want to have a look at unison:
http://www.cis.upenn.edu/~bcpierce/unison/

cu
Philipp

--
Dr. Philipp Pagel                            Tel. +49-8161-71 2131
Dept. of Genome Oriented Bioinformatics      Fax. +49-8161-71 2186
Technical University of Munich
http://mips.gsf.de/staff/pagel
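On the worry about running rsync many times: rsync accepts several source directories in a single invocation, and its -R (--relative) option recreates each source's full path under the destination. The sketch below only builds and prints such a command from a list file, so it can be inspected before anything is transferred. The directory names are hypothetical, and `user@server:backup/` is the placeholder destination used earlier in the thread.

```shell
#!/bin/sh
set -eu

# Hypothetical list of scattered directories, one per line.
list=$(mktemp)
printf '%s\n' /etc/myapp /home/alice/projects /var/www/site > "$list"

# Placeholder destination.
dest='user@server:backup/'

# Collect the directories as positional parameters, skipping blank lines.
set --
while IFS= read -r dir; do
    if [ -n "$dir" ]; then
        set -- "$@" "$dir"
    fi
done < "$list"

# Show the single command that would cover every directory.
cmd="rsync -azR $* $dest"
echo "$cmd"

# To actually run it, with each path passed as its own argument:
#   rsync -azR "$@" "$dest"
```

With -R, /etc/myapp would end up as backup/etc/myapp on the server, so directories with the same final name in different places do not collide.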