08-24-2005
Doug Freyburger
 
Re: interfilesystem copies: large du diffs

orgone wrote:
>
> I recently rsync'd around 2.8TB between a RHE server (jfs fs) and a
> Netapps system. Did a 'du -sk' against each to verify the transfers:
>
> 2894932960 sources total, KB
> 2751664496 destination total, KB


"df" uses actual blocks allocated. "du" takes the
file size and concludes that all blocks are allocated.

> That's a 140GB discrepancy. Subsequent verbose rsyncs have turned up
> nothing that was not originally transferred.
>
> I often note similar behaviour with smaller transfers between servers
> with similar OS/fs combos and have always seen it to some extent with
> transfers between systems of any type. It's just that the usual
> discrepancies in this case are magnified greatly by the sheer volume of
> data. Needless to say, 140GB going missing would be a bit of a problem
> and it's not much fun picking through 2.8TB for MIA data.
>
> Can anyone shed some light on why this happens?


My best guess is that the NetApp handles sparsely allocated
files differently, so that "du" there sees the blocks actually
allocated rather than the file size implied by the address of
the last byte.
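
One way to test the sparse-file guess is to total, for each tree,
both the byte lengths of the files and the blocks the filesystem
reports for them. A minimal Python sketch, assuming a POSIX system
where st_blocks counts 512-byte units (true on Linux); the name
tree_usage is made up for illustration, and over NFS the block
counts are whatever the server chooses to report:

```python
import os
import stat


def tree_usage(root):
    """Return (apparent_bytes, allocated_bytes) for regular files under root.

    apparent_bytes  sums st_size (the byte length of each file).
    allocated_bytes sums st_blocks * 512 (assumes 512-byte units).
    """
    apparent = 0
    allocated = 0
    seen = set()  # (device, inode) pairs, so hard links are counted once
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, sockets and so on
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            apparent += st.st_size
            allocated += st.st_blocks * 512
    return apparent, allocated
```

Run it against the source and the destination. If the apparent
totals agree and only the allocated totals differ, the data is all
there and the gap is an allocation effect such as sparse files or
block size.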

Alternate theory that is far less likely: On your source tree
you have a history of making hundreds of thousands of files
and then deleting nearly all of them, leaving a lot of very
large directories. On your target tree the directories are
much smaller.
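
The directory theory can be checked by adding up the sizes of the
directories themselves on each side. A rough Python sketch,
assuming a POSIX-style filesystem where a directory's st_size
reflects the space its entry table occupies (not every filesystem
reports it that way); directory_overhead is a hypothetical name:

```python
import os


def directory_overhead(root):
    """Return (directory_count, total_bytes) for root and every directory below it.

    total_bytes sums st_size of the directories only, not of the
    files they contain.
    """
    count = 0
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, _filenames in os.walk(root):
        st = os.lstat(dirpath)
        count += 1
        total += st.st_size
    return count, total
```

If the directory counts match but the source total is far larger,
bloated directories account for at least part of the gap.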

Yet another alternate theory: Smaller block/fragment/extent
size on the target. So on the source any file has a fairly
large minimum block count but on the target smaller files
take fewer blocks. You would need very many small files to
account for a 5% difference, but enough files under 512
bytes (likely millions, depending on the block sizes) could
cause this.
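
To put a number on the block-size theory, here is a small Python
sketch of the arithmetic. The 4096-byte source block and 512-byte
target block are assumptions for illustration only, not values
taken from either system, and the helper names are made up:

```python
def allocated(size_bytes, block_bytes):
    """Bytes a file occupies when space is handed out in whole blocks."""
    blocks = -(-size_bytes // block_bytes)  # ceiling division
    return blocks * block_bytes


def files_needed(gap_bytes, file_bytes, src_block, dst_block):
    """How many files of file_bytes each are needed to explain gap_bytes."""
    saving = allocated(file_bytes, src_block) - allocated(file_bytes, dst_block)
    if saving <= 0:
        raise ValueError("target blocks save nothing for this file size")
    return -(-gap_bytes // saving)  # round up to whole files


# The gap between the two du totals quoted above, converted from KB to bytes.
GAP_BYTES = (2894932960 - 2751664496) * 1024
```

For example, files_needed(GAP_BYTES, 512, 4096, 512) gives the
count of 512-byte files that would explain the whole gap under
those assumed block sizes. A larger source block brings the count
down.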
