This is a discussion on "Filesystems created identically, but have different sizes" within the Linux General forums, part of the Linux Forums category.
I've inherited a build process that creates a filesystem-in-a-file using a
bash script which goes, in part, like this:

dd if=/dev/zero of=fsfile bs=1024 count=4096
mke2fs -F fsfile
mount -t ext2 -o loop fsfile /mnt/fs
cp -a /somedir/. /mnt/fs/.
umount /mnt/fs
sync
gzip -9 fsfile

The contents of /somedir remain the same across builds. What puzzles me
is that the resulting file "fsfile.gz" can have different sizes when
produced by successive builds. Right now I'm looking at two with sizes
1578454 and 1565838. They both expand into unzipped files of the same
length, 4194304, and when I mount them and compare them with "diff -r",
no differences are found.

Given all this, how can the two .gz files be different? Assuming gzip is
fully deterministic, the only possible discrepancy I can think of is in
some kind of filesystem metadata--a creation timestamp, perhaps? I'm not
very knowledgeable in this area.

It's difficult to conceive of a large enough divergence in metadata to
cause a difference of 12,616 bytes (almost 1%) in the final compressed
files. Is there any good reason for what I'm seeing?

Thanks in advance for any advice.

-- 
Sean McAfee -- etzwane@schwag.org
On Wed, 16 Mar 2005 19:50:19 -0000, S McAfee staggered into the Black
Sun and said:
> dd if=/dev/zero of=fsfile bs=1024 count=4096
> mke2fs -F fsfile
> mount -t ext2 -o loop fsfile /mnt/fs
> cp -a /somedir/. /mnt/fs/.
> umount /mnt/fs
> sync
> gzip -9 fsfile
>
> The contents of /somedir remain the same across builds. What puzzles
> me is that the resulting file "fsfile.gz" can have different sizes
> when produced by successive builds. When I mount them and compare
> them with "diff -r", no differences are found.
>
> Given all this, how can the two .gz files be different? Assuming gzip
> is fully deterministic, the only possible discrepancy I can think of
> is in some kind of filesystem metadata--a creation timestamp, perhaps?

Nope. Creation time is not stored on ext2. ctime, mtime, and atime,
however, are.

> It's difficult to conceive of a large enough divergence in metadata to
> cause a difference of 12,616 bytes

ctime, mtime, and atime are all 4 bytes. If there are 1000 files in
somedir, you've got 12,000 bytes of timestamps. You probably don't have
that many files, but there's also block allocation. Successive copies
of files into the loopback-mounted filesystem may have the same blocks
stored in different block groups, which would naturally cause gzip
differences. The UUID will also be different on each filesystem for
obvious reasons.

I tested this out with small loopback ext2 filesystems and mount and cp
and dumpe2fs and such, and found that a number of things on the
filesystem varied: UUID, creation time, mount time, write time, last
fsck, next fsck, directory hash, and free block numbers. The timestamps
on the files were variable as well. All of those things are probably
accounting for the differences in the sizes of the gzipped files. HTH,

-- 
Matt G|There is no Darkness in Eternity/But only Light too dim for us to see
Brainbench MVP for Linux Admin / mail: TRAP + SPAN don't belong
http://www.brainbench.com / Hire me!
-----------------------------/ http://crow202.dyndns.org/~mhgraham/resume
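For anyone wanting to repeat that kind of comparison: "diff -r" on the mounted trees only looks at file contents, so one rough way to see how much the raw images differ is to count differing bytes with "cmp -l". The sketch below is only an illustration. The helper name count_diff_bytes and the two stand-in files are made up here so it runs without root or a loop mount; they are not from the build script in question.

```shell
#!/bin/sh
# Count the bytes that differ between two equal-length files.
# cmp -l prints one line per differing byte; wc -l counts those lines.
count_diff_bytes() {
    cmp -l "$1" "$2" | wc -l | tr -d ' '
}

# Stand-in data (hypothetical, not real filesystem images):
# two 4096-byte zero-filled files, one with 16 bytes overwritten.
tmpdir=$(mktemp -d)
dd if=/dev/zero of="$tmpdir/a.img" bs=1024 count=4 2>/dev/null
cp "$tmpdir/a.img" "$tmpdir/b.img"
printf 'XXXXXXXXXXXXXXXX' |
    dd of="$tmpdir/b.img" bs=1 seek=1024 conv=notrunc 2>/dev/null

ndiff=$(count_diff_bytes "$tmpdir/a.img" "$tmpdir/b.img")
echo "differing bytes: $ndiff"

rm -rf "$tmpdir"
```

With two real uncompressed images from successive builds, the same function could be pointed at them directly, and running "dumpe2fs -h" on each and diffing the output should show which superblock fields (UUID, mount time, and so on) changed.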
In article <slrnd3h570.gi0.danSPANceswitTRAPhcrows@samantha.crow202.dyndns.org>,
Dances With Crows <daSPANnceswithcroTRAPws@gmail.com> wrote:
>On Wed, 16 Mar 2005 19:50:19 -0000, S McAfee staggered into the Black
>Sun and said:
>> dd if=/dev/zero of=fsfile bs=1024 count=4096
>> mke2fs -F fsfile
>> mount -t ext2 -o loop fsfile /mnt/fs
>> cp -a /somedir/. /mnt/fs/.
>> umount /mnt/fs
>> sync
>> gzip -9 fsfile

>> Given all this, how can the two .gz files be different? Assuming gzip
>> is fully deterministic, the only possible discrepancy I can think of
>> is in some kind of filesystem metadata--a creation timestamp, perhaps?
>
>Nope. Creation time is not stored on ext2. ctime, mtime, and atime,
>however, are.

I meant metadata for the filesystem as a whole, not for its contents.

>> It's difficult to conceive of a large enough divergence in metadata to
>> cause a difference of 12,616 bytes

>ctime, mtime, and atime are all 4 bytes. If there are 1000 files in
>somedir, you've got 12,000 bytes of timestamps.

But "cp -a" preserves the timestamps.

>I tested this out with small loopback ext2 filesystems and mount and cp
>and dumpe2fs and such, and found that a number of things on the
>filesystem varied: UUID, creation time, mount time, write time, last
>fsck, next fsck, directory hash, and free block numbers. The timestamps
>on the files were variable as well. All of those things are probably
>accounting for the differences in the sizes of the gzipped files. HTH,

It sure does, thanks. I'll stop worrying now.

-- 
Sean McAfee -- etzwane@schwag.org
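A note on the "cp -a preserves the timestamps" point: cp -a carries over the modification time, but the inode change time (ctime) cannot be set from user space, so the copy's ctime reflects when the copy was made. That alone would make per-file timestamps vary between builds. A minimal sketch follows, assuming GNU coreutils (touch -d, stat -c); the file names and the date are arbitrary stand-ins.

```shell
#!/bin/sh
# Show that cp -a keeps mtime but the copy gets a fresh ctime.
# Assumes GNU touch and GNU stat; file names are hypothetical.
tmpdir=$(mktemp -d)
echo data > "$tmpdir/src"

# Give the source an old modification time.
touch -d '2005-03-16 19:50:19' "$tmpdir/src"

# Archive-mode copy, as in the build script.
cp -a "$tmpdir/src" "$tmpdir/dst"

src_mtime=$(stat -c %Y "$tmpdir/src")   # mtime, seconds since epoch
dst_mtime=$(stat -c %Y "$tmpdir/dst")
dst_ctime=$(stat -c %Z "$tmpdir/dst")   # ctime, seconds since epoch

echo "src mtime: $src_mtime"
echo "dst mtime: $dst_mtime"
echo "dst ctime: $dst_ctime"

rm -rf "$tmpdir"
```

The two mtime values should match, while the ctime of the copy should be the time the script ran.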