
Filesystems created identically, but have different sizes



S McAfee, 03-16-2005

I've inherited a build process that creates a filesystem-in-a-file using a
bash script which goes, in part, like this:

dd if=/dev/zero of=fsfile bs=1024 count=4096
mke2fs -F fsfile
mount -t ext2 -o loop fsfile /mnt/fs
cp -a /somedir/. /mnt/fs/.
umount /mnt/fs
sync
gzip -9 fsfile

The contents of /somedir remain the same across builds. What puzzles me is
that the resulting file "fsfile.gz" can have different sizes when produced
by successive builds. Right now I'm looking at two with sizes 1578454 and
1565838. They both expand into unzipped files of the same length, 4194304,
and when I mount them and compare them with "diff -r", no differences are
found.

Given all this, how can the two .gz files be different? Assuming gzip is
fully deterministic, the only possible discrepancy I can think of is in
some kind of filesystem metadata--a creation timestamp, perhaps? I'm not
very knowledgeable in this area. It's difficult to conceive of a large
enough divergence in metadata to cause a difference of 12,616 bytes (almost
1%) in the final compressed files. Is there any good reason for what I'm
seeing?
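One way to narrow this down is to compare the decompressed images byte by byte and group the differences by filesystem block. Comparing the raw images (not the .gz files) matters: gzip stores the input file's modification time in its header unless run with `-n`, so two .gz files can differ in content even when the images are identical, although that alone would not change their size. A minimal sketch, assuming 1024-byte blocks (likely what mke2fs picked for a 4 MiB image, but check with `dumpe2fs -h`); `diff_blocks` is a hypothetical helper name.

```bash
# diff_blocks OLD.gz NEW.gz [BLOCKSIZE]
# Print "count block" for every filesystem block that differs between two
# gzipped images. BLOCKSIZE defaults to 1024, an assumption about how
# mke2fs laid out this small image.
diff_blocks() {
    bs=${3:-1024}
    ta=$(mktemp) tb=$(mktemp)
    gzip -dc "$1" > "$ta"
    gzip -dc "$2" > "$tb"
    # cmp -l prints "offset octal1 octal2" for each differing byte, with a
    # 1-based offset; convert each offset to a 0-based block number and
    # count how many differing bytes fall in each block.
    cmp -l "$ta" "$tb" | awk -v bs="$bs" '{ print int(($1 - 1) / bs) }' | uniq -c
    rm -f "$ta" "$tb"
}
```

Usage would be something like `diff_blocks build1/fsfile.gz build2/fsfile.gz` (hypothetical paths). With 1 KiB blocks, block 1 holds the superblock, so hits there point at filesystem-wide metadata, while hits scattered further in point at inode tables, directories, or data blocks.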

Thanks in advance for any advice.


--
Sean McAfee -- etzwane@schwag.org
Dances With Crows, 03-16-2005

On Wed, 16 Mar 2005 19:50:19 -0000, S McAfee staggered into the Black
Sun and said:
> dd if=/dev/zero of=fsfile bs=1024 count=4096
> mke2fs -F fsfile
> mount -t ext2 -o loop fsfile /mnt/fs
> cp -a /somedir/. /mnt/fs/.
> umount /mnt/fs
> sync
> gzip -9 fsfile
>
> The contents of /somedir remain the same across builds. What puzzles
> me is that the resulting file "fsfile.gz" can have different sizes
> when produced by successive builds. When I mount them and compare
> them with "diff -r", no differences are found.
>
> Given all this, how can the two .gz files be different? Assuming gzip
> is fully deterministic, the only possible discrepancy I can think of
> is in some kind of filesystem metadata--a creation timestamp, perhaps?


Nope. Creation time is not stored on ext2. ctime, mtime, and atime,
however, are.

> It's difficult to conceive of a large enough divergence in metadata to
> cause a difference of 12,616 bytes


ctime, mtime, and atime are all 4 bytes. If there are 1000 files in
somedir, you've got 12,000 bytes of timestamps. You probably don't have
that many files, but there's also block allocation. Successive copies
of files into the loopback-mounted filesystem may have the same blocks
stored in different block groups, which would naturally cause gzip
differences. The UUID will also be different on each filesystem, because
mke2fs generates a new random UUID every time it runs.
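The estimate above is simply entries × 3 timestamps × 4 bytes, i.e. 12 bytes per inode. A rough sketch for computing that figure for an actual tree, assuming one inode per entry that `find` reports (hard links share an inode, so they would make this an overcount); `timestamp_bytes` is a hypothetical name. These are uncompressed bytes, so how much they move the gzip size depends on how compressible the surrounding inode data is.

```bash
# timestamp_bytes DIR: upper bound on bytes of inode timestamps
# (atime, ctime, mtime; 4 bytes each) for DIR and everything under it.
# find lists DIR itself too, which is right: it has an inode as well.
timestamp_bytes() {
    n=$(find "$1" | wc -l)
    echo $((n * 3 * 4))
}
```

For the 1000-file case in the text this gives 12,000.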

I tested this out with small loopback ext2 filesystems and mount and cp
and dumpe2fs and such, and found that a number of things on the
filesystem varied: UUID, creation time, mount time, write time, last
fsck, next fsck, directory hash, and free block numbers. The timestamps
on the files were variable as well. All of those things are probably
accounting for the differences in the sizes of the gzipped files. HTH,
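`dumpe2fs -h fsfile` is the usual way to print these superblock fields. To show where they live in the image, here is a minimal sketch that reads a few of them directly with `od`, assuming the standard ext2 on-disk layout (superblock at byte 1024, little-endian fields, offsets as commented). The helper names `sb_hex`, `sb_u32`, and `show_sb` are hypothetical.

```bash
# Offsets are relative to the start of the superblock (byte 1024).

# sb_hex IMAGE OFFSET LENGTH: the field's raw bytes as a hex string
sb_hex() {
    od -A n -t x1 -j $((1024 + $2)) -N "$3" "$1" | tr -d ' \n'
}

# sb_u32 IMAGE OFFSET: a 32-bit little-endian field as a decimal number,
# assembled byte by byte so the host's byte order does not matter
sb_u32() {
    set -- $(od -A n -t u1 -j $((1024 + $2)) -N 4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# show_sb IMAGE: fields of the kind that varied between builds
show_sb() {
    printf 'magic       %s\n' "$(sb_hex "$1" 56 2)"    # s_magic, 53ef for ext2
    printf 'mount time  %s\n' "$(sb_u32 "$1" 44)"      # s_mtime (Unix seconds)
    printf 'write time  %s\n' "$(sb_u32 "$1" 48)"      # s_wtime
    printf 'last fsck   %s\n' "$(sb_u32 "$1" 64)"      # s_lastcheck
    printf 'check int   %s\n' "$(sb_u32 "$1" 68)"      # s_checkinterval
    printf 'uuid        %s\n' "$(sb_hex "$1" 104 16)"  # s_uuid
    printf 'hash seed   %s\n' "$(sb_hex "$1" 236 16)"  # s_hash_seed (dir hashing)
    printf 'mkfs time   %s\n' "$(sb_u32 "$1" 264)"     # s_mkfs_time, 0 if unset
}
```

Running `show_sb` on two uncompressed images and diffing the output should show the same kind of variation Matt found; "next fsck" is last fsck plus the check interval.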

--
Matt G|There is no Darkness in Eternity/But only Light too dim for us to see
Brainbench MVP for Linux Admin / mail: TRAP + SPAN don't belong
http://www.brainbench.com / Hire me!
-----------------------------/ http://crow202.dyndns.org/~mhgraham/resume
S McAfee, 03-16-2005

In article <slrnd3h570.gi0.danSPANceswitTRAPhcrows@samantha.crow202.dyndns.org>,
Dances With Crows <daSPANnceswithcroTRAPws@gmail.com> wrote:
>On Wed, 16 Mar 2005 19:50:19 -0000, S McAfee staggered into the Black
>Sun and said:
>> dd if=/dev/zero of=fsfile bs=1024 count=4096
>> mke2fs -F fsfile
>> mount -t ext2 -o loop fsfile /mnt/fs
>> cp -a /somedir/. /mnt/fs/.
>> umount /mnt/fs
>> sync
>> gzip -9 fsfile


>> Given all this, how can the two .gz files be different? Assuming gzip
>> is fully deterministic, the only possible discrepancy I can think of
>> is in some kind of filesystem metadata--a creation timestamp, perhaps?

>
>Nope. Creation time is not stored on ext2. ctime, mtime, and atime,
>however, are.


I meant metadata for the filesystem as a whole, not for its contents.

>> It's difficult to conceive of a large enough divergence in metadata to
>> cause a difference of 12,616 bytes


>ctime, mtime, and atime are all 4 bytes. If there are 1000 files in
>somedir, you've got 12,000 bytes of timestamps.


But "cp -a" preserves the timestamps.
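`cp -a` does restore atime and mtime on the copies, but not ctime: the kernel sets ctime to the current time whenever an inode changes, and there is no system call for setting it explicitly, so every inode created under /mnt/fs carries the time of that particular build. A small sketch that shows this, assuming GNU coreutils (`touch -d @SECONDS`, `stat -c`); `ctime_demo` is a hypothetical name.

```bash
# ctime_demo SRC DST: give SRC an old timestamp, copy it with cp -a,
# and print which timestamps the copy kept.
ctime_demo() {
    src=$1 dst=$2
    touch -d @1000000000 "$src"                        # old atime and mtime
    cp -a "$src" "$dst"
    printf 'src mtime %s\n' "$(stat -c %Y "$src")"
    printf 'dst mtime %s\n' "$(stat -c %Y "$dst")"     # preserved by cp -a
    printf 'dst ctime %s\n' "$(stat -c %Z "$dst")"     # time of the copy
}
```

The two mtimes match, while the copy's ctime is the moment `cp` ran.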

>I tested this out with small loopback ext2 filesystems and mount and cp
>and dumpe2fs and such, and found that a number of things on the
>filesystem varied: UUID, creation time, mount time, write time, last
>fsck, next fsck, directory hash, and free block numbers. The timestamps
>on the files were variable as well. All of those things are probably
>accounting for the differences in the sizes of the gzipped files. HTH,


It sure does, thanks. I'll stop worrying now.


--
Sean McAfee -- etzwane@schwag.org