This is a discussion on Combining large files within the Linux Administration forums, part of the Linux Forums category.
I have several 1M sized files (around 10000 of them) that make up one large
20G tar file, and I would like to combine them all. So I started with a simple script that does a "cat" on each file and combines it with the next one in the series. That process seems to work, but it's EXTREMELY slow.

In DOS, it's possible to copy files as follows, where the bulk of the work is done by the copy command itself:

    copy file1+file2+file3 new_file

There is no need to concatenate individual files. On my board, concatenating a combined 300M file with another 1M file, for example, takes about 5 minutes. Not great performance.

So I am wondering if there is a similar utility out there for Linux that supports the DOS copy feature. I am really trying to avoid writing a C program to accomplish this task :)

TIA
Salman
|
|||
|
> cat file1 file2 file3 > new_file
If they're numbered sequentially, you could get away with:

    cat file* >new_file

This assumes the names are 01 to 20 and NOT 1 to 20, since the default sort is by ASCII sequence (assuming an ASCII based platform and that other defaults are in place). I join mpg and other files this way all the time. This also assumes that the new_file name differs enough from the file? names that it doesn't get included in the cat portion, and that no other extraneous files get grabbed by the wildcard.

Otherwise DOS's:

    copy file1/b+file2/b+file3/b new_file

under Linux is roughly equal to:

    cat file1 file2 file3 >new_file

You do NOT need to step it up like this:

    cat file1 file2 >new_file1
    cat new_file1 file3 >new_file2
    cat new_file2 file4 >new_file1
    cat new_file1 file5 >new_file
    rm new_file1 new_file2

That would be very wasteful and slow, because each step recopies everything joined so far. But I've known a few sadists in my day who enjoyed typing and would do it that way.

One other limitation of sorts is that 32 bit systems are likely to limit the maximum size of files to 2.1G unless the kernel, filesystem and tools all have large file support. So you may only be able to form your 20G file on such a setup, or on an x86-64 or other 64 bit platform.

HTH,
Shadow_7
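For pieces that are numbered 1 to N without zero padding, the single pass approach above can still be used if the names are sorted numerically first. The following is only a sketch: the part.N names and the scratch directory are made up for the demonstration, and it assumes file names without whitespace.

```shell
# Demo in a scratch directory; the part.N names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for i in 1 2 10; do
    printf 'piece %s\n' "$i" > "part.$i"
done

# A plain glob sorts as text, so part.10 would land before part.2.
# Sorting on the numeric suffix restores the intended order, and xargs
# hands the names to cat in that order, so all data is written only once.
ls part.* | sort -t. -k2,2n | xargs cat > combined.out

cat combined.out
```

With many thousands of names xargs may run cat more than once, but every run writes to the same already opened output, so the order is preserved.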
|
|||
|
I had over 20G of data that I wanted to back up before I rebuilt my Linux
machine. Due to disk space constraints, I tarred it all up and then compressed it using bzip2. It shrank down to an 11G file or so. Before storing the data file, I verified its integrity using the bzip2 utility. There are other alternatives for doing a more reliable backup, but this process seemed pretty fast and simple.

Everything was fine until I copied the data file back and tried decompressing it. The bzip2 utility complained about CRC errors. Hence I used bzip2recover to recover the undamaged bzip2 blocks. Usually those blocks (depending on how they are initially set when creating the bz2 compressed file) are 900K chunks. The bzip2recover utility apparently recovered all ~13500 blocks, and stored them in 900K sized bz2-format files.

So in order to get the original 20G tar file, I started unzipping each compressed block and then combining the resulting data files together. That process is quite lengthy and is taking a long time.

"David Utidjian" <utidjian@nospamremarque.org> wrote in message news:pan.2003.06.28.19.37.00.971319.2168@nospamremarque.org...
> Interesting problem.
>
> Why is it useful for you to have 10,000 1M files rolled into one huge
> file?
>
> -DU-...etc...
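The integrity check described in this post was run before the archive was copied, so it could not catch damage introduced by the copy itself. One possible way to cover that gap is to record a checksum next to the archive and verify it after copying back. This is only a sketch: it assumes sha256sum from GNU coreutils is available, and backup.tar.bz2 is a stand-in name, not the poster's actual file.

```shell
# Scratch demo; backup.tar.bz2 is a hypothetical stand-in for the real archive.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'pretend this is the compressed archive\n' > backup.tar.bz2

# Record a checksum next to the archive before it leaves the machine.
sha256sum backup.tar.bz2 > backup.tar.bz2.sha256

# After copying both files back, verify before trusting the copy.
if sha256sum -c backup.tar.bz2.sha256 >/dev/null 2>&1; then
    verdict_clean=ok
else
    verdict_clean=corrupt
fi

# Simulate damage in transit by appending a stray byte, then verify again.
printf 'X' >> backup.tar.bz2
if sha256sum -c backup.tar.bz2.sha256 >/dev/null 2>&1; then
    verdict_damaged=ok
else
    verdict_damaged=corrupt
fi

echo "clean copy: $verdict_clean, damaged copy: $verdict_damaged"
```

A failed check at this point means the copy should be repeated while the original still exists.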
|
|||
|
On Sat, 28 Jun 2003 21:23:17 -0400, Salman Moghal wrote:
> I had over 20G of data that I wanted to back up before I rebuilt my
> Linux machine. Due to disk space constraints, I tarred it all up and
> then compressed it using bzip2. It shrank down to an 11G file or so.
> Before storing the data file, I verified its integrity using the bzip2
> utility. There are other alternatives for doing a more reliable backup,
> but this process seemed pretty fast and simple.
>
> Everything was fine until I copied the data file back and tried
> decompressing it. The bzip2 utility complained about CRC errors. Hence I
> used bzip2recover to recover the undamaged bzip2 blocks. Usually those
> blocks (depending on how they are initially set when creating the bz2
> compressed file) are 900K chunks. The bzip2recover utility apparently
> recovered all ~13500 blocks, and stored them in 900K sized bz2-format
> files.
>
> So in order to get the original 20G tar file, I started unzipping each
> compressed block and then combining the resulting data files together.
> That process is quite lengthy and is taking a long time.

Hmmmm... I think I see your problem... it basically boils down to not having enough of the right kind of storage media and/or a solid, tested plan for using it.

I apologize in advance if some of what follows sounds harsh or uncaring, but when it comes to solid backup plans, as you will learn, there is zero room for error... and by extension... very little room for kindness and understanding.

With that said... I think I do understand the position you are in. I hope for your sake that your livelihood does not depend on the recovery of all these files.

Are/were all these files located in a single subdirectory off of / ? Perhaps /home or /var? If a single subdirectory, was it also a separate partition? If you had kept the data in its own subdir on its own partition, you could have avoided the necessity of moving it off of the current media in the first place.

What version of bzip2 are you using? According to the manpage, bzip2recover in versions 1.0.1 and earlier cannot handle compressed files larger than 512 MBytes. This restriction is removed with version 1.0.2. Not sure what the max is after that.

Also according to the manpage, the way to restore the original file after completing a successful bzip2recover is to do this:

    bzip2 -dc rec*file.bz2 > recovered_data

Does that work? If so, then I guess you can untar the recovered_data file.

In the future... you should consider a more robust backup plan. If your data is valuable to you and/or your employer, then you should consider getting, at the very least, one or more backup disks so that the data can be mirrored. Even better... get a good tape backup system. I have had very good luck with DLT tapes and drives. I have had very bad luck with DAT/DDS tapes and drives. A DLT tape system can handle up to 40/80G of data, and 200+G in the SuperDLT drives.

Having a good (and tested) backup plan means never having to say you are sorry.

-DU-...etc...
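If some of the recovered blocks turn out to be damaged, a variation on the manpage command could test each block first and write all the good ones to a single output in one pass. This is a rough sketch, not something from the bzip2 documentation: the function name join_recovered_blocks is made up, and it relies only on the bzip2 -t (test) and -dc (decompress to stdout) options.

```shell
# join_recovered_blocks OUTFILE BLOCK...
# Hypothetical helper: decompresses each block that passes "bzip2 -t" and
# writes the results, in argument order, to OUTFILE. Damaged blocks are
# reported on stderr and skipped. Returns non-zero if any block was skipped.
join_recovered_blocks() {
    out=$1
    shift
    bad=0
    for blk in "$@"; do
        if bzip2 -t "$blk" 2>/dev/null; then
            bzip2 -dc "$blk"
        else
            echo "skipping damaged block: $blk" >&2
            bad=$((bad + 1))
        fi
    done > "$out"    # one redirect for the whole loop: OUTFILE is opened once
    echo "damaged blocks skipped: $bad" >&2
    [ "$bad" -eq 0 ]
}

# Example call (file names follow the rec*file.bz2 pattern from the manpage):
#   join_recovered_blocks recovered_data rec*file.bz2
```

Because the glob is expanded for a shell function rather than an external command, the kernel's argument length limit does not come into play, which may matter with roughly 13500 block files. A skipped block leaves a hole in the tar stream, so tar will most likely complain at that point and the files around the hole should be treated as suspect.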