
I have a folder, output_test, containing roughly 600MB of images. I then create 100MB chunks of tar.gz files using the following command:

tar -czf - output_test/ | split --bytes=100MB -d -a 3 - output_test.tar.gz.

which gives me the following files

-rw-rw-r-- 1 martin 96M Nov 13 17:12 output_test.tar.gz.000
-rw-rw-r-- 1 martin 96M Nov 13 17:12 output_test.tar.gz.001
-rw-rw-r-- 1 martin 96M Nov 13 17:12 output_test.tar.gz.002
-rw-rw-r-- 1 martin 96M Nov 13 17:12 output_test.tar.gz.003
-rw-rw-r-- 1 martin 96M Nov 13 17:12 output_test.tar.gz.004
-rw-rw-r-- 1 martin 96M Nov 13 17:12 output_test.tar.gz.005
-rw-rw-r-- 1 martin 26M Nov 13 17:12 output_test.tar.gz.006

That all looks good (although the total size doesn't seem to have been reduced?), but when I try to un-tar one of the files

tar -xzf output_test.tar.gz.000

I get the following error

gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

Does anyone know why this error happens?

The files do seem to be extracted fine, though, so I'm not sure whether I can safely ignore this error?

  • The last file inside your .000 archive is almost certainly split across .000 and .001, so you're not extracting it. You'll probably also get an error if you try to extract from .001, since it doesn't start with a gzip header. You need to cat all of the files together and pipe that to tar (a quick check follows below). The archive is not much smaller because most image formats are already compressed; trying to compress them again might actually increase the file size.
    – doneal24
    Commented Nov 13, 2022 at 17:39
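To make the truncation concrete (this check is an editorial addition, not from the thread), you can ask gzip to test the stream's integrity without extracting anything:

cat output_test.tar.gz.000 | gzip -t
cat output_test.tar.gz.* | gzip -t

The first command should fail with the same "unexpected end of file" message, because the piece stops in the middle of the gzip stream; the second should exit quietly, because concatenating all the pieces restores the complete stream.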

1 Answer


When you split output like this, each piece is just part of the original stream; the pieces aren't complete archives in themselves.

So if you only look at the ".000" file, then you'll only be looking at the first part of the output.

To recreate the "real" file you need to cat them together.

So you'd do something like:

cat output_test.tar.gz.* | tar xzf -
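As an aside (an editorial note, not part of the original answer), the shell expands output_test.tar.gz.* in sorted order, which is why the pieces come back together in the right sequence. If you'd rather keep a single archive around, you can also reassemble first and extract in a second step:

cat output_test.tar.gz.* > output_test.tar.gz
tar -xzf output_test.tar.gz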
  • Thanks, indeed that fixed it for me! Do you know of a way to end up with complete workable splits? Commented Nov 13, 2022 at 18:15
  • @Martin You would have to make lists of each group of files of the approximate size you need, and fake the file numbering yourself (see the sketch after these comments). split works on byte count or line count; there is no way it can be aware of tar file boundaries. As noted in another comment, most image formats (like JPEG) have integral compression, which is probably more effective than generic gzip because it can take advantage of typical image features. Further attempts at compression just waste CPU cycles. Commented Nov 13, 2022 at 22:34
  • What you're asking for isn't supported by "native" Unix tooling. You'd need a tool that evaluates all the files to be archived, potentially optimises them in terms of size, and then creates <n> independent archives. There's nothing "out of the box" that does this. I could see a scripted solution, and I'm sure someone somewhere has solved it, but I'm not sure it's a useful thing. Typically we archive things so we can restore them; cat * | tar... makes it easier to recover file1234.jpg. If it's split over 10 different files then the restore becomes harder! Commented Nov 14, 2022 at 0:06
    This does not solve my problem
    – FABBRj
    Commented Dec 18, 2023 at 15:47
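For completeness, here is a rough sketch of the scripted approach the comments describe, assuming GNU stat and GNU tar; every name in it (group_*.list, output_test_part*.tar.gz, the 100MB budget) is invented for the example, not something the commenters wrote. The idea is to batch the files by size first, then build one complete tar.gz per batch, so each part can be extracted on its own:

budget=$((100 * 1000 * 1000))        # roughly 100MB of input per batch, before compression
n=0 size=0
: > "group_$n.list"
for f in output_test/*; do
    s=$(stat -c %s "$f")             # file size in bytes (GNU stat)
    if [ "$size" -gt 0 ] && [ $((size + s)) -gt "$budget" ]; then
        n=$((n + 1)) size=0          # start a new batch once the budget is reached
        : > "group_$n.list"
    fi
    printf '%s\n' "$f" >> "group_$n.list"
    size=$((size + s))
done
for list in group_*.list; do
    i=${list#group_}; i=${i%.list}
    tar -czf "output_test_part$i.tar.gz" -T "$list"   # one self-contained archive per batch
done

Because the images are already compressed, the parts will come out close to the batch sizes, just as with the original split.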

