zebediah49

Interesting. The 4GB limit is a bit of an issue for practical use though -- the primary case I'd want this is when you're looking to pull something out of a TB-class tar file. In a situation like that, the random access nature of the application could be a major benefit. (Yes, I know that I should be using a random access data format for that size of archive. I do, but sometimes you come across a file where someone else didn't)


TheWheez

Quite a justified request for a program meant specifically for random access of archived files


xkcd__386

> The 4GB limit is a bit of an issue for practical use though

Oh, I had missed that. I wonder if that's a limit on the archive size, or on each individual file within an (unlimited-size) archive. If the archive is less than 4 GB I may not really care about random access, and then tar+zstd beats the hell out of most things I have seen.


wviana

What is a random access data format? Are they common?


zebediah49

Basically, the format needs an index that lists all the things in the archive, and maintains the offsets to them. It's relatively uncommon for files to work like this -- rather than just "write the data", you need to go through it multiple times, so that you know what goes where upfront.

tar, for example, is just a header of "file foo/bar.txt is a 78 byte text file", followed by its content. Then it gets to the next file's info, and so on. Then you can compress the whole thing or whatever. However, if you want to read the contents of "foo/test.dat", you more or less need to scan the file from beginning to end, find the file you want, and then output it.

An example that maintains that random access is squashfs: it works much more like a conventional filesystem, with a tree of basically-inodes, compressed data, and pointers to the location in the compressed archive with the file data. So now when you want to read "foo/test.dat", you look up foo, which directs you to test.dat, which directs you to the location in the archive with that data.
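
A minimal sketch of the difference, using Python's tarfile module (the archive name is made up, and this trick only helps with an *uncompressed* tar -- once the whole stream is zstd/gzip-compressed, the byte offsets are useless, which is the whole problem):

```python
import tarfile

# Sequential lookup: tarfile walks header-by-header from the start of the
# archive, so finding one member near the end reads every header before it.
with tarfile.open("huge.tar") as tf:
    member = tf.getmember("foo/test.dat")   # linear scan under the hood
    data = tf.extractfile(member).read()

# Workaround: scan once, remembering where each file's payload starts...
with tarfile.open("huge.tar") as tf:
    index = {m.name: (m.offset_data, m.size) for m in tf if m.isfile()}

# ...then seek straight to any payload without re-reading the headers.
with open("huge.tar", "rb") as f:
    offset, size = index["foo/test.dat"]
    f.seek(offset)
    data = f.read(size)
```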


xkcd__386

Impressive; your numbers are so small that your hyperfine report shows wild variation (17 +/- 13), but the output size matters too, IMO; would be curious what you have.

I've switched to squashfs for archival and transport because, apart from being faster than unzip for random access, *I can mount the whole thing* when I need it. I do this with archives from old projects, where I need the archive to be read-only but searchable -- I make the sqfs file system after I've added a full "recoll" index (and including the index), so I can just mount it and search. It happens rarely, but boy, when it does, this is a huge convenience!
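
Roughly that workflow, as a sketch (the project directory, the in-tree .recoll config location, and the mount point are all assumptions on my part -- recollindex also needs a recoll.conf in that config dir with topdirs pointing at the tree):

```python
import subprocess

project = "oldproject"   # hypothetical project directory

# Build the recoll index with its config/db kept inside the tree,
# so the index gets packed into the archive along with the data.
subprocess.run(["recollindex", "-c", f"{project}/.recoll"], check=True)

# Pack everything, index included, into a compressed read-only image.
subprocess.run(["mksquashfs", project, f"{project}.sqfs", "-comp", "zstd"],
               check=True)

# Later: mount it read-only and search in place (loop mounts need root).
subprocess.run(["sudo", "mount", "-t", "squashfs", "-o", "loop,ro",
                f"{project}.sqfs", "/mnt/archive"], check=True)
```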