Example: I have a book which I wanna archive. Would sending a zip with the pages take less storage than sending the, let’s say, 10 individual pages separately?
Wouldn’t trying it out and seeing how much it saved be about the same amount of work as typing in this question?
The larger the file, the more repeated patterns there are to compress. Plus, some of the dictionary overhead would be duplicated across multiple files.
The single file should be, at worst, no larger than the sum of the smaller files, and potentially much smaller.
I’m not quite sure what you’re asking.
ZIP is an archive format with built-in compression. It takes multiple files, creates an index of the files within, and then compresses each file individually (classic ZIP doesn’t share a compression dictionary across files; “solid” formats like 7z or tar.gz do, which is why those often do better on a set of similar files). The index and per-file headers are “overhead” that exists for each ZIP file.
Sending multiple files uncompressed, or sending multiple ZIP files (one for each file), will almost certainly be less efficient than one combined archive.
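A quick Python sketch of the “one archive vs. many archives” point (the filenames and page text are made up for the example). Each separate ZIP repeats the end-of-archive bookkeeping, so ten one-file archives end up larger than one ten-file archive:

```python
import io
import zipfile

# Hypothetical "book pages" — repetitive text, like real prose.
pages = {f"page{i}.txt": (f"Chapter text for page {i}. " * 200).encode()
         for i in range(10)}

# One archive containing all ten pages.
combined = io.BytesIO()
with zipfile.ZipFile(combined, "w", zipfile.ZIP_DEFLATED) as zf:
    for name, data in pages.items():
        zf.writestr(name, data)

# Ten separate archives, one page each.
separate_total = 0
for name, data in pages.items():
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    separate_total += buf.getbuffer().nbytes

print("combined archive:", combined.getbuffer().nbytes, "bytes")
print("separate archives:", separate_total, "bytes")
```

The saving here comes from the duplicated archive overhead, not from better compression — standard ZIP compresses each member on its own either way.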
Example: Book A. I have all 10 of its pages, each as a jpg.
Let’s say the size of all these 10 pages together is 300MB (not tech savvy, don’t know if this is realistic).
If I put them in a zip, will the size be smaller? Like, reduce to 250MB or something?
For jpgs, no, they will not get smaller. They may even get a smidge bigger if you zip them, though usually not by enough to make a practical difference.
Zip does generic lossless compression, meaning the archive can be extracted back into a bit-perfect copy of the original. Very simplified, it works by finding repeating patterns, replacing each long pattern with a short key, and storing an index used to swap the keys back for the original patterns on extraction.
Jpgs use lossy compression, meaning some detail is lost and can never be recovered. Jpg is highly optimized to drop only the details that matter little to human perception of the image.
Since jpg data is already compressed, there are hardly any repeating patterns (duplicate information) left for the zip algorithm to find.
Zipping images can help a little, but common image formats are already compressed, so there may not be a large saving in zipping them.
An alternative option would be to convert to a more storage-efficient image format, like webp for instance.
I don’t know the details, but in principle, the zip compression process tries to identify the textual commonalities between the pages. The more commonalities the 10 pages have, the smaller the zip file will be.
If each page is textually very different (for example, Page 1 is “AB”, Page 2 is “CD”, etc.), it’s possible that the zip file will be larger, because it has to contain the full contents of each page plus the metadata of the zip file itself.
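The per-file metadata overhead is easy to demonstrate in Python with the tiny “AB” / “CD” pages from the example above (filenames invented). With only a few bytes of actual content, the ZIP headers dominate and the archive comes out bigger than the raw data:

```python
import io
import zipfile

# Tiny pages with nothing in common, as in the example above.
pages = {"page1.txt": b"AB", "page2.txt": b"CD", "page3.txt": b"EF"}
raw_total = sum(len(data) for data in pages.values())  # 6 bytes of content

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name, data in pages.items():
        zf.writestr(name, data)

# Each entry carries a local header plus a central-directory record,
# so the archive is far larger than the 6 bytes of page content.
print("raw content:", raw_total, "bytes")
print("zip archive:", buf.getbuffer().nbytes, "bytes")
```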
Anyone more knowledgeable can correct me on this.
If the files are of the same or similar format, more compression is possible, as the algorithm can detect more related patterns to compress (this helps most with “solid” compressors like 7z, which share a dictionary across files).
But if you toss in a variety of file formats, compression will tend to suffer.
Sometimes the easiest way is just to try it and see; different formats lend themselves to better or worse compression.
The files that tend to be worst at compression are the ones that are already compressed themselves.
This depends on the file format used for the pages. If it’s plain txt, zipping them will greatly decrease the file size. If you scanned the pages and have them as jpg, png or pdf, zipping will not decrease the file size much; it might shrink a little, or even grow a little.
Taking less storage is almost the entire point of a zip file. It only takes more space than the original files in pathological cases (e.g. maybe if you’re trying to compress already-compressed data, like a video file).





