ziplimit.txt 13 KB

  1. ziplimit.txt
  2. A1) Hard limits of the Zip archive format (without Zip64 extensions):
  3. Number of entries in Zip archive: 64 Ki (2^16 - 1 entries)
  4. Compressed size of archive entry: 4 GiByte (2^32 - 1 Bytes)
  5. Uncompressed size of entry: 4 GiByte (2^32 - 1 Bytes)
  6. Size of single-volume Zip archive: 4 GiByte (2^32 - 1 Bytes)
  7. Per-volume size of multi-volume archives: 4 GiByte (2^32 - 1 Bytes)
  8. Number of parts for multi-volume archives: 64 Ki (2^16 - 1 parts)
  9. Total size of multi-volume archive: 256 TiByte (4G * 64k)
  10. The number of archive entries and of multivolume parts are limited by
  11. the structure of the "end-of-central-directory" record, where these
  12. numbers are stored in 2-Byte fields.
  13. Some Zip and/or UnZip implementations (for example Info-ZIP's) allow
  14. handling of archives with more than 64k entries. (The information
  15. from "number of entries" field in the "end-of-central-directory" record
  16. is not really necessary to retrieve the contents of a Zip archive;
  17. it should rather be used for consistency checks.)
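
As a rough illustration of the 2-Byte fields mentioned above, the following
Python sketch packs and unpacks a minimal end-of-central-directory record.
The field layout follows the published Zip format (22 fixed Bytes,
little-endian). The names EOCD_FORMAT, pack_eocd and unpack_eocd are made up
for this example and do not come from the Info-ZIP sources.

```python
import struct

# Fixed part of the end-of-central-directory record (little-endian):
# signature, this disk, disk with central dir, entries on this disk,
# total entries, central dir size, central dir offset, comment length.
EOCD_FORMAT = "<IHHHHIIH"
EOCD_SIGNATURE = 0x06054B50


def pack_eocd(total_entries, cd_size, cd_offset, comment=b""):
    """Build a single-volume EOCD record; raises struct.error on overflow."""
    fixed = struct.pack(EOCD_FORMAT, EOCD_SIGNATURE, 0, 0,
                        total_entries, total_entries,
                        cd_size, cd_offset, len(comment))
    return fixed + comment


def unpack_eocd(record):
    """Return (total_entries, cd_size, cd_offset) from an EOCD record."""
    fields = struct.unpack_from(EOCD_FORMAT, record)
    if fields[0] != EOCD_SIGNATURE:
        raise ValueError("not an end-of-central-directory record")
    return fields[4], fields[5], fields[6]
```

Because the entry counts are packed with the 2-Byte "H" code, a count of
65536 or more cannot be stored and struct.pack() rejects it.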
  18. Length of an archive entry name: 64 KiByte (2^16 - 1)
  19. Length of archive member comment: 64 KiByte (2^16 - 1)
  20. Total length of "extra field": 64 KiByte (2^16 - 1)
  21. Length of a single e.f. block: 64 KiByte (2^16 - 1)
  22. Length of archive comment: 64 KiByte (2^16 - 1)
  23. Additional limitation claimed by PKWARE:
  24. Size of local-header structure (fixed fields of 30 Bytes + filename +
  25. local extra field): < 64 KiByte
  26. Size of central-directory structure (46 Bytes + filename +
  27. central extra field + member comment): < 64 KiByte
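
The two PKWARE structure limits can be checked with simple arithmetic. The
sketch below is only an illustration of the sums given above; the function
and constant names are hypothetical.

```python
LOCAL_HEADER_FIXED = 30      # fixed fields of a local header, in Bytes
CENTRAL_HEADER_FIXED = 46    # fixed fields of a central-directory entry
STRUCTURE_LIMIT = 64 * 1024  # each structure must stay below 64 KiByte


def local_header_fits(name_len, extra_len):
    """True if fixed fields + filename + local extra field is < 64 KiByte."""
    return LOCAL_HEADER_FIXED + name_len + extra_len < STRUCTURE_LIMIT


def central_entry_fits(name_len, extra_len, comment_len):
    """True if fixed fields + filename + extra + comment is < 64 KiByte."""
    total = CENTRAL_HEADER_FIXED + name_len + extra_len + comment_len
    return total < STRUCTURE_LIMIT
```

Under this rule a filename of the full 2^16 - 1 Bytes would not fit, because
the fixed fields already use part of the 64 KiByte budget.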
  28. A2) Hard limits of the Zip archive format with Zip64 extensions:
  29. In 2001, PKWARE published version 4.5 of the Zip format specification
  30. (together with the release of PKZIP for Windows 4.5). This specification
  31. defines new extra field blocks that allow breaking the size limits of the
  32. standard zipfile structures. This extended "Zip64" format enlarges the
  33. theoretical limits to the following values:
  34. Number of entries in Zip archive: 16 Ei (2^64 - 1 entries)
  35. Compressed size of archive entry: 16 EiByte (2^64 - 1 Bytes)
  36. Uncompressed size of entry: 16 EiByte (2^64 - 1 Bytes)
  37. Size of single-volume Zip archive: 16 EiByte (2^64 - 1 Bytes)
  38. Per-volume size of multi-volume archives: 16 EiByte (2^64 - 1 Bytes)
  39. Number of parts for multi-volume archives: 4 Gi (2^32 - 1 parts)
  40. Total size of multi-volume archive: 2^96 Byte (16 Ei * 4Gi)
  41. The Info-ZIP software releases (beginning with Zip 3.0 and UnZip 6.0)
  42. support Zip64 archives on selected environments (where the underlying
  43. operating system capabilities are sufficient, e.g. Unix, VMS and Win32).
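
The hard limits listed in A1) and A2) follow from the widths of the fields
that store the values. The following sketch reproduces that arithmetic; the
helper names are invented for this example, and the multi-volume total uses
the same rounded product (size range times part range) as the tables above.

```python
def max_unsigned(field_bytes):
    """Largest value an unsigned field of the given width can hold."""
    return 2 ** (8 * field_bytes) - 1


def multi_volume_total(size_field_bytes, parts_field_bytes):
    """Rounded total size: 2^(size bits) * 2^(part-count bits)."""
    return 2 ** (8 * size_field_bytes) * 2 ** (8 * parts_field_bytes)


TIB = 2 ** 40

# Standard format: 2-Byte counts, 4-Byte sizes.
standard_entries = max_unsigned(2)          # 2^16 - 1
standard_size = max_unsigned(4)             # 2^32 - 1
standard_total = multi_volume_total(4, 2)   # 256 TiByte

# Zip64: 8-Byte counts and sizes, 4-Byte part numbers.
zip64_size = max_unsigned(8)                # 2^64 - 1
zip64_total = multi_volume_total(8, 4)      # 2^96 Byte
```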
  44. B) Implementation limits of UnZip:
  45. 1. Size limits caused by file I/O and decompression handling:
  46. a) Without "Zip64" and "LargeFile" extensions:
  47. Size of Zip archive: 2 GiByte (2^31 - 1 Bytes)
  48. Compressed size of archive entry: 2 GiByte (2^31 - 1 Bytes)
  49. b) With "Zip64" enabled and "LargeFile" supported:
  50. Size of Zip archive: 8 EiByte (2^63 - 1 Bytes)
  51. Compressed size of archive entry: 8 EiByte (2^63 - 1 Bytes)
  52. Uncompressed size of entry: 8 EiByte (2^63 - 1 Bytes)
  53. Note: On some systems, even UnZip without "LargeFile" extensions enabled
  54. may support archive sizes up to 4 GiByte. To get this support, the
  55. target environment has to meet the following requirements:
  56. a) The compiler's intrinsic "long" data types must be able to hold
  57. integer values of at least 2^32. In other words, the standard intrinsic
  58. integer types "long" and "unsigned long" have to be wider than
  59. 32 bit.
  60. b) The system has to supply a C runtime library that is compatible
  61. with the more-than-32-bit-wide "long int" type of condition a)
  62. c) The standard file positioning functions fseek(), ftell() (and/or
  63. the Unix style lseek() and tell() functions) have to be capable
  64. of moving to absolute file offsets of up to 4 GiByte from the file
  65. start.
  66. On 32-bit CPU hardware, you generally cannot expect that a C compiler
  67. provides a "long int" type that is wider than 32-bit. So, many of the
  68. most popular systems (i386, PowerPC, 680x0, et al.) are out of luck.
  69. You may find environments that meet all requirements on systems
  70. with 64-bit CPU hardware. Examples might be Cray number crunchers,
  71. Compaq (former DEC) Alpha AXP machines, or Intel/AMD x64 computers.
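
Requirement a) can be probed on a given machine. The sketch below uses
Python's ctypes module to ask for the width of the platform's C "long"; it
says nothing about requirements b) and c), and the function name is made up
for this example.

```python
import ctypes


def long_is_wider_than_32_bits():
    """True if the platform's C 'long' has more than 32 bits."""
    return ctypes.sizeof(ctypes.c_long) * 8 > 32
```

The result depends on the C data model and not only on the CPU: the usual
64-bit Unix model (LP64) has a 64-bit "long", whereas 64-bit Windows (LLP64)
keeps "long" at 32 bits.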
  72. The number of Zip archive entries is unlimited. The "number-of-entries"
  73. field of the "end-of-central-dir" record is checked against the "number
  74. of entries found in the central directory" modulo 64k (2^16) (without
  75. Zip64 extension) or modulo 2^64 (with Zip64 extensions enabled for
  76. Zip64 archives).
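
The consistency check described above can be sketched as follows. This is a
simplified model of the comparison, not the UnZip code; the function name and
parameters are hypothetical.

```python
def entry_count_consistent(recorded, found, zip64=False):
    """Compare the recorded count with the entries actually found.

    recorded: value read from the "number-of-entries" field
    found:    number of entries counted in the central directory
    """
    modulus = 2 ** 64 if zip64 else 2 ** 16
    return found % modulus == recorded
```

For example, an archive with 70000 entries and no Zip64 extension would
carry 70000 - 65536 = 4464 in its 2-Byte field, and the check still passes.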
  77. Multi-volume archive extraction is not (yet) supported.
  78. Memory requirements are mostly independent of the archive size
  79. and archive contents.
  80. In general, UnZip needs a fixed amount of internal buffer space
  81. plus the size to hold the complete information of the currently
  82. processed entry's local header. Here, a large extra field
  83. (could be up to 64 KiByte) may exceed the available memory
  84. for MSDOS 16-bit executables (when they were compiled in small
  85. or medium memory model, with a fixed 64 KiByte limit on data space).
  86. The other exception where memory requirements scale with "larger"
  87. archives is the "restore directory attributes" feature. Here, the
  88. directory attributes info for each restored directory has to be held
  89. in memory until the whole archive has been processed. So, the amount
  90. of memory needed to keep this info scales with the number of restored
  91. directories and may cause memory problems when a lot of directories
  92. are restored in a single run.
  93. C) Implementation limits of the Zip executables:
  94. 1. Size limits caused by file I/O and compression handling:
  95. a) Without "Zip64" and "LargeFile" extensions:
  96. Size of Zip archive: 2 GiByte (2^31 - 1 Bytes)
  97. Compressed size of archive entry: 2 GiByte (2^31 - 1 Bytes)
  98. Uncompressed size of entry: 2 GiByte (2^31 - 1 Bytes),
  99. (could/should be 4 GiBytes...)
  100. b) With "Zip64" enabled and "LargeFile" supported:
  101. Size of Zip archive: 8 EiByte (2^63 - 1 Bytes)
  102. Compressed size of archive entry: 8 EiByte (2^63 - 1 Bytes)
  103. Uncompressed size of entry: 8 EiByte (2^63 - 1 Bytes)
  104. Multi-volume archive creation is now supported in the form of split
  105. archives. Currently up to 99,999 splits are supported.
  106. 2. Limits caused by handling of archive contents lists
  107. 2.1. Number of archive entries (freshen, update, delete)
  108. a) 16-bit executable: 64k (2^16 -1) or 32k (2^15 - 1),
  109. (unsigned vs. signed type of size_t)
  110. a1) 16-bit executable: <16k ((2^16)/4)
  111. (The smaller limit a1) results from the array size limit of
  112. the "qsort()" function.)
  113. 32-bit executable: <1G ((2^32)/4)
  114. (usual system limit of the "qsort()" function on 32-bit systems)
  115. 64-bit executable: <2Ei ((2^64)/8)
  116. (theoretical limit of 64-bit flat memory model, the actual limit of
  117. currently available OS implementations is several orders of magnitude
  118. lower)
  119. b) stack space needed by qsort to sort list of archive entries
  120. NOTE: In the current executables, overflows of limits a) and b) are NOT
  121. checked!
  122. c) amount of free memory to hold "central directory information" of
  123. all archive entries; one entry needs:
  124. 128 bytes (Zip64), 96 bytes (32-bit) or 80 bytes (16-bit)
  125. + 3 * length of entry name
  126. + length of zip entry comment (when present)
  127. + length of extra field(s) (when present, e.g.: UT needs 9 bytes)
  128. + some bytes for book-keeping of memory allocation
  129. Conclusion:
  130. For systems with limited memory space (MSDOS, small AMIGAs, other
  131. environments without virtual memory), the number of archive entries
  132. is most often limited by condition c).
  133. For example, with approx. 100 kBytes of free memory after loading and
  134. initializing the program, a 16-bit DOS Zip cannot process more than 600
  135. to 1000 (+) archive entries. (For the 16-bit Windows DLL or the 16-bit
  136. OS/2 port, limit c) is less important because Windows or OS/2 executables
  137. are not restricted to the 1024k area of real mode memory. These 16-bit
  138. ports are limited by conditions a1) and b), say: at maximum approx.
  139. 16000 entries!)
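
The per-entry memory formula of condition c) can be turned into a rough
estimate. In the sketch below, the base sizes are the ones listed above; the
8 Bytes of allocation book-keeping and all names are assumptions made for
this example only.

```python
# Base size per entry, from condition c) above.
BASE_BYTES = {"zip64": 128, "32-bit": 96, "16-bit": 80}


def entry_memory(build, name_len, comment_len=0, extra_len=0, bookkeeping=8):
    """Estimated Bytes needed for one entry's central directory info.

    bookkeeping is an assumed allocator overhead, not a documented value.
    """
    return (BASE_BYTES[build] + 3 * name_len + comment_len
            + extra_len + bookkeeping)


def max_entries(free_bytes, build, name_len, **kwargs):
    """How many such entries fit into free_bytes of memory."""
    return free_bytes // entry_memory(build, name_len, **kwargs)
```

With 100 KiByte free, a 16-bit build, 20-character names and a 9-Byte UT
extra field, each entry costs 80 + 60 + 9 + 8 = 157 Bytes under these
assumptions, which lands in the 600 to 1000 range quoted above.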
  140. 2.2. Number of "new" entries (add operation)
  141. In addition to the restrictions above (2.1.), the following limits
  142. caused by the handling of the "new files" list apply:
  143. a) 16-bit executable: <16k ((2^16)/4)
  144. b) stack size required for "qsort" operation on "new entries" list.
  145. NOTE: In the current executables, the overflow checks for these limits
  146. are missing!
  147. c) amount of free memory to hold the directory info list for new entries;
  148. one entry needs:
  149. 32 bytes (Zip64), 24 bytes (32-bit) or 22 bytes (16-bit)
  150. + 3 * length of filename
  151. NOTE: For larger systems, the practical limit is more likely to be
  152. performance (how long you are willing to wait) than available
  153. memory or other resources.
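
The array-size limits quoted for the "qsort()" function in section C) are the
addressable range divided by the element size used in this document (4 Bytes
for 16-bit and 32-bit executables, 8 Bytes for 64-bit ones). A minimal sketch
of that division, with an invented function name:

```python
def max_sortable_entries(address_bits, element_bytes):
    """Upper bound on array elements that fit into the address range."""
    return 2 ** address_bits // element_bytes


KI = 2 ** 10
GI = 2 ** 30
EI = 2 ** 60

limit_16 = max_sortable_entries(16, 4)  # 16 Ki
limit_32 = max_sortable_entries(32, 4)  # 1 Gi
limit_64 = max_sortable_entries(64, 8)  # 2 Ei
```

These values are upper bounds only; the real limits are lower, since the
array shares the address space with code, stack and other data.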
  154. D) Some technical remarks:
  155. 1. For executables without support for "Zip64" archives and "LargeFile"
  156. I/O extensions, the 2GiByte size limit on archive files is a consequence
  157. of the portable C implementation used for the Info-ZIP programs.
  158. Zip archive processing requires random access to the archive file for
  159. jumping between different parts of the archive's structure.
  160. In standard C, this is done via stdio functions fseek()/ftell() or the
  161. Unix I/O functions lseek()/tell(). In many (most?) C implementations,
  162. these functions use "signed long" variables to hold offset pointers
  163. into sequential files. In most cases, this is a signed 32-bit number,
  164. which is limited to ca. 2E+09. There may be specific C runtime library
  165. implementations that interpret the offset numbers as unsigned, but for
  166. us, this is not reliable in the context of portable programming.
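
The 2 GiByte figure is simply the largest positive value of a signed 32-bit
offset. The following sketch shows the arithmetic; it models the offset type
by its bit width only, and the function names are hypothetical.

```python
def max_signed_offset(bits):
    """Largest file offset a signed integer of the given width can hold."""
    return 2 ** (bits - 1) - 1


def offset_is_seekable(offset, bits=32):
    """True if offset fits into a non-negative signed value of that width."""
    return 0 <= offset <= max_signed_offset(bits)
```

For bits=32 the maximum is 2147483647, i.e. ca. 2E+09; an offset of 2^31 or
more needs either an unsigned interpretation or a wider type.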
  167. 2. Similarly, for executables without "Zip64" and "LargeFile" support,
  168. the 2GiByte limit on the size of a single compressed archive member
  169. is again a consequence of the implementation in C.
  170. The variables used internally to count the size of the compressed
  171. data stream are of type "long", which is guaranteed to be at least
  172. 32-bit wide on all supported environments.
  173. But, why do we use "signed" long and not "unsigned long"?
  174. Throughout the I/O handling of the compressed data stream, the sign bit
  175. of the "long" numbers is (mis-)used as a kind of overflow detection.
  176. In the end, this is caused by the fact that standard C lacks any
  177. overflow checking on integer arithmetic and does not support access
  178. to the underlying hardware's overflow detection (the status bits,
  179. especially "carry" and "overflow" of the CPU's flags-register) in a
  180. system-independent manner.
  181. So, we "misuse" the most-significant bit of the compressed data size
  182. counters as carry bit for efficient overflow/underflow detection. We
  183. could change the code to a different method of overflow detection, by
  184. using a bunch of "sanity" comparisons (kind of "is the calculated result
  185. plausible when compared with the operands"). But, this would "blow up"
  186. the code of the "inner loop", with noticeable loss of processing speed.
  187. Or, we could reduce the amount of consistency checks of the compressed
  188. data (e.g. detection of premature end of stream) to an absolute minimum,
  189. at the cost of the programs' stability when processing corrupted data.
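
The idea of using the sign bit as a carry indicator can be modelled in a few
lines. The sketch below simulates a 32-bit two's-complement counter in
Python (whose own integers never overflow); it only illustrates the principle
and is not a transcription of the Info-ZIP code. All names are invented.

```python
def wrap_int32(value):
    """Reduce an integer to the range of a signed 32-bit number."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def add_to_counter(counter, increment):
    """Add to a size counter; a negative result signals overflow."""
    result = wrap_int32(counter + increment)
    return result, result < 0


def consume_from_counter(remaining, count):
    """Subtract consumed Bytes; a negative result signals underflow."""
    result = wrap_int32(remaining - count)
    return result, result < 0
```

As long as the counter starts non-negative and each step is smaller than
2^31, a single sign test after the operation catches both running past
2^31 - 1 and reading more Bytes than remain, which is why the usable range
is halved compared with an unsigned counter.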
  190. 3. The reasoning above is somewhat outdated. Beginning with the
  191. releases of Zip 3 and UnZip 6, Info-ZIP programs support archive
  192. sizes larger than 4GiB on systems where the required underlying
  193. support for 64-bit file offsets and file sizes is available from
  194. the OS (and the C runtime environment).
  195. For executables with support for "Zip64" archive format and "LargeFile"
  196. extension, the I/O limits are lifted by applying extended 64-bit off_t
  197. file offsets. All limits discussed above are then based on integer
  198. sizes of 64 bits instead of 32; this should allow handling file and
  199. archive sizes up to the limits of manufacturable hardware for the
  200. foreseeable future. The reduction of the theoretical limits from
  201. (2^64 - 1) to (2^63 - 1) because of the pervasive use of signed
  202. numbers is negligible with the currently imaginable hardware.
  203. However, this new support partially breaks compatibility with older
  204. "legacy" systems. And it should be noted that the portability and
  205. readability of the UnZip and Zip code have suffered somewhat from
  206. the extensive use of non-standard language extensions needed for
  207. 64-bit support on the major target systems.
  208. Please report any problems to: Zip-Bugs at www.info-zip.org
  209. Last updated: 25 May 2008, Ed Gordon
  210. 02 January 2009, Christian Spieler