  1. VP3 Bitstream Format and Decoding Process
  2. by Mike Melanson (mike at multimedia.cx)
  3. v0.5: December 8, 2004
  4. [December 8, 2004: Note that this document is not complete and likely
  5. will never be completed. However, it helped form the basis of the Theora I
  6. specification available at
  7. http://www.theora.org/doc/Theora_I_spec.pdf ]
  8. Contents
  9. --------
  10. * Introduction
  11. * Underlying Coding Concepts
  12. * VP3 Coding Overview
  13. * VP3 Chunk Format
  14. * Decoding The Frame Header
  15. * Initializing The Quantization Matrices
  16. * Hilbert Coding Pattern
  17. * Unpacking The Block Coding Information
  18. * Unpacking The Macroblock Coding Mode Information
  19. * Unpacking The Macroblock Motion Vectors
  20. * Unpacking The DCT Coefficients
  21. * Reversing The DC Prediction
  22. * Reconstructing The Frame
  23. * Theora Specification
  24. * Appendix A: Quantization Matrices And Scale Factors
  25. * Appendix B: Macroblock Coding Mode Alphabets
  26. * Appendix C: DCT Coefficient VLC Tables
  27. * Appendix D: The VP3 IDCT
  28. * Acknowledgements
  29. * References
  30. * Changelog
  31. Introduction
  32. ------------
  33. A company named On2 (http://www.on2.com) created a video codec named
  34. VP3. Eventually, they decided to open source it. Like any body of code
  35. that was produced on a deadline, the source code was not particularly
  36. clean or well-documented. This makes it difficult to understand the
  37. fundamental operation of the codec.
  38. This document describes the VP3 bitstream format and decoding process at
  39. a higher level than source code.
  40. Underlying Coding Concepts
  41. --------------------------
  42. In order to understand the VP3 coding method it is necessary to
  43. understand the individual steps in the process. Like many multimedia
  44. compression algorithms VP3 does not consist of a single coding method.
  45. Rather, it uses a chain of methods to achieve compression.
  46. If you are acquainted with the MPEG video clique then many of VP3's
  47. coding concepts should look familiar as well. What follows is a list of
  48. the coding methods used in VP3 and a brief description of each.
  49. * Discrete Cosine Transform (DCT): This is a magical mathematical
  50. function that takes a group of numbers and turns it into another group
  51. of numbers. The transformed group of numbers exhibits some curious
  52. properties. Notably, larger numbers are concentrated in certain areas of
  53. the transformed group.
  54. A video codec like VP3 often operates on 8x8 blocks of numbers. When
  55. these 8x8 blocks are transformed using a DCT the larger numbers occur
  56. mostly in the up and left areas of the block with the largest number
  57. occurring as the first in the block (up-left corner). This number is
  58. called the DC coefficient. The other 63 numbers are called the AC
  59. coefficients.
  60. The DCT and its opposite operation, the inverse DCT, require a lot of
  61. multiplications. Much research and experimentation is focused on
  62. optimizing this phase of the coding/decoding process.
  63. * Quantization: This coding step tosses out information by essentially
  64. dividing a number to be coded by a factor and throwing away the
  65. remainder. The inverse process (dequantization) involves multiplying by
  66. the same factor to obtain a number that is close enough to the original.
  67. * Run Length Encoding (RLE): The concept behind RLE is to shorten runs
  68. of numbers that are the same. For example, the string "88888" is encoded
  69. as (5, 8), indicating a run of 5 '8' numbers. In VP3 (and MPEG/JPEG),
  70. RLE is used to record the number of zero-value coefficients that occur
  71. before a non-zero coefficient. For example:
  72. 0 0 0 0 5 0 2 0 0 0 9
  73. is encoded as:
  74. (4, 5), (1, 2), (3, 9)
  75. This indicates that a run of 4 zeroes is followed by a coefficient of 5;
  76. then a run of 1 zero is followed by 2; then a run of 3 zeroes is
  77. followed by 9.
  78. * Zigzag Ordering: After transforming and quantizing a block of samples,
  79. the samples are not in an optimal order for run length encoding. Zigzag
  80. ordering rearranges the samples to put more zeros between non-zero
  81. samples.
  82. * Differential (or Delta) Pulse Code Modulation (DPCM): 1 + 1 = 2. Got
  83. that? Seriously, that is what DPCM means. Rather than encoding absolute
  84. values, encode the differences between successive values. For example:
  85. 82 84 81 80 86 88 85
  86. Can be delta-encoded as:
  87. 82 +2 -3 -1 +6 +2 -3
  88. Most of the numbers turn into smaller numbers which require less
  89. information to encode.
  90. * Motion Compensation: Simply, this coding method specifies that a block
  91. from a certain position in the previous frame is to be copied into a new
  92. position in the current frame. This technique is often combined with DCT
  93. and DPCM coding, as well as fractional pixel motion.
  94. * Entropy Coding (a.k.a. Huffman Coding): This is the process of coding
  95. frequently occurring symbols with fewer bits than symbols that are not
  96. likely to occur as frequently.
  97. * Variable Length Run Length Booleans: An initial Boolean bit is
  98. extracted from the bitstream. A variable length code (VLC) is extracted
  99. from the bitstream and converted to a count. This count indicates that
  100. the next (count) elements are to be set to the Boolean value.
  101. Afterwards, the Boolean value is toggled, the next VLC is extracted and
  102. converted to a count, and the process continues until all elements are
  103. set to either 0 or 1.
  104. * YUV Colorspace: Like many modern video codecs, VP3 operates on a YUV
  105. colorspace rather than an RGB colorspace. Specifically, VP3 uses YUV
  106. 4:2:0, alias YUV420P, YV12. Note: Throughout the course of this
  107. document, the U and V planes (a.k.a., Cb and Cr planes) will be
  108. collectively referred to as C planes (color or chrominance planes).
  109. * Frame Types: VP3 has intra-coded frames, a.k.a. intraframes, I-frames,
  110. or keyframes. VP3 happens to call these golden frames. VP3 has
  111. interframes, a.k.a. predicted frames or P-frames. These frames can use
  112. information from either the previous interframe or from the previous
  113. golden frame.
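Two of the concepts in the list above, run length encoding of zero runs and DPCM, are easy to check by hand. The following is a minimal Python sketch that reproduces the two worked examples; the function names are made up for this illustration and do not come from the VP3 source.

```python
def rle_zero_runs(coeffs):
    """Encode a sequence as (zero_run, value) pairs, one pair per non-zero value."""
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs  # trailing zeros are simply dropped in this sketch


def dpcm_encode(values):
    """Keep the first value, then store each difference from its predecessor."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def dpcm_decode(deltas):
    """Invert dpcm_encode with a running sum."""
    out = []
    for d in deltas:
        out.append(d if not out else out[-1] + d)
    return out


print(rle_zero_runs([0, 0, 0, 0, 5, 0, 2, 0, 0, 0, 9]))
print(dpcm_encode([82, 84, 81, 80, 86, 88, 85]))
```

Decoding the delta list with dpcm_decode() returns the original sequence, which is the property a decoder relies on.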
  114. VP3 Coding Overview
  115. -------------------
  116. The first thing to understand about the VP3 coding method is that it
  117. encodes all 3 planes upside down. That is, the data is encoded from
  118. bottom-to-top rather than top-to-bottom as is done with many video
  119. codecs.
  120. VP3 codes a video frame by first breaking each of the 3 planes (Y, U,
  121. and V) into a series of 8x8 blocks called fragments. VP3 also has a
  122. notion of superblocks. Superblocks encapsulate 16 fragments arranged in
  123. a 4x4 matrix. Each plane has its own set of superblocks. Further, VP3
  124. also uses the notion of macroblocks which is the same as that found in
  125. JPEG/MPEG. One macroblock encompasses 4 blocks from the Y plane arranged
  126. in a 2x2 matrix, 1 block from the U plane, and 1 block from the V plane.
  127. While a fragment or a superblock applies to 1 and only 1 plane, a
  128. macroblock extends over all 3 planes.
  129. VP3 compresses golden frames by transforming each fragment with a
  130. discrete cosine transform. Each transformed sample is then quantized and
  131. the DC coefficient is reduced via DPCM using a combination of DC
  132. coefficients from surrounding fragments as predictors. Then, each
  133. fragment's DC coefficient is entropy-coded in the output bitstream,
  134. followed by each fragment's first AC coefficient, then each second AC
  135. coefficient, and so on.
  136. An interframe, naturally, is more complicated. While there is only one
  137. coding mode available for a golden frame (intra coding), there are 8
  138. coding modes that the VP3 coder can choose from for interframe
  139. macroblocks. Intra coding as seen in the keyframe is still available.
  140. The rest of the modes involve encoding a fragment diff, either from the
  141. previous frame or the golden frame, from the same coordinate or from the
  142. same coordinate plus a motion vector. All of the macroblock coding modes
  143. and motion vectors are encoded in an interframe bitstream.
  144. VP3 Chunk Format
  145. ----------------
  146. The high-level format of a compressed VP3 frame is laid out as:
  147. * chunk header
  148. * block coding information
  149. * macroblock coding mode information
  150. * motion vectors
  151. * DC coefficients
  152. * 1st AC coefficients
  153. * 2nd AC coefficients
  154. * ...
  155. * 63rd AC coefficients
  156. Decoding The Frame Header
  157. -------------------------
  158. The chunk header always contains at least 1 byte which has the following
  159. format:
  160.     bit 7:    0 = golden frame, 1 = interframe
  161.     bit 6:    unused
  162.     bits 5-0: Quality index (0..63)
  163. Further, if the frame is a golden frame, there are 2 more bytes in the
  164. header:
  165.     byte 0: version byte 0
  166.     byte 1:
  167.       bits 7-3: VP3 version number (stored)
  168.       bit 2:    key frame coding method (0 = DCT key frame, only type
  169.                 supported)
  170.       bits 1-0: unused, spare bits
  171. All frame headers are encoded with a quality index. This 6-bit value is
  172. used to index into 2 dequantizer scaling tables, 1 for DC values and 1
  173. for AC values. Each of the 3 dequantization tables is modified per these
  174. scaling values.
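As a rough illustration of the header layout described above, here is a small Python sketch. It assumes that the two extra golden frame bytes immediately follow the first header byte; the function name and the dictionary keys are invented for this example.

```python
def parse_frame_header(data):
    """Split the leading header byte(s) of a VP3 chunk into named fields."""
    first = data[0]
    header = {
        "golden": (first & 0x80) == 0,   # bit 7: 0 = golden frame, 1 = interframe
        "quality_index": first & 0x3F,   # bits 5-0 (bit 6 is unused)
    }
    if header["golden"]:
        # Assumption: the two extra bytes directly follow the first byte.
        header["version_byte_0"] = data[1]
        header["version"] = data[2] >> 3               # bits 7-3
        header["keyframe_method"] = (data[2] >> 2) & 1  # bit 2
    return header


print(parse_frame_header(bytes([0xB6])))
print(parse_frame_header(bytes([0x36, 0x00, 0x08])))
```

The first call describes an interframe with quality index 54; the second describes a golden frame with the same quality index and a stored version number of 1.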
  175. Initializing The Quantization Matrices
  176. --------------------------------------
  177. VP3 has three static matrices for quantizing and dequantizing fragments.
  178. One matrix is for quantizing golden frame Y fragments, one matrix is for
  179. quantizing golden frame C fragments, and one matrix is for quantizing both
  180. golden frame and interframe Y or C fragments. While these matrices are
  181. static, they are adjusted according to the quality index coded in the header.
  182. The quality index is an index into 2 64-element tables:
  183. dc_scale_factor[] and ac_scale_factor[]. Each quantization factor from
  184. each of the three quantization matrices is adjusted by the appropriate
  185. scale factor according to this formula:
  186.                 base quantizer * scale factor
  187.     quantizer = -----------------------------
  188.                              100
  189. where scale factor =
  190. dc_scale_factor[quality_index] for DC dequantizer
  191. ac_scale_factor[quality_index] for AC dequantizer
  192. The quantization matrices need to be recalculated at the beginning of a
  193. frame decode if the current frame's quality index is different from the
  194. previous frame's quality index.
  195. See Appendix A for the complete VP3 quantization matrices and scale factor
  196. tables.
  197. As an example, this is the base quantization matrix for golden frame Y
  198. fragments:
  199. 16 11 10 16 24 40 51 61
  200. 12 12 14 19 26 58 60 55
  201. 14 13 16 24 40 57 69 56
  202. 14 17 22 29 51 87 80 62
  203. 18 22 37 58 68 109 103 77
  204. 24 35 55 64 81 104 113 92
  205. 49 64 78 87 103 121 120 101
  206. 72 92 95 98 112 100 103 99
  207. Suppose a particular coded frame specifies a quality index of 54. Element 54
  208. of the dc_scale_factor table is 20, thus:
  209.                                16 * 20
  210.     DC coefficient quantizer = ------- = 3
  211.                                  100
  212. Element 54 of the ac_scale_factor table is 24. The AC coefficient
  213. quantizers are each scaled using this factor (the fractional part of
  213. each result is discarded), e.g.:
  214.     11 * 24
  215.     ------- = 2
  216.       100
  217.     100 * 24
  218.     -------- = 24
  219.       100
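The scaling step can be sketched in a few lines of Python. This is only an illustration of the formula above: the names BASE_Y_GOLDEN and scale_matrix are invented here, the scale factors 20 and 24 are the two table entries quoted in the example, and no saturation or DCT-related scaling is applied because this document does not describe it.

```python
# Base quantization matrix for golden frame Y fragments, in row order.
BASE_Y_GOLDEN = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 58, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def scale_matrix(base, dc_scale, ac_scale):
    """Apply quantizer = base * scale / 100 with integer division."""
    scaled = [(q * ac_scale) // 100 for q in base]
    scaled[0] = (base[0] * dc_scale) // 100  # element 0 is the DC quantizer
    return scaled


# Quality index 54: dc_scale_factor[54] = 20, ac_scale_factor[54] = 24.
quantizers = scale_matrix(BASE_Y_GOLDEN, 20, 24)
print(quantizers[0], quantizers[1], quantizers[61])
```

Elements 0, 1 and 61 correspond to the base values 16, 11 and 100 used in the worked example.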
  220. [not complete; still need to explain how these quantizers are saturated
  221. and scaled with respect to the DCT process]
  222. Hilbert Coding Pattern
  223. ----------------------
  224. VP3 uses a Hilbert pattern to code fragments within a superblock. A
  225. Hilbert pattern is a recursive pattern that can grow quite complicated.
  226. The coding pattern that VP3 uses is restricted to this pattern subset,
  227. where each fragment in a superblock is represented by a 'X':
  228.     X -> X    X -> X
  229.          |    ^
  230.          v    |
  231.     X <- X    X <- X
  232.     |              ^
  233.     v              |
  234.     X    X -> X    X
  235.     |    ^    |    ^
  236.     v    |    v    |
  237.     X -> X    X -> X
  238. As an example of this pattern, consider a plane that is 256 samples wide
  239. and 64 samples high. Each fragment row will be 32 fragments wide. The
  240. first superblock in the plane will be comprised of these 16 fragments:
  241.      0  1  2  3 ... 31
  242.     32 33 34 35 ... 63
  243.     64 65 66 67 ... 95
  244.     96 97 98 99 ... 127
  245. The order in which these 16 fragments are coded is:
  246.      0 |  0  1 14 15
  247.     32 |  3  2 13 12
  248.     64 |  4  7  8 11
  249.     96 |  5  6  9 10
  250. All of the image coding information, including the block coding status
  251. and modes, the motion vectors, and the DCT coefficients, is coded
  252. and decoded using this pattern. Thus, it is rather critical to have the
  253. pattern and all of its corner cases handled correctly. In the above
  254. example, if the bottom row and right column were not present due to the
  255. superblock being in a corner, the pattern proceeds as if the missing
  256. fragments were present, but the missing fragments are omitted in the
  257. final coding list. The coding order would be:
  258.     0, 1, 2, 3, 4, 7, 8, 13, 14
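One way to handle the pattern and its corner cases is a fixed lookup table that is walked in full, skipping any position that falls outside the plane. The Python sketch below takes that approach; the table HILBERT and the function superblock_fragments are names chosen for this illustration, and fragment numbers follow the numbering of the example above (row * fragments per row + column).

```python
# (row, col) of each fragment inside a 4x4 superblock, in coding order.
HILBERT = [
    (0, 0), (0, 1), (1, 1), (1, 0),
    (2, 0), (3, 0), (3, 1), (2, 1),
    (2, 2), (3, 2), (3, 3), (2, 3),
    (1, 3), (1, 2), (0, 2), (0, 3),
]


def superblock_fragments(sb_row, sb_col, frag_width, frag_height):
    """Return (coding index, fragment number) pairs for one superblock.

    Positions outside a plane of frag_width x frag_height fragments are
    skipped, but the walk itself always covers all 16 positions.
    """
    present = []
    for index, (r, c) in enumerate(HILBERT):
        row = sb_row * 4 + r
        col = sb_col * 4 + c
        if row < frag_height and col < frag_width:
            present.append((index, row * frag_width + col))
    return present


# First superblock of a plane that is 32 fragments wide and 8 high.
print([number for _, number in superblock_fragments(0, 0, 32, 8)])
# A superblock of which only the top-left 3x3 fragments lie inside the plane.
print([index for index, _ in superblock_fragments(0, 0, 3, 3)])
```

The second call yields the shortened coding list 0, 1, 2, 3, 4, 7, 8, 13, 14.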
  259. Unpacking The Block Coding Information
  260. --------------------------------------
  261. After unpacking the frame header, the decoder unpacks the block coding
  262. information. The only information determined in this phase is whether a
  263. particular superblock and its fragments are coded in the current frame
  264. or unchanged from the previous frame. The actual coding method is
  265. determined in the next phase.
  266. If the frame is a golden frame then every superblock, macroblock, and
  267. fragment is marked as coded.
  268. If the frame is an interframe, then the block coding information must be
  269. decoded. This is the phase where a decoder will build a list of coded
  270. fragments for which coding mode, motion vector, and DCT coefficient data
  271. must be decoded.
  272. First, a list of partially-coded superblocks is unpacked from the
  273. stream. This list is coded as a series of variable-length run length
  274. codes (VLRLC). First, the code is initialized by reading the next bit in
  275. the stream. Then, while there are still superblocks remaining in the
  276. list, fetch a VLC from the stream according to this table:
  277.     Codeword              Run Length
  278.     0                     1
  279.     10x                   2-3
  280.     110x                  4-5
  281.     1110xx                6-9
  282.     11110xxx              10-17
  283.     111110xxxx            18-33
  284.     111111xxxxxxxxxxxx    34-4129
  285. For example, a VLC of 1101 represents a run length of 5. If the VLRLC
  286. was initialized to 1, then the next 5 superblocks would be set to 1,
  287. indicating that they are partially coded in the current frame. Then the
  288. bit value is toggled to 0, another VLC is fetched from the stream and
  289. the process continues until each superblock has been marked either
  290. partially coded (1) or not (0).
  291. If any of the superblocks were marked as not partially coded in the
  292. previous step, then a list of fully-coded superblocks is unpacked next
  293. using the same VLRLC as the list of partially-coded superblocks.
  294. Initialize the VLRLC with the next bit in the stream. For each
  295. superblock that was not marked as partially coded, mark it with either a
  296. 0 or 1 according to the current VLRLC. By the end of this step, each
  297. superblock will be marked as either not coded, partially coded, or fully
  298. coded.
  299. Let's work through an example with an image frame that is 256x64 pixels.
  300. This means that the Y plane contains 4x2 superblocks and each of the C
  301. planes contains 2 superblocks each. The superblocks are numbered as
  302. follows:
  303.     Y: 0 1 2 3    U:  8  9
  304.        4 5 6 7    V: 10 11
  305. This is the state of the bitstream:
  306. 1100011001101
  307. Which is interpreted as:
  308.     initial  2 1's  1 0  4 1's  5 0's
  309.     1        100    0    1100   1101
  310. Superblocks 0-1 and 3-6 are marked as partially coded. Since there were
  311. blocks that were not marked, proceed to unpack the list of fully-coded
  312. superblocks. This is the state of the bitstream:
  313. 1101101
  314. Which is interpreted as:
  315.     initial  3 1's  3 0's
  316.     1        101    101
  317. Superblocks 2, 7, and 8 are marked as fully coded while superblocks 9,
  318. 10, and 11 are marked as not coded.
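The superblock run decoding can be sketched as follows. This is an illustrative Python version under simple assumptions: the bitstream is held as a list of 0/1 integers, and the helper names read_bits, read_long_run and unpack_flags are invented for this example. The table is the superblock VLC table above, indexed by the number of leading 1 bits in the codeword.

```python
# Indexed by the count of leading 1 bits: (extra bits, shortest run length).
LONG_RUNS = [(0, 1), (1, 2), (1, 4), (2, 6), (3, 10), (4, 18), (12, 34)]


def read_bits(bits, n):
    """Remove n bits from the front of the list and return them as an integer."""
    value = 0
    for _ in range(n):
        value = (value << 1) | bits.pop(0)
    return value


def read_long_run(bits):
    """Decode one run length using the superblock VLC table."""
    ones = 0
    while ones < 6 and read_bits(bits, 1) == 1:
        ones += 1  # the codeword 111111... has no terminating 0 bit
    extra, base = LONG_RUNS[ones]
    return base + read_bits(bits, extra)


def unpack_flags(bits, count):
    """Read an initial bit, then alternate runs until count flags are set."""
    flags = []
    value = read_bits(bits, 1)
    while len(flags) < count:
        flags.extend([value] * read_long_run(bits))
        value ^= 1  # toggle the Boolean after every run
    return flags[:count]


partial = unpack_flags([int(b) for b in "1100011001101"], 12)
print(partial)
# Six superblocks were not partially coded, so six more flags are read.
print(unpack_flags([int(b) for b in "1101101"], 6))
```

In the first result, positions 0-1 and 3-6 are set, matching the partially-coded superblocks of the example. The second result applies, in order, to superblocks 2, 7, 8, 9, 10 and 11.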
  319. If any of the superblocks were marked as partially coded, the next data
  320. in the bitstream will define which fragments inside each partially-coded
  321. superblock are coded. This is the first place where the Hilbert pattern
  322. comes into play.
  323. For each partially-coded superblock, iterate through each fragment
  324. according to the Hilbert pattern. Use the VLRLC method, only with a
  325. different table, to determine which fragments are coded. The VLRLC table
  326. for fragment coding runs is:
  327.     Codeword     Run Length
  328.     0x           1-2
  329.     10x          3-4
  330.     110x         5-6
  331.     1110xx       7-10
  332.     11110xx      11-14
  333.     11111xxxx    15-30
  334. Continuing with the contrived example, superblocks 0 and 1 are both
  335. partially coded. This is the state of the bitstream:
  336. 0011001111010001111010...(not complete)
  337. Which is interpreted as:
  338.     initial  2 0's  3 1's  13 0's   1 1  13 0's
  339.     0        01     100    1111010  00   1111010 ...
  340. This indicates that fragments 2-4 in superblock 0 are coded, while
  341. fragments 0, 1, and 5-15 are not. Note that the first run of 13 0's
  342. cascades over into the next superblock, indicating that fragments 0 and 1
  343. of superblock 1 are not coded. Fragment 2 of superblock 1 is coded, while the rest of the
  344. superblock's fragments are not coded. The example ends there (a real
  345. bitstream should have enough data to describe all of the partially-coded
  346. superblocks). Superblock 2 is fully coded which means all 16 fragments
  347. are coded. Thus, superblocks 0-2 have the following coded fragments:
  348.      0 |  x  x  x  x    x  x  x  x    0  1 14 15
  349.     32 |  3  2  x  x    x  2  x  x    3  2 13 12
  350.     64 |  4  x  x  x    x  x  x  x    4  7  8 11
  351.     96 |  x  x  x  x    x  x  x  x    5  6  9 10
  352. This is a good place to generate the list of coded fragment numbers for
  353. this frame. In this case, the list will begin as:
  354. 33 32 64 37 8 9 41 40 72 104 105 73 ...
  355. and so on through the remaining 8 fragments of superblock 2 and onto the
  356. fragments for the remaining superblocks that are either fully or
  357. partially coded.
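The fragment run decoding and the construction of the coded fragment list can be sketched together. The Python below is an illustration only: it assumes a list of 0/1 integers as the bitstream, a plane that is 32 fragments wide as in the example, and helper names (read_short_run, unpack_fragment_flags, coded_fragment_numbers) that are invented here.

```python
# (row, col) of each fragment inside a 4x4 superblock, in Hilbert coding order.
HILBERT = [
    (0, 0), (0, 1), (1, 1), (1, 0), (2, 0), (3, 0), (3, 1), (2, 1),
    (2, 2), (3, 2), (3, 3), (2, 3), (1, 3), (1, 2), (0, 2), (0, 3),
]

# Fragment run table, indexed by the count of leading 1 bits:
# (extra bits, shortest run length).
SHORT_RUNS = [(1, 1), (1, 3), (1, 5), (2, 7), (2, 11), (4, 15)]


def read_bits(bits, n):
    """Remove n bits from the front of the list and return them as an integer."""
    value = 0
    for _ in range(n):
        value = (value << 1) | bits.pop(0)
    return value


def read_short_run(bits):
    """Decode one run length using the fragment VLC table."""
    ones = 0
    while ones < 5 and read_bits(bits, 1) == 1:
        ones += 1  # the codeword 11111xxxx has no terminating 0 bit
    extra, base = SHORT_RUNS[ones]
    return base + read_bits(bits, extra)


def unpack_fragment_flags(bits, count):
    """Runs may span superblock boundaries, so all flags share one list."""
    flags = []
    value = read_bits(bits, 1)
    while len(flags) < count:
        flags.extend([value] * read_short_run(bits))
        value ^= 1
    return flags[:count]


def coded_fragment_numbers(flags, superblock_columns, frag_width):
    """Map flags to fragment numbers for superblocks in the first superblock row."""
    numbers = []
    for position, flag in enumerate(flags):
        if flag:
            sb, index = divmod(position, 16)
            row, col = HILBERT[index]
            numbers.append(row * frag_width + superblock_columns[sb] * 4 + col)
    return numbers


flags = unpack_fragment_flags([int(b) for b in "0011001111010001111010"], 32)
# Superblocks 0 and 1 sit in superblock columns 0 and 1.
print(coded_fragment_numbers(flags, [0, 1], 32))
```

The printed numbers are the first four entries of the coded fragment list shown above; the fully-coded superblock 2 would contribute all 16 of its fragments in Hilbert order.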
  358. Unpacking The Macroblock Coding Mode Information
  359. ------------------------------------------------
  360. After unpacking the block coding information, the decoder unpacks the
  361. macroblock coding mode information. This process is simple when
  362. decoding a golden frame: since the only possible decoding mode is INTRA,
  363. no macroblock coding mode information is transmitted. However, in an
  364. interframe, each coded macroblock is encoded with one of 8 methods:
  365. 0, INTER_NO_MV:
  366.   current fragment =
  367.     (fragment from previous frame @ same coordinates) +
  368.     (DCT-encoded residual)
  369. 1, INTRA:
  370.   current fragment = DCT-encoded block, just like in a golden frame
  371. 2, INTER_PLUS_MV:
  372.   current fragment =
  373.     (fragment from previous frame @ (same coords + motion vector)) +
  374.     (DCT-encoded residual)
  375. 3, INTER_LAST_MV:
  376.   same as INTER_PLUS_MV but using the last motion vector decoded from
  377.   the bitstream
  378. 4, INTER_PRIOR_LAST:
  379.   same as INTER_PLUS_MV but using the second-to-last motion vector
  380.   decoded from the bitstream
  381. 5, USING_GOLDEN:
  382.   same as INTER_NO_MV but referencing the golden frame instead of
  383.   previous interframe
  384. 6, GOLDEN_MV:
  385.   same as INTER_PLUS_MV but referencing the golden frame instead of
  386.   previous interframe
  387. 7, INTER_FOURMV:
  388.   same as INTER_PLUS_MV except that each of the 4 Y fragments gets its
  389.   own motion vector, and the U and V fragments share the same motion
  390.   vector which is the average of the 4 Y fragment vectors
  391. The MB coding mode information is encoded using one of 8 alphabets. The
  392. first 3 bits of the MB coding mode stream indicate which of the 8
  393. alphabets, 0..7, to use to decode the MB coding information in this frame.
  394. The reason for the different alphabets is to minimize the number of bits
  395. needed to encode this section of information. Each alphabet arranges the
  396. coding modes in a different order, indexing the 8 modes into 8 index
  397. slots. Index 0 is encoded with 1 bit (0), index 1 is encoded with 2 bits
  398. (10), index 2 is encoded with 3 bits (110), and so on up to indices 6 and
  399. 7 which are encoded with 7 bits each (1111110 and 1111111, respectively):
  400.     index  encoding
  401.     -----  --------
  402.     0      0
  403.     1      10
  404.     2      110
  405.     3      1110
  406.     4      11110
  407.     5      111110
  408.     6      1111110
  409.     7      1111111
  410. For example, the coding modes are arranged in alphabet 1 as follows:
  411.     index  coding mode
  412.     -----  -----------
  413.     0      MODE_INTER_LAST_MV
  414.     1      MODE_INTER_PRIOR_LAST
  415.     2      MODE_INTER_PLUS_MV
  416.     3      MODE_INTER_NO_MV
  417.     4      MODE_INTRA
  418.     5      MODE_USING_GOLDEN
  419.     6      MODE_GOLDEN_MV
  420.     7      MODE_INTER_FOURMV
  421. This alphabet arrangement is designed for frames in which motion vectors
  422. based off of the previous interframe dominate.
  423. When unpacking MB coding mode information for a frame, the decoder first
  424. reads 3 bits from the stream to determine the alphabet. In this example,
  425. the 3 bits would be 001 to indicate alphabet 1. Consider this contrived
  426. bitstream following the alphabet number:
  427. 1010000011000011111110...
  428. The bits are read as follows:
  429.     bits:   10 10 0 0 0 0 110 0 0 0 1111111 0
  430.     index:  1  1  0 0 0 0 2   0 0 0 7       0
  431. This arrangement of indices translates to this series of coding modes:
  432.     index  coding mode
  433.     -----  -----------
  434.     1      MODE_INTER_PRIOR_LAST
  435.     1      MODE_INTER_PRIOR_LAST
  436.     0      MODE_INTER_LAST_MV
  437.     0      MODE_INTER_LAST_MV
  438.     0      MODE_INTER_LAST_MV
  439.     0      MODE_INTER_LAST_MV
  440.     2      MODE_INTER_PLUS_MV
  441.     0      MODE_INTER_LAST_MV
  442.     0      MODE_INTER_LAST_MV
  443.     0      MODE_INTER_LAST_MV
  444.     7      MODE_INTER_FOURMV
  445.     0      MODE_INTER_LAST_MV
  446. There are 6 pre-defined alphabets. Consult Appendix B for the complete
  447. alphabets. What happens if none of the 6 pre-defined alphabets fit? The
  448. VP3 encoder can choose to use alphabet 0 which indicates a custom
  449. alphabet. The 3-bit coding mode numbers for each index, 0..7, are stored
  450. after the alphabet number in the bitstream. For example, the sequence:
  451. 000 111 110 101 100 011 010 001 000
  452. would indicate coding alphabet 0 (custom alphabet), index 0 corresponds to
  453. coding mode 7 (INTER_FOURMV), index 1 corresponds to coding mode 6
  454. (GOLDEN_MV), and so on down to index 7 which would correspond to coding
  455. mode 0 (INTER_NO_MV).
  456. There is one more possible alphabet: Alphabet 7. This alphabet is
  457. reserved for when there is such a mixture of coding modes used in a frame
  458. that using any variable-length coding mode would result in more bits than
  459. a fixed-length representation. When alphabet 7 is specified, the decoder
  460. reads 3 bits at a time from the bitstream, and uses those directly as the
  461. macroblock coding modes.
  462. To recap, this is the general algorithm for decoding macroblock coding
  463. mode information:
  464. if (golden frame)
  465.   all macroblocks are intracoded, there is no MB coding mode information
  466. else
  467.   read 3 bits from bitstream to determine alphabet
  468.   if alphabet = 0
  469.     this is a custom alphabet, populate index table with 8 3-bit coding
  470.     modes read from bitstream
  471.   foreach coded macroblock, unpack a coding mode:
  472.     if alphabet = 7
  473.       read 3 bits from the bitstream as the coding mode for the
  474.       macroblock
  475.     else
  476.       read a VLC from the bitstream
  477.       use the decoded VLC value to index into the coding mode alphabet
  478.       selected for this frame and assign the indexed coding mode to
  479.       this macroblock
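The recap above translates fairly directly into code. The following Python sketch covers the interframe branch only and is meant as an illustration: the bitstream is a list of 0/1 integers, the names read_index and unpack_modes are invented, and the ALPHABETS dictionary holds only alphabet 1 because that is the only pre-defined alphabet spelled out in this section (alphabets 2 through 6 would come from Appendix B).

```python
# Coding mode numbers as listed earlier in this section.
INTER_NO_MV, INTRA, INTER_PLUS_MV, INTER_LAST_MV = 0, 1, 2, 3
INTER_PRIOR_LAST, USING_GOLDEN, GOLDEN_MV, INTER_FOURMV = 4, 5, 6, 7

# Only alphabet 1 is given in the text; alphabets 2..6 are in Appendix B.
ALPHABETS = {
    1: [INTER_LAST_MV, INTER_PRIOR_LAST, INTER_PLUS_MV, INTER_NO_MV,
        INTRA, USING_GOLDEN, GOLDEN_MV, INTER_FOURMV],
}


def read_bits(bits, n):
    """Remove n bits from the front of the list and return them as an integer."""
    value = 0
    for _ in range(n):
        value = (value << 1) | bits.pop(0)
    return value


def read_index(bits):
    """Count leading 1 bits, stopping at a 0 bit or after seven 1 bits."""
    ones = 0
    while ones < 7 and read_bits(bits, 1) == 1:
        ones += 1
    return ones


def unpack_modes(bits, coded_macroblocks, alphabets=ALPHABETS):
    """Decode one coding mode per coded macroblock of an interframe."""
    scheme = read_bits(bits, 3)
    table = None
    if scheme == 0:
        table = [read_bits(bits, 3) for _ in range(8)]  # custom alphabet
    elif scheme != 7:
        table = alphabets[scheme]
    modes = []
    for _ in range(coded_macroblocks):
        if scheme == 7:
            modes.append(read_bits(bits, 3))  # fixed-length modes
        else:
            modes.append(table[read_index(bits)])
    return modes


stream = [int(b) for b in "001" + "1010000011000011111110"]
print(unpack_modes(stream, 12))
```

The output lists the mode numbers for the 12 macroblocks of the alphabet 1 example: 4 is INTER_PRIOR_LAST, 3 is INTER_LAST_MV, and so on.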
  480. Unpacking The Macroblock Motion Vectors
  481. ---------------------------------------
  482. After unpacking the macroblock coding mode information, the decoder
  483. unpacks the macroblock motion vectors. This phase essentially assigns a
  484. motion vector to each of the 6 constituent fragments of any coded
  485. macroblock that requires motion vectors.
  486. If the frame is a golden frame then there is no motion compensation and
  487. no motion vectors are encoded in the bitstream.
  488. If the frame is an interframe, the next bit is read from the bitstream
  489. to determine the vector entropy coding method used. If the coding method
  490. is zero then all of the vectors will be unpacked using a VLC method. If
  491. the coding method is 1 then all of the vectors will be unpacked using a
  492. fixed length method.
  493. The VLC unpacking method reads 3 bits from the bitstream. These 3 bits
  494. comprise a number ranging from 0..7 which indicate the next action:
  495.     0, MV component = 0
  496.     1, MV component = 1
  497.     2, MV component = -1
  498.     3, MV component = 2, read next bit for sign
  499.     4, MV component = 3, read next bit for sign
  500.     5, MV component = 4 + (read next 2 bits), read next bit for sign
  501.        range: (4..7, -4..-7)
  502.     6, MV component = 8 + (read next 3 bits), read next bit for sign
  503.        range: (8..15, -8..-15)
  504.     7, MV component = 16 + (read next 4 bits), read next bit for sign
  505.        range: (16..31, -16..-31)
  506. The fixed length vector unpacking method simply reads the next 5 bits
  507. from the bitstream, reads the next bit for sign, and calls the whole
  508. thing a motion vector component. This gives a range of (-31..31), which
  509. is the same range as the VLC method.
  510. For example, consider the following contrived motion vector bitstream:
  511. 000001011011111000...
  512. The stream is read as:
  513. 0 (000 010) (110 111 1 100 0)
  514. The first bit indicates the entropy method which, in this example, is
  515. variable length as opposed to fixed length. The next 3 bits are 0 which
  516. indicate a X MV component of 0. The next 3 bits are 2 which indicate a Y
  517. MV component of -1. The first motion vector encoded in this stream is
  518. (0, -1). The next 3 bits are 6 which indicate 8 + next 3 bits (7) with
  519. another bit indicating sign (1 in this case, which is negative). Thus,
  520. the X MV component is -15. The next 3 bits are 4 which indicate a Y MV
  521. component of 3 with one more bit for the sign (0 is positive). So the
  522. second motion vector encoded in this stream is (-15, 3).
  523. As an example of the fixed-length entropy method, consider the following
  524. contrived bitstream:
  525. 1010101101010...
  526. The stream is read as:
  527. 1 01010 1 10101 0
  528. The first bit indicates the fixed length entropy method. The next 5 bits
  529. are 10 followed by a negative sign bit. The following 5 bits are 21 followed by
  530. a positive sign bit. The first motion vector in this stream is (-10, 21).
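Both unpacking methods can be checked against the two example bitstreams with a short Python sketch. As before, the bitstream is a list of 0/1 integers and all function names are made up for this illustration; read_vectors() simply reads a fixed number of (X, Y) pairs after the method bit, which is a simplification of what a real decoder does.

```python
def read_bits(bits, n):
    """Remove n bits from the front of the list and return them as an integer."""
    value = 0
    for _ in range(n):
        value = (value << 1) | bits.pop(0)
    return value


def read_component_vlc(bits):
    """Decode one motion vector component with the VLC method."""
    code = read_bits(bits, 3)
    if code == 0:
        return 0
    if code == 1:
        return 1
    if code == 2:
        return -1
    if code == 3:
        magnitude = 2
    elif code == 4:
        magnitude = 3
    else:
        extra = code - 3  # codes 5, 6, 7 carry 2, 3, 4 extra bits
        magnitude = (1 << extra) + read_bits(bits, extra)
    return -magnitude if read_bits(bits, 1) else magnitude


def read_component_fixed(bits):
    """Decode one component with the fixed length method: 5 bits plus sign."""
    magnitude = read_bits(bits, 5)
    return -magnitude if read_bits(bits, 1) else magnitude


def read_vectors(bits, count):
    """Read the method bit, then count (X, Y) motion vectors."""
    read = read_component_fixed if read_bits(bits, 1) else read_component_vlc
    return [(read(bits), read(bits)) for _ in range(count)]


print(read_vectors([int(b) for b in "000001011011111000"], 2))
print(read_vectors([int(b) for b in "1010101101010"], 1))
```

The two calls reproduce the vectors derived by hand in the examples above.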
  531. During this phase of the decoding process, it is traditional to assign all
  532. motion vectors for all coded macroblocks that require them, whether they
  533. are unpacked from the motion vector bitstream or copied from previous
  534. coded macroblocks. It is necessary to track the motion vectors for both
  535. the previous macroblock as well as the next-to-last (prior) macroblock.
  536. The general algorithm for this phase is as follows:
  last MV = 0
  prior last MV = 0
  foreach coded macroblock
    if coding mode = MODE_INTER_PLUS_MV or MODE_GOLDEN_MV
      read current MV pair from the bitstream and set all fragment motion
      vectors to that pair
      prior last MV = last MV
      last MV = current MV
    if coding mode = MODE_INTER_FOURMV
      read MV for first Y fragment in macroblock
      read MV for second Y fragment in macroblock
      read MV for third Y fragment in macroblock
      read MV for fourth Y fragment in macroblock
      set U & V fragment motion vectors to average of 4 Y vectors,
      calculated as follows:
        if sum of all 4 X motion components is positive, the X
        motion component for the U & V fragments is (sum + 2) / 4,
        otherwise, it is (sum - 2) / 4; repeat the same process for the
        Y components
      prior last MV = last MV
      last MV = MV for fourth Y fragment from this macroblock
    if coding mode = MODE_INTER_LAST_MV
      motion vectors for this macroblock are the same as last MV; note
      that in this case, the last MV remains the last MV and the prior
      last MV remains the prior last MV
    if coding mode = MODE_INTER_PRIOR_LAST
      motion vectors for this macroblock are the same as prior last MV
      prior last MV = last MV
      last MV = current MV (effectively, swap last and prior last vectors)
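As a rough illustration of the bookkeeping above, the following Python
sketch assigns vectors to a list of macroblocks. Several things here are
assumptions rather than facts taken from the VP3 source: the function and
variable names are hypothetical, the vectors are taken as already decoded
from the bitstream, the last and prior last vectors are initialized once
before the loop, the division truncates toward zero as C integer division
does, and macroblocks in modes without a motion vector get (0, 0).

```python
def chroma_component(y_components):
    """Average four Y components as (sum + 2) / 4 or (sum - 2) / 4."""
    total = sum(y_components)
    biased = total + 2 if total > 0 else total - 2
    # Truncate toward zero (assumed, C-style); Python's // would floor.
    magnitude = abs(biased) // 4
    return magnitude if biased >= 0 else -magnitude


def assign_motion_vectors(macroblocks):
    """macroblocks: list of (mode, decoded) pairs, where `decoded` holds the
    vectors already read for that macroblock (one for PLUS_MV/GOLDEN_MV,
    four for FOURMV, none otherwise).
    Returns one (luma_vectors, chroma_vector) pair per macroblock."""
    last = prior_last = (0, 0)
    out = []
    for mode, decoded in macroblocks:
        if mode in ("MODE_INTER_PLUS_MV", "MODE_GOLDEN_MV"):
            luma, chroma = [decoded[0]] * 4, decoded[0]
            prior_last, last = last, decoded[0]
        elif mode == "MODE_INTER_FOURMV":
            luma = list(decoded)
            chroma = (chroma_component([v[0] for v in decoded]),
                      chroma_component([v[1] for v in decoded]))
            prior_last, last = last, decoded[3]
        elif mode == "MODE_INTER_LAST_MV":
            luma, chroma = [last] * 4, last      # history is unchanged
        elif mode == "MODE_INTER_PRIOR_LAST":
            luma, chroma = [prior_last] * 4, prior_last
            prior_last, last = last, prior_last  # swap the two vectors
        else:
            luma, chroma = [(0, 0)] * 4, (0, 0)  # no motion vector (assumed)
        out.append((luma, chroma))
    return out
```

The tuple swap in the MODE_INTER_PRIOR_LAST branch is the "effectively,
swap" step from the pseudocode written in one statement.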
  566. Unpacking The DCT Coefficients
  567. ------------------------------
After unpacking the macroblock motion vectors, the decoder unpacks the
fragment DCT coefficient data. Each coded fragment has 64 DCT
coefficients. Some of the coefficients will be non-zero, but many will
be, or should be, 0, as this is where the coding method derives much of
its compression.

During this phase, the decoder will be unpacking DCT coefficients, zero
runs, and end-of-block (EOB) codes. The decoder unpacks the DC
coefficients for all fragments, then all of the first AC coefficients,
and so on until all of the 64 DCT coefficients are unpacked from the
bitstream.
  578. To obtain the DCT coefficients, the decoder unpacks a series of VLCs
  579. from the bitstream which turn into a series of tokens ranging from
  580. 0..31. Each of these tokens specifies which action to take next. VP3
  581. defines 80 different 32-element histograms for VLC decoding:
  582. 16 histograms for DC token decoding
  583. 16 histograms for group 1 AC token decoding
  584. 16 histograms for group 2 AC token decoding
  585. 16 histograms for group 3 AC token decoding
  586. 16 histograms for group 4 AC token decoding
  587. The decoder fetches 4 bits from the bitstream that will be used to
  588. select a DC histogram and 4 bits that will be used to select 4 AC
  589. histograms, one for each AC group.
  590. The meaning of each of the 32 possible tokens follows. 'EB' stands for
  591. extra bits read from bitstream directly after the VLC token:
  0, DCT_EOB_TOKEN
    set the current block to EOB, meaning that the block is marked as being
    fully unpacked
  1, DCT_EOB_PAIR_TOKEN
    set the next 2 blocks to EOB
  2, DCT_EOB_TRIPLE_TOKEN
    set the next 3 blocks to EOB
  3, DCT_REPEAT_RUN_TOKEN
    set the next (2 EBs + 4) blocks to EOB
  4, DCT_REPEAT_RUN2_TOKEN
    set the next (3 EBs + 8) blocks to EOB
  5, DCT_REPEAT_RUN3_TOKEN
    set the next (4 EBs + 16) blocks to EOB
  6, DCT_REPEAT_RUN4_TOKEN
    set the next (12 EBs) blocks to EOB
  7, DCT_SHORT_ZRL_TOKEN
    skip (3 EBs + 1) positions in the output matrix
  8, DCT_ZRL_TOKEN
    skip (6 EBs + 1) positions in the output matrix
  9, ONE_TOKEN
    output 1 as coefficient
  10, MINUS_ONE_TOKEN
    output -1 as coefficient
  11, TWO_TOKEN
    output 2 as coefficient
  12, MINUS_TWO_TOKEN
    output -2 as coefficient
  13, 14, 15, 16, LOW_VAL_TOKENS
    next EB determines coefficient sign; coeff = DCT_VAL_CAT2_MIN (3) +
    (token - 13) (this gives a range of +/- 3..6)
  17, DCT_VAL_CATEGORY3
    next EB determines coefficient sign; coeff = DCT_VAL_CAT3_MIN (7) + next
    EB (this gives a range of +/- 7..8)
  18, DCT_VAL_CATEGORY4
    next EB determines coefficient sign; coeff = DCT_VAL_CAT4_MIN (9) + next
    2 EBs (this gives a range of +/- 9..12)
  19, DCT_VAL_CATEGORY5
    next EB determines coefficient sign; coeff = DCT_VAL_CAT5_MIN (13) +
    next 3 EBs (this gives a range of +/- 13..20)
  20, DCT_VAL_CATEGORY6
    next EB determines coefficient sign; coeff = DCT_VAL_CAT6_MIN (21) +
    next 4 EBs (this gives a range of +/- 21..36)
  21, DCT_VAL_CATEGORY7
    next EB determines coefficient sign; coeff = DCT_VAL_CAT7_MIN (37) +
    next 5 EBs (this gives a range of +/- 37..68)
  22, DCT_VAL_CATEGORY8
    next EB determines coefficient sign; coeff = DCT_VAL_CAT8_MIN (69) +
    next 9 EBs (this gives a range of +/- 69..580)
  23, 24, 25, 26, 27, DCT_RUN_CATEGORY1
    coefficient of +/- 1 preceded by a number of 0s; next EB determines sign
    of coefficient; skip (token - 22) 0s in the output matrix before
    placing the final coefficient (this gives a range of 1..5 0s)
  28, DCT_RUN_CATEGORY1B
    coefficient of +/- 1 preceded by a number of 0s; next EB determines sign
    of coefficient; skip (next 2 EBs + 6) 0s in the output matrix before
    placing the final coefficient (this gives a range of 6..9 0s)
  29, DCT_RUN_CATEGORY1C
    coefficient of +/- 1 preceded by a number of 0s; next EB determines sign
    of coefficient; skip (next 3 EBs + 10) 0s in the output matrix before
    placing the final coefficient (this gives a range of 10..17 0s)
  30, DCT_RUN_CATEGORY2
    coefficient of +/- 2..3 preceded by a single zero; next EB determines
    sign of coefficient; coefficient = (next EB + 2)
  31, DCT_RUN_CATEGORY2B (not specifically named in VP3 source)
    coefficient of +/- 2..3 preceded by 2 or 3 0s; next EB determines
    sign of coefficient; coefficient = (next EB + 2); skip (next EB + 2) 0s
    before placing coefficient in output matrix
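The token list above maps naturally onto a small dispatch function. The
Python sketch below is an illustration only: decode_token, make_reader and
the returned tuples are hypothetical names and conventions, and the order
in which the extra bits are consumed (sign first, then the remaining bits
in the order the text lists them) is an assumption that should be checked
against a reference decoder.

```python
# Tokens 17..22: (minimum magnitude, number of extra magnitude bits)
VALUE_CATEGORIES = {17: (7, 1), 18: (9, 2), 19: (13, 3),
                    20: (21, 4), 21: (37, 5), 22: (69, 9)}
FIXED_VALUES = {9: 1, 10: -1, 11: 2, 12: -2}
# Tokens 3..6: (base run length, number of extra bits)
EOB_RUNS = {3: (4, 2), 4: (8, 3), 5: (16, 4), 6: (0, 12)}


def make_reader(bits):
    """Return read(n), which pulls n bits (MSB first) from a '0'/'1' string."""
    pos = [0]

    def read(n):
        value = int(bits[pos[0]:pos[0] + n], 2)
        pos[0] += n
        return value
    return read


def decode_token(token, read_bits):
    """Return ('eob', blocks), ('skip', positions) or
    ('coeff', zeros_before, value) for one token in 0..31."""
    def signed(magnitude, negative):
        return -magnitude if negative else magnitude

    if token <= 2:                       # current block, pair, triple
        return ('eob', token + 1)
    if token <= 6:
        base, nbits = EOB_RUNS[token]
        return ('eob', base + read_bits(nbits))
    if token == 7:
        return ('skip', read_bits(3) + 1)
    if token == 8:
        return ('skip', read_bits(6) + 1)
    if token <= 12:
        return ('coeff', 0, FIXED_VALUES[token])
    if token <= 16:
        return ('coeff', 0, signed(3 + (token - 13), read_bits(1)))
    if token <= 22:
        minimum, nbits = VALUE_CATEGORIES[token]
        negative = read_bits(1)          # sign bit is read first (assumed)
        return ('coeff', 0, signed(minimum + read_bits(nbits), negative))
    if token <= 27:
        return ('coeff', token - 22, signed(1, read_bits(1)))
    if token <= 29:
        base, nbits = (6, 2) if token == 28 else (10, 3)
        negative = read_bits(1)
        return ('coeff', base + read_bits(nbits), signed(1, negative))
    negative = read_bits(1)              # tokens 30 and 31
    magnitude = read_bits(1) + 2
    zeros = 1 if token == 30 else read_bits(1) + 2
    return ('coeff', zeros, signed(magnitude, negative))
```

For instance, decode_token(22, make_reader("0111111111")) reads a positive
sign and nine set bits, giving the top of the +/- 69..580 range.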
  659. Note: EOB runs can, and often do, cross threshold stages and plane
  660. boundaries. For example, a decoder may have decoded all of the AC #2
  661. coefficients for all fragments and still have an EOB run of 2. That
  662. means that during the AC #3 decode process, the first 2 coded fragments
  663. that are not already EOB will be set to EOB.
  664. Let's work through a highly contrived example to illustrate the
  665. coefficient decoding process.
  666. [not finished]
  667. When the decoder is finished unpacking the DCT coefficients, the entire
  668. encoded VP3 frame bitstream should be consumed.
  669. Reversing The DC Prediction
  670. ---------------------------
  671. Now that all of the DCT coefficient data has been unpacked, the DC
  672. coefficients need to be fully reconstructed before the IDCT can be
  673. performed.
VP3 uses a somewhat involved process for DC prediction which uses up to
four DC coefficients from surrounding fragments. For each fragment to be
transformed with the IDCT, the DC coefficient is predicted from a
weighted sum of the DC coefficients in the left (l), up-left (ul), up
(u), and up-right (ur) fragments, if they are coded (not unchanged from
the previous frame) in a compatible frame (current, previous, or golden).
  680. In a golden frame, the prediction is quite straightforward since all
  681. fragments will be coded. A fragment's DC prediction will fall into 1 of
  682. 5 groups:
  683. abbbbbbbbb
  684. cdddddddde
  685. cdddddddde
  686. cdddddddde
  687. cdddddddde
  688. * Group a is the top left corner fragment. There is nothing to predict
  689. from. This DC coefficient has a lot of energy and requires many bits to
  690. code.
  691. * Group b is the remainder of the top row of fragments. These fragments
  692. can only predict from the left fragment.
* Group c is the left column of fragments, not including the top left
fragment. These fragments have the up and up-right fragments from
which to predict.
  696. * Group d is the main body of fragments. These fragments have access to
  697. all 4 predictors.
  698. * Group e is the right column of fragments, not including the top right
  699. fragment. These fragments can predict from the left, up-left and up
  700. fragments.
The process of reversing prediction for interframes grows more complex.
First, the decoder must evaluate which candidate fragments (l, ul, u, or
ur) are available as predictors. Then, it can only use fragments
that are coded within the same frame (current, previous, or golden).
Further, there are auxiliary predictors for each frame type that are
initialized to 0 at the start of each video frame decode operation. The
decoder falls back on these auxiliary predictors when it cannot find
any valid candidate predictors for the current fragment.
  709. To work through some examples, consider the following notation, e.g.:
  710. ul-C = up-left fragment, coded in the current frame
  711. u-P = up fragment, coded as a motion residual from the previous frame
  712. ur-C = up-right fragment, coded in the current frame
  713. l-G = left fragment, coded as a motion residual from the golden frame
  714. x-P = current fragment where DC prediction is being performed, coded
  715. as a motion residual from the previous frame
  716. This is a simple case:
  717. ul-C u-C ur-C
  718. l-C x-C
  719. The current fragment predicts from all four of the candidate fragments
  720. since they are coded in the same frame.
  721. ul-P u-C ur-C
  722. l-P x-P
  723. The current fragment predicts from the left and up-left fragments.
  724. ul-C u-P ur-G
  725. l-P x-G
  726. The current fragment predicts from the up-right fragment.
  727. ul-C u-C ur-C
  728. l-C x-G
  729. The current fragment does not predict from any of the candidate
  730. fragments since the current fragment is a motion residual from the
  731. golden frame. Rather, add the auxiliary golden frame predictor to the
  732. current fragment's DC coefficient. Save the new DC coefficient as the
  733. new golden frame auxiliary DC predictor.
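The candidate selection in these examples can be sketched as a simple
filter. In the Python below, the function names and the data layout are
hypothetical: each neighbor is described only by the letter used in the
notation above ('C', 'P' or 'G'), or by None when the fragment is missing
or not coded, and the auxiliary predictors live in a plain dictionary.

```python
def usable_predictors(current, neighbors):
    """current: frame type of the fragment being predicted ('C', 'P', 'G').
    neighbors: dict mapping 'l', 'ul', 'u', 'ur' to a frame type or None.
    Returns the candidate names that share the current fragment's type."""
    order = ('l', 'ul', 'u', 'ur')
    return [name for name in order if neighbors.get(name) == current]


def reverse_dc_without_candidates(decoded_dc, frame_type, auxiliary):
    """Fallback when no candidate is usable: add the auxiliary predictor for
    this frame type, then save the result as the new auxiliary predictor."""
    dc = auxiliary[frame_type] + decoded_dc
    auxiliary[frame_type] = dc
    return dc
```

Applied to the four layouts above, usable_predictors returns all four
candidates, then the left and up-left fragments, then only the up-right
fragment, and finally an empty list, which is the case handled by
reverse_dc_without_candidates.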
  734. If the decoder only finds one valid candidate predictor, then it is used
  735. by itself. When the decoder finds multiple valid candidate fragments
  736. from which to predict DC, it applies a weighting function to the
  737. surrounding fragments' DC coefficients. The following table presents all
  738. 16 possible combinations of available/not available predictors and what
  739. to do in each case:
  ul  u   ur  l
  --  --  --  --
  0   0   0   0    no predictors available:
                     use the last predictor saved for the frame type
                     (either intra, inter, or golden)
  0   0   0   1    left predictor available:
                     pred = l.dc
  0   0   1   0    up-right predictor available:
                     pred = ur.dc
  0   0   1   1    up-right, left predictors available:
                     pred = (53 * ur.dc + 75 * l.dc) / 128
  0   1   0   0    up predictor available:
                     pred = u.dc
  0   1   0   1    up, left predictors available:
                     pred = (u.dc + l.dc) / 2
  0   1   1   0    up, up-right predictors available:
                     discard up-right predictor
                     pred = u.dc
  0   1   1   1    up, up-right, left predictors available:
                     discard up predictor
                     pred = (53 * ur.dc + 75 * l.dc) / 128
  1   0   0   0    up-left predictor available:
                     pred = ul.dc
  1   0   0   1    up-left, left predictors available:
                     discard up-left predictor
                     pred = l.dc
  1   0   1   0    up-left, up-right predictors available:
                     pred = (ul.dc + ur.dc) / 2
  1   0   1   1    up-left, up-right, left predictors available:
                     discard up-left predictor
                     pred = (53 * ur.dc + 75 * l.dc) / 128
  1   1   0   0    up-left, up predictors available:
                     discard up-left
                     pred = u.dc
  1   1   0   1    up-left, up, left predictors available:
                     pred = (-26 * ul.dc + 29 * u.dc + 29 * l.dc) / 32
  1   1   1   0    up-left, up, up-right predictors available:
                     pred = (3 * ul.dc + 10 * u.dc + 3 * ur.dc) / 16
  1   1   1   1    all 4 predictors available:
                     discard up-right predictor
                     pred = (-26 * ul.dc + 29 * u.dc + 29 * l.dc) / 32
Note that this final prediction case ([ul u l]) risks outranging. The
difference of the predicted DC is checked against u.dc, l.dc, and ul.dc,
in that order, and if the difference is greater than 128 in any case,
the predictor is assigned as that DC coefficient. In pseudocode:

  if (ABSOLUTE_VALUE(pred - u.dc) > 128)
    pred = u.dc
  else if (ABSOLUTE_VALUE(pred - l.dc) > 128)
    pred = l.dc
  else if (ABSOLUTE_VALUE(pred - ul.dc) > 128)
    pred = ul.dc
  807. The predicted value is, at long last, added to the fragment's decoded DC
  808. coefficient. Finally, the new DC coefficient is saved as the frame
  809. type's auxiliary predictor. For example, if this fragment is coded as a
  810. motion residual from the previous frame, save the fragment's DC
  811. coefficient as the previous frame auxiliary predictor.
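The table of 16 cases and the outranging check can be condensed into a
lookup, sketched here in Python. Treat it as an illustration under stated
assumptions, not the VP3 implementation: the names WEIGHTS, trunc_div and
predict_dc are hypothetical, the division is assumed to truncate toward
zero (the precise rounding is not settled in this document), and the
outranging check is applied to both rows that use the [ul u l] formula.

```python
# Weights in the order (ul, u, ur, l) and the divisor, indexed by the
# 4-bit availability mask built as ul, u, ur, l (ul is the high bit).
WEIGHTS = [
    None,                      # 0000: use the auxiliary predictor
    ((0, 0, 0, 1), 1),         # 0001
    ((0, 0, 1, 0), 1),         # 0010
    ((0, 0, 53, 75), 128),     # 0011
    ((0, 1, 0, 0), 1),         # 0100
    ((0, 1, 0, 1), 2),         # 0101
    ((0, 1, 0, 0), 1),         # 0110: up-right discarded
    ((0, 0, 53, 75), 128),     # 0111: up discarded
    ((1, 0, 0, 0), 1),         # 1000
    ((0, 0, 0, 1), 1),         # 1001: up-left discarded
    ((1, 0, 1, 0), 2),         # 1010
    ((0, 0, 53, 75), 128),     # 1011: up-left discarded
    ((0, 1, 0, 0), 1),         # 1100: up-left discarded
    ((-26, 29, 0, 29), 32),    # 1101
    ((3, 10, 3, 0), 16),       # 1110
    ((-26, 29, 0, 29), 32),    # 1111: up-right discarded
]


def trunc_div(numerator, divisor):
    """Integer division truncating toward zero (an assumed rounding rule)."""
    quotient = abs(numerator) // divisor
    return quotient if numerator >= 0 else -quotient


def predict_dc(ul, u, ur, l, auxiliary):
    """Each of ul, u, ur, l is a neighbor's DC value, or None if that
    neighbor is not a valid predictor. `auxiliary` is the saved predictor
    for the current fragment's frame type."""
    dcs = (ul, u, ur, l)
    mask = 0
    for dc in dcs:
        mask = (mask << 1) | (dc is not None)
    if mask == 0:
        return auxiliary
    weights, divisor = WEIGHTS[mask]
    # Zero weights are skipped, so unavailable (None) entries are never used.
    total = sum(w * dc for w, dc in zip(weights, dcs) if w)
    pred = trunc_div(total, divisor)
    if mask in (13, 15):       # the [ul u l] formula can outrange
        for candidate in (u, l, ul):
            if abs(pred - candidate) > 128:
                pred = candidate
                break
    return pred
```

The caller would then add the returned prediction to the fragment's
decoded DC coefficient and store the sum as the new auxiliary predictor
for that frame type, as described above.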
[still need to mention precise rounding considerations, a.k.a. the
HIGHBITDUPPED() macro]
  814. Reconstructing The Frame
  815. ------------------------
rough outline:

- foreach fragment:
  - if motion vector
    - copy motion fragment from appropriate frame into current frame
      (don't forget to account for unrestricted motion vectors)
  - dequantize fragment coefficients
  - run coefficients through inverse DCT
  - if INTRA coded fragment
    - output transformed coefficients
  - else
    - apply transformed residual to motion fragment

[not finished]
  828. Theora Specification
  829. --------------------
  830. The Theora project leverages the VP3 codec into a new video coding
  831. system. The algorithm and bitstream format are the same as VP3 with a
  832. few minor differences:
  833. 1) The frame orientation is reversed-- VP3 is coded from bottom to top
  834. while Theora video is coded from top to bottom.
  835. [nope-- only true in the first few alpha releases; final Theora spec will
  836. be upside-down, the same as VP3]
  837. 2) Variable histograms-- VP3 uses a hardcoded set of histograms for DCT
  838. coefficient coding (described in section "Unpacking The DCT
  839. Coefficients"). Theora packs the histogram information in the header of
  840. the transport format (which is meant to be Ogg, but can probably be
  841. coerced into a variety of other multimedia container formats).
  842. 3) Variable quantization-- As with the histograms, Theora codes the
  843. quantization tables and quality thresholds (described in section
  844. "Initializing The Quantization Matrices") into the header.
  845. 4) [special VLRLC case for encoding unusually large runs of blocks;
  846. necessary for HD resolutions]
  847. [still need coding format of histogram and quantizer information]
  848. Appendix A: VP31 Quantization Matrices And Scale Factors
  849. --------------------------------------------------------
  850. The following quantization matrices and scale factor tables are hardcoded
  851. into the VP31 coding standard. These tables can vary according to the
  852. setup information transported along with a Theora file.
  853. Base quantization matrix for golden frame Y fragments (note that this
  854. is the same as JPEG):
  855. 16 11 10 16 24 40 51 61
  856. 12 12 14 19 26 58 60 55
  857. 14 13 16 24 40 57 69 56
  858. 14 17 22 29 51 87 80 62
  859. 18 22 37 58 68 109 103 77
  860. 24 35 55 64 81 104 113 92
  861. 49 64 78 87 103 121 120 101
  862. 72 92 95 98 112 100 103 99
  863. Base quantization matrix for golden frame C fragments (note that this
  864. is the same as JPEG):
  865. 17 18 24 47 99 99 99 99
  866. 18 21 26 66 99 99 99 99
  867. 24 26 56 99 99 99 99 99
  868. 47 66 99 99 99 99 99 99
  869. 99 99 99 99 99 99 99 99
  870. 99 99 99 99 99 99 99 99
  871. 99 99 99 99 99 99 99 99
  872. 99 99 99 99 99 99 99 99
  873. Base quantization matrix for interframe Y and C fragments:
  874. 16 16 16 20 24 28 32 40
  875. 16 16 20 24 28 32 40 48
  876. 16 20 24 28 32 40 48 64
  877. 20 24 28 32 40 48 64 64
  878. 24 28 32 40 48 64 64 64
  879. 28 32 40 48 64 64 64 96
  880. 32 40 48 64 64 64 96 128
  881. 40 48 64 64 64 96 128 128
  882. DC coefficient scale factor table:
  883. 220 200 190 180 170 170 160 160
  884. 150 150 140 140 130 130 120 120
  885. 110 110 100 100 90 90 90 80
  886. 80 80 70 70 70 60 60 60
  887. 60 50 50 50 50 40 40 40
  888. 40 40 30 30 30 30 30 30
  889. 30 20 20 20 20 20 20 20
  890. 20 10 10 10 10 10 10 10
  891. AC coefficient scale factor table:
  892. 500 450 400 370 340 310 285 265
  893. 245 225 210 195 185 180 170 160
  894. 150 145 135 130 125 115 110 107
  895. 100 96 93 89 85 82 75 74
  896. 70 68 64 60 57 56 52 50
  897. 49 45 44 43 40 38 37 35
  898. 33 32 30 29 28 25 24 22
  899. 21 19 18 17 15 13 12 10
  900. Appendix B: Macroblock Coding Mode Alphabets
  901. --------------------------------------------
  902. These are the 6 pre-defined alphabets used to decode macroblock coding
  903. mode information:
Alphabet 1:

  index    coding mode
  -----    -----------
    0      MODE_INTER_LAST_MV
    1      MODE_INTER_PRIOR_LAST
    2      MODE_INTER_PLUS_MV
    3      MODE_INTER_NO_MV
    4      MODE_INTRA
    5      MODE_USING_GOLDEN
    6      MODE_GOLDEN_MV
    7      MODE_INTER_FOURMV

Alphabet 2:

  index    coding mode
  -----    -----------
    0      MODE_INTER_LAST_MV
    1      MODE_INTER_PRIOR_LAST
    2      MODE_INTER_NO_MV
    3      MODE_INTER_PLUS_MV
    4      MODE_INTRA
    5      MODE_USING_GOLDEN
    6      MODE_GOLDEN_MV
    7      MODE_INTER_FOURMV

Alphabet 3:

  index    coding mode
  -----    -----------
    0      MODE_INTER_LAST_MV
    1      MODE_INTER_PLUS_MV
    2      MODE_INTER_PRIOR_LAST
    3      MODE_INTER_NO_MV
    4      MODE_INTRA
    5      MODE_USING_GOLDEN
    6      MODE_GOLDEN_MV
    7      MODE_INTER_FOURMV

Alphabet 4:

  index    coding mode
  -----    -----------
    0      MODE_INTER_LAST_MV
    1      MODE_INTER_PLUS_MV
    2      MODE_INTER_NO_MV
    3      MODE_INTER_PRIOR_LAST
    4      MODE_INTRA
    5      MODE_USING_GOLDEN
    6      MODE_GOLDEN_MV
    7      MODE_INTER_FOURMV

Alphabet 5:

  index    coding mode
  -----    -----------
    0      MODE_INTER_NO_MV
    1      MODE_INTER_LAST_MV
    2      MODE_INTER_PRIOR_LAST
    3      MODE_INTER_PLUS_MV
    4      MODE_INTRA
    5      MODE_USING_GOLDEN
    6      MODE_GOLDEN_MV
    7      MODE_INTER_FOURMV

Alphabet 6:

  index    coding mode
  -----    -----------
    0      MODE_INTER_NO_MV
    1      MODE_USING_GOLDEN
    2      MODE_INTER_LAST_MV
    3      MODE_INTER_PRIOR_LAST
    4      MODE_INTER_PLUS_MV
    5      MODE_INTRA
    6      MODE_GOLDEN_MV
    7      MODE_INTER_FOURMV
  970. Appendix C: DCT Coefficient VLC Tables
  971. --------------------------------------
  972. - VP31 tables are hardcoded
  973. - Theora tables are transported with video stream
  974. [not finished]
  975. Appendix D: The VP3 IDCT
  976. ------------------------
  977. [not finished]
  978. Acknowledgements
  979. ----------------
Thanks to Michael Niedermayer (michaelni at gmx dot at) for peer review,
corrections, and recommendations for improvement.

Thanks to Dan Miller (dan at on2 dot com) for clarifications on pieces
of the format.

Thanks to Timothy B. Terriberry (tterribe at vt dot edu) for
clarification about the differences between VP3 and Theora, and for a
detailed explanation of motion vector mechanics.
  987. References
  988. ----------
  989. Tables necessary for decoding VP3:
  990. http://mplayerhq.hu/cgi-bin/cvsweb.cgi/~checkout~/ffmpeg/libavcodec/vp3data.h?content-type=text/x-cvsweb-markup&cvsroot=FFMpeg
  991. Official VP3 site:
  992. http://www.vp3.com/
  993. Theora, based on VP3:
  994. http://www.theora.org/
  995. On2, creators of the VP3 format:
  996. http://www.on2.com/
  997. ChangeLog
  998. ---------
  999. v0.5: December 8, 2004
  1000. - reworked section "Reversing The DC Prediction" to include a tabular
  1001. representation of all 16 prediction modes
  1002. v0.4: March 2, 2004
  1003. - renamed and expanded section "Initializing The Quantization Matrices"
  1004. - outlined section "Reconstructing The Frame"
  1005. - moved Theora Differences Appendix to its own section entitled "Theora
  1006. Specification"
  1007. - added Appendix: Quantization Matrices And Scale Factors
  1008. - added Appendix: DCT Coefficient VLC Tables
  1009. v0.3: February 29, 2004
  1010. - expanded section "Unpacking The Macroblock Coding Mode Information"
  1011. - expanded section "Unpacking The Macroblock Motion Vectors"
  1012. - added Appendix: Macroblock Coding Mode Alphabets
  1013. v0.2: October 9, 2003
  1014. - expanded section "Reversing the DC Prediction"
  1015. - added Appendix: Theora Differences
  1016. v0.1: June 17, 2003
  1017. - initial release, nowhere near complete