  1. <?xml version="1.0" encoding="utf-8"?>
  2. <!DOCTYPE rfc SYSTEM 'rfc2629.dtd'>
  3. <?rfc toc="yes" symrefs="yes" ?>
  4. <?rfc compact="yes"?>
  5. <rfc ipr="noDerivativesTrust200902" category="info" docName="draft-egge-videocodec-tdlt-01">
  6. <front>
  7. <title abbrev="TDLT">Time Domain Lapped Transforms for Video Coding</title>
  8. <author initials="N.E." surname="Egge" fullname="Nathan E. Egge">
  9. <organization>Mozilla Corporation</organization>
  10. <address>
  11. <postal>
  12. <street>650 Castro Street</street>
  13. <city>Mountain View</city>
  14. <region>CA</region>
  15. <code>94041</code>
  16. <country>USA</country>
  17. </postal>
  18. <email>negge@dgql.org</email>
  19. </address>
  20. </author>
  21. <author initials="T.B." surname="Terriberry" fullname="Timothy B. Terriberry">
  22. <organization>Mozilla Corporation</organization>
  23. <address>
  24. <postal>
  25. <street>650 Castro Street</street>
  26. <city>Mountain View</city>
  27. <region>CA</region>
  28. <code>94041</code>
  29. <country>USA</country>
  30. </postal>
  31. <phone>+1 650 903-0800</phone>
  32. <email>tterribe@xiph.org</email>
  33. </address>
  34. </author>
  35. <date day="9" month="March" year="2015"/>
  36. <area>RAI</area>
  37. <abstract>
  38. <t>
  39. This proposes the use of Time Domain Lapped Transforms (TDLT) as the transform
  40. step for video coding.
  41. </t>
  42. </abstract>
  43. </front>
  44. <middle>
  45. <section anchor="intro" title="Introduction">
  46. <t>
  47. This draft outlines a proposal to adapt the Time-Domain Lapped Transforms
  48. (TDLT) for use in video coding.
  49. Lapped transforms were proposed for video coding at least as far back as
  50. 1989&nbsp;<xref target="Malv89"/>.
  51. Like the loop filters more commonly found in recent video coding standards,
  52. TDLTs use a post-processing filter that runs between block edges to reduce or
  53. eliminate blocking artifacts.
  54. Unlike a loop filter, the TDLT filter is invertible, allowing the encoder to
  55. run the inverse filter on the input video.
  56. This decorrelates blocks before they are passed through a normal block
  57. transform and quantization step, improving coding gain (which helps in both
  58. smooth and highly textured areas), in addition to reducing blocking artifacts.
  59. </t>
  60. </section>
  61. <!--section anchor="terminology" title="Terminology">
  62. <t>
  63. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
  64. "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
  65. interpreted as described in <xref target="RFC2119"/>.
  66. </t>
  67. </section-->
  68. <section anchor="tdlt_def" title="TDLT Defined">
  69. <t>
  70. The Time-Domain Lapped Transform can be viewed as a set of pre and post filters
  71. to an existing block-based DCT transform.
  72. The idea is to place an invertible filter along the block boundaries outside
  73. an existing block-based DCT encoder.
  74. </t>
  75. <!--figure anchor="filter_diagram" title="Filter Diagram" align="center"-->
  76. <figure align="center">
  77. <artwork align="center"><![CDATA[
  78.       +----+         +----+
  79.       |    |    Q    |    |
  80. +----+|DCT |    u    |iDCT|+----+
  81. |    ||    |    a    |    ||    |
  82. | P  |+----+    n    +----+|P^-1|
  83. |    ||    |    t    |    ||    |
  84. +----+|DCT |    i    |iDCT|+----+
  85. |    ||    |    z    |    ||    |
  86. | P  |+----+    a    +----+|P^-1|
  87. |    ||    |    t    |    ||    |
  88. +----+|DCT |    i    |iDCT|+----+
  89.       |    |    o    |    |
  90.       +----+    n    +----+
  91. ]]></artwork>
  92. </figure>
  93. <t>
  94. The pre-filter P operates in the time domain, processing block boundaries and
  95. removing inter-block correlation.
  96. The blocks are then transformed by the DCT into the frequency domain, where
  97. the resulting coefficients are quantized and encoded.
  98. When decoding, the inverse operator P^-1 is applied as a post-filter to the
  99. output of the inverse DCT.
  100. This has two benefits:
  101. <list style="numbers">
  102. <t>
  103. Quantization errors are spread over adjacent blocks via the post-filter
  104. P^-1, reducing blocking artifacts.
  105. This eliminates the need for a separate deblocking filter.
  106. </t>
  107. <t>
  108. The increased support region of the transform allows it to take advantage of
  109. inter-block correlation to achieve a higher coding gain than a non-overlapped
  110. DCT.
  111. This allows it to more effectively code both smooth and textured regions.
  112. </t>
  113. </list>
  114. </t>
  115. <t>
  116. The pre-filter P is defined in <xref target="Tran01"/> as follows:
  117. <!--figure anchor="prefilter" title="Pre-Filter Operation" align="center"-->
  118. <figure align="center">
  119. <artwork align="center"><![CDATA[
  120.      1  [ I  J ][ I  0 ][ I  J ]
  121. P = --- [      ][      ][      ]
  122.      2  [ J -I ][ 0  V ][ J -I ]
  123. ]]></artwork>
  124. </figure>
  125. Here I is the identity matrix and J is the "reversal matrix", obtained by
  126. simply re-ordering the rows of the identity matrix in reverse order.
  127. The V matrix is a free parameter, and as long as V is invertible, this filter
  128. structure guarantees perfect reconstruction, linear phase, and
  129. biorthogonality.
  130. If V is orthogonal, then the overall transform is also orthogonal instead of
  131. just biorthogonal.
  132. </t>
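<t>
The structure of P can be checked numerically.
The following is a minimal sketch in Python, not taken from any reference
implementation: the helper names are ours, and the 2x2 matrix V is an arbitrary
invertible matrix chosen only for illustration.
It relies on the fact that the outer factor [[I, J], [J, -I]] squares to 2*I, so
substituting the inverse of V into the same structure yields P^-1.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def reversal(n):
    # J: the identity matrix with its rows in reverse order.
    return [[Fraction(int(i + j == n - 1)) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def block2x2(a, b, c, d):
    # Assemble [[a, b], [c, d]] from four equally sized square blocks.
    top = [ra + rb for ra, rb in zip(a, b)]
    bottom = [rc + rd for rc, rd in zip(c, d)]
    return top + bottom


def butterfly(m):
    # The outer factor [[I, J], [J, -I]] of the pre-filter.
    eye, rev = identity(m), reversal(m)
    return block2x2(eye, rev, rev, [[-x for x in row] for row in eye])


def prefilter(v):
    # P = 1/2 * B * diag(I, V) * B for an M x M matrix V.
    m = len(v)
    zero = [[Fraction(0)] * m for _ in range(m)]
    b = butterfly(m)
    p = matmul(matmul(b, block2x2(identity(m), zero, zero, v)), b)
    return [[x / 2 for x in row] for row in p]


def inverse2x2(v):
    (a, b), (c, d) = v
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Arbitrary invertible V, chosen only for illustration (an assumption).
V = [[Fraction(3, 2), Fraction(1, 4)], [Fraction(-1, 8), Fraction(5, 4)]]
P = prefilter(V)
P_inv = prefilter(inverse2x2(V))  # post-filter: same structure, V inverted
assert matmul(P, P_inv) == identity(4)
```
]]></artwork>
</figure>
<t>
Because the outer factor is its own inverse up to a factor of 2, only V itself
has to be inverted to obtain the post-filter.
</t>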
  133. <t>
  134. For the case of the 4x8 TDLT, we use the following invertible matrix for V:
  135. <!--figure anchor="rotation" title="Type-IV approximation for 4x8 TDLT" align="center"-->
  136. <figure align="center">
  137. <artwork align="center"><![CDATA[
  138.     [ 1  q_0 ][  1   0 ][ s_0   0  ]
  139. V = [        ][        ][          ]
  140.     [ 0   1  ][ p_0  1 ][  0   s_1 ]
  141. ]]></artwork>
  142. </figure>
  143. Thus for the 4x8 case, the pre-filter and post-filter are completely described
  144. by the four parameters q_0, p_0, s_0, and s_1.
  145. In general, any invertible V matrix may be used.
  146. However, factoring V into a series of lifting steps ensures that it can be
  147. implemented efficiently, and can reduce the number of parameters required by
  148. the optimization process, since the full flexibility of an arbitrary
  149. invertible matrix is not required to achieve good coding gain.
  150. <xref target="Tran01"/> proposes two reduced-parameter factorizations, dubbed
  151. Type&nbsp;III and Type&nbsp;IV.
  152. These are identical in the 4x8 case, but for larger transforms they differ in
  153. the order that the p_i and q_i steps are applied: interleaved for
  154. Type&nbsp;III and ascending and then descending order for Type&nbsp;IV.
  155. While Type&nbsp;III appears to give slightly higher coding gain when
  156. unconstrained, when coupled with the ramp constraint discussed below and the
  157. constraint that all coefficients be dyadic rationals, the number of feasible
  158. solutions is much smaller than with Type&nbsp;IV.
  159. The increased number of feasible solutions allows Type&nbsp;IV transforms to
  160. achieve higher coding gains than Type&nbsp;III when these constraints are
  161. imposed.
  162. This definition easily extends to the 8x16 and 16x32 TDLT cases with similar
  163. parameterizations.
  164. In general, we use the Type&nbsp;IV factorization
  165. from&nbsp;<xref target="Tran01"/>.
  166. For a V matrix of size M, this has (M-1) p_i and (M-1) q_i parameters, and
  167. M s_i parameters.
  168. For a transform of size Nx2N, this gives a total of 1.5N-2 parameters.
  169. This is also the number of lifting steps that must be performed to implement
  170. the V portion of the pre- and post-filters.
  171. </t>
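<t>
As an illustration of the factorization, the sketch below multiplies out the
three factors of V for the 4x8 case and counts the parameters for an Nx2N
transform.
The function names are ours, and the parameter values are the 4x8 values
reported in the exhaustive search results of this draft.
Since both lifting factors are unit triangular, the determinant of V is simply
s_0*s_1, so V is invertible whenever the scale factors are non-zero.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```python
from fractions import Fraction


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def v_4x8(p0, q0, s0, s1):
    # V = [[1, q0], [0, 1]] * [[1, 0], [p0, 1]] * diag(s0, s1)
    upper = [[1, q0], [0, 1]]
    lower = [[1, 0], [p0, 1]]
    scale = [[s0, 0], [0, s1]]
    return matmul(matmul(upper, lower), scale)


def parameter_count(n):
    # V has size M = N/2: (M-1) p's, (M-1) q's and M s's, i.e. 1.5N-2.
    m = n // 2
    return (m - 1) + (m - 1) + m


# 4x8 values from the exhaustive search table in this draft.
p0, q0 = Fraction(-11, 64), Fraction(36, 64)
s0, s1 = Fraction(91, 64), Fraction(85, 64)

V = v_4x8(p0, q0, s0, s1)
det_v = V[0][0] * V[1][1] - V[0][1] * V[1][0]
assert det_v == s0 * s1
assert [parameter_count(n) for n in (4, 8, 16)] == [4, 10, 22]
```
]]></artwork>
</figure>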
  172. </section>
  173. <section anchor="metrics" title="Lapped-Transform Selection">
  174. <t>
  175. We would like to find good candidate transform coefficients that perform well
  176. within a video coding framework. There are several metrics we can use for
  177. evaluating pre-filter parameters, including:
  178. </t>
  179. <t>
  180. <list style="numbers">
  181. <t>Coding Gain - how well energy is compacted into only a few coefficients</t>
  182. <t>Side Band Attenuation - how much energy from frequencies outside the passband leaks into each basis function</t>
  183. <t>Transform Width - how wide the basis functions are and how much ringing they will cause</t>
  184. <t>Orthogonality - how linearly independent the basis functions are</t>
  185. </list>
  186. </t>
  187. <t>
  188. Of these, the most important by far is coding gain as it allows us to directly
  189. measure the improvement in bits between different candidate transforms. At
  190. high bit rates using an efficient quantizer, every 6.02&nbsp;dB improvement in
  191. coding gain saves a bit of entropy per coefficient.
  192. </t>
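<t>
The 6.02&nbsp;dB figure is 20*log_10(2): at high rates, each additional bit
halves the quantization step size.
A small sketch of the conversion follows; the function name is ours, the
relationship is only a high-rate approximation, and the two gains used in the
example are the 4-point DCT and 4x8 TDLT values tabulated in this draft.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB


def bits_saved_per_coefficient(gain_a_db, gain_b_db):
    # High-rate approximation: every DB_PER_BIT of extra coding gain
    # saves one bit of entropy per coefficient.
    return (gain_b_db - gain_a_db) / DB_PER_BIT


# 4-point DCT (7.5701 dB) versus 4x8 TDLT (8.6349 dB).
saving = bits_saved_per_coefficient(7.5701, 8.6349)
print(f"{saving:.3f} bits per coefficient")
```
]]></artwork>
</figure>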
  193. <section anchor="coding_gain" title="Coding Gain">
  194. <t>
  195. Coding gain is a useful metric for comparing different candidate transforms.
  196. Roughly speaking, it is the measure of how well energy is compacted into only
  197. a few coefficients.
  198. The formula for coding gain of the lapped transform can be found in
  199. <xref target="Terr12"/>.
  200. Using an AR(1) model with r=0.95, we have
  201. <figure align="center">
  202. <artwork align="center"><![CDATA[
  203.                 /                        1                        \
  204. C_g = 10*Log_10 | ----------------------------------------------- |
  205.                 \ (Prod_i((G*AR(1)*G^T)[i,i]*(H^T*H)[i,i]))^(1/N) /
  206. ]]></artwork>
  207. </figure>
  208. </t>
  209. <t>
  210. where G is the analysis filter of the lapped transform:
  211. <figure align="center">
  212. <artwork align="center"><![CDATA[
  213.     [           ][ P  0 ]
  214. G = [ 0  DCT  0 ][      ]
  215.     [           ][ 0  P ]
  216. ]]></artwork>
  217. </figure>
  218. and H is the synthesis filter of the lapped transform:
  219. <figure align="center">
  220. <artwork align="center"><![CDATA[
  221.     [ P^-1   0   ][  0   ]
  222. H = [            ][ iDCT ]
  223.     [  0    P^-1 ][  0   ]
  224. ]]></artwork>
  225. </figure>
  226. </t>
  227. <t>
  228. In <xref target="Terr12"/> the coding gain of the non-lapped DCT is compared
  229. with the optimal non-lapped Karhunen–Loève transform for the same AR(1) model
  230. with r=0.95.
  231. <figure align="center">
  232. <artwork align="center">
  233.        4 point     8 point    16 point
  234.     +-----------+-----------+-----------+
  235. DCT | 7.5701 dB | 8.8259 dB | 9.4555 dB |
  236. KLT | 7.5825 dB | 8.8462 dB | 9.4781 dB |
  237.     +-----------+-----------+-----------+
  238. </artwork>
  239. </figure>
  240. Similarly, in <xref target="Tran01"/> the coding gains of the TDLT using fast
  241. factorizations with real coefficients produced by unconstrained optimization
  242. are
  243. <figure align="center">
  244. <artwork align="center"><![CDATA[
  245.                    4x8        8x16        16x32
  246.               +-----------+-----------+-----------+
  247. Type III TDLT | 8.6349 dB | 9.6115 dB | 9.9496 dB |
  248. Type IV TDLT  | 8.6349 dB | 9.6005 dB | 9.9057 dB |
  249.               +-----------+-----------+-----------+
  250. ]]></artwork>
  251. </figure>
  252. </t>
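<t>
The non-lapped figures can be reproduced with a few lines of code.
The sketch below is our own illustration, not the tool used to generate the
tables: it assumes a unit-variance AR(1) source, an orthonormal DCT-II (so the
(H^T*H)[i,i] terms are all 1), and takes the N-th root of the product of the
coefficient variances, i.e. their geometric mean, which is the form that
reproduces the tabulated values.
For the KLT, the product of the variances equals the determinant of the AR(1)
covariance matrix, (1 - r^2)^(N-1), which gives a closed form.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```python
import math


def dct_matrix(n):
    # Orthonormal DCT-II; row k holds the k-th basis function.
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def coding_gain_db(analysis, r=0.95):
    # Assumes an orthonormal transform, so synthesis norms are all 1.
    n = len(analysis)
    log_sum = 0.0
    for row in analysis:
        # (G * AR(1) * G^T)[i,i] with AR(1)[j,k] = r^|j-k|.
        variance = sum(row[j] * r ** abs(j - k) * row[k]
                       for j in range(n) for k in range(n))
        log_sum += math.log10(variance)
    return -10.0 * log_sum / n


def klt_gain_db(n, r=0.95):
    # Closed form: det(AR(1)) = (1 - r^2)^(n - 1).
    return -10.0 * (n - 1) / n * math.log10(1 - r * r)


for n in (4, 8, 16):
    print(n, round(coding_gain_db(dct_matrix(n)), 4),
          round(klt_gain_db(n), 4))
```
]]></artwork>
</figure>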
  253. </section>
  254. <!--section anchor="sba" title="Side-Band Attenuation">
  255. </section-->
  256. <section anchor="transform_width" title="Transform Width">
  257. <t>
  258. In general, the wider the transform, the higher the coding gain: a 16-point
  259. DCT will always have a higher coding gain than a 4-point DCT.
  260. In the case of a lapped transform, the width is more than just a count of the
  261. number of points; it also depends on the shape of the basis functions.
  262. At equal coding gain, a narrower transform is better because it causes a
  263. smaller amount of ringing around edges.
  264. We define the width of the transform as
  265. <figure align="center">
  266. <artwork align="center"><![CDATA[
  267.                                               1/4
  268.         / sum_ij ( H[i,j]^2 * (j-N+1/2)^4 )  \
  269. w = C * | ---------------------------------- |     ,
  270.         \        sum_ij ( H[i,j]^2 )         /
  271. ]]></artwork>
  272. </figure>
  273. where C=2.991 is a constant calibrated such that the width of the 1024-point
  274. non-overlapped DCT is equal to 1024.
  275. </t>
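<t>
The calibration can be illustrated with a short sketch.
The code below is ours and makes two assumptions that are not stated in the
text: basis[i][j] holds sample j of synthesis basis function i, and a
non-overlapped N-point DCT is represented by centering its N samples in the
2N-sample window of the lapped transform.
Under these assumptions the width of a non-overlapped N-point DCT comes out very
close to N.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```python
import math

C = 2.991  # calibration constant quoted in the text


def transform_width(basis, n):
    # basis[i][j]: sample j (0 .. 2n-1) of synthesis basis function i.
    num = 0.0
    den = 0.0
    for row in basis:
        for j, h in enumerate(row):
            num += h * h * (j - n + 0.5) ** 4
            den += h * h
    return C * (num / den) ** 0.25


def non_overlapped_dct_basis(n):
    # N-point orthonormal DCT-II basis functions, zero-padded so that
    # they sit in the middle of a 2N-sample window (our assumption).
    pad = [0.0] * (n // 2)
    basis = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        core = [scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                for j in range(n)]
        basis.append(pad + core + pad)
    return basis


width_64 = transform_width(non_overlapped_dct_basis(64), 64)
assert abs(width_64 - 64) < 0.1
```
]]></artwork>
</figure>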
  276. </section>
  277. <!--section anchor="orthogonality" title="Orthogonality">
  278. </section-->
  279. </section>
  280. <section anchor="search" title="Optimal Transform Coefficients">
  281. <t>
  282. Of the four metrics described in <xref target="metrics"/> we chose to optimize
  283. our transform parameters for the highest coding gain.
  284. </t>
  285. <t>
  286. To avoid the use of floating point operations, we use dyadic rationals to
  287. represent the parameters of our TDLT.
  288. These are the p's, q's and s's that describe the V matrix in the pre-filter.
  289. We chose a base of 2^6 because it offered enough resolution to find good
  290. approximations of the optimal values for the p's, q's, and s's and still
  291. allowed us to fit the results of multiplications in a 16 bit word.
  292. Increasing the base to 2^8 improves the achievable coding gain of the 4x8
  293. transform by less than 0.002&nbsp;dB.
  294. On the other hand, dropping it even one bit to 2^5 lowers the coding gain by
  295. 0.037&nbsp;dB.
  296. </t>
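<t>
To illustrate why dyadic rationals avoid floating point, the sketch below
applies a single lifting step using only integer multiplies, adds, and shifts.
It is an illustration rather than the filter used by any particular codec: the
function names and the round-to-nearest offset are our assumptions.
Whatever rounding is used, a lifting step is exactly reversible as long as the
inverse subtracts the same rounded quantity.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```python
SHIFT = 6          # base 2^6, as chosen in the text
ONE = 1 << SHIFT   # 64


def to_dyadic(x):
    # Numerator n such that n/64 is the nearest dyadic rational to x.
    return int(round(x * ONE))


def lift(target, source, numerator):
    # target += round(numerator/64 * source), using integers only.
    # The rounding offset ONE >> 1 is an assumption of this sketch.
    return target + ((numerator * source + (ONE >> 1)) >> SHIFT)


def unlift(target, source, numerator):
    # Exact inverse: subtract the same rounded product.
    return target - ((numerator * source + (ONE >> 1)) >> SHIFT)


p0 = to_dyadic(-11 / 64)   # numerator of the 4x8 p_0 in this draft
x0, x1 = 100, -37
y1 = lift(x1, x0, p0)
assert unlift(y1, x0, p0) == x1
```
]]></artwork>
</figure>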
  297. <section anchor="brute" title="Exhaustive Search">
  298. <t>
  299. For the smaller lapped transforms, it is possible to simply do an exhaustive
  300. search and check all possible transform candidates to find the one with the
  301. best coding gain.
  302. The limitation that the p's, q's, and s's all be dyadic rationals allows us to
  303. simply enumerate all reasonable values.
  304. Additional constraints allowed us to further reduce the search space.
  305. Because the p's and q's are lifting steps that represent rotations in the plane,
  306. their values are between -1.0 and 1.0.
  307. Likewise, the limitation that the pre- and post-filter steps be reversible
  308. requires that the scale factors be greater than or equal to 1.0; otherwise
  309. information would be lost during the transform.
  310. Finally, all else being equal, we prefer smaller scale factors, as they make quantizing
  311. and encoding the coefficients cheaper.
  312. We thus cap the scale factors at 2.0.
  313. Based on some limited experimentation, scale factors larger than this do not
  314. appear to produce useful transforms according to our metrics, anyway.
  315. </t>
  316. <t>
  317. With a dyadic rational base of 2^6, the number of possible candidates to
  318. consider is
  319. <figure align="left">
  320. <artwork align="left"><![CDATA[
  321. |C| = (2*(2^6)+1)^(|p|+|q|) * (2^6)^|s|
  322.     = (2*(2^6)+1)^(2*(N/2-1)) * (2^6)^(N/2)
  323. ]]></artwork>
  324. </figure>
  325. Thus for the transform sizes we are interested in, the number of candidates is
  326. tractable only for the 4x8 case:
  327. <figure align="left">
  328. <artwork align="left"><![CDATA[
  329.               N          |C|
  330.            +-----+------------------+
  331. 4x8 TDLT   |  4  |     68161536     |
  332. 8x16 TDLT  |  8  | 7.731400 * 10^19 |
  333. 16x32 TDLT | 16  | 9.947082 * 10^43 |
  334.            +-----+------------------+
  335. ]]></artwork>
  336. </figure>
  337. </t>
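<t>
The size of the search space is easy to tabulate.
The sketch below is ours; it assumes that each p and q takes one of 2*(2^6)+1
values in [-1.0, 1.0] and that each scale factor takes one of 2^6 values, which
is the count that reproduces the figures in the table above.
The number of scale values is exposed as a parameter so that other conventions
(for example, including both 1.0 and 2.0) can be tried.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```python
def candidate_count(n, bits=6, scale_values=None):
    # Number of (p, q, s) combinations for an Nx2N TDLT.
    half = n // 2
    lifting_values = 2 * (1 << bits) + 1   # -1.0 .. 1.0 inclusive
    if scale_values is None:
        scale_values = 1 << bits           # assumption: 2^6 per scale
    return lifting_values ** (2 * (half - 1)) * scale_values ** half


counts = {n: candidate_count(n) for n in (4, 8, 16)}
for n, count in counts.items():
    print(f"{n}x{2 * n} TDLT: {count:.6e}")
```
]]></artwork>
</figure>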
  338. <t>
  339. The parameters found by an exhaustive search for the optimal coding gain of
  340. the 4x8 TDLT are below:
  341. <figure align="left">
  342. <artwork align="left"><![CDATA[
  343. +-----+--------+ +-----+--------+ +-----+--------+
  344. | p_0 | -11/64 | | q_0 |  36/64 | | s_0 |  91/64 |
  345. +-----+--------+ +-----+--------+ | s_1 |  85/64 |
  346.                                   +-----+--------+
  347. ]]></artwork>
  348. </figure>
  349. </t>
  350. </section>
  351. <section anchor="stochastic" title="Stochastic Search">
  352. <t>
  353. For the larger lapped transforms, doing an exhaustive search is not possible.
  354. Instead we formulate the optimization problem as an integer programming problem
  355. and use a robust industrial solver to find optimal integer values for the
  356. p's, q's, and s's.
  357. </t>
  358. <t>
  359. For the 8x16 TDLT, the parameters are below:
  360. <figure align="left">
  361. <artwork align="left"><![CDATA[
  362. +-----+--------+ +-----+--------+ +-----+--------+
  363. | p_0 | -23/64 | | q_0 |  48/64 | | s_0 |  90/64 |
  364. | p_1 | -18/64 | | q_1 |  34/64 | | s_1 |  73/64 |
  365. | p_2 |  -6/64 | | q_2 |  20/64 | | s_2 |  72/64 |
  366. +-----+--------+ +-----+--------+ | s_3 |  75/64 |
  367.                                   +-----+--------+
  368. ]]></artwork>
  369. </figure>
  370. </t>
  371. <t>
  372. For the 16x32 TDLT, the parameters are below:
  373. <figure align="left">
  374. <artwork align="left"><![CDATA[
  375. +-----+--------+ +-----+--------+ +-----+--------+
  376. | p_0 | -24/64 | | q_0 |  50/64 | | s_0 |  90/64 |
  377. | p_1 | -23/64 | | q_1 |  40/64 | | s_1 |  74/64 |
  378. | p_2 | -17/64 | | q_2 |  31/64 | | s_2 |  73/64 |
  379. | p_3 | -12/64 | | q_3 |  22/64 | | s_3 |  71/64 |
  380. | p_4 | -14/64 | | q_4 |  18/64 | | s_4 |  67/64 |
  381. | p_5 | -13/64 | | q_5 |  16/64 | | s_5 |  67/64 |
  382. | p_6 |  -7/64 | | q_6 |  11/64 | | s_6 |  67/64 |
  383. +-----+--------+ +-----+--------+ | s_7 |  72/64 |
  384.                                   +-----+--------+
  385. ]]></artwork>
  386. </figure>
  387. </t>
  388. <t>
  389. In order to confirm that the integer approximations found are in fact optimal,
  390. we can compare them with the optimal real valued coding gains for the three
  391. lapped-transforms we are proposing.
  392. In <xref target="Tran01"/> a numeric solver was used to find optimal values for
  393. a Type&nbsp;IV lapped transform.
  394. <figure align="left">
  395. <artwork align="left"><![CDATA[
  396.                 4x8          8x16         16x32
  397.             +------------+------------+------------+
  398. Real Valued | 8.6349 dB  | 9.6005 dB  | 9.9057 dB  |
  399. Approximate | 8.63473 dB | 9.60021 dB | 9.89338 dB |
  400.             +------------+------------+------------+
  401. Loss        | 0.00017 dB | 0.00029 dB | 0.01232 dB |
  402.             +------------+------------+------------+
  403. ]]></artwork>
  404. </figure>
  405. </t>
  406. </section>
  407. <section anchor="ramp" title="Ramp Constraint">
  408. <t>
  409. It is also possible to constrain the lapped transform so that it is
  410. (1,2)-regular&nbsp;<xref target="DT03"/>, i.e., that it has one vanishing
  411. moment in the analysis filter and two vanishing moments in the synthesis
  412. filter.
  413. This allows the synthesis filter to reconstruct any piecewise linear function
  414. solely from the DC coefficients.
  415. This causes the shape of the DC basis function to be a symmetric linear ramp.
  416. This can be particularly useful when it matches the shape of other windowing
  417. functions used in the codec.
  418. For example, a linear window is commonly used with Overlapped Block Motion
  419. Compensation (OBMC), which is one possible approach for avoiding blocking
  420. artifacts in the motion-compensation stage of the codec.
  421. More vanishing moments are possible, allowing reconstruction of piecewise
  422. quadratic or even higher-order functions, but these require additional overlap
  423. stages.
  424. </t>
  425. <t>
  426. This regularity can be enforced solely by enforcing a series of constraints on
  427. the scale factors, s_i.
  428. <figure align="center">
  429. <artwork align="center"><![CDATA[
  430. s_0 = N*(1 - q_0)
  431.          N      /                                 \
  432. s_i = ------- * | 1 + (q_{i-1} - 1)*p_{i-1} - q_i | , for i > 0
  433.       2*i + 1   \                                 /
  434. ]]></artwork>
  435. </figure>
  436. Since 2*i + 1 is odd, but we want s_i to be a dyadic rational value, the
  437. remainder of the expression must be evenly divisible by (2*i+1).
  438. A similar set of constraints can be derived for Type&nbsp;III, but they involve
  439. more of the p's and q's per s_i value, and thus have far fewer admissible
  440. solutions when coupled with the dyadic rational constraint.
  441. </t>
  442. <t>
  443. The additional restrictions described above greatly reduce the number of
  444. combinations to consider, both because there are fewer parameters (the s_i's
  445. can no longer be chosen independently) and because there are fewer
  446. combinations of parameter values which produce dyadic rational coefficients.
  447. With these constraints, the number of combinations is small enough that an
  448. exhaustive search is now tractable for the 8x16 TDLT.
  449. <figure align="left">
  450. <artwork align="left"><![CDATA[
  451.              N       |C|
  452.           +-----+-----------+
  453. 4x8 TDLT  |  4  |    442    |
  454. 8x16 TDLT |  8  | 331677320 |
  455.           +-----+-----------+
  456. ]]></artwork>
  457. </figure>
  458. The parameters found by an exhaustive search for the optimal coding gain under
  459. the ramp and dyadic rational constraints for the 4x8 and 8x16 TDLTs are below:
  460. <figure align="left">
  461. <artwork align="left"><![CDATA[
  462. +-----+--------+ +-----+--------+ +-----+--------+
  463. | p_0 | -16/64 | | q_0 |  41/64 | | s_0 |  92/64 |
  464. +-----+--------+ +-----+--------+ | s_1 |  93/64 |
  465.                                   +-----+--------+
  466. ]]></artwork>
  467. </figure>
  468. <figure align="left">
  469. <artwork align="left"><![CDATA[
  470. +-----+--------+ +-----+--------+ +-----+--------+
  471. | p_0 | -24/64 | | q_0 |  53/64 | | s_0 |  88/64 |
  472. | p_1 | -20/64 | | q_1 |  40/64 | | s_1 |  75/64 |
  473. | p_2 |  -4/64 | | q_2 |  24/64 | | s_2 |  76/64 |
  474. +-----+--------+ +-----+--------+ | s_3 |  76/64 |
  475.                                   +-----+--------+
  476. ]]></artwork>
  477. </figure>
  478. </t>
  479. <t>
  480. Unfortunately, in the 16x32 TDLT case the number of combinations is still not
  481. tractable, even with these additional constraints.
  482. Again, we use an integer programming model to solve for the integer parameters
  483. that optimize coding gain in this context.
  484. <figure align="left">
  485. <artwork align="left"><![CDATA[
  486. +-----+--------+ +-----+--------+ +-----+--------+
  487. | p_0 | -32/64 | | q_0 |  59/64 | | s_0 |  80/64 |
  488. | p_1 | -28/64 | | q_1 |  53/64 | | s_1 |  72/64 |
  489. | p_2 | -24/64 | | q_2 |  46/64 | | s_2 |  73/64 |
  490. | p_3 | -32/64 | | q_3 |  41/64 | | s_3 |  68/64 |
  491. | p_4 | -24/64 | | q_4 |  35/64 | | s_4 |  72/64 |
  492. | p_5 | -13/64 | | q_5 |  24/64 | | s_5 |  74/64 |
  493. | p_6 |  -2/64 | | q_6 |  12/64 | | s_6 |  74/64 |
  494. +-----+--------+ +-----+--------+ | s_7 |  70/64 |
  495.                                   +-----+--------+
  496. ]]></artwork>
  497. </figure>
  498. </t>
  499. <t>
  500. <figure align="left">
  501. <artwork align="left"><![CDATA[
  502.                   4x8          8x16         16x32
  503.               +------------+------------+------------+
  504. Dyadic        | 8.63473 dB | 9.60021 dB | 9.89338 dB |
  505. Ramp + Dyadic | 8.59886 dB | 9.56161 dB | 9.78294 dB |
  506.               +------------+------------+------------+
  507. Loss          | 0.03587 dB | 0.03860 dB | 0.11044 dB |
  508.               +------------+------------+------------+
  509. ]]></artwork>
  510. </figure>
  511. </t>
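<t>
The ramp-constrained parameter tables in this section can be checked against the
constraint on the scale factors.
The sketch below is ours and uses the constraint in the form
s_i = N/(2*i+1) * (1 + (q_{i-1} - 1)*p_{i-1} - q_i), written with an explicit
leading 1 inside the parentheses, which is the form the tabulated values
satisfy.
It also assumes that the q term past the last index is zero; with that
convention the same expression covers the last scale factor.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```python
from fractions import Fraction


def ramp_scales(n, p, q):
    # p and q hold the numerators (over 64) of p_i and q_i for an
    # Nx2N transform; returns the implied scale factors s_i.
    half = n // 2
    ps = [Fraction(x, 64) for x in p]
    # Assumption: there is no q past the last index, so treat it as 0.
    qs = [Fraction(x, 64) for x in q] + [Fraction(0)]
    scales = [n * (1 - qs[0])]
    for i in range(1, half):
        inner = 1 + (qs[i - 1] - 1) * ps[i - 1] - qs[i]
        scales.append(Fraction(n, 2 * i + 1) * inner)
    return scales


def over_64(numerators):
    return [Fraction(x, 64) for x in numerators]


# Ramp-constrained tables from this section.
assert ramp_scales(4, [-16], [41]) == over_64([92, 93])
assert ramp_scales(8, [-24, -20, -4], [53, 40, 24]) == \
    over_64([88, 75, 76, 76])
assert ramp_scales(16, [-32, -28, -24, -32, -24, -13, -2],
                   [59, 53, 46, 41, 35, 24, 12]) == \
    over_64([80, 72, 73, 68, 72, 74, 74, 70])
```
]]></artwork>
</figure>
<t>
Exact rational arithmetic makes the check strict: every tabulated scale factor
is reproduced exactly, not merely to within rounding.
</t>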
  512. </section>
  513. </section>
  514. <section anchor="intra_prediction" title="Intra Prediction">
  515. <t>
  516. Since the final pixel values of a block are not available until after the
  517. post-filter runs, they cannot be used to predict neighboring blocks.
  518. There are a number of possible solutions to this.
  519. For example, one could simply use pixels from outside the overlap region.
  520. However, as these pixels are farther away, they are poorer predictors, and the
  521. extra distance reduces the range of prediction directions which have enough
  522. neighbors available to form an adequate
  523. extrapolation&nbsp;<xref target="OP11"/>.
  524. </t>
  525. <t>
  526. An alternate approach is to perform the prediction in the frequency domain.
  527. Initial experiments suggest that this is just as effective as prediction in the
  528. time domain, and has similar computational
  529. requirements&nbsp;<xref target="Egge13"/>.
  530. However, because the frequency domain coefficients of a neighboring block are
  531. impacted both by what size DCT was used, and the lapping across all four of
  532. its edges, directional predictors can only be so good.
  533. At low rates, this meant more bits were spent correcting an incorrect predictor
  534. than were saved by coding only a directional mode.
  535. </t>
  536. <t>
  537. A signaling-free technique was developed for doing limited intra prediction in
  538. the frequency domain when using lapped transforms.
  539. Note that when the spatial prediction mode is exactly horizontal or vertical,
  540. applying the filters described in this draft along the orthogonal direction
  541. is the identity.
  542. Thus it is possible to look at the horizontal coefficients of the neighboring
  543. block to the left, and the vertical energy of the neighboring block above and
  544. simply use the coefficients where the energy is larger.
  545. When this technique is coupled with a quantization and coefficient coder that
  546. makes signaling no predictor cheap&nbsp;<xref target="Vali15"/>, this becomes
  547. an effective frequency domain intra predictor.
  548. </t>
  549. <t>
  550. Finally, a technique was developed for intra predicting chroma frequency domain
  551. coefficients from decoded coincident luma
  552. coefficients&nbsp;<xref target="Egge15"/>.
  553. While this technique does not strictly require the use of lapped transforms,
  554. because the block size extent (and thus the lapping region) for both the
  555. chroma and luma planes is the same, the use of a lapped transform does not
  556. change the effectiveness of this technique.
  557. </t>
  558. </section>
  559. <section anchor="motion_comp" title="Motion Compensation">
  560. <t>
  561. There have been several lapped transform proposals that perform block-by-block
  562. motion compensation by simply expanding the size of the prediction region for
  563. each block&nbsp;<xref target="TT01"/>,&nbsp;<xref target="OPT11"/>.
  564. However, in addition to increasing the amount of motion-compensated prediction
  565. pixels that must be computed by a factor of four, this also increases the
  566. number of applications of the pre- and post-filter by a factor of four, since
  567. this must now be done separately for each block, using the motion-compensated
  568. frame difference for that block.
  569. </t>
  570. <t>
  571. An alternate approach is to simply perform motion compensation of the frame in a
  572. completely separate step, prior to any transform, using any method
  573. desired&nbsp;<xref target="Terr15"/>.
  574. The lapping can then be applied to this motion-compensated prediction,
  575. producing per-block predictors.
  576. This still allows the prediction mode (inter, intra, bi-prediction, etc.) to be
  577. chosen on a block-by-block basis.
  578. It also interacts well with other techniques designed to operate in the
  579. frequency domain, such as the Pyramid Vector Quantization (PVQ) proposed
  580. elsewhere.
  581. </t>
  582. <t>
  583. The downside is that motion estimation in the encoder needs to be performed for
  584. regions slightly beyond the current block.
  585. However, this is already required by blocking-artifact-free motion compensation
  586. techniques, such as Overlapped Block Motion Compensation (OBMC).
  587. Experience with OBMC has shown that an encoder can mostly ignore look-ahead and
  588. still get acceptable results, unlike other techniques, such as control-grid
  589. interpolation (CGI).
  590. </t>
  591. </section>
  592. <section anchor="multiple_blocks" title="Multiple Block Sizes">
  593. <t>
  594. Multiple block size support is important for lapped transforms, since the
  595. larger support region increases their susceptibility to ringing artifacts
  596. compared to a non-overlapped transform with the same number of coefficients
  597. (though it is greatly reduced compared to a non-overlapped transform with a
  598. support region of the same size).
  599. </t>
  600. <section anchor="variable_lapping" title="Variable Sized Lapping">
  601. <t>
  602. The most obvious approach is to require that the size of the overlap filter be
  603. constrained by the smallest block adjacent to a given edge.
  604. This requires some amount of look-ahead in the encoder, but has the benefit of
  605. using the largest lapping possible in regions where all blocks are the same
  606. size while not introducing discontinuities where blocks of different sizes
  607. meet.
  608. Note that this has an effect on the coding syntax, as the block size decision
  609. for the block below the one being coded must be made and communicated to the
  610. decoder prior to coding.
  611. Using this convention, no additional information needs to be communicated other
  612. than the block size decision to completely describe how the variable sized
  613. lapping should be applied.
  614. </t>
  615. <t>
  616. Consider an example image that is 32x32 with the following block size
  617. decisions.
  618. We apply the lapping recursively to blocks of 32x32 at a time until we reach
  619. a block that is not subdivided into smaller blocks.
  620. At each step in the recursion, we apply a filter vertically across the
  621. block edges that run left to right splitting the block in half.
  622. We then apply a horizontal filter across the block edges that split the block
  623. in half top to bottom.
  624. </t>
  625. <figure align="left">
  626. <artwork align="left"><![CDATA[
  627. +-------+-------+---------------+-------------------------------+
  628. |       |       |               |                               |
  629. |       |       |               |                               |
  630. |       |       |               |                               |
  631. +-------+-------+               |                               |
  632. |       |       |               |                               |
  633. |       |       |               |                               |
  634. |       |       |               |                               |
  635. +-------+-------+---------------+                               |
  636. |               |               |                               |
  637. |               |               |                               |
  638. |               |               |                               |
  639. |               |               |                               |
  640. |               |               |                               |
  641. |               |               |                               |
  642. |               |               |                               |
  643. +---------------+-------+-------+-------------------------------+
  644. |               |       |       |                               |
  645. |               |       |       |                               |
  646. |               |       |       |                               |
  647. |               +-------+-------+                               |
  648. |               |       |       |                               |
  649. |               |       |       |                               |
  650. |               |       |       |                               |
  651. +---------------+-------+-------+                               |
  652. |               |               |                               |
  653. |               |               |                               |
  654. |               |               |                               |
  655. |               |               |                               |
  656. |               |               |                               |
  657. |               |               |                               |
  658. |               |               |                               |
  659. +-------------------------------+-------------------------------+
  660. ]]></artwork>
  661. <postamble>Block size decision for a 32x32 frame.</postamble>
  662. </figure>
  663. <figure align="left">
  664. <artwork align="left"><![CDATA[
  665. +-------+-------+---------------+-------------------------------+
  666. |       |       |               |                               |
  667. |       |       |               |                               |
  668. |       |       |               |                               |
  669. +-------+-------+               |                               |
  670. |       |       |               |                               |
  671. |       |       |               |                               |
  672. |       |       |               |                               |
  673. +-------+-------+---------------+                               |
  674. |               |               |X X X X X X X X X X X X X X X X|
  675. |               |               |X X X X X X X X X X X X X X X X|
  676. |               |               |X X X X X X X X X X X X X X X X|
  677. |               |               |X X X X X X X X X X X X X X X X|
  678. |X X X X X X X X|               |X X X X X X X X X X X X X X X X|
  679. |X X X X X X X X|               |X X X X X X X X X X X X X X X X|
  680. |X X X X X X X X|X X X X X X X X|X X X X X X X X X X X X X X X X|
  681. +X-X-X-X-X-X-X-X+X-X-X-X+X-X-X-X+X-X-X-X-X-X-X-X-X-X-X-X-X-X-X-X+
  682. |X X X X X X X X|X X X X|X X X X|X X X X X X X X X X X X X X X X|
  683. |X X X X X X X X|X X X X|X X X X|X X X X X X X X X X X X X X X X|
  684. |X X X X X X X X|       |       |X X X X X X X X X X X X X X X X|
  685. |X X X X X X X X+-------+-------+X X X X X X X X X X X X X X X X|
  686. |               |       |       |X X X X X X X X X X X X X X X X|
  687. |               |       |       |X X X X X X X X X X X X X X X X|
  688. |               |       |       |X X X X X X X X X X X X X X X X|
  689. +---------------+-------+-------+X X X X X X X X X X X X X X X X|
  690. |               |               |                               |
  691. |               |               |                               |
  692. |               |               |                               |
  693. |               |               |                               |
  694. |               |               |                               |
  695. |               |               |                               |
  696. |               |               |                               |
  697. +-------------------------------+-------------------------------+
  698. ]]></artwork>
  699. <postamble>Apply the filter vertically across the horizontal internal
  700. edge.</postamble>
  701. </figure>
  702. <figure align="left">
  703. <artwork align="left"><![CDATA[
  704. +-------+-------+---------------+-------------------------------+
  705. |       |       |        X X X X|X X X X                        |
  706. |       |       |        X X X X|X X X X                        |
  707. |       |       |        X X X X|X X X X                        |
  708. +-------+-------+        X X X X|X X X X                        |
  709. |       |       |        X X X X|X X X X                        |
  710. |       |       |        X X X X|X X X X                        |
  711. |       |       |        X X X X|X X X X                        |
  712. +-------+-------+--------X-X-X-X+X X X X                        |
  713. |               |        X X X X|X X X X                        |
  714. |               |        X X X X|X X X X                        |
  715. |               |        X X X X|X X X X                        |
  716. |               |        X X X X|X X X X                        |
  717. |               |        X X X X|X X X X                        |
  718. |               |        X X X X|X X X X                        |
  719. |               |        X X X X|X X X X                        |
  720. +---------------+-------+X-X-X-X+X-X-X-X------------------------+
  721. |               |       |    X X|X X                            |
  722. |               |       |    X X|X X                            |
  723. |               |       |    X X|X X                            |
  724. |               +-------+----X-X+X X                            |
  725. |               |       |    X X|X X                            |
  726. |               |       |    X X|X X                            |
  727. |               |       |    X X|X X                            |
  728. +---------------+-------+----X-X+X X                            |
  729. |               |            X X|X X                            |
  730. |               |            X X|X X                            |
  731. |               |            X X|X X                            |
  732. |               |            X X|X X                            |
  733. |               |            X X|X X                            |
  734. |               |            X X|X X                            |
  735. |               |            X X|X X                            |
  736. +----------------------------X-X+X-X----------------------------+
  737. ]]></artwork>
  738. <postamble>Apply the filter horizontally across the vertical internal
  739. edge.</postamble>
  740. </figure>
  741. <t>
  742. The filters are then applied recursively in this manner to the four quadrants
  743. of the block.
  744. Applying the filters recursively in this way prevents any
  745. discontinuities from appearing where a block is split but its neighbor is not.
  746. </t>
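<t>
As a rough illustration of the recursion only, the following C sketch lists
the split blocks in the order in which their internal edges would be visited.
It does not perform any filtering or choose a filter length.
The split_block type, the list_split_blocks() function, and the bsize[] layout
(one block size per 4x4 cell of the frame) are assumptions made for this
example and are not taken from any implementation.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```c
#include <assert.h>

/* Hypothetical record of a split block whose two internal edges are
   filtered. */
typedef struct {
  int x;    /* Left column of the block, in pixels. */
  int y;    /* Top row of the block, in pixels. */
  int size; /* Width and height of the block, in pixels. */
} split_block;

/* Lists split blocks in the order their internal edges would be
   visited: a block first, then its four quadrants, recursively.
   bsize[] is an assumed layout holding, for each 4x4 cell of the
   frame, the size of the block covering that cell.
   Returns the updated number of entries in out (at most max). */
int list_split_blocks(const int *bsize, int cells_per_row,
 int x, int y, int size, split_block *out, int n, int max) {
  int half;
  assert(size >= 4);
  /* The block is not split if its top-left cell is covered by a
     block at least as large as the block itself. */
  if (bsize[(y/4)*cells_per_row + x/4] >= size) return n;
  if (n < max) {
    /* Here a codec would filter vertically across the horizontal
       edge at row y + size/2, then horizontally across the
       vertical edge at column x + size/2. */
    out[n].x = x;
    out[n].y = y;
    out[n].size = size;
    n++;
  }
  half = size/2;
  n = list_split_blocks(bsize, cells_per_row, x, y, half,
   out, n, max);
  n = list_split_blocks(bsize, cells_per_row, x + half, y, half,
   out, n, max);
  n = list_split_blocks(bsize, cells_per_row, x, y + half, half,
   out, n, max);
  n = list_split_blocks(bsize, cells_per_row, x + half, y + half,
   half, out, n, max);
  return n;
}
```
]]></artwork>
</figure>
<t>
Calling this sketch on a 32x32 frame with (x, y, size) = (0, 0, 32) lists the
frame first and then each quadrant that is itself split, matching the order of
the figures above.
</t>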
  747. </section>
  748. <section anchor="fixed_lapping" title="Fixed Sized Lapping">
  749. <t>
  750. One of the challenges of using variable sized lapping is that changing the block
  751. size decision (either splitting a block into four blocks a quarter as big, or
  752. merging four blocks into one four times the size) can have an impact on the
  753. coding performance outside the block considered.
  754. This makes computing the optimal block size decision for a frame computationally
  755. difficult, because traditional rate-distortion optimization (RDO) algorithms rely on
  756. each decision having only local effects to iteratively improve an initial decision.
  757. </t>
  758. <t>
  759. One way to simplify the problem is to assume a fixed sized lapping across the
  760. entire image.
  761. If only the 4-point filter is used across block boundaries, then it is
  762. possible to compare the distortion of an 8x8 block with that of four 4x4
  763. blocks by simply computing the mean squared error (MSE) of the 64 spatial
  764. domain coefficients after applying the inverse lapped transform.
  765. Because changing the block size decision, and thus the interior lapping, has no
  766. impact on the lapping on the border of the 8x8 block, looking only
  767. at the rate and distortion of the interior coefficients is sufficient.
  768. </t>
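<t>
As an illustrative sketch only, the comparison described above could be
written as the following C fragment.
The function names, the rate estimates bits_8x8 and bits_4x4, and the Lagrange
multiplier lambda are assumptions made for this example.
Both reconstructions are assumed to have already been through the inverse
lapped transform.
The sum of squared errors (SSE) is used in place of the MSE; the two differ
by a constant factor of 64, which is assumed to be folded into lambda.
</t>
<figure align="left">
<artwork align="left"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Sum of squared errors over the 64 spatial-domain samples of an
   8x8 region. */
double sse_8x8(const unsigned char *orig,
 const unsigned char *recon, size_t stride) {
  double sse = 0;
  size_t x, y;
  assert(stride >= 8);
  for (y = 0; y < 8; y++) {
    for (x = 0; x < 8; x++) {
      double d = (double)orig[y*stride + x]
       - (double)recon[y*stride + x];
      sse += d*d;
    }
  }
  return sse;
}

/* Returns 1 if coding the region as four 4x4 blocks has a lower
   rate-distortion cost (D + lambda*R) than coding it as one 8x8
   block, and 0 otherwise.
   bits_8x8 and bits_4x4 are hypothetical rate estimates supplied
   by the caller. */
int prefer_4x4_split(const unsigned char *orig,
 const unsigned char *recon_8x8, const unsigned char *recon_4x4,
 size_t stride, double bits_8x8, double bits_4x4, double lambda) {
  double cost_8x8 = sse_8x8(orig, recon_8x8, stride)
   + lambda*bits_8x8;
  double cost_4x4 = sse_8x8(orig, recon_4x4, stride)
   + lambda*bits_4x4;
  return cost_4x4 < cost_8x8;
}
```
]]></artwork>
</figure>
<t>
Because the lapping on the border of the 8x8 region is the same for both
candidates, this test can be run on each 8x8 region independently of its
neighbors.
</t>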
  769. <t>
  770. This approach does not leverage the additional coding gain and deblocking
  771. achieved by using larger lapping filters, but may make up for this by allowing
  772. computationally cheap block size decision heuristics in real-time encoding
  773. environments.
  774. </t>
  775. </section>
  776. </section>
  777. <section title="IANA Considerations">
  778. <t>
  779. This document has no actions for IANA.
  780. </t>
  781. </section>
  782. <section title="Security Considerations">
  783. <t>
  784. This draft has no security considerations.
  785. </t>
  786. </section>
  787. <section anchor="Acknowledgments" title="Acknowledgments">
  788. <t>
  789. Thanks to Greg Maxwell and Jean-Marc Valin for their assistance in the
  790. experimentation and other valuable contributions to this document.
  791. </t>
  792. </section>
  793. </middle>
  794. <back>
  795. <!--references title="Normative References">
  796. <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?>
  797. </references-->
  798. <references title="Informative References">
  799. <reference anchor="DT03">
  800. <front>
  801. <title>Regularity-Constrained Pre- and Post-Filtering for Block DCT Based Systems</title>
  802. <author initials="W." surname="Dai" fullname="Wei Dai"/>
  803. <author initials="T.D." surname="Tran" fullname="Trac D. Tran"/>
  804. <date month="October" year="2003"/>
  805. </front>
  806. <seriesInfo name="IEEE Transactions on Signal Processing" value="51(10):2568--2581"/>
  807. </reference>
  808. <reference anchor="Egge13" target="http://people.xiph.org/~unlord/Daala-Intra.pdf">
  809. <front>
  810. <title>Intra-Prediction in Daala</title>
  811. <author initials="N.E." surname="Egge"/>
  812. <date month="October" year="2013"/>
  813. </front>
  814. </reference>
  815. <reference anchor="Egge15" target="http://people.xiph.org/~unlord/spie_cfl.pdf">
  816. <front>
  817. <title>Predicting Chroma from Luma with Frequency Domain Intra Prediction</title>
  818. <author initials="N.E." surname="Egge"/>
  819. <author initials="J.M." surname="Valin"/>
  820. <date month="February" year="2015"/>
  821. </front>
  822. </reference>
  823. <reference anchor="Malv89" target="http://research.microsoft.com/apps/pubs/default.aspx?id=102073">
  824. <front>
  825. <title>The LOT: Transform Coding Without Blocking Effects</title>
  826. <author initials="H.S." surname="Malvar"/>
  827. <author initials="D.H." surname="Staelin"/>
  828. <date month="April" year="1989"/>
  829. </front>
  830. <seriesInfo name="IEEE Transactions on Acoustics, Speech, and Signal Processing" value=""/>
  831. </reference>
  832. <reference anchor="OP11">
  833. <front>
  834. <title>Intra-Frame Prediction with Lapped Transforms for Image Coding</title>
  835. <author initials="R.G." surname="de Oliveira" fullname="Rafael G. de Oliveira"/>
  836. <author initials="B." surname="Pesquet-Popescu" fullname="Beatrice Pesquet-Popescu"/>
  837. <date month="May" year="2011"/>
  838. </front>
  839. <seriesInfo name="Proc. of the 36th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'11)" value="pp. 805--808"/>
  840. </reference>
  841. <reference anchor="OPT11">
  842. <front>
  843. <title>Inter Prediction Using Lapped Transforms for Advanced Video Coding</title>
  844. <author initials="R.G." surname="de Oliveira" fullname="Rafael G. de Oliveira"/>
  845. <author initials="B." surname="Pesquet-Popescu" fullname="Beatrice Pesquet-Popescu"/>
  846. <author initials="M." surname="Trocan" fullname="Maria Trocan"/>
  847. <date month="September" year="2011"/>
  848. </front>
  849. <seriesInfo name="Proc. of the 18th IEEE International Conference on Image Processing (ICIP'11)" value="pp. 3705--3708"/>
  850. </reference>
  851. <reference anchor="Tran01">
  852. <front>
  853. <title>Lapped Transform via Time-Domain Pre- and Post-Processing</title>
  854. <author initials="T.D." surname="Tran"/>
  855. <date month="October" year="2001"/>
  856. </front>
  857. <seriesInfo name="IEEE Transactions on Signal Processing" value=""/>
  858. </reference>
  859. <reference anchor="TT01">
  860. <front>
  861. <title>Lapped Transform Based Video Coding</title>
  862. <author initials="T.D." surname="Tran" fullname="Trac D. Tran"/>
  863. <author initials="C." surname="Tu" fullname="Chengjie Tu"/>
  864. <date month="July" year="2001"/>
  865. </front>
  866. <seriesInfo name="Proc. of the 24th SPIE Conference on Applications of Digital Image Processing" value="vol. 4472, pp. 319--333"/>
  867. </reference>
  868. <reference anchor="Terr12">
  869. <front>
  870. <title>Introduction to Video Coding Part 1: Transform Coding</title>
  871. <author initials="T.B." surname="Terriberry"/>
  872. <date month="February" year="2012"/>
  873. </front>
  874. </reference>
  875. <reference anchor="Terr15" target="https://people.xiph.org/~tterribe/daala/vbsobmc.pdf">
  876. <front>
  877. <title>Adaptive Motion Compensation Without Blocking Artifacts</title>
  878. <author initials="T.B." surname="Terriberry"/>
  879. <date month="February" year="2015"/>
  880. </front>
  881. </reference>
  882. <reference anchor="Vali15" target="http://jmvalin.ca/video/spie_pvq.pdf">
  883. <front>
  884. <title>Perceptual Vector Quantization for Video Coding</title>
  885. <author initials="J.M." surname="Valin"/>
  886. <date month="February" year="2015"/>
  887. </front>
  888. </reference>
  889. </references>
  890. </back>
  891. </rfc>