video_pvq.lyx 14 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694
  1. #LyX 2.1 created this file. For more info see http://www.lyx.org/
  2. \lyxformat 474
  3. \begin_document
  4. \begin_header
  5. \textclass article
  6. \use_default_options true
  7. \maintain_unincluded_children false
  8. \language english
  9. \language_package default
  10. \inputencoding auto
  11. \fontencoding global
  12. \font_roman default
  13. \font_sans default
  14. \font_typewriter default
  15. \font_math auto
  16. \font_default_family default
  17. \use_non_tex_fonts false
  18. \font_sc false
  19. \font_osf false
  20. \font_sf_scale 100
  21. \font_tt_scale 100
  22. \graphics default
  23. \default_output_format default
  24. \output_sync 0
  25. \bibtex_command default
  26. \index_command default
  27. \paperfontsize default
  28. \use_hyperref false
  29. \papersize default
  30. \use_geometry false
  31. \use_package amsmath 1
  32. \use_package amssymb 1
  33. \use_package cancel 1
  34. \use_package esint 1
  35. \use_package mathdots 1
  36. \use_package mathtools 1
  37. \use_package mhchem 1
  38. \use_package stackrel 1
  39. \use_package stmaryrd 1
  40. \use_package undertilde 1
  41. \cite_engine basic
  42. \cite_engine_type default
  43. \biblio_style plain
  44. \use_bibtopic false
  45. \use_indices false
  46. \paperorientation portrait
  47. \suppress_date false
  48. \justification true
  49. \use_refstyle 1
  50. \index Index
  51. \shortcut idx
  52. \color #008000
  53. \end_index
  54. \secnumdepth 3
  55. \tocdepth 3
  56. \paragraph_separation indent
  57. \paragraph_indentation default
  58. \quotes_language english
  59. \papercolumns 1
  60. \papersides 1
  61. \paperpagestyle default
  62. \tracking_changes false
  63. \output_changes false
  64. \html_math_output 0
  65. \html_css_as_file 0
  66. \html_be_strict false
  67. \end_header
  68. \begin_body
  69. \begin_layout Title
  70. Energy Preservation in PVQ-Based Video Coding
  71. \end_layout
  72. \begin_layout Author
  73. Jean-Marc Valin
  74. \end_layout
  75. \begin_layout Abstract
  76. This paper starts with an introduction and ends with a conclusion
  77. \end_layout
  78. \begin_layout Section
  79. Introduction
  80. \end_layout
  81. \begin_layout Standard
  82. This mini-paper describes a proposal for adapting the CELT energy conservation
  83. principle to video coding based on a pyramid vector quantizer (PVQ).
  84. One potential advantage of conserving energy of the AC coefficients in
  85. video coding is preserving textures rather than low-passing them.
  86. Also, by introducing a fixed-resolution PVQ-type quantizer, we automatically
  87. gain a simple activity masking model.
  88. \end_layout
  89. \begin_layout Standard
  90. The main challenge of adapting this scheme to video is that we have a good
  91. prediction (the reference frame), so we are essentially starting from a
  92. point that is already on the PVQ hyper-sphere, rather than at the origin
  93. like in CELT.
  94. Other challenges are the introduction of a quantization matrix and the
  95. fact that we want the reference (motion predicted) data to perfectly correspond
  96. to one of the entries in our codebook.
  97. \end_layout
  98. \begin_layout Section
  99. Pyramid Vector Quantization and Spherical Quantization
  100. \end_layout
  101. \begin_layout Standard
  102. A pyramid vector quantization (PVQ) codebook [Fisher] of dimension
  103. \begin_inset Formula $N$
  104. \end_inset
  105. is constructed as the sum of
  106. \begin_inset Formula $K$
  107. \end_inset
  108. signed unit pulses:
  109. \begin_inset Formula
  110. \[
  111. \mathbf{y}\in\mathbb{Z}^{N}:\sum_{i=0}^{N-1}\left|y_{i}\right|=K\ .
  112. \]
  113. \end_inset
  114. In the Opus codec [RFC6716], the PVQ codebook is used in the context of
  115. spherical quantization, with the gain encoded separately from a unit-norm
  116. vector derived from a PVQ codeword
  117. \begin_inset Formula $\mathbf{y}/\left\Vert \mathbf{y}\right\Vert $
  118. \end_inset
  119. .
  120. \end_layout
  121. \begin_layout Section
  122. Application to still image coding
  123. \end_layout
  124. \begin_layout Standard
  125. In order to apply PVQ-based gain-shape quantization to DCT coefficients,
  126. we need to first divide the coefficient into
  127. \emph on
  128. bands
  129. \emph default
  130. .
  131. For 4x4 blocks, a single band is used for all AC coefficients, while 8x8
  132. and 16x16 blocks are split into 4 and 7 bands, respectively (Fig.
  133. XX).
  134. \end_layout
  135. \begin_layout Section
  136. Encoder
  137. \end_layout
  138. \begin_layout Standard
  139. Let vector
  140. \begin_inset Formula $\mathbf{x}_{d}$
  141. \end_inset
  142. denote the (pre-normalization) DCT band to be coded in the current block
  143. and vector
  144. \begin_inset Formula $\mathbf{r}_{d}$
  145. \end_inset
  146. denote the corresponding reference after motion compensation, the encoder
  147. computes and encodes the
  148. \begin_inset Quotes eld
  149. \end_inset
  150. band gain
  151. \begin_inset Quotes erd
  152. \end_inset
  153. \begin_inset Formula
  154. \begin{equation}
  155. g=\sqrt{\mathbf{x}_{d}^{T}\mathbf{x}_{d}}\,.\label{eq:band-energy}
  156. \end{equation}
  157. \end_inset
  158. \end_layout
  159. \begin_layout Standard
  160. The normalized band is computed as
  161. \begin_inset Formula
  162. \begin{equation}
  163. \mathbf{x}=\frac{\mathbf{x}_{d}}{g}\,,\label{eq:normalized-x}
  164. \end{equation}
  165. \end_inset
  166. with the normalized reference
  167. \begin_inset Formula $\mathbf{r}$
  168. \end_inset
  169. similarly computed based on
  170. \begin_inset Formula $\mathbf{r}_{d}$
  171. \end_inset
  172. .
  173. The encoder then finds the position and sign of the maximum value in
  174. \begin_inset Formula $\mathbf{r}$
  175. \end_inset
  176. \begin_inset Formula
  177. \begin{align}
  178. m & =\underset{i}{\mathrm{argmax}}\left|r_{i}\right|\label{eq:reflection-argmax}\\
  179. s & =\mathrm{sgn}\left(r_{m}\right)\label{eq:reflection-sign}
  180. \end{align}
  181. \end_inset
  182. and computes the Householder reflection that reflects
  183. \begin_inset Formula $\mathbf{r}$
  184. \end_inset
  185. to
  186. \begin_inset Formula $-s\mathbf{e}_{m}$
  187. \end_inset
  188. .
  189. The reflection vector is given by
  190. \begin_inset Formula
  191. \begin{equation}
  192. \mathbf{v}=\mathbf{r}+s\mathbf{e}_{m}\,.\label{eq:reflection-vector}
  193. \end{equation}
  194. \end_inset
  195. The encoder reflects the normalized band to find
  196. \begin_inset Formula
  197. \begin{equation}
  198. \mathbf{z}=\mathbf{x}-\frac{2}{\mathbf{v}^{T}\mathbf{v}}\mathbf{v}\left(\mathbf{v}^{T}\mathbf{x}\right)\,.\label{eq:reflection}
  199. \end{equation}
  200. \end_inset
  201. \end_layout
  202. \begin_layout Standard
  203. The similarity between the current band and the reference band is represented
  204. by the angle (assuming no quantization)
  205. \end_layout
  206. \begin_layout Standard
  207. \begin_inset Formula
  208. \begin{equation}
  209. \theta=\arccos\frac{-sz_{m}}{\left\Vert \mathbf{z}\right\Vert }\ .\label{eq:unquant-theta}
  210. \end{equation}
  211. \end_inset
  212. Let
  213. \begin_inset Formula $N$
  214. \end_inset
  215. be the number of dimensions in
  216. \begin_inset Formula $\mathbf{x}$
  217. \end_inset
  218. and
  219. \begin_inset Formula $K$
  220. \end_inset
  221. be the number of pulses in our codebooks, we search for the codebook entry
  222. \begin_inset Formula
  223. \begin{equation}
  224. q=\underset{i}{\mathrm{argmax}}\frac{\mathbf{p}_{i}^{T}\left(\mathbf{z}-z_{m}\mathbf{e}_{m}\right)}{\sqrt{\mathbf{p}_{i}^{T}\mathbf{p}_{i}}}\,,\label{eq:quantization}
  225. \end{equation}
  226. \end_inset
  227. where
  228. \begin_inset Formula $\mathbf{p}_{i}$
  229. \end_inset
  230. is the
  231. \begin_inset Formula $i^{th}$
  232. \end_inset
  233. combination of magnitudes and signs that satisfies
  234. \begin_inset Formula $\left\Vert \mathbf{p}_{i}\right\Vert _{L1}=K$
  235. \end_inset
  236. .
  237. \end_layout
  238. \begin_layout Section
  239. Decoder
  240. \end_layout
  241. \begin_layout Standard
  242. The decoder starts by decoding the codebook entry
  243. \begin_inset Formula $\mathbf{p}_{q}$
  244. \end_inset
  245. and uses it to reconstruct the unit-norm reflected band as
  246. \begin_inset Formula
  247. \begin{equation}
  248. \hat{\mathbf{z}}=-s\cos\hat{\theta}\mathbf{e}_{m}+\sin\hat{\theta}\frac{\mathbf{p}_{q}}{\sqrt{\mathbf{p}_{q}^{T}\mathbf{p}_{q}}}\,.\label{eq:reconstruct}
  249. \end{equation}
  250. \end_inset
  251. Because the decoder has access to exactly the same reference as the encoder,
  252. it is able to apply
  253. \begin_inset CommandInset ref
  254. LatexCommand eqref
  255. reference "eq:reflection-argmax"
  256. \end_inset
  257. -
  258. \begin_inset CommandInset ref
  259. LatexCommand eqref
  260. reference "eq:reflection-vector"
  261. \end_inset
  262. to obtain the same
  263. \begin_inset Formula $\mathbf{v}$
  264. \end_inset
  265. as used in the encoder.
  266. The decoded normalized band is
  267. \begin_inset Formula
  268. \begin{equation}
  269. \hat{\mathbf{x}}=\hat{\mathbf{z}}-\frac{2}{\mathbf{v}^{T}\mathbf{v}}\mathbf{v}\left(\mathbf{v}^{T}\hat{\mathbf{x}}\right)\,.\label{eq:decoder-reflection}
  270. \end{equation}
  271. \end_inset
  272. \end_layout
  273. \begin_layout Standard
  274. The renormalized band is computed by taking into account the quantization
  275. resolution:
  276. \begin_inset Formula
  277. \begin{equation}
  278. \hat{\mathbf{x}}_{d}=\hat{g}\hat{\mathbf{x}}\,.\label{eq:decoded-band}
  279. \end{equation}
  280. \end_inset
  281. \end_layout
  282. \begin_layout Section
  283. Coding Resolution
  284. \end_layout
  285. \begin_layout Standard
  286. It is desirable for a single quality parameter to control
  287. \begin_inset Formula $K$
  288. \end_inset
  289. and the resolution of gain and angle.
  290. That quality parameter should also take into account activity masking to
  291. some extent.
  292. According to Jason Garrett-Glaser, x264's activity masking uses a resolution
  293. proportional to
  294. \begin_inset Formula $g^{2\alpha}$
  295. \end_inset
  296. , with
  297. \begin_inset Formula $\alpha=0.173$
  298. \end_inset
  299. .
  300. We can derive a scalar quantizer based on quantization index
  301. \begin_inset Formula $\gamma$
  302. \end_inset
  303. that follows this resolution:
  304. \begin_inset Formula
  305. \begin{align}
  306. \frac{d\hat{g}}{d\hat{\gamma}} & =Q\hat{g}^{2\alpha}\nonumber \\
  307. \hat{g}^{-2\alpha}d\hat{g} & =Qd\hat{\gamma}\nonumber \\
  308. \frac{\hat{g}^{1-2\alpha}}{1-2\alpha} & =Q\hat{\gamma}\nonumber \\
  309. \hat{g} & =\left(\left(1-2\alpha\right)Q\hat{\gamma}\right)^{1/\left(1-2\alpha\right)}\nonumber \\
  310. \hat{g} & =Q_{g}\hat{\gamma}^{1/\left(1-2\alpha\right)}\label{eq:gain-scalar-quantization}
  311. \end{align}
  312. \end_inset
  313. where
  314. \begin_inset Formula $Q_{g}=\left(\left(1-2\alpha\right)Q\right)^{1/\left(1-2\alpha\right)}$
  315. \end_inset
  316. is the companded gain resolution and
  317. \begin_inset Quotes eld
  318. \end_inset
  319. master
  320. \begin_inset Quotes erd
  321. \end_inset
  322. quality parameter.
  323. Let
  324. \begin_inset Formula $\beta=1/\left(1-2\alpha\right)$
  325. \end_inset
  326. , the MSE-optimal angle quantization resolution is the one that matches
  327. the gain resolution.
  328. Considering the distance to be approximately equal to the arc distance,
  329. we have
  330. \begin_inset Formula
  331. \begin{equation}
  332. Q_{\theta}=\frac{d\hat{g}/d\hat{\gamma}}{\hat{g}}=\frac{Q_{g}\beta\hat{\gamma}^{\beta-1}}{Q_{g}\hat{\gamma}^{\beta}}=\frac{\beta}{\hat{\gamma}}\ .\label{eq:theta-quantization-step}
  333. \end{equation}
  334. \end_inset
  335. Alternatively, the number of quantization
  336. \emph on
  337. steps
  338. \emph default
  339. for
  340. \begin_inset Formula $\theta$
  341. \end_inset
  342. is
  343. \begin_inset Formula
  344. \[
  345. S_{\theta}=\frac{\pi\hat{\gamma}}{2\beta}\ .
  346. \]
  347. \end_inset
  348. \end_layout
  349. \begin_layout Subsection
  350. Setting
  351. \begin_inset Formula $K$
  352. \end_inset
  353. \end_layout
  354. \begin_layout Standard
  355. Using an
  356. \emph on
  357. i.i.d.
  358. \emph default
  359. Laplace source normalized to unit norm, we simulated quantization with
  360. different values of
  361. \begin_inset Formula $N$
  362. \end_inset
  363. and
  364. \begin_inset Formula $K$
  365. \end_inset
  366. .
  367. The resulting distortion is shown in Fig.
  368. \begin_inset space ~
  369. \end_inset
  370. \begin_inset CommandInset ref
  371. LatexCommand ref
  372. reference "fig:PVQ-distortion-N-K"
  373. \end_inset
  374. .
  375. The asymptotic distortion for large values of
  376. \begin_inset Formula $K$
  377. \end_inset
  378. is approximately
  379. \begin_inset Formula
  380. \[
  381. D_{pvq}=\frac{\left(N-1\right)^{2}+C_{K}\left(N-1\right)}{24K^{2}}\ ,
  382. \]
  383. \end_inset
  384. with
  385. \begin_inset Formula $C_{K}=4.2.$
  386. \end_inset
  387. \end_layout
  388. \begin_layout Standard
  389. \begin_inset Float figure
  390. wide false
  391. sideways false
  392. status open
  393. \begin_layout Plain Layout
  394. \begin_inset ERT
  395. status open
  396. \begin_layout Plain Layout
  397. \backslash
  398. centering{
  399. \end_layout
  400. \end_inset
  401. \begin_inset Graphics
  402. filename dist_vs_K.eps
  403. lyxscale 50
  404. width 70col%
  405. \end_inset
  406. \begin_inset ERT
  407. status open
  408. \begin_layout Plain Layout
  409. }
  410. \end_layout
  411. \end_inset
  412. \end_layout
  413. \begin_layout Plain Layout
  414. \begin_inset Caption Standard
  415. \begin_layout Plain Layout
  416. PVQ distortion as a function of
  417. \begin_inset Formula $N$
  418. \end_inset
  419. and
  420. \begin_inset Formula $K$
  421. \end_inset
  422. for a Laplace-distributed source
  423. \begin_inset CommandInset label
  424. LatexCommand label
  425. name "fig:PVQ-distortion-N-K"
  426. \end_inset
  427. \end_layout
  428. \end_inset
  429. \end_layout
  430. \end_inset
  431. \end_layout
  432. \begin_layout Standard
  433. The distortion due to scalar quantization of the gain is (asymptotically)
  434. \begin_inset Formula
  435. \begin{align*}
  436. D_{g} & =\frac{1}{12}\left(d\hat{g}/d\hat{\gamma}\right)^{2}\\
  437. & =\frac{\beta^{2}Q_{g}^{2}\hat{\gamma}^{2\beta-2}}{12}\ .
  438. \end{align*}
  439. \end_inset
  440. To achieve uniform distortion along all dimensions, the distortion due to
  441. the
  442. \begin_inset Formula $N-2$
  443. \end_inset
  444. PVQ degrees of freedom must be
  445. \begin_inset Formula $N-2$
  446. \end_inset
  447. times greater than that due to quantizing the gain, so
  448. \end_layout
  449. \begin_layout Standard
  450. \begin_inset Formula
  451. \begin{align*}
  452. \left(N-2\right)D_{g} & =\left(\hat{g}\sin\hat{\theta}\right)^{2}D_{pvq}\\
  453. \frac{\left(N-2\right)\beta^{2}Q_{g}^{2}\hat{\gamma}^{2\beta-2}}{12} & =\left(Q_{g}\hat{\gamma}^{\beta}\sin\hat{\theta}\right)^{2}\frac{\left(N-2\right)^{2}+C_{K}\left(N-2\right)}{24K^{2}}\\
  454. \left(N-2\right)\beta^{2} & =\frac{\hat{\gamma}^{2}\sin^{2}\hat{\theta}\left[\left(N-2\right)^{2}+C_{K}\left(N-2\right)\right]}{2K^{2}}\\
  455. K & =\frac{\hat{\gamma}\sin\hat{\theta}}{\beta}\sqrt{\frac{N+C_{K}-2}{2}}
  456. \end{align*}
  457. \end_inset
  458. In this way, we can avoid having to signal
  459. \begin_inset Formula $K$
  460. \end_inset
  461. because it is determined only by
  462. \begin_inset Formula $\hat{\gamma}$
  463. \end_inset
  464. and
  465. \begin_inset Formula $\hat{\theta}$
  466. \end_inset
  467. , both of which are available to the decoder.
  468. \end_layout
  469. \begin_layout Section
  470. Bi-Prediction
  471. \end_layout
  472. \begin_layout Standard
  473. We can use this scheme for bi-prediction by introducing a second
  474. \begin_inset Formula $\theta$
  475. \end_inset
  476. parameter.
  477. For the case of two (normalized) reference frames
  478. \begin_inset Formula $\mathbf{r}_{1}$
  479. \end_inset
  480. and
  481. \begin_inset Formula $\mathbf{r}_{2}$
  482. \end_inset
  483. , we introduce
  484. \begin_inset Formula $\mathbf{s}_{1}=\left(\mathbf{r}_{1}+\mathbf{r}_{2}\right)/2$
  485. \end_inset
  486. and
  487. \begin_inset Formula $\mathbf{s}_{2}=\left(\mathbf{r}_{1}-\mathbf{r}_{2}\right)/2$
  488. \end_inset
  489. .
  490. We start by using
  491. \begin_inset Formula $\mathbf{s}_{1}$
  492. \end_inset
  493. as a reference, apply the Householder reflection to both
  494. \begin_inset Formula $\mathbf{x}$
  495. \end_inset
  496. and
  497. \begin_inset Formula $\mathbf{s}_{2}$
  498. \end_inset
  499. , and evaluate
  500. \begin_inset Formula $\theta_{1}$
  501. \end_inset
  502. .
  503. From there, we derive a second Householder reflection from the reflected
  504. version of
  505. \begin_inset Formula $\mathbf{s}_{2}$
  506. \end_inset
  507. and apply it to
  508. \begin_inset Formula $\mathbf{x}_{r}$
  509. \end_inset
  510. .
  511. The result is that the
  512. \begin_inset Formula $\theta_{2}$
  513. \end_inset
  514. parameter controls how the current image compares to the two reference
  515. images.
  516. It should even be possible to use this in the case where the two references
  517. are before the frame being encoded, i.e.
  518. P frames based on two parents.
  519. This might help for fades.
  520. \end_layout
  521. \begin_layout Section
  522. Conclusion
  523. \end_layout
  524. \begin_layout Standard
  525. While it seems like a good idea, we're still experimenting with the details.
  526. \end_layout
  527. \end_body
  528. \end_document