% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
% $Id$
\section{Embedding Vorbis into an Ogg stream} \label{vorbis:over:ogg}

\subsection{Overview}

This document describes using Ogg logical and physical transport
streams to encapsulate Vorbis compressed audio packet data into file
form.

The \xref{vorbis:spec:intro} provides an overview of the construction
of Vorbis audio packets.

The \href{oggstream.html}{Ogg
bitstream overview} and \href{framing.html}{Ogg logical
bitstream and framing spec} provide detailed descriptions of Ogg
transport streams. This specification document assumes a working
knowledge of the concepts covered in these named background
documents. Please read them first.

\subsubsection{Restrictions}

The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
streams use Ogg transport streams in degenerate, unmultiplexed
form only. That is:

\begin{itemize}
\item
A meta-headerless Ogg file encapsulates the Vorbis I packets.
\item
The Ogg stream may be chained, i.e., contain multiple, contiguous logical streams (links).
\item
The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link).
\end{itemize}

This does not mean that Vorbis cannot currently be multiplexed
with other media types into a multi-stream Ogg file. At the
time this document was written, Ogg was becoming a popular container
for low-bitrate movies consisting of DivX video and Vorbis audio.
However, a 'Vorbis I audio file' is taken to imply Vorbis audio
existing alone within a degenerate Ogg stream. A compliant 'Vorbis
audio player' is not required to implement Ogg support beyond the
specific support of Vorbis within a degenerate Ogg stream (naturally,
application authors are encouraged to support full multiplexed Ogg
handling).
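
The restrictions above can be checked mechanically from the Ogg page
headers alone. The following is a minimal sketch in Python, not part of
the specification: it assumes the whole physical stream is held in memory,
starts on a page boundary, and is not corrupt (it neither verifies page
checksums nor resynchronizes). The names \literal{iter\_pages} and
\literal{is\_unmultiplexed} are hypothetical helpers introduced only for
this illustration.

```python
import struct

BOS = 0x02  # 'beginning of stream' bit in the page header type field
EOS = 0x04  # 'end of stream' bit in the page header type field


def iter_pages(data):
    """Yield (header_type, serial_number) for each Ogg page in data."""
    pos = 0
    while pos + 27 <= len(data):
        if data[pos:pos + 4] != b"OggS":
            raise ValueError("lost page sync at byte %d" % pos)
        header_type = data[pos + 5]
        serial = struct.unpack_from("<I", data, pos + 14)[0]
        n_segments = data[pos + 26]
        lacing = data[pos + 27:pos + 27 + n_segments]
        yield header_type, serial
        # Skip the fixed header, the segment table, and the page body.
        pos += 27 + n_segments + sum(lacing)


def is_unmultiplexed(data):
    """True if at most one logical stream is open at any time."""
    open_serial = None
    for header_type, serial in iter_pages(data):
        if header_type & BOS:
            if open_serial is not None:
                return False      # a second stream began inside a link
            open_serial = serial
        elif serial != open_serial:
            return False          # page belongs to a stream never opened
        if header_type & EOS:
            open_serial = None    # link finished; another link may follow
    return True
```

A chained file passes this check because each link closes with an 'end of
stream' page before the next 'beginning of stream' page appears; a
multiplexed file fails at its second 'beginning of stream' page.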

\subsubsection{MIME type}

The MIME type of Ogg files depends on the context. Specifically, complex
multimedia and applications should use \literal{application/ogg},
while visual media should use \literal{video/ogg}, and audio
\literal{audio/ogg}. Vorbis data encapsulated in Ogg may appear
in any of those types. RTP-encapsulated Vorbis should use
\literal{audio/vorbis} + \literal{audio/vorbis-config}.
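
As a small illustration of the mapping just described, the sketch below
selects an Ogg MIME type from a content label. The labels
(\literal{audio}, \literal{video}, \literal{application}) and the function
name \literal{ogg\_mime\_type} are assumptions made for this example only;
the RTP case is not covered.

```python
# Hypothetical content labels mapped to the Ogg MIME types named above.
OGG_MIME_TYPES = {
    "audio": "audio/ogg",
    "video": "video/ogg",
    "application": "application/ogg",
}


def ogg_mime_type(kind):
    """Return the Ogg MIME type for a hypothetical content label."""
    try:
        return OGG_MIME_TYPES[kind]
    except KeyError:
        raise ValueError("unknown content kind: %r" % (kind,))
```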

\subsection{Encapsulation}

Ogg encapsulation of a Vorbis packet stream is straightforward.

\begin{itemize}
\item
The first Vorbis packet (the identification header), which
uniquely identifies a stream as Vorbis audio, is placed alone in the
first page of the logical Ogg stream. This results in a first Ogg
page of exactly 58 bytes at the very beginning of the logical stream.
\item
This first page is marked 'beginning of stream' in the page flags.
\item
The second and third Vorbis packets (comment and setup
headers) may span one or more pages beginning on the second page of
the logical stream. However many pages they span, the third header
packet finishes the page on which it ends. The next (first audio) packet
must begin on a fresh page.
\item
The granule position of these first pages containing only headers is zero.
\item
The first audio packet of the logical stream begins a fresh Ogg page.
\item
Packets are placed into Ogg pages in order until the end of stream.
\item
The last page is marked 'end of stream' in the page flags.
\item
Vorbis packets may span page boundaries.
\item
The granule position of pages containing Vorbis audio is in units
of PCM audio samples (per channel; a stereo stream's granule position
does not increment at twice the speed of a mono stream).
\item
The granule position of a page represents the end PCM sample
position of the last packet \emph{completed} on that
page. The 'last PCM sample' is the last complete sample returned by
decode, not an internal sample awaiting lapping with a
subsequent block. A page that is entirely spanned by a single
packet (that completes on a subsequent page) has no granule
position, and the granule position is set to '-1'.

Note that the last decoded (fully lapped) PCM sample from a packet
is not necessarily the middle sample from that block. If, e.g., the
current Vorbis packet encodes a ``long block'' and the next Vorbis
packet encodes a ``short block'', the last decodable sample from the
current packet will be at position (3*long\_block\_length/4) -
(short\_block\_length/4).
\item
The granule (PCM) position of the first page need not indicate
that the stream started at position zero. Although the granule
position belongs to the last completed packet on the page and a
valid granule position must be positive, by
inference it may indicate that the PCM position of the beginning
of audio is positive or negative.

\begin{itemize}
\item
A positive starting value simply indicates that this stream begins at
some positive time offset, potentially within a larger
program. This is a common case when connecting to the middle
of a broadcast stream.
\item
A negative value indicates that
output samples preceding time zero should be discarded during
decoding; this technique is used to allow sample-granularity
editing of the stream start time of already-encoded Vorbis
streams. The number of samples to be discarded must not exceed
the overlap-add span of the first two audio packets.
\end{itemize}

In both of these cases in which the initial audio PCM starting
offset is nonzero, the second finished audio packet must flush the
page on which it appears and the third packet must begin a fresh page.
This allows the decoder to always be able to perform PCM position
adjustments before needing to return any PCM data from synthesis,
resulting in correct positioning information without any additional
seeking logic.

\begin{note}
Failure to do so should, at worst, cause a
decoder implementation to return incorrect positioning information
for seeking operations at the very beginning of the stream.
\end{note}
\item
A granule position on the final page in a stream that indicates
less audio data than the final packet would normally return is used to
end the stream on other than even frame boundaries. The difference
between the actual available data returned and the declared amount
indicates how many trailing samples to discard from the decoding
process.
\end{itemize}
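
The size and granule position rules in the list above reduce to a few
lines of integer arithmetic. The Python sketch below is illustrative
only. It assumes the fixed part of an Ogg page header is 27 bytes, that
the Vorbis identification header is 30 bytes, and that each audio packet
after the first returns one quarter of the previous block size plus one
quarter of its own block size in samples (the first audio packet only
primes the overlap-add and returns nothing). All function names are
hypothetical.

```python
OGG_PAGE_HEADER_BYTES = 27   # fixed part of an Ogg page header
VORBIS_ID_HEADER_BYTES = 30  # size of the Vorbis identification header


def first_page_size():
    """Size of the first page: header, one lacing value, id packet."""
    lacing_values = 1  # a 30-byte packet fits in a single segment
    return OGG_PAGE_HEADER_BYTES + lacing_values + VORBIS_ID_HEADER_BYTES


def samples_returned(blocksizes):
    """Samples returned by each audio packet, given its block size."""
    out = []
    previous = None
    for current in blocksizes:
        # The first packet has nothing to lap against and returns 0.
        out.append(0 if previous is None else previous // 4 + current // 4)
        previous = current
    return out


def last_lapped_sample(long_block, short_block):
    """Last fully lapped sample of a long block followed by a short one."""
    return 3 * long_block // 4 - short_block // 4


def start_offset(first_page_granule, first_page_blocksizes):
    """PCM position at which audio begins; negative means 'discard'."""
    return first_page_granule - sum(samples_returned(first_page_blocksizes))


def trailing_discard(final_granule, decoded_end_position):
    """Samples to drop from the end of the final packet's output."""
    return max(0, decoded_end_position - final_granule)
```

For example, if the first audio page holds a 256-sample block followed by
a 2048-sample block, decode returns 576 samples from that page; a page
granule position of 500 then implies a starting offset of -76, so the
first 76 output samples would be discarded.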