- % -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
- %!TEX root = Vorbis_I_spec.tex
- % $Id$
- \section{Embedding Vorbis into an Ogg stream} \label{vorbis:over:ogg}
- \subsection{Overview}
- This document describes using Ogg logical and physical transport
- streams to encapsulate Vorbis compressed audio packet data into file
- form.
- The \xref{vorbis:spec:intro} provides an overview of the construction
- of Vorbis audio packets.
- The \href{oggstream.html}{Ogg
- bitstream overview} and \href{framing.html}{Ogg logical
- bitstream and framing spec} provide detailed descriptions of Ogg
- transport streams. This specification document assumes a working
- knowledge of the concepts covered in these named background
- documents. Please read them first.
- \subsubsection{Restrictions}
- The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
- streams use Ogg transport streams in degenerate, unmultiplexed
- form only. That is:
- \begin{itemize}
- \item
- A meta-headerless Ogg file encapsulates the Vorbis I packets.
- \item
- The Ogg stream may be chained, i.e., contain multiple, contiguous logical streams (links).
- \item
- The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link).
- \end{itemize}
- This does not mean that Vorbis cannot currently be multiplexed
- with other media types into a multi-stream Ogg file. At the
- time this document was written, Ogg was becoming a popular container
- for low-bitrate movies consisting of DivX video and Vorbis audio.
- However, a 'Vorbis I audio file' is taken to imply Vorbis audio
- existing alone within a degenerate Ogg stream. A compliant 'Vorbis
- audio player' is not required to implement Ogg support beyond the
- specific support of Vorbis within a degenerate Ogg stream (naturally,
- application authors are encouraged to support full multiplexed Ogg
- handling).
- \subsubsection{MIME type}
- The MIME type of an Ogg file depends on the context. Specifically, complex
- multimedia and applications should use \literal{application/ogg},
- while visual media should use \literal{video/ogg}, and audio
- \literal{audio/ogg}. Vorbis data encapsulated in Ogg may appear
- in any of those types. RTP-encapsulated Vorbis should use
- \literal{audio/vorbis} + \literal{audio/vorbis-config}.
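The mapping above can be sketched as a small lookup. This is only an illustration: the content categories (`"application"`, `"video"`, `"audio"`) and the function name `ogg_mime_type` are hypothetical labels chosen for this example and are not defined by the specification.

```python
# Hypothetical content categories mapped to the MIME types named in the text.
OGG_MIME_TYPES = {
    "application": "application/ogg",  # complex multimedia and applications
    "video": "video/ogg",              # visual media
    "audio": "audio/ogg",              # audio, e.g. a Vorbis I audio file
}

# RTP-encapsulated Vorbis uses this pair of types rather than an Ogg type.
RTP_VORBIS_MIME_TYPES = ("audio/vorbis", "audio/vorbis-config")


def ogg_mime_type(content_kind: str) -> str:
    """Return the Ogg MIME type for a (hypothetical) content category."""
    try:
        return OGG_MIME_TYPES[content_kind]
    except KeyError:
        raise ValueError(f"unknown content kind: {content_kind!r}") from None
```

For a stream that holds Vorbis audio alone, `ogg_mime_type("audio")` would be the natural choice under this sketch.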
- \subsection{Encapsulation}
- Ogg encapsulation of a Vorbis packet stream is straightforward.
- \begin{itemize}
- \item
- The first Vorbis packet (the identification header), which
- uniquely identifies a stream as Vorbis audio, is placed alone in the
- first page of the logical Ogg stream. This results in a first Ogg
- page of exactly 58 bytes at the very beginning of the logical stream.
- \item
- This first page is marked 'beginning of stream' in the page flags.
- \item
- The second and third Vorbis packets (comment and setup
- headers) may span one or more pages beginning on the second page of
- the logical stream. However many pages they span, the third header
- packet finishes the page on which it ends. The next (first audio) packet
- must begin on a fresh page.
- \item
- The granule position of these first pages containing only headers is zero.
- \item
- The first audio packet of the logical stream begins a fresh Ogg page.
- \item
- Packets are placed into Ogg pages in order until the end of stream.
- \item
- The last page is marked 'end of stream' in the page flags.
- \item
- Vorbis packets may span page boundaries.
- \item
- The granule position of pages containing Vorbis audio is in units
- of PCM audio samples (per channel; a stereo stream's granule position
- does not increment at twice the speed of a mono stream).
- \item
- The granule position of a page represents the end PCM sample
- position of the last packet \emph{completed} on that
- page. The 'last PCM sample' is the last complete sample returned by
- decode, not an internal sample awaiting lapping with a
- subsequent block. A page that is entirely spanned by a single
- packet (that completes on a subsequent page) has no granule
- position, and the granule position is set to '-1'.
- Note that the last decoded (fully lapped) PCM sample from a packet
- is not necessarily the middle sample from that block. If, e.g., the
- current Vorbis packet encodes a "long block" and the next Vorbis
- packet encodes a "short block", the last decodable sample from the
- current packet will be at position (3*long\_block\_length/4) -
- (short\_block\_length/4).
- \item
- The granule (PCM) position of the first page need not indicate
- that the stream started at position zero. Although the granule
- position belongs to the last completed packet on the page and a
- valid granule position must be positive, by
- inference it may indicate that the PCM position of the beginning
- of audio is positive or negative.
- \begin{itemize}
- \item
- A positive starting value simply indicates that this stream begins at
- some positive time offset, potentially within a larger
- program. This is a common case when connecting to the middle
- of a broadcast stream.
- \item
- A negative value indicates that
- output samples preceding time zero should be discarded during
- decoding; this technique is used to allow sample-granularity
- editing of the stream start time of already-encoded Vorbis
- streams. The number of samples to be discarded must not exceed
- the overlap-add span of the first two audio packets.
- \end{itemize}
- In both of these cases in which the initial audio PCM starting
- offset is nonzero, the second finished audio packet must flush the
- page on which it appears and the third packet must begin a fresh page.
- This allows the decoder to always be able to perform PCM position
- adjustments before needing to return any PCM data from synthesis,
- resulting in correct positioning information without any additional
- seeking logic.
- \begin{note}
- Failure to do so should, at worst, cause a
- decoder implementation to return incorrect positioning information
- for seeking operations at the very beginning of the stream.
- \end{note}
- \item
- A granule position on the final page in a stream that indicates
- less audio data than the final packet would normally return is used to
- end the stream at a point other than an even frame boundary. The difference
- between the actual available data returned and the declared amount
- indicates how many trailing samples to discard from the decoding
- process.
- \end{itemize}
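The arithmetic behind several of the rules above can be sketched in a few lines. This is a minimal illustration, not part of the specification: the function names are hypothetical, the 27-byte fixed Ogg page header and the 30-byte Vorbis I identification packet are taken from the Ogg framing and Vorbis I header layouts, and the start and end trimming helpers assume the decoder already knows how many PCM samples it has produced.

```python
OGG_PAGE_HEADER_BYTES = 27      # fixed part of an Ogg page header
VORBIS_IDENT_PACKET_BYTES = 30  # size of the Vorbis I identification packet


def first_page_size() -> int:
    """Size of the first page: header, one lacing value, ident packet."""
    # A packet shorter than 255 bytes needs a single segment-table entry.
    return OGG_PAGE_HEADER_BYTES + 1 + VORBIS_IDENT_PACKET_BYTES


def last_decodable_sample(long_block_length: int,
                          short_block_length: int) -> int:
    """Last fully lapped sample of a long block followed by a short block."""
    return (3 * long_block_length) // 4 - short_block_length // 4


def leading_samples_to_discard(first_granule: int,
                               samples_decoded: int) -> int:
    """Samples preceding time zero, inferred from the first audio page.

    first_granule is the granule position of the first page that carries
    audio; samples_decoded is the number of PCM samples returned by the
    packets completed on that page.
    """
    start_position = first_granule - samples_decoded
    # A negative inferred start means that many samples precede time zero.
    return max(0, -start_position)


def trailing_samples_to_discard(decoded_end_position: int,
                                final_granule: int) -> int:
    """Samples to drop when the final page declares less audio than decoded."""
    return max(0, decoded_end_position - final_granule)
```

With block sizes of 2048 and 256 samples, for example, `last_decodable_sample(2048, 256)` gives 1472, and `first_page_size()` reproduces the 58-byte first page stated above.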