- <?xml version="1.0" standalone="no"?>
- <!DOCTYPE appendix PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
- "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
- ]>
- <appendix id="vorbis-over-ogg">
- <appendixinfo>
- <releaseinfo>
- $Id$
- </releaseinfo>
- </appendixinfo>
- <title>Embedding Vorbis into an Ogg stream</title>
- <section>
- <title>Overview</title>
- <para>
- This document describes using Ogg logical and physical transport
- streams to encapsulate Vorbis compressed audio packet data into file
- form.</para>
- <para>
- The <xref linkend="vorbis-spec-intro"/> provides an overview of the construction
- of Vorbis audio packets.</para>
- <para>
- The <ulink url="oggstream.html">Ogg
- bitstream overview</ulink> and <ulink url="framing.html">Ogg logical
- bitstream and framing spec</ulink> provide detailed descriptions of Ogg
- transport streams. This specification document assumes a working
- knowledge of the concepts covered in these background
- documents. Please read them first.</para>
- <section><title>Restrictions</title>
- <para>
- The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
- streams use Ogg transport streams in degenerate, unmultiplexed
- form only. That is:
- <itemizedlist>
- <listitem><simpara>
- A meta-headerless Ogg file encapsulates the Vorbis I packets.
- </simpara></listitem>
- <listitem><simpara>
- The Ogg stream may be chained, i.e. contain multiple, contiguous logical streams (links).
- </simpara></listitem>
- <listitem><simpara>
- The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link).
- </simpara></listitem>
- </itemizedlist>
- </para>
- <para>
- This does not mean that Vorbis cannot currently be multiplexed
- with other media types into a multi-stream Ogg file. At the
- time this document was written, Ogg was becoming a popular container
- for low-bitrate movies consisting of DivX video and Vorbis audio.
- However, a 'Vorbis I audio file' is taken to imply Vorbis audio
- existing alone within a degenerate Ogg stream. A compliant 'Vorbis
- audio player' is not required to implement Ogg support beyond the
- specific support of Vorbis within a degenerate Ogg stream (naturally,
- application authors are encouraged to support full multiplexed Ogg
- handling).
- </para>
- </section>
- <section><title>MIME type</title>
- <para>
- The MIME type of an Ogg file depends on the context. Specifically, complex
- multimedia and applications should use <literal>application/ogg</literal>,
- visual media should use <literal>video/ogg</literal>, and audio should use
- <literal>audio/ogg</literal>. Vorbis data encapsulated in Ogg may appear
- in any of those types. RTP-encapsulated Vorbis should use
- <literal>audio/vorbis</literal> together with <literal>audio/vorbis-config</literal>.
- </para>
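- <para>
- As a rough illustration of the rule above, the following sketch picks a
- MIME type from the kinds of logical streams found in a file. The function
- name <literal>ogg_mime_type</literal> and the stream labels
- <literal>"audio"</literal>, <literal>"video"</literal> and
- <literal>"other"</literal> are assumptions made for this example only; they
- are not defined by this specification, and the mapping is one plausible
- reading of the guidance rather than a normative algorithm.
- </para>
- <programlisting><![CDATA[
```python
def ogg_mime_type(stream_kinds):
    """Pick an Ogg MIME type from the kinds of logical streams in a file.

    stream_kinds is an iterable of labels such as "audio", "video" or
    "other". These labels are hypothetical and exist only for this sketch.
    """
    kinds = set(stream_kinds)
    if kinds == {"audio"}:
        # Audio-only content, e.g. a plain Ogg/Vorbis file.
        return "audio/ogg"
    if "video" in kinds and kinds <= {"audio", "video"}:
        # Visual media, with or without an accompanying audio track.
        return "video/ogg"
    # Anything else is treated as complex multimedia or application data.
    return "application/ogg"
```
- ]]></programlisting>
- <para>
- Under these assumptions a file holding a single Vorbis stream maps to
- <literal>audio/ogg</literal>, while the same Vorbis stream multiplexed
- with video maps to <literal>video/ogg</literal>.
- </para>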
- </section>
- </section>
- <section>
- <title>Encapsulation</title>
- <para>
- Ogg encapsulation of a Vorbis packet stream is straightforward.</para>
- <itemizedlist>
- <listitem><simpara>
- The first Vorbis packet (the identification header), which
- uniquely identifies a stream as Vorbis audio, is placed alone in the
- first page of the logical Ogg stream. This results in a first Ogg
- page of exactly 58 bytes at the very beginning of the logical stream.
- </simpara></listitem>
- <listitem><simpara>
- This first page is marked 'beginning of stream' in the page flags.
- </simpara></listitem>
- <listitem><simpara>
- The second and third Vorbis packets (comment and setup
- headers) may span one or more pages beginning on the second page of
- the logical stream. However many pages they span, the third header
- packet finishes the page on which it ends. The next (first audio) packet
- must begin on a fresh page.
- </simpara></listitem>
- <listitem><simpara>
- The granule position of these first pages containing only headers is zero.
- </simpara></listitem>
- <listitem><simpara>
- The first audio packet of the logical stream begins a fresh Ogg page.
- </simpara></listitem>
- <listitem><simpara>
- Packets are placed into Ogg pages in order until the end of stream.
- </simpara></listitem>
- <listitem><simpara>
- The last page is marked 'end of stream' in the page flags.
- </simpara></listitem>
- <listitem><simpara>
- Vorbis packets may span page boundaries.
- </simpara></listitem>
- <listitem><simpara>
- The granule position of pages containing Vorbis audio is in units
- of PCM audio samples (per channel; a stereo stream's granule position
- does not increment at twice the speed of a mono stream).
- </simpara></listitem>
- <listitem><simpara>
- The granule position of a page represents the end PCM sample
- position of the last packet <emphasis>completed</emphasis> on that
- page. The 'last PCM sample' is the last complete sample returned by
- decode, not an internal sample awaiting lapping with a
- subsequent block. A page that is entirely spanned by a single
- packet (that completes on a subsequent page) has no granule
- position, and the granule position is set to '-1'. </simpara>
- <simpara>
- Note that the last decoded (fully lapped) PCM sample from a packet
- is not necessarily the middle sample from that block. If, for example, the
- current Vorbis packet encodes a "long block" and the next Vorbis
- packet encodes a "short block", the last decodable sample from the
- current packet will be at position (3*long_block_length/4) -
- (short_block_length/4).
- </simpara>
- </listitem>
- <listitem>
- <simpara>
- The granule (PCM) position of the first page need not indicate
- that the stream started at position zero. Although the granule
- position belongs to the last completed packet on the page and a
- valid granule position must be non-negative, by
- inference it may indicate that the PCM position of the beginning
- of audio is positive or negative.
- </simpara>
-
- <itemizedlist>
- <listitem><simpara>
- A positive starting value simply indicates that this stream begins at
- some positive time offset, potentially within a larger
- program. This is a common case when connecting to the middle
- of a broadcast stream.
- </simpara></listitem>
- <listitem><simpara>
- A negative value indicates that
- output samples preceding time zero should be discarded during
- decoding; this technique is used to allow sample-granularity
- editing of the stream start time of already-encoded Vorbis
- streams. The number of samples to be discarded must not exceed
- the overlap-add span of the first two audio packets.
- </simpara></listitem>
- </itemizedlist>
-
- <simpara>
- In both of these cases in which the initial audio PCM starting
- offset is nonzero, the second finished audio packet must flush the
- page on which it appears and the third packet must begin a fresh page.
- This allows the decoder to always be able to perform PCM position
- adjustments before needing to return any PCM data from synthesis,
- resulting in correct positioning information without any additional
- seeking logic.
- </simpara>
-
- <note><simpara>
- Failure to do so should, at worst, cause a
- decoder implementation to return incorrect positioning information
- for seeking operations at the very beginning of the stream.
- </simpara></note>
- </listitem>
-
- <listitem><simpara>
- A granule position on the final page in a stream that indicates
- less audio data than the final packet would normally return is used to
- end the stream at a point other than an even frame boundary. The difference
- between the actual available data returned and the declared amount
- indicates how many trailing samples to discard from the decoding
- process.
- </simpara></listitem>
- </itemizedlist>
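- <para>
- The 58-byte size of the first page can be checked by adding up the fixed
- Ogg page header (27 bytes), a single lacing value, and the 30-byte Vorbis
- identification header. The sketch below does this with Python's
- <literal>struct</literal> module and tests the 'beginning of stream' flag
- and the zero granule position. The helper names are hypothetical, the field
- layouts are taken from the Ogg framing and Vorbis I header definitions
- rather than from this appendix, and the page checksum is deliberately not
- verified.
- </para>
- <programlisting><![CDATA[
```python
import struct

# Fixed part of an Ogg page header: capture pattern, version, flags,
# granule position, serial number, page sequence number, CRC, segment count.
PAGE_HEADER = struct.Struct("<4sBBqIIIB")

# Vorbis identification header: packet type, "vorbis", version, channels,
# sample rate, three bitrate fields, packed block sizes, framing byte.
IDENT_HEADER = struct.Struct("<B6sIBIiiiBB")

BOS_FLAG = 0x02  # 'beginning of stream' bit in the page flags
EOS_FLAG = 0x04  # 'end of stream' bit in the page flags


def first_page_size():
    """Size in bytes of a page holding only the identification header."""
    # One lacing value suffices because the packet is shorter than 255 bytes.
    lacing_values = 1
    return PAGE_HEADER.size + lacing_values + IDENT_HEADER.size


def looks_like_vorbis_first_page(page):
    """Structural check only; the CRC field is not validated here."""
    if len(page) != first_page_size():
        return False
    capture, _version, flags, granule, _serial, _seq, _crc, nsegs = (
        PAGE_HEADER.unpack_from(page)
    )
    body = page[PAGE_HEADER.size + nsegs:]
    return (
        capture == b"OggS"
        and (flags & BOS_FLAG) != 0
        and granule == 0            # header-only pages carry granule zero
        and nsegs == 1
        and body[:7] == b"\x01vorbis"
    )
```
- ]]></programlisting>
- <para>
- A reader could use such a check to reject input early, before handing
- the stream to a full Ogg and Vorbis decoder.
- </para>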
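- <para>
- The granule position rules can be illustrated with a little arithmetic.
- The sketch below assumes the usual Vorbis I overlap-add accounting, in
- which decoding a packet finishes the samples between the centre of the
- previous block and the centre of the current one, and the first audio
- packet finishes none. That accounting comes from the Vorbis I decode
- description, not from this appendix, and all function names are
- hypothetical. Positions are per channel.
- </para>
- <programlisting><![CDATA[
```python
def samples_returned(prev_blocksize, cur_blocksize):
    """Samples finished by decoding one packet (assumed accounting)."""
    if prev_blocksize is None:
        # The first audio packet only primes the overlap-add.
        return 0
    return prev_blocksize // 4 + cur_blocksize // 4


def packet_end_positions(blocksizes, start=0):
    """End PCM position after each packet, counting from `start`."""
    positions = []
    pos = start
    prev = None
    for cur in blocksizes:
        pos += samples_returned(prev, cur)
        positions.append(pos)
        prev = cur
    return positions


def inferred_start(first_granule, blocksizes):
    """PCM position of the first decoded sample, inferred from the granule
    position of the first audio page and the packets completed on it.
    A negative result means that many leading samples are discarded."""
    return first_granule - packet_end_positions(blocksizes)[-1]


def trailing_discard(decoded_end, final_granule):
    """Samples to drop at the end when the final page declares less audio
    than the final packet would normally return."""
    return max(0, decoded_end - final_granule)
```
- ]]></programlisting>
- <para>
- For example, with block sizes of 2048, 2048, 256 and 256 samples the
- packets end at positions 0, 1024, 1600 and 1728. A first audio page
- completing the first two packets with granule position 1000 implies a
- start of -24, so 24 leading samples are discarded; a final granule
- position of 1700 implies 28 trailing samples are discarded.
- </para>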
- </section>
- </appendix>
- <!-- end appendix on Vorbis encapsulation in Ogg -->