- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- <html>
- <head>
- <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
- <title>Ogg Documentation</title>
- <style type="text/css">
- body {
- margin: 0 18px 0 18px;
- padding-bottom: 30px;
- font-family: Verdana, Arial, Helvetica, sans-serif;
- color: #333333;
- font-size: .8em;
- }
- a {
- color: #3366cc;
- }
- img {
- border: 0;
- }
- #xiphlogo {
- margin: 30px 0 16px 0;
- }
- #content p {
- line-height: 1.4;
- }
- h1, h1 a, h2, h2 a, h3, h3 a {
- font-weight: bold;
- color: #ff9900;
- margin: 1.3em 0 8px 0;
- }
- h1 {
- font-size: 1.3em;
- }
- h2 {
- font-size: 1.2em;
- }
- h3 {
- font-size: 1.1em;
- }
- li {
- line-height: 1.4;
- }
- #copyright {
- margin-top: 30px;
- line-height: 1.5em;
- text-align: center;
- font-size: .8em;
- color: #888888;
- clear: both;
- }
- .caption {
- color: #000000;
- background-color: #aabbff;
- margin: 1em;
- margin-left: 2em;
- margin-right: 2em;
- padding: 1em;
- padding-bottom: 0em;
- overflow: hidden;
- }
- .caption p {
- clear: none;
- }
- .caption img {
- display: block;
- margin: 0px;
- margin-left: auto;
- margin-right: auto;
- margin-bottom: 1.5em;
- background-color: #ffffff;
- padding: 10px;
- }
-
- #thepage {
- margin-left: auto;
- margin-right: auto;
- width: 840px;
- }
- </style>
- </head>
- <body>
- <div id="thepage">
- <div id="xiphlogo">
- <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
- </div>
- <h1>Ogg bitstream overview</h1>
- <p>This document serves as a starting point for understanding the design
- and implementation of the Ogg container format. If you're new to Ogg
- or merely want a high-level technical overview, start reading here.
- Other documents linked from the <a href="index.html">index page</a>
- give distilled technical descriptions and references of the container
- mechanisms. This document is intended to aid understanding.
- <h2>Container format design points</h2>
- <p>Ogg is intended to be a simplest-possible container, concerned only
- with framing, ordering, and interleave. It can be used as a stream delivery
- mechanism, for media file storage, or as a building block toward
- implementing a more complex, non-linear container (for example, see
- the <a href="skeleton.html">Skeleton</a> or <a
- href="http://en.wikipedia.org/wiki/Annodex">Annodex/CMML</a>).
- <p>The Ogg container is not intended to be a monolithic
- 'kitchen-sink'. It exists only to frame and deliver in-order stream
- data and as such is vastly simpler than most other containers.
- Elementary and multiplexed streams are both constructed entirely from a
- single building block (an Ogg page) composed of eight fields
- totalling twenty-seven bytes (the page header), a list of packet lengths
- (up to 255 bytes), and payload data (up to 65025 bytes). The structure
- of every page is the same. There are no optional fields or alternate
- encodings.
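- <p>As a rough illustration of these sizes, the sketch below describes the
- fixed header fields with Python's struct module and derives the page limits
- from them. The field order and widths are taken from the
- <a href="framing.html">framing specification</a> rather than from this
- overview, and the constant names are invented for the example.

```python
import struct

# Fixed page header, little-endian, eight fields:
# capture pattern, version, header type flags, granule position,
# stream serial number, page sequence number, checksum, segment count.
PAGE_HEADER = struct.Struct("<4sBBqIIIB")

MAX_SEGMENTS = 255        # the length list holds at most 255 one-byte entries
MAX_SEGMENT_BYTES = 255   # each entry accounts for at most 255 payload bytes

HEADER_BYTES = PAGE_HEADER.size
MAX_PAYLOAD_BYTES = MAX_SEGMENTS * MAX_SEGMENT_BYTES
MAX_PAGE_BYTES = HEADER_BYTES + MAX_SEGMENTS + MAX_PAYLOAD_BYTES

if __name__ == "__main__":
    print(HEADER_BYTES, MAX_PAYLOAD_BYTES, MAX_PAGE_BYTES)
```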
- <p>Stream and media metadata is contained in Ogg and not built into
- the Ogg container itself. Metadata is thus compartmentalized and
- layered rather than part of a monolithic design, an especially good
- idea as no two groups seem able to agree on what a complete or
- complete-enough metadata set should be. In this way, the container and
- container implementation are isolated from unnecessary metadata design
- flux.
- <h3>Streaming</h3>
- <p>The Ogg container is primarily a streaming format,
- encapsulating chronological, time-linear mixed media into a single
- delivery stream or file. The design is such that an application can
- always encode and/or decode all features of a bitstream in one pass
- with no seeking and minimal buffering. Seeking to provide optimized
- encoding (such as two-pass encoding) or interactive decoding (such as
- scrubbing or instant replay) is not disallowed or discouraged; however,
- no container feature requires nonlinear access of the bitstream.
- <h3>Variable Bit Rate, Variable Payload Size</h3>
- <p>Ogg is designed to contain any size data payload with bounded,
- predictable efficiency. Ogg packets have no maximum size and a
- zero-byte minimum size. There is no restriction on size changes from
- packet to packet. Variable size packets do not require the use of any
- optional or additional container features. There is no optimal
- suggested packet size, though special consideration was paid to make
- sure 50-200 byte packets were no less efficient than larger packet
- sizes. The original design criteria were a 2% overhead at 50 byte
- packets, dropping to a maximum working overhead of 1% with larger
- packets, and a typical working overhead of 0.5-0.7% for most practical
- uses.
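- <p>These overhead figures can be sanity-checked with a little arithmetic.
- The sketch below assumes a 27 byte page header and one length byte per 255
- bytes of packet plus a terminating byte, as laid out in the framing
- specification. The function name is hypothetical, and only pages holding
- whole packets of a single size are modelled.

```python
HEADER_BYTES = 27  # fixed page header size (assumption from the framing spec)

def page_overhead(packet_bytes, packets_per_page):
    """Framing bytes divided by payload bytes for one page of equal packets."""
    if packet_bytes <= 0 or packets_per_page <= 0:
        raise ValueError("need at least one non-empty packet")
    # One length byte per full 255 bytes, plus a final byte for the remainder.
    lacing_per_packet = packet_bytes // 255 + 1
    segments = lacing_per_packet * packets_per_page
    if segments > 255:
        raise ValueError("too many length bytes for a single page")
    payload = packet_bytes * packets_per_page
    return (HEADER_BYTES + segments) / payload

if __name__ == "__main__":
    print(f"{page_overhead(50, 255):.4f}")
    print(f"{page_overhead(4000, 15):.4f}")
```

- <p>Under these assumptions a page of 255 fifty-byte packets carries 282
- bytes of framing for 12750 bytes of payload (about 2.2%), while fifteen
- 4000 byte packets per page come in under 0.5%.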
- <h3>Simple pagination</h3>
- <p>Ogg is a byte-aligned container with no context-dependent, optional
- or variable-length fields. Ogg requires no repacking of codec data.
- The page structure is written out in-line as packet data is submitted
- to the streaming abstraction. In addition, it is possible to
- implement both Ogg mux and demux as MT-hot zero-copy abstractions (as
- is done in the Tremor sourcebase).
- <h3>Capture</h3>
- <p>Ogg is designed for efficient and immediate stream capture with
- high confidence. Although packets have no size limit in Ogg, pages
- are a maximum of just under 64kB meaning that any Ogg stream can be
- captured with confidence after seeing 128kB of data or less [worst
- case; typical figure is 6kB] from any random starting point in the
- stream.
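- <p>A minimal capture sketch follows. It assumes the page layout from the
- framing specification (an 'OggS' capture pattern, the checksum at byte
- offset 22, the segment count at offset 26) and its CRC parameters; the
- function names are hypothetical. The idea is simply to scan for the
- capture pattern and accept a candidate page only if its checksum verifies.

```python
import struct

def ogg_crc(data):
    """CRC-32 with polynomial 0x04C11DB7, zero initial value,
    no bit reflection and no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc

def capture(buf, start=0):
    """Return (offset, length) of the first verifiable page, or None."""
    pos = buf.find(b"OggS", start)
    while pos != -1:
        if pos + 27 <= len(buf):
            nsegs = buf[pos + 26]
            table_end = pos + 27 + nsegs
            if table_end <= len(buf):
                length = 27 + nsegs + sum(buf[pos + 27:table_end])
                page = bytearray(buf[pos:pos + length])
                if len(page) == length:
                    stored = struct.unpack_from("<I", page, 22)[0]
                    # The checksum is computed with its own field zeroed.
                    page[22:26] = b"\x00\x00\x00\x00"
                    if ogg_crc(page) == stored:
                        return pos, length
        # False alarm or incomplete page: keep scanning.
        pos = buf.find(b"OggS", pos + 1)
    return None
```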
- <h3>Seeking</h3>
- <p>Ogg implements simple coarse- and fine-grained seeking by design.
- <p>Coarse seeking may be performed by simply 'moving the tone arm' to a
- new position and 'dropping the needle'. Rapid capture with
- accompanying timecode from any location in an Ogg file is guaranteed
- by the stream design. From the acquisition of the first timecode,
- all data needed to play back from that time code forward is ahead of
- the stream cursor.
- <p>Ogg implements full sample-granularity seeking using an
- interpolated bisection search built on the capture and timecode
- mechanisms used by coarse seeking. As above, once a search finds
- the desired timecode, all data needed to play back from that time code
- forward is ahead of the stream cursor.
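- <p>The search can be sketched as follows. This is an illustration rather
- than the reference implementation: a sorted list of (byte offset, granule
- position) pairs stands in for the file, a bisect call stands in for
- capturing the next page after a byte position, and the function name is
- made up. Practical implementations typically also guard against poor
- interpolation guesses, which this sketch does not.

```python
import bisect

def interpolated_seek(pages, target):
    """Find the first page whose granule position is >= target.

    `pages` is a list of (byte_offset, granule_position) tuples sorted by
    offset. Returns (index, probes), or None when the target lies beyond
    the end of the stream.
    """
    if not pages or pages[-1][1] < target:
        return None
    offsets = [offset for offset, _ in pages]
    lo, hi = 0, len(pages) - 1          # the answer always lies in [lo, hi]
    probes = 0
    while lo < hi:
        lo_off, lo_gp = pages[lo]
        hi_off, hi_gp = pages[hi]
        # Guess a byte position, assuming time advances linearly with bytes.
        if hi_gp > lo_gp:
            frac = min(max((target - lo_gp) / (hi_gp - lo_gp), 0.0), 1.0)
        else:
            frac = 0.5
        guess = lo_off + frac * (hi_off - lo_off)
        # "Drop the needle": take the first page at or after the guess.
        mid = bisect.bisect_left(offsets, guess)
        mid = min(max(mid, lo), hi - 1)  # keep the window shrinking
        probes += 1
        if pages[mid][1] >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo, probes
```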
- <p>Both coarse and fine seeking use the page structure and sequencing
- inherent to the Ogg format. All Ogg streams are fully seekable from
- creation; seekability is unaffected by truncation or missing data, and
- is tolerant of gross corruption. Seek operations are neither 'fuzzy' nor
- heuristic.
- <p>Seeking without use of an index is a major point of the Ogg
- design. There are two primary reasons why Ogg transport forgoes an index:
-
- <ol>
- <li>An index is only marginally useful in Ogg for the complexity
- added; it adds no new functionality and seldom improves performance
- noticeably. Empirical testing shows that indexless interpolation
- search does not require many more seeks in practice than using an
- index would.
- <li>'Optional' indexes encourage lazy implementations that can seek
- only when indexes are present, or that implement indexless seeking
- only by building an internal index after reading the entire file
- beginning to end. This has been the fate of other containers that
- specify optional indexing.
- </ol>
- <p>In addition, it must be possible to create an Ogg stream in a
- single pass. Although an optional index can simply be tacked on the
- end of the created stream, some software groups object to
- end-positioned indexes and claim to be unwilling to support indexes
- not located at the stream beginning.
- <p><i>All this said, it's become clear that an optional index is a
- demanded feature. For this reason, the <a
- href="http://wiki.xiph.org/Ogg_Index">OggSkeleton now defines a
- proposed index.</a></i>
- <h3>Simple multiplexing</h3>
- <p>Ogg multiplexes streams by interleaving pages from multiple elementary streams into a
- multiplexed stream in time order. The multiplexed pages are not
- altered. Muxing an Ogg AV stream out of separate audio,
- video and data streams is akin to shuffling several decks of cards
- together into a single deck; the cards themselves remain unchanged.
- Demultiplexing is similarly simple (as the cards are marked).
- <p>The goal of this design is to make the mux/demux operation as
- trivial as possible to allow live streaming systems to build and
- rebuild streams on the fly with minimal CPU usage and no additional
- storage or latency requirements.
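- <p>The card-shuffling analogy can be made concrete with a short sketch.
- The Page record and the function names are hypothetical; real pages carry
- more fields, but only a timestamp and a serial number are needed to show
- that muxing is a merge and demuxing is a sort into piles.

```python
import heapq
from collections import namedtuple

# Hypothetical minimal page record: timestamp, stream serial, sequence number.
Page = namedtuple("Page", "time serial seqno")

def mux(*streams):
    """Interleave already time-ordered streams; pages are passed through as-is."""
    return list(heapq.merge(*streams, key=lambda page: page.time))

def demux(muxed):
    """Route each page back to its stream using the serial number it carries."""
    streams = {}
    for page in muxed:
        streams.setdefault(page.serial, []).append(page)
    return streams
```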
- <h3>Continuous and Discontinuous Media</h3>
- <p>Ogg streams belong to one of two categories, "Continuous" streams and
- "Discontinuous" streams.
- <p>A stream that provides a gapless, time-continuous media type with a
- fine-grained timebase is considered to be 'Continuous'. A continuous
- stream should never be starved of data. Examples of continuous data
- types include broadcast audio and video.
- <p>A stream that delivers data in a potentially irregular pattern or
- with widely spaced timing gaps is considered to be 'Discontinuous'. A
- discontinuous stream may be best thought of as data representing
- scattered events; although they happen in order, they are typically
- unconnected data often located far apart. One example of a
- discontinuous stream type would be captioning such as <a
- href="http://wiki.xiph.org/OggKate">Ogg Kate</a>. Although it's
- possible to design captions as a continuous stream type, it's most
- natural to think of captions as widely spaced pieces of text with
- little happening between.
- <p>The fundamental reason for distinction between continuous and
- discontinuous streams concerns buffering.
- <h3>Buffering</h3>
- <p>A continuous stream is, by definition, gapless. Ogg buffering is based
- on the simple premise of never allowing an active continuous stream
- to starve for data during decode; buffering works ahead until all
- continuous streams in a physical stream have data ready and no further.
- <p>Discontinuous stream data is not assumed to be predictable. The
- buffering design takes discontinuous data 'as it comes' rather than
- working ahead to look for future discontinuous data for a potentially
- unbounded period. Thus, the buffering process makes no attempt to fill
- discontinuous stream buffers; their pages simply 'fall out' of the
- stream when continuous streams are handled properly.
- <p>Buffering requirements in this design need not be explicitly
- declared or managed in the encoded stream. The decoder simply reads as
- much data as is necessary to keep all continuous stream types gapless
- and no more, with discontinuous data processed as it arrives in the
- continuous data. Buffering is implicitly optimal for the given
- stream. Because all pages of all data types are stamped with absolute
- timing information within the stream, inter-stream synchronization
- timing is always maintained without the need for explicitly declared
- buffer-ahead hinting.
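- <p>A toy model of this buffering rule is sketched below. It assumes pages
- arrive as (serial, payload) pairs and that the caller already knows which
- serial numbers belong to continuous streams; both the representation and
- the function name are invented for illustration.

```python
def buffer_ahead(pages, continuous_serials):
    """Read pages in stream order until every continuous stream has data.

    Returns a dict mapping serial -> list of payloads read so far.
    Discontinuous pages are collected as they come; the reader never
    searches ahead for them.
    """
    waiting = set(continuous_serials)
    buffered = {}
    if not waiting:
        return buffered
    for serial, payload in pages:
        buffered.setdefault(serial, []).append(payload)
        waiting.discard(serial)
        if not waiting:
            break  # every continuous stream is ready; read no further
    return buffered
```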
- <h3>Codec metadata</h3>
- <p>Ogg does not replicate codec-specific metadata into the mux layer
- in an attempt to make the mux and codec layer implementations 'fully
- separable'. Things like specific timebase, keyframing strategy, frame
- duration, etc, do not appear in the Ogg container. The mux layer is,
- instead, expected to query a codec through a centralized interface,
- left to the implementation, for this data when it is needed.
- <p>Though modern design wisdom usually prefers to predict all possible
- needs of current and future codecs then embed these dependencies and
- the required metadata into the container itself, this strategy
- increases container specification complexity, fragility, and rigidity.
- The mux and codec code becomes more independent, but the
- specifications become logically less independent. A codec can't do
- what a container hasn't already provided for. Novel codecs are harder
- to support, and you can do fewer useful things with the ones you've
- already got (eg, try to make a good splitter without using any codecs.
- Such a splitter is limited to splitting at keyframes only, or building
- yet another new mechanism into the container layer to mark what frames
- to skip displaying).
- <p>Ogg's design goes the opposite direction, where the specification
- is to be as simple, easy to understand, and 'proofed' against novel
- codecs as possible. When an Ogg mux layer requires codec-specific
- information, it queries the codec (or a codec stub). This trades a
- more complex implementation for a simpler, more flexible
- specification.
- <h3>Stream structure metadata</h3>
- <p>The Ogg container itself does not define a metadata system for
- declaring the structure and interrelations between multiple media
- types in a muxed stream. That is, the Ogg container itself does not
- specify data like 'which stream is the subtitle stream?' or 'which
- video stream is the primary angle?'. This metadata still exists, but
- is stored by the Ogg container rather than being built into the Ogg
- container itself. Xiph specifies the 'Skeleton' metadata format for Ogg
- streams, but this decoupling of container and stream structure
- metadata means it is possible to use Ogg with any metadata
- specification without altering the container itself, or without stream
- structure metadata at all.
- <h3>Frame accurate absolute position</h3>
- <p>Every Ogg page is stamped with a 64 bit 'granule position' that
- serves as an absolute timestamp for mux and seeking. A few nifty
- little tricks are usually also embedded in the granpos state, but
- we'll leave those aside for the moment (strictly speaking, they're
- part of each codec's mapping, not Ogg).
- <p>As mentioned above, granule positions are mapped into
- absolute timestamps by the codec rather than being hard timestamps.
- This allows maximally efficient use of the available 64 bits to
- address every sample/frame position without approximation while
- supporting new and previously unknown timebase encodings without
- needing to extend or update the mux layer. When a codec needs a novel
- timebase, it simply brings the code for that mapping along with it.
- This is not a theoretical curiosity; new, wholly novel timebases were
- deployed with the adoption of both Theora and Dirac. "Rolling INTRA"
- (keyframeless video) also benefits from novel use of the granule
- position.
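- <p>Two simplified mappings are sketched below to show why the codec, not
- the container, interprets the field. They are loosely modelled on how
- sample-counting audio codecs and keyframe-based video codecs use the
- granule position; the function and parameter names are hypothetical, and
- the exact conventions belong to each codec's own mapping.

```python
from fractions import Fraction

def pcm_granule_to_seconds(granulepos, sample_rate):
    """Audio-style mapping: the granule position counts PCM samples."""
    return Fraction(granulepos, sample_rate)

def split_granule_to_seconds(granulepos, shift, fps):
    """Video-style mapping: the high bits hold the frame number of the most
    recent keyframe, the low `shift` bits count frames since that keyframe."""
    keyframe = granulepos >> shift
    offset = granulepos & ((1 << shift) - 1)
    return Fraction(keyframe + offset) / fps
```

- <p>Using exact fractions rather than floating point keeps the mapping
- free of approximation, in keeping with the goal of addressing every
- sample or frame position exactly.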
- <h2>Ogg stream arrangement</h2>
- <h3>Packets, pages, and bitstreams</h3>
- <p>Ogg codecs place raw compressed data into <em>packets</em>.
- Packets are octet payloads containing the data needed for a single
- decompressed unit, eg, one video frame. Packets have no maximum size
- and may be zero length. They do not generally have any framing
- information; strung together, the unframed packets form a <em>logical
- bitstream</em> of codec data with no internal landmarks.
- <div class="caption">
- <img src="packets.png">
- <p> Packets of raw codec data are not typically internally framed.
- When they are strung together into a stream without any container to
- provide framing, they lose their individual boundaries. Seek and
- capture are not possible within an unframed stream, and for many
- codecs with variable length payloads and/or early-packet termination
- (such as Vorbis), it may become impossible to recover the original
- frame boundaries even if the stream is scanned linearly from
- beginning to end.
- </div>
- <p>Logical bitstream packets are grouped and framed into Ogg pages
- along with a unique stream <em>serial number</em> to produce a
- <em>physical bitstream</em>. An <em>elementary stream</em> is a
- physical bitstream containing only a single logical bitstream. Each
- page is a self contained entity, although a packet may be split and
- encoded across one or more pages. The page decode mechanism is
- designed to recognize, verify and handle single pages at a time from
- the overall bitstream.
- <div class="caption">
- <img src="pages.png">
- <p> The primary purpose of a container is to provide framing for raw
- packets, marking the packet boundaries so the exact packets can be
- retrieved for decode later. The container also provides secondary
- functions such as capture, timestamping, sequencing, stream
- identification and so on. Not all of these functions are represented in the diagram.
- <p>In the Ogg container, pages do not necessarily contain
- integer numbers of packets. Packets may span across page boundaries
- or even multiple pages. This is necessary as pages have a maximum
- possible size in order to provide capture guarantees, but packet
- size is unbounded.
- </div>
- <p><a href="framing.html">Ogg Bitstream Framing</a> specifies
- the page format of an Ogg bitstream, the packet coding process
- and elementary bitstreams in detail.
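- <p>The way a page's list of packet lengths marks packet boundaries can be
- sketched as follows, assuming the length encoding described in the framing
- specification: a packet is recorded as a run of 255 valued bytes ended by
- a byte below 255. The function names are hypothetical.

```python
def lacing_values(packet_len):
    """Length bytes for one packet: 255 for each full 255 bytes, then the
    remainder (which may be 0, so a zero-length packet is still recorded)."""
    full, rest = divmod(packet_len, 255)
    return [255] * full + [rest]

def packet_lengths(segment_table):
    """Recover packet lengths from one page's length list.

    Returns (lengths, pending): the sizes of packets that end on this page
    and the byte count of a trailing packet that continues on the next page.
    """
    lengths, current = [], 0
    for value in segment_table:
        current += value
        if value < 255:          # a value below 255 terminates the packet
            lengths.append(current)
            current = 0
    return lengths, current
```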
- <h3>Multiplexed bitstreams</h3>
- <p>Multiple logical/elementary bitstreams can be combined into a single
- <em>multiplexed bitstream</em> by interleaving whole pages from each
- contributing elementary stream in time order. The result is a single
- physical stream that multiplexes and frames multiple logical streams.
- Each logical stream is identified by the unique stream serial number
- stamped in its pages. A physical stream may include a 'meta-header'
- (such as the <a href="skeleton.html">Ogg Skeleton</a>) comprising its
- own Ogg page at the beginning of the physical stream. A decoder
- recovers the original logical/elementary bitstreams out of the
- physical bitstream by taking the pages in order from the physical
- bitstream and redirecting them into the appropriate logical decoding
- entity.
- <div class="caption">
- <img src="multiplex1.png">
- <p>Multiple media types are multiplexed into a single Ogg stream by
- interleaving the pages from each elementary physical stream.
- </div>
- <p><a href="ogg-multiplex.html">Ogg Bitstream Multiplexing</a> specifies
- proper multiplexing of an Ogg bitstream in detail.
- <h3>Chaining</h3>
- <p>Multiple Ogg physical bitstreams may be concatenated into a single new
- stream; this is <em>chaining</em>. The bitstreams do not overlap; the
- final page of a given logical bitstream is immediately followed by the
- initial page of the next.</p>
- <p>Each logical bitstream in a chain must have a unique serial number
- within the scope of the full physical bitstream, not only within a
- particular <em>link</em> or <em>segment</em> of the chain.</p>
- <h3>Continuous and discontinuous streams</h3>
- <p>Within Ogg, each stream must be declared (by the codec) to be
- continuous- or discontinuous-time. Most codecs treat all streams they
- use as either inherently continuous- or discontinuous-time, although
- this is not a requirement. A codec may, as part of its mapping, choose
- according to data in the initial header.
- <p>Continuous-time pages are stamped by end-time, discontinuous pages
- are stamped by begin-time. Pages in a multiplexed stream are
- interleaved in order of the time stamp regardless of stream type.
- Both continuous and discontinuous logical streams are used to seek
- within a physical stream; however, only continuous streams are used to
- determine buffering depth; because discontinuous streams are stamped
- by start time, they will always 'fall out' at the proper time when
- buffering the continuous streams. See 'Examples' for an illustration
- of the buffering mechanism.
- <h2>Multiplexing Requirements</h2>
- <p>Multiplexing requirements within Ogg are straightforward. When
- constructing a single-link (unchained) physical bitstream consisting
- of multiple elementary streams:
- <ol>
- <li><p> The initial header for each stream appears in sequence, each
- header on a single page. All initial headers must appear with no
- intervening data (no auxiliary header pages or packets, no data pages
- or packets). Order of the initial headers is unspecified. The
- 'beginning of stream' flag is set on each initial header.
- <li><p> All auxiliary headers for all streams must follow. Order
- is unspecified. The final auxiliary header of each stream must flush
- its page.
- <li><p>Data pages for each stream follow, interleaved in time order.
- <li><p>The final page of each stream sets the 'end of stream' flag.
- Unlike initial pages, terminal pages for the logical bitstreams need
- not occur contiguously; indeed it may not be possible for them to do so.
- </ol>
- <p>Each grouped bitstream must have a unique serial number within the
- scope of the physical bitstream.</p>
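- <p>Several of these rules can be checked mechanically. The sketch below
- assumes each page has been reduced to a (serial, bos, eos) tuple, where
- bos and eos are the 'beginning of stream' and 'end of stream' flags; the
- function name is hypothetical. It does not check the auxiliary header
- rule or time ordering, since both require knowledge of the codec.

```python
def check_link(pages):
    """Return a list of problems found in one unchained link (empty if none)."""
    problems = []
    started, ended = set(), set()
    in_initial_headers = True
    for i, (serial, bos, eos) in enumerate(pages):
        if bos:
            if not in_initial_headers:
                problems.append(f"page {i}: initial header after other pages")
            if serial in started:
                problems.append(f"page {i}: serial {serial} is not unique")
            started.add(serial)
        else:
            in_initial_headers = False
            if serial not in started:
                problems.append(f"page {i}: serial {serial} has no initial header")
        if serial in ended:
            problems.append(f"page {i}: serial {serial} continues after end of stream")
        if eos:
            ended.add(serial)
    for serial in sorted(started - ended):
        problems.append(f"serial {serial} never sets end of stream")
    return problems
```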
- <h3>Chaining and multiplexing</h3>
- <p>Multiplexed and/or unmultiplexed bitstreams may be chained
- consecutively. Such a physical bitstream obeys all the rules of both
- chained and multiplexed streams. Each link, when unchained, must
- stand on its own as a valid physical bitstream. Chained streams do
- not mix or interleave; a new segment may not begin until all streams
- in the preceding segment have terminated. </p>
- <h2>Codec Mapping Requirements</h2>
- <p>Each codec is allowed some freedom in deciding how its logical
- bitstream is encapsulated into an Ogg bitstream (even if it is a
- trivial mapping, eg, 'plop the packets in and go'). This is the
- codec's <em>mapping</em>. Ogg imposes a few mapping requirements
- on any codec.
- <ol>
- <li><p>The <a href="framing.html">framing specification</a> defines
- 'beginning of stream' and 'end of stream' page markers via a header
- flag (it is possible for a stream to consist of a single page). A
- correct stream always consists of an integer number of pages, an easy
- requirement given the variable size nature of pages.</p>
- <li><p>The first page of an elementary Ogg bitstream consists of a single,
- small 'initial header' packet that must include sufficient information
- to identify the exact CODEC type. From this initial header, the codec
- must also be able to determine its timebase and whether or not it is a
- continuous- or discontinuous-time stream. The initial header must fit
- on a single page. If a codec makes use of auxiliary headers (for
- example, Vorbis uses two auxiliary headers), these headers must follow
- the initial header immediately. The last header finishes its page;
- data begins on a fresh page.
- <p>As an example, Ogg Vorbis places the name and revision of the
- Vorbis CODEC, the audio rate and the audio quality into this initial
- header. Vorbis comments and detailed codec setup appear in the larger
- auxiliary headers.</p>
- <li><p>Granule positions must be translatable to an exact absolute
- time value. As described above, the mux layer is permitted to query a
- codec or codec stub plugin to perform this mapping. It is not
- necessary for an absolute time to be mappable into a single unique
- granule position value.
- <li><p>Codecs are not required to use a fixed duration-per-packet (for
- example, Vorbis does not). The mux layer is permitted to query a
- codec or codec stub plugin for the time duration of a packet.
- <li><p>Although an absolute time need not be translatable to a unique
- granule position, a codec must be able to determine the unique granule
- position of the current packet using the granule position of a
- preceding packet.
- <li><p>Packets and pages must be arranged in ascending
- granule-position and time order.
- </ol>
- <h2>Examples</h2>
- <em>[More to come shortly; this section is currently being revised and expanded]</em>
- <p>Below, we present an example of a multiplexed and chained bitstream:</p>
- <p><img src="stream.png" alt="stream"/></p>
- <p>In this example, we see pages from five total logical bitstreams
- multiplexed into a physical bitstream. Note the following
- characteristics:</p>
- <ol>
- <li>Multiplexed bitstreams in a given link begin together; all of the
- initial pages must appear before any data pages. When concurrently
- multiplexed groups are chained, the new group does not begin until all
- the bitstreams in the previous group have terminated.</li>
- <li>The ordering of pages of concurrently multiplexed bitstreams is
- governed by timestamp (not shown here); there is no regular
- interleaving order. Pages within a logical bitstream appear in
- sequence order.</li>
- </ol>
- <div id="copyright">
- The Xiph Fish Logo is a
- trademark (™) of Xiph.Org.<br/>
- These pages © 1994 - 2010 Xiph.Org. All rights reserved.
- </div>
- </div>
- </body>
- </html>