- <HTML><HEAD><TITLE>xiph.org: Ogg Vorbis documentation</TITLE>
- <BODY bgcolor="#ffffff" text="#202020" link="#006666" vlink="#000000">
- <nobr><a href="vorbis.html"><img src="white-ogg.png" border=0><img
- src="vorbisword2.png" border=0></a></nobr><p>
- <h1><font color=#000070>
- Ogg logical bitstream framing
- </font></h1>
- <em>Last update to this document: July 15, 1999</em><br>
- <h2>Ogg bitstreams</h2>
- Vorbis encodes short-time blocks of PCM data into raw packets of
- bit-packed data. These raw packets may be used directly by transport
- mechanisms that provide their own framing and packet-separation
- mechanisms (such as UDP datagrams). For stream based storage (such as
- files) and transport (such as TCP streams or pipes), Vorbis uses the
- Ogg bitstream format to provide framing/sync, sync recapture
- after error, landmarks during seeking, and enough information to
- properly separate data back into packets at the original packet
- boundaries without relying on decoding to find packet boundaries.<p>
- <h2>Design constraints for Ogg bitstreams</h2>
- <ol><li>True streaming; we must not need to seek to build a 100%
- complete bitstream.
- <li> Use no more than approximately 1-2% of bitstream bandwidth for
- packet boundary marking, high-level framing, sync and seeking.
- <li> Specification of absolute position within the original sample
- stream.
- <li> Simple mechanism to ease limited editing, such as a simplified
- concatenation mechanism.
- <li> Detection of corruption, recapture after error and direct, random
- access to data at arbitrary positions in the bitstream.
- </ol>
- <h2>Logical and Physical Bitstreams</h2>
- A <em>logical</em> Ogg bitstream is a contiguous stream of
- sequential pages belonging only to the logical bitstream. A
- <em>physical</em> Ogg bitstream is constructed from one or more
- logical Ogg bitstreams (the simplest physical bitstream
- is simply a single logical bitstream). We describe below the exact
- formatting of an Ogg logical bitstream. Combining logical
- bitstreams into more complex physical bitstreams is described in the
- <a href="oggstream.html">Ogg bitstream overview</a>. The exact
- mapping of raw Vorbis packets into a valid Ogg Vorbis physical
- bitstream is described in <a href="vorbis-stream.html">Vorbis
- bitstream mapping</a>.
- <h2>Bitstream structure</h2>
- An Ogg stream is structured by dividing incoming packets into
- segments of up to 255 bytes and then wrapping a group of contiguous
- packet segments into a variable length page preceded by a page
- header. Both the header size and page size are variable; the page
- header contains sizing information and checksum data to determine
- header/page size and data integrity.<p>
- The bitstream is captured (or recaptured) by looking for the beginning
- of a page, specifically the capture pattern. Once the capture pattern
- is found, the decoder verifies page sync and integrity by computing
- and comparing the checksum. At that point, the decoder can extract the
- packets themselves.<p>
- <h3>Packet segmentation</h3>
- Packets are logically divided into multiple segments before encoding
- into a page. Note that the segmentation and fragmentation process is a
- logical one; it's used to compute page header values and the original
- page data need not be disturbed, even when a packet spans page
- boundaries.<p>
- The raw packet is logically divided into [n] 255-byte segments and a
- last fractional segment of &lt; 255 bytes. A packet may well
- consist only of the trailing fractional segment, and a fractional
- segment may be zero length. The segment lengths, called "lacing values",
- are then saved and placed into the header segment table.<p>
- An example should make the basic concept clear:<p>
- <pre>
- <tt>
- raw packet:
- ___________________________________________
- |______________packet data__________________| 753 bytes
- lacing values for page header segment table: 255,255,243
- </tt>
- </pre>
- We simply add the lacing values to get the total size; the last lacing
- value for a packet is always the value that is less than 255. Note
- that this encoding both avoids imposing a maximum packet size and
- imposes minimum overhead on small packets. By contrast, simply using
- two bytes at the head of every packet would cap the packet size at 32k
- and penalize small packets (&lt; 255 bytes, the typical case) with
- twice the segmentation overhead. Using the lacing
- values as suggested, small packets see the minimum possible
- byte-aligned overhead (1 byte) and large packets, over 512 bytes or
- so, see a fairly constant ~0.5% overhead on encoding space.<p>
- Note that a lacing value of 255 implies that another lacing value
- follows in the packet, and a value of &lt; 255 marks the end of the
- packet after that many additional bytes. A packet of 255 bytes (or a
- multiple of 255 bytes) is terminated by a lacing value of 0:<p>
- <pre><tt>
- raw packet:
- _______________________________
- |________packet data____________| 255 bytes
- lacing values: 255, 0
- </tt></pre>
- Note also that a 'nil' (zero length) packet is not an error; it
- consists of nothing more than a lacing value of zero in the header.<p>
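The segmentation rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the specification; the function names `lacing_values` and `packet_length` are hypothetical and chosen only for this example.

```python
from typing import List


def lacing_values(packet_length: int) -> List[int]:
    """Return the lacing values for one packet of the given length in bytes."""
    if packet_length < 0:
        raise ValueError("packet length must be non-negative")
    full_segments, remainder = divmod(packet_length, 255)
    # The final value is always < 255 (possibly 0) and terminates the packet.
    return [255] * full_segments + [remainder]


def packet_length(values: List[int]) -> int:
    """Recover the packet size by summing its lacing values."""
    return sum(values)


print(lacing_values(753))  # the 753 byte example packet
print(lacing_values(255))  # exact multiple of 255: needs a terminating 0
print(lacing_values(0))    # 'nil' packet
```

The three calls correspond to the 753 byte packet, the 255 byte packet and the nil packet discussed above.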
- <h3>Packets spanning pages</h3>
- Packets are not restricted to beginning and ending within a page,
- although individual segments are, by definition, required to do so.
- Packets are not restricted to a maximum size, although excessively
- large packets in the data stream are discouraged; the Ogg
- bitstream specification strongly recommends a nominal page size of
- approximately 4-8kB (large packets are foreseen as being useful for
- initialization data at the beginning of a logical bitstream).<p>
- After segmenting a packet, the encoder may decide not to place all the
- resulting segments into the current page. To do so, the encoder places
- the lacing values of the segments it wishes to belong to the current
- page into the current segment table, then finishes the page. The next
- page begins with the lacing value of the next packet segment as the
- first entry in its segment table, thus continuing the packet. Data in
- the packet body must also correspond properly to the lacing values in
- the spanned pages: the segment data corresponding to
- the lacing values of the first page belongs in that page, and the packet
- segments listed in the segment table of the following page must begin
- the page body of that following page.<p>
- The last mechanic to spanning a page boundary is to set the header
- flag in the new page to indicate that the first lacing value in the
- segment table continues rather than begins a packet; a header flag of
- 0x01 is set to indicate a continued packet. Although mandatory, the flag
- is not actually algorithmically necessary; one could inspect the
- preceding segment table to determine if the packet is new or
- continued. Adding the information to the page header flag allows a
- simpler design (with no overhead) that needs only inspect the current
- page header after frame capture. This also allows faster error
- recovery in the event that the packet originates in a corrupt
- preceding page, implying that the previous page's segment table
- cannot be trusted.<p>
- Note that a packet can span an arbitrary number of pages; the above
- spanning process is repeated for each spanned page boundary. Also, a
- 'zero termination' on a packet size that is an exact multiple of 255
- must appear even if the lacing value appears in the next page as a
- zero-length continuation of the current packet. The header flag
- should be set to 0x01 to indicate that the packet spanned, even though
- the span is a nil case as far as data is concerned.<p>
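To make the spanning rules concrete, here is a small Python sketch of how a decoder might reassemble packets from consecutive pages. It is an illustration under simplifying assumptions: each page is represented as a hypothetical `(continued, segment_table, body)` tuple that has already been parsed and verified, and inconsistencies raise an error where a real decoder would more likely drop sync and recover.

```python
def split_packets(pages):
    """Reassemble packets from (continued, segment_table, body) tuples.

    `continued` mirrors header flag 0x01; `body` is the page's segment data.
    """
    packets = []
    pending = None  # bytes of a packet that has not yet seen its terminator
    for continued, segment_table, body in pages:
        if continued:
            if pending is None:
                raise ValueError("continued flag set but no packet is open")
        elif pending is not None:
            raise ValueError("previous packet was never terminated")
        offset = 0
        for lacing in segment_table:
            if pending is None:
                pending = b""  # this lacing value begins a fresh packet
            pending += body[offset:offset + lacing]
            offset += lacing
            if lacing < 255:  # a value below 255 ends the packet
                packets.append(pending)
                pending = None
    return packets


# A 510 byte packet whose zero terminator lands on the next page,
# followed by a 3 byte packet on that same page.
pages = [
    (False, [255, 255], b"x" * 510),
    (True, [0, 3], b"abc"),
]
print([len(p) for p in split_packets(pages)])
```

The second page illustrates the nil continuation case: its first lacing value is 0, it carries no data for the spanned packet, and the continued flag is still set.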
- The encoding looks odd, but is properly optimized for speed and the
- expected case of the majority of packets being between 50 and 200
- bytes (note that it is designed such that packets of wildly different
- sizes can be handled within the model; placing packet size
- restrictions on the encoder would have only slightly simplified design
- in page generation and increased overall encoder complexity).<p>
- The main point behind tracking individual packets (and packet
- segments) is to allow more flexible encoding tricks that require
- explicit knowledge of packet size. An example is simple bandwidth
- limiting, implemented by simply truncating packets in the nominal case
- if the packet is arranged so that the least sensitive portion of the
- data comes last.<p>
- <h3>Page header</h3>
- The headering mechanism is designed to avoid copying and re-assembly
- of the packet data (i.e., making the packet segmentation process a
- logical one); the header can be generated directly from incoming
- packet data. The encoder buffers packet data until it finishes a
- complete page, at which point it writes the header followed by the
- buffered packet segments.<p>
- <h4>capture_pattern</h4>
- A header begins with a capture pattern that simplifies identifying
- pages; once the decoder has found the capture pattern it can do a more
- intensive job of verifying that it has in fact found a page boundary
- (as opposed to an inadvertent coincidence in the byte stream).<p>
- <pre><tt>
- byte value
- 0 0x4f 'O'
- 1 0x67 'g'
- 2 0x67 'g'
- 3 0x53 'S'
- </tt></pre>
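As a rough sketch, a decoder's first step could be a plain byte search for the capture pattern. The helper name `candidate_page_offsets` is hypothetical; every offset it yields is only a candidate and must still be confirmed by the page checksum, since the four bytes can also occur by coincidence inside packet data.

```python
CAPTURE_PATTERN = b"OggS"  # bytes 0x4f 0x67 0x67 0x53


def candidate_page_offsets(data: bytes):
    """Yield every offset at which the capture pattern occurs."""
    start = data.find(CAPTURE_PATTERN)
    while start != -1:
        yield start
        # Resume one byte later so no occurrence is skipped.
        start = data.find(CAPTURE_PATTERN, start + 1)


print(list(candidate_page_offsets(b"xxOggS....OggS")))
```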
- <h4>stream_structure_version</h4>
- The capture pattern is followed by the stream structure revision:
- <pre><tt>
- byte value
- 4 0x00
- </tt></pre>
-
- <h4>header_type_flag</h4>
-
- The header type flag identifies this page's context in the bitstream:
- <pre><tt>
- byte value
- 5    bitflags: 0x01: unset = fresh packet
-                        set = continued packet
-                0x02: unset = not first page of logical bitstream
-                        set = first page of logical bitstream (bos)
-                0x04: unset = not last page of logical bitstream
-                        set = last page of logical bitstream (eos)
- </tt></pre>
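The three bits can be tested independently with simple masks. The following Python sketch uses hypothetical constant and function names; only the bit values 0x01, 0x02 and 0x04 come from the table above.

```python
CONTINUED = 0x01  # first lacing value continues a packet from the previous page
BOS = 0x02        # first page of the logical bitstream
EOS = 0x04        # last page of the logical bitstream


def describe_header_type(flag: int) -> dict:
    """Decode byte 5 of the page header into its three boolean flags."""
    return {
        "continued": bool(flag & CONTINUED),
        "bos": bool(flag & BOS),
        "eos": bool(flag & EOS),
    }


print(describe_header_type(0x05))  # continued packet on the last page
```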
- <h4>PCM absolute position</h4>
- The position is packed in the same way the rest of Ogg data is packed:
- LSb of LSB first. Note that the 'position' data specifies a 'sample'
- number (e.g., in CD quality audio a sample is four octets, 16 bits for
- left and 16 bits for right; in video it would be the frame number). The
- position specified is the total samples encoded after including all
- packets finished on this page (packets begun on this page but
- continuing on to the next page do not count). The rationale here is
- that the position specified in the page header of the last page
- tells how long the PCM data coded by the bitstream is. A truncated
- stream will still return the proper number of samples that can be
- decoded fully.
- <pre><tt>
- byte value
- 6 0xXX LSB
- 7 0xXX
- 8 0xXX
- 9 0xXX
- 10 0xXX
- 11 0xXX
- 12 0xXX
- 13 0xXX MSB
- </tt></pre>
- <h4>stream serial number</h4>
-
- Ogg allows for separate logical bitstreams to be mixed at page
- granularity in a physical bitstream. The most common case would be
- sequential arrangement, but it is possible to interleave pages for
- two separate bitstreams to be decoded concurrently. The serial
- number is the means by which physical pages are associated with
- a particular logical stream. Each logical stream must have a unique
- serial number within a physical stream:
- <pre><tt>
- byte value
- 14 0xXX LSB
- 15 0xXX
- 16 0xXX
- 17 0xXX MSB
- </tt></pre>
- <h4>page sequence no</h4>
- Page counter; lets us know if a page is lost (useful where packets
- span page boundaries).
- <pre><tt>
- byte value
- 18 0xXX LSB
- 19 0xXX
- 20 0xXX
- 21 0xXX MSB
- </tt></pre>
- <h4>page checksum</h4>
-
- 32-bit CRC value (direct algorithm, initial value and final XOR = 0,
- generator polynomial=0x04c11db7). The value is computed over the
- entire header (with the CRC field in the header set to zero) and then
- continued over the page. The CRC field is then filled with the
- computed value.<p>
- (A thorough discussion of CRC algorithms can be found in <a
- href="ftp://ftp.rocksoft.com/clients/rocksoft/papers/crc_v3.txt">"A
- Painless Guide to CRC Error Detection Algorithms"</a> by Ross
- Williams <a
- href="mailto:ross@guest.adelaide.edu.au">ross@guest.adelaide.edu.au</a>.)
- <pre><tt>
- byte value
- 22 0xXX LSB
- 23 0xXX
- 24 0xXX
- 25 0xXX MSB
- </tt></pre>
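A table-driven Python sketch of the checksum follows. It assumes "direct algorithm" means the bits are processed most significant bit first with no reflection, which matches the parameters listed above; the function names are hypothetical, and `page_checksum` expects the bytes of one complete page (header plus body).

```python
POLY = 0x04C11DB7


def _make_table():
    """Precompute the CRC of every possible top byte (MSB-first, no reflection)."""
    table = []
    for byte in range(256):
        reg = byte << 24
        for _ in range(8):
            if reg & 0x80000000:
                reg = ((reg << 1) & 0xFFFFFFFF) ^ POLY
            else:
                reg = (reg << 1) & 0xFFFFFFFF
        table.append(reg)
    return table


CRC_TABLE = _make_table()


def ogg_crc(data: bytes) -> int:
    """CRC with initial value 0 and final XOR 0."""
    crc = 0
    for b in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ CRC_TABLE[((crc >> 24) & 0xFF) ^ b]
    return crc


def page_checksum(page: bytes) -> int:
    """Checksum of a whole page with the CRC field (bytes 22-25) zeroed."""
    return ogg_crc(page[:22] + b"\x00\x00\x00\x00" + page[26:])


def stored_checksum(page: bytes) -> int:
    """Read the CRC field, which is stored LSB first."""
    return int.from_bytes(page[22:26], "little")
```

A decoder would accept a candidate page only when `page_checksum(page) == stored_checksum(page)`; an encoder computes `page_checksum` and writes the result into bytes 22-25, LSB first.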
- <h4>page_segments</h4>
- The number of segment entries to appear in the segment table. The
- maximum number of 255 segments (255 bytes each) sets the maximum
- possible physical page size at 65307 bytes or just under 64kB. Thus
- we know that a header corrupted so as to destroy sizing/alignment
- information will not cause a runaway bitstream: we'll read in the
- page according to the corrupted size information, which is guaranteed
- to be a reasonable size regardless, notice the checksum mismatch, drop
- sync and then look for recapture.<p>
- <pre><tt>
- byte value
- 26 0x00-0xff (0-255)
- </tt></pre>
- <h4>segment_table (containing packet lacing values)</h4>
- The lacing values for each packet segment physically appearing in
- this page are listed in contiguous order.
- <pre><tt>
- byte value
- 27 0x00-0xff (0-255)
- [...]
- n 0x00-0xff (0-255, n=page_segments+26)
- </tt></pre>
- Total page size is calculated directly from the known header size and
- lacing values in the segment table. Packet data segments follow
- immediately after the header.<p>
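Putting the field layout together, the following Python sketch unpacks the fixed 27 byte part of a header with the standard `struct` module and derives the page size from the segment table. The class and function names are hypothetical, and treating the position as an unsigned 64-bit integer is an assumption of this sketch; the text above only fixes its width and byte order.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

# capture pattern, version, flags, position, serial, sequence, CRC, page_segments
FIXED_HEADER = struct.Struct("<4sBBQIIIB")  # little-endian, 27 bytes


@dataclass(frozen=True)
class PageHeader:
    version: int
    header_type: int
    position: int
    serial: int
    sequence: int
    checksum: int
    segment_table: Tuple[int, ...]

    @property
    def header_size(self) -> int:
        return FIXED_HEADER.size + len(self.segment_table)

    @property
    def body_size(self) -> int:
        return sum(self.segment_table)

    @property
    def page_size(self) -> int:
        return self.header_size + self.body_size


def parse_page_header(data: bytes, offset: int = 0) -> PageHeader:
    """Parse the header that starts at `offset`; does not verify the CRC."""
    (capture, version, header_type, position,
     serial, sequence, checksum, nseg) = FIXED_HEADER.unpack_from(data, offset)
    if capture != b"OggS":
        raise ValueError("capture pattern not found at offset")
    start = offset + FIXED_HEADER.size
    table = tuple(data[start:start + nseg])
    if len(table) != nseg:
        raise ValueError("truncated segment table")
    return PageHeader(version, header_type, position, serial,
                      sequence, checksum, table)
```

For a header with 255 lacing values of 255 each, `page_size` evaluates to 27 + 255 + 255*255 = 65307, the maximum page size quoted above.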
- Page headers typically impose a flat 0.25-0.5% space overhead assuming
- nominal ~8k page sizes. The segmentation table needed for exact
- packet recovery in the streaming layer adds approximately 0.5-1%,
- assuming expected encoder behavior in the 44.1kHz, 128kbps
- stereo encodings.<p>
- <hr>
- <a href="http://www.xiph.org/">
- <img src="white-xifish.png" align=left border=0>
- </a>
- <font size=-2 color=#505050>
- Ogg is a <a href="http://www.xiph.org">Xiphophorus</a> effort to
- protect essential tenets of Internet multimedia from corporate
- hostage-taking; Open Source is the net's greatest tool to keep
- everyone honest. See <a href="http://www.xiph.org/about.html">About
- Xiphophorus</a> for details.
- <p>
- Ogg Vorbis is the first Ogg audio CODEC. Anyone may
- freely use and distribute the Ogg and Vorbis specification,
- whether in a private, public or corporate capacity. However,
- Xiphophorus and the Ogg project (xiph.org) reserve the right to set
- the Ogg/Vorbis specification and certify specification compliance.<p>
- Xiphophorus's Vorbis software CODEC implementation is distributed
- under the Lesser/Library GNU Public License. This does not restrict
- third parties from distributing independent implementations of Vorbis
- software under other licenses.<p>
- OggSquish, Vorbis, Xiphophorus and their logos are trademarks (tm) of
- <a href="http://www.xiph.org/">Xiphophorus</a>. These pages are
- copyright (C) 1994-2000 Xiphophorus. All rights reserved.<p>
- </body>