- Topic:
- Sample granularity editing of a Vorbis file; inferred arbitrary sample
- length starting offsets / PCM stream lengths
- Overview:
- Vorbis, like mp3, is a frame-based* audio compression where audio is
- broken up into discrete short time segments. These segments are
- 'atomic'; that is, one must recover the entire short time segment from
- the frame packet; there's no way to recover only a part of the PCM time
- segment from part of the coded packet without expanding the entire
- packet and then discarding a portion of the resulting PCM audio.
- * In mp3, the data segment representing a given time period is called
- a 'frame'; the roughly equivalent Vorbis construct is a 'packet'.
- Thus, when we edit a Vorbis stream, the finest physical editing
- granularity is on these packet boundaries. (The mp3 case is
- actually somewhat more complicated than just snipping on a frame
- boundary, because time data can be spread backward or forward over
- frames. In Vorbis, packets are all stand-alone.) Thus, at the
- physical packet level, Vorbis is still limited to streams that
- contain an integral number of packets.
- However, Vorbis streams may still exactly represent and be edited to a
- PCM stream of arbitrary length and starting offset without padding the
- beginning or end of the decoded stream or requiring that the desired
- edit points be packet aligned. Vorbis makes use of Ogg stream
- framing, and this framing provides time-stamping data, called a
- 'granule position'; our starting offset and finished stream length may
- be inferred from correct usage of the granule position data.
- Time stamping mechanism:
- Vorbis packets are bundled into Ogg pages (note that pages do not
- necessarily contain integral numbers of packets, but that isn't
- important in this discussion; more about Ogg framing can be found in
- ogg/doc/framing.html). Each page that contains a packet boundary is
- stamped with the absolute sample-granularity offset of the data, that
- is, 'complete samples-to-date' up to the last completed packet of that
- page. (The same mechanism is used for, e.g., video, where the number
- represents complete 2-D frames, and so on.)
- (It's possible but rare for a packet to span more than two pages such
- that page[s] in the middle have no packet boundary; these pages have
- a granule position of '-1'.)
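The stamping rule above can be sketched in a few lines of Python. This is a simplified model rather than libogg's actual code: the function name `stamp_pages` and its inputs (a per-packet sample count and the index of the page on which each packet completes) are assumptions made for illustration only.

```python
from typing import List

# Granule position carried by a page on which no packet completes.
NO_PACKET_BOUNDARY = -1


def stamp_pages(samples_per_packet: List[int],
                completing_page: List[int],
                page_count: int) -> List[int]:
    """Return one granule position per page.

    samples_per_packet[i] is the number of PCM samples packet i adds once
    decoded; completing_page[i] is the page on which packet i ends.
    Both inputs are hypothetical and exist only for this sketch.
    """
    granule = [NO_PACKET_BOUNDARY] * page_count
    samples_to_date = 0
    for samples, page in zip(samples_per_packet, completing_page):
        samples_to_date += samples
        # A later packet completing on the same page overwrites the stamp,
        # so the page ends up with the total through its last completed packet.
        granule[page] = samples_to_date
    return granule


# Two packets complete on page 0; a third spans pages 1-2 and ends on page 3.
print(stamp_pages([512, 512, 512], [0, 0, 3], 4))
```

In this example pages 1 and 2 contain no packet boundary, so they keep the '-1' stamp.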
- This granule position mechanism in Ogg is used by Vorbis to indicate when the
- PCM data intended to be represented in a Vorbis segment begins a
- number of samples into the data represented by the first packet[s]
- and/or ends before the physical PCM data represented in the last
- packet[s].
- File length a non-integral number of frames:
- A file to be encoded in Vorbis will probably not encode into an
- integral number of packets; such a file is encoded with the last
- packet containing 'extra'* samples. These samples are not padding; they
- will be discarded in decode.
- *(For best results, the encoder should use extra samples that preserve
- the character of the last frame. Simply setting them to zero will
- introduce a 'cliff' that's hard to encode, resulting in spread-frame
- noise. Libvorbis extrapolates the last frame past the end of data to
- produce the extra samples. Even simply duplicating the last value is
- better than clamping the signal to zero).
- The encoder indicates to the decoder that the file is actually shorter
- than all of the samples ('original' + 'extra') by setting the granule
- position in the last page to a short value, that is, the last
- timestamp is the original length of the file discarding extra samples.
- The decoder will see that the number of samples it has decoded in the
- last page is too many; it is 'original' + 'extra', where the
- granulepos says that through the last packet we only have 'original'
- number of samples. The decoder then ignores the 'extra' samples.
- This behavior is to occur only when the end-of-stream bit is set in
- the page (indicating last page of the logical stream).
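As a rough illustration of the end-of-stream rule just described, the following Python sketch computes how many of the samples decoded from the last page should be kept. The function and parameter names are hypothetical and do not correspond to any libvorbis API; treating an oversized granule position as "keep everything" is likewise an assumption of this sketch.

```python
def samples_to_keep_at_end(granule_before_last_page: int,
                           decoded_in_last_page: int,
                           last_granulepos: int,
                           end_of_stream: bool) -> int:
    """Return how many samples decoded from the last page to keep.

    granule_before_last_page: samples-to-date before this page
    decoded_in_last_page:     'original' + 'extra' samples decoded from it
    last_granulepos:          granule position stamped on the page
    end_of_stream:            whether the page has the end-of-stream bit set
    """
    available_end = granule_before_last_page + decoded_in_last_page
    if end_of_stream and last_granulepos < available_end:
        # Short value on the last page: drop the 'extra' samples.
        return max(0, last_granulepos - granule_before_last_page)
    # Not the last page, or the stamp does not ask for trimming
    # (assumption: an oversized value is ignored rather than rejected).
    return decoded_in_last_page


# 4096 samples so far, 1024 decoded from the last page, stream ends at 4500.
print(samples_to_keep_at_end(4096, 1024, 4500, True))
```

With these numbers the decoder would keep 404 samples from the last page and discard the remaining 620.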
-
- Note that it is not legal for the granule position of the last page to
- indicate that there are more samples in the file than actually exist;
- however, implementations should handle such an illegal file gracefully
- in the interests of robust programming.
- Beginning point not on integral packet boundary:
- It is possible that we will want the PCM data represented by a Vorbis
- stream to begin at a position later than where the decoded PCM data
- really begins after an integral packet boundary, a situation analogous
- to the above description where the PCM data does not end at an
- integral packet boundary. The easiest example is taking a clip out of
- a larger Vorbis stream, and choosing a beginning point of the clip
- that is not on a packet boundary; we need to ignore a few samples to
- get the desired beginning point.
- The process of marking the desired beginning point is similar to
- marking an arbitrary ending point. If the encoder wishes sample zero
- to be some location past the actual beginning of data, it associates a
- 'short' granule position value with the completion of the second*
- audio packet. The granule position is associated with the second
- packet simply by making sure the second packet completes its page.
- *(We associate the short value with the second packet for two reasons.
- a) The first packet only primes the overlap/add buffer. No data is
- returned before decoding the second packet; this places the decision
- information at the point of decision. b) Placing the short value on
- the first packet would make the value negative (as the first packet
- normally represents position zero); a negative value would break the
- requirement that granule positions increase; the headers have
- position values of zero)
- The decoder sees that on the first page that will return
- data from the overlap/add queue, we have more samples than the granule
- position accounts for, and discards the 'surplus' from the beginning
- of the queue.
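A minimal Python sketch of the discard step, assuming the decoder already knows how many samples it has produced through the first page that returns data. The function name and parameters are illustrative only and are not part of libvorbis.

```python
def samples_to_skip_at_start(decoded_through_first_data_page: int,
                             first_granulepos: int) -> int:
    """Return how many samples to drop from the front of the output queue.

    decoded_through_first_data_page: samples the decoder has produced
        through the first page that returns data
    first_granulepos: granule position stamped on that page
    """
    surplus = decoded_through_first_data_page - first_granulepos
    # A 'short' granule position leaves a positive surplus to discard;
    # otherwise nothing is skipped.
    return surplus if surplus > 0 else 0


# 1024 samples decoded, but the page says only 700 are accounted for.
print(samples_to_skip_at_start(1024, 700))
```

Here the first 324 samples are discarded, so the first sample actually returned is sample zero of the clip.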
- Note that short granule values (indicating less than the actually
- returned amount of data) are not legal in the Vorbis spec outside of
- indicating beginning and ending sample positions. However, decoders
- should, at minimum, tolerate inadvertent short values elsewhere in the
- stream (just as they should tolerate out-of-order/non-increasing
- granulepos values, although this too is illegal).
- Beginning point at arbitrary positive timestamp (no 'zero' sample):
- It's also possible that the granule position of the first page of an
- audio stream is a 'long value', that is, a value larger than the
- amount of PCM audio decoded. This implies only that we are starting
- playback at some point into the logical stream, a potentially common
- occurrence in streaming applications where the decoder may be
- connecting into a live stream. The decoder should not treat the long
- value specially.
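To make the 'long value' case concrete, here is a small Python sketch of how a player might infer the timestamp of the first sample it returns. The function name and the idea of reporting a start position are assumptions for illustration; the text above only requires that the decoder not treat the value specially.

```python
def first_sample_position(decoded_through_first_data_page: int,
                          first_granulepos: int) -> int:
    """Infer the absolute position of the first returned sample.

    With a 'long' first granule position, the decoded audio ends at
    first_granulepos, so it must have started that many samples minus
    the amount decoded into the logical stream.
    """
    # Clamp at zero: a short value means the stream starts at sample zero
    # (after any surplus at the front has been discarded).
    return max(0, first_granulepos - decoded_through_first_data_page)


# Joining a live stream: 1024 samples decoded, page stamped 481024.
print(first_sample_position(1024, 481024))
```

In this example playback begins at sample 480000 of the logical stream rather than at zero.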
- A long value elsewhere in the stream would normally occur only when a
- page is lost or out of sequence, as indicated by the page's sequence
- number. A long value under any other situation is not legal; however,
- a decoder should tolerate both possibilities.