draft-kerr-avt-vorbis-rtp-04.xml 34 KB

  1. <?xml version='1.0'?>
  2. <!DOCTYPE rfc SYSTEM 'rfc2629.dtd'>
  3. <?rfc toc="yes" ?>
  4. <rfc ipr="full3667" docName="draft-kerr-avt-vorbis-rtp-04">
  5. <front>
  6. <title>RTP Payload Format for Vorbis Encoded Audio</title>
  7. <author initials="P" surname="Kerr" fullname="Phil Kerr">
  8. <organization>Xiph.Org</organization>
  9. <address>
  10. <email>phil@plus24.com</email>
  11. <uri>http://www.xiph.org/</uri>
  12. </address>
  13. </author>
  14. <date day="31" month="December" year="2004" />
  15. <area>General</area>
  16. <workgroup>AVT Working Group</workgroup>
  17. <keyword>I-D</keyword>
  18. <keyword>Internet-Draft</keyword>
  19. <keyword>Vorbis</keyword>
  20. <keyword>RTP</keyword>
  21. <abstract>
  22. <t>This document describes an RTP payload format for transporting
  23. Vorbis encoded audio. It details the RTP encapsulation mechanism
  24. for raw Vorbis data and details the delivery mechanisms for the
  25. decoder probability model, referred to as a codebook, metadata
  26. and other setup information.</t>
  27. </abstract>
  28. <note title="Editor's Note">
  29. <t>
  30. All references to RFC XXXX are to be replaced by references to the RFC number of this memo, when published.
  31. </t>
  32. </note>
  33. </front>
  34. <middle>
  35. <section anchor="Introduction" title="Introduction">
  36. <t>
  37. Vorbis is a general purpose perceptual audio codec intended to allow
  38. maximum encoder flexibility, thus allowing it to scale competitively
  39. over an exceptionally wide range of bitrates. At the high
  40. quality/bitrate end of the scale (CD or DAT rate stereo,
  41. 16/24 bits), it is in the same league as MPEG-2 and MPC. Similarly,
  42. the 1.0 encoder can encode high-quality CD and DAT rate stereo at
  43. below 48 kbit/s without resampling to a lower rate. Vorbis is
  44. also intended for lower and higher sample rates (from 8 kHz
  45. telephony to 192 kHz digital masters) and a range of channel
  46. representations (monaural, polyphonic, stereo, quadraphonic, 5.1,
  47. ambisonic, or up to 255 discrete channels).
  48. Vorbis encoded audio is generally encapsulated within an Ogg format
  49. bitstream <xref target="rfc3533"></xref>, which provides framing and synchronization. For the
  50. purposes of RTP transport, this layer is unnecessary, and so raw
  51. Vorbis packets are used in the payload.
  52. </t>
  53. <section anchor="Terminology" title="Terminology">
  54. <t>
  55. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
  56. "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
  57. document are to be interpreted as described in RFC 2119 <xref target="rfc2119"></xref>.
  58. </t>
  59. </section>
  60. </section>
  61. <section anchor="Payload Format" title="Payload Format">
  62. <t>
  63. For RTP-based transport of Vorbis encoded audio, the standard
  64. RTP header is followed by a 5 octet payload header, then the payload
  65. data. The payload header associates the Vorbis data with
  66. its decoding codebook and indicates whether the payload that follows
  67. contains fragmented Vorbis data and/or the number of whole Vorbis
  68. data frames. The payload data contains the raw Vorbis bitstream
  69. information.
  70. </t>
  71. <section anchor="RTP Header" title="RTP Header">
  72. <artwork><![CDATA[
  73. 0 1 2 3
  74. 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  75. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  76. |V=2|P|X| CC |M| PT | sequence number |
  77. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  78. | timestamp |
  79. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  80. | synchronization source (SSRC) identifier |
  81. +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
  82. | contributing source (CSRC) identifiers |
  83. | ... |
  84. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  85. ]]></artwork>
  86. <t>
  87. The RTP header begins with an octet of fields (V, P, X, and CC) to
  88. support specialized RTP uses (see <xref target="rfc3550"></xref> and <xref target="rfc3551"></xref> for details). For Vorbis RTP, the following values are used.
  89. </t>
  90. <t>
  91. Version (V): 2 bits</t><t>
  92. This field identifies the version of RTP. The version
  93. used by this specification is two (2).
  94. </t>
  95. <t>
  96. Padding (P): 1 bit</t><t>
  97. Padding MAY be used with this payload format according to
  98. section 5.1 of <xref target="rfc3550"></xref>.
  99. </t>
  100. <t>
  101. Extension (X): 1 bit</t><t>
  102. Always set to 0, as this payload format does not use an RTP
  103. header extension.
  104. </t>
  105. <t>
  106. CSRC count (CC): 4 bits</t><t>
  107. The CSRC count is used in accordance with <xref target="rfc3550"></xref>.
  108. </t>
  109. <t>
  110. Marker (M): 1 bit</t><t>
  111. Set to zero, as audio silence suppression is not used. This conforms
  112. to section 4.1 of <xref target="vorbis-spec-ref"></xref>.
  113. </t>
  114. <t>
  115. Payload Type (PT): 7 bits</t><t>
  116. An RTP profile for a class of applications is expected to assign
  117. a payload type for this format, or a dynamically allocated
  118. payload type SHOULD be chosen which designates the payload as
  119. Vorbis.
  120. </t>
  121. <t>
  122. Sequence number: 16 bits</t><t>
  123. The sequence number increments by one for each RTP data packet
  124. sent, and may be used by the receiver to detect packet loss and
  125. to restore packet sequence. This field is detailed further in
  126. <xref target="rfc3550"></xref>.
  127. </t>
  128. <t>
  129. Timestamp: 32 bits</t><t>
  130. A timestamp representing the sampling time of the first sample of
  131. the first Vorbis packet in the RTP packet. The clock frequency
  132. MUST be set to the sample rate of the encoded audio data and is
  133. conveyed out-of-band as an SDP attribute.
  134. </t>
  135. <t>
  136. SSRC/CSRC identifiers: </t><t>
  137. These two fields, 32 bits each, with one SSRC field and a maximum
  138. of 15 CSRC fields, are as defined in <xref target="rfc3550"></xref>.
  139. </t>
  140. </section>
  141. <section anchor="Payload Header" title="Payload Header">
  142. <t>
  143. The five octets following the RTP header form the Payload Header.
  144. This header is split into a number of bitfields detailing the format
  145. of the Payload Data that follows.
  146. </t>
  147. <artwork><![CDATA[
  148. 0 1 2 3
  149. 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  150. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  151. | Codebook Ident |
  152. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  153. |C|F| T |# pkts.|
  154. +-+-+-+-+-+-+-+-+
  155. ]]></artwork>
  156. <t>
  157. Codebook Ident: 32 bits</t><t>
  158. This 32 bit field is used to associate the Vorbis data to a decoding Codebook.
  159. It is created by making a CRC32 checksum of the codebook required to decode the
  160. particular Vorbis audio stream.
  161. </t>
  162. <t>
  163. Continuation (C): 1 bit</t><t>
  164. Set to one if this is a continuation of a fragmented packet.
  165. </t>
  166. <t>
  167. Fragmented (F): 1 bit</t><t>
  168. Set to one if the payload contains complete packets or if it
  169. contains the last fragment of a fragmented packet.
  170. </t>
  171. <t>
  172. Payload Type (T): 2 bits</t><t>
  173. This field sets the packet payload type. There are currently four types of packet payload.
  174. </t>
  175. <vspace blankLines="1" />
  176. <list style="empty">
  177. <t> 0 = Raw Vorbis payload</t>
  178. <t> 1 = Configuration payload</t>
  179. <t> 2 = Codebook payload</t>
  180. <t> 3 = Metadata payload</t>
  181. </list>
  182. <t>
  183. The last 4 bits (# pkts.) give the number of complete packets in this payload.
  184. This provides for a maximum of 15 Vorbis packets in the
  185. payload. If the payload contains fragmented data the number of packets MUST be set to 0.
  186. </t>
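<t>
As an illustration only, the five octet payload header described above could be packed and parsed as in the following Python sketch. The function names are hypothetical. Network byte order for the Codebook Ident, and most-significant-bit-first ordering of the C, F, T and packet count fields, are assumptions read from the diagram rather than statements made by this document.
</t>
<artwork><![CDATA[
```python
import struct

def pack_payload_header(ident, c, f, t, count):
    """Pack the 5-octet payload header: 32-bit ident, then C|F|T|count."""
    if not (0 <= ident <= 0xFFFFFFFF and c in (0, 1) and f in (0, 1)
            and 0 <= t <= 3 and 0 <= count <= 15):
        raise ValueError("payload header field out of range")
    # Bit layout assumed from the diagram: C is the top bit of the octet.
    flags = (c << 7) | (f << 6) | (t << 4) | count
    # "!" selects network byte order; this is an assumption of the sketch.
    return struct.pack("!IB", ident, flags)

def unpack_payload_header(data):
    """Return (ident, C, F, T, count) from the first five octets."""
    if len(data) < 5:
        raise ValueError("payload header needs five octets")
    ident, flags = struct.unpack("!IB", data[:5])
    return ident, flags >> 7, (flags >> 6) & 1, (flags >> 4) & 3, flags & 0x0F

# Two complete raw Vorbis packets: C=0, F=1, T=0, count=2.
header = pack_payload_header(0x12345678, c=0, f=1, t=0, count=2)
```
]]></artwork>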
  187. </section>
  188. <section anchor="Payload Data" title="Payload Data">
  189. <t>
  190. Raw Vorbis packets are currently unbounded in length, although
  191. a practical limit will likely be placed on them at some future point.
  192. Typical Vorbis packet sizes range from very small (2-3 bytes) to
  193. quite large (8-12 kilobytes). The reference implementation <xref target="libvorbis"></xref>
  194. typically produces packets smaller than about 800 bytes, except for the
  195. codebook header packets, which are about 4-12 kilobytes.
  196. Within an RTP context the maximum Vorbis packet size, including the RTP and payload
  197. headers, SHOULD be kept below the path MTU to avoid packet fragmentation.
  198. </t>
  199. <t>
  200. Each Vorbis payload packet starts with a one octet length header,
  201. which is used to represent the size of the following data payload, followed
  202. by the raw Vorbis data.
  203. </t>
  204. <t>
  205. For payloads which consist of multiple Vorbis packets the payload data
  206. consists of the packet length followed by the packet data for each of
  207. the Vorbis packets in the payload.
  208. </t>
  209. <t>
  210. The Vorbis packet length header is the length of the Vorbis data
  211. block only and does not count the length octet.
  212. </t>
  213. <t>
  214. The payload packing of the Vorbis data packets SHOULD follow the
  215. guidelines set out in <xref target="rfc3551"></xref> where the oldest packet
  216. occurs immediately after the RTP packet header.
  217. </t>
  218. <t>
  219. Channel mapping of the audio is in accordance with ITU-R
  220. BS.775-1.
  221. </t>
  222. </section>
  223. <section anchor="Example RTP Packet" title="Example RTP Packet">
  224. <t>
  225. Here is an example RTP packet containing two Vorbis packets.
  226. </t>
  227. <t>
  228. RTP Packet Header:
  229. </t>
  230. <artwork><![CDATA[
  231. 0 1 2 3
  232. 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  233. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  234. | 2 |0|0| 0 |0| PT | sequence number |
  235. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  236. | timestamp (in sample rate units) |
  237. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  238. | synchronisation source (SSRC) identifier |
  239. +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
  240. | contributing source (CSRC) identifiers |
  241. | ... |
  242. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  243. ]]></artwork>
  244. <t>
  245. Payload Data:
  246. </t>
  247. <artwork><![CDATA[
  248. 0 1 2 3
  249. 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  250. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  251. | Codebook Ident |
  252. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  253. |0|1| 0 | 2 pks | len | vorbis data ... |
  254. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  255. .. ...vorbis data... ..
  256. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  257. .. data | len | next vorbis packet data... |
  258. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  259. ]]></artwork>
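<t>
The payload data layout in this example, a one octet length followed by the raw Vorbis data for each packet, might be produced and consumed roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the RTP and payload headers are assumed to be handled elsewhere.
</t>
<artwork><![CDATA[
```python
def pack_payload_data(packets):
    """Concatenate Vorbis packets, each preceded by a one-octet length."""
    if not 1 <= len(packets) <= 15:
        raise ValueError("the 4-bit count field allows 1 to 15 packets")
    out = bytearray()
    for pkt in packets:
        if len(pkt) > 255:
            raise ValueError("length does not fit in the one-octet header")
        out.append(len(pkt))   # the length excludes the length octet itself
        out += pkt
    return bytes(out)

def unpack_payload_data(data, count):
    """Split payload data back into `count` Vorbis packets."""
    packets, pos = [], 0
    for _ in range(count):
        if pos >= len(data):
            raise ValueError("payload data ends before all packets are read")
        length = data[pos]
        if pos + 1 + length > len(data):
            raise ValueError("packet length runs past the payload data")
        packets.append(data[pos + 1:pos + 1 + length])
        pos += 1 + length
    return packets
```
]]></artwork>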
  260. </section>
  261. </section>
  262. <section anchor="Frame Packetizing" title="Frame Packetizing">
  263. <t>
  264. Each RTP packet contains either one complete Vorbis packet, one
  265. Vorbis packet fragment, or an integer number of complete Vorbis
  266. packets (up to a max of 15 packets, since the number of packets
  267. is defined by a 4 bit value).
  268. </t>
  269. <t>
  270. Any Vorbis data packet that is 256 octets or less SHOULD be bundled in the
  271. RTP packet with as many Vorbis packets as will fit, up to a maximum
  272. of 15.
  273. </t>
  274. <t>
  275. If a Vorbis packet is larger than 256 octets it MUST be
  276. fragmented. A fragmented packet has a zero in the last four bits
  277. of the payload header. Each fragment after the first also sets
  278. the Continuation (C) bit to one in the payload header. The RTP packet
  279. containing the last fragment of the Vorbis packet has the
  280. Fragmented (F) bit set to one. To maintain the correct sequence
  281. for fragmented packet reception, the timestamp field of every fragment
  282. MUST be the same as that of the first fragment sent, with the sequence
  283. number incremented as normal for the subsequent RTP packets. Path
  284. MTU is detailed in <xref target="rfc1063"></xref> and <xref target="rfc1981"></xref>.
  285. </t>
  286. <section anchor="Example Fragmented Vorbis Packet" title="Example Fragmented Vorbis Packet">
  287. <t>
  288. Here is an example fragmented Vorbis packet split over three RTP
  289. packets.
  290. </t>
  291. <artwork><![CDATA[
  292. Packet 1:
  293. 0 1 2 3
  294. 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  295. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  296. |V=2|P|X| CC |M| PT | 1000 |
  297. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  298. | xxxxx |
  299. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  300. | synchronization source (SSRC) identifier |
  301. +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
  302. | contributing source (CSRC) identifiers |
  303. | ... |
  304. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  305. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  306. | Codebook Ident |
  307. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  308. |0|0| 0 | 0| len | vorbis data .. |
  309. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  310. | ..vorbis data.. |
  311. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  312. ]]></artwork>
  313. <t>
  314. In this packet the initial sequence number is 1000 and the
  315. timestamp is xxxxx. The number of packets field is set to 0.
  316. </t>
  317. <artwork><![CDATA[
  318. Packet 2:
  319. 0 1 2 3
  320. 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  321. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  322. |V=2|P|X| CC |M| PT | 1001 |
  323. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  324. | xxxxx |
  325. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  326. | synchronization source (SSRC) identifier |
  327. +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
  328. | contributing source (CSRC) identifiers |
  329. | ... |
  330. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  331. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  332. | Codebook Ident |
  333. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  334. |1|0| 0 | 0| len | vorbis data ... |
  335. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  336. | ..vorbis data.. |
  337. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  338. ]]></artwork>
  339. <t>
  340. The C bit is set to 1 and the number of packets field is set to 0.
  341. For large Vorbis packets there can be several payload packets
  342. of this type. The maximum packet size SHOULD be no greater
  343. than the path MTU, including all RTP and payload headers. The
  344. sequence number has been incremented by one but the timestamp field
  345. remains the same as the initial packet.
  346. </t>
  347. <artwork><![CDATA[
  348. Packet 3:
  349. 0 1 2 3
  350. 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  351. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  352. |V=2|P|X| CC |M| PT | 1002 |
  353. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  354. | xxxxx |
  355. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  356. | synchronization source (SSRC) identifier |
  357. +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
  358. | contributing source (CSRC) identifiers |
  359. | ... |
  360. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  361. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  362. | Codebook Ident |
  363. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  364. |1|1| 0 | 0| len | vorbis data .. |
  365. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  366. | ..vorbis data.. |
  367. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  368. ]]></artwork>
  369. <t>
  370. This is the last Vorbis fragment packet. The C and F bits are
  371. set and the packet count remains set to 0. As in the previous
  372. packets the timestamp remains set to the first packet in the
  373. sequence and the sequence number has been incremented.
  374. </t>
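<t>
The flag pattern used across the three packets of this example can be sketched in Python as below. The function name and the max_chunk parameter are hypothetical; the sketch only derives the C, F and packet count values for each fragment and leaves the RTP header, payload header and length octet to the caller.
</t>
<artwork><![CDATA[
```python
def fragment(packet, max_chunk):
    """Yield (C, F, count, chunk) for each fragment of one Vorbis packet."""
    if max_chunk < 1:
        raise ValueError("max_chunk must be at least one octet")
    chunks = [packet[i:i + max_chunk]
              for i in range(0, len(packet), max_chunk)]
    if len(chunks) < 2:
        raise ValueError("packet fits in one payload; no fragmentation needed")
    last = len(chunks) - 1
    for i, chunk in enumerate(chunks):
        c = 1 if i > 0 else 0       # set on every fragment after the first
        f = 1 if i == last else 0   # set only on the last fragment
        yield c, f, 0, chunk        # packet count is always 0 for fragments
```
]]></artwork>
<t>
For a 600 octet packet and a max_chunk of 255 this yields three fragments whose (C, F) values are (0, 0), (1, 0) and (1, 1), matching Packets 1 to 3 above.
</t>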
  375. </section>
  376. <section anchor="Packet Loss" title="Packet Loss">
  377. <t>
  378. As there is no error correction within the Vorbis stream, packet
  379. loss will result in a loss of signal. Packet loss is more of an
  380. issue for fragmented Vorbis packets, as the client has to
  381. cope with the handling of the C and F flags. Taking the
  382. fragmented Vorbis packet example above, if the first packet is
  383. lost the client SHOULD detect that the next packet has the packet
  384. count field set to 0 and the C bit set, and MUST drop it. The
  385. next packet, which is the final fragment, SHOULD be dropped
  386. in the same manner, or buffered. Feedback reports on lost and
  387. dropped packets MUST be sent back via RTCP.
  388. </t>
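<t>
One possible receiver-side treatment of the C and F flags is sketched below. It is not a definitive implementation: the class name is hypothetical, the caller is assumed to have already parsed the RTP sequence number and payload header and to pass in only fragment payloads (packet count 0), and the sketch always drops orphaned fragments rather than buffering them. RTCP reporting is out of scope here.
</t>
<artwork><![CDATA[
```python
class FragmentReassembler:
    """Rebuild one fragmented Vorbis packet; drop fragments after a loss."""

    def __init__(self):
        self._chunks = None     # None means no fragment is in progress
        self._next_seq = None

    def feed(self, seq, c, f, chunk):
        """Return the whole Vorbis packet, or None if incomplete or dropped."""
        if c == 0:
            # First fragment: start a fresh buffer.
            self._chunks = [chunk]
            self._next_seq = (seq + 1) & 0xFFFF
            return None
        if self._chunks is None or seq != self._next_seq:
            # Continuation whose predecessor was lost: drop it.
            self._chunks = None
            return None
        self._chunks.append(chunk)
        self._next_seq = (seq + 1) & 0xFFFF
        if f == 1:
            packet, self._chunks = b"".join(self._chunks), None
            return packet
        return None
```
]]></artwork>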
  389. </section>
  390. </section>
  391. <section anchor="Configuration Headers" title="Configuration Headers">
  392. <t>
  393. Unlike other mainstream audio codecs, Vorbis has no statically
  394. configured probability model. Instead it packs all entropy decoding
  395. configuration, VQ and Huffman models into a self-contained codebook.
  396. This codebook block also requires additional identification
  397. information detailing the number of audio channels, bitrates and
  398. other information used to initialise the Vorbis stream.
  399. </t>
  400. <t>
  401. To decode a Vorbis stream, three configuration header blocks are
  402. needed. The first header indicates the sample rate and bitrates, the
  403. number of channels and the version of the Vorbis encoder used.
  404. The second header contains the decoder's probability model, or
  405. codebook, and the third header details stream metadata.
  406. </t>
  407. <t>
  408. As the RTP stream may change certain configuration data mid-session,
  409. there are two methods for delivering this configuration
  410. data to a client: in-band and SDP, both
  411. detailed below. SDP delivery is used to set up an initial
  412. state for the client application and in-band delivery is used to change state
  413. during the session. The changes may be due to different metadata
  414. or codebooks as well as different bitrates of the stream.
  415. </t>
  416. <t>
  417. Of the two delivery vectors, the use of an SDP attribute to indicate a URI
  418. where the configuration and codebook data can be obtained is preferred,
  419. as the data can then be fetched reliably using TCP. The in-band codebook delivery SHOULD
  420. only be used in situations where the link to the client is unidirectional or if
  421. the SDP-based information is not available.
  422. </t>
  423. <t>
  424. Synchronizing the configuration and codebook headers to the RTP stream is
  425. critical. The 32 bit Codebook Ident field is used to indicate when a change in the stream has
  426. taken place. The client application MUST have the correct configuration and codebook
  427. headers in advance. If the client detects a change in the Ident value and does not have this information,
  428. it MUST NOT decode the raw Vorbis data.
  429. </t>
  430. <section anchor="In-band Header Transmission" title="In-band Header Transmission">
  431. <t>
  432. The three header data blocks are sent in-band with the packet type bits set to
  433. match the payload type. Normally the codebook and configuration
  434. headers are sent once per session if the stream is an encoding of live audio, as typically
  435. the encoder state will not change, but the encoder state can change at the boundary
  436. of chained Vorbis audio files. Metadata can be sent at the start as well as any time during
  437. the life of the session. Clients MUST be capable of dealing with periodic re-transmission of the
  438. configuration headers.
  439. </t>
  440. <t>
  441. A Vorbis configuration header is indicated with the payload type field set to 1.
  442. The Vorbis version MUST be set to zero to comply with
  443. this document. The fields Sample Rate, Bitrate Maximum/Nominal/
  444. Minimum and Num Audio Channels are set in accordance with <xref target="vorbis-spec-ref"></xref>, with
  445. the bsz fields in the diagram below referring to the blocksize parameters. The
  446. framing bit is not used for RTP transportation, so applications
  447. constructing Vorbis files MUST take care to set this if required.
  448. </t>
  449. <artwork><![CDATA[
  450. 0 1 2 3
  451. 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  452. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  453. |V=2|P|X| CC |M| PT | xxxx |
  454. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  455. | xxxxx |
  456. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  457. | synchronization source (SSRC) identifier |
  458. +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
  459. | contributing source (CSRC) identifiers |
  460. | ... |
  461. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  462. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  463. | Codebook Ident |
  464. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  465. |0|1| 1 | 1| bsz 0 | bsz 1 | Num Audio Channels |
  466. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  467. | Vorbis Version |
  468. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  469. | Audio Sample Rate |
  470. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  471. | Bitrate Maximum |
  472. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  473. | Bitrate Nominal |
  474. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  475. | Bitrate Minimum |
  476. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  477. ]]></artwork>
  478. <t>
  479. If the payload type field is set to 2, this indicates the packet contains codebook data.
  480. </t>
  481. <t>
  482. The configuration information detailed below MUST be completely
  483. intact, as a client cannot decode a stream with an incomplete
  484. or corrupted codebook set.
  485. </t>
  486. <t>
  487. A 16 bit codebook length field precedes the codebook datablock. The length field
  488. allows for codebooks of up to 65535 octets. Packet fragmentation,
  489. as per the Vorbis data, MUST be performed if the codebook size exceeds
  490. the path MTU. The Codebook Ident field MUST be set to match the associated codebook
  491. needed to decode the Vorbis stream.
  492. </t>
  493. <t>
  494. The Codebook Ident is the CRC32 checksum of the codebook and
  495. is used to detect a corrupted codebook as well as
  496. associating it with its Vorbis data stream. This Ident value
  497. MUST NOT be set to the value of the current stream if this header is
  498. being sent before the boundary of the chained file has been reached.
  499. A checksum mismatch is treated as
  500. a failure and MUST be reported to the client application.
  501. </t>
  502. <artwork><![CDATA[
  503. 0 1 2 3
  504. 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  505. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  506. |V=2|P|X| CC |M| PT | xxxx |
  507. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  508. | xxxxx |
  509. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  510. | synchronization source (SSRC) identifier |
  511. +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
  512. | contributing source (CSRC) identifiers |
  513. | ... |
  514. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  515. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  516. | Codebook Ident |
  517. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  518. |0|1| 2 | 1| Codebook Length |
  519. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  520. | length | Codebook ..
  521. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  522. .. Codebook |
  523. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  524. ]]></artwork>
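<t>
A minimal sketch of computing and checking the Codebook Ident follows. This document specifies a CRC32 checksum but does not name the polynomial or bit ordering, so the use of the CRC-32 variant provided by Python's zlib module is an assumption of this sketch, as are the function names.
</t>
<artwork><![CDATA[
```python
import zlib

def codebook_ident(codebook):
    """Return a 32-bit ident for the codebook octets."""
    # Assumption: the zlib/gzip CRC-32 variant; the draft names no variant.
    return zlib.crc32(codebook) & 0xFFFFFFFF

def verify_codebook(ident, codebook):
    """True if the received codebook matches the Codebook Ident field."""
    return codebook_ident(codebook) == ident
```
]]></artwork>
<t>
A receiver that gets False from verify_codebook would treat the codebook as corrupted and report the failure to the client application.
</t>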
  525. <t>
  526. A payload type field set to 3 indicates that the packet contains the
  527. comment metadata, such as artist name, track title and so on. These
  528. metadata messages are not intended to be fully descriptive but to
  529. offer basic track/song information. This message MUST be sent at
  530. the start of the stream, together with the setup and codebook
  531. headers, even if it contains no information. During a session the
  532. metadata associated with the stream may change from that specified
  533. at the start, e.g. a live concert broadcast changing acts/scenes, so
  534. clients MUST have the ability to receive header blocks during the session. Details
  535. on the format of the comments can be found in the Vorbis
  536. documentation <xref target="v-comment"></xref>.
  537. </t>
<artwork><![CDATA[
  538. 1) [vendor_length] = read an unsigned integer of 32 bits
  539. 2) [vendor_string] = read a UTF-8 vector as [vendor_length] octets
  540. 3) [user_comment_list_length] = read an unsigned integer of 32 bits
  541. 4) iterate [user_comment_list_length] times {
  542.      5) [length] = read an unsigned integer of 32 bits
  543.      6) this iteration's user comment = read a UTF-8 vector as [length] octets
  544.    }
  545. 7) [framing_bit] = read a single bit as boolean
  546. 8) if ( [framing_bit] unset or end of packet ) then ERROR
  547. 9) done.
]]></artwork>
  548. <t>
  549. The data takes the form of a 32 bit length field for the codec vendor's
  550. name, followed by the name encoded in UTF-8. The next 32
  551. bit field denotes the number of user comments. Each of the user comments
  552. is prefixed by a 32 bit length field followed by the comment text.
  553. </t>
  554. <artwork><![CDATA[
  555. 0 1 2 3
  556. 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  557. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  558. |V=2|P|X| CC |M| PT | xxxx |
  559. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  560. | xxxxx |
  561. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  562. | synchronization source (SSRC) identifier |
  563. +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
  564. | contributing source (CSRC) identifiers |
  565. | ... |
  566. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  567. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  568. | Codebook Ident |
  569. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  570. |0|1| 3 | 1| Vendor string length |
  571. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  572. | length | Vendor string ..
  573. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  574. | User comments list length |
  575. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  576. .. User comment length / User comment |
  577. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  578. ]]></artwork>
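<t>
The comment layout described above (vendor length, vendor string, comment count, then length-prefixed comments) could be parsed along the following lines. This is a sketch only: the function name is hypothetical, the 32 bit integers are assumed to be little-endian as in the Vorbis comment header, and the framing bit check is left out.
</t>
<artwork><![CDATA[
```python
import struct

def parse_comments(data):
    """Return (vendor_string, [user comments]) from a metadata block."""
    def read_u32(pos):
        if pos + 4 > len(data):
            raise ValueError("truncated metadata block")
        # "<I" reads a little-endian unsigned 32-bit integer (assumption).
        return struct.unpack_from("<I", data, pos)[0], pos + 4

    def read_utf8(pos, length):
        if pos + length > len(data):
            raise ValueError("truncated metadata block")
        return data[pos:pos + length].decode("utf-8"), pos + length

    vendor_length, pos = read_u32(0)
    vendor, pos = read_utf8(pos, vendor_length)
    count, pos = read_u32(pos)
    comments = []
    for _ in range(count):
        length, pos = read_u32(pos)
        comment, pos = read_utf8(pos, length)
        comments.append(comment)
    return vendor, comments
```
]]></artwork>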
  579. </section>
  580. <section anchor="Session Description for Vorbis RTP Streams" title="Session Description for Vorbis RTP Streams">
  581. <t>
  582. Session description information concerning the Vorbis stream
  583. SHOULD be provided if possible and MUST be in accordance with <xref target="rfc2327"></xref>.
  584. </t>
  585. <t>
  586. If the stream comprises chained Vorbis files the configuration and codebook headers for each
  587. file SHOULD be packaged together and passed to the client using the header attribute.
  588. </t>
  589. <t>
  590. Below is an outline of the mandatory SDP attributes.
  591. </t>
  592. <vspace blankLines="1" />
  593. <list style="empty">
  594. <t>c=IN IP4/6 </t>
  595. <t>m=audio RTP/AVP 98</t>
  596. <t>a=rtpmap:98 VORBIS/44100/2</t>
  597. <t>a=fmtp:98 header=&lt;URI of configuration header&gt; </t>
  598. </list>
  599. <t>
  600. The Vorbis configuration specified in the header attribute MUST contain
  601. all of the configuration data and codebooks needed for the life of the session.
  602. </t>
<t>
The port value is specified by the server application bound to
the address specified in the c attribute. The sample rate and
channel count specified in the rtpmap attribute MUST match those
of the Vorbis stream.
</t>
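<t>
As an illustration of how a client might read the attributes outlined above, the
following Python sketch extracts the payload type, sample rate, channel count and
header URI from a session description. The function name parse_vorbis_sdp, the
dictionary keys it returns, and the address, port and URI in the example are
assumptions made for this sketch; none of them is defined by this document.
</t>
<figure><artwork><![CDATA[
```python
def parse_vorbis_sdp(sdp_text):
    """Collect the Vorbis-related attributes from an SDP description.

    Hypothetical helper: the name and return shape are illustrative only.
    Malformed numeric fields raise ValueError from int().
    """
    info = {}
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload type> <encoding>/<clock rate>[/<channels>]
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            parts = encoding.strip().split("/")
            if parts[0].upper() != "VORBIS" or len(parts) < 2:
                continue
            info["payload_type"] = int(pt)
            info["clock_rate"] = int(parts[1])
            # SDP allows the channel count to be omitted when it is 1.
            info["channels"] = int(parts[2]) if len(parts) > 2 else 1
        elif line.startswith("a=fmtp:"):
            # a=fmtp:<payload type> header=<URI of configuration header>
            pt, _, params = line[len("a=fmtp:"):].partition(" ")
            for param in params.split(";"):
                key, sep, value = param.strip().partition("=")
                if sep and key == "header":
                    info["fmtp_payload_type"] = int(pt)
                    info["header_uri"] = value.strip()
    return info


# Example values (address, port, URI) are placeholders for this sketch.
EXAMPLE_SDP = """\
c=IN IP4 192.0.2.1
m=audio 5004 RTP/AVP 98
a=rtpmap:98 VORBIS/44100/2
a=fmtp:98 header=http://example.com/stream.hdr
"""

if __name__ == "__main__":
    print(parse_vorbis_sdp(EXAMPLE_SDP))
```
]]></artwork></figure>
<t>
A client could then compare the clock_rate and channels entries against the values
found in the retrieved configuration headers before it starts decoding.
</t>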
  609. </section>
  610. <section anchor="Codebook Caching" title="Codebook Caching">
<t>
Codebook caching allows clients that have previously connected to a
stream to re-use the associated codebooks and configuration data.
When a client receives a codebook, it may store it locally. On joining
a new stream, the client can compare the stored CRC32 key with that of
the new stream and, if they match, begin decoding before it has
received any of the headers.
</t>
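<t>
The caching behaviour described above might be sketched as follows. This is a
minimal illustration, not a normative procedure: the class name CodebookCache and
its methods are hypothetical, and the sketch assumes that the key is a CRC32
computed over the packed configuration headers.
</t>
<figure><artwork><![CDATA[
```python
import zlib


class CodebookCache:
    """Maps a 32-bit ident to stored configuration headers (illustrative)."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def ident_for(headers):
        # Assumption: the ident is a CRC32 over the packed headers.
        return zlib.crc32(headers) & 0xFFFFFFFF

    def store(self, headers):
        # Keep a private copy of the headers, keyed by their ident.
        ident = self.ident_for(headers)
        self._entries[ident] = bytes(headers)
        return ident

    def lookup(self, ident):
        # Returns the cached headers, or None when they must be fetched.
        return self._entries.get(ident)


if __name__ == "__main__":
    cache = CodebookCache()
    ident = cache.store(b"example configuration headers")
    if cache.lookup(ident) is not None:
        print("ident known: decoding can start from the cached headers")
    if cache.lookup(ident ^ 1) is None:
        print("ident unknown: the headers must be retrieved first")
```
]]></artwork></figure>
<t>
Because different header sets can share the same CRC32 value, a cautious client
may still verify the cached data against the headers once they arrive.
</t>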
  618. </section>
  619. </section>
  620. <section anchor="IANA Considerations" title="IANA Considerations">
  621. <t>
  622. MIME media type name: audio
  623. </t>
  624. <t>
  625. MIME subtype: vorbis
  626. </t>
  627. <t>
  628. Required Parameters:</t><t>
  629. header indicates the URI of the decoding configuration headers.
  630. </t>
  631. <t>
  632. Optional Parameters: </t><t>
  633. None.
  634. </t>
  635. <t>
  636. Encoding considerations:</t><t>
  637. This type is only defined for transfer via RTP as specified
  638. in RFC XXXX.
  639. </t>
  640. <t>
  641. Security Considerations:</t><t>
  642. See Section 6 of RFC 3047.
  643. </t>
  644. <t>
  645. Interoperability considerations: none
  646. </t>
  647. <t>
  648. Published specification:</t>
  649. <t>See the Vorbis documentation <xref target="vorbis-spec-ref"></xref> for details.</t>
  650. <t>
  651. Applications which use this media type:</t><t>
  652. Audio streaming and conferencing tools
  653. </t>
  654. <t>
  655. Additional information: none
  656. </t>
  657. <t>
  658. Person &amp; email address to contact for further information:</t><t>
  659. Phil Kerr: &lt;phil@plus24.com&gt;
  660. </t>
  661. <t>
  662. Intended usage: COMMON
  663. </t>
  664. <t>
  665. Author/Change controller:</t><t>
  666. Author: Phil Kerr
  667. Change controller: IETF AVT Working Group
  668. </t>
  669. </section>
  670. <section anchor="Congestion Control" title="Congestion Control">
<t>
Vorbis clients SHOULD send regular RTCP receiver reports detailing
congestion. A mechanism for dynamically downgrading the stream,
known as bitrate peeling, will allow a graceful reduction of the
stream bitrate. This feature is not available at present, so an
alternative is to redirect the client to a lower-bitrate stream
if one is available.
</t>
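<t>
One way a server might act on receiver reports is sketched below. The function
name choose_stream, its parameters, and the 5% loss threshold are assumptions made
for illustration only; this document does not define a redirection policy. The
sketch relies on the 8-bit "fraction lost" field of an RTCP receiver report, which
expresses loss as a fraction of 256.
</t>
<figure><artwork><![CDATA[
```python
def choose_stream(bitrates, current, fraction_lost, threshold=0.05):
    """Return the bitrate the client should be redirected to.

    bitrates      -- available stream bitrates in bits per second
    current       -- bitrate of the stream now being received
    fraction_lost -- 8-bit 'fraction lost' field of an RTCP receiver report
    threshold     -- loss ratio treated as congestion (assumed value)
    """
    if not 0 <= fraction_lost <= 255:
        raise ValueError("fraction lost is an 8-bit field")
    loss_ratio = fraction_lost / 256.0
    if loss_ratio <= threshold:
        return current
    lower = [b for b in bitrates if b < current]
    # Step down to the next lower bitrate, or stay put if none exists.
    return max(lower) if lower else current


if __name__ == "__main__":
    streams = [64000, 96000, 128000]
    print(choose_stream(streams, 128000, fraction_lost=26))
```
]]></artwork></figure>
<t>
Stepping down one level at a time keeps the quality change small; a server could
equally skip several levels when the reported loss is severe.
</t>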
  679. </section>
  680. <section anchor="Security Considerations" title="Security Considerations">
<t>
RTP packets using this payload format are subject to the security
considerations discussed in the RTP specification <xref target="rfc3550"></xref>. This implies
that the confidentiality of the media stream is achieved by using
encryption. Because the data compression used with this payload
format is applied end-to-end, encryption may be performed on the
compressed data. Where the size of a data block is set by a length
field in the payload, care MUST be taken to prevent buffer overflows
in client applications.
</t>
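<t>
The kind of length validation called for above can be sketched as follows. The
sketch assumes the field layout of the Vorbis I comment header (32-bit
little-endian lengths for the vendor string, the comment count, and each comment)
and assumes that any RTP payload framing has already been removed. The names
read_block, parse_comments and TruncatedData, and the max_comments limit, are
hypothetical.
</t>
<figure><artwork><![CDATA[
```python
import struct


class TruncatedData(ValueError):
    """Raised when a length field points past the end of the buffer."""


def read_block(buf, offset):
    """Read a 32-bit little-endian length and that many bytes from buf."""
    if offset + 4 > len(buf):
        raise TruncatedData("length field is truncated")
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    # Validate the declared length before slicing or copying.
    if length > len(buf) - start:
        raise TruncatedData("declared length exceeds remaining data")
    return buf[start:start + length], start + length


def parse_comments(buf, max_comments=1024):
    """Return (vendor, comments); max_comments is an assumed sanity limit."""
    vendor, offset = read_block(buf, 0)
    if offset + 4 > len(buf):
        raise TruncatedData("comment list length is truncated")
    (count,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    # Refuse absurd counts before looping or allocating.
    if count > max_comments:
        raise ValueError("too many user comments")
    comments = []
    for _ in range(count):
        comment, offset = read_block(buf, offset)
        comments.append(comment)
    return vendor, comments
```
]]></artwork></figure>
<t>
In a language without bounds-checked slices, the same comparison of the declared
length against the remaining buffer would be made before any copy is performed.
</t>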
  690. </section>
  691. <section anchor="Acknowledgments" title="Acknowledgments">
<t>
This document is a continuation of draft-moffitt-vorbis-rtp-00.txt.
The MIME type section is a continuation of
draft-short-avt-rtp-vorbis-mime-00.txt.
</t>
<t>
Thanks to the AVT and Ogg Vorbis communities and to Xiph.org, including
Steve Casner, Aaron Colwell, Ross Finlayson, Ramon Garcia, Pascal Hennequin, Ralph Giles,
Tor-Einar Jarnbjo, Colin Law, John Lazzaro, Jack Moffitt, Christopher Montgomery,
Colin Perkins, Barry Short, Mike Smith, and Magnus Westerlund.
</t>
  703. </section>
  704. </middle>
  705. <back>
  706. <references title="Normative References">
  707. <reference anchor="rfc3533">
  708. <front>
  709. <title>The Ogg Encapsulation Format Version 0</title>
  710. <author initials="S." surname="Pfeiffer" fullname="Silvia Pfeiffer"></author>
  711. </front>
  712. <seriesInfo name="RFC" value="3533" />
  713. </reference>
  714. <reference anchor="rfc2119">
  715. <front>
  716. <title>Key words for use in RFCs to Indicate Requirement Levels </title>
  717. <author initials="S." surname="Bradner" fullname="Scott Bradner"></author>
  718. </front>
  719. <seriesInfo name="RFC" value="2119" />
  720. </reference>
<reference anchor="rfc3550">
<front>
<title>RTP: A Transport Protocol for Real-Time Applications</title>
<author initials="H." surname="Schulzrinne" fullname="Henning Schulzrinne"></author>
<author initials="S." surname="Casner" fullname="Stephen Casner"></author>
<author initials="R." surname="Frederick" fullname="Ron Frederick"></author>
<author initials="V." surname="Jacobson" fullname="Van Jacobson"></author>
</front>
<seriesInfo name="RFC" value="3550" />
</reference>
<reference anchor="rfc3551">
<front>
<title>RTP Profile for Audio and Video Conferences with Minimal Control</title>
<author initials="H." surname="Schulzrinne" fullname="Henning Schulzrinne"></author>
<author initials="S." surname="Casner" fullname="Stephen Casner"></author>
</front>
<date month="July" year="2003" />
<seriesInfo name="RFC" value="3551" />
</reference>
  740. <reference anchor="rfc2327">
  741. <front>
  742. <title>SDP: Session Description Protocol</title>
  743. <author initials="M." surname="Handley" fullname="Mark Handley"></author>
  744. <author initials="V." surname="Jacobson" fullname="Van Jacobson"></author>
  745. </front>
  746. <seriesInfo name="RFC" value="2327" />
  747. </reference>
<reference anchor="rfc1063">
<front>
<title>Path MTU Discovery</title>
<author initials="J." surname="Mogul" fullname="Jeffrey Mogul"></author>
<author initials="S." surname="Deering" fullname="Steve Deering"></author>
</front>
<seriesInfo name="RFC" value="1191" />
</reference>
<reference anchor="rfc1981">
<front>
<title>Path MTU Discovery for IP version 6</title>
<author initials="J." surname="McCann" fullname="Jack McCann"></author>
<author initials="S." surname="Deering" fullname="Stephen Deering"></author>
<author initials="J." surname="Mogul" fullname="Jeffrey Mogul"></author>
</front>
<seriesInfo name="RFC" value="1981" />
</reference>
  762. </references>
  763. <references title="Informative References">
  764. <reference anchor="libvorbis">
  765. <front>
  766. <title>libvorbis: Available from the Xiph website, http://www.xiph.org</title>
  767. </front>
  768. </reference>
  769. <reference anchor="vorbis-spec-ref">
  770. <front>
  771. <title>Ogg Vorbis I spec: Codec setup and packet decode. http://www.xiph.org/ogg/vorbis/doc/vorbis-spec-ref.html</title>
  772. </front>
  773. </reference>
  774. <reference anchor="v-comment">
  775. <front>
  776. <title>Ogg Vorbis I spec: Comment field and header specification. http://www.xiph.org/ogg/vorbis/doc/v-comment.html</title>
  777. </front>
  778. </reference>
  779. </references>
  780. </back>
  781. </rfc>