  1. = audiowmark - Audio Watermarking
  2. == Description
  3. `audiowmark` is an Open Source (GPL) solution for audio watermarking.
  4. A sound file is read by the software, and a 128-bit message is stored in a
  5. watermark in the output sound file. For human listeners, the files typically
  6. sound the same.
  7. However, the 128-bit message can be retrieved from the output sound file. Our
  8. tests show that even if the file is converted to mp3 or ogg (with bitrate 128
  9. kbit/s or higher), the watermark usually can be retrieved without problems. The
  10. process of retrieving the message does not need the original audio file (blind
  11. decoding).
  12. Internally, audiowmark is using the patchwork algorithm to hide the data in the
  13. spectrum of the audio file. The signal is split into 1024 sample frames. For
  14. each frame, some pseudo-randomly selected amplitudes of the frequency bands of
  15. a 1024-value FFT are increased or decreased slightly, which can be detected
  16. later. The algorithm used here is inspired by
  17. Martin Steinebach: Digitale Wasserzeichen für Audiodaten.
  18. Darmstadt University of Technology 2004, ISBN 3-8322-2507-2
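
The patchwork idea described above can be illustrated with a small toy sketch. It is not the `audiowmark` implementation: it works on a plain list of band magnitudes instead of a real FFT, and the function names, the `delta` value, the number of selected bands and the use of an integer seed in place of a key are all assumptions made for illustration.

```python
import random


def select_bands(n_bands, n_select, seed):
    """Pseudo-randomly pick two disjoint sets of band indices ("up" and "down")."""
    rng = random.Random(seed)
    chosen = rng.sample(range(n_bands), 2 * n_select)
    return chosen[:n_select], chosen[n_select:]


def embed_bit(magnitudes, bit, seed, n_select=30, delta=0.05):
    """Return a copy of the magnitudes with one bit embedded."""
    up, down = select_bands(len(magnitudes), n_select, seed)
    if bit == 0:
        # For a 0 bit the roles of the two sets are swapped.
        up, down = down, up
    out = list(magnitudes)
    for i in up:
        out[i] *= 1.0 + delta   # slightly increase these bands
    for i in down:
        out[i] *= 1.0 - delta   # slightly decrease these bands
    return out


def detect_bit(magnitudes, seed, n_select=30):
    """Blind detection: compare the two band sets, no original signal needed."""
    up, down = select_bands(len(magnitudes), n_select, seed)
    diff = sum(magnitudes[i] for i in up) - sum(magnitudes[i] for i in down)
    return 1 if diff > 0 else 0


frame = [1.0] * 512                     # flat toy spectrum (assumption)
marked = embed_bit(frame, 1, seed=42)
print(detect_bit(marked, seed=42))      # prints 1
```

On real audio the spectrum is not flat, so a single comparison like this is only statistically reliable, which is one reason why error correction and multiple data blocks matter in practice.
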
  19. == Open Source License
  20. `audiowmark` is *open source* software available under the *GPLv3
  21. or later* license.
  22. Copyright (C) 2018-2020 Stefan Westerfeld
  23. This program is free software: you can redistribute it and/or modify
  24. it under the terms of the GNU General Public License as published by
  25. the Free Software Foundation, either version 3 of the License, or
  26. (at your option) any later version.
  27. This program is distributed in the hope that it will be useful,
  28. but WITHOUT ANY WARRANTY; without even the implied warranty of
  29. MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  30. GNU General Public License for more details.
  31. You should have received a copy of the GNU General Public License
  32. along with this program. If not, see <http://www.gnu.org/licenses/>.
  33. == Adding a Watermark
  34. To add a watermark to the soundfile in.wav with a 128-bit message (which is
  35. specified as hex-string):
  36. [subs=+quotes]
  37. ....
  38. *$ audiowmark add in.wav out.wav 0123456789abcdef0011223344556677*
  39. Input: in.wav
  40. Output: out.wav
  41. Message: 0123456789abcdef0011223344556677
  42. Strength: 10
  43. Time: 3:59
  44. Sample Rate: 48000
  45. Channels: 2
  46. Data Blocks: 4
  47. ....
  48. The most important options for adding a watermark are:
  49. --key <filename>::
  50. Use watermarking key from file <filename> (see <<key>>).
  51. --strength <s>::
  52. Set the watermarking strength (see <<strength>>).
  53. == Retrieving a Watermark
  54. To get the 128-bit message from the watermarked file, use:
  55. [subs=+quotes]
  56. ....
  57. *$ audiowmark get out.wav*
  58. pattern 0:05 0123456789abcdef0011223344556677 1.324 0.059 A
  59. pattern 0:57 0123456789abcdef0011223344556677 1.413 0.112 B
  60. pattern 0:57 0123456789abcdef0011223344556677 1.368 0.086 AB
  61. pattern 1:49 0123456789abcdef0011223344556677 1.302 0.098 A
  62. pattern 2:40 0123456789abcdef0011223344556677 1.361 0.093 B
  63. pattern 2:40 0123456789abcdef0011223344556677 1.331 0.096 AB
  64. pattern all 0123456789abcdef0011223344556677 1.350 0.054
  65. ....
  66. The output of `audiowmark get` is designed to be machine readable. Each line
  67. that starts with `pattern` contains one decoded message. The fields are
  68. separated by one or more space characters. The first field is a *timestamp*
  69. indicating the position of the data block. The second field is the *decoded
  70. message*. For most purposes this is all you need to know.
  71. The software was designed under the assumption that you - the user - will be
  72. able to decide whether a message is correct or not. To do this, when watermarking
  73. song files, you could record each message you embedded in a database. During
  74. retrieval, you should look up each pattern `audiowmark get` outputs in the
  75. database. If the message is not found, then you should assume that a decoding
  76. error occurred. In our example each pattern was decoded correctly, because
  77. the watermark was not damaged at all, but if you for instance use lossy
  78. compression (with a low bitrate), it may happen that only some of the decoded
  79. patterns are correct. Or none, if the watermark was damaged too much.
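
The lookup described above could be sketched as follows. The Python `set` stands in for the database, and the function name `known_messages_found` is hypothetical; only the line format (`pattern`, timestamp, message, ...) is taken from the example output.

```python
def known_messages_found(get_output, known_messages):
    """Return the decoded messages that are present in known_messages.

    get_output is the text printed by `audiowmark get`; known_messages is a
    set of hex strings that were embedded earlier (a stand-in for a database).
    """
    found = []
    for line in get_output.splitlines():
        fields = line.split()
        # fields[0] is the literal "pattern", fields[1] the timestamp,
        # fields[2] the decoded message.
        if len(fields) >= 3 and fields[0] == "pattern":
            message = fields[2]
            if message in known_messages and message not in found:
                found.append(message)
    return found


sample = """\
pattern 0:05 0123456789abcdef0011223344556677 1.324 0.059 A
pattern 0:57 ffffffffffffffffffffffffffffffff 1.413 0.112 B
"""
known = {"0123456789abcdef0011223344556677"}
print(known_messages_found(sample, known))
```

Messages that are not in the set are treated as decoding errors and dropped, as suggested above.
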
  80. The third field is the *sync score* (higher is better). The synchronization
  81. algorithm tries to find valid data blocks in the audio file, which then become
  82. candidates for decoding.
  83. The fourth field is the *decoding error* (lower is better). During message
  84. decoding, we use convolutional codes for error correction, to make the
  85. watermarking more robust.
  86. The fifth field is the *block type*. There are two types of data blocks,
  87. A blocks and B blocks. A single data block can be decoded alone, as it
  88. contains a complete message. However, if during watermark detection an
  89. A block followed by a B block was found, these two can be decoded
  90. together (then this field will be AB), resulting in even higher error
  91. correction capacity than one block alone would have.
  92. To improve the error correction capacity even further, the `all` pattern
  93. combines all data blocks that are available. The combined decoded
  94. message will often be the most reliable result (meaning that even if all
  95. other patterns were incorrect, this could still be right).
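
Since the output is meant to be machine readable, all five fields can be parsed into a small record. This is a minimal sketch; the `Pattern` class and `parse_get_output` function are hypothetical names, and the field order is taken from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pattern:
    timestamp: str             # e.g. "0:05", or "all" for the combined pattern
    message: str               # decoded message as hex string
    sync_score: float          # higher is better
    decoding_error: float      # lower is better
    block_type: Optional[str]  # "A", "B" or "AB"; None for the "all" line


def parse_get_output(text: str) -> List[Pattern]:
    """Parse the lines starting with `pattern` from `audiowmark get` output."""
    patterns = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "pattern":
            continue
        block_type = fields[5] if len(fields) > 5 else None
        patterns.append(
            Pattern(fields[1], fields[2], float(fields[3]), float(fields[4]), block_type)
        )
    return patterns


def combined_pattern(patterns: List[Pattern]) -> Optional[Pattern]:
    """Return the `all` pattern if present, otherwise None."""
    for pattern in patterns:
        if pattern.timestamp == "all":
            return pattern
    return None
```

A caller could prefer the result of `combined_pattern` and fall back to the individual patterns if the combined message is not one of the expected messages.
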
  96. The most important options for getting a watermark are:
  97. --key <filename>::
  98. Use watermarking key from file <filename> (see <<key>>).
  99. --strength <s>::
  100. Set the watermarking strength (see <<strength>>).
  101. [[key]]
  102. == Watermark Key
  103. Since the software is Open Source, a watermarking key should be used to ensure
  104. that the message bits cannot be retrieved by somebody else (which would also
  105. allow removing the watermark without loss of quality). The watermark key
  106. controls all pseudo-random parameters of the algorithm. This means that
  107. it determines which frequency bands are increased or decreased to store a
  108. 0 bit or a 1 bit. Without the key, it is impossible to decode the message
  109. bits from the audio file alone.
  110. Our watermarking key is a 128-bit AES key. A key can be generated using
  111. audiowmark gen-key test.key
  112. and can be used for the add/get commands as follows:
  113. audiowmark add --key test.key in.wav out.wav 0123456789abcdef0011223344556677
  114. audiowmark get --key test.key out.wav
  115. [[strength]]
  116. == Watermark Strength
  117. The watermark strength parameter affects how much the watermarking algorithm
  118. modifies the input signal. A stronger watermark is more audible, but also more
  119. robust against modifications. The default strength is 10. A watermark with that
  120. strength is recoverable after mp3/ogg encoding with 128kbit/s or higher. In our
  121. informal listening tests, this setting also has a very good subjective quality.
  122. A higher strength (for instance 15) is helpful if robustness
  123. against multiple conversions or conversions to low bit rates (e.g. 64kbit/s) is
  124. desired.
  125. A lower strength (for instance 6) makes the watermark less audible, but also
  126. less robust. Strengths below 5 are not recommended. To set the strength, the
  127. same value has to be passed both when generating and when retrieving the
  128. watermark. Fractional strengths (like 7.5) are possible.
  129. audiowmark add --strength 15 in.wav out.wav 0123456789abcdef0011223344556677
  130. audiowmark get --strength 15 out.wav
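
Because the same strength (and key) has to be used for both commands, a script could build the two command lines together. This is only a sketch: the helper name `add_and_get_commands` is hypothetical, and rejecting strengths below 5 is a policy choice of this example, not something `audiowmark` is known to enforce.

```python
def add_and_get_commands(infile, outfile, message, strength=10.0, key_file=None):
    """Build matching argument lists for `audiowmark add` and `audiowmark get`."""
    if strength < 5:
        # Policy of this sketch only: strengths below 5 are not recommended.
        raise ValueError("strengths below 5 are not recommended")
    options = ["--strength", f"{strength:g}"]   # 15 -> "15", 7.5 -> "7.5"
    if key_file is not None:
        options += ["--key", key_file]
    add_cmd = ["audiowmark", "add", *options, infile, outfile, message]
    get_cmd = ["audiowmark", "get", *options, outfile]
    return add_cmd, get_cmd


add_cmd, get_cmd = add_and_get_commands(
    "in.wav", "out.wav", "0123456789abcdef0011223344556677", strength=15
)
print(" ".join(add_cmd))
print(" ".join(get_cmd))
```

The resulting lists can be passed to `subprocess.run` without going through a shell.
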
  131. == Short Payload (experimental)
  132. By default, the watermark will store a 128-bit message. In this mode, we
  133. recommend using a 128-bit hash (or HMAC) as payload. No error checking is
  134. performed, so the user needs to test patterns that the watermarker decodes to
  135. ensure that they really are one of the expected patterns, not a decoding
  136. error.
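
One possible way to derive such a 128-bit payload is to truncate an HMAC. The choice of HMAC-SHA256, the truncation to 16 bytes and the function name `payload_for` are assumptions of this sketch; any 128-bit hash or HMAC that you can look up again later would serve the same purpose.

```python
import hashlib
import hmac


def payload_for(secret: bytes, identifier: str) -> str:
    """Derive a 128-bit payload (32 hex digits) from an identifier.

    Uses HMAC-SHA256 truncated to the first 16 bytes (an assumption of
    this sketch, not a requirement of audiowmark).
    """
    digest = hmac.new(secret, identifier.encode("utf-8"), hashlib.sha256).digest()
    return digest[:16].hex()


# Hypothetical secret and identifier, for illustration only.
message = payload_for(b"server-side secret", "customer-4711/song-0042")
print(len(message))   # prints 32
```

The resulting hex string can be used as the message argument of `audiowmark add`, and stored alongside the identifier so that decoded patterns can be verified.
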
  137. As an alternative, an experimental short payload option is available, for very
  138. short payloads (12, 16 or 20 bits). It is enabled using the `--short <bits>`
  139. command line option, for instance for 16 bits:
  140. audiowmark add --short 16 in.wav out.wav abcd
  141. audiowmark get --short 16 out.wav
  142. Internally, a larger set of bits is sent to ensure that decoded short patterns
  143. are really valid, so in this mode, error checking is performed after decoding,
  144. and only valid patterns are reported.
  145. Besides error checking, the advantage of a short payload is that fewer bits
  146. need to be sent, so decoding is more likely to be successful on shorter
  147. clips.
  148. == Video Files
  149. For video files, `videowmark` can be used to add a watermark to the audio track
  150. of video files. To add a watermark, use
  151. [subs=+quotes]
  152. ....
  153. *$ videowmark add in.avi out.avi 0123456789abcdef0011223344556677*
  154. Audio Codec: -c:a mp3 -ab 128000
  155. Input: in.avi
  156. Output: out.avi
  157. Message: 0123456789abcdef0011223344556677
  158. Strength: 10
  159. Time: 3:53
  160. Sample Rate: 44100
  161. Channels: 2
  162. Data Blocks: 4
  163. ....
  164. To detect a watermark, use
  165. [subs=+quotes]
  166. ....
  167. *$ videowmark get out.avi*
  168. pattern 0:05 0123456789abcdef0011223344556677 1.294 0.142 A
  169. pattern 0:57 0123456789abcdef0011223344556677 1.191 0.144 B
  170. pattern 0:57 0123456789abcdef0011223344556677 1.242 0.145 AB
  171. pattern 1:49 0123456789abcdef0011223344556677 1.215 0.120 A
  172. pattern 2:40 0123456789abcdef0011223344556677 1.079 0.128 B
  173. pattern 2:40 0123456789abcdef0011223344556677 1.147 0.126 AB
  174. pattern all 0123456789abcdef0011223344556677 1.195 0.104
  175. ....
  176. The key and strength can be set using the command line options
  177. --key <filename>::
  178. Use watermarking key from file <filename> (see <<key>>).
  179. --strength <s>::
  180. Set the watermarking strength (see <<strength>>).
  181. Videos can be watermarked on-the-fly using <<hls>>.
  182. == Output as Stream
  183. Usually, an input file is read, watermarked and an output file is written.
  184. This means that it takes some time before the watermarked file can be used.
  185. An alternative is to output the watermarked file as stream to stdout. One use
  186. case is sending the watermarked file to a user via network while the
  187. watermarker is still working on the rest of the file. Here is an example of how to
  188. watermark a wav file to stdout:
  189. audiowmark add in.wav - 0123456789abcdef0011223344556677 | play -
  190. In this case the file in.wav is read, watermarked, and the output is sent
  191. to stdout. The "play -" can start playing the watermarked stream while the
  192. rest of the file is being watermarked.
  193. If - is used as output, the output is a valid .wav file, so the programs
  194. running after `audiowmark` will be able to determine sample rate, number of
  195. channels, bit depth, encoding and so on from the wav header.
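
As an illustration of what a consuming program can learn from the wav header, the following sketch reads the parameters with Python's standard `wave` module. It is demonstrated on an in-memory wav file; whether `wave` accepts the exact header that `audiowmark` writes when streaming to a pipe has not been verified here, and `describe_wav` is a hypothetical helper name.

```python
import io
import wave


def describe_wav(stream):
    """Read sample rate, channel count and bit depth from a wav header."""
    with wave.open(stream, "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bits": wav.getsampwidth() * 8,
        }


# Build a tiny stereo 16-bit 48 kHz wav file in memory for the demonstration.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(48000)
    out.writeframes(b"\x00" * 8)   # two silent stereo frames
buffer.seek(0)

print(describe_wav(buffer))
```

In a pipeline, the same function could be tried on `sys.stdin.buffer` in place of the in-memory buffer.
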
  196. Note that all input formats supported by audiowmark can be used in this way,
  197. for instance flac/mp3:
  198. audiowmark add in.flac - 0123456789abcdef0011223344556677 | play -
  199. audiowmark add in.mp3 - 0123456789abcdef0011223344556677 | play -
  200. == Input from Stream
  201. Similar to the output, the `audiowmark` input can be a stream. In this case,
  202. the input must be a valid .wav file. The watermarker will be able to
  203. start watermarking the input stream before all data is available. An
  204. example would be:
  205. cat in.wav | audiowmark add - out.wav 0123456789abcdef0011223344556677
  206. It is possible to combine input from a stream with output as a stream:
  207. cat in.wav | audiowmark add - - 0123456789abcdef0011223344556677 | play -
  208. Streaming input is also supported for watermark detection.
  209. cat in.wav | audiowmark get -
  210. == Raw Streams
  211. So far, all streams described here are essentially wav streams, which means
  212. that the wav header allows `audiowmark` to determine sample rate, number of
  213. channels, bit depth, encoding and so forth from the stream itself, and that a
  214. wav header is written for the program after `audiowmark`, so that it can
  215. figure out the parameters of the stream.
  216. There are two cases where this is problematic. The first case is if the full
  217. length of the stream is not known at the time processing starts. Then a wav
  218. header cannot be used, as the wav header contains the length of the stream. The
  219. second case is that the program before or after `audiowmark` doesn't support wav
  220. headers.
  221. For these two cases, raw streams are available. The idea is to set all
  222. information that is needed like sample rate, number of channels,... manually.
  223. Then, headerless data can be processed from stdin and/or sent to stdout.
  224. --input-format raw::
  225. --output-format raw::
  226. --format raw::
  227. These can be used to set the input format or output format to raw. The
  228. last version sets both input and output format to raw.
  229. --raw-rate <rate>::
  230. This should be used to set the sample rate. The input sample rate and
  231. the output sample rate will always be the same (no resampling is
  232. done by the watermarker). There is no default for the sampling rate,
  233. so this parameter must always be specified for raw streams.
  234. --raw-input-bits <bits>::
  235. --raw-output-bits <bits>::
  236. --raw-bits <bits>::
  237. The options can be used to set the input number of bits, the output number
  238. of bits or both. The number of bits can either be `16` or `24`. The default
  239. number of bits is `16`.
  240. --raw-input-endian <endian>::
  241. --raw-output-endian <endian>::
  242. --raw-endian <endian>::
  243. These options can be used to set the input/output endianness or both.
  244. The <endian> parameter can either be `little` or `big`. The default
  245. endianness is `little`.
  246. --raw-input-encoding <encoding>::
  247. --raw-output-encoding <encoding>::
  248. --raw-encoding <encoding>::
  249. These options can be used to set the input/output encoding or both.
  250. The <encoding> parameter can either be `signed` or `unsigned`. The
  251. default encoding is `signed`.
  252. --raw-channels <channels>::
  253. This can be used to set the number of channels. Note that the number
  254. of input channels and the number of output channels must always be the
  255. same. The watermarker has been designed and tested for stereo files,
  256. so the number of channels should really be `2`. This is also the
  257. default.
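
To make the meaning of these parameters concrete, here is a minimal sketch that decodes headerless sample data in Python. It assumes that the channels are interleaved frame by frame, as is usual for PCM data, and it simply interprets `unsigned` samples as unsigned integers; `decode_raw` is a hypothetical helper, not part of `audiowmark`.

```python
def decode_raw(data, bits=16, endian="little", encoding="signed", channels=2):
    """Decode raw PCM bytes into a list of frames (one tuple per frame).

    The defaults mirror the documented defaults: 16 bits, little endian,
    signed, 2 channels. Channel interleaving is an assumption of this sketch.
    """
    if bits not in (16, 24):
        raise ValueError("bits must be 16 or 24")
    if endian not in ("little", "big"):
        raise ValueError("endian must be 'little' or 'big'")
    width = bits // 8
    frame_size = width * channels
    usable = len(data) - len(data) % frame_size   # ignore a trailing partial frame
    signed = encoding == "signed"
    frames = []
    for start in range(0, usable, frame_size):
        frame = tuple(
            int.from_bytes(
                data[start + c * width:start + (c + 1) * width], endian, signed=signed
            )
            for c in range(channels)
        )
        frames.append(frame)
    return frames


raw = b"\x01\x00\xff\xff\x00\x01\x00\x80"   # two stereo frames, 16-bit little endian
print(decode_raw(raw))   # prints [(1, -1), (256, -32768)]
```

Note that the sample rate is not needed to decode the samples themselves, but it must still be passed to `audiowmark` with `--raw-rate`.
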
  258. [[hls]]
  259. == HTTP Live Streaming
  260. === Introduction for HLS
  261. HTTP Live Streaming (HLS) is a protocol to deliver audio or video streams via
  262. HTTP. One example for using HLS in practice would be: a user watches a video
  263. in a web browser with a player like `hls.js`. The user is free to
  264. play/pause/seek the video as they want. `audiowmark` can watermark the audio
  265. content while it is being transmitted to the user.
  266. HLS splits the contents of each stream into small segments. For the watermarker
  267. this means that if the user seeks to a position far ahead in the stream, the
  268. server needs to start sending segments from where the new play position is, but
  269. everything in between can be ignored.
  270. Another important property of HLS is that it allows separate segments for the
  271. video and audio stream of a video. Since we watermark only the audio track of a
  272. video, the video segments can be sent as they are (and different users can get
  273. the same video segments). What is watermarked are the audio segments only, so
  274. here instead of sending the original audio segments to the user, the audio
  275. segments are watermarked individually for each user, and then transmitted.
  276. Everything necessary to watermark HLS audio segments is available within
  277. `audiowmark`. The server side support which is necessary to send the right
  278. watermarked segment to the right user is not included.
  279. [[hls-requirements]]
  280. === HLS Requirements
  281. HLS support requires some headers/libraries from ffmpeg:
  282. * libavcodec
  283. * libavformat
  284. * libavutil
  285. * libswresample
  286. To enable these as dependencies and build `audiowmark` with HLS support, use the
  287. `--with-ffmpeg` configure option:
  288. [subs=+quotes]
  289. ....
  290. *$ ./configure --with-ffmpeg*
  291. ....
  292. In addition to the libraries, `audiowmark` also uses the two command line
  293. programs from ffmpeg, so they need to be installed:
  294. * ffmpeg
  295. * ffprobe
  296. === Preparing HLS segments
  297. The first step for preparing content for streaming with HLS would be splitting
  298. a video into segments. For this documentation, we use a very simple example
  299. using ffmpeg. No matter what the original codec was, at this point we force
  300. transcoding to AAC with our target bit rate, because during delivery the stream
  301. will be in AAC format.
  302. [subs=+quotes]
  303. ....
  304. *$ ffmpeg -i video.mp4 -f hls -master_pl_name replay.m3u8 -c:a aac -ab 192k \
  305. -var_stream_map "a:0,agroup:aud v:0,agroup:aud" \
  306. -hls_playlist_type vod -hls_list_size 0 -hls_time 10 vs%v/out.m3u8*
  307. ....
  308. This splits the `video.mp4` file into an audio stream of segments in the `vs0`
  309. directory and a video stream of segments in the `vs1` directory. Each segment
  310. is approximately 10 seconds long, and a master playlist is written to
  311. `replay.m3u8`.
  312. Now we can add the relevant audio context to each audio ts segment. This is
  313. necessary so that when the segment is watermarked in order to be transmitted to
  314. the user, `audiowmark` will have enough context available before and after the
  315. segment to create a watermark which sounds correct over segment boundaries.
  316. [subs=+quotes]
  317. ....
  318. *$ audiowmark hls-prepare vs0 vs0prep out.m3u8 video.mp4*
  319. AAC Bitrate: 195641 (detected)
  320. Segments: 18
  321. Time: 2:53
  322. ....
  323. This step reads the audio playlist `vs0/out.m3u8` and writes all segments
  324. contained in this audio playlist to a new directory `vs0prep` which
  325. contains the audio segments prepared for watermarking.
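
To see which segments an audio playlist refers to, the playlist can be inspected with a few lines of Python. In an HLS media playlist, lines starting with `#` are tags or comments and the remaining non-empty lines are segment URIs. The sample playlist content below is made up for illustration, and this sketch says nothing about how `audiowmark hls-prepare` parses the playlist internally.

```python
def playlist_segments(playlist_text):
    """Return the segment URIs listed in an HLS media playlist."""
    segments = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Tags and comments start with '#'; everything else is a URI line.
        if line and not line.startswith("#"):
            segments.append(line)
    return segments


# Hypothetical playlist content, similar in shape to vs0/out.m3u8.
sample_playlist = """\
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.000000,
out0.ts
#EXTINF:10.000000,
out1.ts
#EXT-X-ENDLIST
"""
print(playlist_segments(sample_playlist))   # prints ['out0.ts', 'out1.ts']
```
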
  326. The last argument in this command line is `video.mp4` again. All audio
  327. that is watermarked is taken from this audio master. It could also be
  328. supplied in `wav` format. This makes a difference if you use lossy
  329. compression as target format (for instance AAC), but your original
  330. video has an audio stream with higher quality (i.e. lossless).
  331. === Watermarking HLS segments
  332. So with all preparations made, what would the server have to do to send a
  333. watermarked version of the 6th audio segment `vs0prep/out5.ts`?
  334. [subs=+quotes]
  335. ....
  336. *$ audiowmark hls-add vs0prep/out5.ts send5.ts 0123456789abcdef0011223344556677*
  337. Message: 0123456789abcdef0011223344556677
  338. Strength: 10
  339. Time: 0:15
  340. Sample Rate: 44100
  341. Channels: 2
  342. Data Blocks: 0
  343. AAC Bitrate: 195641
  344. ....
  345. So instead of sending out5.ts (which has no watermark) to the user, we would
  346. send send5.ts, which is watermarked.
  347. In a real-world use case, it is likely that the server would supply the input
  348. segment on stdin and send the output segment as written to stdout, like this
  349. [subs=+quotes]
  350. ....
  351. *$ [...] | audiowmark hls-add - - 0123456789abcdef0011223344556677 | [...]*
  352. [...]
  353. ....
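
On the server side, the pipe shown above could be driven from Python roughly as follows. This is a sketch under assumptions: `watermark_segment` and `hls_add_argv` are hypothetical helper names, the prepared segment is held in memory as bytes, and error handling is reduced to `check=True`.

```python
import subprocess


def hls_add_argv(message):
    """Argument list for `audiowmark hls-add` reading stdin and writing stdout."""
    return ["audiowmark", "hls-add", "-", "-", message]


def watermark_segment(segment_bytes, message):
    """Pipe one prepared segment through audiowmark and return the result.

    Requires audiowmark (built with HLS support) to be installed; raises
    subprocess.CalledProcessError if the command fails.
    """
    result = subprocess.run(
        hls_add_argv(message),
        input=segment_bytes,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

A request handler would read the prepared segment (for instance `vs0prep/out5.ts`), call `watermark_segment` with the message assigned to the requesting user, and send the returned bytes as the response body.
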
  354. The usual parameters are supported in `audiowmark hls-add`, like
  355. --key <filename>::
  356. Use watermarking key from file <filename> (see <<key>>).
  357. --strength <s>::
  358. Set the watermarking strength (see <<strength>>).
  359. The AAC bitrate for the output segment can be set using:
  360. --bit-rate <bit_rate>::
  361. Set the AAC bit-rate for the generated watermarked segment.
  362. The rules for the AAC bit-rate of the newly encoded watermarked segment are:
  363. * if the `--bit-rate` option is used during `hls-add`, this bit-rate will be used
  364. * otherwise, if the `--bit-rate` option is used during `hls-prepare`, this bit-rate will be used
  365. * otherwise, the bit-rate of the input material is detected during `hls-prepare`
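
The precedence rules above amount to a simple fallback chain. The following sketch expresses them in Python; the function and parameter names are hypothetical and only serve to illustrate the order in which the values are considered.

```python
def effective_bit_rate(add_option=None, prepare_option=None, detected=None):
    """Return the AAC bit-rate that would apply to a watermarked segment.

    add_option:     value of --bit-rate passed to hls-add, if any
    prepare_option: value of --bit-rate passed to hls-prepare, if any
    detected:       bit-rate detected from the input during hls-prepare
    """
    if add_option is not None:
        return add_option
    if prepare_option is not None:
        return prepare_option
    return detected


print(effective_bit_rate(prepare_option=192000, detected=195641))   # prints 192000
```
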
  366. == Dependencies
  367. If you compile from source, `audiowmark` needs the following libraries:
  368. * libfftw3
  369. * libsndfile
  370. * libgcrypt
  371. * libzita-resampler
  372. * libmpg123
  373. If you want to build with HTTP Live Streaming support, see also
  374. <<hls-requirements>>.
  375. == Building fftw
  376. `audiowmark` needs the single precision variant of fftw3.
  377. If you are building fftw3 from source, use the `--enable-float`
  378. configure parameter to build it, e.g.::
  379. cd ${FFTW3_SOURCE}
  380. ./configure --enable-float --enable-sse && \
  381. make && \
  382. sudo make install
  383. or, when building from git
  384. cd ${FFTW3_GIT}
  385. ./bootstrap.sh --enable-shared --enable-sse --enable-float && \
  386. make && \
  387. sudo make install
  388. == Docker Build
  389. You should be able to execute `audiowmark` via Docker.
  390. Example that outputs the usage message:
  391. docker build -t audiowmark .
  392. docker run -v <local-data-directory>:/data -it audiowmark -h