SongRec is an open-source Shazam client for Linux, written in Rust.

Features:

* Recognize audio from an arbitrary audio file.
* Recognize audio from the microphone.
* Usage from both the GUI and the command line (for the file
  recognition part).
* Provide a history of the recognized songs in the GUI, exportable to
  CSV.
* Continuous song detection from the microphone, with the ability to
  choose your input device.
* Ability to recognize songs from your speakers rather than your
  microphone (on compatible PulseAudio setups).
* Generate a lure from a song that, when played, will fool Shazam into
  thinking that it is the song in question.

A (command-line only) Python version, which I made before rewriting in
Rust for performance, is also available for demonstration purposes. It
supports file recognition only.

## How it works

For useful information about how audio fingerprinting works, you may
want to read [this article](http://coding-geek.com/how-shazam-works/).

Put simply, Shazam generates a spectrogram of the sound (a 2D
time/frequency graph, with the amplitude at each intersection) and maps
out the frequency peaks from it (which should match key points of the
harmonics of voices or of certain instruments).

Shazam also downsamples the sound to 16 kHz before processing, and
splits it into four bands of 250-520 Hz, 520-1450 Hz, 1450-3500 Hz and
3500-5500 Hz (so that if one band is too scrambled by noise,
recognition can still rely on the other bands). The frequency peaks are
then sent to the servers, which look up the strongest peaks in a
database in order to look for the simultaneous presence of neighboring
peaks both in the associated reference fingerprints and in the
fingerprint we sent.
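
To make the band split and peak picking described above more concrete,
here is a minimal Python sketch. It is not SongRec's or Shazam's actual
code: the 256-sample frame, the naive DFT (used instead of an FFT to
avoid dependencies), the `min_magnitude` threshold and the choice of
keeping a single peak per band are all assumptions made for
illustration. Only the band edges and the 16 kHz sample rate come from
the description above.

```python
import cmath
import math

# Band edges in Hz and sample rate, taken from the description above.
BANDS = [(250, 520), (520, 1450), (1450, 3500), (3500, 5500)]
SAMPLE_RATE = 16_000  # Hz, after downsampling


def band_of(freq_hz):
    """Return the index of the band containing freq_hz, or None."""
    for index, (low, high) in enumerate(BANDS):
        if low <= freq_hz < high:
            return index
    return None


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (slow but dependency-free)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def strongest_peak_per_band(frame, min_magnitude=1e-6):
    """Map band index -> (frequency_hz, magnitude) of its strongest peak.

    min_magnitude is a hypothetical threshold that discards numerical noise.
    """
    mags = magnitude_spectrum(frame)
    bin_hz = SAMPLE_RATE / len(frame)
    best = {}
    for k in range(1, len(mags) - 1):
        is_local_max = mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
        if not is_local_max or mags[k] < min_magnitude:
            continue
        band = band_of(k * bin_hz)
        if band is not None and (band not in best or mags[k] > best[band][1]):
            best[band] = (k * bin_hz, mags[k])
    return best


if __name__ == "__main__":
    # Synthetic frame: two pure tones at 1000 Hz and 4000 Hz.
    frame = [
        math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 4000 * t / SAMPLE_RATE)
        for t in range(256)
    ]
    for band, (freq, _) in sorted(strongest_peak_per_band(frame).items()):
        print(f"band {band}: peak near {freq:.0f} Hz")
```

A real fingerprint would repeat this over many overlapping frames and
keep the time position of each peak, which is what lets the server look
for neighboring peaks that occur together.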

Hence, the Shazam fingerprinting algorithm, as implemented by the
client, is fairly simple, as much of the processing is done
server-side. The general operation of Shazam has been documented in
public [research
papers](https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf) and
patents.

Note: It is not mandatory, but if you want to be able to recognize more
formats than WAV, OGG, FLAC and MP3, you should ensure that you have
the `ffmpeg` package installed.

## Compilation

(**WARNING**: Remember to compile the code in `--release` mode for
correct performance.)

### Installing Rust

First, you need to [install the Rust compiler and package
manager](https://www.rust-lang.org/tools/install). It has been observed
to work with `rustc` 1.43.0 to the current `rustc` 1.47.0.

Install Rust and put it in your `PATH`, for all distributions:

```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh # Type "1"
# Log out and log back in to add Rust to the $PATH, or run:
source $HOME/.cargo/env
# If you already installed Rust, then update it:
rustup update
```

### Install dependent libraries (nothing exotic)

Debian:

```bash
sudo apt install build-essential libasound2-dev libgtk-3-dev libssl-dev
```

Void Linux (libressl):

```shell
sudo xbps-install base-devel alsa-lib-devel gtk+3-devel libressl-devel
```

Void Linux (openssl):

```shell
sudo xbps-install base-devel alsa-lib-devel gtk+3-devel openssl-devel
```

### Compiling the project

This will compile and run the project:

```bash
# For the stable release:
cargo install songrec
songrec
# For the GitHub tree:
git clone git@github.com:marin-m/songrec.git
cd songrec
cargo run --release
```

For the latter, you will then find the project's binary (which you can
move or execute directly) at `target/release/songrec`.

## Sample usage

Passing no arguments or using the `gui` subcommand will launch the GUI
and try to recognize audio in real time as soon as the application is
launched:

```
./songrec
./songrec gui
```

Using the `gui-norecording` subcommand will launch the GUI without
recognizing audio as soon as the software is started (you will need to
click the "Turn on microphone recognition" button to do so):

```
./songrec gui-norecording
```

The GUI allows you to recognize songs from your microphone, from your
speakers (on compatible PulseAudio setups), or from an audio file. The
MP3, FLAC, WAV and OGG formats should be accepted for audio files if
FFmpeg is not installed, and any audio or video format supported by
FFmpeg should be accepted if FFmpeg is installed.

The following commands allow you to recognize sound from your
microphone or from a file using the command line (`listen` keeps
running for as long as the microphone is usable, whereas `recognize`
recognizes only one song). Use the `-h` flag to see all the available
options:

```
./songrec listen -h
./songrec recognize -h
```

By default, only the artist and track name of the recognized song are
printed to the standard output, and other information may be printed to
the standard error output. The `--csv` and `--json` options print more
programmatically usable information to the standard output.
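
As an illustration of consuming the `--json` output from a script, here
is a minimal Python sketch. The argument order
(`recognize --json <file>`) and the `./songrec` binary path are
assumptions, so check `./songrec recognize -h` for the exact syntax of
your version. The structure of the returned JSON is not described in
this README, so the sketch only parses it and makes no assumption about
its fields.

```python
import json
import subprocess


def build_command(audio_path, binary="./songrec"):
    """Build the command line (assumed syntax, see `recognize -h`)."""
    return [binary, "recognize", "--json", audio_path]


def recognize(audio_path, binary="./songrec"):
    """Run SongRec on a file and return the parsed JSON from stdout.

    Raises subprocess.CalledProcessError if the command fails; anything
    SongRec writes to the error output is left in the exception or ignored.
    """
    completed = subprocess.run(
        build_command(audio_path, binary),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # Requires a built songrec binary and network access to Shazam.
    print(json.dumps(recognize("sound_file.mp3"), indent=2))
```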

The above describes the newer command-line interface of SongRec, but an
older interface, operating only on audio files or raw audio
fingerprints, is also available and described below.

The following subcommand will try to recognize audio from the middle of
an audio file, and print the JSON response from Shazam's servers:

```
./songrec audio-file-to-recognized-song sound_file.mp3
```

The following subcommands will do the same with an intermediary step,
manipulating data-URI audio fingerprints as used by Shazam internally:

```
./songrec audio-file-to-fingerprint sound_file.mp3
./songrec fingerprint-to-recognized-song 'data:audio/vnd.shazam.sig;base64,...'
```
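
The fingerprints above are ordinary base64 data URIs, so they can be
taken apart with the standard library alone. The following Python
sketch splits such a URI into its media type and raw payload bytes. The
function name `split_data_uri` is hypothetical, and the sketch
deliberately does not interpret the binary payload, whose layout is not
described in this README.

```python
import base64

PREFIX = "data:"


def split_data_uri(uri):
    """Split a base64 data URI into (media_type, payload_bytes)."""
    if not uri.startswith(PREFIX):
        raise ValueError("not a data URI")
    header, separator, encoded = uri[len(PREFIX):].partition(",")
    if not separator:
        raise ValueError("missing ',' between header and payload")
    media_type, _, encoding = header.rpartition(";")
    if encoding != "base64":
        raise ValueError("expected a base64-encoded payload")
    return media_type, base64.b64decode(encoded, validate=True)


if __name__ == "__main__":
    # Dummy payload for demonstration only, not a real fingerprint.
    demo = "data:audio/vnd.shazam.sig;base64," + base64.b64encode(b"demo").decode()
    media_type, payload = split_data_uri(demo)
    print(media_type, len(payload), "bytes")
```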

The following will produce audible tones from a given fingerprint,
which should be able to fool Shazam into thinking that this is the
original song (played either on the default audio output device, or
written to a .WAV file):

```
./songrec fingerprint-to-lure 'data:audio/vnd.shazam.sig;base64,...'
./songrec fingerprint-to-lure 'data:audio/vnd.shazam.sig;base64,...' /tmp/output.wav
```

When using the application, you may notice that certain information
is saved to `~/.local/share/SongRec` (or an equivalent directory
depending on your operating system), including the CSV-format list of
the last recognized songs and the last selected microphone input device
(so that it is selected again when restarting the app). You may want to
delete this directory in case of persistent issues.
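
If you would rather keep your song history while troubleshooting, one
option is to rename the directory instead of deleting it. The Python
sketch below does this. The helper name `set_aside` and the `.bak-`
suffix are hypothetical, and the default path is the Linux location
mentioned above, so pass the equivalent directory explicitly on other
setups.

```python
import shutil
import time
from pathlib import Path

# Default Linux location mentioned above; may differ on other systems.
DEFAULT_DATA_DIR = Path.home() / ".local" / "share" / "SongRec"


def set_aside(data_dir=DEFAULT_DATA_DIR):
    """Rename data_dir to a timestamped backup next to it.

    Returns the backup path, or None if data_dir does not exist.
    """
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = data_dir.with_name(f"{data_dir.name}.bak-{stamp}")
    shutil.move(str(data_dir), str(backup))
    return backup


if __name__ == "__main__":
    moved_to = set_aside()
    print(f"Moved to {moved_to}" if moved_to else "Nothing to move")
```

Close SongRec before running this, then start it again so that it
recreates a fresh directory.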

## Privacy

SongRec collects no data and contacts no servers other than Shazam's.

SongRec does not upload raw audio data anywhere: only fingerprints of
the audio are uploaded, that is, sequences of frequency peaks encoded
as "(frequency, amplitude, time)" tuples.

This alone does not suffice to represent anything audible (use the
"Play a Shazam lure" button to hear how different it is from the full
sound). No actually audible sound (e.g. voice fragments) is sent to the
servers, only metadata derived from the characteristics of the sound,
which can only suffice to recognize a song already known by Shazam.

## Legal

This software is released under the [GNU GPL
v3](https://www.gnu.org/licenses/gpl-3.0.html) license. It was created
with the intent of providing interoperability between the remote Shazam
services and Linux-based desktop systems.

Please note that in certain countries located outside of the European
Union, especially the United States, software patents may apply.