
The Salty Bible

Get Salty's MusicBrainz Picard Script...

1. Organization

1.1. Use a meaningful and consistent library structure

Get Salty's MusicBrainz Picard Naming Script...

  • Including the date at the beginning allows sorting releases chronologically and helps identify releases at a glance.

  • Including the catalog number or barcode allows identifying releases at a glance. When both are available the catalog number is prioritized because it is easier to differentiate at a glance.

  • Grouping tracks into folders by disc subtitle makes it possible to have different covers for each disc in a box set without embedding.

  • Replacing disallowed characters with "_" ensures that a folder or file with an empty name is never created and minimizes both loss of information and uncertainty when glancing at the files. The underscore is a good replacement candidate because it is easy to distinguish and to type. Disallowed characters are not replaced with similar-looking Unicode variants because that would produce misleading output.

  • The "." character is disallowed because on Windows folders cannot end with it.

  • Audio quality is not included because its format differs by audio format and handling releases with multiple audio formats significantly increases the complexity of the format script.

  • Rip log score is not included because there are multiple definitions for scoring and computing a score significantly increases the complexity of the format script.

  • Releases are not grouped by genre as it is subjective and a release can belong to many genres.

  • Do not replace spaces with underscores as it makes the path hard to read and results in loss of information when disallowed characters are already replaced with "_".

  • Keeping more information in the path helps users find the releases when sharing on P2P platforms.
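
The character replacement rule described in the list above can be sketched in Python as follows. This is an illustration only, not the naming script itself: the name `sanitize_component` and the exact character set are assumptions (the characters Windows rejects in file names, plus "." as disallowed by this guide).

```python
import re

# Assumed set of disallowed characters: the ones Windows rejects in file
# and folder names, plus "." (disallowed by this guide). Adjust as needed.
DISALLOWED = re.compile(r'[<>:"/\\|?*.]')


def sanitize_component(name: str) -> str:
    """Replace every disallowed character with "_", one for one."""
    return DISALLOWED.sub("_", name)


if __name__ == "__main__":
    print(sanitize_component("AC/DC"))         # AC_DC
    print(sanitize_component("Who Are You?"))  # Who Are You_
    print(sanitize_component("..."))           # ___
```

Because each character is replaced one for one rather than dropped, a name made up entirely of disallowed characters still yields a non-empty result.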

1.2. Lowercase all file extensions

In a sample of 3.3 million files that had a file extension, around 1% had an uppercase extension. The files came from all across a daily-driven desktop installation of Windows. As 99% of the files had a lowercase extension, it is reasonable to treat lowercase as the standard and to expect that not all software will support uppercase extensions.
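
A minimal sketch of how extensions could be lowercased in bulk, assuming Python 3.9 or later. The function names `lowercased` and `lowercase_extensions` are hypothetical helpers made up for this example.

```python
from pathlib import Path


def lowercased(path: Path) -> Path:
    """Return the same path with its extension in lowercase."""
    return path.with_suffix(path.suffix.lower())


def lowercase_extensions(root: Path) -> list[Path]:
    """Rename files under root whose extension is not lowercase."""
    renamed = []
    # Materialize the listing first so renames do not disturb the walk.
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        target = lowercased(path)
        if target.name == path.name:
            continue
        # Skip if a different file already uses the lowercase name.
        if target.exists() and not target.samefile(path):
            continue
        path.rename(target)
        renamed.append(target)
    return renamed
```

Calling `lowercase_extensions(Path("/path/to/library"))` returns the list of renamed files, which is worth reviewing after a first run on a copy of the library.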

1.3. Generate a fresh AccurateRip log for all ingested releases with a rip log

AccurateRip logs (.accurip) generated by CueTools are info files about the audio tracks of a CD rip. From the log it is possible to determine whether the audio on disk matches other rips of the CD performed around the world. Both the CueTools and AccurateRip databases are consulted for this info.

Only EAC rip logs contain data from both of the aforementioned sources; XLD and Whipper only support AccurateRip, which in general also has fewer entries than CueTools. In addition, as a considerable number of shared CD rips are performed near the release date, the rip log may not contain information as trustworthy as an AccurateRip log with freshly fetched data. Furthermore, the fresh log makes it possible to determine whether the audio files belong to the EAC, XLD or Whipper rip log included with the tracks.

1.4. Use proper physically identifiable names for scans

Sequential scan names like 1.png, 2.png, etc. are confusing for people who are not physically familiar with the specific release or packaging type. The same goes for owners coming back to the scans after having lost their physical copy.

  1. To label scans of a series of items (e.g. inserts, booklet pages, booklets themselves, etc), add N to the scan name, where N is the number of the scan. However, if there is only one scan, there is no need to add the numerical suffix, since it is understood that there is only one item.

  2. If both the front and back of an item are included in separate files, add Front or Back to the scan name. However, if only the front scan is present, there is no need to add the side suffix, since it is understood that the scan is of the primary visual asset, which is the front of the item.

  3. If both the front and back of a foldable item are included in a single file, add Outside to the scan name. The same principle can be applied to a scan of the inside of an item, where Inside is added to the scan name.

  4. If there are multiple scans of the same surface, add descriptive terms (e.g. with Sticker, with Obi, etc) to the scan name to differentiate them.

Possible scan names (excluding suffixes) and their physical counterparts:

| Scan Name | Physical Counterpart |
| --- | --- |
| Back | Back of a disc case or a record sleeve, sometimes the spine is also included in this scan |
| Book Page | Page of a book |
| Book | A set of pages that have been fastened together inside a cover to be read |
| Booklet Page | Page of a booklet |
| Booklet | Thin book with a small number of pages and a paper cover |
| Card | Piece of cardboard, including playing cards |
| Disc | Optical disc, use Matrix for the back of the disc |
| Front | Front of a disc case or a record sleeve |
| Insert | Piece of paper, often an announcement or advertisement |
| Matrix | Back of an optical disc |
| Obi | Piece of paper wrapped around the spine of Japanese CDs |
| Postcard | Postcard |
| Quadfold | Booklet that folds out four times |
| Record | Vinyl, shellac or acetate record, use Front and Back for side A and B |
| Slipcase Spine | Spine of a slipcase, same concept as Spine |
| Slipcase | A box with an opening for the contents to slide out of, also has a Top and Bottom |
| Spine | Side strip of a case, often with brief info about the release and the only visible part when the case is stored on a shelf |
| Sticker | Adhesive piece of paper |
| Tray | Inside of a case with the tray and its liner, sometimes with Disc or with Discs |
| Trifold | Booklet that folds out three times |
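
As a rough illustration of how the suffix rules combine, the sketch below builds a file name from a scan name and its optional suffixes. The helper `scan_name`, the order of the suffixes (number, then side, then descriptive term) and the use of spaces as separators are all assumptions made for this example; the rules above do not prescribe an order.

```python
from typing import Optional


def scan_name(
    item: str,
    number: Optional[int] = None,
    side: Optional[str] = None,
    note: Optional[str] = None,
    ext: str = "png",
) -> str:
    """Build a scan file name such as "Booklet Page 3.png".

    Assumed suffix order: number, side, descriptive term.
    """
    parts = [item]
    if number is not None:   # only needed when there are several items
        parts.append(str(number))
    if side:                 # Front, Back, Outside or Inside
        parts.append(side)
    if note:                 # e.g. "Obi" becomes "with Obi"
        parts.append(f"with {note}")
    return " ".join(parts) + "." + ext


if __name__ == "__main__":
    print(scan_name("Booklet Page", number=3))      # Booklet Page 3.png
    print(scan_name("Insert", number=1, side="Back"))  # Insert 1 Back.png
    print(scan_name("Front", note="Obi"))           # Front with Obi.png
```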

1.5. Rename .cue, .log and .accurip files to CD1, CD2, etc

Short and concise names make it easier to identify the file type and avoid having problems with the path length limit.

1.6. Delete .m3u, .m3u8, foo_dr.txt and audiochecker.log files

These files provide no useful information about a release.
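
A minimal sketch for locating these files, assuming Python 3.9 or later. The helper `find_junk` is a hypothetical name, and it only lists candidates rather than deleting them, so the result can be reviewed first.

```python
from pathlib import Path

# File types and names this section says to delete.
JUNK_SUFFIXES = {".m3u", ".m3u8"}
JUNK_NAMES = {"foo_dr.txt", "audiochecker.log"}


def find_junk(root: Path) -> list[Path]:
    """Return all files under root that match the deletion rule."""
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file()
        and (p.suffix.lower() in JUNK_SUFFIXES or p.name.lower() in JUNK_NAMES)
    )
```

Once the list looks right, the files can be removed with `for p in find_junk(root): p.unlink()`.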

1.7. Use short variants for media types with multiple file extensions

Use the short file extensions .jpg and .tif instead of their longer variants .jpeg and .tiff.

1.8. Do not pick apart or merge releases

Releases must be considered inseparable. Deleting, adding or swapping tracks of a release compromises its integrity: the release can no longer be verified as a whole, and its album-level metadata no longer matches what is actually stored on disk.

2. Formats

2.1. Encode all lossless PCM audio using libFLAC with the highest compression level

FLAC is a free, open source, well documented and broadly supported lossless audio codec. The most recent version of libFLAC, which is the reference implementation for the codec, provides the highest compression ratio when compared to earlier versions of libFLAC and the FFMPEG encoder.
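
One way to apply this is through the `flac` command-line tool, the reference encoder built on libFLAC. The sketch below assumes `flac` is installed and on the PATH; the helper names `flac_command` and `encode` are made up for this example.

```python
import subprocess
from pathlib import Path


def flac_command(source: Path, target: Path) -> list[str]:
    """Build a flac invocation using the highest compression preset."""
    return [
        "flac",
        "--best",      # highest compression preset (same as -8)
        "--verify",    # decode while encoding to check the output
        "-o", str(target),
        str(source),
    ]


def encode(source: Path, target: Path) -> None:
    """Run the encoder; raises CalledProcessError if flac fails."""
    subprocess.run(flac_command(source, target), check=True)
```

For example, `encode(Path("01.wav"), Path("01.flac"))` would encode a single track.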

2.2. Do not encode lossless to lossy or lossy to lossless

Converting from lossless to lossy reduces the quality and undermines the verifiability of the release, while converting from lossy to lossless offers no benefit and is misleading. The only scenario in which converting from lossy to lossless is acceptable is when the source uses a rare and generally unsupported lossy codec.

2.3. Do not resample audio files

Resampling audio is a lossy process. If it is not done with the correct settings, the result will often fall short of the quality of the same audio downloaded from the original source.

2.4. Keep covers square if feasible

If a cover's aspect ratio is within about 5% of 1.0, crop it to be square if at all possible. Covers sometimes include erroneous white borders; these should not count towards the aspect ratio and should be removed by cropping.
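
The 5% rule can be expressed as a small calculation. The sketch below is an assumption-laden illustration: the helper `square_crop_box` is hypothetical, it measures the ratio as longer side over shorter side, and it crops around the center, which will not always be the right choice (for example when a border sits on one side only).

```python
from typing import Optional, Tuple


def square_crop_box(
    width: int, height: int, tolerance: float = 0.05
) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) for a centered square crop.

    Returns None when the image is already square or too far from
    square to be cropped under the tolerance.
    """
    if width == height:
        return None  # already square
    ratio = max(width, height) / min(width, height)
    if ratio - 1 > tolerance:
        return None  # not close enough to square, leave as-is
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


if __name__ == "__main__":
    print(square_crop_box(1000, 1040))  # (0, 20, 1000, 1020)
    print(square_crop_box(1500, 1000))  # None
```

The returned tuple follows the (left, upper, right, lower) box convention used by Pillow's `Image.crop`, so it can be passed to an image library of choice.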

2.5. Keep the highest resolution cover available

Keeping high resolution album covers ensures that they will remain visually clear as monitor resolutions continue to increase.

2.6. Prefer 100%LOG CD rips over low quality WEB rips

A properly logged CD rip has more verification value than a 16-bit 44.1 kHz WEB rip. Additionally, it is not worth sacrificing storage space for hi-res WEB rips that do not make effective use of the available bit depth and frequency range or that originate from a low quality source.

2.7. Avoid audio above 96kHz

The storage requirements at such high resolutions are too high to be practical.

2.8. Encode and optimize all lossless scans to PNG

PNG is a widely used and broadly supported lossless image codec. Keeping all scans in the same format makes it easier to manage them. However, because many encoders cannot fully optimize the image files they produce, the images should additionally be run through an optimizer like oxipng to save further disk space.

2.9. Do not transliterate text

Languages should be written in their native script as loss of nuance and errors are very common in transliteration. Sort tags should be used for transliterations.

2.10. Do not alter audio files

Do not edit, normalize, equalize, compress, amplify, fade or otherwise destructively and irreversibly alter audio files.

2.12. Avoid rips of analog media

Analog audio often has noticeable artifacts due to the nature of the ripping process. As no two analog rips are the same, the rips also cannot feasibly be compared for accuracy.

2.13. Avoid MQA

MQA is a proprietary lossy audio codec misleadingly stored in a PCM container.

3. Tags

3.1. Properly utilize audio tags

Filling in metadata helps music players organize and display releases in a meaningful way. Due to the inherent limitations of what tags can reasonably convey, it is important to carefully consider what info is stored and how music players will perceive it.

Descriptions of common tags and ways to utilize them:

Tag Name Description
ALBUM Album scoped, single-valued

Title of the specific edition of the release with an edition suffix where applicable.

ALBUMARTIST Album scoped, single-valued

All album artists, except featured artists, with all join phrases as a single value. This ensures the directory structure and music players will group the releases together ignoring ephemeral pairups.

ARTIST Track scoped, single-valued or multi-valued

All track artists. Depending on how scrobbling works in the music player of choice, it might be better to treat it as a multi-valued tag; otherwise it can be treated as single-valued to preserve the relationships between the artists.

BARCODE Album scoped, multi-valued

Release specific 14, 13, 12, 8 or 6 digit UPC/EAN barcodes. Usually found on the back of the case.

CATALOGNUMBER Album scoped, multi-valued

Release specific catalog numbers. Usually found on the side of the case or on the disc itself.

COMPOSER Track scoped, multi-valued

All non-fictional composers as multiple values. Especially important for classical music.

DATE Album scoped, single-valued

Release date of the specific edition in "YYYY-MM-DD", "YYYY-MM" or "YYYY" format, depending on what parts are known.

DISCNUMBER Disc scoped, single-valued

Number of the disc, not padded.

DISCSUBTITLE Disc scoped, single-valued

Mainly used for box sets where each disc represents a different album or named collection of tracks. For example the Grand Theft Auto: San Andreas Official Soundtrack - Box Set release with each disc representing a radio station.

DISCTOTAL Album scoped, single-valued

Total number of discs in the release, not padded.

GENRE Track scoped, multi-valued

In addition to the standard genres, use the following genres to provide more information about a track: Instrumental.

ISRC Track scoped, single-valued

Used to help identify tracks on streaming services and web stores.

MEDIA Album scoped, multi-valued

All mediums contained in the specific release as multiple values. Example values: Digital Media, CD, 8cm CD, CD-R, CCCD, HDCD, SACD, Vinyl, 7" Vinyl, 10" Vinyl, 12" Vinyl, Cassette, USB, VHS, DVD, BD.

Scoped to an album to allow searching for releases containing specific mediums. As music players often do not support directly including video files in a library, this makes it possible to find releases that include Blu-rays.

MUSICBRAINZ_*ID Various scopes, multi-valued

Associates files with a concrete data source in the form of MusicBrainz database entries. MusicBrainz contains extensive data about tracks and releases that is typically not stored in tags due to its complex nature. Software can use these MusicBrainz ID tags to fetch additional information for display, management or scrobbling purposes.

ORIGINALDATE Album scoped, single-valued

Original release date in YYYY-MM-DD format. Should be the release date of the oldest edition of the release aka the first release in a release group.

RELEASETYPE Album scoped, multi-valued

Example values: Album, Single, EP, Compilation, Anthology, Soundtrack, Spokenword, Live, Bootleg.

SOURCE Album scoped, single-valued

Name of the streaming service where the files stored on disk are from.

TITLE Track scoped, single-valued

Title of a track.

TRACKNUMBER Track scoped, single-valued

Number of a track on a medium, not padded.

TRACKTOTAL Disc scoped, single-valued

Total count of tracks on a medium, not padded.

URL Album scoped, single-valued

Address on the streaming service where the files stored on disk originate from in the shortest form possible to save space.

Principles:

  • Should almost always start with https:// but not with https://www..
  • If the address contains a region, set it to the region used for ripping.

Examples:

| From | To |
| --- | --- |
| https://music.apple.com/jp/album/夜に駆ける/1542182291 | https://music.apple.com/jp/album/1542182291 |
| https://qobuz.com/us-en/album/the-book-yoasobi/dslwccxo58ehc | https://qobuz.com/us-en/album/dslwccxo58ehc |
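
The URL examples above can be reproduced with a short sketch, assuming Python 3.9 or later. The helper `shorten_release_url` is hypothetical, and the rule it applies (drop a leading "www.", drop the readable slug between "album" and the trailing identifier, drop query and fragment) is an assumption inferred from those two examples only; other services may need different handling.

```python
from urllib.parse import urlsplit, urlunsplit


def shorten_release_url(url: str) -> str:
    """Shorten an album URL in the style of the examples above."""
    parts = urlsplit(url)
    host = parts.netloc.removeprefix("www.")
    # Ignore a trailing slash so the identifier stays the last segment.
    segments = parts.path.rstrip("/").split("/")
    if "album" in segments:
        i = segments.index("album")
        # ".../album/<slug>/<id>": drop the slug, keep the id.
        if len(segments) > i + 2:
            del segments[i + 1]
    # Assumption: query string and fragment are not needed.
    return urlunsplit((parts.scheme, host, "/".join(segments), "", ""))
```

The region segment (such as `jp` or `us-en`) is left untouched, in line with the principle of keeping the region used for ripping.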

3.2. Always use "Various Artists" for various artists

This way all releases with various artists will be consistent and grouped together.

3.3. Do not embed images

Embedding images into each audio file of a release uses more space than storing a single image file in the folder. Additionally, embedded images are usually of lower quality to reduce space usage, despite occupying approximately the same amount of space as a single high-quality image file.

3.4. Do not include featured artists in track titles

An artist is not part of the song title.

3.5. Do not use fancy Unicode symbols for common punctuation marks

Using Unicode variants of common symbols makes management and search tasks more challenging due to the characters looking extremely similar and not always being fuzzy matched to their common counterparts.

Replacement map from fancy Unicode symbols to ASCII:

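| Name | From | To |
| --- | --- | --- |
| NO-BREAK SPACE | U+00A0 | (space) |
| LEFT SINGLE QUOTATION MARK | U+2018 | ' |
| RIGHT SINGLE QUOTATION MARK | U+2019 | ' |
| SINGLE LOW-9 QUOTATION MARK | U+201A | , |
| SINGLE HIGH-REVERSED-9 QUOTATION MARK | U+201B | ' |
| LEFT DOUBLE QUOTATION MARK | U+201C | " |
| RIGHT DOUBLE QUOTATION MARK | U+201D | " |
| DOUBLE LOW-9 QUOTATION MARK | U+201E | " |
| DOUBLE HIGH-REVERSED-9 QUOTATION MARK | U+201F | " |
| HYPHEN | U+2010 | - |
| NON-BREAKING HYPHEN | U+2011 | - |
| EN DASH | U+2013 | - |
| EM DASH | U+2014 | - |
| FIGURE DASH | U+2012 | - |
| HORIZONTAL BAR | U+2015 | - |
| ONE DOT LEADER | U+2024 | . |
| TWO DOT LEADER | U+2025 | .. |
| HORIZONTAL ELLIPSIS | U+2026 | ... |
| DOUBLE EXCLAMATION MARK | U+203C | !! |
| DOUBLE QUESTION MARK | U+2047 | ?? |
| FRACTION SLASH | U+2044 | / |
| DIVISION SLASH | U+2215 | / |
| WAVE DASH | U+301C | ~ |
| FULLWIDTH TILDE | U+FF5E | ~ |
| FULLWIDTH LEFT PARENTHESIS | U+FF08 | ( |
| FULLWIDTH RIGHT PARENTHESIS | U+FF09 | ) |
| FULLWIDTH LEFT SQUARE BRACKET | U+FF3B | [ |
| FULLWIDTH RIGHT SQUARE BRACKET | U+FF3D | ] |
| FULLWIDTH LESS-THAN SIGN | U+FF1C | < |
| FULLWIDTH GREATER-THAN SIGN | U+FF1E | > |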
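
The replacement map above translates directly into a character translation table. The following is a minimal sketch in Python; the names `ASCII_REPLACEMENTS` and `to_ascii_punctuation` are made up for this example, and the code points are the standard Unicode code points for the character names listed.

```python
# Keys are written as escapes so the look-alike characters stay visible.
ASCII_REPLACEMENTS = {
    "\u00a0": " ",    # NO-BREAK SPACE
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201a": ",",    # SINGLE LOW-9 QUOTATION MARK
    "\u201b": "'",    # SINGLE HIGH-REVERSED-9 QUOTATION MARK
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u201e": '"',    # DOUBLE LOW-9 QUOTATION MARK
    "\u201f": '"',    # DOUBLE HIGH-REVERSED-9 QUOTATION MARK
    "\u2010": "-",    # HYPHEN
    "\u2011": "-",    # NON-BREAKING HYPHEN
    "\u2013": "-",    # EN DASH
    "\u2014": "-",    # EM DASH
    "\u2012": "-",    # FIGURE DASH
    "\u2015": "-",    # HORIZONTAL BAR
    "\u2024": ".",    # ONE DOT LEADER
    "\u2025": "..",   # TWO DOT LEADER
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
    "\u203c": "!!",   # DOUBLE EXCLAMATION MARK
    "\u2047": "??",   # DOUBLE QUESTION MARK
    "\u2044": "/",    # FRACTION SLASH
    "\u2215": "/",    # DIVISION SLASH
    "\u301c": "~",    # WAVE DASH
    "\uff5e": "~",    # FULLWIDTH TILDE
    "\uff08": "(",    # FULLWIDTH LEFT PARENTHESIS
    "\uff09": ")",    # FULLWIDTH RIGHT PARENTHESIS
    "\uff3b": "[",    # FULLWIDTH LEFT SQUARE BRACKET
    "\uff3d": "]",    # FULLWIDTH RIGHT SQUARE BRACKET
    "\uff1c": "<",    # FULLWIDTH LESS-THAN SIGN
    "\uff1e": ">",    # FULLWIDTH GREATER-THAN SIGN
}

_TABLE = str.maketrans(ASCII_REPLACEMENTS)


def to_ascii_punctuation(text: str) -> str:
    """Replace the fancy punctuation listed above with ASCII."""
    return text.translate(_TABLE)
```

Only the listed punctuation is touched, so text in its native script passes through unchanged, which keeps this compatible with the rule against transliteration.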

3.6. Do not use artist sort names as artist names

Family names, honorary titles, definite articles, "The", "DJ", etc. can only be moved to the end in sort tags like ARTISTSORT and ALBUMARTISTSORT, as those tags are specifically meant for customizing values to ensure a specific sort order. Everywhere else the artist should be left as it is depicted in official sources. For extensive information on constructing sort names, see the MusicBrainz Artist Sort Name Style Guide.

{{ pageinfo(updated="2023/03/23") }}