Content Addressable Archival System
Timothy Rice aeace7f8da Add License | 3 years ago | |
---|---|---|
LICENSE | 3 years ago | |
README.md | 3 years ago | |
carcass | 3 years ago | |
carcass-absorb | 3 years ago | |
carcass-ls | 3 years ago | |
It is widely known that Git can store binary data but is not optimized for it. Perhaps less widely known, but more important if you deal with a lot of binary files, is that Git begins to show significant performance issues when you add large archives of binary data. The more binary data you add, the more noticeable the issues become, until the repository is almost impossible to work with.
Yet, Git does follow certain principles which could be useful for binary archives, provided you begin with a different set of optimization priorities.
Carcass borrows certain principles from Git, such as offline-first storage of objects named after their own hash. However, it is focused on the following types of data:
These considerations suggest a system focused on uncompressed tar archives of one collection per directory. Due to the flat structure, lack of extra compression, and no expectation of diffs, it is possible to build a system that is not only better than Git for binary files, but is also much simpler to build.
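As a rough illustration of "objects named after their own hash", the sketch below stores a file under its digest, so identical content lands on the same name and is kept only once. The hash algorithm (SHA-256) and the helper names `object_name` and `store` are assumptions made for this example; the README does not say which hash Carcass uses.

```python
import hashlib
import shutil
from pathlib import Path


def object_name(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, read in chunks to keep memory flat."""
    digest = hashlib.sha256()  # assumption: the real tool may use another hash
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def store(path: Path, store_dir: Path) -> Path:
    """Copy a file into store_dir under its own hash; skip the copy if it exists."""
    store_dir.mkdir(parents=True, exist_ok=True)
    target = store_dir / object_name(path)
    if not target.exists():
        shutil.copyfile(path, target)
    return target
```

Because the name is derived only from the bytes, there is nothing to diff or compress: two archives are either the same object or unrelated ones.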
Note that Carcass is currently a prototype for building a proof-of-concept. Don't expect it to be bug-free or feature-rich.
Low priorities:
`rsync`

Like Git, the Carcass command itself is essentially a wrapper around any number of subcommands, which are discovered from your `PATH` in the form of `carcass-<subcmd>`.
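The lookup described above can be sketched in a few lines. This is not the actual `carcass` script, only a minimal Python illustration of the idea; the function names `find_subcommand` and `main`, the error messages, and the exit codes are all assumptions.

```python
import shutil
import subprocess
import sys
from typing import List, Optional


def find_subcommand(name: str, prefix: str = "carcass-") -> Optional[str]:
    """Return the full path of the executable `<prefix><name>` on PATH, if any."""
    return shutil.which(prefix + name)


def main(argv: List[str]) -> int:
    """Dispatch `carcass <subcmd> [args...]` to `carcass-<subcmd> [args...]`."""
    if len(argv) < 2:
        print("usage: carcass <subcmd> [args...]", file=sys.stderr)
        return 2
    executable = find_subcommand(argv[1])
    if executable is None:
        print(f"carcass: '{argv[1]}' is not a carcass command", file=sys.stderr)
        return 1
    # Run the subcommand with the remaining arguments and pass its exit code back.
    return subprocess.call([executable, *argv[2:]])


# As a script entry point one would call: sys.exit(main(sys.argv))
```

With this layout, adding a new subcommand only requires dropping an executable named `carcass-<subcmd>` somewhere on `PATH`; the wrapper itself does not change.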
At the moment, there are two subcommands:

- `carcass ls` shows all the current objects under `.carcass`, along with the root directory inside those archives.
- `carcass absorb <directory>` tars up the target directory, stores it as an object under `.carcass`, and deletes the original directory.
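
To make the two subcommands concrete, here is a minimal Python sketch of what they describe. It is not the code of `carcass-absorb` or `carcass-ls`: the function names `absorb` and `ls`, the SHA-256 hash, and the temporary-file handling are assumptions chosen for the example.

```python
import hashlib
import shutil
import tarfile
import tempfile
from pathlib import Path
from typing import List, Tuple


def absorb(directory: Path, store: Path) -> Path:
    """Tar up `directory` without compression, store it by hash, delete the original."""
    directory = directory.resolve()
    store.mkdir(parents=True, exist_ok=True)
    # Write inside the store so the final rename stays on one filesystem.
    with tempfile.NamedTemporaryFile(dir=store, suffix=".tmp", delete=False) as tmp:
        tmp_path = Path(tmp.name)
        with tarfile.open(fileobj=tmp, mode="w") as archive:  # "w" = plain tar
            archive.add(directory, arcname=directory.name)
    digest = hashlib.sha256()  # assumption: the real tool may use another hash
    with tmp_path.open("rb") as handle:
        while chunk := handle.read(1 << 20):
            digest.update(chunk)
    target = store / digest.hexdigest()
    tmp_path.replace(target)
    shutil.rmtree(directory)  # only reached once the object is safely in place
    return target


def ls(store: Path) -> List[Tuple[str, str]]:
    """Return (object name, root directory inside the archive) for each object."""
    entries = []
    for obj in sorted(store.iterdir()):
        if not obj.is_file() or obj.suffix == ".tmp":
            continue
        with tarfile.open(obj, mode="r:") as archive:  # "r:" = no compression
            first = archive.next()  # absorb() writes the root directory first
        root = first.name.split("/", 1)[0] if first else ""
        entries.append((obj.name, root))
    return entries
```

For example, `absorb(Path("photos"), Path(".carcass"))` would leave a single object in `.carcass` and remove `photos`, after which `ls(Path(".carcass"))` reports that object together with the root directory `photos`. Note that a tar archive records metadata such as modification times, so in this sketch two directories with identical file contents but different timestamps produce different object names.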