- This is Ugene's (http://ugene.net/) fork of the CLARK tool
- (http://clark.cs.ucr.edu/Tool/), which supports building the database
- directly from gzip- and 7z-packed RefSeq files.
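As a rough illustration of what reading packed reference files involves, the sketch below streams FASTA records from a gzip-compressed file using only Python's standard library. It is not this fork's implementation: the function name `read_gzipped_fasta` is hypothetical, and 7z archives are left out because they would need a third-party library.

```python
import gzip


def read_gzipped_fasta(path):
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file.

    Hypothetical helper for illustration only; not part of CLARK.
    """
    header, chunks = None, []
    # "rt" decompresses on the fly and yields text lines.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    # Emit the last record in the file.
    if header is not None:
        yield header, "".join(chunks)
```

Because the function is a generator, sequences are decompressed one record at a time rather than unpacked to disk first.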
- CLARK: CLAssifier based on Reduced K-mers
- The problem of DNA sequence classification is central to several
- application domains in molecular biology, genomics, metagenomics and
- genetics. The problem is computationally challenging due to the size of
- datasets generated by modern sequencing instruments and the growing size
- of reference sequence databases.
- CLARK is a novel method for supervised sequence classification based on
- discriminative k-mers. Unlike most other metagenomic and genomic
- classification methods, CLARK provides a confidence score for its
- assignments, which can be used in downstream analysis. The utility of
- CLARK is demonstrated on two distinct classification problems:
- 1) the assignment of metagenomic reads to known bacterial genomes
- 2) the assignment of BAC clones and transcripts to chromosome arms (in
- the absence of a finished assembly for the reference genome).
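The idea of classifying with discriminative k-mers can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not CLARK's actual code: the function names are hypothetical, reverse complements are ignored, a k-mer counts as discriminative when it occurs in exactly one target, and the confidence score is assumed to be h1 / (h1 + h2), where h1 and h2 are the hit counts of the best and second-best targets.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all k-mers (length-k substrings) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_discriminative_index(targets, k):
    """Map each k-mer found in exactly one target to that target's name."""
    owners = defaultdict(set)
    for name, seq in targets.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)
    # k-mers shared by two or more targets are discarded.
    return {kmer: next(iter(names))
            for kmer, names in owners.items() if len(names) == 1}


def classify(read, index, k):
    """Return (best_target, confidence) for a read using exact k-mer lookups."""
    hits = defaultdict(int)
    for i in range(len(read) - k + 1):
        target = index.get(read[i:i + k])
        if target is not None:
            hits[target] += 1
    if not hits:
        return None, 0.0  # no discriminative k-mer matched
    # Sort by hit count (descending), then by name for a stable result.
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    best, h1 = ranked[0]
    h2 = ranked[1][1] if len(ranked) > 1 else 0
    # Assumed confidence formula: share of hits going to the best target.
    return best, h1 / (h1 + h2)
```

For example, with targets `{"A": "AAAACCCC", "B": "GGGGTTTT"}` and `k = 4`, the read `"AAAACGGGG"` hits target A twice and target B once, so it is assigned to A with a confidence of 2/3 under the assumed formula.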
- The CLARK framework provides three classifiers (variants):
- CLARK (default): created for powerful workstations, it may require a
- significant amount of RAM to run with a large database (e.g., all
- bacterial genomes from NCBI/RefSeq). This classifier queries k-mers
- with exact matching.
- CLARK-l (light): created for workstations with limited memory, this
- software tool provides precise classification on small metagenomes.
- Indeed, for metagenomics analysis, CLARK-l works with a sparse or
- "light" database (up to 4 GB of RAM) that is built using distant and
- non-overlapping k-mers. This classifier queries k-mers with exact
- matching.
- CLARK-S (spaced): created for powerful workstations, this classifier
- exploits spaced k-mers. It requires more RAM than CLARK or CLARK-l,
- but it offers higher sensitivity. CLARK-S completes the CLARK
- series of classifiers.
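The two k-mer sampling ideas named in the variants above, non-overlapping k-mers and spaced k-mers, can be illustrated with a short Python sketch. The function names, the `mask` string, and the `gap` parameter are hypothetical; the text does not specify which patterns CLARK-S uses or exactly how CLARK-l selects its k-mers, so this only shows the general technique.

```python
def spaced_kmers(seq, mask):
    """Yield spaced k-mers from seq.

    mask is a string of '1' (keep) and '0' (ignore) characters. At every
    window of len(mask) bases, only the bases under a '1' are kept.
    """
    keep = [i for i, flag in enumerate(mask) if flag == "1"]
    span = len(mask)
    for start in range(len(seq) - span + 1):
        yield "".join(seq[start + i] for i in keep)


def non_overlapping_kmers(seq, k, gap=0):
    """Yield k-mers whose start positions are k + gap apart, so none overlap."""
    step = k + gap
    for start in range(0, len(seq) - k + 1, step):
        yield seq[start:start + k]
```

With the mask `"1101"`, the windows `ACGT` and `ACTT` both reduce to `ACT`, which is why spaced k-mers tolerate a mismatch at an ignored position and can raise sensitivity. Sampling non-overlapping k-mers, by contrast, stores far fewer k-mers per sequence, which is one way to keep a database small.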