This is Ugene's (http://ugene.net/) fork of the CLARK tool
(http://clark.cs.ucr.edu/Tool/). It adds support for building the
database directly from gzip- and 7z-packed RefSeq files.

CLARK: CLAssifier based on Reduced K-mers

The problem of DNA sequence classification is central to several
application domains in molecular biology, genomics, metagenomics and
genetics. The problem is computationally challenging due to the size of
datasets generated by modern sequencing instruments and the growing size
of reference sequence databases.

CLARK is a novel method for supervised sequence classification based on
discriminative k-mers. Unlike most other metagenomic and genomic
classification methods, CLARK provides a confidence score for its
assignments, which can be used in downstream analysis. The utility of
CLARK is demonstrated on two distinct classification problems:
1) the assignment of metagenomic reads to known bacterial genomes;
2) the assignment of BAC clones and transcripts to chromosome arms (in
   the absence of a finished assembly for the reference genome).

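The idea of classifying with discriminative k-mers can be illustrated
with a small Python sketch. It is not CLARK's implementation: the
function names are hypothetical, a k-mer is treated as discriminative
when it occurs in exactly one target, and the confidence score is
assumed to be the top hit count divided by the sum of the top two hit
counts. The formula CLARK actually uses is not stated in this README.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every contiguous k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_discriminative_index(targets, k):
    """Map each k-mer that occurs in exactly one target to that target."""
    owners = defaultdict(set)
    for label, seq in targets.items():
        for km in kmers(seq, k):
            owners[km].add(label)
    # k-mers shared by two or more targets carry no signal and are dropped.
    return {km: next(iter(labels))
            for km, labels in owners.items() if len(labels) == 1}


def classify(read, index, k):
    """Return (label, confidence), or (None, 0.0) when no k-mer matches."""
    hits = Counter(index[km] for km in kmers(read, k) if km in index)
    if not hits:
        return None, 0.0
    ranked = hits.most_common(2)
    best_label, best = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0
    # Assumed score: 1.0 when only one target is hit, 0.5 on a tie.
    return best_label, best / (best + second)


# Toy targets; real reference genomes are far longer.
targets = {"genomeA": "ACGTACGTGGA", "genomeB": "TTGACCATGCA"}
index = build_discriminative_index(targets, 4)
print(classify("ACGTGGA", index, 4))
print(classify("ACGTGGATTGA", index, 4))
```

A read whose k-mers all point to one target gets a score of 1.0, while
a read that also hits a second target gets a lower score, which a
downstream step could use as a filtering threshold.
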
Three classifiers, or variants, are provided in the CLARK framework:
CLARK (default): designed for powerful workstations, it may require a
significant amount of RAM to run with a large database (e.g., all
bacterial genomes from NCBI/RefSeq). This classifier queries k-mers
with exact matching.
CLARK-l (light): designed for workstations with limited memory, this
tool provides precise classification on small metagenomes. For
metagenomics analysis, CLARK-l works with a sparse or "light" database
(up to 4 GB of RAM) that is built using distant and non-overlapping
k-mers. This classifier queries k-mers with exact matching.
CLARK-S (spaced): designed for powerful workstations and based on
spaced k-mers, this classifier requires more RAM than CLARK or CLARK-l
but offers higher sensitivity. CLARK-S completes the CLARK series of
classifiers.
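
The three variants differ mainly in which k-mers they store and query.
The Python sketch below contrasts the three sampling styles on a toy
sequence. It is only an illustration: the function names, the gap
parameter, and the mask "1101" are hypothetical, and the k values,
spacing, and masks actually used by CLARK-l and CLARK-S are not given
in this README.

```python
def contiguous_kmers(seq, k):
    """All overlapping k-mers, one per start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def non_overlapping_kmers(seq, k, gap=0):
    """k-mers sampled every k + gap positions, so none of them overlap."""
    step = k + gap
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def spaced_kmers(seq, mask):
    """Spaced k-mers: keep positions marked '1' in mask, skip those marked '0'."""
    span = len(mask)
    keep = [i for i, flag in enumerate(mask) if flag == "1"]
    return ["".join(seq[start + i] for i in keep)
            for start in range(len(seq) - span + 1)]


seq = "ACGTACGTAC"
print(contiguous_kmers(seq, 4))          # dense sampling
print(non_overlapping_kmers(seq, 4, 1))  # sparse sampling, smaller index
print(spaced_kmers(seq, "1101"))         # third base of each window ignored
```

Sparse sampling stores fewer k-mers, which is consistent with the lower
memory footprint described for CLARK-l. A spaced k-mer ignores the
masked positions, so a mismatch there does not prevent a match, which
is the usual reason spaced seeds improve sensitivity.
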