= UHFerret

homepage:: https://peterlane.codeberg.page/ferret/
source:: https://codeberg.org/peterlane/uhferret-gem/tags

== Description

UHFerret is a copy-detection tool that supports the analysis of large sets
of documents to find pairs of documents with substantial amounts of lexical
copying. Documents containing either natural language (e.g. English) or
computer programs (in C-family languages) may be processed.

This library provides a Ruby wrapper around uhferret suitable for
scripting, a command-line executable, 'uhferret', and a simple
server version, 'uhferret-server'.

NB: to install uhferret, Ruby must be able to compile and build C extensions.

== Use

=== Command Line

  Usage: uhferret [options] file1 file2 ...
      -h, --help                     help message
      -c, --code                     process documents as code
      -t, --text                     process documents as text (default)
      -d, --data-table               output similarity table (default)
      -l, --list-trigrams            output trigram list
      -a, --all-comparisons          output list of all comparisons
      -x, --xml-report FILE          generate xml report from two documents
      -f, --definition-file FILE     read document names from file

To compute the similarities of a set of files, use:

  $ uhferret file1.txt file2.txt ...

An XML report can be generated for a pair of files using:

  $ uhferret -x outfile.xml file1.txt file2.txt

The XML report can be displayed in a browser using the style sheet
'uhferret.xsl' in the examples folder, and then printed from the browser.
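
When scripting, the command lines shown above can also be assembled from
Ruby. The following is a minimal sketch and not part of the gem: the helper
names 'uhferret_command' and 'xml_report_command' are hypothetical, the
two-file minimum is a choice made for this example, and only flags listed in
the usage text are used. It assumes the 'uhferret' executable is on the PATH.

```ruby
# Hypothetical helpers (not part of UHFerret) that assemble argument lists
# for the 'uhferret' executable, using only the flags documented above.

# Build the command for a similarity table over a set of files.
# code: true adds -c so the documents are processed as code;
# otherwise -t (the documented default) is passed explicitly.
def uhferret_command(files, code: false)
  # Assumption for this sketch: a comparison needs at least two documents.
  raise ArgumentError, 'need at least two files' if files.size < 2

  ['uhferret', (code ? '-c' : '-t'), *files]
end

# Build the command for an XML report: -x takes the output file,
# followed by the two documents to compare.
def xml_report_command(outfile, file1, file2)
  ['uhferret', '-x', outfile, file1, file2]
end

# Example use (assumes 'uhferret' is on the PATH):
#   files = Dir.glob('essays/*.txt').sort
#   system(*uhferret_command(files))
```

Passing the arguments to 'system' as separate strings, rather than as one
command string, bypasses the shell, so file names containing spaces need no
quoting.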

=== Program

Ferret can also be used as a library, and called from within a program.
For example:

  ferret = Ferret.new
  ferret.add 'filename1.txt'
  ferret.add 'filename2.txt'
  ferret.run
  ferret.output_similarity_table

This creates a new instance of Ferret, adds two documents, runs the
comparison and then outputs the similarity between the two.
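
To give a feel for the kind of figure a similarity score is, the sketch below
compares two strings using word trigrams. It is an illustration only and not
UHFerret's implementation: the tokenisation (lower-cased runs of letters,
digits and apostrophes), the helper names, and the use of the Jaccard
coefficient (shared trigrams divided by all distinct trigrams) are
assumptions made for this example.

```ruby
require 'set'

# Illustrative only: not UHFerret's implementation.

# Split text into lower-cased word tokens (assumed tokenisation).
def tokens(text)
  text.downcase.scan(/[a-z0-9']+/)
end

# Set of all runs of three consecutive words in the text.
# Texts with fewer than three words yield an empty set.
def trigrams(text)
  tokens(text).each_cons(3).map { |t| t.join(' ') }.to_set
end

# Jaccard coefficient: shared trigrams / distinct trigrams in either text.
# Returns 0.0 when neither text contains any trigrams.
def trigram_similarity(text1, text2)
  a = trigrams(text1)
  b = trigrams(text2)
  union = a | b
  return 0.0 if union.empty?

  (a & b).size.to_f / union.size
end

puts trigram_similarity('the cat sat on the mat',
                        'the cat sat on a mat')
```

In this example each sentence has four trigrams, two of which are shared, so
the score is two out of six distinct trigrams, or one third. Identical texts
score 1.0 and texts with no trigram in common score 0.0.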

=== Server

  Usage: uhferret-server [options]
      -h, --help                     help message
      -p, --port n                   port number
      -f, --folder FOLDER            base folder

The folder used to store the processed files defaults to 'FerretFiles', and
the port defaults to 2000.

Initial address: http://localhost:2000/ferret/home

NB: The server uses some \*nix commands, and so currently does not work
under Windows.
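
The server options and initial address above can be combined in a short
script. This is a sketch under stated assumptions, not part of the gem: the
helper names 'server_command' and 'server_home_uri' are hypothetical, and it
assumes the '/ferret/home' path stays the same when a non-default port is
chosen.

```ruby
require 'uri'

# Hypothetical helpers (not part of UHFerret). The defaults mirror the
# ones stated above: port 2000 and folder 'FerretFiles'.

# Build the argument list for launching the server.
def server_command(port: 2000, folder: 'FerretFiles')
  ['uhferret-server', '-p', port.to_s, '-f', folder]
end

# Initial address of a server running locally on the given port.
# Assumption: only the port part of the address changes.
def server_home_uri(port: 2000)
  URI("http://localhost:#{port}/ferret/home")
end

# Example use (assumes 'uhferret-server' is on the PATH):
#   pid = spawn(*server_command(port: 3000, folder: 'MyFerretFiles'))
#   puts server_home_uri(port: 3000)
```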

== Acknowledgements

UHFerret has been developed at the University of Hertfordshire by members of
the Plagiarism Detection Group. The original concept of using trigrams for
measuring copying was developed by Caroline Lyon and James Malcolm. JunPeng
Bao, Ruth Barrett and Bob Dickerson also contributed to the development of
earlier versions of Ferret.