- = UHFerret
- homepage:: https://peterlane.codeberg.page/ferret/
- source:: https://codeberg.org/peterlane/uhferret-gem/tags
- == Description
- UHFerret is a copy-detection tool, supporting the analysis of large sets of
- documents to find pairs of documents with substantial amounts of lexical
- copying. Documents containing either natural language (e.g. English) or
- computer programs (in C-family languages) may be processed.
- This library provides a Ruby wrapper around uhferret suitable for
- scripting, a command-line executable, 'uhferret', and a simple
- server version, 'uhferret-server'.
- NB: to install uhferret, Ruby must be able to compile and build C extensions.
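To make the idea of "lexical copying" concrete, the following is a minimal sketch of a trigram-based similarity measure in plain Ruby. It is not the gem's implementation: the helper names `word_trigrams` and `trigram_similarity`, the simple tokenising regular expression, and the choice of a Jaccard-style ratio (shared trigrams divided by all distinct trigrams) are assumptions made for illustration only.

```ruby
require 'set'

# Hypothetical helper: split text into lowercase word tokens and collect
# every run of three consecutive words. The tokeniser is a simplification
# and may differ from the one uhferret actually uses.
def word_trigrams(text)
  words = text.downcase.scan(/[a-z0-9']+/)
  words.each_cons(3).map { |triple| triple.join(' ') }.to_set
end

# Hypothetical helper: Jaccard-style ratio of shared trigrams to all
# distinct trigrams in the two texts (assumed measure, for illustration).
def trigram_similarity(text_a, text_b)
  a = word_trigrams(text_a)
  b = word_trigrams(text_b)
  union = a | b
  return 0.0 if union.empty?

  (a & b).size.to_f / union.size
end

puts trigram_similarity('the cat sat on the mat', 'the cat sat on the floor')
```

Here the two sentences share three of five distinct trigrams, so the sketch reports a similarity of 0.6. Pairs of documents with a high ratio are the ones worth inspecting by hand.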
- == Use
- === Command Line
- Usage: uhferret [options] file1 file2 ...
- -h, --help help message
- -c, --code process documents as code
- -t, --text process documents as text (default)
- -d, --data-table output similarity table (default)
- -l, --list-trigrams output trigram list
- -a, --all-comparisons output list of all comparisons
- -x, --xml-report FILE generate xml report from two documents
- -f, --definition-file FILE read document names from file
- To compute the similarities of a set of files, use:
- $ uhferret file1.txt file2.txt ...
- An XML report can be generated for a pair of files using:
- $ uhferret -x outfile.xml file1.txt file2.txt
- The XML report can be displayed in a browser using the style sheet
- 'uhferret.xsl' in the examples folder, and then printed from the browser.
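When the command line is driven from a script, it can help to assemble the arguments in one place. The sketch below uses only the options listed above (-c and -x); the helper name `uhferret_command` and its keyword arguments are hypothetical and are not part of the gem.

```ruby
# Hypothetical helper (not part of the gem): builds an argument list for
# the 'uhferret' executable from the options documented in this README.
def uhferret_command(files, code: false, xml_report: nil)
  args = ['uhferret']
  args << '-c' if code # process documents as code rather than text
  if xml_report
    # The XML report is described as being generated from two documents.
    raise ArgumentError, 'an XML report needs exactly two documents' unless files.size == 2

    args.push('-x', xml_report)
  end
  args + files
end

cmd = uhferret_command(['file1.txt', 'file2.txt'], xml_report: 'outfile.xml')
puts cmd.join(' ')
# To run the command (requires uhferret to be installed): system(*cmd)
```

Passing the arguments to `system` as separate strings, rather than as one joined string, avoids shell quoting problems with file names that contain spaces.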
- === Program
- Ferret can also be used as a library, and called from within a program.
- For example:
- ferret = Ferret.new
- ferret.add 'filename1.txt'
- ferret.add 'filename2.txt'
- ferret.run
- ferret.output_similarity_table
- This creates a new instance of Ferret, adds two documents, runs the
- comparison, and then outputs the similarity between the two.
- === Server
- Usage: uhferret-server [options]
- -h, --help help message
- -p, --port n port number
- -f, --folder FOLDER base folder
- The folder used to store the processed files defaults to
- 'FerretFiles', and the port defaults to 2000.
- Initial address: http://localhost:2000/ferret/home
- NB: The server uses some \*nix commands, and so currently does not work
- under Windows.
- == Acknowledgements
- UHFerret has been developed at the University of Hertfordshire by members of
- the Plagiarism Detection Group. The original concept of using trigrams for
- measuring copying was developed by Caroline Lyon and James Malcolm. JunPeng
- Bao, Ruth Barrett and Bob Dickerson also contributed to the development of
- earlier versions of Ferret.