= SVM Toolkit

source:: https://notabug.org/peterlane/svm_toolkit/

== Description

Support-vector machines are a popular tool in data mining. This package
includes an amended version of the Java implementation of the libsvm library
(version 3.11). Additional methods and examples are provided to support
standard training techniques, such as cross-validation, and simple
visualisations. Training/testing of models can use a variety of built-in or
user-defined evaluation methods, including overall accuracy, geometric mean,
precision and recall.
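
As a rough illustration of the cross-validation idea mentioned above, the
following plain-Ruby sketch splits instance indices into folds. It does not
use the svm_toolkit API; the method names +k_fold_indices+ and
+cross_validation_splits+ are hypothetical and chosen only for this example.

```ruby
# Distribute the indices 0...n over k folds of near-equal size.
# Plain Ruby only; this is not part of svm_toolkit.
def k_fold_indices(n, k)
  folds = Array.new(k) { [] }
  (0...n).each { |i| folds[i % k] << i }
  folds
end

# Return one [training_indices, held_out_indices] pair per fold.
def cross_validation_splits(n, k)
  folds = k_fold_indices(n, k)
  folds.each_with_index.map do |held_out, j|
    # Training data is every fold except the held-out one.
    training = folds.each_with_index.reject { |_, i| i == j }.flat_map(&:first)
    [training.sort, held_out]
  end
end

splits = cross_validation_splits(14, 4)
splits.each_with_index do |(train, test), i|
  puts "Fold #{i}: train on #{train.size} instances, test on #{test.inspect}"
end
```

Each instance is held out exactly once, so averaging a model's score over the
folds gives an estimate that does not depend on a single train/test split.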

== Features

- All features of LibSVM 3.11 are supported, and many are augmented with Ruby wrappers.
- Loading Problem definitions from file in Svmlight, Csv or Arff (simple subset) format.
- Creating Problem definitions from values supplied programmatically in arrays.
- Rescaling of feature values.
- Integrated cost/gamma search for model with RBF kernel, taking advantage of multiple cores.
- Contour plot visualisation of cost/gamma search results.
- Model provides value of w-squared for hyperplane.
- svm-demo application, a version of the svm_toy applet which comes with libsvm.
- Model stores indices of training instances used as support vectors.
- User-selected evaluation techniques supported in Model#evaluate_dataset and
  Svm.cross_validation_search.
- Library provides evaluation classes for Cohen's Kappa statistics, F-measure,
  geometric-mean, Matthews Correlation Coefficient, overall-accuracy,
  precision, and recall.
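
To make several of the evaluation measures listed above concrete, here is a
minimal plain-Ruby sketch that computes them from binary predictions. It is
not the library's implementation: the method names +confusion_counts+ and
+metrics+ are hypothetical, and the geometric mean is assumed to be the square
root of sensitivity times specificity, which may differ from the toolkit's own
definition.

```ruby
# Count true/false positives and negatives for labels 1 (positive) and 0 (negative).
def confusion_counts(predicted, actual)
  counts = { tp: 0, fp: 0, tn: 0, fn: 0 }
  predicted.zip(actual).each do |p, a|
    key = if p == 1
            a == 1 ? :tp : :fp
          else
            a == 1 ? :fn : :tn
          end
    counts[key] += 1
  end
  counts
end

# Derive common measures from the counts. A zero denominator yields NaN.
def metrics(c)
  precision   = c[:tp].fdiv(c[:tp] + c[:fp])
  recall      = c[:tp].fdiv(c[:tp] + c[:fn]) # also called sensitivity
  specificity = c[:tn].fdiv(c[:tn] + c[:fp])
  mcc_denominator = Math.sqrt(
    (c[:tp] + c[:fp]) * (c[:tp] + c[:fn]) * (c[:tn] + c[:fp]) * (c[:tn] + c[:fn])
  )
  {
    accuracy:       (c[:tp] + c[:tn]).fdiv(c.values.sum),
    precision:      precision,
    recall:         recall,
    f_measure:      2 * precision * recall / (precision + recall),
    geometric_mean: Math.sqrt(recall * specificity), # assumed definition
    mcc:            (c[:tp] * c[:tn] - c[:fp] * c[:fn]) / mcc_denominator
  }
end

predicted = [1, 1, 0, 1, 0, 0, 1, 0]
actual    = [1, 0, 0, 1, 1, 0, 1, 0]
metrics(confusion_counts(predicted, actual)).each do |name, value|
  puts format("%-15s %.3f", name, value)
end
```

Overall accuracy can look good on an unbalanced dataset even when one class is
mostly misclassified, which is why measures such as the geometric mean or the
Matthews Correlation Coefficient are often preferred in that setting.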

== Example

The following example illustrates how a dataset can be constructed in code, and
an SVM model created and tested against the different kernels.

  require "svm_toolkit"
  include SvmToolkit

  puts "Classification with LIBSVM"
  puts "--------------------------"

  # Sample dataset: the 'Play Tennis' dataset
  # from T. Mitchell, Machine Learning (1997)
  # --------------------------------------------
  # Labels for each instance in the training set
  #    1 = Play, 0 = Not
  Labels = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]

  # Recoding the attribute values into range [0, 1]
  Instances = [
    [0.0,1.0,1.0,0.0],
    [0.0,1.0,1.0,1.0],
    [0.5,1.0,1.0,0.0],
    [1.0,0.5,1.0,0.0],
    [1.0,0.0,0.0,0.0],
    [1.0,0.0,0.0,1.0],
    [0.5,0.0,0.0,1.0],
    [0.0,0.5,1.0,0.0],
    [0.0,0.0,0.0,0.0],
    [1.0,0.5,0.0,0.0],
    [0.0,0.5,0.0,1.0],
    [0.5,0.5,1.0,1.0],
    [0.5,1.0,0.0,0.0],
    [1.0,0.5,1.0,1.0]
  ]

  # create some arbitrary train/test split
  TrainingSet = Problem.from_array(Instances.slice(0, 10), Labels.slice(0, 10))
  TestSet = Problem.from_array(Instances.slice(10, 4), Labels.slice(10, 4))

  # Iterate over each kernel type
  Parameter.kernels.each do |kernel|

    # -- train model for this kernel type
    params = Parameter.new(
      :svm_type => Parameter::C_SVC,
      :kernel_type => kernel,
      :cost => 10,
      :degree => 1,
      :gamma => 100
    )
    model = Svm.svm_train(TrainingSet, params)

    # -- test kernel performance on the training set
    errors = model.evaluate_dataset(TrainingSet, :print_results => true)
    puts "Kernel #{Parameter.kernel_name(kernel)} has #{errors} on the training set"

    # -- test kernel performance on the test set
    errors = model.evaluate_dataset(TestSet, :print_results => true)
    puts "Kernel #{Parameter.kernel_name(kernel)} has #{errors} on the test set"
  end

More examples can be found in the source code, linked above.
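
The example above supplies attribute values already recoded into the range
[0, 1]. The sketch below shows one way such a recoding could be written in
plain Ruby. The +CODING+ table and the +recode+ method are hypothetical, and
the mapping itself is an assumption inferred from the example rows; the
README does not state it.

```ruby
# Assumed coding for the 'Play Tennis' attributes, inferred from the
# example data above. Not taken from svm_toolkit.
CODING = {
  outlook:     { "Sunny" => 0.0, "Overcast" => 0.5, "Rain" => 1.0 },
  temperature: { "Cool" => 0.0, "Mild" => 0.5, "Hot" => 1.0 },
  humidity:    { "Normal" => 0.0, "High" => 1.0 },
  wind:        { "Weak" => 0.0, "Strong" => 1.0 }
}.freeze

# Convert one day's symbolic attribute values into a numeric feature vector.
# Raises KeyError if an attribute or value is missing from the table.
def recode(day)
  CODING.map { |attribute, codes| codes.fetch(day.fetch(attribute)) }
end

day1 = { outlook: "Sunny", temperature: "Hot", humidity: "High", wind: "Weak" }
p recode(day1) # prints [0.0, 1.0, 1.0, 0.0]
```

Keeping every feature in the same range stops attributes with larger numeric
values from dominating the kernel computation.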

== Acknowledgements

The svm_toolkit is based on LibSVM, which is available from:
http://www.csie.ntu.edu.tw/~cjlin/libsvm/

The contour plot uses the PlotPackage library, available from:
http://thehuwaldtfamily.org/java/Packages/Plot/PlotPackage.html