= SVM Toolkit

source:: https://notabug.org/peterlane/svm_toolkit/
== Description

Support-vector machines are a popular tool in data mining. This package
includes an amended version of the Java implementation of the libsvm library
(version 3.11). Additional methods and examples are provided to support
standard training techniques, such as cross-validation, and simple
visualisations. Training/testing of models can use a variety of built-in or
user-defined evaluation methods, including overall accuracy, geometric mean,
precision and recall.
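
As a minimal sketch of how an evaluation method might be selected when testing
a model: the :evaluation option and the Evaluator::GeometricMean constant below
are assumptions based on the evaluator classes listed under Features, not
confirmed API.

  require "svm_toolkit"
  include SvmToolkit

  # Tiny two-class problem, built in code as in the example further below.
  labels    = [0, 1, 0, 1]
  instances = [[0.0, 0.1], [0.9, 1.0], [0.1, 0.2], [1.0, 0.8]]
  problem   = Problem.from_array(instances, labels)

  # Train a C-SVC model using the first of the built-in kernel types.
  model = Svm.svm_train(problem, Parameter.new(
    :svm_type    => Parameter::C_SVC,
    :kernel_type => Parameter.kernels.first,
    :cost        => 1,
    :gamma       => 1))

  # Evaluate with a chosen evaluation method; the :evaluation option and
  # Evaluator::GeometricMean are assumptions, not confirmed API.
  score = model.evaluate_dataset(problem, :evaluation => Evaluator::GeometricMean)
  puts "geometric mean on training data: #{score}"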

== Features

- All features of LibSVM 3.11 are supported, and many are augmented with Ruby wrappers.
- Loading Problem definitions from file in svmlight, CSV or ARFF (simple subset) format.
- Creating Problem definitions from values supplied programmatically in arrays.
- Rescaling of feature values.
- Integrated cost/gamma search for models with an RBF kernel, taking advantage
  of multiple cores (see the sketch after this list).
- Contour plot visualisation of cost/gamma search results.
- Model provides the value of w-squared for the separating hyperplane.
- svm-demo application, a version of the svm_toy applet which comes with libsvm.
- Model stores the indices of the training instances used as support vectors.
- User-selected evaluation techniques supported in Model#evaluate_dataset and
  Svm.cross_validation_search.
- Library provides evaluation classes for Cohen's kappa statistic, F-measure,
  geometric mean, Matthews correlation coefficient, overall accuracy,
  precision, and recall.
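
The following sketch, referenced from the feature list above, strings several
of these facilities together. The file reader, the rescaling call, and the
argument layout of Svm.cross_validation_search are assumptions to be checked
against the API documentation; only Svm.cross_validation_search and
Model#evaluate_dataset are named in this README.

  require "svm_toolkit"
  include SvmToolkit

  # Load training and test problems from svmlight-format files
  # (Problem.from_file is an assumed reader name).
  training_set = Problem.from_file("train.svmlight")
  test_set     = Problem.from_file("test.svmlight")

  # Rescale feature values into [0, 1] (method name and range assumed).
  training_set.rescale(0.0, 1.0)
  test_set.rescale(0.0, 1.0)

  # Cross-validation grid search over cost/gamma values for an RBF-kernel
  # model; the argument layout here is assumed.
  best_model = Svm.cross_validation_search(training_set, test_set,
    [2.0**-5, 2.0**0, 2.0**5],   # candidate cost values
    [2.0**-5, 2.0**0, 2.0**5])   # candidate gamma values

  # Report performance of the selected model on the held-out test set.
  errors = best_model.evaluate_dataset(test_set, :print_results => true)
  puts "errors on test set: #{errors}"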

== Example

The following example illustrates how a dataset can be constructed in code, and
how an SVM model can be created and tested with each of the available kernels.

  require "svm_toolkit"
  include SvmToolkit

  puts "Classification with LIBSVM"
  puts "--------------------------"

  # Sample dataset: the 'Play Tennis' dataset
  # from T. Mitchell, Machine Learning (1997)
  # --------------------------------------------
  # Labels for each instance in the training set
  # 1 = Play, 0 = Do not play
  Labels = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]

  # Recode the attribute values into the range [0, 1]
  Instances = [
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 1.0],
    [0.5, 1.0, 1.0, 0.0],
    [1.0, 0.5, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0, 1.0],
    [0.0, 0.5, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 1.0],
    [0.5, 0.5, 1.0, 1.0],
    [0.5, 1.0, 0.0, 0.0],
    [1.0, 0.5, 1.0, 1.0]
  ]

  # Create an arbitrary train/test split
  TrainingSet = Problem.from_array(Instances.slice(0, 10), Labels.slice(0, 10))
  TestSet     = Problem.from_array(Instances.slice(10, 4), Labels.slice(10, 4))

  # Iterate over each kernel type
  Parameter.kernels.each do |kernel|

    # -- train a model for this kernel type
    params = Parameter.new(
      :svm_type    => Parameter::C_SVC,
      :kernel_type => kernel,
      :cost        => 10,
      :degree      => 1,
      :gamma       => 100
    )
    model = Svm.svm_train(TrainingSet, params)

    # -- test kernel performance on the training set
    errors = model.evaluate_dataset(TrainingSet, :print_results => true)
    puts "Kernel #{Parameter.kernel_name(kernel)} has #{errors} errors on the training set"

    # -- test kernel performance on the test set
    errors = model.evaluate_dataset(TestSet, :print_results => true)
    puts "Kernel #{Parameter.kernel_name(kernel)} has #{errors} errors on the test set"
  end

More examples can be found in the source code, linked above.

== Acknowledgements

The svm_toolkit is based on LibSVM, which is available from:
http://www.csie.ntu.edu.tw/~cjlin/libsvm/

The contour plot uses the PlotPackage library, available from:
http://thehuwaldtfamily.org/java/Packages/Plot/PlotPackage.html