% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ferns.R
\name{rFerns}
\alias{rFerns}
\alias{rFerns.formula}
\alias{rFerns.matrix}
\alias{rFerns.default}
\title{Classification with random ferns}
\usage{
rFerns(x, ...)
\method{rFerns}{formula}(formula, data = .GlobalEnv, ...)
\method{rFerns}{matrix}(x, y, ...)
\method{rFerns}{default}(x, y, depth = 5, ferns = 1000,
  importance = "none", saveForest = TRUE, consistentSeed = NULL,
  threads = 0, ...)
}
\arguments{
\item{x}{Data frame containing attributes; must have unique names and contain only numeric, integer or (ordered) factor columns.
Factors must have fewer than 31 levels. No \code{NA} values are permitted.}
\item{...}{For the formula and matrix methods, parameters to be passed to the default method.
For the print method, arguments to be passed to \code{print}.}
\item{formula}{Alternatively, a formula describing the model to be analysed.}
\item{data}{The data in which to interpret the formula.}
\item{y}{A decision vector. Must be a factor of the same length as \code{nrow(x)} for ordinary multi-class classification, or a logical matrix with each column corresponding to a class for multi-label classification.}
\item{depth}{The depth of the ferns; must be in the 1--16 range. Note that time and memory requirements scale with \code{2^depth}.}
\item{ferns}{Number of ferns to be built.}
\item{importance}{Selects the attribute importance measure (VIM) to calculate:
\code{"simple"} calculates the default mean decrease of true class score (MDTS, similar to Random Forest's MDA/MeanDecreaseAccuracy);
\code{"shadow"} calculates MDTS and, additionally, the MDTS of each attribute's shadow, an implicit feature built by shuffling the attribute's values and thus stripping it of information (which is slightly slower).
Shadow importance is useful as a reference for judging the significance of the regular importance.
\code{"none"} turns importance calculation off, for slightly faster execution.
For compatibility with pre-1.2 rFerns, \code{TRUE} resolves to \code{"simple"} and \code{FALSE} to \code{"none"}.
An abbreviation can be used instead of a full value.}
\item{saveForest}{Should the model be saved? It must be \code{TRUE} if you want to use the model for prediction; however, if you are interested only in importance or OOB error, setting it to \code{FALSE} significantly reduces memory requirements, especially for large \code{depth} and \code{ferns}.}
\item{consistentSeed}{PRNG seed used for shadow importance \emph{only}.
Must be either a 2-element integer vector or \code{NULL}, which corresponds to seeding from the default PRNG.}
\item{threads}{Number of OpenMP threads to use. The default value of \code{0} means all threads available to OpenMP.
It should be set to the same value in two merged models to make shadow importance meaningful.}
}
\value{
An object of class \code{rFerns}, which is a list with the following components:
\item{model}{The built model; \code{NULL} if \code{saveForest} was \code{FALSE}.}
\item{oobErr}{OOB approximation of accuracy.
Ignores never-OOB-tested objects (see the \code{oobScores} element).}
\item{importance}{The importance scores, or \code{NULL} if \code{importance} was set to \code{"none"}.
In the former case, it is a \code{data.frame} with two or three columns:
\code{MeanScoreLoss}, the mean decrease of the score of the correct class when a certain attribute is permuted;
\code{Tries}, the number of ferns which utilised a certain attribute; and, only when \code{importance} was set to \code{"shadow"},
\code{Shadow}, the mean decrease of accuracy for the correct class for a permuted copy of an attribute (useful as a baseline for normal importance).
The row names are set to \code{names(x)}.}
\item{oobScores}{A matrix of OOB scores of each class for each object in the training set.
Rows correspond to classes in the same order as in \code{levels(y)}.
If \code{ferns} is too small, some columns may contain \code{NA}s, which means that certain objects were never in the test set.}
\item{oobPreds}{A vector of OOB class predictions for each object in the training set. Never-OOB-tested objects (see above) have predictions equal to \code{NA}.}
\item{oobConfusionMatrix}{Confusion matrix built from \code{oobPreds} and \code{y}.}
\item{timeTaken}{Time used to train the model (smaller than wall time because data preparation and final touches to the model are excluded; however, it includes the time needed to compute importance, if applicable).
An object of the \code{difftime} class.}
\item{parameters}{Numerical vector of three elements: \code{classes}, \code{depth} and \code{ferns}, containing, respectively, the number of classes in the decision and copies of the \code{depth} and \code{ferns} parameters.}
\item{classLabels}{Copy of \code{levels(y)} after purging unused levels.}
\item{consistentSeed}{Consistent seed used; only present for \code{importance="shadow"}.
Can be used to seed a new model via the \code{consistentSeed} argument.}
\item{isStruct}{Copy of the training set structure, required internally by the predict method.}
}
\description{
This function builds a random ferns model on the given training data.
}
\note{
Unused levels of the decision will be removed; unused levels of categorical attributes, on the other hand, will be preserved, so that they may be present in the data later predicted with the model.
The levels of ordered factors in the training and predicted data must be identical.
Do not use the formula interface for data with a large number of attributes; the overhead from handling the formula may be significant.
}
\examples{
set.seed(77)
# Fetch the iris data
data(iris)
# Build a model
rFerns(Species ~ ., data = iris)
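# A hedged sketch, not part of the original examples: the same kind of
# model built through the default x/y interface, which avoids the formula
# overhead mentioned in the Note section. The 100/50 train/test split and
# the names 'train', 'fit' and 'pred' are illustrative assumptions.
train <- sample(nrow(iris), 100)
fit <- rFerns(iris[train, -5], iris$Species[train], depth = 5, ferns = 500)
# saveForest defaults to TRUE, so the stored model can be used to predict
# the held-out rows
pred <- predict(fit, iris[-train, -5])
table(pred, iris$Species[-train])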
# Importance
model <- rFerns(Species ~ ., data = iris, importance = "shadow")
print(model$importance)
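# A hedged sketch, not part of the original examples: when only the OOB
# error or importance is of interest, the forest need not be stored.
# The names 'light', 'm1' and 'm2' are illustrative assumptions.
light <- rFerns(iris[, -5], iris$Species, importance = "simple",
  saveForest = FALSE)
is.null(light$model)
light$oobErr
light$oobConfusionMatrix
# The consistent seed returned with shadow importance can seed a new model
m1 <- rFerns(Species ~ ., data = iris, importance = "shadow")
m2 <- rFerns(Species ~ ., data = iris, importance = "shadow",
  consistentSeed = m1$consistentSeed)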
}
\references{
Ozuysal M, Calonder M, Lepetit V & Fua P. (2009). \emph{Fast Keypoint Recognition using Random Ferns}, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3), 448--461.

Kursa MB (2014). \emph{rFerns: An Implementation of the Random Ferns Method for General-Purpose Machine Learning}, Journal of Statistical Software, 61(10), 1--13.
}