% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/importance.R
\name{getImpLegacyRf}
\alias{getImpLegacyRf}
\alias{getImpLegacyRfZ}
\alias{getImpLegacyRfGini}
\alias{getLegacyImpRfRaw}
\alias{getImpLegacyRfRaw}
\title{randomForest importance adapters}
\usage{
getImpLegacyRfZ(x, y, ...)
getImpLegacyRfRaw(x, y, ...)
getImpLegacyRfGini(x, y, ...)
}
\arguments{
\item{x}{data frame of predictors including shadows.}
\item{y}{response vector.}
\item{...}{parameters passed to the underlying \code{\link[randomForest]{randomForest}} call; they are relayed from \code{...} of \code{\link{Boruta}}.}
}
\description{
These functions are intended to be passed as the \code{getImp} argument of the \code{\link{Boruta}} function, so that the Boruta algorithm calls them as an importance source.
\code{getImpLegacyRfZ} generates the default, normalized permutation importance, \code{getImpLegacyRfRaw} generates raw permutation importance, and \code{getImpLegacyRfGini} generates Gini index importance, all using \code{\link[randomForest]{randomForest}} as the Random Forest implementation.
}
\note{
The \code{getImpLegacyRfZ} function was the default importance source in Boruta versions prior to 5.0; since then, the \code{\link{ranger}} Random Forest implementation has been used instead of \code{\link[randomForest]{randomForest}}, for speed, memory conservation and the ability to utilise multithreading.
Both importance sources should generally lead to the same results, yet there are differences.
Most notably, ranger by default treats factor attributes as ordered (and works very slowly if instructed otherwise with \code{respect.unordered.factors=TRUE}); on the other hand, it lifts the 32-level limit specific to \code{\link[randomForest]{randomForest}}.
Consequently, Boruta decisions for factor attributes may differ.
Random Forest methods have two main parameters: the number of attributes tried at each split and the number of trees in the forest. The first is called \code{mtry} in both implementations, but the second is called \code{ntree} in \code{\link[randomForest]{randomForest}} and \code{num.trees} in \code{\link{ranger}}.
To maintain compatibility, the \code{getImpRf*} functions still accept the \code{ntree} parameter and relay it to \code{num.trees}.
Both parameters take the same defaults in both implementations (the square root of the number of attributes and 500, respectively).
Moreover, \code{\link{ranger}} brings some additional capabilities to Boruta, such as the analysis of survival problems or sticky variables, which are always considered on splits.
Finally, the results for the same PRNG seed will be different.
}
\examples{
set.seed(777)
#Add some nonsense attributes to iris dataset by shuffling original attributes
iris.extended<-data.frame(iris,apply(iris[,-5],2,sample))
names(iris.extended)[6:9]<-paste("Nonsense",1:4,sep="")
#Run Boruta on this data
Boruta(Species~.,getImp=getImpLegacyRfZ,
 data=iris.extended,doTrace=2)->Boruta.iris.extended
#Nonsense attributes should be rejected
print(Boruta.iris.extended)
}
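#A hedged sketch, not part of the original example: the other adapters are
#passed to getImp in the same way, and randomForest parameters such as ntree
#travel through the ... of Boruta. The value ntree=1000 and the result names
#Boruta.iris.gini and Boruta.iris.ranger are arbitrary, illustrative choices
Boruta(Species~.,getImp=getImpLegacyRfGini,
 data=iris.extended,ntree=1000)->Boruta.iris.gini
print(Boruta.iris.gini)
#For comparison, the ranger-based getImpRfZ also accepts ntree and relays it
#to num.trees; its decisions may differ somewhat from the legacy run above
Boruta(Species~.,getImp=getImpRfZ,
 data=iris.extended,ntree=1000)->Boruta.iris.ranger
print(Boruta.iris.ranger)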