- % Generated by roxygen2: do not edit by hand
- % Please edit documentation in R/importance.R
- \name{getImpLegacyRf}
- \alias{getImpLegacyRf}
- \alias{getImpLegacyRfZ}
- \alias{getImpLegacyRfGini}
- \alias{getLegacyImpRfRaw}
- \alias{getImpLegacyRfRaw}
- \title{randomForest importance adapters}
- \usage{
- getImpLegacyRfZ(x, y, ...)
- getImpLegacyRfRaw(x, y, ...)
- getImpLegacyRfGini(x, y, ...)
- }
- \arguments{
- \item{x}{data frame of predictors including shadows.}
- \item{y}{response vector.}
- \item{...}{parameters passed to the underlying \code{\link[randomForest]{randomForest}} call; they are relayed from \code{...} of \code{\link{Boruta}}.}
- }
- \description{
- These functions are intended to be passed as the \code{getImp} argument of the \code{\link{Boruta}} function, so that the Boruta algorithm calls them as its importance source.
- \code{getImpLegacyRfZ} generates the default, normalized permutation importance, \code{getImpLegacyRfRaw} generates raw permutation importance, and \code{getImpLegacyRfGini} generates Gini index importance; all three use \code{\link[randomForest]{randomForest}} as the Random Forest implementation.
- }
- \note{
- The \code{getImpLegacyRfZ} function was the default importance source in Boruta versions prior to 5.0; since then, the \code{\link{ranger}} Random Forest implementation has been used instead of \code{\link[randomForest]{randomForest}}, for its speed, lower memory use and ability to use multithreading.
- Both importance sources should generally lead to the same results, yet there are differences.
- Most notably, ranger by default treats factor attributes as ordered (and works very slowly if instructed otherwise with \code{respect.unordered.factors=TRUE}); on the other hand, it lifts the 32-level limit specific to \code{\link[randomForest]{randomForest}}.
- As a result, Boruta decisions for factor attributes may differ.
- Random Forest methods have two main parameters: the number of attributes tried at each split and the number of trees in the forest. The first is called \code{mtry} in both implementations, but the second is called \code{ntree} in \code{\link[randomForest]{randomForest}} and \code{num.trees} in \code{\link{ranger}}.
- To maintain compatibility, the \code{getImpRf*} functions still accept the \code{ntree} parameter and relay it to \code{num.trees}.
- Still, both parameters take the same defaults in both implementations (the square root of the number of all attributes and 500, respectively).
- Moreover, \code{\link{ranger}} brings some additional capabilities to Boruta, such as analysis of survival problems or sticky variables, which are always considered on splits.
- Finally, the results for the same PRNG seed will be different.
- }
- \examples{
- set.seed(777)
- #Add some nonsense attributes to iris dataset by shuffling original attributes
- iris.extended<-data.frame(iris,apply(iris[,-5],2,sample))
- names(iris.extended)[6:9]<-paste("Nonsense",1:4,sep="")
- #Run Boruta on this data
- Boruta(Species~.,getImp=getImpLegacyRfZ,
- data=iris.extended,doTrace=2)->Boruta.iris.extended
- #Nonsense attributes should be rejected
- print(Boruta.iris.extended)
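- #A hedged sketch, not part of the original example: the Gini adapter is
- #passed to getImp the same way as the Z-score one above; ntree=200 is an
- #arbitrary illustrative value, assumed to reach randomForest through ...
- Boruta(Species~.,getImp=getImpLegacyRfGini,
- data=iris.extended,ntree=200)->Boruta.iris.gini
- #Per-attribute statistics and decisions, to compare with the run above
- print(attStats(Boruta.iris.gini))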
- }