Using Random Forest for predictive modeling and defining importance in customer satisfaction surveys
Commonly used methods to calculate derived importance include correlation analysis, analysis of variance (ANOVA), and regression analysis. The main disadvantage of these methods is how they deal with missing values, categorical variables, and multicollinearity.
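As a point of reference, here is a minimal sketch of two of these classical approaches: correlation-based and regression-based derived importance. The driver names and data are invented for illustration; a real survey would supply its own attribute ratings.

```python
# Hypothetical illustration of classical "derived importance": Pearson
# correlations and standardized regression coefficients. All data here is
# simulated; the three drivers and their true weights are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Simulated 1-10 ratings for three service attributes (the drivers).
drivers = rng.integers(1, 11, size=(n, 3)).astype(float)
# Overall satisfaction depends mostly on the first driver, plus noise.
overall = 0.6 * drivers[:, 0] + 0.3 * drivers[:, 1] + 0.1 * drivers[:, 2]
overall += rng.normal(0, 1, size=n)

# Method 1: Pearson correlation of each driver with overall satisfaction.
corr_importance = [np.corrcoef(drivers[:, j], overall)[0, 1] for j in range(3)]

# Method 2: standardized regression coefficients (betas) from least squares.
X = (drivers - drivers.mean(0)) / drivers.std(0)
y = (overall - overall.mean()) / overall.std()
betas, *_ = np.linalg.lstsq(X, y, rcond=None)

print("correlation-based importance:", np.round(corr_importance, 2))
print("regression-based importance: ", np.round(betas, 2))
```

Both methods recover the intended ranking here, but only because the simulated drivers are independent, fully observed, and linearly related to the outcome; once missing values, categorical drivers, or multicollinearity enter, the two methods can disagree sharply.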
Compared with these classical methods, Random Forest has several advantages:

- Models can easily handle both continuous and categorical variables.
- Models can effectively capture interactions between predictors.
- Models allow for nonlinear dependencies between predictors and the outcome.
- Models can accommodate a large number of predictors, whereas classical methods often suffer from the curse of dimensionality, resulting in poor accuracy on validation samples.
- Predictions remain possible even when an important predictor has a missing value: the algorithm produces better estimates of missing values internally, whereas for classical methods imputation is usually done beforehand, independently of the model.
- Models produced by Random Forest are not affected by extreme values or unusual distributions in continuous variables; in fact, they are invariant under monotone transformations of the predictors.
- Variable importance and the marginal effect of each predictor can be depicted directly from the model.
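The workflow sketched above can be illustrated with scikit-learn. The survey drivers and their relationship to overall satisfaction are invented for the example; note the interaction (satisfaction limited by the worse of two drivers) that a linear method would struggle to capture.

```python
# A minimal Random Forest sketch with simulated survey data. The driver
# names (wait_time, staff, price) and the data-generating process are
# assumptions made for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
# Simulated 1-10 driver ratings.
wait_time = rng.integers(1, 11, n).astype(float)
staff = rng.integers(1, 11, n).astype(float)
price = rng.integers(1, 11, n).astype(float)
# Nonlinear, interacting outcome: satisfaction is capped by the worse of
# wait_time and staff, with price playing only a minor role.
overall = np.minimum(wait_time, staff) + 0.1 * price + rng.normal(0, 0.5, n)

X = np.column_stack([wait_time, staff, price])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, overall)

# Impurity-based variable importance, normalized to sum to 1.
for name, imp in zip(["wait_time", "staff", "price"], model.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

The fitted model correctly assigns most of the importance to the two interacting drivers. The marginal effect of each driver can then be visualized with `sklearn.inspection.PartialDependenceDisplay`.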
Random Forest has proven to be one of the most successful machine learning methods in real-world applications.