Kaggle Titanic Competition Part VII – Random Forests and Feature Importance
In the last post we took a look at how reduce noisy variables from our data set using PCA, and today we'll actually start modelling! Random Forests are one of the easiest models to run, and highly effective as well. A great combination for sure. If you're just starting out with a new problem, this is a great model to quickly build a reference model. There aren't a whole lot of parameters to tune, which makes it very user friendly. The primary parameters include how many decision trees to include in the forest, how much data to include in [...]