dimensionality reduction – Ultraviolet Analytics

1 Dec, 2014

Kaggle Titanic Competition Part VII – Random Forests and Feature Importance

Dave2017-01-30T13:53:17-08:00December 1st, 2014|0 Comments

In the last post we took a look at how reduce noisy variables from our data set using PCA, and today we'll actually start modelling! Random Forests are one of the easiest models to run, and highly effective as well. A great combination for sure. If you're just starting out with a new problem, this is a great model to quickly build a reference model. There aren't a whole lot of parameters to tune, which makes it very user friendly. The primary parameters include how many decision trees to include in the forest, how much data to include in [...]

26 Nov, 2014

Kaggle Titanic Competition Part VI – Dimensionality Reduction

Dave2017-01-30T13:53:44-08:00November 26th, 2014|0 Comments

In the last post, we looked at how to use an automated process to generate a large number of non-correlated variables. Now we're going to look at a very common way to reduce the number of features that we use in modelling. You may be wondering why we'd remove variables we just took the time to create. The answer is pretty simple - sometimes it helps. If you think about a predictive model in terms of finding a "signal" or "pattern" in the data, it makes sense that you want to remove noise in the data that hides the [...]