Kaggle Titanic Competition Part VI – Dimensionality Reduction
In the last post, we looked at how to use an automated process to generate a large number of non-correlated variables. Now we're going to look at a very common way to reduce the number of features that we use in modelling. You may be wondering why we'd remove variables we just took the time to create. The answer is pretty simple - sometimes it helps. If you think about a predictive model in terms of finding a "signal" or "pattern" in the data, it makes sense that you want to remove noise in the data that hides the [...]