Kaggle Titanic Competition Part IV – Derived Variables
In the previous post, we began looking at how to convert the raw data into features that the Random Forest model can use. Any variable generated from one or more existing variables is called a "derived" variable. We've already covered basic transformations that produce useful derived variables; in this post we'll look at some more interesting derived variables that aren't simple transformations. An important aspect of feature engineering is using insight and creativity to find new features to feed the model. You'll read this over and over again, and it really can't [...]
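
To make the definition of a derived variable concrete, here is a minimal sketch in pandas. It assumes the standard Kaggle Titanic column names `SibSp` and `Parch`; the `FamilySize` name and the toy rows are illustrative assumptions, not values from the competition data or necessarily a feature used later in this series.

```python
import pandas as pd

# Toy rows using the Kaggle Titanic column names SibSp (siblings/spouses aboard)
# and Parch (parents/children aboard). The values are made up for illustration.
df = pd.DataFrame({
    "SibSp": [1, 0, 3],
    "Parch": [0, 0, 2],
})

# Hypothetical derived variable: family members aboard, counting the passenger.
# It is "derived" because it is computed entirely from existing columns.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

print(df)
```

The new column carries no information that wasn't already in the data, but it presents that information in a form the model may find easier to split on.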