2 Custom Tuning of Hyperparameters
Instead of the default method provided by the randomForest package, we can create our own tuning mechanism. In this part of the tutorial, we are going to learn how to tune a random forest using different values of mtry and different values of ntree.
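For reference, the default mechanism in the randomForest package is tuneRF(), which searches over mtry only while the number of trees stays fixed. A minimal sketch of that call, assuming the Boston data from MASS and the train index created earlier in this tutorial:
library(randomForest)
library(MASS)

## built-in tuner: starts from a default mtry, multiplies/divides it by
## stepFactor, and continues while the OOB error improves by `improve`
set.seed(123)
tuneRF(x = Boston[train, -14],  # predictors (medv is column 14)
       y = Boston[train, "medv"],
       ntreeTry = 500, stepFactor = 2, improve = 0.05)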
We will deal with a simple problem: given a grid of values to try, how do we find the combination of parameters that produces the smallest error?
We will work with the same Boston dataset, so that we can better understand the difference between the two approaches.
To start, we will run the model for every combination in our grids and store the values of mtry, ntree, and the test MSE in a data frame.
## packages and data (Boston, the train index, and boston.test were
## created in the earlier sections of this tutorial)
library(randomForest)
library(MASS)

## parameter grids
ntrees <- seq(50, 500, by = 50)
npred <- 1:13
nitr <- length(ntrees) * length(npred)
outmat <- data.frame("ntrees" = numeric(nitr),
                     "npred" = numeric(nitr),
                     "MSE" = numeric(nitr))
## loop counter
count <- 1
## loop over the grid: fit, predict, and record the error
for (j in 1:length(ntrees)) {
  for (k in 1:length(npred)) {
    ## model for the (j, k)-th parameter combination
    set.seed(123)
    model <- randomForest(medv ~ ., data = Boston, subset = train,
                          mtry = npred[k], ntree = ntrees[j])
    ## prediction on the held-out observations
    yhat.rf <- predict(model, newdata = Boston[-train, ])
    ## test MSE: mean((yhat - y)^2) is the squared error, not its root
    mse <- mean((yhat.rf - boston.test)^2)
    ## store the parameters and the error for later comparison
    outmat[count, ] <- c(ntrees[j], npred[k], mse)
    count <- count + 1
    ## call the garbage collector to free memory between fits
    gc()
  }
}
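Before picking a winner, it can help to look at the whole error surface. A minimal base-R sketch; it relies on outmat being filled in the loop order above (inner loop over npred):
## reshape: rows follow npred, columns follow ntrees (the column-major
## fill matches the loop order above)
err <- matrix(outmat$MSE, nrow = length(npred), ncol = length(ntrees))
matplot(npred, err, type = "l", lty = 1, col = 1:length(ntrees),
        xlab = "mtry", ylab = "test MSE")
legend("topright", legend = ntrees, col = 1:length(ntrees), lty = 1,
       title = "ntree", cex = 0.7)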
Now, let us look at the combination that gave us the best model:
min.mse <- min(outmat$MSE)
model.best <- outmat[outmat$MSE == min.mse, ]
model.best
##    ntrees npred      MSE
## 24    100    11 9.024417
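The same row can also be picked out in a single step with which.min():
## equivalent: row index of the first minimum
outmat[which.min(outmat$MSE), ]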
So, the best random forest model has ntree = 100 and mtry = 11, with a test MSE of 9.0244166.
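As a final step, one would typically refit the winning configuration on the training data; a short sketch (the importance = TRUE flag and varImpPlot() are optional and simply show which predictors drive the fit):
## refit with the best parameters found by the grid search
set.seed(123)
rf.best <- randomForest(medv ~ ., data = Boston, subset = train,
                        mtry = model.best$npred, ntree = model.best$ntrees,
                        importance = TRUE)
varImpPlot(rf.best)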