2 Custom Tuning of Hyperparameters

Instead of the default tuning method provided by the randomForest package, we can build our own tuning mechanism. In this part of the tutorial, we are going to learn how to tune a random forest over different values of mtry and different values of ntree.

We will deal with a simple problem: given a list of values to try, how do we find the combination of hyperparameters that yields the lowest error?

We will work with the same Boston dataset (from the MASS package) so that we can compare the two approaches directly.
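For the code below to run, the `train` index vector and the `boston.test` response vector from the earlier part of the tutorial must exist. A minimal setup sketch is shown here; the seed and the 50/50 split are illustrative assumptions, not necessarily the original ones.

```r
# Setup assumed from the earlier part of the tutorial. The seed and the
# half-and-half split are hypothetical; adjust to match your earlier code.
library(MASS)                                      # provides the Boston data
# library(randomForest)                            # also needed for the fits below

set.seed(1)                                        # hypothetical seed
train <- sample(1:nrow(Boston), nrow(Boston) / 2)  # half the rows for training
boston.test <- Boston[-train, "medv"]              # test-set response (medv)
```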

To start, we will fit a model for every combination of our chosen values and store mtry, ntree, and the resulting test MSE in a data frame.

## parameters

ntrees <- c(50, 100, 150, 200, 250, 300, 350, 400, 450, 500)
npred  <- 1:13

nitr <- length(ntrees) * length(npred)

## pre-allocate the results data frame

outmat <- data.frame("ntrees" = numeric(nitr),
                     "npred"  = numeric(nitr),
                     "MSE"    = numeric(nitr))

## loop counter

count <- 1

## loop with model fitting, prediction, etc.

for (j in 1:length(ntrees)) {
  for (k in 1:length(npred)) {

    ## fit the (j, k)-th model; resetting the seed gives every fit
    ## the same randomness, so only the hyperparameters differ
    set.seed(123)
    model <- randomForest(medv ~ ., data = Boston, subset = train,
                          mtry = npred[k], ntree = ntrees[j])

    ## predict on the test set and compute the test MSE
    yhat.rf <- predict(model, newdata = Boston[-train, ])
    mse <- mean((yhat.rf - boston.test)^2)

    ## store the model parameters and the error for later comparison
    outmat[count, ] <- c(ntrees[j], npred[k], mse)
    count <- count + 1

    ## call the garbage collector to free memory between fits
    gc()
  }
}
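As an aside (an alternative sketch, not part of the original code): the nested loops enumerate the same grid that `expand.grid()` builds in one call, which lets the search be written as a single loop over rows. Listing `npred` first makes it vary fastest, matching the inner loop, so row numbers line up with the loop's `count` variable.

```r
# Build the full ntree/mtry grid in one call. The first argument varies
# fastest, mirroring the inner loop over npred above.
grid <- expand.grid(npred = 1:13, ntrees = seq(50, 500, by = 50))

nrow(grid)    # 130 combinations, matching nitr
grid[24, ]    # npred = 11, ntrees = 100: the combination stored at count = 24

# The double loop then collapses to a single one, e.g.:
# for (i in 1:nrow(grid)) {
#   model <- randomForest(medv ~ ., data = Boston, subset = train,
#                         mtry = grid$npred[i], ntree = grid$ntrees[i])
#   ...
# }
```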

Now, let us look at the combination that produced the best model:

min.mse <- min(outmat$MSE)
model.best <- outmat[outmat$MSE == min.mse, ]
model.best
##    ntrees npred      MSE
## 24    100    11 9.024417

So, the best random forest model uses ntree = 100 and mtry = 11, with a test MSE of 9.0244166.
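Note that the error computed above, `mean((yhat.rf - boston.test)^2)`, is a mean squared error; taking its square root expresses the typical prediction error on the original scale of medv (median home value, in thousands of dollars):

```r
# square root of the best model's mean squared error: typical prediction
# error on the scale of medv (thousands of dollars)
best.mse <- 9.024417
sqrt(best.mse)    # roughly 3.0, i.e. predictions off by about $3,000
```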