Support Vector Regression with R
Anirban Ghatak
1 Support Vector Regression
Support vector machines (SVMs) are widely used for classification, but machine learning tutorials rarely discuss Support Vector Regression (SVR). Although the fitting procedures for SVM classification and SVR are almost identical, the interpretation of the separating hyperplane differs slightly between the two.
In classification tasks the separating hyperplane partitions the feature space into distinct regions, one per class, whereas in SVR the hyperplane can be thought of as the function that generates the predictions. The idea of a decision boundary or margin works the same way in both classification and regression: a large margin means underfitting, while a very narrow margin means the decision boundary has closely traced the training data, resulting in overfitting and low training error, albeit with a high probability of increased test error.
In this mini tutorial, we will take a toy example and show how SVR works and what it looks like.
1.1 Simple Linear Regression
Let’s load the data, display the data and fit a line:
# Load the data
data <- readxl::read_xlsx("data.xlsx")
# Plot the data
plot(data, pch=16)
# Create a linear regression model
modellm <- lm(Y ~ X, data)
# Add the fitted line
abline(modellm)
1.2 How good are the predictions?
To check how good the predictions of the fitted line are, we can plot them as follows:
# make a prediction for each X
predictedY <- predict(modellm, data)
# Plot the data
plot.new()
plot(data, pch=16)
# display the predictions
points(data$X, predictedY, col = "blue", pch=4)
In order to numerically understand the efficacy of the linear model, we need to find the RMSE:
rmse <- function(error) {
  sqrt(mean(error^2))
}
error <- modellm$residuals
predictionRMSE <- rmse(error)
predictionRMSE
## [1] 5.703778
We now know that the RMSE of our linear regression model is 5.703778. Let's try to improve on it with SVR!
1.3 Support Vector Regression
We will use the package e1071 and its svm function to do SVR:
library(e1071)
model <- svm(Y ~ X , data)
predictedY <- predict(model, data)
plot.new()
plot(data, pch=16)
points(data$X, predictedY, col = "red", pch=4)
This time the predictions seem to be closer to the real values! Let's compute the RMSE of our support vector regression model.
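The computation mirrors the linear-model case: we take the residuals by hand and reuse the rmse() helper defined earlier (the variable name svrPredictionRMSE below is ours):

```r
# residuals of the SVR predictions, then the same rmse() helper as before
error <- data$Y - predictedY
svrPredictionRMSE <- rmse(error)
svrPredictionRMSE
```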
As expected, the RMSE is better: it is now 3.1570607, compared to 5.703778 before.
But can we do better ?
1.4 Tuning the SVR model
In order to improve the performance of the support vector regression we will need to select the best parameters for the model.
In our previous example we performed an epsilon-regression; we did not set any value for epsilon (\(\epsilon\)), so it took its default value of 0.1. There is also a cost parameter, which we can change to avoid overfitting.
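For reference, this is the standard \(\epsilon\)-insensitive loss that epsilon-regression minimizes: residuals smaller than \(\epsilon\) in absolute value cost nothing, and larger ones are penalized linearly, with the cost parameter \(C\) scaling that penalty against the flatness of the fitted function:

\[
L_\epsilon(y, \hat{y}) = \max\bigl(0, \lvert y - \hat{y} \rvert - \epsilon\bigr),
\qquad
\min_{w} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i} L_\epsilon(y_i, \hat{y}_i)
\]

A larger \(\epsilon\) widens the tube around the function inside which errors are ignored, and a larger \(C\) makes the model track the training points more closely.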
The process of choosing these parameters is called hyperparameter optimization, or model selection.
The standard way of doing it is a grid search: we train many models for different pairs of \(\epsilon\) and cost, and choose the best one.
# perform a grid search
tuneResult <- tune(svm, Y ~ X, data = data,
ranges = list(epsilon = seq(0,1,0.1), cost = 2^(2:9))
)
print(tuneResult)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## epsilon cost
## 0 4
##
## - best performance: 9.900356
# Draw the tuning graph
plot(tuneResult)
There are two important points in the code above:
- we use the tune method to train models with \(\epsilon = 0, 0.1, 0.2, \ldots, 1\) and cost \(= 2^2, 2^3, 2^4, \ldots, 2^9\), which means it will train 88 models (it can take a long time).
- tuneResult reports the MSE; don't forget to convert it to RMSE before comparing the value to our previous model.
On the graph of tuneResult we can see that the darker the region, the better our model (because the RMSE is closer to zero in darker regions).
This means we can try another grid search in a narrower range: we will try \(\epsilon\) values between 0 and 0.2. The cost value does not seem to have an effect for the moment, so we will keep its range as it is and see whether that changes.
tuneResult <- tune(svm, Y ~ X, data = data,
ranges = list(epsilon = seq(0,0.2,0.01), cost = 2^(2:9))
)
print(tuneResult)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## epsilon cost
## 0.09 128
##
## - best performance: 8.18756
plot(tuneResult)
We trained 168 different models with this small piece of code.
As we zoom into the dark region, we can see that there are several darker patches. Thankfully, we don't have to select the best model by eye: R allows us to get it very easily and use it to make predictions.
tunedModel <- tuneResult$best.model
tunedModelY <- predict(tunedModel, data)
error <- data$Y - tunedModelY
# this value can be different on your computer
# because the tune method randomly shuffles the data
tunedModelRMSE <- rmse(error)
tunedModelRMSE
## [1] 2.072399
We improved the RMSE of our support vector regression model again!
We can visualize all our models on the same plot. The first SVR model is in red, and the tuned SVR model is in blue in the graph below:
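The plotting code for that final graph is not shown above; a sketch of it, assuming predictedY (the first SVR's predictions) and tunedModelY are still in the workspace, might look like this:

```r
# overlay the data, the first SVR fit (red), and the tuned SVR fit (blue)
plot(data, pch = 16)
points(data$X, predictedY, col = "red", pch = 4)
points(data$X, tunedModelY, col = "blue", pch = 4)
```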