Boosting Algorithms

Gradient Boosting Machines (GBM)

Image Source: brucecompany.com

Introduction

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in an iterative fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.

History

Boosting is one of the most powerful learning ideas introduced in the last twenty years. It was originally designed for classification problems, but it can be extended to regression as well. The motivation for boosting was a procedure that combines the outputs of many “weak” classifiers to produce a powerful “committee.” A weak classifier (e.g. decision tree) is one whose error rate is only slightly better than random guessing.

Adaboost

AdaBoost short for “Adaptive Boosting”, is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire in 1996, which is now considered to be a special case of Gradient Boosting.

Adaboost

There are some differences between the AdaBoost algorithm and modern Gradient Boosting. In the AdaBoost algorithm, the “shortcomings” of existing weak learners are identified by high-weight data points, however in Gradient Boosting, the shortcomings are identified by gradients.

Adaboost

The idea of gradient boosting originated in the observation by Leo Breiman that boosting can be interpreted as an optimization algorithm on a suitable cost function. Explicit regression gradient boosting algorithms were subsequently developed by Jerome H. Friedman, simultaneously with the more general functional gradient boosting perspective of Llew Mason, Jonathan Baxter, Peter Bartlett and Marcus Frean.

Adaboost

The latter two papers introduced the abstract view of boosting algorithms as iterative functional gradient descent algorithms. That is, algorithms that optimize a cost function over function space by iteratively choosing a function (weak hypothesis) that points in the negative gradient direction. This functional gradient view of boosting has led to the development of boosting algorithms in many areas of machine learning and statistics beyond regression and classification.

In general, in terms of model performance, we have the following heirarchy:

\[Boosting > Random \: Forest > Bagging > Single \: Tree\]

Gradient Boosting

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.

The purpose of boosting is to sequentially apply the weak classification algorithm to repeatedly modified versions of the data, thereby producing a sequence of weak classifiers $G_m(x)$, $m = 1, 2, ... , M$.

Stagewise Additive Modeling

Boosting builds an additive model:

\[F(x) = \sum_{m=1}^M \beta_m b(x; \gamma_m)\]

where $b(x; \gamma_m)$ is a tree and $\gamma_m$ parameterizes the splits. With boosting, the parameters, $(\beta_m, \gamma_m)$ are fit in a stagewise fashion. This slows the process down, and overfits less quickly.

AdaBoost

AdaBoost builds an additive logistic regression model by stagewise fitting.
AdaBoost uses an exponential loss function of the form, $L(y, F(x)) = exp(-yF(x))$, similar to the negative binomial log-likelihood loss.
The principal attraction of the exponential loss in the context of additive modeling is computational; it leads to the simple modular reweighting
Instead of fitting trees to residuals, the special form of the exponential loss function in AdaBoost leads to fitting trees to weighted versions of the original data.

Boosting

Source: Elements of Statistical Learning

Adaboost

Source: Elements of Statistical Learning

Gradient Boosting Algorithm

Friedman’s Gradient Boosting Algorithm for a generic loss function, $L(y_i, \gamma)$:

Source: Elements of Statistical Learning

Loss Functions and Gradients

Source: Elements of Statistical Learning

The optimal number of iterations, T, and the learning rate, λ, depend on each other.

Stochastic GBM

Stochastic Gradient Boosting (Friedman, 2002) proposed the stochastic gradient boosting algorithm that simply samples uniformly without replacement from the dataset before estimating the next gradient step. He found that this additional step greatly improved performance.

Practical Tips

It’s more common to grow shorter trees (“shrubs” or “stumps”) in GBM than you do in Random Forest.
It’s useful to try a variety of column sample (and column sample per tree) rates.
Don’t assume that the set of optimal tuning parameters for one implementation of GBM will carry over and also be optimal in a different GBM implementation.

Resources

GBM Software in R

This is not a comprehensive list of GBM software in R, however, we detail two of the most popular implementations here: gbm, and xgboost.

Some other projects

The CRAN Machine Learning Task View lists the following projects as well. The Hinge-loss is optimized by the boosting implementation in package bst. Package GAMBoost can be used to fit generalized additive models by a boosting algorithm. An extensible boosting framework for generalized linear, additive and nonparametric models is available in package mboost. Likelihood- based boosting for Cox models is implemented in CoxBoost and for mixed models in GMMBoost. GAMLSS models can be fitted using boosting by gamboostLSS.

gbm

Authors: Originally written by Greg Ridgeway, added to by various authors, currently maintained by Harry Southworth

Backend: C++

The gbm R package is an implementation of extensions to Freund and Schapire’s AdaBoost algorithm and Friedman’s gradient boosting machine. This is the original R implementation of GBM. A presentation is available here by Mark Landry.

Features

Features:

Stochastic GBM.
Supports up to 1024 factor levels.
Supports Classification and regression trees.
Includes regression methods for:
- least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, Learning to Rank measures (LambdaMart)
Out-of-bag estimator for the optimal number of iterations is provided.
Easy to overfit since early stopping functionality is not automated in this package.
If internal cross-validation is used, this can be parallelized to all cores on the machine.
Currently undergoing a major refactoring & rewrite (and has been for some time).
GPL-2/3 License.

Loading library

#install.packages("gbm")
#install.packages("cvAUC")
library(gbm)
library(cvAUC)

Loading data

# Load 2-class HIGGS dataset
train <- data.table::fread("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
test <- data.table::fread("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")

set.seed(1)
model <- gbm(
  formula = response ~ .,
  distribution = "bernoulli",
  data = train,
  n.trees = 70,
  interaction.depth = 5,
  shrinkage = 0.3,
  bag.fraction = 0.5,
  train.fraction = 1.0,
  n.cores = NULL # will use all cores by default
  )

Running & Evaluation

print(model)
## gbm(formula = response ~ ., distribution = "bernoulli", data = train, 
##     n.trees = 70, interaction.depth = 5, shrinkage = 0.3, bag.fraction = 0.5, 
##     train.fraction = 1, n.cores = NULL)
## A gradient boosted model with bernoulli loss function.
## 70 iterations were performed.
## There were 28 predictors of which 28 had non-zero influence.

# Generate predictions on test dataset
preds <- predict(model, newdata = test, n.trees = 70)
labels <- test[,"response"]

# Compute AUC on the test set
cvAUC::AUC(predictions = preds, labels = labels)
## [1] 0.7741609

xgboost

Authors: Tianqi Chen, Tong He, Michael Benesty

XGBoost is short for eXtreme Gradient Boosting package.

The purpose of this section is to show you how to use XGBoost to build a model and make predictions.

It is an efficient and scalable implementation of gradient boosting framework by Friedman et al. (2000) and Friedman (2001). Two solvers are included:

linear model ;
tree learning algorithm.

It supports various objective functions, including regression, classification and ranking. The package is made to be extendible, so that users are also allowed to define their own objective functions easily.

It has been used to win several Kaggle competitions.

Features

It has several features:

Speed: it can automatically do parallel computation on Windows and Linux, with OpenMP. It is generally over 10 times faster than the classical gbm.
Input Type: it takes several types of input data:
- Dense Matrix: R’s dense matrix, i.e. matrix ;
- Sparse Matrix: R’s sparse matrix, i.e. Matrix::dgCMatrix ;
- Data File: local data files ;
- xgb.DMatrix: its own class (recommended).
Sparsity: it accepts sparse input for both tree booster and linear booster, and is optimized for sparse input ;
Customization: it supports customized objective functions and evaluation functions.

Installation

GitHub version

For weekly updated version (highly recommended), install from GitHub:

install.packages("drat", repos="https://cran.rstudio.com")
drat:::addRepo("dmlc")
install.packages("xgboost", repos="http://dmlc.ml/drat/", type = "source")

Windows user will need to install Rtools first.

Installation

CRAN version

The version 0.4-2 is on CRAN, and you can install it by:

install.packages("xgboost")

Formerly available versions can be obtained from the CRAN archive

Loading

For the purpose of this tutorial we will load XGBoost package.

require(xgboost)

Dataset presentation

In this example, we are aiming to predict whether a mushroom can be eaten or not (like in many tutorials, example data are the same as you will use on in your every day life :-).

Mushroom data is cited from UCI Machine Learning Repository. Bache and Lichman (2013).

Dataset loading

We will load the agaricus datasets embedded with the package and will link them to variables.

The datasets are already split in:

train: will be used to build the model ;
test: will be used to assess the quality of our model.

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test

Looking at the data

Each variable is a list containing two things, label and data:

str(train)
## List of 2
##  $ data :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. ..@ i       : int [1:143286] 2 6 8 11 18 20 21 24 28 32 ...
##   .. ..@ p       : int [1:127] 0 369 372 3306 5845 6489 6513 8380 8384 10991 ...
##   .. ..@ Dim     : int [1:2] 6513 126
##   .. ..@ Dimnames:List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : chr [1:126] "cap-shape=bell" "cap-shape=conical" "cap-shape=convex" "cap-shape=flat" ...
##   .. ..@ x       : num [1:143286] 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..@ factors : list()
##  $ label: num [1:6513] 1 0 0 1 0 0 0 1 0 0 ...

label is the outcome of our dataset meaning it is the binary classification we will try to predict.

Looking at the data

Let’s discover the dimensionality of our datasets.

dim(train$data)
## [1] 6513  126
dim(test$data)
## [1] 1611  126

Looking at the data

This dataset is very small to not make the R package too heavy, however XGBoost is built to manage huge dataset very efficiently.

As seen below, the data are stored in a dgCMatrix which is a sparse matrix and label vector is a numeric vector ({0,1}):

class(train$data)[1]
## [1] "dgCMatrix"
class(train$label)
## [1] "numeric"

Basic Training using XGBoost

This step is the most critical part of the process for the quality of our model.

Basic training

We are using the train data. As explained above, both data and label are stored in a list.

In a sparse matrix, cells containing 0 are not stored in memory. Therefore, in a dataset mainly made of 0, memory size is reduced. It is very usual to have such dataset.

Training parameters

We will train decision tree model using the following parameters:

objective = "binary:logistic": we will train a binary classification model;
max_depth = 2: the trees won’t be deep, because our case is very simple;
nthread = 2: the number of CPU threads we are going to use;
nrounds = 2: there will be two passes on the data, the second one will enhance the model by further reducing the difference between ground truth and prediction.

Training code

bstSparse <- xgboost(data = train$data, label = train$label, max_depth = 2, eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
## [17:02:38] WARNING: amalgamation/../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
## [1]  train-logloss:0.233357 
## [2]  train-logloss:0.136649

More complex the relationship between your features and your label is, more passes you need.

Parameter variations

Dense matrix

Alternatively, you can put your dataset in a dense matrix, i.e. a basic R matrix.

bstDense <- xgboost(data = as.matrix(train$data), label = train$label, max_depth = 2, eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
## [17:02:38] WARNING: amalgamation/../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
## [1]  train-logloss:0.233357 
## [2]  train-logloss:0.136649

Parameter variations

xgb.DMatrix

XGBoost offers a way to group them in a xgb.DMatrix. You can even add other meta data in it. It will be useful for the most advanced features we will discover later.

dtrain <- xgb.DMatrix(data = train$data, label = train$label)
bstDMatrix <- xgboost(data = dtrain, max_depth = 2, eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
## [17:02:38] WARNING: amalgamation/../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
## [1]  train-logloss:0.233357 
## [2]  train-logloss:0.136649

Parameter variations

Verbose option

XGBoost has several features to help you to view how the learning progress internally. The purpose is to help you to set the best parameters, which is the key of your model quality.

One of the simplest way to see the training progress is to set the verbose option (see below for more advanced techniques).

Verbose

# verbose = 0, no message
bst <- xgboost(data = dtrain, max_depth = 2, eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic", verbose = 0)
## [17:02:38] WARNING: amalgamation/../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.

Verbose

# verbose = 1, print evaluation metric
bst <- xgboost(data = dtrain, max_depth = 2, eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic", verbose = 1)
## [17:02:38] WARNING: amalgamation/../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
## [1]  train-logloss:0.233357 
## [2]  train-logloss:0.136649

Verbose

# verbose = 2, also print information about tree
bst <- xgboost(data = dtrain, max_depth = 2, eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic", verbose = 2)
## [17:02:38] WARNING: amalgamation/../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
## [1]  train-logloss:0.233357 
## [2]  train-logloss:0.136649

Basic prediction using XGBoost

Perform the prediction

The purpose of the model we have built is to classify new data. As explained before, we will use the test dataset for this step.

pred <- predict(bst, test$data)

# size of the prediction vector
print(length(pred))
## [1] 1611

# limit display of predictions to the first 10
print(head(pred))
## [1] 0.28583017 0.92392391 0.28583017 0.28583017 0.05169873 0.92392391

These numbers doesn’t look like binary classification {0,1}. We need to perform a simple transformation before being able to use these results.

Transform the regression in a binary classification

The only thing that XGBoost does is a regression. XGBoost is using label vector to build its regression model.

How can we use a regression model to perform a binary classification?

If we think about the meaning of a regression applied to our data, the numbers we get are probabilities that a datum will be classified as 1. Therefore, we will set the rule that if this probability for a specific datum is > 0.5 then the observation is classified as 1 (or 0 otherwise).

prediction <- as.numeric(pred > 0.5)
print(head(prediction))
## [1] 0 1 0 0 0 1

Measuring model performance

To measure the model performance, we will compute a simple metric, the average error.

err <- mean(as.numeric(pred > 0.5) != test$label)
print(paste("test-error=", err))
## [1] "test-error= 0.0217256362507759"

Explanation

Steps explanation:

as.numeric(pred > 0.5) applies our rule that when the probability (<=> regression <=> prediction) is > 0.5 the observation is classified as 1 and 0 otherwise ;
probabilityVectorPreviouslyComputed != test$label computes the vector of error between true data and computed probabilities ;
mean(vectorOfErrors) computes the average error itself.

The most important thing to remember is that to do a classification, you just do a regression to the label and then apply a threshold.

Multiclass classification works in a similar way.

This metric is 0.02 and is pretty low: our yummy mushroom model works well!

Advanced features

Most of the features below have been implemented to help you to improve your model by offering a better understanding of its content.

Dataset preparation

For the following advanced features, we need to put data in xgb.DMatrix as explained above.

dtrain <- xgb.DMatrix(data = train$data, label=train$label)
dtest <- xgb.DMatrix(data = test$data, label=test$label)

Measure learning progress with xgb.train

Both xgboost (simple) and xgb.train (advanced) functions train models.

One of the special feature of xgb.train is the capacity to follow the progress of the learning after each round. Because of the way boosting works, there is a time when having too many rounds lead to an overfitting. You can see this feature as a cousin of cross-validation method. The following techniques will help you to avoid overfitting or optimizing the learning time in stopping it as soon as possible.

Measure learning progress with xgb.train

One way to measure progress in learning of a model is to provide to XGBoost a second dataset already classified. Therefore it can learn on the first dataset and test its model on the second one. Some metrics are measured after each round during the learning.

in some way it is similar to what we have done above with the average error. The main difference is that below it was after building the model, and now it is during the construction that we measure errors.

Measure learning progress with xgb.train

For the purpose of this example, we use watchlist parameter. It is a list of xgb.DMatrix, each of them tagged with a name.

watchlist <- list(train=dtrain, test=dtest)

bst <- xgb.train(data=dtrain, max_depth=2, eta=1, nthread = 2, nrounds=2, watchlist=watchlist, objective = "binary:logistic")
## [17:02:38] WARNING: amalgamation/../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
## [1]  train-logloss:0.233357  test-logloss:0.226688 
## [2]  train-logloss:0.136649  test-logloss:0.137877

Measure learning progress with xgb.train

XGBoost has computed at each round the same average error metric than seen above (we set nrounds to 2, that is why we have two lines). Obviously, the train-error number is related to the training dataset (the one the algorithm learns from) and the test-error number to the test dataset.

Both training and test error related metrics are very similar, and in some way, it makes sense: what we have learned from the training dataset matches the observations from the test dataset.

Measure learning progress with xgb.train

For a better understanding of the learning progression, you may want to have some specific metric or even use multiple evaluation metrics.

bst <- xgb.train(data=dtrain, max_depth=2, eta=1, nthread = 2, nrounds=2, watchlist=watchlist, eval_metric = "error", eval_metric = "logloss", objective = "binary:logistic")
## [1]  train-error:0.046522    train-logloss:0.233357  test-error:0.042831 test-logloss:0.226688 
## [2]  train-error:0.022263    train-logloss:0.136649  test-error:0.021726 test-logloss:0.137877

eval_metric allows us to monitor two new metrics for each round, logloss and error.

Linear boosting

Until now, all the learnings we have performed were based on boosting trees. XGBoost implements a second algorithm, based on linear boosting. The only difference with previous command is booster = "gblinear" parameter (and removing eta parameter).

bst <- xgb.train(data=dtrain, booster = "gblinear", max_depth=2, nthread = 2, nrounds=2, watchlist=watchlist, eval_metric = "error", eval_metric = "logloss", objective = "binary:logistic")
## [17:02:38] WARNING: amalgamation/../src/learner.cc:573: 
## Parameters: { "max_depth" } might not be used.
## 
##   This may not be accurate due to some parameters are only used in language bindings but
##   passed down to XGBoost core.  Or some parameters are not used but slip through this
##   verification. Please open an issue if you find above cases.
## 
## 
## [1]  train-error:0.005834    train-logloss:0.180540  test-error:0.003724 test-logloss:0.184085 
## [2]  train-error:0.002610    train-logloss:0.077862  test-error:0.001862 test-logloss:0.079509

Linear boosting

In this specific case, linear boosting gets slightly better performance metrics than decision trees based algorithm.

In simple cases, it will happen because there is nothing better than a linear algorithm to catch a linear link. However, decision trees are much better to catch a non linear link between predictors and outcome. Because there is no silver bullet, we advise you to check both algorithms with your own datasets to have an idea of what to use.

Manipulating xgb.DMatrix

Save / Load

Like saving models, xgb.DMatrix object (which groups both dataset and outcome) can also be saved using xgb.DMatrix.save function.

xgb.DMatrix.save(dtrain, "dtrain.buffer")
## [1] TRUE
# to load it in, simply call xgb.DMatrix
dtrain2 <- xgb.DMatrix("dtrain.buffer")
## [17:02:38] 6513x126 matrix with 143286 entries loaded from dtrain.buffer
bst <- xgb.train(data=dtrain2, max_depth=2, eta=1, nthread = 2, nrounds=2, watchlist=watchlist, objective = "binary:logistic")
## [17:02:38] WARNING: amalgamation/../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
## [1]  train-logloss:0.233357  test-logloss:0.226688 
## [2]  train-logloss:0.136649  test-logloss:0.137877

Information extraction

Information can be extracted from xgb.DMatrix using getinfo function. Hereafter we will extract label data.

label = getinfo(dtest, "label")
pred <- predict(bst, dtest)
err <- as.numeric(sum(as.integer(pred > 0.5) != label))/length(label)
print(paste("test-error=", err))
## [1] "test-error= 0.0217256362507759"

View feature importance/influence from the learnt model

Feature importance is similar to R gbm package’s relative influence (rel.inf).

importance_matrix <- xgb.importance(model = bst)
print(importance_matrix)
xgb.plot.importance(importance_matrix = importance_matrix)

View feature importance/influence from the learnt model

##    Feature       Gain     Cover Frequency
## 1:      28 0.67615471 0.4978746       0.4
## 2:      55 0.17135375 0.1920543       0.2
## 3:      59 0.12317236 0.1638750       0.2
## 4:     108 0.02931918 0.1461960       0.2

View the trees from a model

You can dump the tree you learned using xgb.dump into a text file.

xgb.dump(bst, with_stats = TRUE)
##  [1] "booster[0]"                                                                   
##  [2] "0:[f28<-9.53674316e-07] yes=1,no=2,missing=1,gain=4000.53101,cover=1628.25"   
##  [3] "1:[f55<-9.53674316e-07] yes=3,no=4,missing=3,gain=1158.21204,cover=924.5"     
##  [4] "3:leaf=1.71217716,cover=812"                                                  
##  [5] "4:leaf=-1.70044053,cover=112.5"                                               
##  [6] "2:[f108<-9.53674316e-07] yes=5,no=6,missing=5,gain=198.173828,cover=703.75"   
##  [7] "5:leaf=-1.94070864,cover=690.5"                                               
##  [8] "6:leaf=1.85964918,cover=13.25"                                                
##  [9] "booster[1]"                                                                   
## [10] "0:[f59<-9.53674316e-07] yes=1,no=2,missing=1,gain=832.545044,cover=788.852051"
## [11] "1:[f28<-9.53674316e-07] yes=3,no=4,missing=3,gain=569.725098,cover=768.389709"
## [12] "3:leaf=0.78471756,cover=458.936859"                                           
## [13] "4:leaf=-0.968530357,cover=309.45282"                                          
## [14] "2:leaf=-6.23624468,cover=20.462389"

You can plot the trees from your model using xgb.plot.tree

xgb.plot.tree(model = bst)

if you provide a path to fname parameter you can save the trees to your hard drive.

Save and load models

Maybe your dataset is big, and it takes time to train a model on it? May be you are not a big fan of losing time in redoing the same task again and again? In these very rare cases, you will want to save your model and load it when required.

Thankfully, for you, XGBoost implements such functions!

# save model to binary local file
xgb.save(bst, "xgboost.model")
## [1] TRUE

xgb.save function should return TRUE if everything goes well and crashes otherwise.

Save and load models

An interesting test to see how identical our saved model is to the original one would be to compare the two predictions.

# load binary model to R
bst2 <- xgb.load("xgboost.model")
pred2 <- predict(bst2, test$data)

# And now the test
print(paste("sum(abs(pred2-pred))=", sum(abs(pred2-pred))))
## [1] "sum(abs(pred2-pred))= 0"

result is 0? We are good!

References

Bache, K., and M. Lichman. 2013. “UCI Machine Learning Repository.” University of California, Irvine, School of Information; Computer Sciences. http://archive.ics.uci.edu/ml/.

Friedman, Jerome H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics, 1189–1232.

Friedman, Jerome, Trevor Hastie, Robert Tibshirani, and others. 2000. “Additive Logistic Regression: A Statistical View of Boosting (with Discussion and a Rejoinder by the Authors).” The Annals of Statistics 28 (2): 337–407.