Before starting
- Read the use-case site to learn how to define a Compboost object using the R6 interface.
What is Needed
compboost was designed to provide a component-wise boosting framework with maximal flexibility. This vignette gives an overview of how to define custom losses in R as well as in C++ without recompiling the whole package. These custom losses can be used for training the model and/or for logging mechanisms.
The loss function used to train a boosting model must be differentiable, so we need to define both the loss function and its gradient. In addition, boosting is initialized with the loss-optimal constant, so we also have to define that constant as a function of the response vector. With these three components, defining a custom loss is straightforward.
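As a worked illustration of the three components, consider the quadratic loss. This is a sketch in our own notation, where \(y\) is the true response and \(f\) the prediction; the scaling factor \(1/2\) is a convention assumed here and may differ from the one compboost uses internally:
\[L(y, f) = \frac{1}{2}\left(y - f\right)^2, \qquad \frac{\partial L(y, f)}{\partial f} = f - y, \qquad \mathsf{arg min}_{c\in\mathbb{R}} \sum_{i = 1}^n L\left(y^{(i)}, c\right) = \bar{y}\]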
As a showcase, we rebuild two different loss functions:
- The quadratic loss as an easy example for C++
- The Poisson loss for count data as a more sophisticated example in R
Define a new loss in R
For this example we are using the VonBort dataset provided by the package vcd:
“Data from von Bortkiewicz (1898), given by Andrews & Herzberg (1985), on number of deaths by horse or mule kicks in 14 corps of the Prussian army.”
data(VonBort, package = "vcd")
We would like to model the number of deaths with a boosted Poisson regression. This requires a suitable loss function, its gradient, and the constant initialization.
Each of the three components is specified as a function of the following form:
- loss: function (truth, response)
- gradient: function (truth, response)
- constant initializer: function (truth)
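A minimal sketch of the three functions for the Poisson case is given here. The names lossPoisson, gradPoisson, and constInitPoisson match those passed to LossCustom in this vignette, but the bodies are our own reconstruction under two assumptions: the response is the linear predictor on the log scale (so the predicted rate is exp(response)), and each function works element-wise on numeric vectors. The loss is the negative Poisson log-likelihood; the checks at the end use only base R.

```r
# Negative Poisson log-likelihood with log link (assumed):
# L(y, f) = exp(f) - y * f + log(y!)
lossPoisson = function (truth, response) {
  return(exp(response) - truth * response + lgamma(truth + 1))
}

# Derivative of the loss with respect to the response f
gradPoisson = function (truth, response) {
  return(exp(response) - truth)
}

# Loss-optimal constant: log of the mean response (requires mean(truth) > 0)
constInitPoisson = function (truth) {
  return(log(mean(truth)))
}

# Sanity checks on toy values (hypothetical data, not VonBort)
y = c(0, 1, 3, 2)
f = c(-0.5, 0.1, 0.8, 0.3)

# The loss agrees with the negative log-density from stats::dpois
stopifnot(isTRUE(all.equal(lossPoisson(y, f),
  -dpois(y, lambda = exp(f), log = TRUE))))

# The gradient agrees with a central finite difference of the loss
h = 1e-6
num_grad = (lossPoisson(y, f + h) - lossPoisson(y, f - h)) / (2 * h)
stopifnot(isTRUE(all.equal(gradPoisson(y, f), num_grad, tolerance = 1e-6)))
```

Using lgamma(truth + 1) instead of log(factorial(truth)) keeps the loss numerically stable for large counts.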
The constant initialization
\[\mathsf{arg min}_{c\in\mathbb{R}} \sum_{i = 1}^n L\left(y^{(i)}, c\right) = \log(\bar{y})\]
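The closed form can be verified by hand. The following short derivation assumes that the Poisson loss is the negative log-likelihood with a log link, \(L(y, f) = \exp(f) - y f + \log(y!)\), written in our own notation. Setting the derivative of the summed loss with respect to \(c\) to zero gives
\[\frac{\partial}{\partial c} \sum_{i = 1}^n L\left(y^{(i)}, c\right) = n \exp(c) - \sum_{i = 1}^n y^{(i)} = 0 \quad\Longleftrightarrow\quad c = \log\left(\frac{1}{n}\sum_{i = 1}^n y^{(i)}\right) = \log(\bar{y})\]
The second derivative \(n \exp(c)\) is strictly positive, so this stationary point is the minimum. The constant is only defined for \(\bar{y} > 0\).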
Define the loss
Finally, these three components allow us to define a LossCustom object:
# Define custom loss:
my_poisson_loss = LossCustom$new(lossPoisson, gradPoisson, constInitPoisson)
Train a model
This loss object can be used for any task that requires a loss object:
cboost = Compboost$new(VonBort, "deaths", loss = my_poisson_loss)
cboost$addBaselearner("year", "spline", BaselearnerPSpline)
cboost$train(500, trace = 0)