Extending compboost with losses

Before starting

Read the use-case site first to learn how to define a Compboost object using the R6 interface.

What is Needed

compboost was designed to provide a component-wise boosting framework with maximal flexibility. This vignette gives an overview of how to define custom losses in R as well as in C++ without recompiling the whole package. These custom losses can be used for training the model and/or for logging.

The loss function used to train a model with boosting has to be differentiable. Hence, we need to define both the loss function and its gradient. Further, boosting is initialized with the loss-optimal constant, so we also have to define this constant as a function of the response vector. With these three components, it is straightforward to define custom losses.
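To see where each component enters, here is a sketch in standard gradient-boosting notation (our own notation, not taken from the compboost internals). The constant initializer yields the starting model, the negative gradient evaluated at the current model yields the pseudo-residuals fitted in iteration m, and the selected base learner b is added with learning rate nu:

\[\hat{f}^{[0]} = \mathsf{arg min}_{c\in\mathbb{R}} \sum_{i = 1}^n L\left(y^{(i)}, c\right)\]

\[r^{[m](i)} = -\left.\frac{\partial}{\partial f} L\left(y^{(i)}, f\right)\right|_{f = \hat{f}^{[m-1]}\left(x^{(i)}\right)}, \quad i = 1, \dots, n\]

\[\hat{f}^{[m]} = \hat{f}^{[m-1]} + \nu \, \hat{b}^{[m]}\]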

As a showcase, we rebuild two different loss functions:

The quadratic loss as an easy example in C++
The Poisson loss for count data as a more sophisticated example in R

Define a new loss in `R`

For this example, we use the VonBort dataset provided by the package vcd:

“Data from von Bortkiewicz (1898), given by Andrews & Herzberg (1985), on number of deaths by horse or mule kicks in 14 corps of the Prussian army.”

data(VonBort, package = "vcd")

We would like to model the number of deaths with a Poisson regression fitted by boosting. Hence, we have to define a proper loss function, its gradient, and the constant initialization.

The loss, the gradient, and the constant initializer are each specified as an R function with the following signature:

loss: function (truth, response)
gradient: function (truth, response)
constant initializer: function (truth)
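As an illustration of this scheme, the quadratic loss could be written with the same three signatures in R. This is a sketch assuming the convention L(y, f) = 0.5 (y - f)^2; the function names lossQuadratic, gradQuadratic, and constInitQuadratic are hypothetical.

```r
# Quadratic loss following the (truth, response) scheme.
# Assumes the convention L(y, f) = 0.5 * (y - f)^2.
lossQuadratic = function (truth, response) {
  return(0.5 * (truth - response)^2)
}

# Derivative of the loss with respect to the response f.
gradQuadratic = function (truth, response) {
  return(response - truth)
}

# The mean minimizes the sum of squared errors.
constInitQuadratic = function (truth) {
  return(mean(truth))
}

y = c(1, 2, 6)
constInitQuadratic(y)          # 3
gradQuadratic(y, rep(3, 3))    # 2 1 -3
```

Note that the gradients evaluated at the loss-optimal constant sum to zero, which is exactly the first-order condition defining that constant.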

The loss function

\[L(y,f) = -\log\left( \exp(f)^y \exp(-\exp(f)) \right) + \log(y!)\]

lossPoisson = function (truth, response) {
  # Expanded form of -log(exp(f)^y * exp(-exp(f))) + log(y!);
  # lgamma(y + 1) computes log(y!) without overflow.
  return(exp(response) - truth * response + lgamma(truth + 1))
}
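A quick sanity check, not part of the original workflow: with the log link, the Poisson loss is the negative log-likelihood of a Poisson distribution with mean exp(f), so it should agree with base R's dpois() evaluated on the log scale. The helper name poissonNll below is hypothetical and uses the expanded form exp(f) - y f + log(y!).

```r
# Expanded Poisson negative log-likelihood with log link:
# L(y, f) = exp(f) - y * f + log(y!)
poissonNll = function (truth, response) {
  exp(response) - truth * response + lgamma(truth + 1)
}

y = c(0, 1, 2, 4)
f = c(-0.5, 0, 0.3, 1.2)

# dpois() returns the Poisson log density when log = TRUE.
reference = -dpois(y, lambda = exp(f), log = TRUE)
all.equal(poissonNll(y, f), reference)   # TRUE
```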

The gradient of the loss function

\[\frac{\partial}{\partial f} L(y,f) = \exp(f) - y\]

gradPoisson = function (truth, response) {
  return(exp(response) - truth)
}
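The analytic gradient can be checked numerically with a central finite difference of the loss. This is a generic sketch; the helper names poissonLoss, poissonGrad, and numericGrad and the step size h = 1e-6 are our own choices.

```r
# Poisson loss (expanded form) and its analytic gradient.
poissonLoss = function (truth, response) {
  exp(response) - truth * response + lgamma(truth + 1)
}
poissonGrad = function (truth, response) {
  exp(response) - truth
}

# Central finite difference of the loss with respect to the response.
numericGrad = function (loss, truth, response, h = 1e-6) {
  (loss(truth, response + h) - loss(truth, response - h)) / (2 * h)
}

y = c(0, 1, 3, 7)
f = c(-1, 0.2, 1, 2)
maxError = max(abs(numericGrad(poissonLoss, y, f) - poissonGrad(y, f)))
maxError < 1e-6   # TRUE
```

The log(y!) term cancels in the difference, which is why it does not appear in the gradient.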

The constant initialization

\[\mathsf{arg min}_{c\in\mathbb{R}} \sum_{i = 1}^n L\left(y^{(i)}, c\right) = \log(\bar{y})\]

constInitPoisson = function (truth) {
  return(log(mean(truth)))
}
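Setting the derivative of the empirical risk, n exp(c) minus the sum of the y values, to zero gives c = log(mean(y)). The sketch below confirms this numerically with base R's optimize() on a small made-up count vector; the helper name empiricalRisk is hypothetical, and the log(y!) term is dropped because it does not depend on the constant.

```r
# Empirical Poisson risk of a constant prediction (log(y!) omitted).
empiricalRisk = function (const, truth) {
  sum(exp(const) - truth * const)
}

y = c(0, 1, 0, 2, 3, 0, 1)   # mean(y) = 1, so log(mean(y)) = 0
opt = optimize(empiricalRisk, interval = c(-5, 5), truth = y)

opt$minimum     # close to 0
log(mean(y))    # 0
```

If every observed count is zero, log(mean(truth)) evaluates to -Inf, so this initializer needs at least one positive count.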

Define the loss

Finally, these three components allow us to define a LossCustom object:

# Load compboost and define the custom loss:
library(compboost)
my_poisson_loss = LossCustom$new(lossPoisson, gradPoisson, constInitPoisson)

Train a model

This loss object can be used for any task that requires a loss object, for instance as the training loss of a Compboost model:

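# Use the custom Poisson loss with deaths as target:
cboost = Compboost$new(VonBort, "deaths", loss = my_poisson_loss)
# Add a P-spline base learner on year:
cboost$addBaselearner("year", "spline", BaselearnerPSpline)
# Run 500 iterations; trace = 0 suppresses the progress output:
cboost$train(500, trace = 0)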
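To make the role of the three functions concrete, here is a toy component-wise gradient-boosting loop in plain R. It is not compboost's implementation: it assumes simple linear base learners (one slope per centered feature), a fixed learning rate of 0.1, and simulated Poisson data instead of VonBort. It only shows where the constant initializer, the gradient, and the loss are used.

```r
set.seed(1)
n = 200
X = cbind(x1 = rnorm(n), x2 = rnorm(n))
y = rpois(n, lambda = exp(0.5 + 0.8 * X[, "x1"]))   # only x1 matters

lossPois = function (truth, response) exp(response) - truth * response + lgamma(truth + 1)
gradPois = function (truth, response) exp(response) - truth
initPois = function (truth) log(mean(truth))

learningRate = 0.1
Xc = scale(X, center = TRUE, scale = FALSE)   # centered features
pred = rep(initPois(y), n)                     # constant initialization
coefs = c(x1 = 0, x2 = 0)
riskStart = mean(lossPois(y, pred))

for (m in seq_len(100)) {
  pseudoRes = -gradPois(y, pred)               # negative gradient
  # Least-squares slope of each base learner on the pseudo-residuals.
  slopes = colSums(Xc * pseudoRes) / colSums(Xc^2)
  sse = colSums((pseudoRes - sweep(Xc, 2, slopes, `*`))^2)
  best = which.min(sse)                        # component-wise selection
  coefs[best] = coefs[best] + learningRate * slopes[best]
  pred = pred + learningRate * slopes[best] * Xc[, best]
}

riskEnd = mean(lossPois(y, pred))
coefs
riskEnd < riskStart   # TRUE
```

Only one base learner is updated per iteration, which is what makes the procedure component-wise; the informative feature x1 should collect most of the updates.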