In order to provide high flexibility we decided to write the base implementation of compboost in C++ to make use of object oriented programming. These C++ classes can be used in R after exposing them as S4 classes. To take away another layer of abstraction we decided to break the S4 classes down to a few R6 classes wrapping the original ones.

All in all, the class system of compboost is a mixture of raw exposed S4 classes and convenience classes written in R6. As usual for object oriented programming, the classes works on a reference base. This introduction aims to show how these references work and can be accessed.

The two classes that are most affected from the reference system are the Response and Data classes since they hold and transform data.

## Response Class

The target variable is represented by an object that inherits from the Response class. Depending on the target type we like to have different transformations of the internally predicted scores. For instance, having a binary classification task the score $$\hat{f}(x) \in \mathbb{R}$$ is transformed to a $$[0,1]$$ scale by using the logistic function:

$\hat{\pi}(x) = \frac{1}{1 + \exp(-\hat{f}(x))}$

To show the how references work here we first define a ResponseBinaryClassif object. Therefore, we use the mtcars dataset and create a new binary target variable for fast $$\text{qsec} < 17$$ or slow $$\text{qsec} \geq 17$$ cars and create the response object:

df = mtcars[, c("mpg", "disp", "hp", "drat", "wt")]
df$qsec_cat = ifelse(mtcars$qsec < 17, "fast", "slow")

obj_response = ResponseBinaryClassif$new("qsec_cat", "fast", df$qsec_cat)
obj_response
#>
#> Binary classification response of target "qsec_cat" and threshold 0.5
#> ResponseBinaryClassifPrinter

To access the underlying representation of the response class (here a binary variable) one can use $getResponse(). In the initialization of a new response object the prediction $$\hat{f} \in \mathbb{R}$$ is initialized with zeros. We can also use the response object to calculate the transformed predictions $$\hat{\pi} \in [0,1]$$: knitr::kable(head(data.frame( target = df$qsec_cat,
target_representation = obj_response$getResponse(), prediction_initialization = obj_response$getPrediction(),
prediction_transformed = obj_response$getPredictionTransform() ))) target target_representation prediction_initialization prediction_transformed fast 1 0 0.5 slow -1 0 0.5 slow -1 0 0.5 slow -1 0 0.5 slow -1 0 0.5 slow -1 0 0.5 In the case of binary classification we can also use the response object to calculate the predictions on a label basis by using a specified threshold $$a$$: $\hat{y} = 1 \ \ \text{if} \ \ \hat{\pi}(x) \geq a$ The default threshold here is 0.5: obj_response$getThreshold()
#> [1] 0.5
head(obj_response$getPredictionResponse()) #> [,1] #> [1,] 1 #> [2,] 1 #> [3,] 1 #> [4,] 1 #> [5,] 1 #> [6,] 1 By setting the threshold to 0.6 we observe now that each class is predicted as negative: obj_response$setThreshold(0.6)
head(obj_response$getPredictionResponse()) #> [,1] #> [1,] -1 #> [2,] -1 #> [3,] -1 #> [4,] -1 #> [5,] -1 #> [6,] -1 This behavior has nothing to do with references at the moment. But, just prediction a score of 0 for all observations is no good predictor. During the fitting of a component-wise boosting model these predictions are adjusted over and over again by the Compboost object. This is where the reference comes in: cboost = boostSplines(data = df, target = obj_response, loss = LossBinomial$new(),
iterations = 2000L, trace = 500L)
#>    1/2000   risk = 0.93
#>  500/2000   risk = 0.63
#> 1000/2000   risk = 0.6
#> 1500/2000   risk = 0.58
#> 2000/2000   risk = 0.57
#>
#>
#> Train 2000 iterations in 0 Seconds.
#> Final risk based on the train set: 0.57

Having again a look at the predictions shows that they are different than before without touching the response class. This is because of the response object that is passed to the Compboost object. During the fitting process the predictions of the response object are set by the model:

knitr::kable(head(data.frame(
target = df$qsec_cat, prediction = obj_response$getPrediction(),
prediction_transformed = obj_response$getPredictionTransform(), prediction_response = obj_response$getPredictionResponse()
)))
target prediction prediction_transformed prediction_response
fast 0.6894909 0.6658537 1
slow -1.3258158 0.2098523 -1
slow -4.0962469 0.0163628 -1
slow -7.9047751 0.0003688 -1
slow -2.7548448 0.0598136 -1
slow -3.4097625 0.0319918 -1