Initializing Model

Because compboost uses the R6 API, a new model object is created by calling the $new() constructor, which takes the data, the name of the target as a character string, and the loss. Note that the loss must be passed as an initialized object, which makes it possible to use, for example, a custom offset.
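
To make the role of the loss and its offset more concrete, here is a minimal base-R sketch that does not use compboost itself. It evaluates the binomial loss L(y, f) = log(1 + exp(-2yf)) for labels in {-1, 1} and finds the constant score that minimizes the empirical risk, which is a natural candidate for an offset. The function names and the toy labels are hypothetical, and whether compboost computes its default offset in exactly this way is an assumption.

```r
# Binomial loss for labels y in {-1, 1} and score f.
binomial_loss = function(y, f) log(1 + exp(-2 * y * f))

# Empirical risk of a constant score for a label vector.
constant_risk = function(const, y) mean(binomial_loss(y, const))

# Toy labels (made up for illustration): three positives, two negatives.
y = c(1, 1, 1, -1, -1)

# Closed-form minimizer of the constant risk: half the log-odds
# of the positive class.
p = mean(y == 1)
offset_closed = 0.5 * log(p / (1 - p))

# Numerical check with a one-dimensional optimizer.
offset_numeric = optimize(constant_risk, interval = c(-5, 5), y = y)$minimum

c(closed = offset_closed, numeric = offset_numeric)
```

Both values agree up to the tolerance of the optimizer. A custom offset would replace this constant with a user-chosen starting score.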

Adding Base-Learner

Adding a new base-learner requires, as the first argument, a character string naming the feature the base-learner should use. The second argument is an identifier for the factory. The identifier is necessary because multiple base-learners can be defined on the same source.

Categorical Features

When adding a categorical feature, each group is added as a single base-learner. Note also that we do not want an intercept here.
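
One way to picture "one base-learner per group, no intercept" is the indicator encoding below. This is a base-R sketch of the encoding idea only; the toy factor is made up, and how compboost builds these base-learners internally is not shown here.

```r
# Toy categorical feature (hypothetical values).
sex = factor(c("male", "female", "female", "male"))

# "~ 0 + sex" drops the intercept, so every group gets its own
# indicator column instead of one group acting as the reference.
X = model.matrix(~ 0 + sex)

# Each column can then be treated as the design matrix of a
# separate single-group base-learner.
group_designs = lapply(seq_len(ncol(X)), function(j) X[, j, drop = FALSE])
names(group_designs) = colnames(X)

names(group_designs)
```

Because every observation belongs to exactly one group, each row of `X` contains a single 1, and no additional intercept column is needed.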

Finally, we can get all registered factories.

Define Logger

A logger is another class that is evaluated after each iteration to track the performance, the elapsed runtime, or the number of iterations. By default, each Compboost object has one iterations logger, defined with as many iterations as specified in the $train() function.

To control the fitting behavior, each logger can also be used as a stopper that ends the fitting process once a pre-defined stopping criterion is met.

Time logger

This logger tracks the elapsed time. The time unit can be one of microseconds, seconds, or minutes. The logger stops the fitting if max_time is reached, but we do not use it as a stopper here.
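
The difference between a logger that only tracks a value and one that also acts as a stopper can be sketched in plain R. Everything below is hypothetical illustration code, not the compboost implementation: `run_boosting`, the logger lists, and the simulated elapsed time (two time units per iteration) are assumptions chosen to keep the example deterministic.

```r
# Evaluate all loggers after each iteration; stop only if a logger
# that is flagged as stopper has reached its criterion.
run_boosting = function(max_iters, loggers) {
  history = list()
  for (iter in seq_len(max_iters)) {
    # (a real implementation would fit one base-learner here)
    values = lapply(loggers, function(lg) lg$value(iter))
    history[[iter]] = unlist(values)
    stop_now = any(mapply(
      function(lg, v) lg$is_stopper && lg$reached(v),
      loggers, values
    ))
    if (stop_now) break
  }
  do.call(rbind, history)
}

loggers = list(
  # Iterations logger, used as stopper after 50 iterations.
  iterations = list(
    value = function(iter) iter,
    reached = function(v) v >= 50,
    is_stopper = TRUE
  ),
  # Time logger with simulated elapsed time; it reaches its limit
  # of 10 early but is not used as stopper, so it only tracks.
  time = list(
    value = function(iter) iter * 2,
    reached = function(v) v >= 10,
    is_stopper = FALSE
  )
)

trace = run_boosting(max_iters = 100, loggers = loggers)
tail(trace, 1)
```

In this sketch the time criterion is already met in iteration 5, yet fitting continues until the iterations logger stops it, which mirrors using the time logger for tracking only.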

Train Model and Access Elements

cboost$train(2000, trace = 100)
#>    1/2000   risk = 0.73  oob_risk = 0.74   time = 0   
#>  100/2000   risk = 0.65  oob_risk = 0.67   time = 14372   
#>  200/2000   risk = 0.62  oob_risk = 0.65   time = 30179   
#>  300/2000   risk = 0.61  oob_risk = 0.64   time = 46497   
#>  400/2000   risk = 0.6  oob_risk = 0.63   time = 62688   
#>  500/2000   risk = 0.59  oob_risk = 0.63   time = 79918   
#>  600/2000   risk = 0.59  oob_risk = 0.63   time = 96813   
#>  700/2000   risk = 0.59  oob_risk = 0.62   time = 114323   
#>  800/2000   risk = 0.59  oob_risk = 0.62   time = 132308   
#>  900/2000   risk = 0.59  oob_risk = 0.62   time = 150694   
#> 1000/2000   risk = 0.59  oob_risk = 0.62   time = 169573   
#> 1100/2000   risk = 0.58  oob_risk = 0.62   time = 188328   
#> 1200/2000   risk = 0.58  oob_risk = 0.62   time = 207671   
#> 1300/2000   risk = 0.58  oob_risk = 0.62   time = 226255   
#> 1400/2000   risk = 0.58  oob_risk = 0.62   time = 245202   
#> 1500/2000   risk = 0.58  oob_risk = 0.62   time = 264858   
#> 1600/2000   risk = 0.58  oob_risk = 0.62   time = 285766   
#> 1700/2000   risk = 0.58  oob_risk = 0.62   time = 305588   
#> 1800/2000   risk = 0.58  oob_risk = 0.62   time = 326261   
#> 1900/2000   risk = 0.58  oob_risk = 0.62   time = 346695   
#> 2000/2000   risk = 0.58  oob_risk = 0.62   time = 366849   
#> 
#> 
#> Train 2000 iterations in 0 Seconds.
#> Final risk based on the train set: 0.58
cboost
#> Component-Wise Gradient Boosting
#> 
#> Trained on df_train with target Survived
#> Number of base-learners: 5
#> Learning rate: 0.05
#> Iterations: 2000
#> Offset: 0.1986
#> 
#> LossBinomial Loss:
#> 
#>   Loss function: L(y,x) = log(1 + exp(-2yf(x))
#> 
#> 

Objects of the Compboost class have member functions such as $getEstimatedCoef(), $getInbagRisk(), or $predict() to access the results.

To obtain a vector of the selected base-learners, call $getSelectedBaselearner().

We can also access the predictions directly from the response objects cboost$response and cboost$response_oob. Note that $response_oob was created automatically because an oob_fraction was defined in the constructor.

Visualizing Inbag vs. Out-Of-Bag Behavior
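
As a rough sketch, the inbag and out-of-bag risk can be compared by plotting the values printed in the training trace above. The data frame below copies a subset of those printed, rounded values by hand rather than extracting them from the model object, and the variable names are hypothetical.

```r
# Risk values copied from the printed training trace (rounded output).
risk_trace = data.frame(
  iteration = c(1, 100, 200, 300, 400, 500, 1000, 1500, 2000),
  inbag     = c(0.73, 0.65, 0.62, 0.61, 0.60, 0.59, 0.59, 0.58, 0.58),
  oob       = c(0.74, 0.67, 0.65, 0.64, 0.63, 0.63, 0.62, 0.62, 0.62)
)

# Plot both risk curves against the iteration.
matplot(
  risk_trace$iteration,
  as.matrix(risk_trace[, c("inbag", "oob")]),
  type = "l", lty = 1:2, col = 1,
  xlab = "Iteration", ylab = "Risk"
)
legend("topright", legend = c("Inbag risk", "Out-of-bag risk"), lty = 1:2)

# Gap between out-of-bag and inbag risk at each recorded iteration.
gap = risk_trace$oob - risk_trace$inbag
```

The out-of-bag risk levels off at about 0.62 from iteration 700 onward while the inbag risk still drops slightly, so the gap between the two curves widens over the course of training.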

Retrain the Model

To set the whole model to another iteration, call $train() again. If the new iteration is smaller than the number already trained, the model is set to that already seen iteration; otherwise, additional base-learners are trained until the new number is reached.
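
Why moving back to an earlier iteration does not need any refitting follows from the additive structure of boosting: the model at iteration m is the sum of the first m updates. The base-R sketch below illustrates this with a made-up update history; the base-learner names other than "Age_spline", the step values, and the helper `coefficient_at` are hypothetical, and each update is reduced to a single number for simplicity.

```r
# Hypothetical history: which base-learner was selected in each
# iteration and the (scalar) update it contributed.
updates = data.frame(
  baselearner = c("Age_spline", "Fare_linear", "Age_spline",
                  "Sex_male", "Age_spline"),
  step = c(0.5, 0.2, 0.3, -0.1, 0.1),
  stringsAsFactors = FALSE
)

# Accumulated contribution per base-learner after the first m iterations.
coefficient_at = function(updates, m) {
  idx = seq_len(m)
  tapply(updates$step[idx], updates$baselearner[idx], sum)
}

coefficient_at(updates, 3)  # model "set back" to iteration 3
coefficient_at(updates, 5)  # model after all five iterations
```

Setting the model to a smaller iteration only truncates the sum, while a larger iteration requires new updates to be fitted and appended.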

Visualizing Base-Learner

To visualize a base-learner, it is important to use a name exactly as returned by $getBaselearnerNames():

gg1 = cboost$plot("Age_spline")
gg2 = cboost$plot("Age_spline", iters = c(50, 100, 500, 1000, 1500))
