Initializing Model

Because compboost uses an R6 API, we first create a new Compboost object, passing the data, the name of the target as a character string, and the loss. The loss must be passed as an initialized object, which makes it possible to use a loss initialized with a custom offset.
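A minimal sketch of the constructor call, assuming the compboost package is installed. The small random data frame is a placeholder so the snippet runs on its own; it is not the df_train used in this document, and the oob_fraction value of 0.3 is an arbitrary choice. Argument names follow the package's R6 constructor as we understand it and may differ between versions.

```r
library(compboost)

# Placeholder data so the sketch runs on its own (not this document's df_train).
set.seed(1)
df_train = data.frame(
  Survived = factor(sample(c("no", "yes"), 100, replace = TRUE)),
  Age = runif(100, 1, 80)
)

# The loss is passed as an initialized object, not as the bare generator.
cboost = Compboost$new(
  data = df_train,
  target = "Survived",
  loss = LossBinomial$new(),
  oob_fraction = 0.3  # arbitrary value; reserves part of the data as out-of-bag set
)
```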

Adding Base-Learner

New base-learners are added by passing the name of the feature as a character string. The second argument is an identifier for the factory, which is needed because multiple base-learners can be defined on the same source.

Categorical Features

When a categorical feature is added, each group is added as a separate base-learner to avoid biased feature selection. Note that no intercept is needed here.

Finally, we can check which factories are registered.
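The registration steps described above could look roughly like the following sketch. The data frame is again a random placeholder, and the use of BaselearnerPSpline for the numeric feature and BaselearnerPolynomial with intercept = FALSE for the categorical one is an assumption about the installed compboost version; newer releases may provide dedicated categorical base-learners instead.

```r
library(compboost)

# Placeholder data so the sketch runs on its own (not this document's df_train).
set.seed(1)
df_train = data.frame(
  Survived = factor(sample(c("no", "yes"), 100, replace = TRUE)),
  Age = runif(100, 1, 80),
  Sex = sample(c("female", "male"), 100, replace = TRUE)
)
cboost = Compboost$new(data = df_train, target = "Survived",
  loss = LossBinomial$new())

# Feature name first, then the identifier, then the base-learner factory.
# The resulting factory is registered as "Age_spline".
cboost$addBaselearner("Age", "spline", BaselearnerPSpline)

# Categorical feature: one base-learner per group, without an intercept.
# Assumption: this call is supported by the installed package version.
cboost$addBaselearner("Sex", "categorical", BaselearnerPolynomial,
  intercept = FALSE)

# List the names of all registered factories.
cboost$getBaselearnerNames()
```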

Define Logger

Time logger

This logger records the elapsed time. The time unit can be one of microseconds, seconds, or minutes. The logger can stop the training once max_time is reached, but we do not use it as a stopper here.
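A sketch of how such a time logger might be registered, assuming the addLogger() method and the LoggerTime class of the installed compboost version accept the arguments shown. The logger id "time" and the placeholder data are illustrative choices, not taken from this document.

```r
library(compboost)

# Placeholder data so the sketch runs on its own (not this document's df_train).
set.seed(1)
df_train = data.frame(
  Survived = factor(sample(c("no", "yes"), 100, replace = TRUE)),
  Age = runif(100, 1, 80)
)
cboost = Compboost$new(data = df_train, target = "Survived",
  loss = LossBinomial$new())

# Record elapsed time in microseconds under the id "time". Because
# use_as_stopper = FALSE, max_time is not used to stop the training.
cboost$addLogger(logger = LoggerTime, use_as_stopper = FALSE,
  logger_id = "time", max_time = 0, time_unit = "microseconds")
```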

Train Model and Access Elements

cboost$train(2000, trace = 100)
#>    1/2000   risk = 0.73  oob_risk = 0.76   time = 0   
#>  100/2000   risk = 0.64  oob_risk = 0.69   time = 15622   
#>  200/2000   risk = 0.62  oob_risk = 0.67   time = 31711   
#>  300/2000   risk = 0.61  oob_risk = 0.66   time = 47901   
#>  400/2000   risk = 0.6  oob_risk = 0.65   time = 64569   
#>  500/2000   risk = 0.6  oob_risk = 0.65   time = 81588   
#>  600/2000   risk = 0.59  oob_risk = 0.65   time = 99339   
#>  700/2000   risk = 0.59  oob_risk = 0.65   time = 117371   
#>  800/2000   risk = 0.59  oob_risk = 0.65   time = 135977   
#>  900/2000   risk = 0.59  oob_risk = 0.65   time = 153953   
#> 1000/2000   risk = 0.59  oob_risk = 0.65   time = 172255   
#> 1100/2000   risk = 0.59  oob_risk = 0.65   time = 190418   
#> 1200/2000   risk = 0.59  oob_risk = 0.65   time = 208840   
#> 1300/2000   risk = 0.59  oob_risk = 0.65   time = 228773   
#> 1400/2000   risk = 0.59  oob_risk = 0.65   time = 250424   
#> 1500/2000   risk = 0.59  oob_risk = 0.65   time = 271120   
#> 1600/2000   risk = 0.59  oob_risk = 0.65   time = 290963   
#> 1700/2000   risk = 0.59  oob_risk = 0.65   time = 311443   
#> 1800/2000   risk = 0.59  oob_risk = 0.65   time = 332204   
#> 1900/2000   risk = 0.59  oob_risk = 0.65   time = 353300   
#> 2000/2000   risk = 0.59  oob_risk = 0.65   time = 374042   
#> 
#> 
#> Train 2000 iterations in 0 Seconds.
#> Final risk based on the train set: 0.59
cboost
#> Component-Wise Gradient Boosting
#> 
#> Trained on df_train with target Survived
#> Number of base-learners: 5
#> Learning rate: 0.05
#> Iterations: 2000
#> Offset: 0.2069
#> 
#> LossBinomial Loss:
#> 
#>   Loss function: L(y,x) = log(1 + exp(-2yf(x))
#> 
#> 
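
The loss shown in the printout can be evaluated by hand. The following base R sketch writes it as log(1 + exp(-2 * y * f)) and assumes that the labels are coded as -1 and +1; the function names binomial_loss and empirical_risk are hypothetical helpers, not part of compboost.

```r
# Binomial loss for a label y (assumed to be coded as -1 / +1) and a score f.
binomial_loss = function(y, f) log(1 + exp(-2 * y * f))

# Empirical risk: the mean loss over all observations.
empirical_risk = function(y, f) mean(binomial_loss(y, f))

y = c(1, -1, 1, 1)
f = c(0, 0, 0, 0)

# With a score of zero every observation contributes log(2).
empirical_risk(y, f)
```

A score whose sign agrees with the label yields a smaller loss than one whose sign disagrees, which is why the risk decreases as the fit improves.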

Objects of the Compboost class have member functions such as getEstimatedCoef(), getInbagRisk(), and predict() to access the results.

To obtain a vector of the selected base-learners, call getSelectedBaselearner().

We can also access predictions directly from the response objects cboost$response and cboost$response_oob. Note that $response_oob is created automatically when an oob_fraction is defined in the constructor.
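The accessor methods named above can be tried on a small model. This is a sketch on random placeholder data with a single spline base-learner and 50 iterations, all of which are arbitrary choices; the exact return shapes may differ between compboost versions, and the response objects are not shown here.

```r
library(compboost)

# Placeholder data so the sketch runs on its own (not this document's df_train).
set.seed(1)
df_train = data.frame(
  Survived = factor(sample(c("no", "yes"), 100, replace = TRUE)),
  Age = runif(100, 1, 80)
)
cboost = Compboost$new(data = df_train, target = "Survived",
  loss = LossBinomial$new())
cboost$addBaselearner("Age", "spline", BaselearnerPSpline)
cboost$train(50, trace = 0)  # trace = 0 is assumed to suppress the log output

coefs = cboost$getEstimatedCoef()           # coefficients of the selected base-learners
risk = cboost$getInbagRisk()                # risk on the training data over the iterations
selected = cboost$getSelectedBaselearner()  # base-learner chosen in each iteration
scores = cboost$predict()                   # predictions for the training data
```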

Retrain the Model

To set the whole model to a different iteration, call train() again with the desired number of iterations.
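A sketch of moving a model between iterations, again on random placeholder data. The iteration counts are arbitrary, and the behavior described in the comments reflects our understanding of train() rather than a guarantee for every compboost version.

```r
library(compboost)

# Placeholder data so the sketch runs on its own (not this document's df_train).
set.seed(1)
df_train = data.frame(
  Survived = factor(sample(c("no", "yes"), 100, replace = TRUE)),
  Age = runif(100, 1, 80)
)
cboost = Compboost$new(data = df_train, target = "Survived",
  loss = LossBinomial$new())
cboost$addBaselearner("Age", "spline", BaselearnerPSpline)

cboost$train(100, trace = 0)  # initial fit with 100 iterations
cboost$train(50, trace = 0)   # set the model back to iteration 50
cboost$train(150, trace = 0)  # continue training up to iteration 150
```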

Visualizing Base-Learner

To visualize a base-learner, use exactly one of the names returned by getBaselearnerNames():

gg1 = cboost$plot("Age_spline")
gg2 = cboost$plot("Age_spline", iters = c(50, 100, 500, 1000, 1500))
