This wrapper function automatically initializes the model by adding all numerical features as components: for each numerical feature, a linear base-learner and a non-linear spline base-learner are added. The non-linear part is constructed such that it cannot model the linear part (it is centered around it), hence it is only selected if a non-linear effect is really necessary. Categorical features are dummy encoded and added as linear base-learners without intercept.

The returned object is of the Compboost class and can be used for further analyses (see ?Compboost for details).
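
Conceptually, the wrapper corresponds to building the model by hand from the Compboost class. The following is only a rough sketch: the $addComponents() and $addBaselearner() calls and their arguments are assumptions made for illustration, see ?Compboost for the actual interface.

# Rough, hypothetical sketch of a manual equivalent (see ?Compboost):
cboost = Compboost$new(data = iris, target = "Sepal.Length",
  loss = LossQuadratic$new(), learning_rate = 0.05)

# Linear effect plus centered spline for each numerical feature
# (assumed $addComponents() helper):
for (feature in c("Sepal.Width", "Petal.Length", "Petal.Width")) {
  cboost$addComponents(feature, df = 4)
}

# Dummy-encoded (ridge) base-learner for the categorical feature
# (assumed factory name and arguments):
cboost$addBaselearner("Species", "ridge", BaselearnerCategoricalRidge, df = 1)

cboost$train(100, trace = 0)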

Usage

boostComponents(
  data,
  target,
  optimizer = NULL,
  loss = NULL,
  learning_rate = 0.05,
  iterations = 100,
  trace = -1,
  degree = 3,
  n_knots = 20,
  penalty = 2,
  df = 0,
  differences = 2,
  data_source = InMemoryData,
  oob_fraction = NULL,
  bin_root = 0,
  cache_type = "inverse",
  stop_args = list(),
  df_cat = 1,
  stop_time = "microseconds",
  additional_risk_logs = list()
)
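
For illustration, a call with the loss and optimizer passed explicitly instead of leaving them at NULL. This is a sketch: the argument-free constructor calls are assumed to work, see the respective Loss* and Optimizer* help pages for the exact signatures.

# Sketch: explicit loss and optimizer (argument-free constructors assumed).
mod = boostComponents(
  data = iris,
  target = "Sepal.Length",
  loss = LossQuadratic$new(),
  optimizer = OptimizerCoordinateDescent$new(),
  learning_rate = 0.05,
  iterations = 100,
  df = 4,
  trace = 0
)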

Arguments

data

(data.frame())
A data frame containing the data.

target

(character(1) | ResponseRegr | ResponseBinaryClassif)
Character value naming the target variable, or a response object. Note that the loss must match the data type of the target.

optimizer

(OptimizerCoordinateDescent | OptimizerCoordinateDescentLineSearch | OptimizerAGBM | OptimizerCosineAnnealing)
An initialized S4 optimizer object (requires a call to Optimizer*$new(...)). See the respective help page for further information.

loss

(LossQuadratic | LossBinomial | LossHuber | LossAbsolute | LossQuantile)
An initialized S4 loss object (requires a call to Loss*$new(...)). See the respective help page for further information.

learning_rate

(numeric(1))
Learning rate to shrink the parameter in each step.

iterations

(integer(1))
Number of boosting iterations that are trained. If iterations == 0, the untrained object is returned, which can be useful if further base-learners (e.g., an interaction via a tensor base-learner) should be added before training.

trace

(integer(1))
Integer indicating how often the trace should be printed. With trace = 10, every 10th iteration is printed. To suppress the trace, set trace = 0. The default of -1 means that 40 iterations in total are printed.

degree

(integer(1))
Polynomial degree of the splines.

n_knots

(integer(1))
Number of equidistant "inner knots". The actual number of knots used also depends on the polynomial degree.

penalty

(numeric(1))
Penalty term for P-splines. If the penalty equals 0, ordinary B-splines are fitted. The higher the penalty, the smoother the fit.

df

(numeric(1))
Degrees of freedom of the base learner(s).

differences

(integer(1))
Number of differences used for penalization. The higher the difference order, the smoother the fit.

data_source

(Data*)
Uninitialized Data* object which is used to store the data. At the moment, only in-memory training is supported.

oob_fraction

(numeric(1))
Fraction of the data held out to track the out-of-bag risk.

bin_root

(integer(1))
The binning root used to reduce the data to n^(1/bin_root) design points. With the default (see Usage), no binning is applied. A value of bin_root = 2 is suggested for the best approximation error (cf. Wood et al. (2017), Generalized additive models for gigadata: Modeling the UK black smoke network daily data).

cache_type

(character(1))
String indicating which method is used to estimate the parameter in each iteration. With cache_type = "cholesky", the Cholesky decomposition is computed once, cached, and reused in every iteration. The option cache_type = "inverse" (the default here, see Usage) does the same but caches the inverse.

stop_args

(list(2))
List containing the two elements patience and eps_for_break, which can be set to use early stopping on the data left out via oob_fraction. If stop_args is not NULL, early stopping is triggered (see the sketch after this argument list).

df_cat

(numeric(1))
Degrees of freedom of the categorical base-learner.

stop_time

(character(1))
Unit in which the time is measured (default "microseconds").

additional_risk_logs

(list(Logger))
Additional loggers passed to the Compboost object.
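
As referenced in the stop_args entry, the following sketch shows how early stopping on left-out data could be set up. The values chosen for patience and eps_for_break are purely illustrative.

# Sketch: early stopping on 30 % held-out data (illustrative values only).
mod_es = boostComponents(
  data = iris,
  target = "Sepal.Length",
  df = 4,
  iterations = 500,
  oob_fraction = 0.3,
  stop_args = list(patience = 5L, eps_for_break = 0),
  trace = 0
)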

Value

A model of the Compboost class. This model is an R6 object that can be used for retraining, predicting, plotting, and anything else described in ?Compboost (a retraining sketch is given at the end of the examples).

Examples

mod = boostComponents(data = iris, target = "Sepal.Length", df = 4)
#>   1/100   risk = 0.32  time = 0   
#>   2/100   risk = 0.29  time = 79   
#>   4/100   risk = 0.25  time = 170   
#>   6/100   risk = 0.22  time = 257   
#>   8/100   risk = 0.2  time = 339   
#>  10/100   risk = 0.17  time = 423   
#>  12/100   risk = 0.16  time = 506   
#>  14/100   risk = 0.14  time = 588   
#>  16/100   risk = 0.13  time = 670   
#>  18/100   risk = 0.12  time = 753   
#>  20/100   risk = 0.12  time = 836   
#>  22/100   risk = 0.11  time = 918   
#>  24/100   risk = 0.1  time = 999   
#>  26/100   risk = 0.1  time = 1080   
#>  28/100   risk = 0.097  time = 1164   
#>  30/100   risk = 0.094  time = 1248   
#>  32/100   risk = 0.091  time = 1331   
#>  34/100   risk = 0.089  time = 1416   
#>  36/100   risk = 0.086  time = 1499   
#>  38/100   risk = 0.084  time = 1582   
#>  40/100   risk = 0.083  time = 1664   
#>  42/100   risk = 0.081  time = 1747   
#>  44/100   risk = 0.079  time = 1832   
#>  46/100   risk = 0.078  time = 1917   
#>  48/100   risk = 0.076  time = 1999   
#>  50/100   risk = 0.075  time = 2083   
#>  52/100   risk = 0.074  time = 2184   
#>  54/100   risk = 0.072  time = 2268   
#>  56/100   risk = 0.071  time = 2353   
#>  58/100   risk = 0.07  time = 2436   
#>  60/100   risk = 0.069  time = 2521   
#>  62/100   risk = 0.068  time = 2606   
#>  64/100   risk = 0.067  time = 2688   
#>  66/100   risk = 0.066  time = 2777   
#>  68/100   risk = 0.065  time = 2861   
#>  70/100   risk = 0.065  time = 2947   
#>  72/100   risk = 0.064  time = 3031   
#>  74/100   risk = 0.063  time = 3115   
#>  76/100   risk = 0.063  time = 3200   
#>  78/100   risk = 0.062  time = 3287   
#>  80/100   risk = 0.061  time = 3372   
#>  82/100   risk = 0.061  time = 3457   
#>  84/100   risk = 0.06  time = 3545   
#>  86/100   risk = 0.06  time = 3631   
#>  88/100   risk = 0.059  time = 3718   
#>  90/100   risk = 0.059  time = 3803   
#>  92/100   risk = 0.058  time = 3889   
#>  94/100   risk = 0.058  time = 3975   
#>  96/100   risk = 0.058  time = 4061   
#>  98/100   risk = 0.057  time = 4151   
#> 100/100   risk = 0.057  time = 4239   
#> 
#> 
#> Train 100 iterations in 0 Seconds.
#> Final risk based on the train set: 0.057
#> 
mod$getBaselearnerNames()
#> [1] "Sepal.Width_linear"                       
#> [2] "Sepal.Width_Sepal.Width_spline_centered"  
#> [3] "Petal.Length_linear"                      
#> [4] "Petal.Length_Petal.Length_spline_centered"
#> [5] "Petal.Width_linear"                       
#> [6] "Petal.Width_Petal.Width_spline_centered"  
#> [7] "Species_ridge"                            
table(mod$getSelectedBaselearner())
#> 
#> Petal.Length_Petal.Length_spline_centered 
#>                                        17 
#>                       Petal.Length_linear 
#>                                        59 
#>                        Sepal.Width_linear 
#>                                        24 
plotPEUni(mod, "Petal.Length")

mod$predict()
#>            [,1]
#>   [1,] 5.056588
#>   [2,] 4.930277
#>   [3,] 4.950279
#>   [4,] 4.985867
#>   [5,] 5.081851
#>   [6,] 5.247896
#>   [7,] 5.031326
#>   [8,] 5.061654
#>   [9,] 4.905014
#>  [10,] 4.985867
#>  [11,] 5.137441
#>  [12,] 5.091742
#>  [13,] 4.930277
#>  [14,] 4.838307
#>  [15,] 5.121707
#>  [16,] 5.314278
#>  [17,] 5.127116
#>  [18,] 5.056588
#>  [19,] 5.222634
#>  [20,] 5.162703
#>  [21,] 5.121584
#>  [22,] 5.137441
#>  [23,] 4.959040
#>  [24,] 5.096322
#>  [25,] 5.180734
#>  [26,] 4.990692
#>  [27,] 5.091742
#>  [28,] 5.086916
#>  [29,] 5.031326
#>  [30,] 5.041217
#>  [31,] 5.015955
#>  [32,] 5.061654
#>  [33,] 5.238490
#>  [34,] 5.233425
#>  [35,] 4.985867
#>  [36,] 4.919608
#>  [37,] 5.026066
#>  [38,] 5.081851
#>  [39,] 4.899754
#>  [40,] 5.061654
#>  [41,] 5.026066
#>  [42,] 4.722918
#>  [43,] 4.950279
#>  [44,] 5.117004
#>  [45,] 5.281784
#>  [46,] 4.930277
#>  [47,] 5.192791
#>  [48,] 4.980801
#>  [49,] 5.137441
#>  [50,] 5.006064
#>  [51,] 6.190562
#>  [52,] 6.095561
#>  [53,] 6.262441
#>  [54,] 5.639078
#>  [55,] 6.041791
#>  [56,] 5.994512
#>  [57,] 6.215824
#>  [58,] 5.376738
#>  [59,] 6.067053
#>  [60,] 5.696216
#>  [61,] 5.352787
#>  [62,] 5.905896
#>  [63,] 5.613815
#>  [64,] 6.114774
#>  [65,] 5.620309
#>  [66,] 6.023459
#>  [67,] 6.045037
#>  [68,] 5.784787
#>  [69,] 5.842938
#>  [70,] 5.645691
#>  [71,] 6.238799
#>  [72,] 5.765390
#>  [73,] 6.110867
#>  [74,] 6.089512
#>  [75,] 5.926536
#>  [76,] 5.998196
#>  [77,] 6.137750
#>  [78,] 6.286902
#>  [79,] 6.019774
#>  [80,] 5.504361
#>  [81,] 5.577351
#>  [82,] 5.535187
#>  [83,] 5.696216
#>  [84,] 6.261801
#>  [85,] 6.045037
#>  [86,] 6.146086
#>  [87,] 6.165299
#>  [88,] 5.821360
#>  [89,] 5.860574
#>  [90,] 5.689603
#>  [91,] 5.897147
#>  [92,] 6.092316
#>  [93,] 5.714865
#>  [94,] 5.351475
#>  [95,] 5.830109
#>  [96,] 5.905896
#>  [97,] 5.880634
#>  [98,] 5.926536
#>  [99,] 5.294365
#> [100,] 5.810049
#> [101,] 6.924529
#> [102,] 6.261801
#> [103,] 6.787418
#> [104,] 6.584052
#> [105,] 6.726964
#> [106,] 7.226203
#> [107,] 5.918725
#> [108,] 7.010945
#> [109,] 6.600652
#> [110,] 7.062325
#> [111,] 6.388113
#> [112,] 6.366606
#> [113,] 6.552337
#> [114,] 6.160590
#> [115,] 6.287064
#> [116,] 6.492918
#> [117,] 6.552337
#> [118,] 7.491818
#> [119,] 7.315773
#> [120,] 6.084803
#> [121,] 6.718079
#> [122,] 6.186654
#> [123,] 7.239194
#> [124,] 6.161392
#> [125,] 6.743342
#> [126,] 6.899267
#> [127,] 6.137750
#> [128,] 6.237179
#> [129,] 6.558789
#> [130,] 6.726964
#> [131,] 6.860226
#> [132,] 7.301489
#> [133,] 6.558789
#> [134,] 6.287064
#> [135,] 6.508264
#> [136,] 6.910751
#> [137,] 6.710364
#> [138,] 6.577600
#> [139,] 6.188275
#> [140,] 6.521963
#> [141,] 6.634576
#> [142,] 6.362851
#> [143,] 6.261801
#> [144,] 6.837943
#> [145,] 6.743342
#> [146,] 6.389376
#> [147,] 6.160590
#> [148,] 6.389376
#> [149,] 6.597750
#> [150,] 6.337588
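
The returned Compboost object can also be retrained to more iterations and used to predict on new data. A minimal sketch, assuming that $train() accepts a new total number of iterations and that $predict() accepts new data (see ?Compboost for the actual methods):

# Sketch only: continue training the model from above up to 200 iterations
# and predict for a few new observations; both calls assume the Compboost
# methods behave as described in ?Compboost.
mod$train(200)
mod$predict(iris[1:5, ])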