Compboost wraps the S4 class system exposed by Rcpp to make defining and adding objects, training, calculating predictions, and plotting much easier.
As already mentioned, the Compboost R6 class is just a wrapper and compatible with the most
# Constructor
cboost = Compboost$new(data, target, optimizer = OptimizerCoordinateDescent$new(), loss,
  learning_rate = 0.05, oob_fraction = NULL)

# Member functions
cboost$addLogger(logger, use_as_stopper = FALSE, logger_id, ...)
cboost$addBaselearner(feature, id, bl_factory, data_source = InMemoryData,
  data_target = InMemoryData, ...)
cboost$train(iteration = 100, trace = -1)
cboost$getCurrentIteration()
cboost$prepareData(newdata)
cboost$prepareResponse(response)
cboost$predict(newdata = NULL, as_response = FALSE)
cboost$getInbagRisk()
cboost$getSelectedBaselearner()
cboost$getEstimatedCoef()
cboost$plot(blearner_name = NULL, iters = NULL, from = NULL, to = NULL, length_out = 1000)
cboost$getBaselearnerNames()
cboost$getLoggerData()
cboost$calculateFeatureImportance(num_feats = NULL)
cboost$plotFeatureImportance(num_feats = NULL)
cboost$plotInbagVsOobRisk()
For Compboost$new():
data: A data frame containing the data (features as well as target).
target: Character value containing the name of the target variable, or a Response object. Note that the loss has to match the data type of the target.
optimizer: S4 Optimizer object exposed by Rcpp that specifies how features are selected in each iteration. The default is OptimizerCoordinateDescent$new().
loss: S4 Loss object exposed by Rcpp which is used to calculate the risk and the pseudo residuals.
learning_rate: Learning rate used to shrink the new parameters in each iteration. The default is 0.05.
oob_fraction: Fraction of the data that is used to calculate the out of bag risk.
For cboost$addLogger():
logger: S4 Logger class object that is registered in the model. See the details for possible choices.
use_as_stopper: Logical value indicating whether the new logger should also be used as stopper (early stopping). The default value is FALSE.
logger_id: Id of the new logger. This is necessary to be able to register multiple loggers.
...: Further arguments passed to the constructor of the S4 Logger class specified in logger. For possible arguments see the details or the help page of the respective logger class.
For cboost$addBaselearner():
feature: Vector of column names that are used as input data matrix for a single base-learner. Note that not every base-learner supports the use of multiple features (e.g. the spline base-learner does not).
id: Id of the base-learner. This is necessary since it is possible to define multiple learners using the same features.
bl_factory: Uninitialized base-learner factory given as S4 Factory class. See the details for possible choices.
data_source: Data source object. Only in-memory data objects are supported at the moment.
data_target: Data target object. Only in-memory data objects are supported at the moment.
...: Further arguments passed to the constructor of the S4 Factory class specified in bl_factory. For possible arguments see the help page of the respective factory class.
For cboost$train():
iteration: Number of iterations that are trained. If the model is already trained, it is set to the given number, either by going back to already trained base-learners or by training new ones. Note: This function defines an iteration logger with the id _iterations which is used as stopper for the new training.
trace: Integer indicating after how many iterations a trace should be printed. With trace = 10, every 10th iteration is printed. To suppress the trace, set trace = 0. The default is -1, which means that 40 iterations are printed in total.
For cboost$predict():
newdata: Data to predict on. If newdata is NULL, predictions on the training data are returned.
For cboost$plot():
blearner_name: Character name of the base-learner whose contribution to the response is plotted. Available choices are returned by cboost$getBaselearnerNames().
iters: Integer vector containing the iterations the user wants to visualize.
from: Lower bound for the x axis (should be smaller than to).
to: Upper bound for the x axis (should be greater than from).
length_out: Number of equidistant points between from and to used for plotting.
For cboost$calculateFeatureImportance() and cboost$plotFeatureImportance():
num_feats: Number of features for which the importance is returned.
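The interplay of iteration and trace described above can be illustrated with a small base R sketch. The helper printed_iterations() is hypothetical and not part of compboost. It assumes that trace = -1 chooses a step of iteration / 40 and that the first iteration is always printed, which is consistent with the trace shown in the example on this page but is not taken from the package source.

```r
# Hypothetical helper (not part of compboost): lists the iterations that would
# appear in the trace for a given `iteration` and `trace` setting.
printed_iterations = function(iteration, trace = -1) {
  # trace = 0 suppresses the trace entirely
  if (trace == 0) return(integer(0))
  # Assumption: trace = -1 picks a step so that about 40 lines are printed
  step = if (trace < 0) max(1L, as.integer(round(iteration / 40))) else as.integer(trace)
  its = seq_len(iteration)
  # Assumption: the first iteration is printed in addition to every step-th one
  its[its == 1L | its %% step == 0L]
}

head(printed_iterations(1000))       # default trace = -1
printed_iterations(30, trace = 10)   # every 10th iteration
printed_iterations(30, trace = 0)    # nothing is printed
```

With 1000 iterations and the default trace, the sketch steps through the iterations in blocks of 25, matching the spacing of the trace lines in the example.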
Data used for training the algorithm.
Data used for out of bag tracking.
Fraction of how much data are used to track the out of bag risk.
Response object that is created or passed in target for training the model.
Response object that is created by specifying the oob_fraction to evaluate each iteration.
Name of the target variable.
Name of the given dataset.
Optimizer used within the fitting process.
Loss used to calculate pseudo residuals and empirical risk.
Learning rate used to shrink the estimated parameter in each iteration.
S4 Compboost_internal class object from which the main operations (such as train) are called.
List of all registered factories represented as S4 FactoryList class.
Character containing the name of the positive class in the case of (binary) classification.
Logical indicating whether all stoppers should be used simultaneously or whether the first stopper alone is sufficient to stop the algorithm.
addLogger: method to add a logger to the algorithm (Note: This is only possible before the training).
addBaselearner: method to add a new base-learner to the algorithm (Note: This is only possible before the training).
getCurrentIteration: method to get the current iteration on which the algorithm is set.
train: method to train the algorithm.
predict: method to predict on a trained object.
getSelectedBaselearner: method to get a character vector of the selected base-learners.
getEstimatedCoef: method to get a list of the estimated coefficients of each selected base-learner.
plot: method to plot individual feature effects.
getBaselearnerNames: method to get the names of the registered factories.
prepareData: method to prepare data to track the out of bag risk of an arbitrary loss/performance function.
getLoggerData: method to get the logged data from all registered loggers.
calculateFeatureImportance: method to calculate feature importance.
plotFeatureImportance: method to plot the feature importance calculated by calculateFeatureImportance.
plotInbagVsOobRisk: method to plot the inbag vs the out of bag behavior. This is only applicable if a logger with the name oob_logger was registered. This is done automatically if the oob_fraction is set.
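A minimal sketch of how several of these methods might be combined, assuming the compboost version documented here is installed. It only uses the signatures listed in the usage section. The choice of the features hp and wt from mtcars and the iteration counts are illustrative assumptions.

```r
library(compboost)

cboost = Compboost$new(mtcars, "mpg", loss = LossQuadratic$new())

# Base-learners have to be added before training
cboost$addBaselearner("hp", "spline", BaselearnerPSpline, degree = 3,
  penalty = 2, differences = 2)
cboost$addBaselearner("wt", "spline", BaselearnerPSpline, degree = 3,
  penalty = 2, differences = 2)

# trace = 0 suppresses the printed trace
cboost$train(200, trace = 0)

cboost$getBaselearnerNames()            # names of the registered factories
table(cboost$getSelectedBaselearner())  # how often each base-learner was selected

# Per the description of `iteration`, training with a smaller number
# sets the model back to that iteration
cboost$train(100, trace = 0)
cboost$getCurrentIteration()

# Predictions for the first five rows, and the feature importance
pred = cboost$predict(mtcars[1:5, ])
cboost$calculateFeatureImportance()
```

No expected output is shown because the selected base-learners and coefficients depend on the installed version and its defaults.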
cboost = Compboost$new(mtcars, "mpg", loss = LossQuadratic$new(), oob_fraction = 0.3)
cboost$addBaselearner("hp", "spline", BaselearnerPSpline, degree = 3,
  n.knots = 10, penalty = 2, differences = 2)
#> Warning: Unused arguments "n.knots" in list.
cboost$train(1000)
#> 1/1000 risk = 14 oob_risk = 27
#> 25/1000 risk = 4.2 oob_risk = 2.9
#> 50/1000 risk = 3.1 oob_risk = 5
#> 75/1000 risk = 2.8 oob_risk = 6.6
#> 100/1000 risk = 2.7 oob_risk = 7.3
#> 125/1000 risk = 2.6 oob_risk = 7.6
#> 150/1000 risk = 2.5 oob_risk = 7.8
#> 175/1000 risk = 2.4 oob_risk = 8
#> 200/1000 risk = 2.4 oob_risk = 8.1
#> 225/1000 risk = 2.3 oob_risk = 8.2
#> 250/1000 risk = 2.3 oob_risk = 8.3
#> 275/1000 risk = 2.3 oob_risk = 8.4
#> 300/1000 risk = 2.3 oob_risk = 8.4
#> 325/1000 risk = 2.2 oob_risk = 8.5
#> 350/1000 risk = 2.2 oob_risk = 8.6
#> 375/1000 risk = 2.2 oob_risk = 8.6
#> 400/1000 risk = 2.2 oob_risk = 8.6
#> 425/1000 risk = 2.2 oob_risk = 8.7
#> 450/1000 risk = 2.1 oob_risk = 8.7
#> 475/1000 risk = 2.1 oob_risk = 8.7
#> 500/1000 risk = 2.1 oob_risk = 8.7
#> 525/1000 risk = 2.1 oob_risk = 8.8
#> 550/1000 risk = 2.1 oob_risk = 8.8
#> 575/1000 risk = 2.1 oob_risk = 8.8
#> 600/1000 risk = 2.1 oob_risk = 8.8
#> 625/1000 risk = 2 oob_risk = 8.8
#> 650/1000 risk = 2 oob_risk = 8.8
#> 675/1000 risk = 2 oob_risk = 8.8
#> 700/1000 risk = 2 oob_risk = 8.8
#> 725/1000 risk = 2 oob_risk = 8.8
#> 750/1000 risk = 2 oob_risk = 8.8
#> 775/1000 risk = 2 oob_risk = 8.9
#> 800/1000 risk = 2 oob_risk = 8.9
#> 825/1000 risk = 2 oob_risk = 8.9
#> 850/1000 risk = 1.9 oob_risk = 8.9
#> 875/1000 risk = 1.9 oob_risk = 8.9
#> 900/1000 risk = 1.9 oob_risk = 8.9
#> 925/1000 risk = 1.9 oob_risk = 8.9
#> 950/1000 risk = 1.9 oob_risk = 8.9
#> 975/1000 risk = 1.9 oob_risk = 8.9
#> 1000/1000 risk = 1.9 oob_risk = 8.9
#>
#>
#> Train 1000 iterations in 0 Seconds.
#> Final risk based on the train set: 1.9
#>
table(cboost$getSelectedBaselearner())
#>
#> hp_spline
#>      1000
cboost$plot("hp_spline")
cboost$plotInbagVsOobRisk()
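In the trace of this example the inbag risk keeps falling while the out of bag risk rises again after an early minimum. A small base R sketch of how one might locate that minimum is given below. The values are copied by hand from the first printed trace lines, and the data frame trace_log is a hypothetical stand-in rather than the return value of cboost$getLoggerData(), whose exact column layout is not described on this page.

```r
# First six trace lines of the example, copied by hand
trace_log = data.frame(
  iteration = c(1, 25, 50, 75, 100, 125),
  risk      = c(14, 4.2, 3.1, 2.8, 2.7, 2.6),
  oob_risk  = c(27, 2.9, 5, 6.6, 7.3, 7.6)
)

# Printed iteration with the smallest out of bag risk
best = trace_log[which.min(trace_log$oob_risk), ]
best$iteration
```

Only every 25th iteration is printed, so the true minimum may lie between two printed iterations. Following the description of the iteration argument, a call such as cboost$train(best$iteration) would then set the model back to that iteration.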