BaselearnerPSpline creates a spline base learner object. The object calculates the B-spline basis functions and, in the case of P-splines, also the penalty. Instead of defining the penalty term directly, consider restricting the flexibility by setting the degrees of freedom.


S4 object.



Data object which contains the raw data (see ?InMemoryData).


Type of the base learner (if not specified, blearner_type = "spline" is used). The unique id of the base learner is defined by appending blearner_type to the feature name: paste0(data_source$getIdentifier(), "_", blearner_type).


Degree of the piecewise polynomial (default degree = 3 for cubic splines).


Number of inner knots (default n_knots = 20). The inner knots are expanded by degree - 1 additional knots at each side to prevent unstable behavior on the edges.
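To illustrate how the knot setup determines the size of the basis, the following sketch builds a cubic B-spline design matrix with splines::splineDesign() from base R. The equidistant knot placement, the boundaries lower = 0 and upper = 10, and the number of extra knots per side are assumptions made for this example; they are not taken from the package internals.

```r
library(splines)

set.seed(31415)
x = runif(100, 0, 10)

degree = 3
n_knots = 10

# Assumed boundaries of the feature range (hypothetical choice for this sketch).
lower = 0
upper = 10

# Equidistant knots: n_knots inner knots, two boundary knots, and `degree`
# extra knots on each side (an assumption of this sketch).
h = (upper - lower) / (n_knots + 1)
knots = lower + h * seq(-degree, n_knots + 1 + degree)

# ord = degree + 1 is the order of the spline.
basis = splineDesign(knots, x, ord = degree + 1, outer.ok = TRUE)

dim(basis)
```

Under these assumptions the basis has n_knots + degree + 1 = 14 columns, which agrees with the dimension 14 reported for the base learner with n_knots = 10 in the examples.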


Penalty term for P-splines (default penalty = 2). Set to zero for B-splines.


The number of differences that are penalized. A higher value leads to smoother curves.
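A minimal sketch of a difference penalty in base R, assuming the usual P-spline construction in which the penalty matrix is the cross product of a difference matrix. The names n_basis, D, and K are chosen for this illustration only.

```r
n_basis = 14      # number of basis functions, e.g. for n_knots = 10 and degree = 3
differences = 2

# Each row of D takes a second-order difference of neighboring coefficients.
D = diff(diag(n_basis), differences = differences)

# Penalty matrix K, so that t(beta) %*% K %*% beta equals sum((D %*% beta)^2).
K = crossprod(D)

# First entries of the first column.
K[1:5, 1]

# Coefficients that follow a straight line are not penalized for differences = 2.
beta_linear = seq_len(n_basis)
sum((D %*% beta_linear)^2)
```

The first column of K starts with 1, -2, 1, 0, which matches the penalty_mat entry printed by $getMeta() in the examples.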


Degrees of freedom of the base learner(s).
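The following sketch shows one way a target number of degrees of freedom could be translated into a penalty. It assumes that the degrees of freedom are measured as the trace of the hat matrix, which may differ from the definition used internally. The helper penaltyForDf() and the function edf() are hypothetical names for this example, and the knots are placed equidistantly on an assumed range of [0, 10].

```r
library(splines)

set.seed(31415)
x = runif(100, 0, 10)

# Cubic B-spline basis with 10 inner equidistant knots on the assumed range [0, 10].
degree = 3
n_knots = 10
h = 10 / (n_knots + 1)
knots = h * seq(-degree, n_knots + 1 + degree)
X = splineDesign(knots, x, ord = degree + 1, outer.ok = TRUE)

# Second-order difference penalty.
K = crossprod(diff(diag(ncol(X)), differences = 2))
XtX = crossprod(X)

# Assumed definition: df(lambda) = trace((X'X + lambda * K)^{-1} X'X).
edf = function(lambda) sum(diag(solve(XtX + lambda * K, XtX)))

# Hypothetical helper: solve edf(lambda) = df for lambda on the log scale.
penaltyForDf = function(df) {
  root = uniroot(function(log_lambda) edf(exp(log_lambda)) - df,
    interval = log(c(1e-6, 1e6)), tol = 1e-10)
  exp(root$root)
}

lambda_df3 = penaltyForDf(3)
lambda_df5 = penaltyForDf(5)
c(df3 = lambda_df3, df5 = lambda_df5)
```

A smaller number of degrees of freedom requires a larger penalty, which is the same pattern visible in the $getPenalty() output of the examples.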


The binning root used to reduce the data to \(n^{1/\text{bin\_root}}\) data points (default bin_root = 1, which means no binning is applied). A value of bin_root = 2 is suggested for the best approximation error (cf. Wood et al. (2017), Generalized additive models for gigadata: modeling the UK black smoke network daily data).
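The idea of binning can be sketched in base R as follows. The equidistant grid and the nearest-grid-point assignment are assumptions of this illustration, and the actual implementation may construct the bins differently. The names x_binned, idx, and weights are hypothetical.

```r
set.seed(31415)
x = runif(100, 0, 10)

bin_root = 2

# Number of bins: n^(1 / bin_root), i.e. sqrt(100) = 10 for bin_root = 2.
n_bins = round(length(x)^(1 / bin_root))

# Equidistant grid over the observed range (assumed design of the bins).
step = (max(x) - min(x)) / (n_bins - 1)
x_binned = min(x) + step * seq(0, n_bins - 1)

# Index of the nearest grid point for every observation.
idx = round((x - min(x)) / step) + 1

# Number of observations per grid point; these counts can serve as weights
# when fitting on the reduced data.
weights = tabulate(idx, nbins = n_bins)

length(x_binned)
```

The basis functions then only need to be evaluated at the 10 grid points instead of all 100 observations, and idx maps every observation back to its grid point.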


BaselearnerPSpline$new(data_source, list(degree, n_knots, penalty, differences, df, bin_root))
BaselearnerPSpline$new(data_source, blearner_type, list(degree, n_knots, penalty, differences, df, bin_root))


This class doesn't contain public fields.


  • $summarizeFactory(): () -> ()

  • $transformData(newdata): list(InMemoryData) -> matrix()

  • $getMeta(): () -> list()

Inherited methods from Baselearner

  • $getData(): () -> matrix()

  • $getDF(): () -> integer()

  • $getPenalty(): () -> numeric()

  • $getPenaltyMat(): () -> matrix()

  • $getFeatureName(): () -> character()

  • $getModelName(): () -> character()

  • $getBaselearnerId(): () -> character()


The data matrix is instantiated as a transposed sparse matrix for performance reasons. The member function $getData() accounts for that, while $transformData() returns the raw data matrix as a p x n matrix.


# Sample data:
x = runif(100, 0, 10)
y = sin(x) + rnorm(100, 0, 0.2)
dat = data.frame(x, y)

# S4 wrapper

# Create new data object, a matrix is required as input:
data_mat = cbind(x)
data_source = InMemoryData$new(data_mat, "my_data_name")

# Create new spline base learner factories:
bl_sp_df2 = BaselearnerPSpline$new(data_source,
  list(n_knots = 10, df = 2, bin_root = 2))
bl_sp_df5 = BaselearnerPSpline$new(data_source,
  list(n_knots = 15, df = 5))

# Get the transformed data:
dim(bl_sp_df2$getData())
#> [1] 14 10
dim(bl_sp_df5$getData())
#> [1]  19 100

# Summarize factory:
bl_sp_df2$summarizeFactory()
#> Spline factory of degree 3
#> 	- Name of the used data: my_data_name
#> 	- Factory creates the following base learner: spline_degree_3

# Get full meta data such as penalty term or matrix as well as knots:
str(bl_sp_df2$getMeta())
#> List of 8
#>  $ df         : num 2
#>  $ penalty    : num 729953
#>  $ penalty_mat: num [1:14, 1:14] 1 -2 1 0 0 0 0 0 0 0 ...
#>  $ degree     : num 3
#>  $ n_knots    : num 10
#>  $ differences: num 2
#>  $ bin_root   : num 0
#>  $ knots      : num [1:18, 1] -2.587 -1.6952 -0.8033 0.0886 0.9804 ...
bl_sp_df2$getPenalty()
#>          [,1]
#> [1,] 729953.1
bl_sp_df5$getPenalty() # The penalty here is smaller due to more flexibility
#>         [,1]
#> [1,] 52.0648

# Transform "new data":
newdata = list(InMemoryData$new(cbind(rnorm(5)), "my_data_name"))
bl_sp_df2$transformData(newdata)
#> $design
#> 5 x 14 sparse Matrix of class "dgCMatrix"
#> [1,] 0.1666667 0.66666667 0.1666667 .         .           . . . . . . . . .
#> [2,] 0.1666667 0.66666667 0.1666667 .         .           . . . . . . . . .
#> [3,] .         0.05295636 0.5818030 0.3599001 0.005340631 . . . . . . . . .
#> [4,] 0.1666667 0.66666667 0.1666667 .         .           . . . . . . . . .
#> [5,] 0.1666667 0.66666667 0.1666667 .         .           . . . . . . . . .
bl_sp_df5$transformData(newdata)
#> $design
#> 5 x 19 sparse Matrix of class "dgCMatrix"
#> [1,] 0.1666667 6.666667e-01 0.1666667 .         .         . . . . . . . . . . .
#> [2,] 0.1666667 6.666667e-01 0.1666667 .         .         . . . . . . . . . . .
#> [3,] .         9.687235e-05 0.2115857 0.6599926 0.1283248 . . . . . . . . . . .
#> [4,] 0.1666667 6.666667e-01 0.1666667 .         .         . . . . . . . . . . .
#> [5,] 0.1666667 6.666667e-01 0.1666667 .         .         . . . . . . . . . . .
#> [1,] . . .
#> [2,] . . .
#> [3,] . . .
#> [4,] . . .
#> [5,] . . .

# R6 wrapper

cboost_df2 = Compboost$new(dat, "y")
cboost_df2$addBaselearner("x", "sp", BaselearnerPSpline,
  n_knots = 10, df = 2, bin_root = 2)
cboost_df2$train(200, 0)
#> Train 200 iterations in 0 Seconds.
#> Final risk based on the train set: 0.21

cboost_df5 = Compboost$new(dat, "y")
cboost_df5$addBaselearner("x", "sp", BaselearnerPSpline,
  n_knots = 15, df = 5)
cboost_df5$train(200, 0)
#> Train 200 iterations in 0 Seconds.
#> Final risk based on the train set: 0.02

# Access base learner directly from the API (n = sqrt(100) = 10 with binning):
#>  num [1:14, 1:10] 0.167 0.667 0.167 0 0 ...
#>  num [1:19, 1:100] 0 0 0 0 0 ...

gg_df2 = plotPEUni(cboost_df2, "x")
gg_df5 = plotPEUni(cboost_df5, "x")


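# ggplot2 provides geom_point() and aes(); patchwork provides `|` and `&`
# for combining ggplot objects:
library(ggplot2)
library(patchwork)

(gg_df2 | gg_df5) &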
  geom_point(data = dat, aes(x = x, y = y - c(cboost_df2$offset)), alpha = 0.2)