Instead of getting huge predictive power, model-based (or component-wise) boosting aggregates statistical models to maintain interpretability. This is done by restricting the used base learners to be linear in the parameters. This is important in terms of parameter estimation. For instance, having a base learner $$b_j(x, \theta^{[m]})$$ of a feature $$x$$ with a corresponding parameter vector $$\theta^{[m]}$$ and another base learner $$b_j$$ of the same type but with a different parameter vector $$\theta^{[m^\prime]}$$, then it is possible, due to linearity, to combine those two learners to a new one, again, of the same type: $b_j(x, \theta^{[m]}) + b_j(x, \theta^{[m^\prime]}) = b_j(x, \theta^{[m]} + \theta^{[m^\prime]})$ Instead of boosting trees like xgboost, component-wise boosting is (commonly) used with statistical models to construct an interpretable model with inherent feature selection.
The illustration above uses three base learners $$b_1$$, $$b_2$$, and $$b_3$$. You can think of each base learner as a wrapper around a feature which represents the effect of that feature. Hence, selecting a base learner more frequently than another one can indicate a higher importance.
\begin{align} \text{Iteration 1:} \ &\hat{f}^{[1]}(x) = f_0 + \beta b_3(x_3, \theta^{[1]}) \\ \text{Iteration 2:} \ &\hat{f}^{[2]}(x) = f_0 + \beta b_3(x_3, \theta^{[1]}) + \beta b_3(x_3, \theta^{[2]}) \\ \text{Iteration 3:} \ &\hat{f}^{[3]}(x) = f_0 + \beta b_3(x_3, \theta^{[1]}) + \beta b_3(x_3, \theta^{[2]}) + \beta b_1(x_1, \theta^{[3]}) \end{align} Using the linearity of the base learners allows to aggregate $$b_3$$: $\hat{f}^{[3]}(x) = f_0 + \beta \left( b_3(x_3, \theta^{[1]} + \theta^{[2]}) + b_1(x_1, \theta^{[3]}) \right)$
This simple example illustrates some strength of component-wise boosting. For example, an inherent variable selection or the estimation of partial feature effects. Further, component-wise boosting is an efficient approach to get fit to high-dimensional feature spaces $$p \gg n$$.