
Forests

How big a forest?

When using bagging to build a forest, you have to pick the number of trees. True or false: in the limit as the number of trees goes to infinity, the training error will always converge to zero.
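One way to probe this claim is to watch the training error of a bagged forest as the ensemble grows. Below is a minimal sketch, assuming scikit-learn's BaggingRegressor and a small synthetic noisy dataset; both are illustrative assumptions, not part of the question.

```python
# Minimal empirical check: training MSE of a bagged forest as trees are added.
# The synthetic data and BaggingRegressor are assumptions of this sketch.
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=200)   # noisy response

for n_trees in [1, 10, 100, 1000]:
    forest = BaggingRegressor(DecisionTreeRegressor(),
                              n_estimators=n_trees,
                              random_state=0).fit(X, y)
    train_mse = mean_squared_error(y, forest.predict(X))
    print(f"{n_trees:5d} trees -> training MSE {train_mse:.3f}")
```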

Feature importance

Many packages allow you to compute feature importances based on an ensemble of trees; a minimal sketch of doing so follows the options below. In the context of estimating a regression function, the feature importance values indicate the extent to which each feature was helpful in reducing the mean squared predictive error (MSPE) on the training data. More specifically...

  1. The importance value for each feature is proportional to the total MSPE improvement due to splitting with that feature.

  2. The importance values are computed by first taking the proportion of total MSPE improvement due to splitting with each feature. The importance value for each feature is then computed as the negative log of the corresponding proportion.

  3. The importance value for each feature is equal to the exponentiated total MSPE improvement due to splitting with that feature.
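For reference, scikit-learn's tree ensembles expose an impurity-based feature_importances_ attribute. Below is a minimal sketch on a synthetic dataset; the data and model settings are assumptions of the sketch, not part of the question.

```python
# Minimal sketch of impurity-based feature importances with scikit-learn.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# Feature 0 matters most, feature 1 a little, feature 2 is pure noise.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
# feature_importances_ accumulates each feature's (weighted) squared-error
# reduction over all splits and trees, then normalizes so the values sum to 1.
print(forest.feature_importances_)
```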

Random forests vs boosting forests

Which is true? A minimal sketch contrasting the two constructions follows the options.

  1. In a boosting forest, the trees are built one at a time, each tree using information from all the trees already built. By contrast, each tree in a random forest is built independently.

  2. In a random forest, the trees are built one at a time, each tree using information from all the trees already built. By contrast, each tree in a boosting forest is built independently.
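A minimal sketch contrasting the two constructions, assuming scikit-learn decision trees, squared-error boosting on residuals, and a synthetic dataset (all assumptions of the sketch):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

# Random-forest / bagging style: each tree is fit independently on its own
# bootstrap sample; no tree sees the others' output.
independent_trees = []
for _ in range(50):
    idx = rng.integers(0, len(y), size=len(y))        # bootstrap sample
    independent_trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
rf_pred = np.mean([t.predict(X) for t in independent_trees], axis=0)

# Boosting style: each new tree is fit to the residuals left by the ensemble
# built so far, so it depends on all of the previously built trees.
boost_pred = np.zeros(len(y))
learning_rate = 0.1
for _ in range(50):
    residuals = y - boost_pred
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    boost_pred += learning_rate * tree.predict(X)
```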

Boosting by hand

You have a dataset with three samples. For each sample you have one predictor $X$ and one response $Y$. You also have already used a tree to fit an estimate, $\hat f(x) \approx \mathbb{E}[Y \mid X = x]$. The dataset and the current estimate $\hat f(x)$ are shown below.

   Sample     X     $\hat f(X)$     Y
  --------   ---   -------------   -----
     #1       3         4.5         4.5
     #2       7         6.0         5.0
     #3       8         6.0         7.0

Now you would like to use boosting. You want to fit a second tree that improves on the results of the first. You decide your new tree will have only two leaves. Let $\hat g(x)$ denote the function associated with the new tree. What is the function?
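If you want to check your by-hand answer numerically, one option is to fit a two-leaf regression tree to the residuals $Y - \hat f(X)$. The sketch below assumes scikit-learn's DecisionTreeRegressor with max_leaf_nodes=2 as one way to realize a two-leaf tree; it is an assumption of the sketch, not necessarily the intended method.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[3.0], [7.0], [8.0]])
y = np.array([4.5, 5.0, 7.0])
f_hat = np.array([4.5, 6.0, 6.0])       # predictions of the first tree

residuals = y - f_hat                    # what the second tree should target
stump = DecisionTreeRegressor(max_leaf_nodes=2).fit(X, residuals)
g_hat = stump.predict(X)                 # leaf values of the two-leaf tree
print(residuals, g_hat)
```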