Errata: May 19, 2021
Thank you for purchasing Machine Learning with R, the tidyverse, and mlr. Please post any errata not listed below in this book's LiveBook Errata thread. We'll update this list as necessary. Thank you!

In chapter 2, page 27, section 2.3.3:

The sentence: "A common frustration people have..." should finish with an additional clause in parentheses, such that the sentence ends: "...variables to factors by default (prior to R 4.0.0)."
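The minimal base-R sketch below shows the version-dependent behavior this correction refers to. The column name x and its values are arbitrary illustrations. Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE. Setting the argument explicitly gives the same result in any R version.

```r
# The default behavior depends on the R version:
# R >= 4.0.0 keeps strings as character; earlier versions converted them to factors.
df_default <- data.frame(x = c("a", "b"))
print(getRversion())
print(class(df_default$x))

# Passing stringsAsFactors explicitly makes the result version-independent.
df_chr <- data.frame(x = c("a", "b"), stringsAsFactors = FALSE)
df_fct <- data.frame(x = c("a", "b"), stringsAsFactors = TRUE)
print(class(df_chr$x))  # "character"
print(class(df_fct$x))  # "factor"
```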

In chapter 2, pages 44 & 45, section 2.7.1:

The third element of the elementLengths object should be [1] 10, not [1] 20. Similarly, the result of map(listOfNumerics, length) should show that the $c element is [1] 10, not [1] 20.
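The corrected values can be checked with the short sketch below. It assumes that listOfNumerics is a named list whose third element, c, holds 10 numbers, as the correction implies. The lengths of a (5) and b (9) are illustrative assumptions. The sketch builds elementLengths with a for loop and also with purrr::map().

```r
library(purrr)

set.seed(1)
# Assumed definition: element c holds 10 values, as the correction implies;
# the lengths of a (5) and b (9) are illustrative assumptions.
listOfNumerics <- list(a = rnorm(5), b = rnorm(9), c = rnorm(10))

# for-loop version: preallocate a list, then store each element's length
elementLengths <- vector("list", length = length(listOfNumerics))
for (i in seq_along(listOfNumerics)) {
  elementLengths[[i]] <- length(listOfNumerics[[i]])
}
elementLengths[[3]]               # [1] 10

# map() version: returns a named list, so the third element is $c
map(listOfNumerics, length)$c     # [1] 10
```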

In chapter 2, page 46, section 2.7.2:

The third elements of the map_int(listOfNumerics, length) and map_chr(listOfNumerics, length) outputs should be 10 and "10", respectively (10, rather than 20).
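The sketch below shows the typed map variants returning 10 and "10" for the third element. It assumes the same shape of list as the correction implies: element c holds 10 values, and the lengths of a and b are illustrative. Recent purrr releases (1.0.0 and later) deprecate automatic integer-to-character coercion in map_chr(). For that reason, the sketch converts explicitly with as.character() rather than relying on map_chr(listOfNumerics, length).

```r
library(purrr)

set.seed(1)
# Assumed definition: element c holds 10 values; a (5) and b (9) are illustrative
listOfNumerics <- list(a = rnorm(5), b = rnorm(9), c = rnorm(10))

# map_int() returns a named integer vector: a = 5, b = 9, c = 10
lengthsInt <- map_int(listOfNumerics, length)
print(lengthsInt)

# map_chr() returns a named character vector: "5", "9", "10".
# The explicit as.character() avoids purrr's deprecated automatic coercion.
lengthsChr <- map_chr(listOfNumerics, ~ as.character(length(.x)))
print(lengthsChr)
```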

In chapter 8, page 201, section 8.3:

The heading for listing 8.9 is incorrect. It should be: Cross-validating the model-building process.

In chapter 9, page 231, section 9.2.4:

Delete from list level 2, item b: "Update template to support more than two levels of nested ordered lists."

The corrected list should read:

"1) Split data into three folds.

2) For each fold:

   a) Use the rpart algorithm to impute the missing values.

   b) Perform feature selection:

      i) Use a selection method (such as backward search) to select combinations of features to train models on.

      ii) Use 10-fold cross-validation to evaluate the performance of each model.

3) Return the best-performing model for each of the three outer folds.

4) Return the mean MSE to give us our estimate of performance."
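The corrected procedure can be sketched in base R plus the rpart package. This sketch is a hypothetical stand-in, not the book's mlr code. Several choices in it are assumptions for illustration: the built-in airquality data (predicting Temp), a linear model as the learner, and a simple sequential backward search. It follows the list literally, imputing once per outer fold before running the inner 10-fold cross-validation.

```r
library(rpart)

set.seed(123)
# Hypothetical stand-in data: predict Temp from the other airquality columns;
# Ozone and Solar.R contain missing values.
dat <- airquality
target <- "Temp"
features <- setdiff(names(dat), target)

# Step 2a: impute each incomplete feature with an rpart model fitted on the
# training rows where that feature is observed (the target is not used).
impute_rpart <- function(train, test) {
  for (col in features) {
    miss_tr <- is.na(train[[col]])
    miss_te <- is.na(test[[col]])
    if (!any(miss_tr) && !any(miss_te)) next
    fit <- rpart(reformulate(setdiff(features, col), col),
                 data = train[!miss_tr, ])
    if (any(miss_tr)) train[[col]][miss_tr] <- predict(fit, train[miss_tr, ])
    if (any(miss_te)) test[[col]][miss_te] <- predict(fit, test[miss_te, ])
  }
  list(train = train, test = test)
}

mse <- function(fit, data) mean((data[[target]] - predict(fit, data))^2)

# Step 2b-ii: k-fold CV estimate of MSE for a linear model on `feats`
cv_mse <- function(data, feats, folds) {
  mean(sapply(unique(folds), function(k) {
    fit <- lm(reformulate(feats, target), data = data[folds != k, ])
    mse(fit, data[folds == k, ])
  }))
}

# Step 2b-i: sequential backward search, dropping one feature at a time
# while doing so lowers the inner-CV MSE (same inner folds for every candidate)
backward_search <- function(data, inner_k = 10) {
  folds <- sample(rep_len(seq_len(inner_k), nrow(data)))
  feats <- features
  best <- cv_mse(data, feats, folds)
  while (length(feats) > 1) {
    scores <- sapply(feats, function(f) cv_mse(data, setdiff(feats, f), folds))
    if (min(scores) >= best) break
    best <- min(scores)
    feats <- setdiff(feats, names(which.min(scores)))
  }
  feats
}

# Steps 1, 3, and 4: three outer folds, one selected model per fold, mean MSE
outer_folds <- sample(rep_len(1:3, nrow(dat)))
outer_mse <- sapply(1:3, function(i) {
  imp <- impute_rpart(dat[outer_folds != i, ], dat[outer_folds == i, ])
  feats <- backward_search(imp$train)
  best_model <- lm(reformulate(feats, target), data = imp$train)
  mse(best_model, imp$test)
})
mean(outer_mse)
```

Keeping the inner folds fixed within one backward search means every candidate feature subset is compared on the same splits. Otherwise, random fold noise could drive the selection.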

In chapter 16, page 391, section 16.2.4:

This correction hasn't been made in any format.

The TIP at the bottom of the page should be corrected to: Two other internal cluster metrics are implemented by mlr: silhouette and G2 (use listMeasures("cluster") to list the available metrics). Both metrics are computationally more expensive. Unfortunately, the Dunn index (dunn) has stopped working in mlr since this book was published, so omit it from the subsequent examples and use clValid::dunn() to calculate it yourself.
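As a workaround, clValid::dunn() can compute the index from any vector of cluster labels. The sketch below is a hypothetical example that uses base R's kmeans() on the scaled numeric iris columns rather than an mlr model. With mlr, the predicted cluster labels would be passed as clusters instead. The Dunn index is the smallest distance between observations in different clusters divided by the largest within-cluster diameter, so higher values indicate better-separated clusters.

```r
library(clValid)

set.seed(42)
# Hypothetical example: k-means with 3 centers on the scaled numeric iris columns
irisScaled <- scale(iris[, 1:4])
km <- kmeans(irisScaled, centers = 3, nstart = 10)

# Dunn index computed from the raw data (Euclidean distance by default)
dunnIndex <- dunn(clusters = km$cluster, Data = irisScaled)
print(dunnIndex)

# Equivalent call using a precomputed distance matrix
dunnIndex2 <- dunn(distance = as.matrix(dist(irisScaled)), clusters = km$cluster)
print(dunnIndex2)
```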

In chapter 20, page 475, section 20.1.3:

The sidebar entitled "Training set, test set and ... validation set?" reverses the terms "test set" and "validation set". It should read: "You may see other people refer to splitting their data into a training set, test set, and validation set. I want to show you how this is just a special case of nested cross-validation. When using this approach, people train models on the training set using a range of hyperparameter values, and use the validation set to evaluate the performance of each set of hyperparameter values. The model with the best-performing hyperparameter values is then used to make predictions on the test set. The performance of the model on the test set is used as the final indicator of the model-building process's performance. This matters because the model never sees the test set during training, including during hyperparameter tuning, so no information can leak from the test set that would let the model learn patterns present in it."
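The corrected workflow can be illustrated with a minimal base-R sketch. Several elements are assumptions for illustration only: the simulated data, the 60/20/20 split, and the polynomial degree standing in for a hyperparameter. The point is the order of operations. The validation set drives tuning, and the test set is used exactly once, after tuning is finished.

```r
set.seed(42)
# Hypothetical simulated data: a noisy sine curve
n <- 300
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)

# Shuffle once, then split 60/20/20 into training, validation, and test sets
idx <- sample(n)
train <- dat[idx[1:180], ]
valid <- dat[idx[181:240], ]
test  <- dat[idx[241:300], ]

mse <- function(model, newdata) mean((newdata$y - predict(model, newdata))^2)

# Tune the hyperparameter (polynomial degree): train on the training set,
# score each candidate on the validation set
degrees <- 1:8
valid_mse <- sapply(degrees, function(d) {
  mse(lm(y ~ poly(x, d), data = train), valid)
})
best_degree <- degrees[which.min(valid_mse)]

# The test set is touched only now, to estimate the final model's performance
final_model <- lm(y ~ poly(x, best_degree), data = train)
test_mse <- mse(final_model, test)
print(best_degree)
print(test_mse)
```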