rf-finalize

tidymodels

statlearning

template

string

Published

May 17, 2023

Exercise

Fit a predictive model using this model formula:

body_mass_g ~ . (dataset: palmerpenguins::penguins).

Report the RMSE on the test sample.

Hints:

- Tune mtry.
- Use cross-validation.
- Use default values unless stated otherwise.
- Set the random seed to 42.
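
For reference, the RMSE requested here is the square root of the mean squared difference between observed and predicted values. A minimal base-R sketch follows; the helper name `rmse_manual` and the toy numbers are made up for illustration and are not part of the exercise:

```r
# Hypothetical helper: root mean squared error of predictions vs. observed values
rmse_manual <- function(truth, estimate) {
  sqrt(mean((truth - estimate)^2))
}

# Toy body masses in grams (illustrative values, not taken from the penguins data)
truth    <- c(3500, 4200, 5000)
estimate <- c(3600, 4100, 5300)

rmse_manual(truth, estimate)
```

In the solution, yardstick reports this same quantity as the `rmse` row of `collect_metrics()`.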

Solution

# Setup:
library(tidymodels)

── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──

✔ broom        1.0.5     ✔ recipes      1.0.8
✔ dials        1.2.0     ✔ rsample      1.2.0
✔ dplyr        1.1.3     ✔ tibble       3.2.1
✔ ggplot2      3.4.4     ✔ tidyr        1.3.0
✔ infer        1.0.5     ✔ tune         1.1.2
✔ modeldata    1.2.0     ✔ workflows    1.1.3
✔ parsnip      1.1.1     ✔ workflowsets 1.0.1
✔ purrr        1.0.2     ✔ yardstick    1.2.0

── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ purrr::discard() masks scales::discard()
✖ dplyr::filter()  masks stats::filter()
✖ dplyr::lag()     masks stats::lag()
✖ recipes::step()  masks stats::step()
• Search for functions across packages at https://www.tidymodels.org/find/

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.0     ✔ readr     2.1.4
✔ lubridate 1.9.3     ✔ stringr   1.5.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ readr::col_factor() masks scales::col_factor()
✖ purrr::discard()    masks scales::discard()
✖ dplyr::filter()     masks stats::filter()
✖ stringr::fixed()    masks recipes::fixed()
✖ dplyr::lag()        masks stats::lag()
✖ readr::spec()       masks yardstick::spec()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(tictoc)  # timing


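# Data:
# Note: this CSV contains an index column `rownames`, which `body_mass_g ~ .` treats as a predictor.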
d_path <- "https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv"
d <- read_csv(d_path)

Rows: 344 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): species, island, sex
dbl (6): rownames, bill_length_mm, bill_depth_mm, flipper_length_mm, body_ma...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# remove rows with NA in the outcome variable:
d <- d %>% 
  drop_na(body_mass_g)


set.seed(42)
d_split <- initial_split(d)
d_train <- training(d_split)
d_test <- testing(d_split)


# model:
mod_rf <-
  rand_forest(mode = "regression",
              mtry = tune())


# cv:
set.seed(42)
rsmpl <- vfold_cv(d_train)


# recipe:
rec_plain <- 
  recipe(body_mass_g ~  ., data = d_train) %>% 
  step_impute_bag(all_predictors())


# workflow:
wf1 <-
  workflow() %>% 
  add_model(mod_rf) %>% 
  add_recipe(rec_plain)


# tuning:
tic()
wf1_fit <-
  wf1 %>% 
  tune_grid(
    resamples = rsmpl)

i Creating pre-processing data to finalize unknown parameter: mtry

toc()

23.078 sec elapsed

# best candidate:
show_best(wf1_fit)

Warning: No value of `metric` was given; metric 'rmse' will be used.

# A tibble: 5 × 7
   mtry .metric .estimator  mean     n std_err .config             
  <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>               
1     2 rmse    standard    282.    10   11.1  Preprocessor1_Model5
2     3 rmse    standard    282.    10   10.6  Preprocessor1_Model7
3     8 rmse    standard    282.    10    9.84 Preprocessor1_Model2
4     5 rmse    standard    283.    10    9.41 Preprocessor1_Model3
5     4 rmse    standard    283.    10    9.95 Preprocessor1_Model4

# finalize wf:
wf1_final <-
  wf1 %>% 
  finalize_workflow(select_best(wf1_fit))

Warning: No value of `metric` was given; metric 'rmse' will be used.

wf1_fit_final <-
  wf1_final %>% 
  last_fit(d_split)


# Model performance on the test set:
collect_metrics(wf1_fit_final)

# A tibble: 2 × 4
  .metric .estimator .estimate .config             
  <chr>   <chr>          <dbl> <chr>               
1 rmse    standard     327.    Preprocessor1_Model1
2 rsq     standard       0.817 Preprocessor1_Model1

The RMSE on the test set is therefore about 327 (in grams, the unit of body_mass_g).

Caution: step_impute_knn seems to have problems when the data contain character variables.
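
One possible workaround, sketched here as an assumption rather than a tested fix for `step_impute_knn`, is to convert character columns to factors before building the recipe. The toy data frame `d_toy` is made up for illustration:

```r
# Toy data with character columns (illustrative, not the penguins data)
d_toy <- data.frame(
  species = c("Adelie", "Gentoo", "Adelie"),
  sex = c("male", NA, "female"),
  body_mass_g = c(3750, 5000, 3450),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor; leave other columns unchanged
is_chr <- vapply(d_toy, is.character, logical(1))
d_toy[is_chr] <- lapply(d_toy[is_chr], factor)

str(d_toy)
```

Missing values stay missing after the conversion, so an imputation step is still needed. Inside a recipe, `step_string2factor()` from recipes serves a similar purpose; whether it resolves the issue noted above has not been verified here.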
