tidymodels-vorlage3

tidymodels
statlearning
template
string
Published

November 15, 2023

Exercise

Write a prototypical analysis for a predictive model that can serve as a template for analyses of this kind.

Do not use resampling or tuning.

Notes:

  • Fit a single model.
  • Do not tune any model parameters.
  • Do not use cross-validation.
  • Use default values unless stated otherwise.
  • Set the random seed to 42.











Solution

# Setup:
library(tidymodels)
── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──
✔ broom        1.0.5     ✔ recipes      1.0.8
✔ dials        1.2.0     ✔ rsample      1.2.0
✔ dplyr        1.1.3     ✔ tibble       3.2.1
✔ ggplot2      3.4.4     ✔ tidyr        1.3.0
✔ infer        1.0.5     ✔ tune         1.1.2
✔ modeldata    1.2.0     ✔ workflows    1.1.3
✔ parsnip      1.1.1     ✔ workflowsets 1.0.1
✔ purrr        1.0.2     ✔ yardstick    1.2.0
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ purrr::discard() masks scales::discard()
✖ dplyr::filter()  masks stats::filter()
✖ dplyr::lag()     masks stats::lag()
✖ recipes::step()  masks stats::step()
• Dig deeper into tidy modeling with R at https://www.tmwr.org
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.0     ✔ readr     2.1.4
✔ lubridate 1.9.3     ✔ stringr   1.5.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ readr::col_factor() masks scales::col_factor()
✖ purrr::discard()    masks scales::discard()
✖ dplyr::filter()     masks stats::filter()
✖ stringr::fixed()    masks recipes::fixed()
✖ dplyr::lag()        masks stats::lag()
✖ readr::spec()       masks yardstick::spec()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tictoc)  # timing
library(easystats)   # counting NAs
# Attaching packages: easystats 0.6.0 (red = needs update)
✔ bayestestR  0.13.1   ✔ correlation 0.8.4 
✔ datawizard  0.9.0    ✔ effectsize  0.8.6 
✔ insight     0.19.6   ✔ modelbased  0.8.6 
✔ performance 0.10.8   ✔ parameters  0.21.3
✔ report      0.5.7    ✖ see         0.8.0 

Restart the R-Session and update packages in red with `easystats::easystats_update()`.
# Data:
d_path <- "https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv"
d <- read_csv(d_path)
Rows: 344 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): species, island, sex
dbl (6): rownames, bill_length_mm, bill_depth_mm, flipper_length_mm, body_ma...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
set.seed(42)
d_split <- initial_split(d)
d_train <- training(d_split)
d_test <- testing(d_split)
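
The penguins data contain missing values, which the `read_csv()` summary does not reveal. Counting them per split before writing the recipe can save some debugging later. The following is a minimal sketch, not part of the original solution: it assumes `modeldata::penguins` as a stand-in for the CSV read above, and `p_split` and `count_na()` are hypothetical names introduced for illustration. By default, `initial_split()` assigns 3/4 of the rows to the training set.

```r
library(tidymodels)

# Assumption: modeldata::penguins stands in for the CSV used in this post
set.seed(42)
p_split <- initial_split(modeldata::penguins)

# Hypothetical helper: one row, one column per variable,
# each cell holding the number of missing values in that variable
count_na <- function(data) {
  summarise(data, across(everything(), ~ sum(is.na(.x))))
}

count_na(training(p_split))
count_na(testing(p_split))
```

If the test set shows missing values in predictors, the recipe has to handle them in a way that also applies to new data.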


# model:
mod1 <-
  rand_forest(mode = "regression")


# cv:
# No resampling here, as the exercise requires; only re-seed before fitting.
set.seed(42)


# recipe:
rec1 <- recipe(body_mass_g ~  ., data = d_train) |> 
  step_unknown(all_nominal_predictors(), new_level = "NA") |> 
  step_naomit(all_predictors()) |> 
  step_dummy(all_nominal_predictors()) |> 
  step_zv(all_predictors()) |> 
  step_normalize(all_predictors()) 



# workflow:
wf1 <-
  workflow() %>% 
  add_model(mod1) %>% 
  add_recipe(rec1)


# fit (no tuning): train on the training set, evaluate once on the test set:
tic()
wf1_fit <-
  wf1 %>% 
  last_fit(split = d_split)
→ A | error:   Missing data in columns: bill_length_mm, bill_depth_mm, flipper_length_mm.
There were issues with some computations   A: x1
There were issues with some computations   A: x1
Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
information.
toc()
0.594 sec elapsed
collect_metrics(wf1_fit)
NULL
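
The error names only numeric predictors, and it occurs even though the recipe contains `step_naomit()`. A likely explanation is that `step_naomit()` has `skip = TRUE` by default: rows are removed when the recipe is prepped on the training data, but the step is skipped when the recipe is applied to new data such as the test set. The toy example below is a minimal sketch of that behavior; the data frames `toy_train` and `toy_test` are made up for illustration.

```r
library(recipes)

# Made-up data: one missing predictor value in each set
toy_train <- data.frame(y = c(1, 2, 3, 4), x = c(10, NA, 30, 40))
toy_test  <- data.frame(y = c(5, 6),       x = c(NA, 60))

rec_toy <- recipe(y ~ x, data = toy_train) |>
  step_naomit(all_predictors())   # skip = TRUE is the default

prepped_toy <- prep(rec_toy)

# Training data: the row with the missing x is dropped
train_baked <- bake(prepped_toy, new_data = NULL)

# New data: the step is skipped, so the missing value survives
test_baked <- bake(prepped_toy, new_data = toy_test)

nrow(train_baked)
nrow(test_baked)
sum(is.na(test_baked$x))
```

If this is what happens here, the model is fitted on complete rows but later asked to predict rows that still contain missing values.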

As a check, here is the prepped and baked recipe:

rec1_prepped <- prep(rec1)
d_train_baked <- bake(rec1_prepped, new_data = NULL)
d_train_baked |> 
  head()
# A tibble: 6 × 12
  rownames bill_length_mm bill_depth_mm flipper_length_mm    year body_mass_g
     <dbl>          <dbl>         <dbl>             <dbl>   <dbl>       <dbl>
1   -1.24          -1.53          0.386            -0.794 -1.29          3450
2    1.45           1.32          0.386            -0.365  1.14          3675
3   -0.212          0.401        -1.97              0.707 -1.29          4500
4   -0.993          0.343         0.887            -0.294 -0.0757        4150
5    0.530          0.879        -0.566             2.07  -0.0757        5800
6   -0.281         -0.957         0.787            -1.15   1.14          3650
# ℹ 6 more variables: species_Chinstrap <dbl>, species_Gentoo <dbl>,
#   island_Dream <dbl>, island_Torgersen <dbl>, sex_male <dbl>, sex_NA. <dbl>
describe_distribution(d_train_baked)
Variable          |      Mean |     SD |     IQR |              Range | Skewness | Kurtosis |   n | n_Missing
-------------------------------------------------------------------------------------------------------------
rownames          | -5.63e-17 |   1.00 |    1.70 |      [-1.72, 1.68] |    -0.01 |    -1.21 | 257 |         0
bill_length_mm    | -2.97e-16 |   1.00 |    1.68 |      [-2.28, 2.98] |     0.01 |    -0.79 | 257 |         0
bill_depth_mm     |  2.71e-16 |   1.00 |    1.60 |      [-2.02, 2.19] |    -0.11 |    -0.87 | 257 |         0
flipper_length_mm | -9.83e-16 |   1.00 |    1.64 |      [-1.94, 2.07] |     0.32 |    -1.02 | 257 |         0
year              | -6.89e-14 |   1.00 |    2.43 |      [-1.29, 1.14] |    -0.12 |    -1.51 | 257 |         0
body_mass_g       |   4200.97 | 792.54 | 1212.50 | [2700.00, 6300.00] |     0.49 |    -0.69 | 257 |         0
species_Chinstrap | -2.24e-17 |   1.00 |    0.00 |      [-0.50, 1.98] |     1.49 |     0.22 | 257 |         0
species_Gentoo    |  1.64e-17 |   1.00 |    2.07 |      [-0.76, 1.31] |     0.56 |    -1.70 | 257 |         0
island_Dream      | -5.50e-17 |   1.00 |    2.08 |      [-0.75, 1.34] |     0.60 |    -1.66 | 257 |         0
island_Torgersen  |  1.72e-17 |   1.00 |    0.00 |      [-0.41, 2.43] |     2.04 |     2.18 | 257 |         0
sex_male          | -5.86e-17 |   1.00 |    2.00 |      [-0.96, 1.03] |     0.07 |    -2.01 | 257 |         0
sex_NA.           |  1.45e-17 |   1.00 |    0.00 |      [-0.15, 6.46] |     6.35 |    38.63 | 257 |         0
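
The table shows that the baked training data contain no missing values (n_Missing is 0 for every variable, n = 257), so the failure in `last_fit()` presumably stems from the test set. One possible repair, sketched here under assumptions, is to drop rows without an outcome before splitting and to impute numeric predictors instead of omitting rows, since imputation steps are also applied to new data. This is not the original solution: `modeldata::penguins` stands in for the CSV (it has no `rownames` or `year` column), the names `d2`, `rec2` and `wf2` are hypothetical, and the `ranger` package is assumed to be installed as the default engine of `rand_forest()`.

```r
library(tidymodels)

# Assumption: modeldata::penguins stands in for the CSV used in this post
d2 <- modeldata::penguins |>
  tidyr::drop_na(body_mass_g)   # rows without an outcome cannot be fitted or scored

set.seed(42)
d2_split <- initial_split(d2)

rec2 <- recipe(body_mass_g ~ ., data = training(d2_split)) |>
  step_unknown(all_nominal_predictors(), new_level = "NA") |>
  step_impute_median(all_numeric_predictors()) |>   # replaces step_naomit()
  step_dummy(all_nominal_predictors()) |>
  step_zv(all_predictors()) |>
  step_normalize(all_predictors())

wf2 <- workflow() |>
  add_model(rand_forest(mode = "regression")) |>
  add_recipe(rec2)

# Fit on the training set, evaluate once on the test set
set.seed(42)
wf2_fit <- last_fit(wf2, split = d2_split)
collect_metrics(wf2_fit)
```

Imputation keeps every test row, so each one receives a prediction. Whether median imputation suits a given analysis is a modeling decision that this sketch does not settle.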
