tidymodels-remove-na2

tidymodels
statlearning
template
string
Published

November 15, 2023

Exercise

The following recipe is intended to remove missing values from the penguins data set. However, it does not accomplish this.

Find the error and correct the recipe.

Hints:

  • Use tidymodels.
  • Use default values unless stated otherwise.
  • Set the random seed to 42.

Solution

# Setup:
library(tidymodels)
── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──
✔ broom        1.0.5     ✔ recipes      1.0.8
✔ dials        1.2.0     ✔ rsample      1.2.0
✔ dplyr        1.1.3     ✔ tibble       3.2.1
✔ ggplot2      3.4.4     ✔ tidyr        1.3.0
✔ infer        1.0.5     ✔ tune         1.1.2
✔ modeldata    1.2.0     ✔ workflows    1.1.3
✔ parsnip      1.1.1     ✔ workflowsets 1.0.1
✔ purrr        1.0.2     ✔ yardstick    1.2.0
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ purrr::discard() masks scales::discard()
✖ dplyr::filter()  masks stats::filter()
✖ dplyr::lag()     masks stats::lag()
✖ recipes::step()  masks stats::step()
• Use suppressPackageStartupMessages() to eliminate package startup messages
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.0     ✔ readr     2.1.4
✔ lubridate 1.9.3     ✔ stringr   1.5.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ readr::col_factor() masks scales::col_factor()
✖ purrr::discard()    masks scales::discard()
✖ dplyr::filter()     masks stats::filter()
✖ stringr::fixed()    masks recipes::fixed()
✖ dplyr::lag()        masks stats::lag()
✖ readr::spec()       masks yardstick::spec()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(easystats)
# Attaching packages: easystats 0.6.0 (red = needs update)
✔ bayestestR  0.13.1   ✔ correlation 0.8.4 
✔ datawizard  0.9.0    ✔ effectsize  0.8.6 
✔ insight     0.19.6   ✔ modelbased  0.8.6 
✔ performance 0.10.8   ✔ parameters  0.21.3
✔ report      0.5.7    ✖ see         0.8.0 

Restart the R-Session and update packages in red with `easystats::easystats_update()`.
# Data:
d_path <- "https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv"
d <- read_csv(d_path)
Rows: 344 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): species, island, sex
dbl (6): rownames, bill_length_mm, bill_depth_mm, flipper_length_mm, body_ma...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Recipe as given in the exercise (step_naomit() is called without a column selector):
rec1 <- recipe(body_mass_g ~ ., data = d) |>
  step_naomit()

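As a check, here is the prepped and baked recipe. The output shows that missing values are still present: row 4 of the baked data contains NA, and the n_Missing column of describe_distribution() is not zero for several variables.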

rec1_prepped <- prep(rec1)
d_train_baked <- bake(rec1_prepped, new_data = NULL)
d_train_baked |> 
  head()
# A tibble: 6 × 9
  rownames species island   bill_length_mm bill_depth_mm flipper_length_mm sex  
     <dbl> <fct>   <fct>             <dbl>         <dbl>             <dbl> <fct>
1        1 Adelie  Torgers…           39.1          18.7               181 male 
2        2 Adelie  Torgers…           39.5          17.4               186 fema…
3        3 Adelie  Torgers…           40.3          18                 195 fema…
4        4 Adelie  Torgers…           NA            NA                  NA <NA> 
5        5 Adelie  Torgers…           36.7          19.3               193 fema…
6        6 Adelie  Torgers…           39.3          20.6               190 male 
# ℹ 2 more variables: year <dbl>, body_mass_g <dbl>
describe_distribution(d_train_baked)
Variable          |    Mean |     SD |     IQR |              Range | Skewness | Kurtosis |   n | n_Missing
-----------------------------------------------------------------------------------------------------------
rownames          |  172.50 |  99.45 |  172.50 |     [1.00, 344.00] |     0.00 |    -1.20 | 344 |         0
bill_length_mm    |   43.92 |   5.46 |    9.30 |     [32.10, 59.60] |     0.05 |    -0.88 | 342 |         2
bill_depth_mm     |   17.15 |   1.97 |    3.12 |     [13.10, 21.50] |    -0.14 |    -0.91 | 342 |         2
flipper_length_mm |  200.92 |  14.06 |   23.25 |   [172.00, 231.00] |     0.35 |    -0.98 | 342 |         2
year              | 2008.03 |   0.82 |    2.00 | [2007.00, 2009.00] |    -0.05 |    -1.50 | 344 |         0
body_mass_g       | 4201.75 | 801.95 | 1206.25 | [2700.00, 6300.00] |     0.47 |    -0.72 | 342 |         2
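
The check above indicates that the step had no effect. A likely explanation is that step_naomit() was called without a column selector, so it has no columns to inspect. The following is a minimal sketch of one possible correction. It uses a small made-up tibble (`toy`, a hypothetical stand-in for the penguins data) so that it runs without downloading anything, and it picks `everything()` as the selector, which is an assumption about the intended behavior; `all_predictors()` would be an alternative if only the predictors should be checked.

```r
library(recipes)
library(tibble)

# Hypothetical toy data standing in for a few penguins columns
toy <- tibble(
  body_mass_g    = c(3750, 3800, NA, 3450),
  bill_length_mm = c(39.1, 39.5, NA, 36.7),
  sex            = c("male", "female", NA, NA)
)

# As in the exercise: no selector, so the step has no columns to act on
rec_no_selector <- recipe(body_mass_g ~ ., data = toy) |>
  step_naomit()

# Possible fix: name the columns to check, here all of them
rec_fixed <- recipe(body_mass_g ~ ., data = toy) |>
  step_naomit(everything())

# bake(new_data = NULL) returns the training data as processed during prep()
baked_no_selector <- bake(prep(rec_no_selector), new_data = NULL)
baked_fixed       <- bake(prep(rec_fixed), new_data = NULL)

# Compare row counts and check for remaining missing values
nrow(baked_no_selector)
nrow(baked_fixed)
anyNA(baked_fixed)
```

Applied to the exercise, the same idea would mean replacing `step_naomit()` with `step_naomit(everything())` in `rec1` and then repeating the prep/bake check; the missing-value counts should drop to zero if the selector is the only problem.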
