→ A | warning: 21 samples were requested but there were 12 rows in the data. 12 will be used.
→ B | warning: 30 samples were requested but there were 12 rows in the data. 12 will be used.
→ C | warning: 40 samples were requested but there were 12 rows in the data. 12 will be used.
There were issues with some computations A: x50 B: x50 C: x50
toc()
20.49 sec elapsed
fit_tree
# Tuning results
# 2-fold cross-validation
# A tibble: 2 × 4
splits id .metrics .notes
<list> <chr> <list> <list>
1 <split [12/12]> Fold1 <tibble [125 × 7]> <tibble [75 × 3]>
2 <split [12/12]> Fold2 <tibble [125 × 7]> <tibble [75 × 3]>
There were issues with some computations:
- Warning(s) x50: 21 samples were requested but there were 12 rows in the data. 12 ...
- Warning(s) x50: 30 samples were requested but there were 12 rows in the data. 12 ...
- Warning(s) x50: 40 samples were requested but there were 12 rows in the data. 12 ...
Run `show_notes(.Last.tune.result)` for more information.
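The repeated "samples were requested but there were 12 rows" warnings arise because some `min_n` candidates in the default grid (21, 30, and 40, as shown above) exceed the 12 rows available in each analysis set of the 2-fold split. One possible way to avoid them (a sketch, not part of the original solution) is to cap the `min_n` range before building the grid:

```{r}
library(tidymodels)

# Sketch (assumption, not from the original post): bound min_n so that
# no candidate exceeds the ~12 rows in each 2-fold analysis set.
mod_tree <- decision_tree(
  mode = "classification",
  cost_complexity = tune(),
  tree_depth = tune(),
  min_n = tune()
)

params <- extract_parameter_set_dials(mod_tree)
params <- update(params, min_n = min_n(range = c(2L, 10L)))

tune_grid_small <- grid_regular(params, levels = 5)
range(tune_grid_small$min_n)  # all candidates now fit within 12 rows
```

The warnings are harmless here (rpart simply caps the value at the row count), so this step is optional.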
→ A | warning: There were 11 warnings in `dplyr::mutate()`.
The first warning was:
ℹ In argument: `model = iter(...)`.
Caused by warning:
! 21 samples were requested but there were 12 rows in the data. 12 will be used.
ℹ Run `dplyr::last_dplyr_warnings()` to see the 10 remaining warnings.
→ B | warning: There were 11 warnings in `dplyr::mutate()`.
The first warning was:
ℹ In argument: `model = iter(...)`.
Caused by warning:
! 30 samples were requested but there were 12 rows in the data. 12 will be used.
ℹ Run `dplyr::last_dplyr_warnings()` to see the 10 remaining warnings.
→ C | warning: There were 11 warnings in `dplyr::mutate()`.
The first warning was:
ℹ In argument: `model = iter(...)`.
Caused by warning:
! 40 samples were requested but there were 12 rows in the data. 12 will be used.
ℹ Run `dplyr::last_dplyr_warnings()` to see the 10 remaining warnings.
There were issues with some computations A: x50 B: x50 C: x50
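The final counts (x50 per letter) are consistent with the grid: each of the three offending `min_n` values (21, 30, 40, per the warnings above) appears in every combination of the other two tuned parameters, and each grid row is evaluated once per fold:

```{r}
# Why 50 warnings per offending min_n value:
levels_cost_complexity <- 5  # candidates for cost_complexity
levels_tree_depth      <- 5  # candidates for tree_depth
folds                  <- 2  # v = 2 cross-validation

levels_cost_complexity * levels_tree_depth * folds  # 50
```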
As one can see, model performance in the test sample is worse than in the train and validation samples; a typical finding.
Categories:
statlearning
trees
tidymodels
string
Source Code
---
exname: tidymodels-tree1
expoints: 1
extype: string
exsolution: NA
categories:
- statlearning
- trees
- tidymodels
- string
date: '2023-11-08'
slug: tidymodels-tree1
title: tidymodels-tree1
---

```{r}
library(tidymodels)
```

# Exercise

Compute the following predictive models and compare their performance:

1. Decision tree
2. Bagging (bootstrapped trees)

Model formula: `am ~ .` (dataset: `mtcars`)

Report the model performance (ROC-AUC).

Hints:

- Tune all parameters (that the engine offers).
- Use defaults where not stated otherwise.
- Use a $v=2$-fold cross-validation (because the sample is so small).
- Mind the [usual hints](https://datenwerk.netlify.app/hinweise).

</br></br></br></br></br></br></br></br></br></br>

# Solution

## Setup

```{r}
library(tidymodels)
data(mtcars)
library(tictoc)  # timing
library(baguette)
```

For classification, tidymodels requires a nominal outcome variable, not a numeric one:

```{r}
mtcars <-
  mtcars %>%
  mutate(am = factor(am))
```

## Splitting the data

```{r}
d_split <- initial_split(mtcars)
d_train <- training(d_split)
d_test <- testing(d_split)
```

## Model(s)

```{r}
mod_tree <-
  decision_tree(
    mode = "classification",
    cost_complexity = tune(),
    tree_depth = tune(),
    min_n = tune()
  )

mod_bag <-
  bag_tree(
    mode = "classification",
    cost_complexity = tune(),
    tree_depth = tune(),
    min_n = tune()
  )
```

## Recipe(s)

```{r}
rec_plain <- recipe(am ~ ., data = d_train)
```

## Resampling

```{r}
rsmpl <- vfold_cv(d_train, v = 2)
```

## Workflows

```{r}
wf_tree <-
  workflow() %>%
  add_recipe(rec_plain) %>%
  add_model(mod_tree)
```

```{r}
wf_bag <-
  workflow() %>%
  add_recipe(rec_plain) %>%
  add_model(mod_bag)
```

## Tuning/Fitting

Tuning grid:

```{r}
tune_grid <- grid_regular(extract_parameter_set_dials(mod_tree), levels = 5)
tune_grid
```

Since both models have the same tuning parameters, we only need to create one grid.

```{r}
tic()
fit_tree <-
  tune_grid(
    object = wf_tree,
    grid = tune_grid,
    metrics = metric_set(roc_auc),
    resamples = rsmpl
  )
toc()
fit_tree
```

```{r}
tic()
fit_bag <-
  tune_grid(
    object = wf_bag,
    grid = tune_grid,
    metrics = metric_set(roc_auc),
    resamples = rsmpl
  )
toc()
```

## Best candidate

```{r}
show_best(fit_tree)
```

```{r}
show_best(fit_bag)
```

Bagging achieved clearly better model performance (in the validation samples) than the decision tree model.

## Finalizing

```{r}
wf_best_finalized <-
  wf_bag %>%
  finalize_workflow(select_best(fit_bag))
```

## Last Fit

```{r}
final_fit <-
  last_fit(object = wf_best_finalized, d_split)

collect_metrics(final_fit)
```

As one can see, model performance in the test sample is worse than in the train and validation samples; a typical finding.
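As a side note on what `bag_tree()` does conceptually: bagging refits the model on many bootstrap resamples and aggregates the predictions. A minimal base-R illustration (with the sample mean standing in for a tree, purely to show the resample-and-aggregate mechanics):

```{r}
set.seed(42)

# Bagging in miniature: refit a "model" (here just the sample mean)
# on B bootstrap resamples, then aggregate by averaging.
bag_estimate <- function(y, B = 200) {
  boot_means <- replicate(B, mean(sample(y, replace = TRUE)))
  mean(boot_means)
}

y <- mtcars$mpg
bag_estimate(y)  # close to mean(y)
```

With trees as the base learner, aggregation averages class probabilities across the bootstrap fits, which reduces the variance that makes single decision trees unstable.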