purrr-map06

programming

loop

Published

October 24, 2022

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Exercise

Create a table with the following columns:

ID column: \(1, 2, \ldots, 10\)
A column named ds (ds as in the plural of dataset) that is nested and holds, per element, one of the following datasets: mtcars, iris, ChickWeight, ToothGrowth (all included in R)

Compute a column that counts the number of columns of each dataset in ds!

Solution

Here are some datasets, collected in a list:

ds <- list(mtcars = mtcars, iris = iris, chickweight =  ChickWeight, toothgrowth = ToothGrowth)

From this list we create a table with a list column for the data:

d <- 
  tibble(id = seq_along(ds),  # one id per dataset; also safe for an empty list
         ds = ds)

Now we run the function ncol on each element of ds. So we need a kind of loop, and map takes care of that for us. Many functions in R are "automatically looped"; this is called vectorized. Vectorized functions are applied to each element of a vector.

An example of a vectorized function is the function +:

x <- c(1,2,3)
y <- c(10, 20, 30)
x + y

[1] 11 22 33

Incidentally, one could also write:

`+`(x, y)

[1] 11 22 33

This shows that + is an ordinary function.
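To make "automatically looped" concrete, here is a minimal sketch in base R that writes out the element-wise sum as an explicit loop. The variable names z and i are chosen for illustration only and do not appear in the exercise.

```r
x <- c(1, 2, 3)
y <- c(10, 20, 30)

# Element-wise sum written as an explicit loop
z <- numeric(length(x))
for (i in seq_along(x)) {
  z[i] <- x[i] + y[i]
}

# The loop result matches the vectorized call
identical(z, x + y)

# Because `+` is an ordinary function, it can also be handed to mapply(),
# which calls it once per pair of elements
mapply(`+`, x, y)
```

The vectorized x + y does the same work without the bookkeeping, which is why an explicit loop is rarely needed for arithmetic in R.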

Back to the actual task. ncol, however, is not vectorized, so we have to build a loop around it, and that is what map does.

d2 <- 
  d %>% 
  mutate(n_col = map(ds, ncol)) 

head(d2)

# A tibble: 4 × 3
     id ds                   n_col       
  <int> <named list>         <named list>
1     1 <df [32 × 11]>       <int [1]>   
2     2 <df [150 × 5]>       <int [1]>   
3     3 <nfnGrpdD [578 × 4]> <int [1]>   
4     4 <df [60 × 3]>        <int [1]>
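What map does in this step can be sketched as an explicit loop in base R. This is only an illustration under the assumption that the same four datasets are used; the name n_col_loop is made up for this sketch.

```r
# The same datasets as in the solution, collected in a named list
ds <- list(mtcars = mtcars, iris = iris,
           chickweight = ChickWeight, toothgrowth = ToothGrowth)

# Called on the list itself, ncol() does not loop over the elements:
# a plain list has no dim attribute, so the result is NULL
ncol(ds)

# An explicit loop that collects one result per element in a list
n_col_loop <- vector("list", length(ds))
names(n_col_loop) <- names(ds)
for (i in seq_along(ds)) {
  n_col_loop[[i]] <- ncol(ds[[i]])
}

# Base R's lapply() builds the same named list
identical(n_col_loop, lapply(ds, ncol))
```

Like this loop, map returns a list with one element per input, which is why n_col shows up as a list column in the output.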

Let's unnest n_col:

d2 %>% 
  unnest(n_col)

# A tibble: 4 × 3
     id ds                   n_col
  <int> <named list>         <int>
1     1 <df [32 × 11]>          11
2     2 <df [150 × 5]>           5
3     3 <nfnGrpdD [578 × 4]>     4
4     4 <df [60 × 3]>            3

We can also tell map right away to return a number (a double, i.e. a real number) rather than a list, which saves us the unnesting:

d %>% 
  mutate(n_col = map_dbl(ds, ncol))

# A tibble: 4 × 3
     id ds                   n_col
  <int> <named list>         <dbl>
1     1 <df [32 × 11]>          11
2     2 <df [150 × 5]>           5
3     3 <nfnGrpdD [578 × 4]>     4
4     4 <df [60 × 3]>            3
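Since ncol returns an integer, the integer variant map_int from purrr might be a closer fit than map_dbl. The following is a minimal sketch on the plain list rather than inside mutate; the name n_col_int is made up for this illustration.

```r
library(purrr)

# The same datasets as in the solution, collected in a named list
ds <- list(mtcars = mtcars, iris = iris,
           chickweight = ChickWeight, toothgrowth = ToothGrowth)

# map_int() returns an integer vector instead of a list
n_col_int <- map_int(ds, ncol)
n_col_int
typeof(n_col_int)

# Base R counterpart: vapply() with an integer template of length 1
vapply(ds, ncol, integer(1))
```

Used inside mutate, this would give an n_col column of type <int> instead of <dbl>; the values stay the same.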
