filter-na4

2023
eda
na
string
Published: May 14, 2023

Task

Provide a visual overview of the missing values in the penguins dataset!

Solution

Setup

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
d_path <- "https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv"
d <- read_csv(d_path)
Rows: 344 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): species, island, sex
dbl (6): rownames, bill_length_mm, bill_depth_mm, flipper_length_mm, body_ma...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
nrow(d)
[1] 344
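nrow(d) reports all 344 rows. To see how many of them contain at least one missing value, one option is dplyr's if_any() inside filter(). The sketch below runs it on a small made-up tibble (toy is a hypothetical name, not part of the exercise) so the result can be checked by hand. For the penguins data, the same call would be d %>% filter(if_any(everything(), is.na)).

```r
library(dplyr)

# Made-up example data with a known pattern of missing values
# (stand-in for the penguins tibble d)
toy <- tibble(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  bill_length_mm = c(39.1, NA, 46.1, 50.0),
  sex            = c("male", NA, NA, "female")
)

# Keep only the rows in which at least one column is NA
toy_na_rows <- toy %>%
  filter(if_any(everything(), is.na))

nrow(toy_na_rows)  # → 2 (rows 2 and 3 contain NAs)
```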

Approach 1

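library(visdat)
# one column per variable, colored by data type; missing values (NA) are shown as a separate category
vis_dat(d)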
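visdat also provides vis_miss(). It distinguishes only between present and missing cells and annotates the plot with the share of missing values, so it is more focused on the task than vis_dat(). The sketch below uses a small made-up tibble (toy is a hypothetical name). In the exercise, you would call vis_miss(d) instead. Like vis_dat(), it returns a ggplot object, so the usual ggplot2 layers can be added to it.

```r
library(tibble)
library(visdat)

# Made-up example data; in the exercise, pass the penguins tibble d instead
toy <- tibble(
  bill_length_mm = c(39.1, NA, 46.1, 50.0),
  sex            = c("male", NA, NA, "female")
)

# Present vs. missing cells only, with the share of missing values annotated
p <- vis_miss(toy)
p
```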

Approach 2

d_na_only <- 
  d %>% 
  # number of NAs per row; pick(everything()) replaces cur_data(),
  # which has been deprecated since dplyr 1.1.0
  mutate(na_n = rowSums(is.na(pick(everything()))))
d_na_only %>% 
  ggplot(aes(x = na_n)) +
  geom_bar()
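The bar chart above counts missing values per row. A complementary view counts them per column, which shows which variables are affected. The sketch below is one way to do this with summarise(across()) and pivot_longer(). It runs on a small made-up tibble (toy and na_by_col are hypothetical names). In the exercise, d would take the place of toy.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up example data; in the exercise, use the penguins tibble d instead
toy <- tibble(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  bill_length_mm = c(39.1, NA, 46.1, 50.0),
  sex            = c("male", NA, NA, "female")
)

# One row per column of toy: how many of its values are missing?
na_by_col <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x)))) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "n_na")

# Horizontal bar chart, one bar per column, sorted by NA count
p <- na_by_col %>%
  ggplot(aes(x = n_na, y = reorder(variable, n_na))) +
  geom_col() +
  labs(x = "Number of missing values", y = NULL)
p
```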

