bike01

statlearning

tidymodels

num

Published

May 17, 2023

Exercise

Can the number of bikes currently rented out by a bike-sharing provider be predicted from the temperature?

This exercise investigates that question.

You can download the data from the UCI website.

We use the day dataset.

Fit a linear model with the number of currently rented bikes as the dependent variable (DV) and the current temperature as the independent variable (IV).

Report the MSE.

Hints
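As a reminder, the mean squared error (MSE) and its square root (RMSE) are commonly defined as follows, where the symbols are notation introduced here: $y_i$ is the observed count, $\hat{y}_i$ the model prediction, and $n$ the number of observations in the test set.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```

The RMSE is in the same unit as the outcome (rented bikes), while the MSE is in squared units.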

Solution

library(tidymodels)
library(tidyverse)

d <- read.csv("/Users/sebastiansaueruser/datasets/Bike-Sharing-Dataset/day.csv")

glimpse(d)

Rows: 731
Columns: 16
$ instant    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
$ dteday     <chr> "2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "20…
$ season     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ yr         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ mnth       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ holiday    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,…
$ weekday    <int> 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4,…
$ workingday <int> 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1,…
$ weathersit <int> 2, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2,…
$ temp       <dbl> 0.3441670, 0.3634780, 0.1963640, 0.2000000, 0.2269570, 0.20…
$ atemp      <dbl> 0.3636250, 0.3537390, 0.1894050, 0.2121220, 0.2292700, 0.23…
$ hum        <dbl> 0.805833, 0.696087, 0.437273, 0.590435, 0.436957, 0.518261,…
$ windspeed  <dbl> 0.1604460, 0.2485390, 0.2483090, 0.1602960, 0.1869000, 0.08…
$ casual     <int> 331, 131, 120, 108, 82, 88, 148, 68, 54, 41, 43, 25, 38, 54…
$ registered <int> 654, 670, 1229, 1454, 1518, 1518, 1362, 891, 768, 1280, 122…
$ cnt        <int> 985, 801, 1349, 1562, 1600, 1606, 1510, 959, 822, 1321, 126…

Data split

set.seed(42)
split_vec <- initial_split(d, strata = cnt)

d_train <- training(split_vec)
d_test <- testing(split_vec)
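To see what the split does, here is a minimal sketch on simulated data (the data frame `toy` and its column `y` are hypothetical and not part of the exercise). It spells out the default `prop = 3/4` of `initial_split()`; with a numeric `strata` variable, rsample bins the variable before sampling, so the training share is only approximately 75%.

```r
library(rsample)

set.seed(42)
toy <- data.frame(y = rnorm(100))  # hypothetical toy data, not the bike data

# prop = 3/4 is the default; stated explicitly here for clarity
toy_split <- initial_split(toy, prop = 3/4, strata = y)

n_train <- nrow(training(toy_split))
n_test  <- nrow(testing(toy_split))

# Every row ends up in exactly one of the two sets
c(train = n_train, test = n_test)
```

This matches the 547/184 split of the 731 rows shown in the fit output of this solution.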

Define recipe

rec1 <- 
  recipe(cnt ~ temp, data = d_train)

Define model

m1 <-
  linear_reg()
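`linear_reg()` relies on parsnip defaults. The following sketch spells them out under the assumption that ordinary least squares via `lm` is intended; the name `m1_explicit` is made up for illustration.

```r
library(parsnip)

# Same kind of model specification as linear_reg(), with engine and mode made explicit
m1_explicit <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")

m1_explicit
```

Stating the engine explicitly makes it easier to swap in another one later without changing the rest of the workflow.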

Workflow

wf1 <-
  workflow() %>% 
  add_model(m1) %>% 
  add_recipe(rec1)

Fit

fit1 <- last_fit(wf1, split_vec)
fit1

# Resampling results
# Manual resampling 
# A tibble: 1 × 6
  splits            id               .metrics .notes   .predictions .workflow 
  <list>            <chr>            <list>   <list>   <list>       <list>    
1 <split [547/184]> train/test split <tibble> <tibble> <tibble>     <workflow>
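The object returned by `last_fit()` also stores the fitted workflow, so the regression coefficients can be inspected. Below is a self-contained sketch using the built-in `mtcars` data as a stand-in (the `car_*` names are hypothetical); the same chain of calls should apply to `fit1`.

```r
library(tidymodels)

set.seed(42)
car_split <- initial_split(mtcars)

car_wf <- workflow() %>%
  add_model(linear_reg()) %>%
  add_formula(mpg ~ wt)

# Fit on the training part, evaluate on the test part
car_fit <- last_fit(car_wf, car_split)

# Pull out the fitted parsnip model and tidy its coefficients
car_coefs <- car_fit %>%
  extract_workflow() %>%
  extract_fit_parsnip() %>%
  tidy()

car_coefs
```

For the bike data, the slope would describe the expected change in `cnt` per unit of `temp`, which appears to be on a normalized scale in the `glimpse()` output.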

Model performance (metrics) in test set

fit1 %>% collect_metrics()

# A tibble: 2 × 4
  .metric .estimator .estimate .config             
  <chr>   <chr>          <dbl> <chr>               
1 rmse    standard    1509.    Preprocessor1_Model1
2 rsq     standard       0.411 Preprocessor1_Model1

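# The first .estimate value returned by collect_metrics() is the RMSE, not the MSE
rmse_test <- fit1 %>% collect_metrics() %>% pluck(3, 1)
rmse_test

[1] 1509.477

Solution: 1509.4768321 (test-set RMSE; the corresponding MSE is its square, roughly 2.28 million)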
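`collect_metrics()` reports RMSE and R² by default, not the MSE. As a sketch of how the MSE can be obtained from the test-set predictions, the following example runs the same kind of pipeline on simulated data (all `toy*` names and the simulated values are hypothetical, not the UCI data) and checks that the square root of the hand-computed MSE agrees with the reported RMSE.

```r
library(tidymodels)

# Simulated stand-in for the bike data (made-up values)
set.seed(42)
toy <- tibble(temp = runif(200))
toy$cnt <- 1000 + 6000 * toy$temp + rnorm(200, sd = 1000)

toy_split <- initial_split(toy, strata = cnt)

toy_wf <- workflow() %>%
  add_model(linear_reg()) %>%
  add_formula(cnt ~ temp)

toy_fit <- last_fit(toy_wf, toy_split)

# Test-set predictions: one row per test observation, with .pred and cnt
toy_preds <- collect_predictions(toy_fit)

# MSE: mean of the squared prediction errors in the test set
toy_mse <- mean((toy_preds$cnt - toy_preds$.pred)^2)

# RMSE as reported by collect_metrics()
toy_rmse <- toy_fit %>%
  collect_metrics() %>%
  filter(.metric == "rmse") %>%
  pull(.estimate)

# sqrt(MSE) should agree with the reported RMSE
all.equal(sqrt(toy_mse), toy_rmse)
```

Applied to `fit1`, the same two lines (`collect_predictions()` followed by the mean of squared errors) would give the MSE for the bike data.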
