twitter06

textmining

twitter

programming

Published

October 28, 2022

Exercise

Download \(n=10^k\) tweets (with \(k=4\)) from Twitter via the Twitter API; each tweet should be addressed to a prominent person.

Use the following persons and their Twitter accounts:

Markus_Soeder
karl_lauterbach.

Preprocess the text data with basic text mining methods (tokenizing, removing stop words, removing numbers, …).

Then use the data to run a sentiment analysis.

Compare the results across all persons studied.

Solution

library(rtweet)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()  masks stats::filter()
✖ purrr::flatten() masks rtweet::flatten()
✖ dplyr::lag()     masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(tidytext)
library(lsa)  # stop words

Loading required package: SnowballC

library(SnowballC)  # Stemming

data(sentiws, package = "pradadata")

First, we authenticate and download the tweets:

source("/Users/sebastiansaueruser/credentials/hate-speech-analysis-v01-twitter.R")

auth <- rtweet_app(bearer_token = Bearer_Token)
auth_as(auth)  # use this token for the following API calls

# n = 1e4 tweets per account, as required by the exercise
tweets_to_kl_raw <- search_tweets("@karl_lauterbach", n = 1e4, include_rts = FALSE)
#write_rds(tweets_to_kl_raw, file = "tweets_to_kl.rds", compress = "gz")
tweets_to_ms_raw <- search_tweets("@Markus_Soeder", n = 1e4, include_rts = FALSE)
#write_rds(tweets_to_ms_raw, file = "tweets_to_ms.rds", compress = "gz")
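
The commented-out `write_rds()` lines hint at caching the download so that the API is not queried on every run. A minimal sketch of that idea follows. The helper `cached_download()` and the stand-in `fake_download()` are hypothetical (they are not part of rtweet), and base R's `saveRDS()`/`readRDS()` are used so that the sketch runs without API access:

```r
# Hypothetical helper: run `download_fun` only if no cached file exists yet.
cached_download <- function(path, download_fun) {
  if (file.exists(path)) {
    return(readRDS(path))          # reuse the stored result
  }
  result <- download_fun()         # e.g., a call to search_tweets()
  saveRDS(result, file = path)     # store it for the next run
  result
}

# Stand-in for the API call, so the sketch runs offline (made-up data)
fake_download <- function() data.frame(full_text = c("tweet one", "tweet two"))

cache_file <- tempfile(fileext = ".rds")
first  <- cached_download(cache_file, fake_download)  # "downloads" and saves
second <- cached_download(cache_file, fake_download)  # reads from disk
```

In the solution above, `download_fun` could be something like `function() search_tweets("@karl_lauterbach", n = 1e4, include_rts = FALSE)`.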

We wrap the preprocessing for each screen name in a function, which makes the later steps easier:

prepare_tweets <- function(tweets){
  
  tweets %>% 
    select(full_text) %>% 
    # remove URLs before tokenizing, since the word tokenizer splits them into fragments
    mutate(full_text = str_remove_all(full_text, "https?://\\S+")) %>% 
    unnest_tokens(output = word, input = full_text) %>% 
    anti_join(tibble(word = lsa::stopwords_de)) %>% 
    # set pure numbers and empty tokens to NA ...
    mutate(word = if_else(str_detect(word, "^[[:digit:]]+$|^\\s*$"), NA_character_, word)) %>% 
    # ... and drop them
    drop_na()
}
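
To see what the individual preprocessing steps do, here is a minimal sketch on made-up tweets. The objects `toy_tweets` and `toy_stopwords` are assumptions for illustration; the tiny stop word list merely stands in for `lsa::stopwords_de`:

```r
library(dplyr)
library(tibble)
library(stringr)
library(tidytext)

# Made-up example tweets (not real data)
toy_tweets <- tibble(full_text = c(
  "Das ist 1 Test https://example.org/abc",
  "Und das sind 42 Tests"
))

# Tiny hand-made stop word list, standing in for lsa::stopwords_de
toy_stopwords <- tibble(word = c("das", "ist", "und", "sind"))

toy_tokens <- toy_tweets %>%
  mutate(full_text = str_remove_all(full_text, "https?://\\S+")) %>%  # drop URLs
  unnest_tokens(output = word, input = full_text) %>%                 # one lowercase token per row
  anti_join(toy_stopwords, by = "word") %>%                           # remove stop words
  filter(!str_detect(word, "^[[:digit:]]+$"))                         # remove pure numbers

toy_tokens
```

Only the tokens `test` and `tests` remain; the URL, the stop words, and the numbers are gone.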

Test:

kl_prepped <- 
  prepare_tweets(tweets_to_kl_raw)

Joining with `by = join_by(word)`

head(kl_prepped)

# A tibble: 6 × 1
  word                     
  <chr>                    
1 tonline⁩                  
2 spreche                  
3 neuen                    
4 pläne                    
5 bundesgesundheitsminister
6 karl_lauterbach⁩

ms_prepped <-
  prepare_tweets(tweets_to_ms_raw)

Joining with `by = join_by(word)`

head(ms_prepped)

# A tibble: 6 × 1
  word         
  <chr>        
1 markus_soeder
2 climate      
3 activists    
4 are          
5 sometimes    
6 depicted

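Looks plausible. Note that English words such as "are" in the second output are not removed, since only a German stop word list is applied.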

We wrap the sentiment analysis in a function as well:

get_tweets_sentiments <- function(tweets){
  
  tweets %>% 
    inner_join(sentiws) %>% 
    group_by(neg_pos) %>% 
    summarise(senti_avg = mean(value, na.rm = TRUE),
              senti_sd = sd(value, na.rm = TRUE),
              senti_n = n()) 
}
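
The logic of this function can be illustrated on a tiny example. The following sketch uses made-up tokens and a made-up lexicon with the same column names the function relies on (`word`, `neg_pos`, `value`); the values are invented and not taken from SentiWS:

```r
library(dplyr)
library(tibble)

# Made-up tokens, in the shape returned by the preprocessing step
toy_tokens <- tibble(word = c("gut", "schlecht", "gut", "haus", "toll"))

# Made-up lexicon (invented values, not SentiWS)
toy_lexicon <- tibble(
  word    = c("gut", "schlecht", "toll"),
  neg_pos = c("pos", "neg", "pos"),
  value   = c(0.4, -0.6, 0.5)
)

toy_sentiments <- toy_tokens %>%
  inner_join(toy_lexicon, by = "word") %>%   # keeps only words found in the lexicon
  group_by(neg_pos) %>%
  summarise(senti_avg = mean(value),         # mean sentiment per polarity
            senti_sd  = sd(value),           # NA if a group has only one word
            senti_n   = n())                 # number of matched words

toy_sentiments
```

The word "haus" is not in the lexicon and therefore drops out; "gut" occurs twice in the tokens and is counted twice.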

Test:

kl_prepped %>% 
  get_tweets_sentiments()

Joining with `by = join_by(word)`

Warning in inner_join(., sentiws): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 6649 of `x` matches multiple rows in `y`.
ℹ Row 3102 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

# A tibble: 2 × 4
  neg_pos senti_avg senti_sd senti_n
  <chr>       <dbl>    <dbl>   <int>
1 neg        -0.313    0.237    3576
2 pos         0.112    0.145    5800
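
The warning indicates that some words occur more than once on both sides of the join: tokens repeat naturally, and apparently at least one word is also listed in more than one row of the lexicon. The following sketch, with made-up data, shows how such duplicates multiply rows and how the `relationship` argument named in the warning silences it when the behavior is intended:

```r
library(dplyr)
library(tibble)

toy_tokens <- tibble(word = c("gut", "gut", "haus"))

# Made-up lexicon in which "gut" is listed twice (values invented)
toy_lexicon <- tibble(
  word  = c("gut", "gut"),
  value = c(0.4, 0.3)
)

# Which lexicon words occur more than once?
dupes <- toy_lexicon %>% count(word) %>% filter(n > 1)

# Each of the 2 "gut" tokens matches 2 lexicon rows, giving 2 * 2 = 4 rows
joined <- toy_tokens %>%
  inner_join(toy_lexicon, by = "word", relationship = "many-to-many")

nrow(joined)
```

Whether duplicated lexicon entries should be kept, collapsed, or averaged is an analysis decision; the same `count()` check on `sentiws` would show which words are affected.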

Test:

tweets_to_kl_raw %>% 
  prepare_tweets() %>% 
  get_tweets_sentiments()

Joining with `by = join_by(word)`
Joining with `by = join_by(word)`

Warning in inner_join(., sentiws): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 6649 of `x` matches multiple rows in `y`.
ℹ Row 3102 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

# A tibble: 2 × 4
  neg_pos senti_avg senti_sd senti_n
  <chr>       <dbl>    <dbl>   <int>
1 neg        -0.313    0.237    3576
2 pos         0.112    0.145    5800

Looks fine.

We can also wrap both functions into a single one:

prep_sentiments <- function(tweets) {

  tweets %>% 
    prepare_tweets() %>% 
    get_tweets_sentiments()
}

tweets_to_kl_raw %>% 
  prep_sentiments()

Joining with `by = join_by(word)`
Joining with `by = join_by(word)`

Warning in inner_join(., sentiws): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 6649 of `x` matches multiple rows in `y`.
ℹ Row 3102 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

# A tibble: 2 × 4
  neg_pos senti_avg senti_sd senti_n
  <chr>       <dbl>    <dbl>   <int>
1 neg        -0.313    0.237    3576
2 pos         0.112    0.145    5800

Now we apply the function to the tweets addressed to each screen name:

tweets_list <-
  list(
    kl = tweets_to_kl_raw, 
    ms = tweets_to_ms_raw)

sentis <-
  tweets_list %>% 
  map_df(prep_sentiments, .id = "id")

Joining with `by = join_by(word)`
Joining with `by = join_by(word)`

Warning in inner_join(., sentiws): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 6649 of `x` matches multiple rows in `y`.
ℹ Row 3102 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

Joining with `by = join_by(word)`
Joining with `by = join_by(word)`

Warning in inner_join(., sentiws): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 17223 of `x` matches multiple rows in `y`.
ℹ Row 2894 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

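The exercise asks for a comparison across persons. One possible way to do that, sketched here with invented numbers in the shape of `sentis` (columns `id`, `neg_pos`, `senti_avg`, `senti_n`), is to compute the share of positive words and the overall weighted mean sentiment per account. The object `toy_sentis` and its values are assumptions for illustration, not actual results:

```r
library(dplyr)
library(tibble)

# Invented numbers in the same shape as `sentis`; not actual results
toy_sentis <- tribble(
  ~id,  ~neg_pos, ~senti_avg, ~senti_n,
  "a",  "neg",    -0.30,      40,
  "a",  "pos",     0.10,      60,
  "b",  "neg",    -0.20,      50,
  "b",  "pos",     0.20,      50
)

comparison <- toy_sentis %>%
  group_by(id) %>%
  summarise(
    share_pos   = sum(senti_n[neg_pos == "pos"]) / sum(senti_n),  # share of positive words
    senti_total = weighted.mean(senti_avg, w = senti_n)           # mean over all sentiment words
  )

comparison
```

Applied to the real `sentis` table, the same summary would yield one row per account (`kl`, `ms`), which makes the two accounts directly comparable.
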