class: center, middle, inverse, title-slide

.title[
# Programming Assignment 4
]
.subtitle[
## Data Science
]
.author[
### Andrea Valeria Carbone
]
.date[
### 2025/03/25 (updated: 2025-03-31)
]

---

``` r
library(tidyverse)
```

```
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
```

``` r
library(dplyr)
library(ds4ling)
```

```
## 
## ds4ling loaded
## Happy coding!
```

``` r
library("here")
```

```
## here() starts at /Users/avaleriacarbone/Desktop/pa_4
```

---

``` r
data <- read.csv(here("data_raw", "fricatives", "fricatives.csv"))
```

---

``` r
# data tidying
data_tidy <- data |>
  pivot_longer(cols = c(s_cog, sh_cog), names_to = "phoneme", values_to = "cog") |>
  pivot_longer(cols = c(s_skewness, sh_skewness), names_to = "phoneme_skewness", values_to = "skewness") |>
  select("phoneme", "cog", "skewness") |>
  mutate(phoneme = gsub("_cog", "", phoneme))
# table of descriptive statistics
summary(data_tidy)
```

```
##    phoneme               cog          skewness    
##  Length:200         Min.   :2556   Min.   :2.102  
##  Class :character   1st Qu.:4017   1st Qu.:2.983  
##  Mode  :character   Median :4876   Median :3.458  
##                     Mean   :4847   Mean   :3.430  
##                     3rd Qu.:5761   3rd Qu.:3.865  
##                     Max.   :7095   Max.   :4.752
```

---

``` r
print(data_tidy)
```

```
## # A tibble: 200 × 3
##    phoneme   cog skewness
##    <chr>   <dbl>    <dbl>
##  1 s       3285.     2.72
##  2 s       3285.     2.31
##  3 sh      2556.     2.72
##  4 sh      2556.     2.31
##  5 s       3511.     3.09
##  6 s       3511.     2.47
##  7 sh      2781.     3.09
##  8 sh      2781.     2.47
##  9 s       4029.     3.33
## 10 s       4029.     2.48
## # ℹ 190 more rows
```

---

# Table of descriptive statistics

``` r
summary_table <- data_tidy |>
  group_by(phoneme) |>
  summarise(
    median_skew = median(skewness),
    mean_cog = mean(cog),
    median_cog = median(cog)
  )
knitr::kable(summary_table)
```

|phoneme | median_skew| mean_cog| median_cog|
|:-------|-----------:|--------:|----------:|
|s       |    3.457987| 5600.448|   5629.104|
|sh      |    3.457987| 4093.861|   4091.038|

---

# Tidy data

``` r
write.csv(data_tidy, "../data_tidy/data_tidy.csv")
```

---

# Boxplot of cog as a function of phoneme

``` r
data_tidy |>
  ggplot() +
  aes(x = phoneme, y = cog, fill = phoneme) +
  geom_boxplot() +
  labs(title = "cog", x = "phoneme", y = "cog")
```

---

# Skewness as a function of phoneme

``` r
# plot the mean as a single point per phoneme
ggplot(data_tidy, aes(x = phoneme, y = skewness, color = phoneme)) +
  stat_summary(fun = mean, geom = "point") +
  labs(title = "skewness as a function of phoneme", x = "phoneme", y = "skewness")
```

---

# Cog as a function of skewness for the s segments

``` r
data_s <- data_tidy |>
  filter(phoneme == "s")
model <- lm(cog ~ skewness, data = data_s)
print(model)
```

```
## 
## Call:
## lm(formula = cog ~ skewness, data = data_s)
## 
## Coefficients:
## (Intercept)     skewness  
##      2322.4        955.8
```

---

# Scatterplot

``` r
# same data and axes as the model: cog (y) predicted by skewness (x)
data_s |>
  ggplot() +
  aes(x = skewness, y = cog) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "center of gravity as a function of skewness", x = "skewness", y = "cog")
```

```
## `geom_smooth()` using formula = 'y ~ x'
```

---

# Model table

``` r
knitr::kable(summary(model)$coefficients)
```

|            |  Estimate| Std. Error|  t value| Pr(>\|t\|)|
|:-----------|---------:|----------:|--------:|----------:|
|(Intercept) | 2322.3856|   386.3135| 6.011660|         0|
|skewness    |  955.8394|   111.1472| 8.599758|         0|

---

# Model diagnostics

``` r
diagnosis(model)
```

---

# Report

For the s segments, there is a positive relationship between skewness and center of gravity (COG): each one-unit increase in skewness is associated with an increase of about 956 in COG (SE = 111.1, t = 8.60). The effect is statistically significant, with a p-value that rounds to 0.

---

``` r
data_2 <- read_csv(here("data_raw", "fricatives", "assumptions.csv"))
```

```
## Rows: 142 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): x, y
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

``` r
data_2 |>
  ggplot() +
  aes(x = x, y = y) +
  geom_point() +
  labs(x = "x", y = "y", title = "Assumptions")
```

---

# Explanation

These are not real data: when plotted, the points form a dinosaur.

---
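
# Sketch: pivoting cog and skewness together

The tidying step pivots the cog columns and the skewness columns in two separate calls, which pairs every cog value with both skewness values and yields four rows per original observation (see rows 1 to 4 of the printed tibble). A minimal sketch of one possible alternative, assuming the column names follow the `phoneme_measure` pattern seen above: a single `pivot_longer()` with the `.value` sentinel keeps each phoneme's cog next to its own skewness. The `toy` data frame and its `obs` column are hypothetical stand-ins, not the contents of `fricatives.csv`.

``` r
library(tidyr)

# Hypothetical stand-in for the raw data (values are illustrative only):
# one row per observation, with cog and skewness for both phonemes.
toy <- tibble::tibble(
  obs         = 1:2,
  s_cog       = c(3285, 3511),
  s_skewness  = c(2.72, 3.09),
  sh_cog      = c(2556, 2781),
  sh_skewness = c(2.31, 2.47)
)

# Split each column name at "_": the first piece becomes the phoneme,
# and ".value" turns the second piece (cog / skewness) into its own column.
toy_tidy <- toy |>
  pivot_longer(
    cols = -obs,
    names_to = c("phoneme", ".value"),
    names_sep = "_"
  )

print(toy_tidy)
```

With this shape there is one row per observation and phoneme, so per-phoneme summaries of skewness are computed only from that phoneme's own values.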