Completed exercises for the ninth lab
This document is meant to be used to practice after you have completed the tutorial for today’s lab. Make sure to put your name as the author of the document, above!
If you intend to work on these exercises while referring to the tutorial, there are instructions on the wiki on how to do so. You may also want to refer to past labs. Don’t forget that previous labs are linked to on the main labs website.
In the tutorial, we learned about using lm() and summary() for regressions, and cor() and cor.test() for correlations. You’ll use those and the library(ggplot2) functions to plot them to make further sense of the predictions data, including adding regression lines. You’ll also practice (briefly) filter() and a few other functions to clean up the data as provided.
You can find a completed version of these exercises at https://jdbest.github.io/psychRstats/answers.html
Don’t forget to (a) save and (b) knit the document frequently, so you’ll keep track of your work and also know where you run into errors.
As always, you must load packages if you intend to use their functions. Run the following code chunk to load necessary packages for these exercises.
As discussed in the tutorial, we’re using data from Beall, Hofer, & Schaller (2016).
Beall, A. T., Hofer, M. K., & Schaller, M. (2016). Infections and elections: Did an Ebola outbreak influence the 2014 U.S. federal elections (and if so, how)? Psychological Science, 27, 595-605. https://doi.org/10.1177/0956797616628861
Make sure you read the description of the study in the tutorial—it’s important for thinking about what we’re doing in these exercises.
In the tutorial, we used a “cleaned-up” version of the data. But let’s actually use the raw data here: that one is called beall_untidy.csv and should be in the same folder as this document.
The data was downloaded with this file. Load it using the read_csv() command—probably with the code below:
predictions <- read_csv("beall_untidy.csv")
For the questions below, create your own code chunks and insert all code into them.
filter() function:
Date and Month column
Remove the DJIA column with either select() (putting a - in front of the name will remove it) or by assigning predictions$DJIA to the value NULL
Run a correlation test using cor.test() and your predictions data. (You’ll use the columns Ebola.Search.Volume.Index and LexisNexisNewsVolumeWeek.) Then, briefly report the correlation. Is it significant?
cor.test(predictions$Ebola.Search.Volume.Index, predictions$LexisNexisNewsVolumeWeek)
Pearson's product-moment correlation
data: predictions$Ebola.Search.Volume.Index and predictions$LexisNexisNewsVolumeWeek
t = 11.759, df = 63, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.7331684 0.8923563
sample estimates:
cor
0.8288528
There was a significant relationship between the Ebola-search-volume index and the LexisNexis index, \(r(63) = .83\), 95% CI [.73, .89], \(p < .001\).
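As a quick check on the output above, the t statistic can be recovered by hand from the correlation and its degrees of freedom, using \(t = r\sqrt{df}/\sqrt{1 - r^2}\). The sketch below plugs in the values printed by cor.test(); the variable names (t_by_hand, p_by_hand) are only for illustration.

```r
# Values copied from the cor.test() output above
r  <- 0.8288528
df <- 63                      # df = n - 2, so the test used n = 65 rows

# t statistic for a Pearson correlation
t_by_hand <- r * sqrt(df) / sqrt(1 - r^2)
round(t_by_hand, 3)           # agrees with t = 11.759 in the output

# Two-sided p-value from the t distribution with df degrees of freedom
p_by_hand <- 2 * pt(abs(t_by_hand), df = df, lower.tail = FALSE)
p_by_hand < .001              # TRUE, so reporting p < .001 is safe
```

This is why APA-style reports write the degrees of freedom in parentheses after r: together, r and df are enough to reconstruct the test.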
Make a scatterplot using ggplot() + geom_point(). Add a theme and label the axes. Add a regression line using geom_smooth() or geom_abline() (you’ll get the data in the next question).
ggplot(predictions, aes(x = Ebola.Search.Volume.Index,
                        y = LexisNexisNewsVolumeWeek)) +
  geom_point() +
  theme_classic() +
  geom_smooth(method = "lm", se = FALSE, formula = "y ~ x") +
  labs(x = "Ebola-search-volume index", y = "LexisNexis index")
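The geom_abline() route mentioned in the question takes an intercept and a slope from a fitted model. Here is a minimal sketch with made-up data (toy, x, and y are hypothetical names, not columns from the predictions data). One thing to watch: the model must have the same orientation as the plot, with the y-axis variable as the outcome, or the line will not fit the points.

```r
library(ggplot2)

# Made-up data for illustration only
set.seed(1)
toy <- data.frame(x = 1:20)
toy$y <- 3 + 2 * toy$x + rnorm(20)

# Fit y ~ x so the outcome matches what goes on the plot's y-axis
fit <- lm(y ~ x, data = toy)

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  # Draw the fitted line from the model's intercept and slope
  geom_abline(intercept = coef(fit)[["(Intercept)"]],
              slope     = coef(fit)[["x"]]) +
  theme_classic()
p
```

geom_smooth(method = "lm") fits the same line internally, so the two approaches should overlap when the model and plot use the same variables.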

Use the lm() function to create a regression model of the same relationship. Then use summary() to get the results. Report them succinctly below. Also report what parallels exist between the numbers from this regression and the correlation.
model <- lm(Ebola.Search.Volume.Index ~ LexisNexisNewsVolumeWeek,
            data = predictions)
summary(model)
Call:
lm(formula = Ebola.Search.Volume.Index ~ LexisNexisNewsVolumeWeek,
data = predictions)
Residuals:
    Min      1Q  Median      3Q     Max
-20.615  -7.050  -1.244   9.823  24.349
Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)              -1.91116    2.73372  -0.699    0.487
LexisNexisNewsVolumeWeek  0.15516    0.01319  11.759   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.88 on 63 degrees of freedom
Multiple R-squared: 0.687, Adjusted R-squared: 0.682
F-statistic: 138.3 on 1 and 63 DF, p-value: < 2.2e-16
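There was a statistically significant relationship between the two indexes, \(b = 0.16\), \(p < .001\), with an \(R^2\) of .69, \(p < .001\). The numbers parallel the correlation: the slope’s t value (11.759) and its 63 degrees of freedom match the cor.test() output, and \(R^2 = .687\) is the square of \(r = .829\).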
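These parallels are not specific to this dataset. In any regression with a single predictor, the squared correlation equals \(R^2\), and the test of the slope is the same test as the test of the correlation. A small sketch with simulated data (toy, x, and y are made-up names) shows this:

```r
# Simulated data for illustration only
set.seed(9)
toy <- data.frame(x = rnorm(40))
toy$y <- 0.5 * toy$x + rnorm(40)

ct  <- cor.test(toy$x, toy$y)
fit <- summary(lm(y ~ x, data = toy))

r_squared_from_cor <- unname(ct$estimate)^2
t_from_cor         <- unname(ct$statistic)
t_from_lm          <- fit$coefficients["x", "t value"]
p_from_lm          <- fit$coefficients["x", "Pr(>|t|)"]

all.equal(r_squared_from_cor, fit$r.squared)  # r^2 equals R-squared
all.equal(t_from_cor, t_from_lm)              # same t statistic
all.equal(ct$p.value, p_from_lm)              # same p-value
```

Swapping which variable is the outcome changes the slope and intercept, but not t, p, or \(R^2\).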
Use filter() to select only the scores from the two-week period including the last week of September and the first week of October. You could look at the Month and Date columns… but the third column might be more helpful. Don’t forget to assign this to a new data frame so we can use it.
highanxtime <- filter(predictions, Two.weeks.prior.to.outbreak.only == 1)
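To see what this kind of filter() call does, here is a small sketch on a made-up data frame. The names toy, week, in_window, and score are hypothetical; in_window plays the role of a 0/1 indicator column like the one used above.

```r
library(dplyr)

# Made-up data: in_window marks the rows we want to keep
toy <- data.frame(
  week      = 1:6,
  in_window = c(0, 0, 1, 1, 0, 0),
  score     = c(10, 12, 30, 35, 14, 11)
)

# Keep only rows where the indicator equals 1, and save the result
kept <- dplyr::filter(toy, in_window == 1)
nrow(kept)   # 2 rows remain (weeks 3 and 4)

# Base R equivalent, without dplyr
kept_base <- toy[toy$in_window == 1, ]
```

Note the double equals sign: == tests for equality, while a single = inside filter() produces an error.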
On the full dataset, run the correlation analyses we did in the tutorial, for the association between Ebola search volume index and voter intention index.
With the filtered data from #5, re-run the correlation analyses for the association between Ebola search volume index and voter intention index. Is the correlation higher or lower?
cor.test(highanxtime$Voter.Intention.Index,
         highanxtime$Ebola.Search.Volume.Index)
Pearson's product-moment correlation
data: highanxtime$Voter.Intention.Index and highanxtime$Ebola.Search.Volume.Index
t = 15.975, df = 6, p-value = 3.821e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.9351079 0.9979890
sample estimates:
cor
0.9884478
It’s much higher—although note that there are many fewer data points!
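The effect of having fewer data points shows up in the confidence interval. cor.test() builds its interval with the Fisher z transformation, where the standard error is \(1/\sqrt{n - 3}\). The sketch below reproduces the intervals printed in this document from r and n alone. The helper name fisher_ci is made up for illustration, and the second call uses the Ebola/LexisNexis correlation from earlier, since it is the only full-data correlation shown here.

```r
# Confidence interval for a Pearson correlation via Fisher's z
fisher_ci <- function(r, n, level = 0.95) {
  z    <- atanh(r)                      # transform r to the z scale
  se   <- 1 / sqrt(n - 3)               # standard error on the z scale
  crit <- qnorm(1 - (1 - level) / 2)    # 1.96 for a 95% interval
  tanh(z + c(-1, 1) * crit * se)        # back-transform to the r scale
}

# Two-week subset: df = 6, so n = 8
ci_two_weeks <- fisher_ci(0.9884478, n = 8)

# Full data, Ebola searches vs. LexisNexis: df = 63, so n = 65
ci_full <- fisher_ci(0.8288528, n = 65)

round(ci_two_weeks, 3)
round(ci_full, 3)
```

On the z scale the standard error is about 0.45 with 8 rows but about 0.13 with 65 rows, so an estimate from the subset is far less precise even when r itself is larger.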
ggplot(highanxtime,
       aes(x = Ebola.Search.Volume.Index, y = Voter.Intention.Index)) +
  geom_point() +
  theme_classic() +
  geom_smooth(method = "lm", se = FALSE, formula = "y ~ x") +
  labs(x = "Ebola-search-volume index", y = "Voter-intention index")

For attribution, please cite this work as
Dainer-Best (2020, Nov. 6). psychRstats: Learning Statistics for Psychology in R: Correlation and Regression (Lab 09) Exercises, Completed. Retrieved from https://jdbest.github.io/psychRstats/answers/09-lab/
BibTeX citation
@misc{dainer-best2020correlation,
author = {Dainer-Best, Justin},
title = {psychRstats: Learning Statistics for Psychology in R: Correlation and Regression (Lab 09) Exercises, Completed},
url = {https://jdbest.github.io/psychRstats/answers/09-lab/},
year = {2020}
}