Final 291
Terms in this set (66)
A recent Gallup poll asked 3,104 Americans aged 18 and older, "If an FDA-approved vaccine to prevent coronavirus was available right now at no cost, would you agree to be vaccinated?" 65% of those polled said "yes." Identify the sample.
the 3,104 Americans surveyed
A recent Gallup poll asked 3,104 Americans aged 18 and older, "If an FDA-approved vaccine to prevent coronavirus was available right now at no cost, would you agree to be vaccinated?" 65% of those polled said "yes." Identify the population.
All Americans aged 18 and older
A recent Gallup poll asked 3,104 Americans aged 18 and older, "If an FDA-approved vaccine to prevent coronavirus was available right now at no cost, would you agree to be vaccinated?" 65% of those polled said "yes." Identify the statistic.
the 65% of those surveyed who said they would agree to be vaccinated
A recent Gallup poll asked 3,104 Americans aged 18 and older, "If an FDA-approved vaccine to prevent coronavirus was available right now at no cost, would you agree to be vaccinated?" 65% of those polled said "yes." Identify the parameter.
The percent of all Americans aged 18 and older who would agree to be vaccinated
In the read.csv() function in R, what does "strings=T" mean?
Non-numerical (string) columns should be read in as categorical data (factors)
Which of the following R functions returns the frequency distribution for a categorical variable?
summary()
In the mean() function in R, what does "na.rm=T" mean?
ignore any missing data when computing the mean
Which of the following functions tells R to consider the variable zip code as a categorical variable?
factor()
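A minimal R sketch of factor(), summary(), and mean() with na.rm, run on a made-up data frame. The `survey` data and its column names are hypothetical and are not part of the course material.

```r
# Hypothetical toy data: zip codes stored as numbers, spending with gaps.
survey <- data.frame(
  zip   = c(29201, 29205, 29201, 29210, 29205, 29201),
  spend = c(12.5, NA, 30.0, 18.25, NA, 22.0)
)

# Treat zip code as a categorical variable rather than a number.
survey$zip <- factor(survey$zip)

# summary() on a factor returns the count in each category.
zip_counts <- summary(survey$zip)
print(zip_counts)

# mean() returns NA when any value is missing, unless na.rm = TRUE.
mean_with_na    <- mean(survey$spend)
mean_without_na <- mean(survey$spend, na.rm = TRUE)
print(mean_without_na)
```

table(survey$zip) gives the same counts and also works on variables that have not been converted to factors.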
The range of the middle 50% of data is called the
IQR
The 50th percentile is called the
median
The typical deviation of observations from the mean is called the
standard deviation
Which measure is best used to describe center when the distribution is symmetric with no outliers?
mean
The operations team at Chicky Inc. is testing a new kitchen design to improve efficiency. They think its success might differ between mall restaurants and free-standing restaurants, so they decide to test in 10 mall restaurants and 30 free-standing restaurants. Which sampling design should they use?
Stratified random sample
The tendency of a sample statistic to systematically under- or overestimate a population parameter is called
bias
Which of the following reduces variability in a statistic from sample to sample?
using a larger sample size
The sampling distribution of the sample mean is
approximately normal as long as the sample size is large enough or the population itself is normally distributed
A telemarketing campaign for a cell phone company has a probability of 0.07 of gaining a new customer. What is the probability that contacting 50 potential customers will result in 5 new customers?
dbinom(5,50,0.07)
A telemarketing campaign for a cell phone company has a probability of 0.07 of gaining a new customer. What is the probability that contacting 50 potential customers will result in at least 4 new customers?
1-pbinom(3,50,0.07)
A telemarketing campaign for a cell phone company gains an average of 35 new customers per day. What is the probability of gaining 40 new customers in a day?
dpois(40,35)
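The three telemarketing answers can be checked in R as sketched below. The variable names are illustrative only, and the manual sums are included just to show what each function computes.

```r
# Binomial: 50 contacts, each with a 0.07 chance of a new customer.
p_exactly_5  <- dbinom(5, size = 50, prob = 0.07)
p_at_least_4 <- 1 - pbinom(3, size = 50, prob = 0.07)

# The same upper-tail probability, asked for directly.
p_at_least_4_alt <- pbinom(3, size = 50, prob = 0.07, lower.tail = FALSE)

# "At least 4" is the sum of P(X = 4), P(X = 5), ..., P(X = 50).
p_at_least_4_sum <- sum(dbinom(4:50, size = 50, prob = 0.07))

# Poisson: an average of 35 new customers per day, exactly 40 in one day.
p_exactly_40 <- dpois(40, lambda = 35)

# dpois() evaluates the formula exp(-lambda) * lambda^x / x!
p_exactly_40_manual <- exp(-35) * 35^40 / factorial(40)

print(c(p_exactly_5, p_at_least_4, p_exactly_40))
```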
Customers who download music for a popular web service spend an average of $26 per month with a standard deviation of $4. The distribution of amount spent is normally distributed. What is the probability that a randomly selected customer spends $30 per month?
none of the above (the amount spent is continuous, so the probability of exactly $30 is 0)
Customers who download music for a popular web service spend an average of $26 per month with a standard deviation of $4. The distribution of amount spent is normally distributed. What is the probability that a randomly selected customer spends more than $30 but less than $35 per month?
pnorm(35,26,4)-pnorm(30,26,4)
Customers who download music for a popular web service spend an average of $26 per month with a standard deviation of $4. The distribution of amount spent is normally distributed. How much do the top 10% of customers spend?
qnorm(0.90,26,4)
Customers who download music for a popular web service spend an average of $26 per month with a standard deviation of $4. What is the probability that the mean amount spent for a random sample of 40 customers is less than $25 per month?
pnorm(25,26,4/sqrt(40))
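A short R sketch of the music-download calculations, assuming the stated mean of 26 and standard deviation of 4. The variable names are made up for illustration.

```r
mu    <- 26   # population mean, dollars per month
sigma <- 4    # population standard deviation

# P(30 < X < 35): difference of two cumulative probabilities.
p_between <- pnorm(35, mean = mu, sd = sigma) - pnorm(30, mean = mu, sd = sigma)

# Cutoff for the top 10% of spenders: the 90th percentile.
# qnorm() takes a probability between 0 and 1, so 0.90 rather than 90.
top10_cutoff <- qnorm(0.90, mean = mu, sd = sigma)

# Sample mean of n = 40 customers: same center, standard error sigma / sqrt(n).
n <- 40
p_mean_below_25 <- pnorm(25, mean = mu, sd = sigma / sqrt(n))

# For comparison, the chance that a single customer spends less than $25.
p_single_below_25 <- pnorm(25, mean = mu, sd = sigma)

print(c(p_between, top10_cutoff, p_mean_below_25, p_single_below_25))
```

The sample mean varies less than a single observation, so its probability of falling below $25 is smaller than the single-customer probability.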
In a study of college graduates with at least 3 years of work experience, 80% own a mutual fund, 50% own a stock, and 35% own a mutual fund and a stock. What percent own a mutual fund or a stock?
0.80+0.50-0.35=0.95
In a study of college graduates with at least 3 years of work experience, 80% own a mutual fund, 50% own a stock, and 35% own a mutual fund and a stock. Which of the following statements is true?
Owning a mutual fund and owning a stock are neither mutually exclusive nor independent
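The reasoning behind both mutual fund answers can be written out as a small R check. The variable names are illustrative, not from the course.

```r
p_fund  <- 0.80   # P(owns a mutual fund)
p_stock <- 0.50   # P(owns a stock)
p_both  <- 0.35   # P(owns both)

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_either <- p_fund + p_stock - p_both

# Mutually exclusive events can never occur together: P(A and B) = 0.
mutually_exclusive <- isTRUE(all.equal(p_both, 0))

# Independent events satisfy P(A and B) = P(A) * P(B); here 0.80 * 0.50 = 0.40.
independent <- isTRUE(all.equal(p_both, p_fund * p_stock))

print(c(p_either = p_either,
        mutually_exclusive = mutually_exclusive,
        independent = independent))
```

Since 0.35 is neither 0 nor 0.40, both checks come back FALSE.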
Your less-educated cousin says that Clemson fans are just as smart as UofSC fans. You, of course, know this is wrong and want to prove to him that Gamecocks are smarter. You collect a random sample of Clemson fans and a random sample of UofSC fans. You give each fan an IQ test. What type of analysis should you conduct to prove that Gamecock fans are smarter than Tiger fans?
Two-sample t-test
Your less-educated cousin says that Clemson fans are just as smart as UofSC fans. You, of course, know this is wrong and want to prove to him that Gamecocks are smarter. You collect a random sample of Clemson fans and a random sample of UofSC fans. You give each fan an IQ test. What type of analysis should you conduct to prove that Gamecock fans are smarter than Tiger fans? Is this a dependent or independent samples design?
independent samples
We want the power of a test to be
large
Decreasing the sample size will generally make a confidence interval
wider
Using the same sample of data to construct a confidence interval, a 90% interval will be
narrower than a 95% interval
What is the general formula for a confidence interval?
point estimate ± margin of error
Consider the following hypothesis test:
H0: You do not have COVID-19
Ha: You do have COVID-19
You go to the Student Health Center to get tested. Which of the following scenarios describes a Type II error?
You have COVID-19 but you get a negative test result (false negative)
Before performing a statistical hypothesis test, you decide that the consequences of a Type I error would be more serious than the consequences of a Type II error. What can you do to decrease the likelihood of committing a Type I error?
decrease α (the significance level)
Your statistical hypothesis test returns a p-value of 0.23. You should conclude:
There is little to no evidence for the alternative hypothesis
A pharmaceutical company develops a training program to teach its new sales representatives about the drugs they will be selling. To test the effectiveness of the program, each sales rep is given a knowledge test prior to the training program and another test after the program. The company is interested in the difference between pre-test score and post-test score. Is this a dependent or independent samples design?
dependent
In a recent survey, 50% of students said they have spent less time on coursework since the pandemic began. You think that the true percentage is higher. Identify the appropriate null and alternative hypotheses
H0: p=0.50, Ha: p>0.50
Suppose a hypothesis test for the difference in two means with Ha: μ1-μ2<0 gives a p-value of 0.034. A 95% confidence interval:
will contain only negative values
Which ANOVA assumption about the errors can you check with a Q-Q plot?
normal distribution
The incomes of all households in the United States are positively (right) skewed. The US Census Bureau recently took a sample of 2,000 US households, collected data about their annual incomes, and calculated the sample mean. We know this sample mean comes from a sampling distribution that is:
approximately normally distributed
Based on what she observed over the past several semesters, Ms. Walters believes that, on average, female students have more stickers on their laptops than male students do. After collecting data, her analysis returns a p-value of 0.45. Does the data support her hypothesis?
No, the p-value is large which fails to provide sufficient evidence that female students have more laptop stickers, on average, than male students
Ms. Walters wants to determine the effect of class year and gender on the number of stickers a student has on his/her laptop. What type of analysis should she conduct to answer this question? (Assume the conditions/assumptions for that analysis are met)
two-way ANOVA
Ms. Walters wants to determine the effect of class year and gender on the number of stickers a student has on his/her laptop. When she reads her data into R, she realizes that R has identified class year (1,2,3,4) and gender (0,1) as integer variables as opposed to categorical variables. Which R function will correct this?
factor()
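A minimal sketch of the sticker analysis in R: convert the integer codes with factor(), then fit a two-way ANOVA with aov(). The `laptops` data frame is simulated purely for illustration and does not come from Ms. Walters' study.

```r
set.seed(291)

# Hypothetical data: 40 students, class year coded 1-4 and gender coded 0/1.
laptops <- data.frame(
  year     = rep(1:4, each = 10),
  gender   = rep(c(0, 1), times = 20),
  stickers = rpois(40, lambda = 3)
)

# As read in, both grouping variables are numeric, so R would treat them
# as quantitative predictors. factor() marks them as categorical.
laptops$year   <- factor(laptops$year)
laptops$gender <- factor(laptops$gender)

# Two-way ANOVA with both main effects and their interaction.
fit <- aov(stickers ~ year * gender, data = laptops)
anova_table <- summary(fit)[[1]]
print(anova_table)
```

Writing the formula as stickers ~ year + gender instead fits the model with main effects only.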
Which of the following is NOT a requirement for a two-sample t-test?
Errors have equal variance
Which of the following R codes will return a 90% confidence interval for the mean difference, for an independent samples design?
t.test(x, y, conf.level=0.90)
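A small R sketch of that call on simulated data. The samples `x` and `y` are made up, and the 95% interval is computed only to compare widths.

```r
set.seed(291)

# Two hypothetical independent samples (for example, test scores).
x <- rnorm(30, mean = 105, sd = 15)
y <- rnorm(30, mean = 100, sd = 15)

# By default t.test() runs Welch's two-sample test (var.equal = FALSE),
# so equal variances are not assumed.
res90 <- t.test(x, y, conf.level = 0.90)
res95 <- t.test(x, y, conf.level = 0.95)

ci90 <- res90$conf.int
ci95 <- res95$conf.int

# Same data, lower confidence level: the 90% interval is narrower.
width90 <- ci90[2] - ci90[1]
width95 <- ci95[2] - ci95[1]

print(ci90)
print(ci95)
```

For a dependent (pre-test/post-test) design, the same function takes paired = TRUE.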
A random sample of college freshmen was surveyed about seatbelt usage. A 95% confidence interval for the proportion of all college freshmen who always wear a seatbelt when driving was computed to be (0.612,0.668). Which of the following is the correct interpretation of this confidence interval? (Assume all surveyed freshmen answered truthfully)
We are 95% confident that the proportion of all college freshmen who always wear a seatbelt when driving is between 0.612 and 0.668
A random sample of college freshmen was surveyed about seatbelt usage. A 95% confidence interval for the proportion of all college freshmen who always wear a seatbelt when driving was computed to be (0.612,0.668). What does it mean when we say we are 95% confident?
If we were to take many random samples of college freshmen and compute a 95% confidence interval for each one, about 5% of those intervals would NOT contain the true proportion of all college freshmen who always wear a seatbelt when driving
The sampling distribution of the sample mean is centered at the population mean. Because of this, we say the sample mean is
unbiased
You fit a two-way ANOVA model and the interaction term is not statistically significant. What should you do?
Remove the interaction term and rerun the model with just the main effects terms
Your statistical hypothesis test returns a p-value of 0.0012. You should conclude:
there is strong evidence for the alternative hypothesis
You fit a simple linear regression model and get a p-value of 0.005. You can conclude
there is a significant relationship between X and Y
If a 95% confidence interval for the coefficient of a predictor contains 0, the p-value for the predictor will be
larger than 0.05
Increasing the sample size will usually yield a
narrower confidence interval
We can increase the power of a test by
increasing sample size
Increasing the confidence level will yield a
wider confidence interval
In multiple regression, an observation has high leverage if
it has an X value that is extreme or an unusual combination of X values
Which statistic is used to identify influential observations in multiple regression?
Cook's D
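Leverage and Cook's D can both be pulled from a fitted lm() object with base R's hatvalues() and cooks.distance(). The sketch below uses simulated data with one deliberately extreme observation; all names and numbers are hypothetical.

```r
set.seed(1)
n <- 30

# 30 ordinary observations plus one with an extreme combination of X values.
x1 <- c(rnorm(n), 10)
x2 <- c(rnorm(n), -10)
y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n + 1)

fit <- lm(y ~ x1 + x2)

# Leverage depends only on the X values, not on y.
leverage <- hatvalues(fit)

# Cook's D combines leverage with the size of the residual.
cooks_d <- cooks.distance(fit)

# Which observation sits farthest from the bulk of the X values?
most_leveraged <- unname(which.max(leverage))
print(most_leveraged)
print(round(head(sort(cooks_d, decreasing = TRUE), 3), 3))
```

A high-leverage point is only influential if removing it would noticeably change the fit, which is what Cook's D is meant to flag.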
If using a regression for prediction, it is important to
reduce the standard error
You run a multiple regression and the coefficients have non-intuitive signs and inflated standard errors. You should suspect
collinearity
When adding a predictor to a model, adjusted R-squared
can decrease
A recent analysis of data on 53 patients who tested positive for COVID-19 in January showed that amount of hemoglobin in the blood was a significant predictor of whether or not the patient would develop severe respiratory disease. "Amount of hemoglobin in the blood" is the ______ variable in this study
continuous predictor
A recent analysis of data on 53 patients who tested positive for COVID-19 in January showed that amount of hemoglobin in the blood was a significant predictor of whether or not the patient would develop severe respiratory disease. "Whether or not the patient develops severe respiratory disease" is the ______ variable in this study
Binary response
A recent analysis of data on 53 patients who tested positive for COVID-19 in January showed that amount of hemoglobin in the blood was a significant predictor of whether or not the patient would develop severe respiratory disease. To predict probability of a COVID-19 patient developing severe respiratory disease, using amount of hemoglobin in the blood as the predictor, we could carry out which type of analysis?
logistic regression
In a simple linear regression model, if the assumptions of normality and equal variance are violated, you should try:
using a transformation on one or both of the variables
If the probability of an event is 0.20, the odds are
0.20/0.80=0.25
Which of the following is NOT a good practice for choosing the best regression model?
Choosing the model with a relatively small R-squared value
Which R function should be used if you want to fit a logistic regression model?
glm()
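A minimal sketch of a logistic regression with glm(), along with the probability-to-odds conversion from the card on odds. The hemoglobin data are simulated for illustration only; the coefficients used to generate them are arbitrary assumptions, not results from the study described.

```r
set.seed(53)

# Convert a probability to odds: p / (1 - p).
odds_from_prob <- function(p) p / (1 - p)
print(odds_from_prob(0.20))

# Hypothetical data for 53 patients: a continuous predictor and a 0/1 response.
n          <- 53
hemoglobin <- rnorm(n, mean = 13, sd = 1.5)
p_true     <- plogis(8 - 0.7 * hemoglobin)   # assumed relationship, made up
severe     <- rbinom(n, size = 1, prob = p_true)

# family = binomial tells glm() to fit a logistic regression.
fit <- glm(severe ~ hemoglobin, family = binomial)

# Predicted probability of severe disease for a patient with hemoglobin 12.
new_patient <- data.frame(hemoglobin = 12)
p_hat    <- predict(fit, newdata = new_patient, type = "response")
log_odds <- predict(fit, newdata = new_patient, type = "link")

print(p_hat)
print(odds_from_prob(p_hat))
```

The model is linear on the log-odds scale, so log(odds_from_prob(p_hat)) matches the value returned with type = "link".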
