Define Statistics
Statistics is the discipline involving concepts and methods of making inferences about populations when there is variation in observed data.
Define Population
The entire group of research interest.
Define Sample
A subset of the population that is actually observed or measured.
Define variable
Any characteristic or measurement that can take different values from one observation to another.
Define Response Variable
This is a dependent variable. A variable that is affected by another variable.
Define explanatory variable
This is an independent variable. One that affects another variable.
What are descriptive statistics?
The gathering, sorting, summarizing, and graphing of data by standard methods. They communicate the results of the sample itself, without generalizing beyond it.
What are inferential statistics?
Using data from a sample to draw (infer) conclusions about the population and to estimate its parameters.
Is the following sentence an example of descriptive or inferential statistics? 56 of all statistical consulting appointments held in 2018 were with students from the ANSC department.
Descriptive
Is the following sentence an example of descriptive or inferential statistics? Based upon an internet survey, 40% of all dog owners used flea/tick preventatives
Inferential. The survey covers only a sample, and the result is generalized to all dog owners.
What are the measures of spread?
Variance and standard deviation (population or sample versions). The range and interquartile range are also measures of spread.
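A minimal Python sketch of these measures, using only the standard library. The body weights are made-up numbers for illustration, not data from the course.

```python
import statistics

# Hypothetical body weights (kg); treated here as a fully observed population.
weights = [410, 425, 430, 445, 490]

data_range = max(weights) - min(weights)        # largest minus smallest value
pop_variance = statistics.pvariance(weights)    # squared deviations divided by N
pop_sd = statistics.pstdev(weights)             # square root of the population variance
sample_variance = statistics.variance(weights)  # same sum of squares divided by n - 1

print(data_range, pop_variance, round(pop_sd, 2), sample_variance)
```

The sample variance is larger than the population variance because it divides the same sum of squares by n - 1 instead of N.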
What are the two descriptive parameters for a normal distribution?
Mean and standard deviation
What percentage of a normally distributed bell curve is explained by the area within one standard deviation from the mean?
68%
What percentage of a normally distributed bell curve is explained by the area within two standard deviations from the mean?
95%
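The 68% and 95% figures are rounded. A short sketch that checks them against the standard normal curve, using `statistics.NormalDist` from the Python standard library (the helper name `share_within` is hypothetical):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_one = share_within(1)  # about 0.683
within_two = share_within(2)  # about 0.954

print(round(within_one, 3), round(within_two, 3))
```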
What does categorical data represent?
Characteristics or classes of data based on a qualitative trait. Ex. Alive or dead.
Give an example of Nominal data
Discrete units that label a variable, with no quantitative value or order. Ex. Breed: Hereford, Angus, Charolais, Simmental.
Give an example of Ordinal Data.
Discrete and ordered units. Ex. Customer satisfaction on a Likert scale (1-5).
Give an example of Binary data
A type of categorical data in which there are only two categories. This data can be nominal or ordinal. Ex. Pregnant or Open.
Give an example of discrete values
Values are distinct, and only certain values are possible. Ex. Number of embryos flushed.
Give an example of continuous data
Any value within an interval is possible. Ex. Body weight, concentrations of N, P, K.
What is one of the main differences between ratio and interval data?
Interval data has no true zero, so ratios between values are not meaningful. Ex. Temperature in degrees Celsius or Fahrenheit. Ratio data does have a true zero. Ex. Weight.
What is a Type 1 Error?
Rejecting a null hypothesis that is actually true. We conclude there is a connection or difference when in reality there is none. False Positive.
What is a Type 2 error?
Failing to reject a null hypothesis that is actually false. We do not detect a connection or difference, but there is one in reality. False Negative.
What is power?
The probability of correctly rejecting a false null hypothesis (a "Correct Positive"), equal to 1 minus the Type 2 error rate. The ability to detect statistical differences that really exist.
What does a P value mean?
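It's the probability of getting a result at least as extreme as the one observed if the null hypothesis is true. It is compared against alpha, the Type 1 error rate we're willing to accept; the P value is not alpha itself.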
When do Type 2 errors occur?
When we have insufficient power to detect a difference of a given size as significant.
How can we reduce the rate of Type 2 errors?
Increase the number of replicates.
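A rough sketch of why more replicates help. It assumes a one-sided, one-sample z-test with a known standard deviation, which is a simplification chosen for illustration; the effect size, sigma, and sample sizes are made-up numbers, and `z_test_power` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test with known sigma."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)    # cutoff for rejecting the null hypothesis
    shift = effect * sqrt(n) / sigma   # true difference measured in standard errors
    return 1 - z.cdf(critical - shift)

power_small = z_test_power(effect=5, sigma=20, n=10)   # about 0.20
power_large = z_test_power(effect=5, sigma=20, n=100)  # about 0.80

# The Type 2 error rate is 1 - power, so it falls as n grows.
type2_small = 1 - power_small
type2_large = 1 - power_large
print(round(type2_small, 2), round(type2_large, 2))
```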
What is correlation?
A measure of the strength and direction of the linear relationship between two continuous variables.
What's the difference in how Correlation and Regression measure linear relationships?
Correlation shows the strength and direction only. Regression is a model for x predicting y.
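A small sketch of the difference, written out by hand with made-up data. The function names `pearson_r` and `least_squares` are hypothetical helpers, not part of any library.

```python
from math import sqrt

def _sums(x, y):
    """Return the sums of squares and cross-products around the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return mean_x, mean_y, sxx, syy, sxy

def pearson_r(x, y):
    """Correlation: strength and direction only, unit-free."""
    _, _, sxx, syy, sxy = _sums(x, y)
    return sxy / sqrt(sxx * syy)

def least_squares(x, y):
    """Regression: intercept and slope for predicting y from x."""
    mean_x, mean_y, sxx, _, sxy = _sums(x, y)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_rescaled = [10 * value for value in y]  # same data in different units

r = pearson_r(x, y)
r_rescaled = pearson_r(x, y_rescaled)
intercept, slope = least_squares(x, y)
_, slope_rescaled = least_squares(x, y_rescaled)
print(round(r, 4), round(slope, 2), round(slope_rescaled, 1))
```

Changing the units of y leaves the correlation unchanged but multiplies the slope by ten, because the slope is expressed in units of y per unit of x.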
What does the Shapiro-Wilk test measure?
Normality. Checks how well the data matches a bell curve.
What is the model for simple linear regression?
Y = Y-intercept + Slope * X + Error, where X is the value of the explanatory variable for that observation.
What does R-Squared mean in simple linear regression?
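It's the proportion of the variation in Y that is explained by the model, which reflects how predictive the model is. As a rule of thumb, > 0.7 is considered predictive. A lower R-squared indicates a lower ability to predict other values.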
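R-squared is usually computed as 1 - SSE / SST. A minimal sketch with made-up observations and predictions (the predictions are assumed to come from some fitted line; `r_squared` is a hypothetical helper):

```python
def r_squared(observed, predicted):
    """Proportion of variation in the observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # unexplained variation
    sst = sum((o - mean_obs) ** 2 for o in observed)              # total variation
    return 1 - sse / sst

observed = [2.1, 3.9, 6.2, 7.8, 10.1]
predicted = [2.04, 4.03, 6.02, 8.01, 10.0]  # hypothetical fitted values

r2_model = r_squared(observed, predicted)

# Predicting the mean for every observation explains none of the variation.
mean_only = [sum(observed) / len(observed)] * len(observed)
r2_mean_only = r_squared(observed, mean_only)
print(round(r2_model, 4), r2_mean_only)
```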
What is Heteroscedasticity?
The circumstance in which the variability of a variable is unequal across the range of values of a second variable that predicts it.
What does R-squared mean in multiple regression?
It's how much of the variation in the response variable is being explained by the explanatory variables in our model.
What is VIF?
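Variance Inflation Factor. It measures how much the variance of a regression coefficient is inflated by relationships between the explanatory variables (multicollinearity). The more related those variables are, the higher the VIF.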
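VIF is commonly computed as 1 / (1 - R-squared), where the R-squared comes from regressing one explanatory variable on all the others. A small sketch of that formula (the function name is hypothetical and the R-squared inputs are made up):

```python
def vif_from_r_squared(r_squared_j):
    """VIF for predictor j, given the R-squared from regressing it on the other predictors."""
    return 1.0 / (1.0 - r_squared_j)

vif_unrelated = vif_from_r_squared(0.0)  # 1.0: no inflation
vif_moderate = vif_from_r_squared(0.5)   # 2.0
vif_strong = vif_from_r_squared(0.9)     # about 10

print(vif_unrelated, vif_moderate, round(vif_strong, 2))
```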
What does First and Second Moment Specification mean?
If the test of first and second moment specification is not significant, we can assume equal variance (homoscedasticity).
What does Cook's D test for?
Influential points.
When looking at the Type 1 SS results for a polynomial model, where should you check to see if the quadratic, cubic, or quartic relationships are significant?
The P-value for each term.
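What is the Central Limit Theorem?
With a large enough sample size, the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population distribution. The raw data themselves do not become normal.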
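A simulation sketch of the idea, assuming a right-skewed exponential population with mean 1. The population, sample sizes, and helper names are choices made for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is repeatable

def sample_means(n, draws=5000):
    """Return the means of `draws` random samples of size n.

    The population is exponential with mean 1, which is strongly right-skewed.
    """
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(draws)
    ]

def share_within_one_sd(values):
    """Fraction of values within one standard deviation of their own mean."""
    center = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return sum(abs(v - center) <= spread for v in values) / len(values)

means_n2 = sample_means(2)
means_n50 = sample_means(50)
print(share_within_one_sd(means_n2), share_within_one_sd(means_n50))
```

With n = 50 the share of sample means within one standard deviation should land near the 68% expected for a bell curve, and the spread of the means should be close to 1 / sqrt(50), even though the individual observations are far from normal.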
If your Y variable is categorical and your X variable is continuous, what kind of statistical method should you implement?
Logistic Regression
If you have neither a dependent nor an independent variable, what statistical method should you implement?
Contingency Tables.
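Contingency tables are often analyzed with a chi-square test of independence. A hand-written sketch of the test statistic, using made-up counts (rows could be two treatments, columns Pregnant or Open); `chi_square_statistic` is a hypothetical helper.

```python
def chi_square_statistic(table):
    """Return the chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

counts = [[30, 20], [20, 30]]  # hypothetical 2 x 2 table
chi2, dof = chi_square_statistic(counts)  # chi2 = 4.0, dof = 1
print(chi2, dof)
```

With 1 degree of freedom the critical value at alpha = 0.05 is 3.841, so a statistic of 4.0 would be judged significant for these made-up counts.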