Tests based on the normal distribution. These tests require data from one of the large catalog of distributions that statisticians have described, and for the data to be parametric, certain assumptions must be true. Assumptions of a parametric test:
1) Normally distributed data
2) Homogeneity of variance - variances should be the same throughout the data.
3) Interval data
4) Independence - data from different participants are independent. Behavior of one participant does not influence behavior of another.
p-p plot (probability-probability plot)
Plots the cumulative probability of a variable against the cumulative probability of a particular distribution. The data are ranked and sorted; for each rank, the expected z-score under the comparison distribution is calculated, and each score itself is also converted into a z-score. The actual z-score is then plotted against the expected z-score. If the data are normally distributed, the actual z-scores will match the expected z-scores and the points will fall on a straight diagonal line.
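The ranking-and-standardizing procedure above can be sketched in code. This is a minimal illustration, not the source's own method: the helper name `pp_points` and the use of numpy/scipy are my assumptions, and the `(rank - 0.5)/n` plotting position is one common convention among several.

```python
import numpy as np
from scipy import stats

def pp_points(scores):
    """Return (expected_z, actual_z) pairs for a P-P style comparison.
    Hypothetical helper: ranks the data, gets each rank's cumulative
    probability, and converts both ranks and raw scores to z-scores."""
    x = np.sort(np.asarray(scores, dtype=float))
    n = len(x)
    # Cumulative probability for each rank (one common plotting position)
    probs = (np.arange(1, n + 1) - 0.5) / n
    expected_z = stats.norm.ppf(probs)           # z expected if data were normal
    actual_z = (x - x.mean()) / x.std(ddof=1)    # standardise the observed scores
    return expected_z, actual_z

rng = np.random.default_rng(0)
exp_z, act_z = pp_points(rng.normal(size=200))
# For normal data the two sets of z-scores track each other closely,
# so their correlation is near 1 and the plotted points lie on the diagonal.
print(np.corrcoef(exp_z, act_z)[0, 1])
```

Plotting `act_z` against `exp_z` (e.g. with matplotlib) gives the P-P style picture described above.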
- To check that the distribution of scores is approximately normal, we need to look at the values of skewness and kurtosis.
- Positive values of skewness indicate too many low scores in the distribution, whereas negative values indicate a build up of high scores.
- Positive values of kurtosis indicate a pointy and heavy-tailed distribution, whereas negative values indicate a flat and light-tailed distribution.
- The further the value is from zero, the more likely it is that the data are not normally distributed.
- You can convert these scores to z-scores by dividing by their standard errors. If the resulting score (ignoring the minus sign) is greater than 1.96, it is significant at p < .05.
- Significance tests of skew and kurtosis should not be used in large samples, because they are likely to be significant even when the skew and kurtosis are not meaningfully different from normal.
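The divide-by-standard-error step can be sketched as follows. This is an illustrative assumption on my part: the function name is hypothetical, and sqrt(6/n) and sqrt(24/n) are rough large-sample approximations to the standard errors of skew and kurtosis (statistical packages use more exact formulas).

```python
import numpy as np
from scipy import stats

def skew_kurt_z(scores):
    """z-scores for skew and (excess) kurtosis, using the rough
    large-sample standard errors sqrt(6/n) and sqrt(24/n)."""
    x = np.asarray(scores, dtype=float)
    n = len(x)
    z_skew = stats.skew(x) / np.sqrt(6.0 / n)
    z_kurt = stats.kurtosis(x) / np.sqrt(24.0 / n)  # excess kurtosis
    return z_skew, z_kurt

# A strongly positively skewed sample: the z-score for skew
# comfortably exceeds the 1.96 cutoff (significant at p < .05)
rng = np.random.default_rng(1)
z_s, z_k = skew_kurt_z(rng.exponential(size=500))
print(abs(z_s) > 1.96)
```

Note how this also demonstrates the large-sample warning: with n = 500 the standard error is tiny, so even mild skew would produce a significant z.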
Kolmogorov-smirnov test and Shapiro-Wilk test
They compare the scores in the sample to a normally distributed set of scores with the same mean and standard deviation. If the test is non-significant (p > .05), it tells us that the distribution of the sample is not significantly different from a normal distribution (i.e. it's probably normal). If the test is significant (p < .05), then the distribution in question is significantly different from a normal distribution (it is non-normal).
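Both tests are available in scipy, so the comparison above can be run directly. A minimal sketch, assuming scipy's `stats.shapiro` and `stats.kstest`; note that feeding the sample's own mean and SD into the K-S test (as below) is a simplification, since strictly that requires a corrected version of the test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_scores = rng.normal(loc=50, scale=10, size=100)
skewed_scores = rng.exponential(scale=10, size=100)  # clearly non-normal

# Shapiro-Wilk: a significant result (p < .05) suggests non-normality
print(stats.shapiro(normal_scores).pvalue)   # typically non-significant
print(stats.shapiro(skewed_scores).pvalue)   # typically significant

# Kolmogorov-Smirnov against a normal with the sample's mean and SD
# (caveat: estimating the parameters from the sample makes this test
# conservative unless a correction is applied)
stat, p = stats.kstest(normal_scores, 'norm',
                       args=(normal_scores.mean(), normal_scores.std(ddof=1)))
print(p)
```

As the notes below say, these p-values should be read alongside plots and skew/kurtosis values rather than in isolation.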
Q-Q plot (quantile-quantile plot); often used alongside the Kolmogorov-Smirnov test. Similar to a P-P plot except that it plots the quantiles of the data set instead of every individual score. Interpreted the same way as a P-P plot, but it will have fewer points on it: rather than plotting every single data point, it plots only the values that divide the data into equal parts (so it can be easier to interpret if you have a lot of scores).
Values that split a data set into equal portions. There are specific kinds of quantile, such as percentiles.
Points that split data into 100 equal parts
Points that split the data into nine equal parts
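The quantile definitions above are easy to check numerically. A small sketch, assuming numpy (the variable names are mine):

```python
import numpy as np

scores = np.arange(1, 101)  # the scores 1..100

# Quartiles: three points that split the data into four equal parts
quartiles = np.quantile(scores, [0.25, 0.5, 0.75])
print(quartiles)

# Percentiles: points that split the data into 100 equal parts
p90 = np.percentile(scores, 90)

# Points that split the data into nine equal parts sit at
# cumulative probabilities 1/9, 2/9, ..., 8/9
ninths = np.quantile(scores, np.arange(1, 9) / 9)
```

A Q-Q plot is built from exactly this kind of summary: a handful of quantiles rather than all 100 scores.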
- The K-S (Kolmogorov-Smirnov) test can be used to see if a distribution of scores significantly differs from a normal distribution.
- The Shapiro-Wilk test does much the same thing but it has more power to detect differences from normality (so you might find this test is significant when the K-S test is not).
- In large sample these tests can be significant when the scores are only slightly different from a normal distribution. Therefore, they should always be interpreted in conjunction with histograms, P-p and Q-Q plots and the values of skew and kurtosis.
Levene's test. Part of testing the homogeneity of variance. Tests the null hypothesis that the variances in different groups are equal (i.e. the difference between the variances is zero). It is a one-way ANOVA conducted on the deviation scores, that is, the absolute difference between each score and the mean of the group it came from. If Levene's test is significant at p ≤ .05, we conclude that the null hypothesis is incorrect and that the variances are significantly different; the assumption of homogeneity of variance has been violated. If Levene's test is non-significant (p > .05), the variances are roughly equal and the assumption is tenable.
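scipy implements this test directly. A minimal sketch, assuming scipy's `stats.levene`; `center='mean'` matches the description above (an ANOVA on absolute deviations from each group's mean), whereas scipy's default uses the median.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group_a = rng.normal(0, 1, 50)
group_b = rng.normal(0, 1, 50)
group_c = rng.normal(0, 5, 50)  # deliberately much larger variance

# Levene's test on the three groups; a small p-value means the
# variances differ significantly (homogeneity is violated)
stat, p = stats.levene(group_a, group_b, group_c, center='mean')
print(p)
```

Here group_c's spread is five times the others', so the test comes out significant and we would conclude the assumption is violated.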
Hartley's F(max) or Variance ratio
The ratio of the variance of the group with the biggest variance to the variance of the group with the smallest variance (i.e. the largest group variance divided by the smallest). This ratio is compared to critical values in a table published by Hartley.
A warning about large samples: as with the other significance tests above, in large samples Levene's test can be significant even when the group variances are only slightly unequal, so it should be interpreted alongside the variance ratio.
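The variance ratio itself is a one-liner. A sketch under my own naming (`f_max` is a hypothetical helper, not from the source):

```python
import numpy as np

def f_max(*groups):
    """Hartley's F_max: largest group variance divided by the smallest."""
    variances = [np.var(g, ddof=1) for g in groups]
    return max(variances) / min(variances)

g1 = [2.0, 4.0, 6.0, 8.0]  # variance 20/3
g2 = [4.0, 5.0, 6.0, 7.0]  # variance 5/3
print(f_max(g1, g2))  # -> 4.0
```

The resulting ratio (here 4.0) would then be checked against Hartley's table of critical values for the given group size and number of groups.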
Dealing with Outliers
Options for reducing the impact of outliers:
1) Remove the case: delete the data from the person who contributed the outlier, but only if you have a good reason to.
2) Transform the data: skew in the data can be reduced by applying transformations to the data.
3) Change the score: if transformation fails, then you can consider replacing the score. If the score you are changing is very unrepresentative of the data, it is okay to change it.
-- How to change scores: change the score to be one unit above the next highest score in the data set.
-- Convert back from a z-score: convert a chosen z-score back into a raw score (score = mean + z × SD) and use that as the replacement; see the equation on p. 153.
-- The mean plus two standard deviations: use the mean plus two standard deviations as the replacement score.
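Two of the replacement rules above can be sketched as follows. This is an illustration, not a prescribed implementation: the function names are my own, and the mean + 2 SD cap is computed here from the full sample including the outlier (one of several reasonable choices).

```python
import numpy as np

def next_highest_plus_one(scores):
    """Rule 1 above: change the top score to one unit above the
    next highest score in the data set."""
    x = np.asarray(scores, dtype=float)
    order = np.argsort(x)
    x[order[-1]] = x[order[-2]] + 1
    return x

def cap_at_mean_plus_2sd(scores):
    """Rule 3 above: replace scores above mean + 2 SDs with that value."""
    x = np.asarray(scores, dtype=float)
    cap = x.mean() + 2 * x.std(ddof=1)
    return np.where(x > cap, cap, x)

data = [10, 11, 12, 13, 14, 50]        # 50 is the outlier
print(next_highest_plus_one(data))      # the 50 becomes 15.0
print(cap_at_mean_plus_2sd(data))       # the 50 is pulled down to the cap
```

Whichever rule is used, the point is the same: the replacement keeps the case in the data set while making its score far less extreme.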