45 terms

Statistics

provide a way of understanding, illustrating, or otherwise making sense of quantitative data.

Randomized control trials (RCTs)

are true experimental designs where the subjects are randomly assigned to control and treatment groups.

systematic reviews

are processes whereby published research from RCTs on a specific topic is pulled together using strict inclusion criteria, reviewed collectively, and presented in a meaningful way so the reader understands the topic in light of many studies viewed together.

the effects of a sex education program (independent variable) on the rate of teen pregnancy (dependent variable) in a high school.

Simply put, an experiment examines the effects of __________ (independent variable) on __________ (dependent variable).

Population

If we use a hospital for an experiment and include everyone in that hospital, then we are dealing with a population.

Sample

If we are using this hospital to represent a network of similar hospitals, then we are dealing with a sample.

Simple random sampling

is the strongest method because it randomly selects a sample from a larger group.

convenience sampling.

This approach uses a group for the simple reason of accessibility to the researcher.

Descriptive statistics

are test results that describe or characterize the data.

Inferential statistics

are used to imply something (or predict something) of a larger group based on the results from a sample.

Nominal level data

is the lowest order. It is a naming level such as sex (male or female), race (African American, Caucasian, Hispanic, Pacific Islander, etc.), and blood type (A, B, AB, O).

Ordinal level data

is one step above nominal data. Ordinal level data is a ranking level, that is, the numbers indicate placing, but do not have a significant value otherwise. You cannot perform mathematical functions on the numbers. Examples are placing in a contest (1st, 2nd, 3rd, etc.) or class rank.

Interval level data

is one of the two higher order levels. With interval level data, the numbers have a mathematical value and the intervals between two numbers have value, but interval data does not include an absolute zero.

Ratio level data

is the other higher order level. Ratio level data is the same as interval data except it has an absolute zero.

critical probability.

In statistics, probability is used to represent the point at which the stated outcome is considered unlikely to be the result of chance alone. It is usually represented as p < 0.01, meaning the outcome of the experiment is expected to occur by chance less than one time in 100 (p = 1/100).

Z scores

simply reflect standard deviations. The mean of a distribution is given a z score of zero. A z score of +1 means the score is one standard deviation above the mean, and so on. A z score can be calculated using the following formula:

z = (raw score - mean score) / standard deviation.

T scores

simply use increments of 10. The mean is given a T score of 50 and the standard deviations increase/decrease by 10. A T score of 60 is one standard deviation above the mean.
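The two formulas above can be sketched in a few lines of Python; the raw score, mean, and standard deviation below are hypothetical example values.

```python
# Sketch of the z- and T-score formulas described above.
# The raw score, mean, and SD are invented for illustration.

def z_score(raw, mean, sd):
    """z = (raw score - mean score) / standard deviation."""
    return (raw - mean) / sd

def t_score(raw, mean, sd):
    """T scores rescale z scores: the mean becomes 50, each SD is 10 points."""
    return 50 + 10 * z_score(raw, mean, sd)

# Example: a raw score of 85 on a test with mean 70 and SD 15
# sits one standard deviation above the mean.
print(z_score(85, 70, 15))  # 1.0
print(t_score(85, 70, 15))  # 60.0
```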

skew of a distribution

refers to how the curve leans. When a curve has extreme scores on the right hand side of the distribution, it is said to be positively skewed.

When the tail of the curve is pulled downward by extreme low scores, it is said to be negatively skewed.

Sampling error

is the error that results when using a sample mean to estimate a population characteristic.

sampling distribution.

This is not an error in sample selection, but rather a random phenomenon (the randomness of how the fruit settles in the barrel). The way each of these sample means clusters around the population mean is called the

Central Limit Theorem

(p. 96) states that the means of a large number of samples drawn randomly from the same population will be normally distributed (follow a normal distribution curve, as presented in the previous unit).

standard error of the mean

If one were to calculate a standard deviation of these sample means, it would be referred to as the
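A small simulation makes the last two cards concrete: draw many random samples from one population, and the standard deviation of the resulting sample means (the standard error of the mean) comes out close to the textbook value sigma / sqrt(n). The population, sample size, and number of samples below are arbitrary illustration choices.

```python
# Pure-standard-library sketch of the Central Limit Theorem and the
# standard error of the mean. All numbers are hypothetical.
import random
import statistics

random.seed(1)
# A synthetic "population" with mean ~100 and SD ~15.
population = [random.gauss(100, 15) for _ in range(10_000)]

# Draw 1,000 random samples of n = 30 and record each sample mean.
sample_means = [
    statistics.mean(random.sample(population, 30))
    for _ in range(1_000)
]

# The SD of the sample means is the (observed) standard error of the mean...
observed_se = statistics.stdev(sample_means)
# ...and should be close to the theoretical sigma / sqrt(n).
theoretical_se = statistics.pstdev(population) / 30 ** 0.5
print(round(observed_se, 2), round(theoretical_se, 2))
```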

Confidence interval

is often reported in literature (articles) written about research the author has conducted. When working with samples of populations in research, you must understand the results are estimates and you make inferences about the population based on those estimates. In general, a confidence interval indicates how accurate one believes the estimate to be.
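As a hedged sketch of how such an interval is computed, the common normal approximation is mean ± 1.96 × standard error for a 95% confidence interval; the sample data below are hypothetical.

```python
# Minimal 95% confidence-interval sketch using the normal approximation
# (mean +/- 1.96 * standard error). The measurements are invented.
import statistics

sample = [98.2, 98.6, 98.8, 99.1, 98.4, 98.7, 98.9, 98.5]
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / len(sample) ** 0.5  # standard error of the mean

lower, upper = mean - 1.96 * sem, mean + 1.96 * sem
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Reading the result: we estimate the population mean from the sample, and the interval expresses how accurate we believe that estimate to be.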

directional hypothesis

(one-tailed hypothesis)

null hypothesis

(non-directional, two-tailed hypothesis).

type I error

is the rejection of a true null hypothesis. If, for example, you conducted a study on Fen-Phen and stated your null hypothesis as the drug being ineffective and unsafe, a type I error would be to reject this null and proclaim the drug safe and effective.

type II error

is the failure to reject a false null hypothesis (remember II and two F's as a helpful hint). If, for example, you conducted a study on a new diet drug and stated in the null that the drug would be ineffective and unsafe, a type II error would be to accept the null when in fact the drug was safe and effective. A type II error represents a missed opportunity.

Degrees of freedom (df)

are based on the t-distribution. The t-distribution is a way of reflecting our confidence that a sample mean and standard deviation accurately reflect the population. This confidence is based on sample size: the smaller the sample, the less confident we can be.

chi-square tests.

This type of test is common with nominal level data and is considered a lower order statistical test. Basically, it tells whether there is a statistically significant difference between the observed and expected frequencies among the groups.
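A minimal sketch of the chi-square statistic for nominal counts, assuming a goodness-of-fit comparison of observed against expected frequencies; the blood-type counts below are invented for illustration.

```python
# Hypothetical blood-type counts (nominal level data) in a sample,
# compared against expected counts. Chi-square sums, over categories,
# (observed - expected)^2 / expected.
observed = {"A": 41, "B": 10, "AB": 4, "O": 45}
expected = {"A": 40, "B": 11, "AB": 4, "O": 45}

chi_square = sum(
    (observed[k] - expected[k]) ** 2 / expected[k] for k in observed
)
print(round(chi_square, 3))
```

A large chi-square value (relative to the critical value for the degrees of freedom) would indicate a statistically significant difference between observed and expected frequencies.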

correlation

looks for a statistically significant relationship between two variables. The strength of a correlation can range from -1.00 to +1.00. A correlation of +1.00 is a perfect positive correlation.

Regression analysis

basically evaluates how one set of data relates to another. A mathematical approach is used to create a best fit line through a scattergram, called a regression line. This line represents the relationship between x and y. This procedure is particularly useful when evaluating one factor as a predictor of another, such as the amount of drug (x-axis) and human response (y-axis). _______ should not be used to make such predictions beyond the two variables being evaluated. Further, regression analysis should not be confused with correlation. A correlation such as Pearson's evaluates the strength of a relationship, while regression analysis quantifies the association (rate of change in y per unit of x).

This unit looks at measures of difference. Measures of difference compare two groups' mean (average) scores on a variable. Such tests are generally asking the question: is there a statistically significant difference in the mean scores between group 1 and group 2? Here again, the type of test used depends on the level of data and the relation of the groups.

Mann-Whitney U test.

If you have ordinal level data and the two groups are unrelated (independent of each other), you would use a

Kolmogorov-Smirnov test.

If you have ordinal level data and the samples are related, you would use a

ANOVA

An ______ allows us to look at both the average amount of difference between groups (same as a t-test) and the average amount of difference within each group. The additional advantage of an ______ is that it can look at differences among more than two groups (a t-test can only compare two group mean scores).
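A minimal one-way ANOVA sketch makes the between/within idea concrete: the F statistic is the mean square between groups divided by the mean square within groups, and it works for more than two groups. The three groups below are hypothetical.

```python
# One-way ANOVA F statistic from scratch (standard library only).
# All data are invented for illustration.
import statistics

groups = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0],
]

k = len(groups)                   # number of groups
n = sum(len(g) for g in groups)   # total observations
grand_mean = statistics.mean(x for g in groups for x in g)

# Between-groups variation: how far each group mean sits from the grand mean.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (k - 1)

# Within-groups variation: how far scores sit from their own group mean.
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
ms_within = ss_within / (n - k)

f_statistic = ms_between / ms_within
print(round(f_statistic, 2))
```

A large F means the differences between group means are large relative to the spread within each group.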

Prevalence

the proportion of the population that has a disease in question at a specific point in time

Incidence

the number of new cases identified during a particular time period.

Relative risk

the ratio of the incidence rates among exposed to unexposed individuals in a population.

2x2 tables

(discussed in an earlier unit) are used to assess treatments with dichotomous outcomes (yes or no; did or did not; etc.).

Experimental event rate (EER)

a measure of how often a particular event (response or outcome) occurs within the experimental group during a study.

Control event rate (CER)

a measure of how often a particular event (response or outcome) occurs within the control group during a study.

Absolute risk reduction (ARR):

also known as attributable risk reduction; the difference in the risk of the outcome between patients who have undergone one therapy and those who have undergone another. Again, using the 2x2 table as an example, the formula for determining ARR is: [C/(C+D)] - [A/(A+B)].

Relative risk reduction

an estimate of the percentage of baseline risk that is removed as a result of the therapy. It is calculated as the ARR between the treatment and control groups divided by the absolute risk among patients in the control group (see ARR); the formula is: {[C/(C+D)] - [A/(A+B)]} / [C/(C+D)].

Odds ratio

the ratio of the odds of an event occurring in one group to the odds of it occurring in the other. The formula is (A/C)/(B/D).

Number Needed to Treat (NNT):

the number of patients who need to be treated to prevent one adverse event. It is the reciprocal of the ARR (1/ARR).
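The 2x2-table measures above can all be computed from the same four cells. The sketch below assumes the cell labels implied by the formulas in these cards (A/B = events and non-events in the experimental group, C/D = events and non-events in the control group); the counts are hypothetical.

```python
# EER, CER, ARR, RRR, OR, and NNT from one hypothetical 2x2 table.
A, B = 10, 90   # experimental group: 10 adverse events out of 100
C, D = 20, 80   # control group: 20 adverse events out of 100

EER = A / (A + B)       # experimental event rate
CER = C / (C + D)       # control event rate
ARR = CER - EER         # absolute risk reduction: [C/(C+D)] - [A/(A+B)]
RRR = ARR / CER         # relative risk reduction
OR = (A / C) / (B / D)  # odds ratio
NNT = 1 / ARR           # number needed to treat (reciprocal of ARR)

print(EER, CER, ARR, RRR, OR, NNT)
```

With these example counts, the treatment cuts the event rate from 20% to 10%, so the ARR is 0.10 and about 10 patients must be treated to prevent one adverse event.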