Terms in this set (98)
Statistics
A set of procedures used by social scientists to organize, summarize and communicate information.
Data
Information represented by numbers, which can be the subject of statistical analysis.
Research Process
1. Asking the research question
2. Formulating the hypotheses
3. Collecting data
4. Analyzing data
5. Evaluating the hypotheses
Research Process
A set of activities in which social scientists engage to answer questions, examine ideas, or test theories.
Empirical Research
Research based on evidence that can be verified by using our direct experience.
Theory
An elaborate explanation of the relationship between two or more observable attributes of individuals or groups.
Hypothesis
A tentative answer to a research problem.
Variable
A property of people or objects that takes on two or more values.
Units of analysis
The level of social life on which social scientists focus. Examples of different levels are individuals and groups.
Dependent Variable
The variable to be explained.
Independent Variable
The variable expected to account for (the "cause" of) the dependent variable.
Nominal measurement
Numbers or other symbols are assigned to a set of categories for the purpose of naming, labeling, or classifying the observations.
Ordinal measurement
Numbers are assigned to rank-ordered categories ranging from low to high.
Interval-ratio measurement
Measurements for all cases are expressed in the same units.
Cumulative Property of Levels of Measurement
Variables that can be measured at the interval-ratio level of measurement can also be measured at the ordinal and nominal levels.
Discrete variables
Have a minimum-sized unit of measurement, which cannot be subdivided. Examples: children per family; wages, which cannot differ by less than 1 cent, the minimum-sized unit.
Continuous variables
Do not have a minimum-sized unit of measurement; their range of values can be subdivided into increasingly smaller fractional values. Example: length.
Population
The total set of individuals, objects, groups, or events in which the researcher is interested.
Sample
A relatively small subset selected from a population.
Descriptive statistics
Procedures that help us organize and describe data collected from either a sample or a population.
Inferential Statistics
The logic and procedures concerned with making predictions or inferences about a population from observations and analyses of a sample.
Theories
Offer specific concrete predictions about the way observable attributes of people or groups would be in real life.
Frequency Distributions
A table reporting the number of observations falling into each category of the variable.
Proportion
Is a relative frequency obtained by dividing the frequency in each category by the total number of cases.
Percentage distribution
A table showing the percentage of observations falling into each category of the variable.
Cumulative frequency distribution
A distribution showing the frequency at or below each category (class interval or score) of the variable.
Cumulative percentage distribution
A distribution showing the percentage at or below each category (class interval or score) of the variable.
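As a minimal sketch of the frequency, proportion, percentage, and cumulative distributions defined above, the following Python builds all of them for a small made-up ordinal variable. The data, the category order, and the `frequency_table` name are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical survey answers on an ordinal scale (made-up data).
ORDER = ["low", "medium", "high"]
answers = ["low", "medium", "medium", "high", "low",
           "medium", "medium", "high", "low", "medium"]


def frequency_table(values, order):
    """Return one row per category:
    (category, frequency, proportion, percentage,
     cumulative frequency, cumulative percentage)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for category in order:
        f = counts[category]      # frequency of this category
        cumulative += f           # cases at or below this category
        rows.append((category, f, f / n, 100 * f / n,
                     cumulative, 100 * cumulative / n))
    return rows


table = frequency_table(answers, ORDER)
for row in table:
    print(row)
```

The cumulative columns only make sense when the categories can be ordered, which is why the sketch takes the order as an explicit argument.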
Rate
A number obtained by dividing the number of actual occurrences in a given time period by the number of possible occurrences.
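A rate can be sketched in a few lines of Python. The figures below are made up, and the optional `per` scaling base (for example, "per 1,000") is an assumption added for readability; it is not part of the definition above.

```python
def rate(occurrences, possible, per=1):
    """Actual occurrences divided by possible occurrences, scaled by `per`."""
    if possible <= 0:
        raise ValueError("possible occurrences must be positive")
    return occurrences * per / possible


# Hypothetical figures: 25 births in one year in a town of 1,000 residents.
raw_rate = rate(25, 1000)                    # as a plain fraction
births_per_1000 = rate(25, 1000, per=1000)   # same rate per 1,000 residents
print(raw_rate, births_per_1000)
```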
First step in the statistical analysis of data
Constructing a frequency distribution.
Pie Chart
A graph showing the differences in frequencies or percentages among categories of a nominal or an ordinal variable. The categories are displayed as segments of a circle whose pieces add up to 100 percent of the total frequencies.
Bar graph
A graph showing the differences in frequencies or percentages among categories of a nominal or an ordinal variable. The categories are displayed as rectangles of equal width with their height proportional to the frequency or percentage of the category.
Histogram
A graph showing the differences in frequencies or percentages among categories of an interval-ratio variable. The categories are displayed as contiguous bars, with width proportional to the width of the category and height proportional to the frequency or percentage of that category.
Frequency polygon
A graph showing the differences in frequencies or percentages among categories of an interval-ratio variable. Points representing the frequencies of each category are placed above the midpoint of the category and are joined by a straight line.
Time series chart
A graph displaying changes in a variable at different points in time. It shows time (measured in units such as years or months) on the horizontal axis and the frequencies (percentages or rates) of another variable on the vertical axis.
Measures of central tendency
Numbers that describe what is average or typical of the distribution.
Mode
The category or score with the largest frequency (or percentage) in the distribution. It can be found at any level of measurement and is the only measure of central tendency suited to nominal variables.
Mode
The mode is always a category or score, not a frequency. Do not confuse the two.
Median
The score that divides the distribution into two equal parts so that half the cases are above it and half below it.
Median
The median is a measure of central tendency that can be calculated for variables that are at least at an ordinal level of measurement.
Median
Represents the exact middle of a distribution; it is the score that divides the distribution into two equal parts so that half the cases are above it and half below it.
Percentile
A score below which a specific percentage of the distribution falls.
Mean
The arithmetic average obtained by adding up all the scores and dividing by the total number of scores.
Mean
The arithmetic mean is by far the best-known and most widely used average; it is what most people call the "average".
Mean
The mean is typically used to describe central tendency in interval-ratio variables such as income, age, and education. Can only be calculated for variables measured at the interval-ratio level.
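The three measures of central tendency above can be checked with Python's standard `statistics` module. This is only an illustrative sketch; both data sets are made up.

```python
import statistics

# Hypothetical interval-ratio scores.
scores = [2, 3, 3, 5, 7]

mode_value = statistics.mode(scores)      # most frequent score
median_value = statistics.median(scores)  # middle score of the sorted list
mean_value = statistics.mean(scores)      # sum of scores divided by N

# For a nominal variable only the mode applies. Note that the result is
# a category ("red"), not the frequency of that category.
colors = ["red", "blue", "red", "green", "red"]
modal_color = statistics.mode(colors)

print(mode_value, median_value, mean_value, modal_color)
```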
Symmetrical distribution
The frequencies at the right and left tails of the distribution are identical; each half of the distribution is the mirror image of the other. In a unimodal symmetrical distribution the mean, median, and mode are identical.
Skewed distribution
A distribution with a few extreme values on one side of the distribution.
Negatively skewed distribution
A distribution with a few extremely low values; the mean will be pulled in the direction of the lower scores.
Positively skewed distribution
A distribution with a few extremely high values; the mean will be pulled toward the higher scores.
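The pull that extreme values exert on the mean can be seen by comparing it with the median. A small sketch with made-up numbers:

```python
import statistics

# Hypothetical data: one extremely high value (positive skew).
high_tail = [30, 32, 35, 38, 40, 250]
# Hypothetical data: one extremely low value (negative skew).
low_tail = [1, 60, 62, 65, 68, 70]

for name, data in (("positive skew", high_tail), ("negative skew", low_tail)):
    mean_value = statistics.mean(data)
    median_value = statistics.median(data)
    # The mean moves toward the extreme value; the median barely moves.
    print(name, "mean:", round(mean_value, 2), "median:", median_value)
```

In the first list the mean lies above the median; in the second it lies below it.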
Center of Gravity
Because the mean (unlike the mode and the median) incorporates all the scores in the distribution, we can think of it as the center of gravity of the distribution.
Mean
Is typically used to describe central tendency in interval-ratio variables, such as income, age, or education. We obtain the mean by summing all the scores and dividing by the total (N) number of scores.
Measures of variability
Numbers that describe diversity or variability in the distribution.
The index of qualitative variation (IQV)
Is used to measure variation in nominal variables. It is based on the ratio of the total number of differences in the distribution to the maximum number of possible differences within the same distribution. IQV can vary from 0.00 to 1.00.
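One way to read the ratio above: count the observed differences as the sum of the products of every pair of category frequencies, and take the maximum as what that sum would be if the cases were spread evenly over the categories. The sketch below follows that reading; the function name `iqv` and the frequencies are made up for illustration.

```python
from itertools import combinations


def iqv(frequencies):
    """Index of qualitative variation from a list of category frequencies."""
    k = len(frequencies)   # number of categories
    n = sum(frequencies)   # total number of cases
    if k < 2 or n == 0:
        raise ValueError("need at least two categories and one case")
    # Observed differences: every pairing of cases from different categories.
    observed = sum(fi * fj for fi, fj in combinations(frequencies, 2))
    # Maximum possible differences: cases spread evenly over k categories.
    maximum = (k * (k - 1) / 2) * (n / k) ** 2
    return observed / maximum


print(iqv([50, 50]))       # evenly split: maximum variation
print(iqv([100, 0]))       # everyone in one category: no variation
print(iqv([60, 30, 10]))   # in between
```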
The range measures
Variation in interval-ratio variables and is the difference between the highest (maximum) and lowest (minimum) scores in the distribution. To find the range, subtract the lowest from the highest score in a distribution.
The interquartile range (IQR)
Measures the width of the middle 50 percent of the distribution. It is defined as the difference between the lower and upper quartiles (Q1 and Q3), that is, Q3 minus Q1.
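A short sketch of the range and the interquartile range using Python's `statistics.quantiles`. The scores are made up, and the choice of `method="inclusive"` is an assumption: textbooks and software use several slightly different quartile conventions, so other tools may report different Q1 and Q3 values for the same data.

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical scores

# Range: highest score minus lowest score.
data_range = max(scores) - min(scores)

# Quartiles: n=4 returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# IQR: width of the middle 50 percent of the distribution.
iqr = q3 - q1

print(data_range, q1, q2, q3, iqr)
```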
Box Plot
Is a graphical device that visually presents the range, the interquartile range, the median, the lowest (minimum) score, and the highest (maximum) score. The box plot provides us with a way to visually examine the center, the variation, and the shape of a distribution.
The variance
The variance and the standard deviation are two closely related measures of variation for interval-ratio variables that increase or decrease based on how closely the scores cluster around the mean.
The variance
Is the average of the squared deviations from the center (mean) of the distribution; the standard deviation is the square root of the variance.
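The definition above can be worked through step by step. The sketch below divides by N, which matches "the average of the squared deviations"; many textbooks divide by N - 1 when the data are a sample, so treat the divisor as an assumption. The scores are made up.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores

n = len(scores)
mean = sum(scores) / n

# Squared deviation of each score from the mean.
squared_deviations = [(x - mean) ** 2 for x in scores]

# Variance: average of the squared deviations (dividing by N).
variance = sum(squared_deviations) / n

# Standard deviation: square root of the variance.
standard_deviation = math.sqrt(variance)

print(mean, variance, standard_deviation)
```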
Bell Shaped Curve
Normal curve used to represent the distribution of data across a population. The mean, median, and mode all have the same value.
Mode
The value that occurs more frequently than any other value. If all values are different, there is no mode.
Range
Difference between the highest and lowest values.
Standard deviation
Positive square root of the variance, where variance is the average of the square of the deviations of the measurements about their mean.
1 Standard Deviation
68% of all measurements in a normal distribution fall within _____ standard deviation on either side of the mean.
2 Standard Deviations
95.5% of all measurements in a normal distribution fall within _____ standard deviations on either side of the mean.
3 Standard Deviations
99.7% of all measurements in a normal distribution fall within _____ standard deviations on either side of the mean.
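The three percentages above can be checked against the standard normal curve with `statistics.NormalDist` from Python's standard library. The helper name `share_within` is made up for this sketch.

```python
from statistics import NormalDist


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()   # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(k, "standard deviation(s):", round(100 * share, 1), "percent")
```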
Statistical quality control
Substitutes the inspection of only a sampling of products in a given batch for inspection of every piece. The sample is taken as representative of the entire batch.
Universe
Sometimes called the population, is the total number of cases the statistician is interested in.
Stratified sampling
Is a technique by which several small samples may be used to discover facts about a very large universe. Example: Using alphabetical lists from each department, select every 100th employee. The people or objects included in the sample are chosen to represent various classes or groups within the universe.
Index numbers
Are numbers arbitrarily chosen by the statistician to make comparisons easier.
Weighted average
In any set of data some items are likely to be more important than others. The weighted average is computed to take account of this fact.
Correlation
Correlation between two items is the extent to which one changes as the other changes. Correlation is expressed by a coefficient from -1 to 1.
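The card above does not say which coefficient is meant. The sketch below assumes the Pearson correlation coefficient, the most common choice for interval-ratio data, and computes it by hand on made-up numbers. The function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]          # hypothetical data
rising = [2, 4, 6, 8, 10]        # rises in step with hours
falling = [10, 8, 6, 4, 2]       # falls in step with hours

print(pearson_r(hours, rising))    # perfect positive relationship
print(pearson_r(hours, falling))   # perfect negative relationship
```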
Scatter diagrams
Are used to plot, and show visually, the relationship (or correlation) between two variables.
Standard deviation
Is the deviation from normal due to chance (or uncontrollable variations). In simple terms, it's what you can expect to occur in almost every measurable activity in life.
Extrapolation
Is a method of predicting the future on the basis of the assumption that past trends will continue.
Primary data
Are data gathered for a specific study.
Secondary data
Are data that have already been collected by some other person or group, probably for another purpose. Generally they have been published.
Standard deviation
It provides a measure of how the values (or numbers) are distributed around the normal.
Frequency
The extent to which a number or variable occurs.
A frequency distribution
Is an array of numbers that shows how often each one appears. Arrange the numbers in order from lowest to highest and then note how often each one occurs.
Extrapolation
The method of predicting the future based on the assumption that present trends continue. There are three different types: straight line, accelerating, and decelerating.
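As an illustration of the straight-line type only, the sketch below fits a least-squares line to a made-up yearly series and extends it one year ahead. Least squares is an assumption here; the card does not say how the line should be drawn. The function name and figures are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical caseload figures for five past years.
years = [2019, 2020, 2021, 2022, 2023]
cases = [100, 110, 120, 130, 140]

slope, intercept = fit_line(years, cases)

# Extrapolation: assume the past trend continues into the next year.
forecast_2024 = intercept + slope * 2024
print(slope, forecast_2024)
```

The forecast is only as good as the assumption that the trend continues unchanged.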
Sample
A collection of some of the elements. Example: 200 HHSA employees.
Random Sample
A sample of the population where every member of the population has an equal chance of being selected. E.g., in an alphabetical list of HHSA employees, every 10th person is selected for the study.
Stratified Sample
Samples selected from each stratum (or layer) of a population in proportion to their frequency in each level. E.g., using the alphabetical lists for each HHSA section, select every 10th person for the study.
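The every-10th-person example above can be sketched as follows. The section names, the employee labels, and the helper `every_nth` are made up; taking the same interval in every stratum is what keeps the sample proportional to the size of each stratum.

```python
def every_nth(members, n):
    """Pick every n-th entry from an ordered list (the 10th, 20th, ...)."""
    return members[n - 1::n]


# Hypothetical alphabetical lists for three sections of 60, 30 and 10 people.
strata = {
    "Section A": [f"A{i:02d}" for i in range(1, 61)],
    "Section B": [f"B{i:02d}" for i in range(1, 31)],
    "Section C": [f"C{i:02d}" for i in range(1, 11)],
}

# Apply the same selection interval within each stratum.
sample = {name: every_nth(people, 10) for name, people in strata.items()}
sizes = {name: len(chosen) for name, chosen in sample.items()}

print(sizes)
print(sample["Section B"])
```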
Ratio
A way of expressing the relationship between 2 numbers. If there are 20 contract administrators and 5 managers, then the ratio of CAs to managers is 20/5, or 4 to 1.
Weighted Average
A method of computing the average value of a set of data. The idea is that you give each value its weight according to its frequency in the data. The calculation is essentially the same as doing any average.
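A minimal sketch of a weighted average where the weights are frequencies, as described above. The ratings and counts are made up, and `weighted_average` is a hypothetical helper name.

```python
def weighted_average(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical ratings and how many respondents gave each rating.
ratings = [1, 2, 3]
counts = [2, 5, 3]

weighted = weighted_average(ratings, counts)    # each rating counted by frequency
unweighted = sum(ratings) / len(ratings)        # ignores the frequencies
print(weighted, unweighted)
```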
Statistical quality control
Is a fancy name for sampling.
Statistical Validity
The degree to which an observed result, such as a difference between two measurements, can be relied upon and not attributed to random error in sampling or in measurement. Statistical Validity is important to the reliability of test results, particularly in Multivariate Testing methods. Validity can be defined as the degree to which a test measures what it is supposed to measure.
Statistical Validity
Has no single agreed definition but generally refers to the extent to which a concept, conclusion or measurement is well-founded and corresponds accurately to the real world. The word "valid" is derived from the Latin validus, meaning strong. The validity of a measurement tool (for example, a test in education) is considered to be the degree to which the tool measures what it claims to measure.
Reliability
The reliability of a research instrument concerns the extent to which the instrument yields the same results on repeated trials. Although unreliability is always present to a certain extent, there will generally be a good deal of consistency in the results of a quality instrument gathered at different times. The tendency toward consistency found in repeated measurements is referred to as reliability (Carmines & Zeller, 1979).
Standard Deviation
This is the deviation from normal (or mean) due to chance. It provides a measure of how the values (or numbers) are distributed around the normal.
Parameter
Is a value, usually unknown, used to represent a certain population characteristic. For example, the population mean is a parameter that is often used to indicate the average value of a quantity.
Statistical Inference
Makes use of information from a sample to draw conclusions (inferences) about the population from which the sample was taken.
Normal or Bell Curve
This is a graph or plot which shows the distribution of numbers. The numbers are plotted on the X-axis and their frequencies are plotted on the Y-axis. The great majority of the numbers will cluster around the middle with a few low and high values in each tail of the distribution. In a normal distribution, the mean, median and mode coincide. Bell curve is synonymous with normal curve.
Survey
Another type of non-experimental, descriptive study; it does not involve direct observation by a researcher.
Correlation
Does not necessarily imply cause and effect. A third variable could be causing both effects.
Percentiles
The median is a special case of a more general set of measures of location called percentiles. A percentile is a score at or below which a specific percentage of the distribution falls.
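Following the "at or below" wording above, the sketch below reports what percentage of a made-up set of scores falls at or below a given score. The helper name `percent_at_or_below` is hypothetical.

```python
def percent_at_or_below(scores, value):
    """Percentage of the scores that are less than or equal to `value`."""
    if not scores:
        raise ValueError("scores must not be empty")
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)


# Hypothetical test scores.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

# 4 of the 10 scores are at or below 70, so 70 sits at the 40th percentile here.
print(percent_at_or_below(scores, 70))
```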
Five measures of variability
1. The index of qualitative variation
2. The range
3. The interquartile range
4. Standard deviation
5. Variance
Box plot
A graphic device that can visually present the range, the interquartile range, the median, the lowest score, and the maximum score. Provides us with a way to visually examine the center, the variation and the shape of distributions of interval-ratio variables.
Dichotomous Variable
A variable that has only two values.
