Data Analysis Main
A set of procedures used by social scientists to organize, summarize and communicate information.
Information represented by numbers, which can be the subject of statistical analysis.
1. Asking the research question
2. Formulating the hypotheses
3. Collecting Data
4. Analyzing data
5. Evaluating the hypotheses
A set of activities in which social scientists engage to answer questions, examine ideas, or test theories.
Research based on evidence that can be verified by using our direct experience.
An elaborate explanation of the relationship between two or more observable attributes of individuals or groups.
A tentative answer to a research problem.
A property of people or objects that takes on two or more values.
Units of analysis
The level of social life on which social scientists focus. Examples of different levels are individuals and groups.
The variable to be explained.
The variable expected to account for (the "cause" of) the dependent variable.
Numbers or other symbols are assigned to a set of categories for the purpose of naming, labeling, or classifying the observations.
Numbers are assigned to rank-ordered categories ranging from low to high.
Interval-ratio measurement
Measurements for all cases are expressed in the same units.
Cumulative Property of Levels of Measurement
Variables that can be measured at the interval-ratio level of measurement can also be measured at the ordinal and nominal levels.
Have a minimum-sized unit of measurement, which cannot be subdivided. Examples: children per family; wages, which cannot differ by less than 1 cent, the minimum-sized unit.
Do not have a minimum-sized unit of measurement; their range of values can be subdivided into increasingly smaller fractional values. Example: length.
The total set of individuals, objects, groups, or events in which the researcher is interested.
A relatively small subset selected from a population.
Procedures that help us organize and describe data collected from either a sample or a population.
The logic and procedures concerned with making predictions or inferences about a population from observations and analyses of a sample.
Offer specific concrete predictions about the way observable attributes of people or groups would be in real life.
A table reporting the number of observations falling into each category of the variable.
Is a relative frequency obtained by dividing the frequency in each category by the total number of cases.
A table showing the percentage of observations falling into each category of the variable.
Cumulative frequency distribution
A distribution showing the frequency at or below each category (class interval or score) of the variable.
Cumulative percentage distribution
A distribution showing the percentage at or below each category (class interval or score) of the variable.
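The four distributions defined above can be built in one pass over the data. The following is a minimal sketch in Python; the scores and the helper name `frequency_table` are made up for illustration and do not come from the card set.

```python
from collections import Counter

# Hypothetical scores, e.g. number of children per family for ten families.
scores = [0, 1, 1, 2, 2, 2, 3, 3, 4, 2]

def frequency_table(values):
    """Return one row per category:
    (category, frequency, percentage, cumulative frequency, cumulative percentage)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for category in sorted(counts):
        f = counts[category]
        cumulative += f  # frequency at or below this category
        rows.append((category, f, 100 * f / n, cumulative, 100 * cumulative / n))
    return rows

table = frequency_table(scores)
for row in table:
    print(row)
```

The last row always ends at the total number of cases and at 100 percent, which is a quick check that the table is complete.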
A number obtained by dividing the number of actual occurrences in a given time period by the number of possible occurrences.
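As a small illustration of the rate definition, the sketch below divides actual by possible occurrences. The `per` multiplier (for example, per 1,000 people) is an added convenience and an assumption, not part of the definition above; the birth figures are invented.

```python
def rate(actual, possible, per=1):
    """Actual occurrences divided by possible occurrences, scaled by `per`."""
    if possible <= 0:
        raise ValueError("possible occurrences must be positive")
    return actual * per / possible

# Hypothetical example: 50 births in one year in a town of 10,000 people.
births_per_1000 = rate(50, 10_000, per=1000)
print(births_per_1000)
```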
First step in the statistical analysis of data
Constructing a frequency distribution.
A graph showing the differences in frequencies or percentages among categories of a nominal or an ordinal variable. The categories are displayed as segments of a circle whose pieces add up to 100 percent of the total frequencies.
A graph showing the differences in frequencies or percentages among categories of a nominal or an ordinal variable. The categories are displayed as rectangles of equal width with their height proportional to the frequency or percentage of the category.
A graph showing the differences in frequencies or percentages among categories of an interval-ratio variable. The categories are displayed as contiguous bars, with width proportional to the width of the category and height proportional to the frequency or percentage of that category.
A graph showing the differences in frequencies or percentages among categories of an interval-ratio variable. Points representing the frequencies of each category are placed above the midpoint of the category and are joined by a straight line.
Time series chart
A graph displaying changes in a variable at different points in time. It shows time (measured in units such as years or months) on the horizontal axis and the frequencies (percentages or rates) of another variable on the vertical axis.
Measures of central tendency
Numbers that describe what is average or typical of the distribution.
The category or score with the largest frequency (or percentage) in the distribution. Is used to describe nominal variables.
The mode is always a category or score, not a frequency. Do not confuse the two.
The score that divides the distribution into two equal parts so that half the cases are above it and half below it.
The median is a measure of central tendency that can be calculated for variables that are at least at an ordinal level of measurement.
Represents the exact middle of a distribution, it is the score that divides the distribution into two equal parts so that half the cases are above it and half below it.
A score below which a specific percentage of the distribution falls.
The arithmetic average obtained by adding up all the scores and dividing by the total number of scores.
The arithmetic mean is by far the best-known and most widely used average. It is what most people call the "average".
The mean is typically used to describe central tendency in interval-ratio variables such as income, age, and education. Can only be calculated for variables measured at the interval-ratio level.
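The three measures of central tendency defined above can be compared on one small data set. This is a sketch using Python's standard `statistics` module; the education scores are hypothetical.

```python
import statistics

# Hypothetical interval-ratio scores: years of education for seven respondents.
education = [10, 12, 12, 12, 14, 16, 22]

mode = statistics.mode(education)      # the score with the largest frequency
median = statistics.median(education)  # the score that splits the cases in half
mean = statistics.mean(education)      # sum of the scores divided by N

print(mode, median, mean)
```

Note that the mode reported here is the score (12), not its frequency (3).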
The frequencies at the right and left tails of the distribution are identical; each half of the distribution is the mirror image of the other. In a unimodal symmetrical distribution the mean, median, and mode are identical.
A distribution with a few extreme values on one side of the distribution.
Negatively skewed distribution
A distribution with a few extremely low values; the mean will be pulled in the direction of the lower scores.
Positively skewed distribution
A distribution with a few extremely high values; the mean will be pulled toward the higher scores.
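Because extreme values pull the mean but not the median, comparing the two gives a rough hint about the direction of skew. The sketch below is only a rule of thumb under that assumption, not a formal skewness statistic; the function name and data are hypothetical.

```python
import statistics

def skew_direction(scores):
    """Guess the direction of skew by comparing the mean with the median."""
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    if mean > median:
        return "positive"   # mean pulled toward a few high scores
    if mean < median:
        return "negative"   # mean pulled toward a few low scores
    return "symmetric or unclear"

incomes = [20, 25, 25, 30, 100]      # one extremely high value
exam_scores = [10, 80, 85, 85, 90]   # one extremely low value

print(skew_direction(incomes), skew_direction(exam_scores))
```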
Center of Gravity
Because the mean (unlike the mode and the median) incorporates all the scores in the distribution, we can think of it as the center of gravity of the distribution.
Is typically used to describe central tendency in interval-ratio variables, such as income, age, or education. We obtain the mean by summing all the scores and dividing by the total (N) number of scores.
Measures of variability
Numbers that describe diversity or variability in the distribution.
The index of qualitative variation (IQV)
Is used to measure variation in nominal variables. It is based on the ratio of the total number of differences in the distribution to the maximum number of possible differences within the same distribution. IQV can vary from 0.00 to 1.00.
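The card above describes the IQV only in words. One common computational form, assumed here rather than taken from the card set, is IQV = K(1 - Σp²) / (K - 1), where K is the number of categories and p is each category's proportion. A minimal sketch:

```python
def iqv(frequencies):
    """Index of qualitative variation from a list of category frequencies.

    Assumes the form K * (1 - sum of squared proportions) / (K - 1).
    """
    k = len(frequencies)
    n = sum(frequencies)
    if k < 2 or n == 0:
        raise ValueError("need at least two categories and one case")
    sum_squared = sum((f / n) ** 2 for f in frequencies)
    return k * (1 - sum_squared) / (k - 1)

print(iqv([50, 50]))   # cases spread evenly over the categories
print(iqv([100, 0]))   # all cases in one category
```

An even spread gives the maximum of 1.00 and a single occupied category gives the minimum of 0.00, matching the range stated above.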
The range measures
Variation in interval-ratio variables and is the difference between the highest (maximum) and lowest (minimum) scores in the distribution. To find the range, subtract the lowest from the highest score in a distribution.
The interquartile range (IQR)
Measures the width of the middle 50 percent of the distribution. It is defined as the difference between the lower and upper quartiles (Q1 and Q3).
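The range and the interquartile range can be computed side by side. The sketch below uses `statistics.quantiles` from the Python standard library with its default method; textbooks and software use several slightly different quartile conventions, so Q1 and Q3 may differ a little from a hand calculation. The data are hypothetical.

```python
import statistics

# Hypothetical interval-ratio scores.
scores = [1, 2, 3, 4, 5, 6, 7]

value_range = max(scores) - min(scores)         # highest minus lowest score

q1, q2, q3 = statistics.quantiles(scores, n=4)  # three cut points: Q1, median, Q3
iqr = q3 - q1                                   # width of the middle 50 percent

print(value_range, q1, q3, iqr)
```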
Is a graphical device that visually presents the range, the interquartile range, the median, the lowest (minimum) score, and the highest (maximum) score. The box plot provides us with a way to visually examine the center, the variation, and the shape of a distribution.
The variance and the standard deviation are two closely related measures of variation for interval-ratio variables that increase or decrease based on how closely the scores cluster around the mean.
Is the average of the squared deviations from the center (mean) of the distribution; the standard deviation is the square root of the variance.
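The definition above translates directly into a few lines of code. This sketch divides by N, as the definition states; many textbooks divide by N - 1 when the data are a sample, which is not shown here. The scores are hypothetical.

```python
import math

def variance(scores):
    """Average of the squared deviations from the mean (divides by N)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)

def standard_deviation(scores):
    """Square root of the variance."""
    return math.sqrt(variance(scores))

scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(scores), standard_deviation(scores))
```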
Bell Shaped Curve
Normal curve to represent distribution of data across a population. Mean, median, mode all have the same value.
The value that occurs more frequently than any other value; if all are different, there is no mode.
Difference between highest and lowest values.
Positive square root of the variance, where variance is the average of the square of the deviations of the measurements about their mean.
68% of all measurements in a normal distribution fall within _____ standard deviation on either side of mean.
2 Standard Deviation
95.5% of all measurements in a normal distribution fall within ____ standard deviations on either side of the mean.
3 Standard Deviation
99.7% of all measurements in a normal distribution fall within ____ standard deviations on either side of the mean.
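The three percentages on these cards can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) to compute the share of a standard normal distribution lying within 1, 2, and 3 standard deviations of the mean; the helper name is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))
```

The rounded results are about 68.3, 95.4, and 99.7 percent, which the cards state as 68%, 95.5%, and 99.7%.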
Statistical quality control
Substitutes the inspection of only a sampling of products in a given batch for inspection of every piece. It is representative of the entire batch.
Sometimes called the population, is the total number of cases the statistician is interested in.
Is a technique by which several small samples may be used to discover facts about a very large universe. Example: Using alphabetical lists from each department, select every 100th employee. The people or objects included in the sample are chosen to represent various classes or groups within the universe.
Are numbers arbitrarily chosen by the statistician to make comparisons easier.
In any set of data some items are likely to be more important than others. The weighted average is computed to take account of this fact.
Correlation between two items is the extent to which one changes as the other changes. Correlation is expressed by a coefficient ranging from -1 to 1.
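The card does not say which coefficient is meant. The sketch below assumes the Pearson correlation coefficient, the most common choice for interval-ratio data, and computes it from its definition; the function name and data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return co_variation / math.sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]
print(pearson_r(hours, [2, 4, 6, 8, 10]))   # y rises steadily with x
print(pearson_r(hours, [10, 8, 6, 4, 2]))   # y falls steadily as x rises
```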
Are used to plot, and show visually, the relationship (or correlation) between two variables.
Is the deviation from normal due to chance (or uncontrollable variations). In simple terms, it's what you can expect to occur in almost every measurable activity in life.
Is a method of predicting the future on the basis of the assumption that past trends will continue.
Are data gathered for a specific study.
Are data that have already been collected by some other person or group, probably for another purpose. Generally they have been published.
It provides a measure of how the values (or numbers) are distributed around the normal.
The extent to which a number or variable occurs.
A frequency distribution
is an array of numbers that shows how often each one appears. Arrange the numbers in order from lowest to highest and then note how often each one occurs.
The method of predicting the future based on the assumption that present trends continue. There are three different types: straight line, accelerating, and decelerating.
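For the straight-line case only, a projection can be sketched by fitting a least-squares line to past values and extending it. Least squares is an assumption here, since the card names no fitting method; the caseload figures and function names are hypothetical. Accelerating and decelerating trends would need a curved model and are not covered.

```python
def linear_trend(times, values):
    """Least-squares slope and intercept of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
             / sum((t - mean_t) ** 2 for t in times))
    intercept = mean_v - slope * mean_t
    return slope, intercept

def project(times, values, future_time):
    """Extend the fitted straight line to a future point in time."""
    slope, intercept = linear_trend(times, values)
    return intercept + slope * future_time

# Hypothetical yearly caseloads for years 1 to 4, projected to year 5.
years = [1, 2, 3, 4]
caseloads = [10, 12, 14, 16]
print(project(years, caseloads, 5))
```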
A collection of some of the elements. Example: 200 HHSA employees.
A sample of the population where every member of the population has an equal chance of being selected. E.g. in an alphabetical list of HHSA employees, every 10th person is selected for the study.
Samples selected from each stratum (or layer) of a population in proportion to their frequency in each level. E.g. using the alphabetical lists for each HHSA section, select every 10th person for the study.
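The two selection procedures in these examples can be sketched in a few lines. Everything named below (the functions, the section names, the staff lists) is hypothetical. The first function takes every k-th person from a list; the second draws the same fraction at random from each stratum so that strata are represented in proportion to their size.

```python
import random

def every_kth(population, k, start=0):
    """Select every k-th element of a list, beginning at index `start`."""
    return population[start::k]

def proportionate_stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction at random from each stratum.

    `strata` maps a stratum name to its list of members. A fixed seed keeps
    the illustration repeatable.
    """
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        size = round(len(members) * fraction)
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical alphabetical employee list, numbered 1 to 100.
employees = list(range(1, 101))
tenth_people = every_kth(employees, 10, start=9)

# Hypothetical sections of different sizes.
sections = {"Section A": list(range(100)), "Section B": list(range(100, 150))}
stratified = proportionate_stratified_sample(sections, fraction=0.1)

print(tenth_people)
print({name: len(chosen) for name, chosen in stratified.items()})
```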
A way of expressing the relationship between 2 numbers. If there are 20 contract administrators and 5 managers, then the ratio of CAs to managers is 20/5 or 4 to 1.
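The reduction from 20/5 to "4 to 1" can be sketched by dividing both counts by their greatest common divisor. The function name is hypothetical.

```python
from math import gcd

def simplify_ratio(a, b):
    """Reduce the ratio a:b to lowest terms."""
    if a <= 0 or b <= 0:
        raise ValueError("both counts must be positive")
    divisor = gcd(a, b)
    return a // divisor, b // divisor

administrators, managers = simplify_ratio(20, 5)
print(f"{administrators} to {managers}")
```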
A method of computing the average value of a set of data. The idea is that you give each value its weight according to its frequency in the data. The calculation is essentially the same as doing any average.
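A weighted average with frequencies as weights can be sketched as below. The scores, frequencies, and function name are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of value times weight, divided by the sum of the weights."""
    if len(values) != len(weights) or sum(weights) == 0:
        raise ValueError("need one weight per value and a non-zero total weight")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical scores and how often each one occurs.
scores = [1, 2, 3]
frequencies = [2, 3, 5]

print(weighted_mean(scores, frequencies))
```

Writing out every score as many times as it occurs (1, 1, 2, 2, 2, 3, 3, 3, 3, 3) and taking the ordinary mean gives the same result, which is why the calculation is essentially the same as any average.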
Statistical quality control
Is a fancy name for sampling.
The degree to which an observed result, such as a difference between two measurements, can be relied upon and not attributed to random error in sampling or in measurement. Statistical Validity is important to the reliability of test results, particularly in Multivariate Testing methods. Validity can be defined as the degree to which a test measures what it is supposed to measure.
has no single agreed definition but generally refers to the extent to which a concept, conclusion or measurement is well-founded and corresponds accurately to the real world. The word "valid" is derived from the Latin validus, meaning strong. The validity of a measurement tool (for example, a test in education) is considered to be the degree to which the tool measures what it claims to measure.
The reliability of a research instrument concerns the extent to which the instrument yields the same results on repeated trials. Although unreliability is always present to a certain extent, there will generally be a good deal of consistency in the results of a quality instrument gathered at different times. The tendency toward consistency found in repeated measurements is referred to as reliability (Carmines & Zeller, 1979).
This is the deviation from normal (or mean) due to chance. It provides a measure of how the values (or numbers) are distributed around the normal.
Is a value, usually unknown, used to represent a certain population characteristic. For example, the population mean is a parameter that is often used to indicate the average value of a quantity.
Makes use of information from a sample to draw conclusions(inferences) about the population from which the sample was taken.
Normal or Bell Curve
This is a graph or plot which shows the distribution of numbers. The numbers are plotted on the X-axis and their frequencies are plotted on the Y-axis. The great majority of the numbers will cluster around the middle with a few low and high values in each tail of the distribution. In a normal distribution, the mean, median and mode coincide. Bell curve is synonymous with normal curve.
Another type of non-experimental, descriptive study that does not involve direct observation by a researcher.
Does not necessarily imply cause and effect. Could have a third variable causing both effects.
The median is a special case of a more general set of measures of location called percentiles. A percentile is a score at or below which a specific percentage of the distribution falls.
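Read in the other direction, the percentile definition gives the percentage of cases at or below a chosen score. The sketch below computes that percentage; the function name and test scores are hypothetical.

```python
def percentile_rank(values, score):
    """Percentage of the distribution that falls at or below `score`."""
    at_or_below = sum(1 for v in values if v <= score)
    return 100 * at_or_below / len(values)

# Hypothetical test scores for ten students.
test_scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

print(percentile_rank(test_scores, 75))
print(percentile_rank(test_scores, 100))
```

Here half of the scores fall at or below 75, so 75 sits at the 50th percentile of this small distribution.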
Five measures of variability
1. The index of qualitative variation
2. The range
3. The interquartile range
4. Standard deviation
A graphic device that can visually present the range, the interquartile range, the median, the lowest score, and the maximum score. Provides us with a way to visually examine the center, the variation, and the shape of distributions of interval-ratio variables.
A variable that has only two values.