sample selection and sample size MKTG 470

why is sampling super important?
1. It is almost never possible to do research with the entire population → you need a subset of the population (a sample)
2. A poor sample produces poor results, no matter how large it is. If the sample is not representative, the results will be a poor reflection of the real world, however many subjects it includes
sampling error
any error in research that occurs because a sample is used

2 causes of sampling error
1. (FACT THERE IS SAMPLING) Simply because you sample, you will always have sampling error. To sample you need a sample frame (a list of the people/units in the population); since this list is inherently flawed, the sample drawn from it is flawed too
2. (SIZE OF THE SAMPLE) Sample size also influences sample error: the smaller the sample, the larger the sample error

nonsampling errors
all other types of errors (data collection errors, data analysis errors, data interpretation errors, etc.) that have nothing to do with sampling

why take a sample
1. Practical: cost & population size
2. Inability to analyze the huge amounts of data generated by a census
3. Small samples can be representative too (when drawn well they can deliver quite precise and accurate results)

probability sampling
members of the population have a known, RANDOM chance (probability theory) of being included in the sample

non-probability sampling
the chances (probability) of being included in the sample are unknown (not random)

probability sampling exs
- Simple random sampling
- Systematic sampling
- Cluster sampling
- Stratified sampling

nonprobability sampling
- Convenience sampling
- Judgment sampling
- Referral sampling
- Quota sampling

simple random sampling
probability of being selected is EQUAL for all members of the population
prob of selection = sample size / population size

exs of simple random sampling
- random device method
- random numbers method

random device method (blind draw)
all units have an equal chance of being chosen
ex: flipping a coin, roulette in a casino, lottery

random numbers method
if the population is too large for a device to be used, a computer draws random numbers from the population instead

advantage of simple random sampling
every unit has a "known and equal" chance of selection
(theoretical) if you don't know the exact population you cannot
calculate the exact chance/probability, but theoretically you could.

disadvantages of simple random sampling
1. Complete accounting of the population needed: every population member has to be on the list! If the list is incomplete or inaccurate, there is sample frame error
2. Time consuming: providing a unique designation for every population member is a lot of manual work when the population is large

random digit dialing (RDD)
list of telephone/cell phone numbers generated at random (in telephone surveys)

example of SRS
random digit dialing

systematic sample
- way to select a random sample from a directory/list using a skip interval
- skip interval = population list size / sample size
- used to be more popular than SRS because it requires less effort
- became less popular as computer technology developed
- ex: every 3rd person, every 20th person, etc. (always systematic)
- "random" → refers to the starting point: a random point on the list is picked, and the systematic skipping starts from there

advantages of systematic sampling
- Approximately known & equal chance of selection, because of the random starting point
- Efficient: less expensive & faster than simple random sampling

disadvantages of systematic sampling
Complete listing of the population needed (always sample frame error)

cluster sampling
- Population divided into subgroups (clusters)
- Each cluster is assumed to be representative of the population
- ex: football clubs across the US (all clusters): pick one club as the cluster that serves as a representative sample for the whole population of football clubs

area sampling
form of cluster sampling: the geographic area is divided into clusters (cities, neighborhoods, etc.)

one-step area sample
1 cluster is chosen at random to represent the entire population. Then a census of that cluster is taken!
(all of the members within that cluster)

two-step area sample
several clusters:
(1) randomly select a sample of clusters
(2) randomly select units within the chosen clusters
(more expensive and time consuming)

advantages of cluster sampling
Economic efficiency: faster and cheaper than SRS

disadvantages of cluster sampling
Cluster specification error: the more the clusters differ, the less precise the sample results (neighborhoods are not equally rich, streets within neighborhoods differ → an especially severe problem in one-step area sampling)

stratified sampling
- separates the population into subgroups (strata), and then samples from every one of these subgroups
- use when you have skewed populations (populations that contain unique subgroupings: groups with characteristics that are not distributed symmetrically across the normal curve)
- ex: drug use in the US population is probably different for different subgroups (college students, 50+ housewives, babies, straight edgers, punks, etc.)

2 types of stratified sampling
proportionate stratified sample and disproportionate stratified sample

proportionate stratified sampling
sample from each stratum is proportionate to the size of the stratum in the population
ex: if freshmen = 20% of the university, then a sample of 400 should contain 80 freshmen

disproportionate sampling
strata are not sampled in proportion to their size, because of variance differences
ex: seniors would be sampled LESS than their proportionate share of the population (because they have lower variance, more internal agreement) and freshmen would be sampled more (because they have higher variance)

strata with high variance in stratified sampling
need to be oversampled (larger sample size)

strata with low variance in stratified sampling
can be undersampled (smaller sample size)

advantages of stratified sampling
More accurate overall sample of a skewed population: the less variance in a group, the smaller the sample it takes to produce a precise answer

disadvantage of stratified sampling
More complex sampling plan requiring a different
sample size for each stratum

nonprobability sampling
- chances (probability) of selecting members from the population into the sample are unknown
- selection not based on fairness/equal chance
- may not be representative, but still used very often (faster & cheaper)

convenience sampling
convenient, fast: "take whatever you can get"

bias of convenience sampling
exclusion of infrequent users/non-users

bias of judgment sampling
subjectivity (because certain members will have a smaller chance of selection than others)

judgment sampling
- also called 'purposive sampling' or 'exemplar sampling'
- requires judgment/an "educated guess"
- subjective, because certain members will have a smaller chance of selection than others
- often used in qualitative research
- ex: focus group participants are chosen based on an educated guess

referral sampling
- also called 'snowball sampling'
- respondents provide names of additional respondents or spread the survey further
- ex: A researcher is studying environmental engineers but can only find five. She asks these engineers if they know any more.
They give her several further referrals, who in turn provide additional contacts.

bias of referral sampling
Members of the population who are less known, disliked, or whose opinions conflict with the respondent's have a low probability of being selected

quota sampling
- samples that use a specific quota of certain types of individuals to be interviewed
- often used to ensure that convenience samples have the desired proportion of different respondent classes
- quotas are set by the research objectives (screening criteria)
- often used as a way to adjust/improve a convenience sample
- ex: often used by mall intercept firms to balance 50% male - 50% female

sample size matters because
you can save a lot of cost if you calculate the correct sample size (overdoing it is a waste)

sample size decision is usually...
a compromise between what is theoretically perfect and what is practically feasible

sample size does NOT
affect representativeness ('larger' does not mean 'more representative')

sample accuracy
closeness of findings to the true population value (i.e., similar results with repeated measurement?
how close to the truth?)

sample size is related to...
accuracy ('larger' means more accuracy, i.e. less sample error)

what determines whether your sample will be representative?
sampling method

what will help boost the accuracy?
sample size (larger sample size = more accurate)
ex: a probability sample of 100 people is more accurate than a probability sample of 10

what are the crucial steps to create optimal sample size:
- Sample size
- Accuracy
- Variability
- Confidence interval

census
- EVERY member of the population → no sample frame needed → no sample error → maximum accuracy
- has the highest possible accuracy
- the only perfectly accurate sample is a census

nonsampling error
all other errors (question bias, wrong problem definition, mistakes in analyses, etc.)

sampling error
consequence of sampling, and influenced by sample size (error is larger if the sample is smaller)

a probability sample will always...
- have some inaccuracy: always some "sample error"
- no random sample's value is a perfectly accurate reflection of the population's value

the larger a probability sample is...
- the more accurate it is (aka the less sample error)
- relation between sample size & accuracy: an asymptotic curve that will never reach 0% error

law of diminishing returns
once a sample is larger than 1000, large gains in accuracy are not realized even with large increases in sample size (little additional accuracy is possible)

sample error can be calculated...
with a simple formula:
$$\pm\,\text{sample error (\%)} = 1.96 \times \sqrt{\frac{p \times q}{n}}$$
1.96 = a constant (the z-value for 95% confidence)
p × q = variability (%) (p + q = 100%, so q = 100% - p)
n = sample size

variability (p)
the amount of disagreement (dissimilarity vs.
similarity; variance) in respondents' answers to a specific question; refers to how similar or dissimilar responses are to a given question

high consensus within group
low variability

low consensus within group
high variability

the more variability in the population being studied
the larger the sample size needed to achieve a stated level of accuracy

THE HIGHEST/MAXIMUM POSSIBLE VARIABILITY
50%-50% (50 × 50 = 2500)

p + q together always...
make 100% (q = 100% - p)

a topic of high variability:
a topic that stirs up controversy, a topic that has a "pro" and a "con" camp. For example: people in favor of versus against legalizing marijuana

very low variability (most lopsided combination)
is, let's say, 99 × 1 = 99

variability in formula
p × q in the numerator of the fraction

Central Limit Theorem
allows us to assume a normal (bell-shaped) curve for the results of repeated surveys

confidence interval
range whose endpoints define a certain % of the responses to a question (based upon the normal distribution curve)

You can take any finding in the survey...
replicate the survey with a probability sample of the same size, and be very likely to find the same result within a ± % range of the original sample's finding

How to interpret the idea of confidence intervals?
It's based on a theoretical notion (i.e., the Central Limit Theorem): take many, many, many samples, plot the p of each sample (sample percentages), and 95% of them will fall inside those boundaries, inside that confidence interval.

95% confidence level means? (± 1.96 standard errors)
- 95% of the results of samples drawn from a population will fall within the range defined by ± 1.96 × standard error
- Based on this, we can say that we are 95% confident that the true population value falls within this range

95% CI means
the boundaries of the C.I.
are set by ± 1.96 × standard error (that is, ± the sample error)

what influences sample error?
Sample size, NOT population size

if you use a probability sample that is drawn well
- the population size does not matter (except in very small populations)
- a small population is one in which the sample size exceeds 5% of the population → in those cases the sample size formula needs an adjustment factor (a finite multiplier)

In almost(*) all cases, the sample error of a probability sample
is independent of the size of the population

a probability sample can be a very tiny % of the population size
- and still be very accurate (have little sample error)
- the sample's size relative to the population does not influence the sampling error

size of a probability sample
depends on the balance between desired accuracy and the cost of data collection

The "Confidence Interval" method of determining sample size (proper sample size)
1. The variability believed to be in the population (more variability → larger sample needed)
2. The acceptable sample error (smaller sample error → larger sample needed)
3. The level of confidence required (99% confidence? 95% confidence?)

sample size formula
$$n = \frac{z^2 (p \times q)}{e^2}$$
n = sample size
z = standard error associated with the chosen level of confidence
p = estimated percent in the population
q = 100 - p
e = acceptable sample error expressed as a percent

How to estimate variability (p times q) in the population?
- Expect the worst case (p = 50; q = 50)
- Estimate variability (educated guess)
- Previous studies? Secondary literature?
- Conduct a pilot study? Exploratory study?

How to determine the amount of acceptable sample error?
- Researchers should work with managers to make this decision. How much error is the manager willing to tolerate?
→ Accuracy of the results: are they a truthful reflection of the actual population's results?

How to decide on the level of confidence to use?
Researchers typically use 95% or 99%

How to balance sample size with cost of data collection?
- Researchers should work with managers to take cost into consideration in this decision
- e.g., if you give a $20 incentive per person, then with n = 1,000 → $20,000 budget?

Other methods of sample size determination
1) Arbitrary "percent rule of thumb" sample size, e.g., "our sample should be at least 5% of the population"
2) Conventional sample size specification: follows conventions or beliefs, adopting past sample sizes (e.g., always 1200)
3) Cost basis of sample size specification: "all you can afford"
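
worked example (sample error and sample size formulas)
A minimal Python sketch of the two formulas from the cards above: sample error = z × sqrt(p × q / n) and n = z² × (p × q) / e². The function names `sample_error` and `sample_size` are made up for illustration, and the 99% z-value of 2.58 is an assumption that does not appear on the cards. As on the cards, p, q and e are percentages (0 to 100), not proportions.

```python
import math

Z_95 = 1.96  # z-value for 95% confidence, as used on the cards
Z_99 = 2.58  # assumed z-value for 99% confidence (not given on the cards)


def sample_error(p, n, z=Z_95):
    """Return the +/- sample error in percentage points: z * sqrt(p*q/n)."""
    q = 100.0 - p  # p + q always make 100%
    return z * math.sqrt(p * q / n)


def sample_size(p, e, z=Z_95):
    """Return n = z^2 * (p*q) / e^2, rounded up to a whole respondent."""
    q = 100.0 - p
    return math.ceil(z ** 2 * p * q / e ** 2)


if __name__ == "__main__":
    # Worst-case variability (p = 50, q = 50) at several sample sizes:
    # the error shrinks as n grows, but more and more slowly.
    for n in (100, 1000, 2000):
        print(f"n = {n}: +/- {sample_error(50, n):.2f} points")

    # Sample size needed for a given acceptable error at 95% confidence.
    for e in (5, 3):
        print(f"e = +/- {e} points: n = {sample_size(50, e)}")
```

With worst-case variability, going from n = 1000 to n = 2000 only moves the error from about 3.1 to about 2.2 points, which is the law of diminishing returns in numbers. Rounding up is a design choice of this sketch; textbooks often report the unrounded or rounded-to-nearest figure instead.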