sampling biases: measurement errors - recording the wrong response
when to use graphs: bar graph - used with qualitative data; bars DO NOT touch
when to use graphs: histogram - used with quantitative data; bars DO touch
when to use graphs: pie chart - used to show a percentage/part of a whole
when to use graphs: scatterplot - used with TWO quantitative variables; its purpose is to show the relationship between the variables
data sourcing methods: point of sale - information collected at the store, e.g. receipts
data sourcing methods: clickstream - tracking online navigation, e.g. Amazon/Google
data sourcing methods: social media - information shared on social media
data sourcing methods: sensors - devices that track the environment
content analysis (when to use it and purpose) - researching and analyzing QUALITATIVE data
r (correlation coefficient) / identifying relationships between quantitative variables - strength is based on how close the points are to a straight line on a scatterplot
r ranges from -1 to 1: the closer r is to 1 or -1, the stronger the relationship; the closer to 0, the weaker (the sign only gives the direction, positive or negative)
ex: -.021 is weak, .951 is strong
how to find a relative frequency - relative frequency is a proportion, or part of a whole: frequency of the category / total number of observations
example: 13 out of 40 participants are males
13/40 = .325, so the relative frequency of male participants is .325
data visualization vs. visual analytics - data visualization: data being displayed
visual analytics - adding prediction to the displayed data
parts of a graph
*be able to identify the parts and what would make the graph misleading
title, axis labels, scale, key, data values
purpose of a graph - to tell an accurate story
parts of a consent form - purpose, confidential/anonymous, expectations of the participant, whether participation is voluntary, risks, contact for questions
ethics principles
*given a scenario, be able to identify which principle is involved
ownership, consent, privacy, openness, currency
report vs dashboard
*given a scenario, know when to use each
report - static, historical data, a snapshot, like a PowerPoint
dashboard - dynamic, interactive, up-to-date (real-time updates); built with special software like Tableau
given a graph, box plot, scenario, or mean/median values, be able to recognize if it is approximately normal, skewed left, or skewed right
skewed right when: mean is larger than median, median is larger than mode (the long tail is on the right)
skewed left when: mean is smaller than median, median is smaller than mode (the long tail is on the left)
approximately normal when the graph is symmetrical (mean and median are about equal)
link to pictures: https://www.google.com/url?sa=i&url=https%3A%2F%2Fwww.statology.org%2Fleft-skewed-vs-right-skewed%2F&psig=AOvVaw2s7RM-jhpUSyTt0yxfrueg&ust=1619552836893000&source=images&cd=vfe&ved=0CAIQjRxqFwoTCIDrstHWnPACFQAAAAAdAAAAABAD
be able to find the mode, Q1, Q2, Q3, and IQR of a dataset
Q1- 25th percentile
Q2- 50th percentile/median
Q3- 75th percentile
IQR= Q3-Q1
mode- the value that appears most frequently in the data set
descriptives and Excel commands to find them
mean =AVERAGE
median =MEDIAN
mode =MODE
standard deviation of population =STDEV.P
standard deviation of sample =STDEV.S (the older =STDEV gives the same result)
variance of population =VAR.P (variance of sample =VAR.S)
use the population mean and standard deviation to find a z-score
z = (x - mean)/standard deviation
(x is your value)
the z-score shows the number of standard deviations a value is from the mean (positive = above the mean, negative = below)
find the dataset value that lies 1, 2, or 3 standard deviations above or below the mean
1 SD above: mean + standard deviation
1 SD below: mean- standard deviation
2 SD above: mean + (standard deviation x 2)
2 SD below: mean - (standard deviation x 2)
3 SD above: mean + (standard deviation x 3)
3 SD below: mean - (standard deviation x 3)
know the 68-95-99.7 rule (empirical rule) for finding the percent of values between -3 and 3 standard deviations; it applies to approximately normal data
68% of data lies within 1 standard deviation above/below the mean
95% is within 2 standard deviations
99.7% is within 3 standard deviations
know when to use mean or median
and range or IQR - the median and IQR are not strongly affected by outliers, so use them when the data are skewed or contain outliers; the mean and range suit roughly symmetrical data
know when to choose percent vs. percentile - a percent is part of a whole; a percentile gives the position of a value within the data set
percentile: the percentage of scores in the distribution that are equal to or less than a given score
ex: at the 80th percentile, 80% of values lie at or below this point and 20% lie above it
given a dataset, find the percentile value (find the data point at a given percentile)
1. put the dataset in increasing numerical order
2. calculate the index (i), which is the position of the point: i = (p/100) x n
p= percentile, n=sample size
if i is not an integer, round it UP; the percentile is the value in that position (counting from 1)
if i is an integer, the percentile is the average of the values in positions i and i+1
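The index method above can be sketched in Python as a quick self-check. This is a minimal sketch: the function names (`percentile`, `quartiles_and_iqr`, `outliers`) and the score data are hypothetical, and the quartile and outlier helpers assume Q1, Q2 and Q3 are found with this same index method. Other textbooks and Excel's PERCENTILE functions use slightly different rules, so answers can differ.

```python
import math
from fractions import Fraction


def percentile(data, p):
    """p-th percentile by the index method: i = (p/100) x n; if i is not an
    integer round it UP, otherwise average positions i and i+1 (1-based)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)                   # step 1: increasing order
    n = len(values)
    i = Fraction(p) * n / 100               # step 2: exact index, avoids float error
    if i.denominator != 1:                  # not an integer -> round UP
        return values[math.ceil(i) - 1]     # -1 turns a 1-based position into a list index
    i = int(i)
    return (values[i - 1] + values[i]) / 2  # integer -> average positions i and i+1


def quartiles_and_iqr(data):
    """Q1, Q2 (median), Q3 and IQR = Q3 - Q1, all found with percentile()."""
    q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
    return q1, q2, q3, q3 - q1


def outliers(data):
    """Values outside the fences Q1 - 1.5 x IQR and Q3 + 1.5 x IQR."""
    q1, _, q3, iqr = quartiles_and_iqr(data)
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


# Hypothetical example data (not from the course)
scores = [28, 15, 30, 25, 34, 20, 27, 25]
print(percentile(scores, 80))      # i = 6.4 -> round up to 7 -> 30
print(quartiles_and_iqr(scores))   # (22.5, 26.0, 29.0, 6.5)
print(outliers(scores))            # []
```

Using `Fraction` for the index is a design choice: with plain floats, something like (10/100) x 30 comes out as 3.0000000000000004 and would wrongly be rounded up.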
LECTURE ON MARCH 16--20 MINUTES IN
given box plots, be able to describe a dataset or compare box plots
parts of a box plot: Q1, Q2, Q3, min, max (the whiskers reach the min and max that are not outliers; any point plotted beyond them is an outlier)
compare skewness
the length of the box is the IQR: a small box means the middle 50% of the data is tightly packed (less spread), a large box means more spread
be able to find outliers OR find that there are NO outliers
any number greater than Q3 + (1.5 x IQR) is an outlier
any number less than Q1 - (1.5 x IQR) is an outlier
find sample proportion - sample proportion = count of successes / sample size
count of successes: the number of observations with the outcome you're looking for
ex: of 200 people that entered a store, 78 of them purchased something. 78/200= .39
sample proportion is .39
Find probabilities less than, greater than, or between z-scores using the normal distribution - look up the z-score in the normal distribution table; the number listed is the probability that Z is LESS THAN the z you are using
use 1 - (the number in the table) to find the probability that Z is GREATER THAN the z you are using
for between, subtract the table value for the lower z-score from the table value for the higher z-score
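Instead of reading a printed z-table, the same lookups can be approximated in Python with the standard normal CDF written using `math.erf`: P(Z < z) = 0.5 x (1 + erf(z / sqrt(2))). A minimal sketch; the function names and the mean/standard deviation in the example are hypothetical, and the results carry more decimal places than a 4-digit table, so the last digit may differ slightly from a table lookup.

```python
import math


def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd


def prob_less_than(z):
    """P(Z < z) for a standard normal Z (the value a z-table lists)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def prob_greater_than(z):
    """P(Z > z) = 1 - table value."""
    return 1 - prob_less_than(z)


def prob_between(z_low, z_high):
    """P(z_low < Z < z_high) = table value for higher z - table value for lower z."""
    return prob_less_than(z_high) - prob_less_than(z_low)


# Hypothetical example: raw score 86 when the mean is 70 and the standard deviation is 8
z = z_score(86, mean=70, sd=8)            # (86 - 70) / 8 = 2.0
print(round(prob_less_than(z), 4))        # 0.9772
print(round(prob_greater_than(z), 4))     # 0.0228

# The empirical rule falls out of the same function
for k in (1, 2, 3):
    print(k, round(prob_between(-k, k), 4))   # 0.6827, 0.9545, 0.9973
```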
LECTURE FROM APRIL 6
Find probabilities if given the raw score (x) using the normal distribution - you must convert x to a z-score to find the probability
z = (x - mean)/standard deviation
Find 90%, 95%, and 99% confidence intervals for proportions
*formula will be given, but know how to use it
use the formula p-hat ± z_(α/2) x sqrt(p-hat(1 - p-hat)/n) to find the lower (-) and upper (+) limits
parts of the equation you should understand so you can plug it in:
p-hat : sample proportion
z_(α/2) : critical value for the chosen confidence level (given); 1.645 for 90%, 1.96 for 95%, 2.576 for 99%
n : sample size
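A sketch of the confidence interval for a proportion, assuming the usual formula p-hat ± z_(α/2) x sqrt(p-hat(1 - p-hat)/n); check it against the formula given in class. The `Z_CRITICAL` table and the `proportion_ci` name are assumptions for illustration; the 78-of-200 store example is reused from the sample-proportion card.

```python
import math

# Common critical values z_(α/2); on a quiz these are given to you.
Z_CRITICAL = {90: 1.645, 95: 1.96, 99: 2.576}


def proportion_ci(successes, n, confidence=95):
    """p-hat ± z_(α/2) * sqrt(p-hat(1 - p-hat)/n).

    Returns (p_hat, margin_of_error, lower_limit, upper_limit)."""
    p_hat = successes / n                            # sample proportion
    z = Z_CRITICAL[confidence]
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)  # everything right of the ±
    return p_hat, margin, p_hat - margin, p_hat + margin


# Store example: 78 of 200 shoppers bought something
for level in (90, 95, 99):
    p_hat, moe, lower, upper = proportion_ci(78, 200, level)
    print(f"{level}%: {p_hat:.2f} ± {moe:.4f} -> ({lower:.4f}, {upper:.4f})")
```

At 95% this gives a margin of error of about 0.068, so the interval runs from roughly 0.32 to 0.46; raising the confidence level makes the interval wider.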
APRIL 15 at 10 minutes (can't write formula on quizlet)
Find the margin of error
*know how to find it using the confidence interval formula
the margin of error is everything to the right of the ± in the formula
Identify the interpretation of a confidence interval - the confidence level (e.g. 95%) is the probability that the method produces an interval that actually contains the population parameter: across many repeated samples, about 95% of the intervals built this way would contain it
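This interpretation can be checked with a small simulation: if we pretend the true proportion is known, draw many samples, and build a 95% interval from each, roughly 95% of those intervals should contain the true value. This is a hypothetical sketch; the `coverage_rate` name, the true proportion 0.39, the sample size 200, the trial count, and the seed are all made up for illustration, and the interval uses the same p-hat ± z_(α/2) x sqrt(p-hat(1 - p-hat)/n) form assumed above.

```python
import math
import random


def coverage_rate(true_p, n, trials=2000, z=1.96, seed=1):
    """Fraction of simulated intervals p-hat ± z*sqrt(p-hat(1-p-hat)/n)
    that contain the true population proportion."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # draw one sample of size n and count the "successes"
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / trials


# Hypothetical: the true proportion of buyers is 0.39 and each sample has 200 shoppers
print(coverage_rate(true_p=0.39, n=200))   # should be close to 0.95, not exactly
```

Any single interval either contains the true value or it does not; the 95% describes how often the method succeeds over many samples.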