Descriptive Statistics
Descriptive statistics or summary statistics
Quantities that capture important features of a frequency distribution, like location and spread for numerical data, and proportion for categorical data.
proportion
measures the fraction of observations in a given category
location and spread
The location describes the average or typical individual in a frequency distribution, that is, where the observations are centered. The spread describes how much the measurements vary around that center.
sample mean
The average of measurements in a sample, or the sum of all the observations divided by the number of observations.
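A minimal Python sketch of the sample mean, following the definition above. The function name and the measurement values are hypothetical, for illustration only:

```python
def sample_mean(values):
    """Return the sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("sample_mean requires at least one observation")
    return sum(values) / len(values)

# Hypothetical measurements, for illustration only
masses = [12.0, 15.0, 11.0, 14.0, 13.0]
print(sample_mean(masses))  # 13.0
```

The standard library function statistics.mean gives the same result.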
standard deviation
A measure of how widely the observations are spread around the mean, calculated as the square root of the variance.
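A sketch of the standard deviation as the square root of the variance. It assumes the usual sample variance, which divides the sum of squared deviations from the mean by n - 1 rather than n (the same convention used by Python's statistics.variance and statistics.stdev). Function names and data are illustrative:

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """Standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Hypothetical measurements, for illustration only
masses = [12.0, 15.0, 11.0, 14.0, 13.0]
print(sample_variance(masses))  # 2.5
print(sample_sd(masses))        # square root of 2.5, about 1.58
```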
For normal (bell-shaped) frequency distributions, the standard deviation has a straightforward interpretation.
About two-thirds (roughly 68%) of the observations will fall within one standard deviation of the mean, and about 95% will fall within two standard deviations of the mean.
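These percentages can be checked with the standard library's statistics.NormalDist (Python 3.8+). Because the fraction within k standard deviations is the same for every normal distribution, the sketch uses the standard normal; the helper name fraction_within is hypothetical:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def fraction_within(k):
    """Probability that a normally distributed observation lies within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_one = fraction_within(1)
within_two = fraction_within(2)
print(f"within 1 SD: {within_one:.3f}")  # within 1 SD: 0.683
print(f"within 2 SD: {within_two:.3f}")  # within 2 SD: 0.954
```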
For many traits, the standard deviation and the mean change together when organisms of different sizes are compared. We often care more about the relative variation among individuals, for example when comparing the variability in body mass of mice and elephants, or the variability of traits measured in different units, such as body mass and lifespan.
For these reasons it is sometimes useful to express the standard deviation relative to the mean, a quantity called the coefficient of variation (CV).
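A sketch of the standard deviation expressed relative to the mean, computed as the coefficient of variation, CV = 100% × s / mean, with Python's statistics module. The body masses are invented for illustration: the elephant values are the mouse values multiplied by a constant, so the standard deviations differ enormously while the relative variation is identical:

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * statistics.stdev(values) / mean

# Hypothetical body masses in grams, for illustration only
mice = [20.0, 22.0, 18.0, 24.0, 16.0]
elephants = [4_000_000.0, 4_400_000.0, 3_600_000.0, 4_800_000.0, 3_200_000.0]

print(statistics.stdev(mice), statistics.stdev(elephants))  # very different SDs
print(coefficient_of_variation(mice), coefficient_of_variation(elephants))  # both about 15.81
```

The CV is unitless, which is what makes comparisons across traits with different units possible; it is only sensible for measurements with a positive mean.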
median
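the middle observation in a set of data once the observations are sorted from smallest to largest; when the number of observations is even, the median is the average of the two middle observations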
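Finding the middle observation requires sorting the data first, and with an even number of observations the usual convention is to average the two middle values. A minimal sketch (the function name is illustrative; statistics.median in the standard library follows the same rule):

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    if not values:
        raise ValueError("median requires at least one observation")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```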
Quartiles are the values that divide the sorted data into quarters. The first quartile is the middle value of the measurements lying below the median, the second quartile is the median itself, and the third quartile is the middle value of the measurements lying above the median.
The interquartile range (IQR) is the difference between the third quartile and the first quartile; it spans the middle half of the data.
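A sketch of the quartile rule in these notes: take the median of the measurements below the median and of those above it. It assumes that, with an odd number of observations, the median itself is left out of both halves. Software packages use several different quartile conventions (for example, interpolating between observations), so their results can differ slightly from this one. Function names and data are illustrative:

```python
def _median(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-each-half rule.

    With an odd number of observations the median is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    lower = ordered[: n // 2]         # measurements below the median
    upper = ordered[(n + 1) // 2 :]   # measurements above the median
    return _median(lower), _median(ordered), _median(upper)

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]
print(quartiles(data))  # (3.5, 7, 11.0)
print(iqr(data))        # 7.5
```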
box plot
A box plot displays the median and interquartile range along with other quantities of the frequency distribution.
whiskers
In box plots, whiskers extend outward from each end of the box to the smallest and largest "non-extreme" values in the data. Non-extreme values lie no more than 1.5 times the interquartile range below the first quartile or above the third quartile. Extreme values are plotted as individual points beyond the ends of the whiskers.
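The whisker rule can be sketched as follows, assuming quartiles are computed with the median-of-each-half rule and the conventional multiplier of 1.5 (exposed here as the hypothetical parameter k). The function name box_plot_summary and the data are illustrative; plotting libraries may use other quartile conventions, so their whiskers can differ slightly:

```python
def _median(s):
    """Median of an already-sorted, non-empty list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def box_plot_summary(values, k=1.5):
    """Whisker ends and extreme values for a box plot.

    Values more than k * IQR below Q1 or above Q3 are treated as extreme.
    """
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two observations")
    q1 = _median(s[: n // 2])
    q3 = _median(s[(n + 1) // 2 :])
    spread = q3 - q1
    low_limit, high_limit = q1 - k * spread, q3 + k * spread
    inside = [x for x in s if low_limit <= x <= high_limit]
    extremes = [x for x in s if x < low_limit or x > high_limit]
    # Whiskers stop at the smallest and largest values that are not extreme
    return {"whisker_low": inside[0], "whisker_high": inside[-1], "extremes": extremes}

data = [2, 4, 5, 6, 7, 8, 9, 10, 30]
print(box_plot_summary(data))
# {'whisker_low': 2, 'whisker_high': 10, 'extremes': [30]}
```

Here Q1 = 4.5 and Q3 = 9.5, so the IQR is 5 and any value above 9.5 + 7.5 = 17 is extreme; 30 is drawn as a separate point.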
proportion
The proportion is the most important descriptive statistic for a categorical variable. It is calculated by dividing the number of observations in the category of interest by the total number of observations in all categories combined.
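A minimal sketch of this calculation for every category at once, using collections.Counter from the standard library. The function name and the flower-color data are hypothetical:

```python
from collections import Counter

def proportions(observations):
    """Fraction of all observations that fall in each category."""
    counts = Counter(observations)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one observation")
    return {category: count / total for category, count in counts.items()}

# Hypothetical categorical data, for illustration only
flower_colors = ["red", "white", "red", "red", "white", "pink", "red", "white"]
props = proportions(flower_colors)
print(props)  # {'red': 0.5, 'white': 0.375, 'pink': 0.125}
```

The proportions across all categories sum to 1.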
correlations
reliably observed associations between two or more events. For example, the relative width of the rings in a tree trunk is a reliable indicator of annual rainfall, because trees grow more and produce wider rings in rainy years than in dry years.
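One common way to put a number on such an association between two numerical variables is Pearson's correlation coefficient r, which ranges from -1 to 1. It is not defined in these notes, so treat this as a supplementary sketch; the function name and the rainfall and ring-width values are invented for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired lists with at least two observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values, for illustration only: annual rainfall (mm) and ring width (mm)
rainfall = [400, 550, 300, 700, 480]
ring_width = [1.1, 1.6, 0.9, 2.0, 1.3]
print(round(pearson_r(rainfall, ring_width), 3))  # strongly positive, close to 1
```

A value of r near 1 only says that the two measurements rise together; on its own it says nothing about why.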
correlation studies do not establish cause and effect by themselves, because an association between two variables can also arise when a third factor influences both
Scientists look for a large amount of evidence before they accept a hypothesis.
