Module 2 Organizing, Visualizing, and Describing Data

Lesson 1: Categorizing, Organizing, Summarizing, and Visualizing Data

Identify and compare data types.
- Structured data: easily organized and presented as arrays of variables or as data tables.

- Unstructured data: do not follow conventional organization approaches. Examples: filings with regulators, posts on social media, financial news and other non-numeric text, management earnings calls, and analyst presentations. These are words or images that are difficult to categorize logically, manipulate, or use in financial modeling.

Frequency Polygon

Cumulative Frequency Distribution (the plot tends to flatten out where returns are extremely negative or extremely positive)

Bar chart (like a histogram, but for categorical rather than numerical data)

Grouped bar chart (also known as a clustered bar chart)
Stacked bar chart

Tree map (consists of colored rectangles that represent categories or intervals of data; useful for identifying subgroups)

Word cloud (represents textual data, with the size of each word showing the value of the observation and the color of each word showing its category or sentiment; used for unstructured data)

Bubble chart (plots data points as bubbles of varying size, so that bubble size represents an additional dimension of multi-dimensional data)

Scatter plot (common in linear regression)

Heat map (assigns different colors to indicate the magnitude of the variable)

[Image: Charts and Graphs]
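
The frequency polygon and cumulative frequency distribution listed above are both drawn from a frequency table. The following is a minimal sketch of how such a table could be built, assuming hypothetical monthly returns (in percent) and hypothetical interval edges; the function name `frequency_table` is illustrative and not taken from the curriculum.

```python
def frequency_table(values, edges):
    """Return rows of (lower, upper, absolute, relative, cumulative relative).

    Each interval includes its lower edge and excludes its upper edge,
    except the last interval, which also includes its upper edge.
    """
    n = len(values)
    rows = []
    cumulative = 0
    for lower, upper in zip(edges, edges[1:]):
        is_last = upper == edges[-1]
        count = sum(
            1 for v in values
            if lower <= v < upper or (is_last and v == upper)
        )
        cumulative += count
        rows.append((lower, upper, count, count / n, cumulative / n))
    return rows


# Hypothetical monthly returns in percent and hypothetical interval edges.
monthly_returns = [-3.0, -1.5, -0.5, 0.2, 0.8, 1.1, 1.9, 2.5]
interval_edges = [-4, -2, 0, 2, 4]

table = frequency_table(monthly_returns, interval_edges)
for lower, upper, absolute, relative, cumulative_relative in table:
    print(f"{lower:>4} to {upper:>3}: {absolute} {relative:.3f} {cumulative_relative:.3f}")
```

A frequency polygon would plot the absolute frequency against each interval midpoint, while the cumulative frequency distribution plots the last column against the upper edge of each interval.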
Lesson 2: Measures of Central Tendency, Quantiles, and Dispersion

LOS: Calculate and interpret measures of central tendency.

LOS: Evaluate alternative definitions of mean to address an investment problem.

LOS: Calculate quantiles and interpret related visualizations.

LOS: Calculate and interpret measures of dispersion.

LOS: Calculate and interpret target downside deviation.

Know how to calculate these (XA = arithmetic mean, XG = geometric mean, XH = harmonic mean):
- Arithmetic mean, median, mode
- Weighted mean
- Geometric mean (also called the compound annual growth rate, or CAGR)
- Harmonic mean
- Relationship among the means: XG^2 = XH × XA (exact for two observations), and XA ≥ XG ≥ XH for positive values, with equality only when all observations are equal
- Relationship between XG and XA (in case you're confused about which formula this is): XG ≈ XA - Variance/2
- Quantiles

Facts:
- Trimmed mean: includes only some percentage of the middle of the values, to avoid the outlier problem.
- Winsorized mean: replaces outliers with the highest or lowest observation in some percentage of the middle data.
- A trimmed mean excludes extreme outliers from the calculation, which compromises the integrity of the data set.
- The harmonic mean lets the analyst use the existing data set without trimming or replacing the outlier while still giving the outlier less influence.

Weighted mean, geometric mean (CAGR), and harmonic mean: quick reminders I added after my first review.

Quantiles
Ly = (n + 1) × y/100, where Ly is the position of the yth percentile in the data sorted in ascending order and n is the number of observations.

Know how to calculate these:
- MAD (mean absolute deviation) = ∑ |Xi - XA| / n
- Variance and standard deviation: skip
- Relationship between the geometric and arithmetic means (same formula as prior): XG ≈ XA - Variance/2
- Target downside deviation (also known as target semi-deviation) L1R02LA-BP022_2106 *** (review this question) = [ ∑ (Xi - B)^2 / (n - 1) ]^(1/2), summed only over observations Xi at or below the target B
- Coefficient of variation = Standard deviation / XA

Coefficient of Variation (CV)
The standardized measure of risk per unit of return, calculated as: standard deviation / expected return

LOS: Interpret skewness.

LOS: Interpret kurtosis.

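
The calculations listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the return series and the two-value data set are hypothetical, the helper names are made up for this example, and the percentile helper interpolates linearly at position Ly, which is one common convention rather than the only one.

```python
import math
import statistics


def percentile_location(n, y):
    """Position Ly = (n + 1) * y / 100 of the y-th percentile among n sorted values."""
    return (n + 1) * y / 100


def percentile(values, y):
    """Value at position Ly, interpolating linearly between neighbours."""
    data = sorted(values)
    n = len(data)
    # Clamp the 1-based position so it stays inside the data.
    loc = min(max(percentile_location(n, y), 1), n)
    lower = int(loc)          # whole part of the position
    fraction = loc - lower    # fractional part used for interpolation
    if lower == n:
        return data[-1]
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


def mean_absolute_deviation(values):
    """MAD: average absolute distance from the arithmetic mean (divides by n)."""
    mean = statistics.mean(values)
    return sum(abs(x - mean) for x in values) / len(values)


def target_downside_deviation(values, target):
    """Root of the summed squared shortfalls at or below the target, over n - 1."""
    shortfalls = [(x - target) ** 2 for x in values if x <= target]
    return math.sqrt(sum(shortfalls) / (len(values) - 1))


def coefficient_of_variation(values):
    """Sample standard deviation divided by the arithmetic mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical annual returns, in decimal form.
returns = [0.10, -0.05, 0.20, 0.02, -0.07]

arithmetic = statistics.mean(returns)
# Geometric mean return: compound the (1 + R) factors, then subtract 1.
geometric = statistics.geometric_mean([1 + r for r in returns]) - 1
# Approximation from the notes: XG is roughly XA - Variance / 2.
approx_geometric = arithmetic - statistics.variance(returns) / 2

# Hypothetical positive values for comparing the three means.
values = [4.0, 9.0]
xa = statistics.mean(values)
xg = statistics.geometric_mean(values)
xh = statistics.harmonic_mean(values)

print(f"XA={arithmetic:.4f} XG={geometric:.4f} approx XG={approx_geometric:.4f}")
print(f"means of {values}: XA={xa:.4f} XG={xg:.4f} XH={xh:.4f}")
print(f"median position={percentile_location(len(returns), 50)}")
print(f"MAD={mean_absolute_deviation(returns):.4f}")
print(f"TDD (target 0)={target_downside_deviation(returns, 0.0):.4f}")
print(f"CV={coefficient_of_variation(returns):.4f}")
```

Note that the target downside deviation keeps n - 1 (all observations, less one) in the denominator even though only the observations at or below the target enter the sum.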
LOS: Interpret correlation between two variables.

M2L3

Positive Skew
- mean > median > mode
- asymmetry with the longer tail on the right
- high outliers

Negative Skew
- mean < median < mode
- asymmetry with the longer tail on the left
- low outliers

Kurtosis
- The "peakedness" of the distribution, which also reflects how fat its tails are relative to a normal distribution.
- We want meso, but platy is ok. We don't want lepto.
- A leptokurtic distribution gives more uncertainty: unexpectedly large gains and losses in a security or a portfolio are more likely, so adjustments should be made to guard against them. Thus, we want mesokurtic or platykurtic.

Know how to calculate these:
- Covariance (COV) of X, Y
- Correlation coefficient (CC) of X, Y
- -1 ≤ CC ≤ 1
- COV: gives us an idea of the direction of the relationship but doesn't show the strength of the relationship.
- CC: gives us the strength of the relationship, with 1 being no diversification and -1 being max diversification.

Covariance formula = ∑ (X - Xbar)(Y - Ybar) / (n - 1)

Correlation coefficient formula = (Cov of A and B) / [(STD of A) × (STD of B)]
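
The covariance and correlation formulas above can be checked with a short Python sketch. The three return series are hypothetical, and the function names are chosen for this example only; asset C is constructed as the mirror image of asset A to show the -1 case.

```python
import math


def sample_covariance(xs, ys):
    """Sum of (X - Xbar)(Y - Ybar) divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    products = ((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


def sample_std(xs):
    """Sample standard deviation: square root of the covariance of a series with itself."""
    return math.sqrt(sample_covariance(xs, xs))


def correlation(xs, ys):
    """Covariance scaled by both standard deviations, so it lies between -1 and 1."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


# Hypothetical periodic returns for three assets.
asset_a = [0.02, 0.04, -0.01, 0.03, 0.05]
asset_b = [0.01, 0.03, -0.02, 0.02, 0.06]
asset_c = [-r for r in asset_a]  # moves exactly opposite to asset A

print(f"cov(A, B)  = {sample_covariance(asset_a, asset_b):.6f}")
print(f"corr(A, B) = {correlation(asset_a, asset_b):.4f}")
print(f"corr(A, C) = {correlation(asset_a, asset_c):.4f}")
```

The covariance is expressed in squared return units, which is why it shows direction but not strength; dividing by the two standard deviations removes the units.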