Exploratory Analysis R
Terms in this set (23)
function to see the levels of a factor column in a data frame
Geom to make a bar chart
compare 2 columns in a dataframe
table(1st column, 2nd column)
compare 2 columns proportions in a df
what do you pass to prop.table to condition the proportion on a row OR a column?
prop.table(table, 1 or 2), where the second argument is 1 to condition on rows or 2 to condition on columns
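The table and prop.table cards above can be sketched in base R as follows. The built-in mtcars data set and the variable names are only illustrative choices, not part of the original cards.

```r
# Two categorical columns from the built-in mtcars data set
tab <- table(mtcars$cyl, mtcars$gear)  # raw counts: cyl in rows, gear in columns

overall <- prop.table(tab)     # no margin: each cell divided by the grand total
by_row  <- prop.table(tab, 1)  # margin = 1: proportions within each row
by_col  <- prop.table(tab, 2)  # margin = 2: proportions within each column

print(tab)
print(round(by_row, 2))
```

With margin 1 every row sums to 1, and with margin 2 every column sums to 1.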
what is unique about the aes when setting up geom bar?
you just need an x, not a y variable
function to create multiple charts in ggplot
difference between geom bar and geom dotplot?
geom_dotplot draws each individual value as a dot and stacks the dots into bar-like columns
difference between geom histogram and geom bar
geom_bar counts the rows in each category of a discrete variable; geom_histogram groups a continuous variable into bins and counts the rows in each bin.
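A minimal sketch of the three geoms compared above, assuming ggplot2 is installed. The mtcars columns and the binwidth values are arbitrary choices for illustration.

```r
library(ggplot2)

# Discrete variable: geom_bar() needs only x and counts the rows per category
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# Continuous variable: geom_histogram() bins the values, then counts per bin
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2.5)

# geom_dotplot() bins the values too, but draws one stacked dot per observation
p_dot <- ggplot(mtcars, aes(x = mpg)) +
  geom_dotplot(binwidth = 1.5)

print(p_bar)
```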
geom for a density chart and what is it?
geom_density draws a smoothed curve (a kernel density estimate) of a continuous variable, similar to a smoothed histogram
how would you pivot or 'transpose' a chart?
add coord_flip() to the plot to swap the x and y axes
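As a small illustration of flipping a chart, assuming ggplot2 and the built-in mtcars data:

```r
library(ggplot2)

# Vertical bar chart of counts per cylinder class
p <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# Same chart with the x and y axes swapped, so the bars run horizontally
p_flipped <- p + coord_flip()

print(p_flipped)
```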
geom to make a boxplot
in a density chart how would you set the bandwidth (the smoothing counterpart of bin width)?
geom_density(bw = 5)
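A hedged sketch of how bw changes the curve, assuming ggplot2. The values 1 and 5 are arbitrary, and bw is in the units of the x variable.

```r
library(ggplot2)

# Small bandwidth: a wiggly curve that follows the data closely
p_narrow <- ggplot(mtcars, aes(x = mpg)) +
  geom_density(bw = 1)

# Large bandwidth: a smoother, flatter curve
p_wide <- ggplot(mtcars, aes(x = mpg)) +
  geom_density(bw = 5)

print(p_wide)
```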
what argument would you use to change the labels of a facet grid?
4 types of distribution shape (modality)
unimodal, bimodal, multimodal, uniform
Two major types of skew and the rule of thumb to remember them
left skewed and right skewed; the skew is named for the side the tail is on.
If you are mapping multiple variables in geom density what do you have to do to ensure you can see them all
set an alpha (transparency) value on the layer so overlapping curves show through
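For example, assuming ggplot2 and using cyl from mtcars as an illustrative grouping variable (0.3 is an arbitrary transparency):

```r
library(ggplot2)

# One filled density curve per cylinder class; alpha keeps overlaps visible
p <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_density(alpha = 0.3)

print(p)
```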
What does log do in charting and what is its main flaw
log compresses large values and spreads out small ones, which makes skewed data easier to read in a chart. log(0) is -Inf and the log of a negative number is undefined, so when the data contains zeros you need to add a tiny offset first, e.g. aes(x=log(name+.001))
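The zero problem can be seen in base R. The vector x below is made up for illustration, and log1p() is mentioned only as a common alternative, not something the card itself uses.

```r
x <- c(0, 1, 10, 100, 1000)  # illustrative values including a zero

raw     <- log(x)          # the zero becomes -Inf
shifted <- log(x + 0.001)  # small offset keeps every value finite
plus1   <- log1p(x)        # log(1 + x), which maps zero to zero

print(raw)
print(shifted)
```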
A factor has some meaningless or small-count levels. What do you do to eliminate those levels?
first, filter out the rows containing those levels, then use the droplevels() function to eliminate the levels. Until you do, they still exist as levels even though no rows use them.
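A minimal base R sketch of the filter-then-droplevels step. The data frame and the level name "rare" are hypothetical.

```r
# Hypothetical data with one sparse level
df <- data.frame(
  grp = factor(c("a", "a", "b", "b", "b", "rare")),
  val = 1:6
)

# Step 1: filter out the rows that use the unwanted level
kept <- df[df$grp != "rare", ]
print(levels(kept$grp))  # the unused level is still listed
print(table(kept$grp))   # and shows up with a count of 0

# Step 2: drop levels that no longer have any rows
cleaned <- droplevels(kept)
print(levels(cleaned$grp))
```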
When breaking down a chart into several smaller ones, what is the function and what argument does it need?
facet_wrap is the function. It needs the field to break the chart down by, written after a tilde, which basically says "broken down by": facet_wrap(~name). The formula can also have a variable on each side of the tilde, as in facet_grid(rows ~ columns), to lay the panels out in rows and columns.
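A short sketch of both facet forms, assuming ggplot2 and mtcars. The labeller = label_both line is one possible way to change facet labels and is an assumption, since the set does not record an answer for that card.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 5)

# One panel per value of cyl
p_wrap <- base + facet_wrap(~ cyl)

# Rows by am, columns by cyl
p_grid <- base + facet_grid(am ~ cyl)

# Same grid, with strip labels that show "variable: value"
p_labeled <- base + facet_grid(am ~ cyl, labeller = label_both)

print(p_labeled)
```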
function in dplyr that is similar to SQL's DISTINCT
function to add a title in ggplot
function to get the interquartile range
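The set does not record answers for the last three cards. As an assumption, the usual candidates are dplyr's distinct(), ggplot2's ggtitle(), and base R's IQR(), sketched here on mtcars with an illustrative title.

```r
library(dplyr)
library(ggplot2)

# Unique values of one column, like SELECT DISTINCT cyl
distinct_cyl <- distinct(mtcars, cyl)

# Boxplot with a title added by ggtitle()
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  ggtitle("MPG by number of cylinders")

# Interquartile range: 75th percentile minus 25th percentile
mpg_iqr <- IQR(mtcars$mpg)

print(distinct_cyl)
print(mpg_iqr)
```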