Exam 3
Terms in this set (36)
What is Automated Machine Learning (AutoML)?
- Process of automating the time-consuming, iterative tasks of machine learning model development
- Two main types:
1) supervised model (see separate card)
2) unsupervised model (see separate card)
Which companies are actively using AutoML?
Facebook:
Determines which ads to display
AirBnB:
Predicts customer lifetime value (CLV) for hosts and guests
What are key steps in the Automated Machine Learning Process?
Data Preparation:
May include handling: missing data, outliers, variable selection, data transformation, data standardization in order to maintain a common format
Model Building:
- Many models are built automatically after the analyst specifies dependent variable
- Purpose of a model is to extract insights from data
- AutoML uses pre-existing modeling approaches to make data science accessible to beginners through professionals
Creating Ensemble Models:
- Continuous target variables -> one method is to take the average of predictions from multiple models
- Categorical target variables -> the most common category ("majority" rule) can be used
- More advanced technique -> involves using a weighted average (data with higher quality would be given more weight because of its importance)
Advanced Ensemble Models:
- Bagging (see separate card)
- Boosting (see separate card)
Model Recommendation:
- Multiple predictive models are examined and the model with the most accurate predictions is recommended
- Accuracy is determined by how well a model identifies relationships and patterns in a dataset and uses this knowledge to predict outcomes
- Higher levels of accuracy are measured based on better predictions of observations
- Most accurate prediction model(s) is then used to make better decisions
Automated Machine Learning (AutoML) extra facts:
- Forty percent of companies report already using machine learning to improve sales/marketing performance
- The adoption rate for AutoML is expected to increase substantially
Bagging
- short for "Bootstrap Aggregating", involves two main steps
- Step 1: draw multiple random small samples from the original data (bootstrap sampling)
- Step 2: execute a model on each sample and then combine the results
- For continuous outcomes, the combined result is the average of the predictions from the models run on all samples
- For categorical variables, majority voting technique is used
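The two bagging steps can be sketched in a few lines of Python. This is a minimal illustration, not a production method: the "model" is just the sample mean, the spend values are made up, and each bootstrap sample is assumed to be drawn with replacement at the same size as the original data (a common convention that the card does not specify).

```python
import random
from statistics import mean

def bootstrap_sample(data, rng):
    """Step 1: draw a random sample with replacement (same size as data)."""
    return [rng.choice(data) for _ in data]

def bagged_mean_prediction(data, n_models=25, seed=0):
    """Step 2: run a trivial 'model' (the sample mean) on each bootstrap
    sample, then combine the results by averaging the predictions."""
    rng = random.Random(seed)
    predictions = [mean(bootstrap_sample(data, rng)) for _ in range(n_models)]
    return mean(predictions)

# Hypothetical customer spend values, used only for illustration
observed_spend = [20.0, 22.0, 19.0, 30.0, 25.0, 21.0]
print(bagged_mean_prediction(observed_spend))
```

For a categorical outcome, the final averaging step would be replaced by a majority vote across the models.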
Boosting
- Reducing misclassification in the model
- Identifies the data points that were not sufficiently captured by the previous model, then oversamples those data points in the next model
- During the first step, the model is applied to a sample of the original data
- A new sample is drawn that is more likely to select data points that were misclassified in the first model
- Next, second model is applied to the new sample
- Steps are repeated multiple times until the best predictive model is generated
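The resampling loop described above might look like the following sketch. Several pieces are assumptions made for illustration: the model is a one-rule "stump", the data are made up, and the weight of each misclassified point is simply doubled (real boosting algorithms such as AdaBoost derive the factor from the model's error rate).

```python
import random

def stump_predict(x, threshold):
    """One-rule model: predict 1 when x exceeds the threshold, else 0."""
    return 1 if x > threshold else 0

def fit_stump(xs, ys):
    """Choose the threshold (among observed x values) with the fewest errors."""
    def errors(threshold):
        return sum(stump_predict(x, threshold) != y for x, y in zip(xs, ys))
    return min(sorted(set(xs)), key=errors)

def boosting_round(xs, ys, weights, rng):
    """Draw a weighted sample, fit a model on it, and raise the sampling
    weight of every point that the model misclassifies."""
    n = len(xs)
    drawn = rng.choices(range(n), weights=weights, k=n)
    threshold = fit_stump([xs[i] for i in drawn], [ys[i] for i in drawn])
    # Doubling is an illustrative choice, not a standard update rule
    raised = [w * 2.0 if stump_predict(x, threshold) != y else w
              for x, y, w in zip(xs, ys, weights)]
    total = sum(raised)
    return threshold, [w / total for w in raised]

# Made-up data that no single threshold can classify perfectly
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]
weights = [1 / len(xs)] * len(xs)
rng = random.Random(0)
for round_number in range(3):
    threshold, weights = boosting_round(xs, ys, weights, rng)
    print(round_number, threshold, [round(w, 2) for w in weights])
```

After each round, the misclassified points carry more weight, so the next sample is more likely to include them.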
Ensemble model
- Combines different algorithms, blending information from more than one model into a single "super model"
- An ensemble model usually generates the best predictive results, but explaining how different variables have contributed to an outcome can be difficult
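As a rough illustration of how predictions from several models can be blended, the sketch below shows a simple average, a weighted average, and a majority vote. The function names, forecasts, and weights are hypothetical.

```python
from collections import Counter

def average_prediction(predictions):
    """Simple average of continuous predictions from several models."""
    return sum(predictions) / len(predictions)

def weighted_average_prediction(predictions, weights):
    """Weighted average: predictions with larger weights count more."""
    total_weight = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total_weight

def majority_vote(labels):
    """Most common predicted category across models."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical predictions from three models for one customer
sales_forecasts = [100.0, 110.0, 120.0]
print(average_prediction(sales_forecasts))                      # 110.0
print(weighted_average_prediction(sales_forecasts, [1, 1, 2]))  # 112.5
print(majority_vote(["churn", "stay", "churn"]))                # churn
```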
Supervised model
- The target (dependent) variable of interest is known and available in a dataset
- Uses labeled datasets to train algorithms and predict outcomes
Unsupervised model
- Has no target (dependent) variable
- Works on its own to find inherent patterns in unlabeled data
What is cluster analysis?
- Refers to segmenting a market using shared characteristics
- Marketers use these insights to: improve marketing strategies, better allocate resources, gauge new product development, and select the most receptive test markets
How is cluster analysis used in practice?
- American Express segmented their market into loyal customers, tailored products for them, and increased their market share
- Auto companies use segmentation to target specific consumers (Mercedes-Benz's CLA model)
- Best Buy segmented customers into five segments and focused on loyal customers to gain a 10 percent growth in sales
How does a cluster analysis function?
- Goal is to model the underlying distribution of characteristics in data to separate a dataset into homogeneous subgroups
- Similarities between observations are calculated using several measures of the distance between observations, aiming for similarity (homogeneity) within clusters (intra-cluster) and dissimilar characteristics (heterogeneity) between clusters (inter-cluster)
What are the types of cluster analysis?
K-Means clustering:
- Uses the centroid (average data point) in the cluster and minimizes the distance to individual observations
- The number of clusters (k) is initially specified by the analyst
- To obtain the most accurate results, begin with data that has been standardized using z-scores or min-max
- K-means clustering can only be applied to numerical data
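A minimal K-means sketch, assuming z-score standardization and made-up customer data (age, annual spend). The helper names are hypothetical, and the initial centroids are picked at random from the data, which is one common choice among several.

```python
import random
from statistics import mean, pstdev

def z_scores(column):
    """Standardize one numeric column to mean 0 and standard deviation 1."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Assign each point to the index of its nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points]

def k_means(points, k, n_iter=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k is specified by the analyst
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(mean(dim) for dim in zip(*members))
    return assign(points, centroids), centroids

# Made-up customers: (age, annual spend)
customers = [(25, 200), (27, 220), (24, 210), (55, 900), (58, 950), (60, 880)]
columns = [z_scores(col) for col in zip(*customers)]
standardized = list(zip(*columns))
labels, centroids = k_means(standardized, k=2)
print(labels)
```

Standardizing first keeps the spend column, which has much larger raw values, from dominating the distance calculation.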
Silhouette score:
- Silhouette score is calculated after the cluster algorithm has assigned each observation to a cluster
- Analyst determines the average distance between each observation in the cluster and the cluster centroid
- Average distance of an individual observation to the centroid of its assigned cluster is then compared to the average distance of that observation to the centroid of the next nearest cluster
- Silhouette score ranges from -1 to +1; values closer to +1 indicate that an observation fits its assigned cluster well
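The sketch below computes a silhouette value for one observation. Note the assumption: it uses the commonly cited definition, which compares the average distance to the other members of the observation's own cluster (a) with the average distance to the members of the nearest other cluster (b), rather than distances to centroids. The points and labels are made up.

```python
import math
from statistics import mean

def silhouette(index, points, labels):
    """Silhouette value (b - a) / max(a, b) for the observation at `index`."""
    own_label = labels[index]
    p = points[index]
    same = [math.dist(p, q)
            for i, (q, lab) in enumerate(zip(points, labels))
            if lab == own_label and i != index]
    if not same:  # a single-member cluster is conventionally scored 0
        return 0.0
    a = mean(same)
    b = min(
        mean(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
        for other in set(labels) if other != own_label
    )
    return (b - a) / max(a, b)

# Two tight, well-separated clusters (illustrative data)
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
labels = [0, 0, 1, 1]
scores = [silhouette(i, points, labels) for i in range(len(points))]
print(scores)
```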
Hierarchical clustering:
Two types: agglomerative clustering and divisive clustering. Clusters are compared using a linkage criterion, such as Ward's method (see separate cards)
Agglomerative clustering
- Each observation is initially considered to be a separate cluster
- Linkage method assigns each observation to a cluster that has common characteristics
Average linkage (linkage criterion)
- the group average of observations from one cluster to all observations from another cluster
Complete linkage (linkage criterion)
- the maximum distance between observations in two different clusters
Divisive clustering
- All records are initially assigned to a single cluster
- In a step-by-step process, the most dissimilar observations are separated
Ward's method:
merges two clusters as complete linkage does, but instead of looking at the diameter of the resulting cluster, it calculates the aggregate deviation of the resulting cluster
Euclidean method
- measures the true straight-line distance between two points
Hierarchical clustering
Two types:
1) Agglomerative clustering
2) Divisive clustering
Jaccard's approach
- measures how dissimilar two observations are
Manhattan method
- the distance between two points is not a straight line, but a path made of right-angle turns (the sum of the absolute differences along each dimension)
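The distance measures on these cards can be compared with a short sketch. The Jaccard function assumes each observation is represented as a set of items; the example points and baskets are made up.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Distance along axis-aligned steps: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def jaccard_distance(a, b):
    """Dissimilarity of two sets: 1 - |intersection| / |union|."""
    a, b = set(a), set(b)
    return 1 - len(a & b) / len(a | b)

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
print(jaccard_distance({"milk", "bread"}, {"milk", "beer"}))
```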
Matching method
- measures the values that reflect the minimum differences between two points
Single linkage (linkage criterion)
- the shortest distance from an object in a cluster to an object from another cluster
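Single, complete, and average linkage differ only in how they summarize the pairwise distances between two clusters. A small sketch, assuming Euclidean distance and two made-up clusters:

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """All distances between a point in cluster_a and a point in cluster_b."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    """Shortest distance between the two clusters."""
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    """Maximum distance between the two clusters."""
    return max(pairwise_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    """Average of all pairwise distances between the two clusters."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

cluster_a = [(0, 0), (0, 1)]
cluster_b = [(3, 0), (4, 0)]
print(single_linkage(cluster_a, cluster_b))    # 3.0
print(complete_linkage(cluster_a, cluster_b))
print(average_linkage(cluster_a, cluster_b))
```

Single linkage is never larger than average linkage, which in turn is never larger than complete linkage.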
What is market basket analysis?
- uses purchase data to identify associations between products or combinations of products/services that frequently occur together
- Ex: if a customer purchases milk and soda, they will probably purchase bread, beer, and salty snacks
How does a market basket analysis identify product relationships?
- Association rule: IF {item A} THEN {item B}
- "If soda (antecedent) is purchased, then milk (consequent) will also likely be purchased."
- "If wine is purchased, then cheese will also be purchased."
- Apriori algorithm (see separate card)
Apriori algorithm
- identifies combinations of items in datasets that are associated
- associations are identified based on the frequency with which the products occur together in the basket
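A simplified sketch of the level-wise idea behind Apriori: count how often item combinations occur, keep the frequent ones, and build larger candidates only from smaller frequent sets. The baskets and the `min_support` threshold are made up, and this version is written for readability rather than efficiency.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    items = sorted({item for basket in transactions for item in basket})
    current = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while current:
        # Share of baskets that contain each candidate itemset
        supports = {c: sum(c <= basket for basket in transactions) / n
                    for c in current}
        kept = {c: s for c, s in supports.items() if s >= min_support}
        frequent.update(kept)
        size += 1
        # Build larger candidates from pairs of frequent itemsets
        candidates = {a | b for a, b in combinations(kept, 2)
                      if len(a | b) == size}
        # Pruning: every smaller subset must itself be frequent
        current = [c for c in candidates
                   if all(frozenset(s) in kept
                          for s in combinations(c, size - 1))]
    return frequent

baskets = [
    {"soda", "milk", "bread"},
    {"soda", "milk"},
    {"soda", "chips"},
    {"milk", "bread"},
    {"bread"},
]
for itemset, share in frequent_itemsets(baskets, min_support=0.4).items():
    print(sorted(itemset), share)
```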
Association rule
- helps define relationships in a transaction using if-then statements
- IF {item A} THEN {item B}
Collaborative filtering (special topic in market basket analysis)
- examines users' preferences for a set of items
- the goal of collaborative filtering is to recommend new products and services to a customer who has not purchased them before
- Two main types of collaborative filtering:
1) Item to item filtering (item-based): "Customers who liked this item also liked..."
2) User to item filtering (user-based): "Customers who are similar to you also liked..."
- While collaborative filtering is very useful, there are limitations, such as: cold start and popularity bias
- To measure similarities, a similarity measure needs to be selected, such as the Pearson correlation (a value close to 1 means many similarities; a value close to -1 means not many similarities) or cosine similarity
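Both similarity measures can be computed directly from rating vectors. The sketch below assumes three hypothetical users who each rated the same four items on a 1 to 5 scale; handling of missing ratings is left out.

```python
import math
from statistics import mean

def cosine_similarity(u, v):
    """Cosine of the angle between two rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(u, v):
    """Pearson correlation between two rating vectors."""
    mean_u, mean_v = mean(u), mean(v)
    covariance = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread = math.sqrt(sum((a - mean_u) ** 2 for a in u)
                       * sum((b - mean_v) ** 2 for b in v))
    return covariance / spread

# Hypothetical ratings of the same four items
alice = [5, 4, 1, 1]
bob = [4, 5, 2, 1]
carol = [1, 2, 5, 4]

print(pearson(alice, bob), pearson(alice, carol))
print(cosine_similarity(alice, bob), cosine_similarity(alice, carol))
```

In a user-based recommender, items liked by the most similar user (here, bob for alice) would be candidates to recommend.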
Confidence
- measures probability of the consequent actually occurring given that the antecedent occurs
Cold start
- a limitation of collaborative filtering
- a new item cannot be recommended until enough users have rated it
Differential market basket analysis (special topic in market basket analysis)
- uses market basket analysis techniques across stores, locations, seasons, days of the week, etc.
- An analyst would run a market basket analysis in one location, then compare it to other locations to see if the results differ
Lift
- enables us to evaluate the strength of the association
Popularity bias
- a limitation of collaborative filtering
- items w/ a lot of recommendations will be assigned a higher weight (popular items are recommended more often than items that do not have a lot of recommendations)
Support
- measures the frequency of the specific association rule
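Support, confidence, and lift for a rule IF {antecedent} THEN {consequent} can be computed from raw transactions as in this sketch. The baskets are made up, and the formulas follow the standard definitions: confidence is the support of both item sets together divided by the support of the antecedent, and lift is confidence divided by the support of the consequent.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Probability of the consequent given that the antecedent occurs."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Strength of the association; above 1 means the items occur
    together more often than expected if they were independent."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))

baskets = [
    {"soda", "milk", "bread"},
    {"soda", "milk"},
    {"soda", "chips"},
    {"milk", "bread"},
    {"bread"},
]
print(support(baskets, {"soda", "milk"}))       # 0.4
print(confidence(baskets, {"soda"}, {"milk"}))
print(lift(baskets, {"soda"}, {"milk"}))
```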
What questions might arise in AutoML?
- How did the model arrive at a particular conclusion?
- How was the data collected and prepared for analysis?
- Why did the model arrive at a particular conclusion?
- What variables had the greatest impact on the predicted outcome?
- What patterns exist in the data?
- Are there data issues that could be impacting the validity of the model?
- Is the model consistent in its predictions?
- Why is the model a good predictor?
- How accurate is the model?
How is market basket analysis used in practice?
- Amazon: "frequently bought together" recommendations
- Traditional supermarkets: the top five products are located farther apart, and complementary products like beer and snacks/nuts are placed together