Q: The dimension/direction of largest variation in the data can be captured with the ____________.
A: first principal component

Q: PCA can be used as a pre-processing step before implementing prediction or clustering algorithms. (T/F)
A: True

Q: PCA is an example of unsupervised learning. (T/F)
A: True

Q: The cumulative proportion of variance explained increases as more principal components are considered. (T/F)
A: True

Q: You want to apply PCA to a crypto dataset to generate (1) ________________ new feature variables with (2) ________________ dimension.
A: (1) uncorrelated, (2) reduced

Q: Market basket analysis is an __________ method that identifies latent patterns in transactional data. It is an unsupervised machine learning technique used for knowledge discovery. This analysis results in a set of _____________ that identify patterns of relationships among items.
A: association rules

Q: K-means is one of the popular __________________.
A: clustering algorithms

Q: We can determine an appropriate number, k, of clusters with a technique commonly known as the _________________.
A: elbow method

Q: You can decide the epsilon ("eps") of DBSCAN in a heuristic approach by using ___________________.
A: the k-nearest neighbor (kNN) distance and the kNN distance plot

Q: For anomaly detection (e.g., outlier detection, risk prediction), can we use clustering methods? (T/F)
A: True

Q: For anomaly detection (e.g., outlier detection, risk prediction), can we also use classification methods (i.e., prediction of discrete classes)? (T/F)
A: True

Q: Machine learning algorithms that map input variables (i.e., features, independent variables) to an output variable (i.e., class, label, dependent variable) are ______________________.
A: supervised learning

Q: (1) Classification and (2) regression are supervised learning algorithms. In particular, when we try to predict or classify discrete values such as Spam or Not-Spam, we need to use __________________.
A: classification

Q: In order to apply classification for anomaly detection, we need to
(a) (training step) teach our machine learning models (including AI) with the (1) ______________ (in-sample data),
(b) (prediction step) and apply the trained models to predict the (2) ________________ (out-of-sample data).
A: (1) total training dataset, (2) test dataset

Q: (Step 1) First, we train our models with the train dataset.
(Step 2) Second, we predict outputs on the ____________________ using the models trained with the train dataset.
(Step 3) If the classification results in Step 2 are good enough, we train the model with the optimal hyperparameters on the total train dataset to predict the labels of the test dataset.
A: validation dataset

Q: The popular evaluation criteria for classification algorithms are (1) accuracy, (2) precision, (3) recall, and (4) F1-score.
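These four criteria can all be computed from confusion-matrix counts. A minimal plain-Python sketch (the label vectors below are illustrative, with positives deliberately rare):

```python
# Illustrative labels: 1 = positive (rare), 0 = negative.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 1]

# Confusion-matrix counts.
tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))  # true positives
tn = sum(t == p == 0 for t, p in zip(y_true, y_pred))  # true negatives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy  = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)
```

Here accuracy (0.8) looks noticeably better than precision, recall, and F1 (all about 0.67), which is exactly why accuracy alone can be misleading on imbalanced data.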
In general, if your dataset is imbalanced, you might need to use the F1-score to measure the performance of your machine learning models. In particular, when the classes of your dataset are not evenly distributed, we call this type of dataset an _____________________.
A: imbalanced dataset

Q: Density-based spatial clustering of applications with noise (DBSCAN) is a clustering method. Two important hyperparameters for DBSCAN are _____________.
A: epsilon ("eps") and minimum points ("MinPts")

Q: Given a matrix X, the expression USV* (where * denotes the transpose) is the Singular Value Decomposition (SVD) of X.
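The SVD is also how PCA is computed in practice: center (or standardize) the data, take the SVD, and the right singular vectors give the principal directions. A minimal NumPy sketch with synthetic data (the matrix below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 100x3 data with correlated columns (illustrative).
X = rng.normal(size=(100, 3)) @ np.array([[2.0, 0.3, 0.0],
                                          [0.0, 1.0, 0.2],
                                          [0.0, 0.0, 0.5]])

Xc = X - X.mean(axis=0)              # center first (standardize in practice)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = Xc @ Vt.T                   # principal-component scores (PC1, PC2, ...)
var_explained = s**2 / np.sum(s**2)  # proportion of variance per component
cum_var = np.cumsum(var_explained)   # cumulative proportion: non-decreasing
```

The PC scores come out mutually uncorrelated, and `cum_var` only grows as more components are included, matching the T/F cards above.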
The singular values are the diagonal entries of the _____________ matrix.
A: S

Q: You want to apply PCA to a crypto dataset to generate (1) ________________ new feature variables with (2) ________________ dimension. In this case, we wish the new variables to contain (3) _________ information related to the original high-dimensional dataset.
A: (1) uncorrelated, (2) reduced, (3) more

Q: When using PCA, we create new variables that are ___________________ of the original variables.
A: linear combinations

Q: PCA is for ____________ variables; therefore, we should apply _____________ as a preprocessing step to reduce scale and/or unit effects.
A: (1) continuous, (2) normalization/standardization

Q: PCA can be used as a pre-processing step for supervised learning, unsupervised learning, and regression analysis. (T/F)
A: True

Q: During data pre-processing, data values can be scaled into the range [0, 1] or [-1, 1] in a process called _________________.
A: normalization

Q: Can we use PCA on embeddings from text and image data for dimension reduction and/or visualization? (T/F)
A: True

Q: PC1 and PC2 are perpendicular (orthogonal), and they are ________________.
A: uncorrelated

Q: In natural language processing (NLP), the smallest unit (e.g., a word) that a corpus is made up of is _____________.
A: a token

Q: In natural language processing (NLP), ______________ is a collection of text data (e.g., a set of books, reviews, and news) used for the NLP task.
A: a corpus

Q: In natural language processing (NLP), _________________ is the set of unique words used in the text corpus.
A: a vocabulary

Q: In natural language processing (NLP), __________________ is a set of co-occurring N words within a given window.
A: an n-gram

Q: In natural language processing (NLP), ________________ is a way of breaking raw text into smaller units called tokens (e.g., separating a sentence into words).
A: tokenization

Q: In natural language processing (NLP), _____________ is a way of reducing multiple variants of a word to a common core.
For example: reducing "traveling" and "traveled" to "travel".
A: stemming

Q: TF-IDF is ____________ when a rare term is present or frequent in a document.
A: high

Q: TF-IDF is ____________ when a term is absent from a document, or abundant across all documents.
A: near 0

Q: ___________________ is a popular topic modeling method to extract latent topics from a given text corpus.
A: Latent Dirichlet Allocation (LDA)

Q: The key hyperparameter of LDA topic modeling is _________________.
A: the number of topics

Q: LDA topic modeling considers that:
(a) every document is a mixture of (1) _________, and
(b) every topic is a mixture of (2) ___________.
A: (1) topics, (2) words

Q: We can use LDA topic modeling to extract ____________________, called beta. The beta value for each word indicates the word's probability for a given topic.
A: per-topic-per-word probabilities

Q: We can use LDA topic modeling to extract ____________________, called gamma. The gamma value for each topic indicates the topic's probability for a given document.
A: per-document-per-topic probabilities
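In practice LDA is fit with a library (e.g., scikit-learn or gensim), but beta and gamma fall directly out of the count tables of a collapsed Gibbs sampler. A toy, pure-Python sketch (the corpus, priors, and iteration count are all illustrative, not a production implementation):

```python
import random
from collections import defaultdict

random.seed(0)

# Tiny illustrative corpus: two crypto docs, two movie docs.
docs = [
    "btc eth coin market price".split(),
    "price market trade coin volume".split(),
    "movie film actor scene plot".split(),
    "film actor award scene movie".split(),
]
K = 2                      # number of topics: the key hyperparameter
alpha, eta = 0.1, 0.01     # Dirichlet priors for doc-topic and topic-word
vocab = sorted({w for d in docs for w in d})
V = len(vocab)

# Count tables and random initial topic assignments.
ndk = [[0] * K for _ in docs]                # doc-topic counts
nkw = [defaultdict(int) for _ in range(K)]   # topic-word counts
nk = [0] * K                                 # tokens per topic
z = []                                       # topic of each token
for d, doc in enumerate(docs):
    zd = []
    for w in doc:
        t = random.randrange(K)
        zd.append(t); ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    z.append(zd)

# Collapsed Gibbs sweeps: resample each token's topic given all others.
for _ in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
            weights = [(ndk[d][k] + alpha) * (nkw[k][w] + eta) / (nk[k] + V * eta)
                       for k in range(K)]
            t = random.choices(range(K), weights=weights)[0]
            z[d][i] = t
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1

# beta: per-topic-per-word probabilities; gamma: per-document-per-topic probabilities.
beta = [[(nkw[k][w] + eta) / (nk[k] + V * eta) for w in vocab] for k in range(K)]
gamma = [[(ndk[d][k] + alpha) / (len(docs[d]) + K * alpha) for k in range(K)]
         for d in range(len(docs))]
```

Each row of `beta` is a probability distribution over the vocabulary for one topic, and each row of `gamma` is a distribution over the K topics for one document, matching the two cards above.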