ITM 320 DATA SCIENCE FINAL REVIEW
Terms in this set (85)
Interval data are variables that can be measured on interval scales.
Data source reliability means that data are correct and are a good match for the analytics problem.
A node where data is stored and processed.
In a Hadoop "stack," what is a slave node?
In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's feelings.
Data is the contextualization of information, that is, information set in context.
Text analytics is the subset of text mining that handles information retrieval and extraction, plus data mining.
Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications?
Market basket analysis is a useful and entertaining way to explain data mining to a technologically less savvy audience, but it has little business significance.
If using a mining analogy, "knowledge mining" would be a more appropriate term than "data mining."
Structured data is what data mining algorithms use and can be classified as categorical or numeric.
Why is it happening?
Which type of question does visual analytics seeks to answer?
Categorizing a block of text in a sentence
In text mining, tokenizing is the process of
What type of analytics seeks to recognize what is going on as well as the likely forecast and make decisions to achieve the best performance possible?
In the FEMA case study, the BureauNet software was the primary reason behind the increased speed and relevance of the reports FEMA employees received.
In the evolution of social media user engagement, the largest recent change is the growth of creators.
Use an English lexicon appropriate to the project at your discretion.
Sentiment analysis projects require a lexicon for use. If a project in English is undertaken, you must generally make sure to
In the Opening Vignette (ch2) on Sports Analytics, what type of modeling was used to predict offensive tactics?
Which characteristic of data means that all the required data elements are included in the data set?
Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales.
The term "Big Data" is relative as it depends on the size of the using organization.
Visual analytics is aimed at answering, "What is it happening?" and is usually associated with business analytics.
Data mining requires specialized data analysts to ask ad hoc questions and obtain answers quickly from the system.
A core engine that could operate seamlessly in another domain without changes.
In the opening vignette(ch7), the architectural system that supported Watson used all the following elements EXCEPT
The cost of data storage has plummeted recently, making data mining feasible for more firms.
In the opening case, police detectives used data mining to identify possible new areas of inquiry.
Analyzing the vast data amounts routinely collected.
Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from
The process aspect means that data mining should be a one-step process to results.
All of the following statements about data mining are true EXCEPT
Predictive algorithms generally require a flat file with a target variable, so making data analytics ready for prediction means that data sets must be transformed into a flat-file format and made ready for ingestion into those predictive algorithms.
Big Data is being driven by the exponential growth, availability, and use of information.
Visualization differs from traditional charts and graphs in complexity of data sets and use of multiple dimensions and measures.
Data is the main ingredient for any BI, data science, and business analytics initiative.
One of SiriusXM's challenges was tracking potential customers when cars were sold.
Statistics and data mining both look for data sets that are as large as possible.
Hadoop was designed to handle petabytes and exabytes of data distributed over multiple nodes in parallel.
The ideas behind it are relatively new.
All of the following statements about data mining are true EXCEPT:
The entire focus of the predictive analytics system in the Infinity P&C case was on detecting and handling fraudulent claims for the company's benefit.
Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes?
Prediction problems where the variables have numeric values are most accurately defined as
Ratio data is a type of categorical data.
MapReduce runs without fault tolerance.
All of the following statements about MapReduce are true EXCEPT
Which of the following sources is likely to produce Big Data the fastest?
Nominal data represent the labels of multiple classes used to divide a variable into specific groups.
To respond to its market challenges, SiriusXM decided to focus on manufacturing efficiency.
When a problem has many attributes that impact the classification of different patterns, decision trees may be a useful approach.
Which data mining process/methodology is thought to be the most comprehensive, according to kdnuggets.com rankings?
Descriptive statistics is all about describing the sample data on hand.
Open-source data mining tools include applications such as IBM SPSS Modeler and Dell Statistica.
Managing data warehouses requires special methods, including parallel computing and/or Hadoop/Spark.
The use of dashboards and data visualizations is seldom effective in identifying issues in organizations, as demonstrated by the Silvaris Corporation Case Study.
MapReduce can be easily understood by skilled programmers due to its procedural nature.
Converting continuous valued numerical variables to ranges and categories is referred to as discretization.
Which operating system is running the dashboard server software.
Contextual metadata for a dashboard includes all the following EXCEPT
Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features?
Dashboards provide visual displays of important information that is consolidated and arranged across several screens to maintain data order.
An increasingly challenging task for today's enterprises.
Using data to understand customers/clients and business operations to sustain and foster growth and profitability is
Descriptive analytics for social media feature such items as your followers as well as the content in online conversations that help you to identify themes and sentiments.
Data that is collected, stored, and analyzed in data mining is often private and personal. There is no way to maintain individuals' privacy other than being very careful about physical data security.
the Visual Cube Level.
Dashboards can be presented at all the following levels EXCEPT
Data mining requires a separate, dedicated database.
Which of the following is a data mining myth?
Which of the following is an umbrella term that combines architectures, tools, databases, analytical tools, applications, and methodologies?
word sense disambiguation
When a word has more than one meaning, selecting the meaning that makes the most sense can only be accomplished by taking into account the context within which the word is used. This concept is known as ________.
Dividing up a text into individual words in English.
All of the following are challenges associated with natural language processing EXCEPT
Categorization and clustering of documents during text mining differ only in the preselection of categories.
Current use of sentiment analysis in voice of the customer applications allows companies to change their products or services in real time in response to customer sentiment.
The processing power needed for the centralized model would overload a single computer.
Big Data often involves a form of distributed storage and processing using Hadoop and MapReduce. One reason for this is
Clustering partitions a collection of things into segments whose members share
Regional accents present challenges for natural language processing.
BOTH A & B
In the financial services industry, Big Data can be used to improve
Big Data simplifies data governance issues, especially for global firms.
Pure Big Data systems do not involve fault tolerance.
Which of the following statements about Big Data is true?
In data mining, classification models help in prediction.
In the cancer research case study, data mining algorithms that predict cancer survivability with high predictive power are good replacements for medical professionals.
During classification in data mining, a false positive is an occurrence classified as true by the algorithm while being false in reality.
Data accessibility means that the data are easily and readily obtainable.
Hadoop and MapReduce require each other to work.
What type of analytics seeks to determine what is likely to happen in the future?
The growth in hardware, software, and network capacities has had little impact on modern BI innovations.
In the Dallas Cowboys case study, the focus was on using data analytics to decide which players would play every week.
In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document.
Ensuring that the required information is shown clearly on a single screen
What is the fundamental challenge of dashboard design?
The use of statistics in baseball by the Oakland Athletics, as described in the Moneyball case study, is an example of the effectiveness of prescriptive analytics.
In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but easier to classify others, e.g., movie reviews, in the same way.
Using data mining on data about imports and exports can help to detect tax avoidance and money laundering.
Operational data that identify what actions to take to resolve a problem
What is the management feature of a dashboard?
In the Miami-Dade Police Department case study, predictive analytics helped to identify the best schedule for officers in order to pay the least overtime.