MGMT 3200 Exam 1

In/Out Tools
Allow the input and output of data and generally start and end a workflow.
Tools:
Input data - Connect and import data from a file or database
Text input - Manually add a bit of data
Directory - input names of files, etc. from a directory
Date time now - Insert current date and time
Browse - Look at the data

Double
Data Type Number: A double-precision floating-point number is a 64-bit approximation of a real number. 8 bytes.
Range: -1.79769313486232E308 to -4.94065645841247E-324 for negative values; 4.94065645841247E-324 to 1.79769313486232E308 for positive values.
+/- 1.7E +/- 308 (15 digits), where 308 is the exponent and 15 digits refers to fifteen digits of accuracy.

String
Data Type Character: Fixed-length string. The length must be at least as large as the largest character value contained in the field. Limited to 8192 characters.
Any string whose length does not vary much from value to value.

V_String
Data Type Character: Variable length. Length of field will adjust to accommodate the entire string within the field.
Use if the string is greater than 16 characters and varies in length from value to value.

WString
Data Type Character: Wide string; will accept Unicode characters. Limited to 8192 characters.
Any string whose length does not vary much from value to value.
Any string that contains Unicode characters, e.g. Æ, ç, ß, Ð, Ñ.

V_WString
Data Type Character: Variable-length wide string.
Use if the string is greater than 16 characters and varies in length from value to value.
If the string contains Unicode and is longer than 16 characters, use V_WString, such as for a "Notes" or "Address" field.

Date
Data Type Character: A 10-character string in "yyyy-mm-dd" format

Time
Data Type Character: An 8-character string in "hh:mm:ss" format

DateTime
Data Type Character: A 19-character string in "yyyy-mm-dd hh:mm:ss" format

Blob
Data Type Blob: Binary Large Object: A large block of data stored in a database. A BLOB has no structure that can be interpreted by the database management system; it is known only by its size and location.
Ex/ an image or sound file

SpatialObj
Data Type Blob: The spatial object associated with a data record. There can be multiple spatial object fields contained within a table.
A spatial object can consist of a point, line, polyline, or polygon.

Tool: Imputation
Update specific values in a numeric data field with another selected value. Useful for replacing NULL values.
Problem: data contains values that are missing/null

Categorical (Nominal) Data
No meaningful order
Ex/ occupation, marital status, customer id
Binary data: 2 categorical states
Symmetric - equally valuable
Asymmetric - one state more valuable - ex/ HIV status

Ordinal Data
Meaningful order, but distance is unequal
Ex/ Likert scales, army rank, grades

Numerical Data
Measurable - Integer or Real

4 V's of Big Data
Volume
Velocity
Variety
Veracity

Volume
Scale of data

Velocity
Analysis of streaming data

Variety
Different types/forms of data
Most common: text

Veracity
Uncertainty of data
Is this data actually measuring what we think it is?

Tool: Record ID
Assigns a unique numeric value to each row of data
Problem: you have some data w/o unique id numbers

Tool: Formula
Create new data fields based on an expression or by assigning a data relationship, or update an existing field based on a new formula. Can also reorder fields and change field types.
Problem: need to do some stuff with data

Conditional Patterns
IF c THEN t ELSE f ENDIF
IF c THEN t ELSEIF c2 THEN t2 ELSE f ENDIF

Tool: Multi-Field Formula
Applies the same expression to all selected columns
_CurrentField_ represents every relevant column
Problem: have some data with multiple columns of the same datatype you'd like to do the same "thing" to

Tool: Tile
Group records by number of records or totals, etc.
Problem: want to split the data into "affinity" groups

Tool: Text to Columns
Able to split a field by delimiter into a set of columns, which can then be renamed using a Select node
Problem: date field is messy and needs to be separated

RegEx
Language for specifying patterns of strings, which are then used to find, replace, or extract all the strings that match the patterns in a given data set

Replace
RegEx: Replaces a string that matches the specified regex with another string specified in the replace option

Tokenize
RegEx: Separates all the substrings that match the specified regex and places each in a separate column. E.g., if the string is 020316, the regex \d\d matches any two consecutive digits, so the string is divided into three separate substrings, 02 03 16, each of which is put into a new column.

Parse
RegEx: Extracts any group specified in the regex from the given string and puts it in a separate field. E.g., the regex (\d*)\s(\w*) would extract from 313 Pine St. the street number 313 and the word Pine and put each into a new field with an associated data type. Because \w* stops at the space before St., capturing the full street name Pine St. needs a wider second group such as (.*).

RegEx Functions in Formula Tool
REGEX_MATCH(string, pattern, icase)
ex/ REGEX_MATCH("123-45-6789", "\d{3}-\d{2}-\d{4}")
Returns: -1 (True)
REGEX_REPLACE(string, pattern, replace, icase)
REGEX_COUNTMATCHES(string, pattern, icase)

Tool: Filter
Allows you to filter data by choosing specific criteria to apply to the whole data set. The tool then outputs the set of data that meets the criteria. The Browse node shows the data from the true (T) output of the Filter tool.
Problem: full data set contains data that you want to be identified separately from the whole data set. Ex/ filter out null and incomplete data

Tool: Sort
Allows the sorting of data in ascending or descending order. If multiple fields are selected, it sorts based on the first field, then by the second, etc.
Problem: data in the wrong order

Tool: Sample
Limit data to a certain number, %, or random set of records out of the whole data set

Tool: Random % Sample
Generate a random number or % of records passing through the data stream. If you want to base your analysis on 35% of the data, it randomly returns that share of the records.

Tool: Unique
Separate data into 2 streams (duplicate and unique records) based on the fields of the user's choosing
Problem: duplicate data; use to find only unique locations

Inner Join
Both sides have to match; only those records are returned

Left Unjoin
Shows records (and their attributes) that only exist in the left table

Right Unjoin
Shows records (and their attributes) that only exist in the right table

Outer Joins in Alteryx
Left Outer Join
Right Outer Join
Full Outer Join

Left Outer Join
All records from the left table and the records in the right table that join with them
Inner join + left unjoin

Right Outer Join
All records from the right table and the records in the left table that join with them
Inner join + right unjoin

Full Outer Join
Everything from both tables
Inner join + left unjoin + right unjoin

Machine Learning Life Cycle
1. Define Project Objectives
2. Acquire and Explore Data
3. Model Data
4. Interpret and Communicate
5. Implement, Document, and Maintain

1. Define Project Objectives
1. Specify business problem
2. Acquire subject matter expertise
3. Define unit of analysis and prediction target
4. Prioritize modeling criteria
5. Consider risks and success criteria
6. Decide whether to continue

1. Define Project Objectives --> 1. Specify Business Problem --> Criteria
-Stated in language of business (rather than language of modeling)
-Specifies action that might result from this modeling project
-Includes specifics such as number of customers affected, costs caused by this problem, etc.
-Helps reader by explaining how project will improve the bottom line

1. Define Project Objectives --> 2. Acquire subject matter expertise
Why?
-Early indication of obstacles and/or opportunities
-Suggests data collection and modeling ideas
-Provides knowledge useful in data processing steps
-Helps set expectations for model performance
-Clarifies alternatives to building model
How?
-Talk to colleagues or subject matter experts
-Read

1. Define Project Objectives --> 3. Define Unit of Analysis and Prediction Target

Tool: Summarize
Allows you to create subgroups and extract statistical info on each group
Problem: data is not properly aggregated

Tool: CrossTab
Quite similar to Summarize, but moves results into columns rather than rows
Problem: aggregate data by column

Tool: Transpose
The reverse of CrossTab: moves data from columns into rows rather than from rows into columns
Problem: data spread across columns needs to be in rows

1. Define Project Objectives --> 4. Prioritize modeling criteria
-Predictive accuracy
-Familiarity with model
-Prediction speed
-Speed to build model
-Insights

1. Define Project Objectives --> 5. Success Criteria
Who uses the model?
How much value can the model drive?
What modeling criteria will help get you there?

1. Define Project Objectives --> 5. Risk
Play devil's advocate, be creative

1. Define Project Objectives --> 6. Decide whether to continue
Estimate resources required
Understand alternatives to creating model
Consider technical risks
Estimate model's business value
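
Worked example for the RegEx cards (Replace, Tokenize, Parse, RegEx Functions in Formula Tool)
A minimal sketch of how those patterns behave, using Python's re module as a stand-in for the Alteryx tools. The variable names are illustrative only, and treating REGEX_MATCH as a whole-string match is an assumption here, not something the cards state.

```python
import re

# Tokenize: every non-overlapping match of \d\d becomes its own value.
tokens = re.findall(r"\d\d", "020316")

# Parse: each parenthesised group becomes its own field.
# \w* stops at the first space, so the second group holds "Pine" only.
parsed = re.match(r"(\d*)\s(\w*)", "313 Pine St.").groups()

# Widening the second group to .* captures the rest of the string.
parsed_full = re.match(r"(\d*)\s(.*)", "313 Pine St.").groups()

# Replace: swap every match for another string.
replaced = re.sub(r"\d", "#", "313 Pine St.")

# REGEX_MATCH analogue (assumed whole-string match): True or False.
ssn_ok = re.fullmatch(r"\d{3}-\d{2}-\d{4}", "123-45-6789") is not None

# REGEX_COUNTMATCHES analogue: count the non-overlapping matches.
count = len(re.findall(r"\d\d", "020316"))

print(tokens, parsed, parsed_full, replaced, ssn_ok, count)
```

The second Parse pattern shows the adjustment needed to pull out the whole street name rather than its first word.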