Exam 1
Terms in this set (81)
Definitions of GIS
Collection of tools and systems that handle spatial data
Components of GIS
Hardware, Software, Humanware, Data
Spatial Data in GIS
Conceptualized as either Discrete or Continuous
Discrete Data
Road signs (points), Roads (lines), Parking lots (polygons)
Continuous Data
Elevation, Temperature
Vector Data Structure
Models data as discrete features
Raster Data Structure
Models data as continuous
Fundamental Questions asked by Geographers
What is where?
Why is it there?
Is it there?
How to get there?
What should go where?
Three Concepts of Spatial Analysis
1) Spatial Analysis uniquely considers the spatial aspect of data
2) Spatial Analysis turns data into something more meaningful, by considering the spatial aspects of the data
3) Spatial Analysis uniquely considers the effects of spatial scale in understanding phenomena
Value-added Products
Maps, graphs, equations, etc
Queries
The most basic spatial analysis
No changes occur to a GIS database, and no new data are created
Ex: What towns are within 1 km of this point?
Measurements
Numerical values associated with geographic data
Length, Area, Shape, Distance, Direction
Ex: What is the area of New York State?
Transformations
Change datasets to produce new datasets or new insights into geographic phenomena
May include applying geometric, arithmetic, or logical rules, or converting raster data to vector data or vice versa
Descriptive Summaries
Capture general characteristics of a dataset
May include mean, standard deviation, variety
Ex: What is the mean population of counties in New York State?
Optimization
Select ideal locations for objects
Ex: Where should I locate my business?
Hypothesis Testing
Uses statistical methods to test for relationships or differences between variables
Can also be used as the basis for predicting the distribution of geographic phenomena
Workflow for Spatial Analysis
"PPDAC"
PROBLEM: framing the question
PLAN: formulating the approach
DATA: data acquisition
ANALYSIS: analytical methods and tools
CONCLUSIONS: delivering the results
Data Capture
Direct data input
Two types, Primary and Secondary
Primary Data Sources
Data collected in digital format for specific use in a GIS
Secondary Data Sources
Datasets (digital or analog) that need conversion for use in a GIS
Data Transfer
Data input from other sources/systems
PPDEE Flow
PLANNING
PREPARATION
DIGITIZING & TRANSFER
EDITING & IMPROVEMENT
EVALUATION
Remote Sensing
refers to the collection of primary raster data via aerial or satellite platforms
Collects data on physical, chemical, and/or biological properties of objects, without directly touching them
Passive Systems of Remote Sensing
Collect data on naturally reflected or emitted electromagnetic energy
ex: Visible light - aerial photography
ex: Thermal infrared - can map temperature
Active Systems of Remote Sensing
Beam energy and see how that energy is reflected back
Land Surveying
The practice of determining the exact locations of features on Earth's surface using angles and distances from points with known coordinates
GPS
Global Positioning System
Uses satellites in orbit to determine locations on Earth
Location is determined using the distance between GPS satellites and receivers
Differential GPS
Is used to locate features with higher accuracy
A GPS base station with known location transmits "corrections" to GPS receiver device
~15m without DGPS
~10cm with DGPS
Scanning
Secondary raster data capture mainly refers to scanning
Ex: Paper Maps/ Aerial Photographs
Heads-up Digitizing/ Vectorization
The process of converting raster data into vector data
Advantages: Efficient, uses publicly-available raster data sources for tracing
Disadvantages: Positional error in vectorized features and topological errors
Coordinate Geometry (COGO)
A technique that uses survey-style bearings (directions) and distances to delineate line and polygon vector features
Measurements made on the map are converted into real-world distances and then translated into features in a GIS
Examples of Sources of GIS data
USGS Earth Explorer
OpenStreetMap
US Census Bureau
Euclidean Distance
Is the straight-line distance between two points
Manhattan Distance
Is the distance between two points, along a path made up of strictly north, south, west, and/or east directions
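The two distance measures can be compared with a minimal Python sketch. It assumes points are (x, y) pairs in a planar (projected) coordinate system; the function names are illustrative, not from any GIS package.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def manhattan_distance(p, q):
    """Distance along strictly north/south and east/west moves."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])

a, b = (0.0, 0.0), (3.0, 4.0)
print(euclidean_distance(a, b))  # 5.0
print(manhattan_distance(a, b))  # 7.0
```

Manhattan distance is never shorter than Euclidean distance between the same two points.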
Perimeter
The length of the outer edge of a polygon
Area
The region enclosed by a polygon
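Perimeter and area can be sketched for a simple polygon as follows. This assumes planar coordinates, a non-self-intersecting polygon, and vertices listed in order around the boundary; the area uses the shoelace formula, which is one common approach and not necessarily what a given GIS uses internally.

```python
import math

def polygon_perimeter(vertices):
    """Sum of edge lengths, closing the ring back to the first vertex."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))

def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

rectangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
print(polygon_perimeter(rectangle))  # 14.0
print(polygon_area(rectangle))       # 12.0
```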
Descriptive Statistics
Include measures of central tendency such as mean, median, and mode
Also include measures of dispersion, such as standard deviation
Descriptive Spatial Statistics
Are similar to other statistics, in that they measure traits such as central tendency and dispersion
Uniquely consider the spatial characteristics of data when describing the data
Examples are Mean Center and Standard Distance
Mean Center
Measures the central tendency in spatial data
Calculated by separately averaging the x and y coordinates of a set of points
Standard Distance
Describes the dispersion of points around the mean center
About 68% of points fall within one standard distance of the mean center
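A minimal sketch of both measures, assuming unweighted points in planar coordinates. Standard distance is computed here as the square root of the summed x and y variances about the mean center (population form, dividing by n); the function names are illustrative.

```python
import math

def mean_center(points):
    """Average the x and y coordinates separately."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def standard_distance(points):
    """Dispersion of points around the mean center."""
    n = len(points)
    cx, cy = mean_center(points)
    var_x = sum((x - cx) ** 2 for x, _ in points) / n
    var_y = sum((y - cy) ** 2 for _, y in points) / n
    return math.sqrt(var_x + var_y)

pts = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(mean_center(pts))        # (1.0, 1.0)
print(standard_distance(pts))  # about 1.414
```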
Spatial Autocorrelation
Measures if values of a phenomenon are clustered, dispersed, or randomly distributed
Positive SAC means nearby values are more similar than distant values
Negative SAC means nearby values are less similar than distant values
Moran's I
measure of spatial autocorrelation
Ranges from -1 (negative SAC) to +1 (positive SAC)
Zero indicates a random spatial distribution
Significance Testing
Calculated using randomization null hypothesis test
The values in a dataset are randomly shuffled among the geographic features, and the Moran's I is calculated
Generally, p<0.05 is considered statistically-significant
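Moran's I and the randomization test can be sketched as below. Assumptions: spatial weights are given as a dictionary of (i, j) pairs with binary neighbor weights, and the pseudo p-value is one-sided, counting shuffles whose Moran's I is at least as large as the observed value (a test for positive SAC). The six-feature dataset is hypothetical.

```python
import random

def morans_i(values, weights):
    """Moran's I for values and a dict {(i, j): w_ij} of spatial weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(weights.values())
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    return (n / s0) * (cross / sum(d * d for d in dev))

def permutation_p_value(values, weights, n_perm=999, seed=0):
    """Shuffle values among features and compare each I to the observed I."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (n_perm + 1)

# Hypothetical data: six features in a row, each a neighbor of the next
values = [1, 2, 3, 4, 5, 6]
weights = {}
for i in range(len(values) - 1):
    weights[(i, i + 1)] = 1.0
    weights[(i + 1, i)] = 1.0

print(morans_i(values, weights))            # about 0.6 (positive SAC)
print(permutation_p_value(values, weights))
```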
Randomization Null Hypothesis Test
...
Point Pattern Analysis
Used to understand the distribution of points
Spatial Data are often collected as points
As opposed to measures of SAC, point pattern analyses focus upon the point locations themselves
Quadrat Analysis
Examines the variability or uniformity in the distribution of points per cell
A quadrat is a user-defined geographic area that is square
Nearest-Neighbor Point Analysis
Is another way to understand if points are clustered, dispersed, or randomly distributed
This analysis uses the average distance between each point and its nearest point and the expected distance between nearest points if the points were randomly distributed
A nearest neighbor test statistic can be calculated in order to determine if points are significantly clustered or dispersed
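A sketch of the nearest-neighbor calculation, assuming the Clark-Evans form of the statistic: the expected mean distance under randomness is 0.5 / sqrt(n / A), and the standard error is 0.26136 / sqrt(n^2 / A), where A is the study area. The study area must be supplied by the analyst, and the result is sensitive to that choice.

```python
import math

def nearest_neighbor_index(points, study_area):
    """Return (ratio, z): observed/expected mean nearest-neighbor distance and z-score."""
    n = len(points)
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    density = n / study_area
    expected = 0.5 / math.sqrt(density)
    std_error = 0.26136 / math.sqrt(n * density)
    return observed / expected, (observed - expected) / std_error

# Hypothetical evenly spaced points in a 2 x 2 study area
pts = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
ratio, z = nearest_neighbor_index(pts, study_area=4.0)
print(ratio)  # 2.0
```

A ratio below 1 suggests clustering, a ratio above 1 suggests dispersion, and the z-score indicates whether the departure from randomness is significant.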
Chi-square Test
Determines whether an observed distribution is the same as an expected distribution
A Chi-square test is performed by comparing the distribution of the number of points per quadrat to a uniform distribution
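Quadrat counting and the Chi-square statistic against a uniform expectation can be sketched as follows. Assumptions: square quadrats on a regular grid anchored at (xmin, ymin), and an expected count equal to the total number of points divided by the number of quadrats. The tiny dataset is hypothetical and far too small for a real test.

```python
def quadrat_counts(points, xmin, ymin, cell_size, n_cols, n_rows):
    """Count points per square quadrat on a regular grid (row-major order)."""
    counts = [0] * (n_cols * n_rows)
    for x, y in points:
        col = int((x - xmin) // cell_size)
        row = int((y - ymin) // cell_size)
        if 0 <= col < n_cols and 0 <= row < n_rows:
            counts[row * n_cols + col] += 1
    return counts

def chi_square_uniform(counts):
    """Chi-square statistic comparing observed counts to equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# Hypothetical points, all falling in the lower-left quadrat of a 2 x 2 grid
pts = [(0.1, 0.1), (0.2, 0.8), (0.5, 0.5), (0.9, 0.3)]
counts = quadrat_counts(pts, xmin=0.0, ymin=0.0, cell_size=1.0, n_cols=2, n_rows=2)
print(counts)                      # [4, 0, 0, 0]
print(chi_square_uniform(counts))  # 12.0
```

The statistic is then compared against a Chi-square distribution with (number of quadrats - 1) degrees of freedom.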
Hot Spot Analysis (Getis-Ord Gi*)
Used to study whether there are clusters of high or low values in a spatial phenomenon
The Getis-Ord Gi* statistic is a common inferential statistic. The higher a feature's values and its neighbors' values are compared to the mean, the more likely it is to be significant
Interpolation
Creates a continuous raster surface
Assumptions of Interpolation
Nearer things are more related to each other than farther things (Tobler's First Law of Geography)
Points closer to the predicted location have more importance (weight) than points farther away
Interpolation Workflow
#1 Control Points are collected with geographic coordinates (x,y,z)
#2 The cell size of the raster for interpolated values is chosen
#3 The method of predicting the values at each cell is chosen
#4 The method is calibrated and validated using control points
#5 The Method is finally applied using the control points to produce a continuous surface
Inverse-Distance Weighting
A method in which a control point's weight is inversely related to its distance: the closer it is, the more important its value
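A minimal sketch of inverse-distance weighting at a single location. Assumptions: control points are (x, y, z) tuples in planar coordinates, every control point is used, and the weight is 1 / distance^power with a default power of 2 (a common choice, not the only one).

```python
import math

def idw_predict(x, y, control_points, power=2.0):
    """Weighted average of control values, with weights 1 / distance**power."""
    weighted_sum = 0.0
    weight_total = 0.0
    for cx, cy, cz in control_points:
        d = math.hypot(x - cx, y - cy)
        if d == 0.0:
            return cz  # location coincides with a control point
        w = 1.0 / d ** power
        weighted_sum += w * cz
        weight_total += w
    return weighted_sum / weight_total

controls = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw_predict(1.0, 0.0, controls))  # 15.0 (equidistant, equal weights)
print(idw_predict(0.5, 0.0, controls))  # pulled toward 10.0, the nearer point
```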
Natural Neighbor
A method based upon Voronoi Cells
The percentage of a control point's original Voronoi cell, within the new Voronoi cell created by the location whose value is being estimated, is its weight
Voronoi Cells
Regions in which every location is closer to that region's control point than to any other control point
Splining
The predicted surface is a "rubber sheet", and the control points are "anchors" against which the sheet is stretched
An equation or equations describe this stretched surface
Kriging
An advanced interpolation method
Weights values of nearby control points to predict values, but does so based on distance and the spatial arrangement of data
Considers redundancy
Kriging is a 'geostatistical' method (previous methods are 'deterministic' because they are based upon strict methods that use exact values of control points)
Uses a model of the semivariance created from the entire set of control points
The Semivariance trend is used to calculate the weights
Semivariance
A measure of dissimilarity between control points
Semivariogram
Graphs semivariance vs. distance
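The points of an empirical semivariogram can be sketched as below. Assumptions: semivariance for a lag bin is half the mean squared difference in z between all pairs of control points whose separation falls in that bin, and bin k covers distances from k * lag_width up to (k + 1) * lag_width. Fitting a model curve to these points, as kriging requires, is not shown.

```python
import math
from collections import defaultdict

def empirical_semivariance(control_points, lag_width):
    """Return {lag_bin: semivariance} from all pairs of (x, y, z) control points."""
    squared_diffs = defaultdict(float)
    pair_counts = defaultdict(int)
    n = len(control_points)
    for i in range(n):
        xi, yi, zi = control_points[i]
        for j in range(i + 1, n):
            xj, yj, zj = control_points[j]
            lag_bin = int(math.hypot(xj - xi, yj - yi) // lag_width)
            squared_diffs[lag_bin] += (zi - zj) ** 2
            pair_counts[lag_bin] += 1
    return {b: squared_diffs[b] / (2 * pair_counts[b]) for b in sorted(squared_diffs)}

# Hypothetical control points along a line
controls = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0), (3.0, 0.0, 7.0)]
print(empirical_semivariance(controls, lag_width=1.0))
# semivariance rises with distance: bin 1 is about 2.33, bin 2 is 8.5, bin 3 is 18.0
```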
How Control Points are Chosen
Options:
Choose the nearest x control points
Choose all control points within a specified radius
Choose nearest x control points from each of y sectors
etc.
Leave-one-out cross-validation
Step 1) The interpolation method options are chosen
Step 2) One control point is omitted
Step 3) Interpolation is performed with options from Step 1
Step 4) The actual value of the control point is compared to the value at the point's location predicted by the interpolation
Step 5) Steps 2-4 are repeated for each control point
Step 6) Actual versus predicted values are graphed
Step 7) Steps 1-6 are repeated to improve the interpolation
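The steps above can be sketched using inverse-distance weighting as the interpolation method. Assumptions: the only option being tuned is the IDW power, and the actual-versus-predicted comparison is summarized as a root-mean-square error instead of a graph. The control points are hypothetical.

```python
import math

def idw_predict(x, y, control_points, power):
    """Inverse-distance-weighted estimate at (x, y)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for cx, cy, cz in control_points:
        d = math.hypot(x - cx, y - cy)
        if d == 0.0:
            return cz
        w = 1.0 / d ** power
        weighted_sum += w * cz
        weight_total += w
    return weighted_sum / weight_total

def loocv_rmse(control_points, power):
    """Omit each control point in turn, predict it, and summarize the errors."""
    squared_errors = []
    for i, (x, y, actual) in enumerate(control_points):
        others = control_points[:i] + control_points[i + 1:]  # Step 2
        predicted = idw_predict(x, y, others, power)          # Step 3
        squared_errors.append((predicted - actual) ** 2)      # Step 4
    return math.sqrt(sum(squared_errors) / len(squared_errors))

controls = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (2.0, 0.0, 30.0)]
for power in (1.0, 2.0):  # Step 7: repeat with different options
    print(power, loocv_rmse(controls, power))
```

For this dataset the power of 2 gives the lower error, so it would be preferred over a power of 1.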
Hypothesis testing
1) Comparing means
2) Comparing observed versus expected distribution
3) Assessing relationships between variables (covariance)
Parametric Statistics
For test distributions with known parameters
Values of what you are studying are normally distributed (a "bell curve" histogram)
Non-parametric statistics
For test distributions with unknown parameters
Dependent variable is not normally distributed
The Sample size for the dependent variable is small
Comparison of Means
Are two means equal or different?
Statistical tests: normal data and non-normal data
Ex: river otter dens are nearer to water bodies than would be expected by chance. Measure distance from dens to water, measure distance from random points to water, perform test to compare means
Observed versus expected distribution
Does the observed phenomenon match an expected distribution
Ex: Sparrow bird nests are clustered. Calculate number of nests per grid cell. Calculate "expected" number per grid cell. Perform X^2 test
Relationship between variables
Are two variables related?
Dependent variable: the variable whose variations in values are studied
Independent variables: the variables that cause the variation in the dependent variable
Linear Regression
Ordinary least-squares linear regression
Fits a best fit trend line through points on a graph and plots the dependent vs. independent variable
Assumes the dependent variable is normally distributed and the relationship between variables is linear
Linear regression slope
shows relationship between variables (+ or -)
Linear regression p-value
whether or not a variable is significantly related to another; p<0.05 generally accepted as significant
Linear Regression R^2
Strength of relationship between variables
0 to 1, 0 meaning no relationship, 1 meaning a perfect relationship
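Slope and R^2 for a one-variable ordinary least-squares fit can be sketched with the standard library alone. The data are hypothetical, and the p-value is omitted because it needs a t-distribution, which statistical packages provide.

```python
def ols_fit(x, y):
    """Return (slope, intercept, r_squared) for a least-squares line y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_residual / ss_total

# Hypothetical independent (x) and dependent (y) values
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
slope, intercept, r_squared = ols_fit(x, y)
print(slope, intercept, r_squared)  # about 0.6, 2.2, 0.6
```

Here the positive slope indicates a positive relationship, and an R^2 of about 0.6 means the line accounts for roughly 60% of the variation in y.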
Logistic Regression
related to linear regression
assesses relationship between a binary dependent variable and the independent variables
Binary dependent variable: "1"s and "0"s representing:
Present ("1") or absent ("0"), Suitable ("1") or unsuitable("0"), Occurred ("1") or didn't occur ("0")
Research vs. Management Models
As research or management tools
Physical vs. empirical Models
Physical based on physical laws, first principles
Empirical based on observations, exact mechanism unknown
Linear vs. Non-Linear Models
Linear are first-degree equations
Non-linear models are one or more equations but are not first-degree
Inductive vs. Deductive models
Inductive create general models using observations of phenomenon
Deductive models are based upon prior knowledge, expert opinion
Deductive Modeling Approaches
Select independent variables related to the dependent variable
Describe relationship between dependent and independent variables
Weight the importance of variables
1) Ask each expert independently for an opinion on questions
2) Calculate the median and range of opinions and report these back to the experts for another round of estimates
3) Repeat steps 1 and 2 for a few rounds
4) Use the median of the final round as the best answer
The "Delphi Process"
Develop a model based upon expert judgements to maximize the accuracy of model estimates
Inductive Modeling Approaches
Models can also be developed from observed relationships between dependent and independent variables
Regression and other methods that generate equations can be used for prediction
Model Evaluation
Predict and compare values of dependent variable in another dataset
For models with interval/ratio data: calculate root-mean-square error (RMSE)
Find spatial autocorrelation of model residuals (clustered = bad)
For dependent variables with binary outcomes: construct a "confusion matrix" and calculate accuracy measures
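Both evaluation measures can be sketched briefly. Assumptions: observed and predicted values come from a separate evaluation dataset, binary outcomes are coded 1 and 0, and overall accuracy is the only accuracy measure shown (others can be derived from the same four counts). The data are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error for interval/ratio data."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def confusion_matrix(observed, predicted):
    """Counts of true/false positives and negatives for 1/0 outcomes."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for o, p in zip(observed, predicted):
        if o == 1 and p == 1:
            counts["tp"] += 1
        elif o == 0 and p == 0:
            counts["tn"] += 1
        elif o == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts

def overall_accuracy(counts):
    """Share of cases predicted correctly."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())

print(rmse([2.0, 4.0, 6.0], [1.0, 4.0, 8.0]))  # about 1.29
cm = confusion_matrix([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(cm)                    # {'tp': 2, 'tn': 1, 'fp': 1, 'fn': 1}
print(overall_accuracy(cm))  # 0.6
```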
How is GIS used for hypothesis testing and spatial modeling
Storing data for analysis
Data preparation
Creating or calculating independent/dependent variables
Preparing/exporting a data table for analysis (Ex: Zonal Statistics and Extract Multi Values to Points tools)
Performing aspects of the actual testing or modeling (Ordinary least squares and Moran's I tools)
Applying a model to generate a prediction across space (Raster Calculator tool)