research methods final
standard error of measurement (SEM) is key measurement for
responsiveness +
absolute reliability
if a test has a high ability to obtain a positive result when the condition is present so you can be confident when you get a NEGATIVE to rule OUT the disorder, the test is high in
sensitivity
if a test has a high ability to obtain a negative result when the condition is absent, you can be confident when you get a positive that you can rule IN a disorder
specificity (spin)
true positive rate
sensitivity
true negative rate
specificity
if the berg balance scale is high specificity what does this mean
it has a high ability to obtain a negative result when the condition is NOT present therefore when it is a positive result, you are able to rule in the condition (spin)
a positive predictive value + negative predictive value are influenced by the ___ of the condition
prevalence
as prevalence increases, the positive predictive value ___ and negative predictive value ____
increases; decreases
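A quick way to see the prevalence effect is to compute both predictive values at two prevalence levels. This is only a sketch: the function name `predictive_values` and the 90% sensitivity/specificity figures are made up for illustration and do not come from the cards.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV); all inputs are proportions between 0 and 1."""
    tp = sensitivity * prevalence              # true positives
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return tp / (tp + fp), tn / (tn + fn)

# Same hypothetical test, low vs. high prevalence
for prev in (0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prev)
    print(f"prevalence={prev:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

With these assumed numbers, PPV rises and NPV falls as prevalence goes from 10% to 50%.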
when specificity increases.... PPV ___
increases
when sensitivity inc.... NPV _____
increases
___ combines sensitivity + specificity into one index
likelihood ratio; indication of HOW much a given test score will raise or lower the pre-test probability of the condition being present
a high positive likelihood ratio indicates
you can be more certain that a positive test means the person has the disorder (spin)
(i.e., if someone has a LR+ >5 you have greater odds that the person has the condition and you can rule it in)
a LOWER negative likelihood ratio means
you can be more certain that a negative test means the person does NOT have the disorder
(i.e., if someone scored a negative likelihood ratio of <0.02, you are better able to conclude the person does not have balance impairment)
the receiver operating characteristic (ROC) curve
graphically examines balance between sensitivity + specificity
it takes into account true positive scores + false positive scores
y-axis is sensitivity
x axis is 1-specificity
greater area under curve, more perfect test
helps determine best sensitivity/specificity ratio of the test (optimal cut off point)
receiver operating characteristic (ROC) curve
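One possible way to find the optimal cut-off point is sketched below. It assumes that a score at or above the cut-off counts as a positive test and that the "best" point is the one maximizing Youden's J (sensitivity + specificity - 1). Neither assumption comes from the cards, and the scores are invented.

```python
def roc_points(scores, has_condition):
    """Return (cutoff, sensitivity, 1 - specificity) for every candidate cutoff.

    Assumption: a score >= cutoff is called a positive test.
    """
    positives = sum(has_condition)
    negatives = len(has_condition) - positives
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(s >= cutoff and y for s, y in zip(scores, has_condition))
        fp = sum(s >= cutoff and not y for s, y in zip(scores, has_condition))
        points.append((cutoff, tp / positives, fp / negatives))
    return points

def best_cutoff(points):
    # Youden's J = sensitivity - (1 - specificity); pick the cutoff that maximizes it
    return max(points, key=lambda p: p[1] - p[2])[0]

# Hypothetical data: test scores and whether the condition is truly present
scores = [1, 2, 3, 4, 5, 6, 7, 8]
has_condition = [False, False, False, False, True, False, True, True]
print(best_cutoff(roc_points(scores, has_condition)))
```

Plotting the second element (y-axis) against the third (x-axis) of each point gives the ROC curve itself.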
when appraising literature, sandy thought about the critical appraisal questions of:
1. were subjects of all levels/stages of condition included?
2. was reliability (consistency) of target test established?
3. did investigators compare target test to gold standard?
4. did all subjects undergo testing with comparison test?
5. were investigators interpreting test results blinded?
6. was the time between application of target + gold standard short enough to min. opportunity for change?
7. were the methods for performing tests described in sufficient detail to permit replication of study?
if 50% pretest + a positive likelihood ratio of 5...
there is inc post test probability (about 83%) that this person has the disorder --> rule in
if 50% pretest + a negative likelihood ratio of .3...
there is a dec post test probability (about 23%) that this person has the disorder (better for ruling out)
if 25% pretest + LR- of 0.10, what is the dec post test probability?
3% that this person has the disorder, this is better for ruling out
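The pre-test to post-test cards can be checked by hand: convert the probability to odds, multiply by the likelihood ratio, then convert back. A minimal sketch follows; the function name `post_test_probability` is made up here.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the LR, convert back to probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# The three scenarios from the cards above
print(round(post_test_probability(0.50, 5), 3))     # LR+ of 5
print(round(post_test_probability(0.50, 0.3), 3))   # LR- of 0.3
print(round(post_test_probability(0.25, 0.10), 3))  # LR- of 0.10
```

The exact values (roughly 83%, 23% and 3%) sit close to what you would read off a nomogram.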
measure has face validity if....
it appears to measure what it is intended to measure
measure has content validity if...
it represents aim of instrument
____ + ____ validity need to be determined first
face + content
if berg balance scale was developed to measure balance, does it measure balance is an example of?
construct validity; does operational definition of test differ from other tests
convergent validity (construct)
do instruments measuring same construct have similar results
i.e., how well does the mini-BEST relate to 6MWT (delicious); as scores go up on mini-BEST, they do with 6MWT
discriminant validity (construct)
do instruments measuring different constructs have similar results
criterion validity
target test compared to gold standard
concurrent validity (criterion)
target test + criterion measures
predictive criterion validity
does target test predict future outcomes
if there is high reliability but no validity
all hit one spot on target but not center
if high reliability and high validity
all hit one spot on center of target
an unreliable measure can or cannot be valid
cannot be valid (if not consistent cannot be true)
what is measurement validity?
extent to which a test measures what it is intended to measure
research validity?
how well does the study control extraneous factors so that we are confident that the observed results are due to impact of IV?
threats to internal + external validity are parts of ___ validity
research
there is a low SEM, we can be ____ certain that someone has had a change in their balance
more certain
the smaller the SEM (MDC),
we don't have to see a huge, drastic change to detect it (the measure is more responsive to any sort of change)
responsiveness measures (SEM + MDC) are ____ specific
context specific (based on study design, outcome measure, population, setting)
if there is a ceiling effect there is a ____ skew
negative skew (peak of curve to R)
scores higher at baseline
if there is a floor effect, there is a ____ skew
positive skew (peak of curve to L)
score lower at baseline
if the starting patients scores are too high or too low, it may
limit the ability of the test to detect change over time
reported measures are acceptable if < ______ of the sample score highest or lowest on the test
15%
distribution-based approaches include
minimal detectable change, effect size and standardized response mean (SRM)
standardized response mean
an indicator of responsiveness based on the difference between two scores on an outcomes instrument (mean change divided by the SD of the change scores)
effect size
reflects the size of the association between two variables independent of the sample size (looking more at variability at baseline)
Minimal Detectable Change (MDC)
the amount of change that just exceeds the standard error of measurement of an instrument
true difference
in order to use the MDC, pts
cannot change b/n testing sessions
must have normal distribution of scores
if a pt demonstrates a change > than the MDC (+ or -),
it can be assumed true change has occurred
MDC equation
1.96 × √2 × SEM
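The MDC equation (1.96 × √2 × SEM, i.e. the 95% confidence version) is easy to run as a one-liner. In this sketch the function name is hypothetical and the SEM of 1.36 is just borrowed as an illustrative input.

```python
import math

def minimal_detectable_change(sem, z=1.96):
    """MDC = z * sqrt(2) * SEM; z = 1.96 gives the 95% confidence level."""
    return z * math.sqrt(2) * sem

print(round(minimal_detectable_change(1.36), 2))  # -> 3.77
```

A patient would need to change by more than this amount, in either direction, before the change is treated as real rather than measurement error.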
according to cohen's criteria, an effect size of >.80 means,
there was a large change and variability at baseline was smaller
smaller effect size means ____ variability at baseline
more variability
standardized response mean (SRM) takes into account the
SD difference between scores instead of at baseline
the minimal clinically important difference (MCID) is based on
anchors that are meaningful to the patient
the global rating of change scale (GRC) is an example of a ___ measurement
minimal clinically important difference (MCID)
______ values on a scale lead to _____ opportunity to detect change
more; greater
key measure of responsiveness is
standard error of measurement (SEM) which is the extent to which observed scores vary around the TRUE score;
estimate of expected variation in set of stable scores;
absolute reliability
a lower SEM means there is ____ variation and ____ responsiveness
less variation + more responsiveness
to be clinically useful the measure must have a _____ ICC + _____ SEM
HIGH ICC + LOW SEM
you want a ____ SEM for a measure
LOW; dont want too much variability (error)
the measure of relative reliability of measures is
intra-class correlation coefficient (ICC)
for test-retest reliability you want a ICC > _____ for clinical application
for intra/inter-rater reliability you want a ICC > _____ for excellent
0.90
0.75
the ICC is based on...
1. repeated measurement agreement
2. correlation of measurement scores
Cronbach's coefficient alpha
a measure of internal consistency that estimates the average correlation among all of the items on a scale
(correlation among test items AND each individual test item with the total score)
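For reference, Cronbach's alpha is usually computed from item variances and the variance of the total score. The sketch below uses that standard formula; the function name and the item scores are invented for illustration.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding one score per respondent."""
    k = len(item_scores)
    item_variance_sum = sum(variance(item) for item in item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]  # total score per respondent
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical 3-item scale answered by 5 respondents
items = [
    [3, 4, 2, 5, 4],
    [2, 4, 3, 5, 3],
    [3, 5, 2, 4, 4],
]
print(round(cronbach_alpha(items), 2))
```

Items that move together push alpha toward 1; unrelated items pull it toward 0.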
Cronbach's alpha is best if
0.7-0.9
you dont want too low bc test items are measuring different constructs
you dont want too high bc test items are measuring redundant constructs
instrument reliability includes
test-retest reliability
rater reliability includes
intra + inter rater reliability
do the instruments have homogeneity in measurement is which reliability?
internal consistency
Fiems study on reliability of balance app (SWAY) looked at which reliability measure?
test-retest reliability (is the measure reproducible with same subjects on different occasions)
is the measure recorded by one tester consistent across multiple trials?
intra rater reliability
is there consistency of measures among different raters in assigning scores to same subjects at same time point
inter rater reliability
systematic errors are _______ where the issue is with ___ instead of _____
predictable errors of measure; issue with validity instead of reliability
random errors are ______ that can be a result of
happening by chance; unpredictable
participant, rater or test
if the rater is tired, this is a ___ error
random error
if the timer is slow this is a ___ error that affects ____
systematic error; validity
do the berg relate to minibest measurements since both are measuring balance?
convergent validity
analyze convergent validity + criterion validity with a
Pearson product-moment correlation or Spearman's rho
does the comfortable walk test and PDQ39 measure different things since inverse,
divergent validity
Fiems' project, sway balance related to gold standard inertial measurement units?
concurrent validity
does GPA/SAT predict academic success? Does BBS predict falls?
predictive criterion validity
ethnography is the
analysis of culture focused on knowledge, beliefs, behaviors
phenomenology is the
understanding of experience of phenomenon from those inside lived experience
grounded theory is the
focus on developing theory from inductive approach
always want consistency with research question through...
1.overall approach
2.participants
3.data collection
4.data analysis
5.trustworthiness
6.reflexivity
purposive sampling
selecting participants who can provide insight into question
snowball sample
ask for recommendations
maximum variation of sampling
look at wide variety of the sample (expert to novice)
semi structured interviews
most common: some questions but open to direction of convo
member checking
back to participants + verify findings; read transcript/share results to make sure it is what they were expressing
triangulation
use multiple angles for methods or researchers
data saturation
analyze data along the way and get to point where you are no longer getting new information
reliability of coding
multiple people code + then look at reliability of 2
comparing the reproducibility of a test on different occasions with the same subjects is an example of:
test-retest reliability
which of the following is a test of relative reliability?
ICC: intraclass correlation coefficient
an ICC of .7 for intra-rater reliability indicates:
good reliability across time for one rater
which statement is true?
reliability + validity are always context specific
minimal detectable change score...
indicates change beyond measurement error
measures of absolute reliability
express values in same units as the original measure
if the SD is 4.3 and ICC Is .90, what is the SEM?
1.36
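The answer above comes from SEM = SD × √(1 − ICC). A small sketch to reproduce it (the function name is made up here):

```python
import math

def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), expressed in the units of the original measure."""
    return sd * math.sqrt(1 - icc)

print(round(standard_error_of_measurement(4.3, 0.90), 2))  # -> 1.36
```

Because the result stays in the units of the test, it is a measure of absolute reliability, while the ICC it is built from is unitless.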
researchers want to know if 2 measures have similar operational definitions. this is an example of
convergent validity
analysis of relationships between tests can be done with
Pearson product-moment correlation
examining construct validity of new outcome measure will utilize what type of research design?
exploratory study
to establish concurrent validity, the target test must
detect similar outcomes to the gold standard
does the berg balance scale predict falls with persons with chronic stroke? what is the DV?
falls
how can results from a test that has high sensitivity (>90) be used?
negative test rules out the condition
the tests ability to obtain a positive result when the condition is present
sensitivity (you can be confident when you get a negative result that they do not have it bc the test is able to obtain a positive result when the condition is present)
negative and positive predictive values are influenced by
prevalence of condition
Large LR+ are helpful for
ruling in a condition
smaller LR- are helpful for
ruling out a condition
slow running stopwatch is example of which error?
systematic error
simple mistakes, fatigue, inattention, tester inaccuracy are examples of which error?
random error
Fiems' study evaluating reliability of the SWAY balance app in persons with parkinson disease across 2 sessions, one week apart, was measuring which reliability?
test-retest reliability
what is the drawback of intra rater reliability
expertise does NOT always = reliable
which reliability was calculated when looking at AM-PAC 6 clicks basic mobility and daily activity short forms by therapist pairs?
inter-rater reliability
which general approach to reliability is relevant to measures of surveys/questionnaires/PROs
internal consistency (homogeneity of instrument)
a scale of physical function reflecting physical performance, not emotional function is an example of
internal consistency
a study looking at the Neuro-QoL short forms and targeted scales with people who have MS, looked at _____ when comparing short forms and reported it using ______
internal consistency; cronbach's alpha
the ____ reliability is measured with a correlation coefficient and the ____ reliability is measured with the standard error of measurement (SEM)
relative; absolute
the ICC reflects
1. degree of correspondence + 2. agreement among ratings
for test retest you need a ____ ICC to have acceptable reliability in research and for intra/inter you need a ____ for excellent reliability
.70
>.75
an ICC of .316 between neuro service + daily activity scores means
there is poor reliability of the measure for daily activity short forms in the neuro population
if cronbachs alpha is 0.8,
this is a desired correlation between test items
if cronbachs alpha is 0.98,
this is too high of a correlation, meaning the measuring constructs are redundant
if a measure has a low SEM,
you can assume the measure is MORE responsive to change and this is GOOD (smaller measurement error)
a measure with low reliability WILL have ____ validity and ___ clinical utility
low; low
measure must have ___ ICC and ___ SEM to be clinically useful
HIGH; LOW
if looking at proportion of agreement beyond expected by chance between gender, or assistance levels, use a ____
kappa statistic
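Cohen's kappa for two raters can be sketched as (observed agreement − chance agreement) / (1 − chance agreement). The function name and the assistance-level ratings below are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Agreement between two raters on categorical ratings, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed over categories
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical assistance-level ratings from two therapists
a = ["assist", "assist", "indep", "indep"]
b = ["assist", "indep", "indep", "indep"]
print(cohen_kappa(a, b))  # -> 0.5
```

Here the raters agree on 75% of cases, but 50% agreement is expected by chance alone, so kappa is 0.5.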
when looking at limits of agreement between alternative methods of measurement for the SAME impairment, look at
Bland-Altman plots
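The limits of agreement on such a plot are typically the mean difference between the two methods ± 1.96 × the SD of the differences. A rough sketch, with an invented function name and invented paired measurements:

```python
from statistics import mean, stdev

def limits_of_agreement(method_a, method_b):
    """Return (lower limit, bias, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)             # average difference between the methods
    spread = 1.96 * stdev(diffs)   # 95% limits around the bias
    return bias - spread, bias, bias + spread

# Hypothetical paired measurements of the same impairment by two methods
lower, bias, upper = limits_of_agreement([10, 12, 14, 11], [9, 12, 13, 12])
print(round(lower, 2), round(bias, 2), round(upper, 2))
```

The plot itself puts the mean of each pair on the x-axis and the difference on the y-axis, with these three values drawn as horizontal lines.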
a 0.9 ES or SRM per Cohen's criteria can be interpreted as
a larger change
which 2 distribution based approaches to measure change use Cohen's criteria?
effect size + standardized response mean
what is the difference between effect size and standardized response mean?
effect size looks at SDpre and standardized response mean looks at SD difference
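The difference between the two is only the denominator, which the sketch below makes explicit. The function names and the pre/post scores are made up for illustration.

```python
from statistics import mean, stdev

def effect_size(pre, post):
    """Mean change divided by the SD of the baseline (pre) scores."""
    return (mean(post) - mean(pre)) / stdev(pre)

def standardized_response_mean(pre, post):
    """Mean change divided by the SD of the change scores."""
    change = [after - before for before, after in zip(pre, post)]
    return mean(change) / stdev(change)

# Hypothetical pre/post scores for four patients
pre = [40, 44, 48, 52]
post = [45, 48, 54, 57]
print(round(effect_size(pre, post), 2))
print(round(standardized_response_mean(pre, post), 2))
```

In this made-up sample everyone improves by a similar amount, so the SD of change is small and the SRM comes out much larger than the effect size.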
the global rating of change scale is used for which method of measuring change?
minimal clinically important difference
what are measures of responsiveness (distribution based)
SEM + MDC
what are measures of responsiveness (change) that are anchor based?
MCID per GRC
if the MDC is 6.2, and those that report they had moderate improvement had an average score of 7.0, it can be concluded that
this is a good measure because it is increasing with the patients idea of increasing
what validity is considered truth in measurement + what the outcome measurement really means?
measurement validity
an unreliable measure can or cannot be valid
CANNOT bc measurements with a lot of error have little meaning or utility
what're the 2 types of construct validity
convergent + discriminant
what're the 2 types of criterion validity?
1. concurrent validity
2. predictive validity
which validity looks at how well items of an instrument are representing the aim of the instrument
content validity
professionals and patients with balance impairments identified groups of items important for inclusion in the scale for BBS is an example of which validity
content
which validity looks at degree an instrument reflects the operational definition of the construct it is meant to represent? + what are the types
construct validity
types: convergent + discriminant
construct validity is analyzed with
Pearson product-moment correlation, Spearman's rho, Kendall's tau
a correlation of .4 for convergent validity is considered a(n)
adequate correlation
a correlation of .62 for convergent validity for miniBest + 6MWT is?
excellent correlation
which validity compares a measure to a measure with established validity (gold standard)
criterion validity;
types:
1. concurrent
2.predictive
which validity was looked at when comparing sway balance app with inertial measurement units (gold standard)
concurrent validity (criterion)
looking at the ability of the BBS to predict fall risk is looking at which validity?
predictive validity
what are all of the validity indexes?
sensitivity/specificity
+ and - LR
ROC Curves
in a 2x2 table.... true positive is ___ false positive is ____ false negative is ____ and true negative is ____
a; b; c; d
sensitivity is calculated as
100% × a/(a+c)
specificity is calculated as
100% × d/(b+d)
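The 2x2 cells (a = true positive, b = false positive, c = false negative, d = true negative) can be turned into the validity indexes in a few lines. This is a sketch with invented counts. The likelihood ratio formulas are the standard definitions, LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity, and are not spelled out in the cards.

```python
def diagnostic_indexes(a, b, c, d):
    """a = true pos, b = false pos, c = false neg, d = true neg."""
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "LR+": sensitivity / (1 - specificity),
        "LR-": (1 - sensitivity) / specificity,
    }

# Hypothetical counts: 50 people with the condition, 50 without
for name, value in diagnostic_indexes(a=45, b=10, c=5, d=40).items():
    print(f"{name}: {value:.3f}")
```

With these counts sensitivity is 90% and specificity is 80%, giving an LR+ of about 4.5 and an LR− of about 0.125.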
