CSD 315s Exam 4



What frequencies contribute most information for intelligibility?
200-4000 Hz (band of greatest sensitivity of the human auditory threshold)
The ear is most sensitive to ranges of
Speech is highly resistant to
types of disturbances to integrity of speech signal (ex: peak clipping, filtering, unfavorable signal/noise ratios)
peak clipping
eliminating max/min parts of speech waveform
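A minimal sketch of peak clipping, assuming a pure sine wave stands in for a speech waveform. The function name `peak_clip`, the clipping limit, and the sample rate are illustrative choices, not values from the course.

```python
import math

def peak_clip(samples, limit):
    """Clamp every sample to the range [-limit, +limit]."""
    return [max(-limit, min(limit, s)) for s in samples]

# Hypothetical stand-in for speech: one cycle of a 100 Hz sine sampled at 8 kHz.
wave = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(80)]

# Remove everything above +0.5 and below -0.5 (the max/min parts of the waveform).
clipped = peak_clip(wave, 0.5)

# Clipping flattens the peaks but leaves the sign of each sample untouched,
# so the zero-crossing pattern of the waveform is preserved.
same_sign = all((a > 0) == (b > 0) for a, b in zip(wave, clipped))
```

The preserved zero-crossing pattern in this toy case is one way to picture why clipped speech can still be identified.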
You can still identify sounds even after
they are clipped &
You can pick out signals even in
adverse situations
bottom-up speech perception is captured by what progressive stages of analysis?
acoustic to auditory transduction to phonetic categorization to syntactic analysis to semantic understanding
sound existing in air
what brain produces in terms of perception
it's much easier for neurons to
do Fourier analysis
language by eye (reading/writing) =
a cipher or alphabet
4 parts of language by eye
1. sensory channel operates with simultaneous input (visual)
2. symbolic code is consistent, context-free = alphabet
3. reading/writing are optional for language users; many primitive societies have well-formed, sophisticated spoken languages despite 100% illiteracy
4. reading/writing are not language universals, but need to be taught; more difficult task than "learning" to speak
language by ear: spoken language =
a code
3 parts of language by ear
1. sensory channel operates serially, sounds arrive at the ear in succession
2. universal ability: all cultures have spoken languages
3. with normal sensory exposure/experience, a child does not need to formally learn how to speak & perceive his/her natural language. it 'happens' with little or no formalized learning
speed of transmission
speech processing rates of > 20-25 phonemes per sec are common and effortless
non-speech sounds
cannot be processed anywhere near as fast as speech
speed of temporal transmission is
unique in speech
the absence of sound can be
an efficient cue in speech perception (ex: slit-split study and rabid-rapid study)
what are the most pertinent perceptual cues for vowels?
1. spectral shape
2. fundamental frequency
3. duration
4. time course of F0
5. phonetic context of surrounding vowels
what question did the Ladefoged study (1962) ask?
do vowels in surrounding words influence perception of a given target vowel?
ladefoged study (1962)
"the bat was big" - raised F1-F3 of the/was/big, subjects report hearing the word "bat" as "bet"
result of ladefoged study (1962)
we estimate vowels by conversation/other vowels, CONTEXT
what is a fricative?
continuants, constriction maintained throughout a relatively lengthy duration of noise
a stop exists only as
part of the entire CV unit, cannot be produced alone
what is the main cue for perception of each fricative?
spectral shape of fricative noise
what fricatives are spectral cues not distinctive enough to serve as unique perceptual cues?
f, v, voiced th, and voiceless th
What are the perceptual cues for f, v, voiced th, and voiceless th?
differences in coarticulatory dynamics creating different transition trajectories
why was the pattern playback machine invented?
study of how speech sounds are perceived; invent a method to synthesize a respectable audio speech signal under the experimental and manipulative control of the investigator
The Pattern Playback Machine
machine created by Franklin Cooper that turns a spectrogram into speech
what did the burst study (pi-ka-pu experiment) question?
what perceptual role does burst play in stop consonant perception
what did the burst study (pi-ka-pu experiment) conclude?
the same acoustic cue can serve as a perceptual cue for different sounds. speech perception is NOT a SIMPLE OR STRAIGHTFORWARD decoding process
we have __ formants, but the first __ do all the work, they are all you need to determine vowels
6, 2
what are perceptual cues for semi-vowels?
steady state onsets, short horizontal resonance needed to simulate the steady vocalization that occurs in semi-vowels before the tongue quickly moves to the following vowel position
F2 transition study (liberman, delattre, cooper, & gerstman 1954)
original study that discovered & documented the "non-invariance" dilemma in speech perception
F2 transition study (liberman, delattre, cooper, & gerstman 1954) results
voiced stops - F2 changes

unvoiced stops - same F2, F1 cutback needed

nasals - nasal resonances added
acoustic variance & perceptual invariance question
how do listeners arrive at an invariant phonemic categorization in the face of highly variable acoustic information characterizing the same segment in different phonetic contexts? (how do we hear the same stops when it's acoustically all over the place?)
the locus study 1955
focus on articulation. syllables all seemed to be "pointing" to a fixed starting frequency of F2 - they dubbed this point in frequency space the "locus"
result of the locus study
velar tokens showed no invariant locus and labials showed a different locus for each vowel context - DIDN'T WORK.
theory discarded, but the search continued for an articulatory-based explanation for perception of stop+vowel units (how motor theories began)
____ acoustic cues are ultimately responsible for perceptual identity
what are the redundant perceptual cues for the voicing dimension of stops?
VOT, low-frequency energy/rapid spectrum change, burst loudness, fundamental frequency, segmental duration, prevoicing
voicing onset is delayed > 20-25 msec
voiceless stop heard
VOT is < 20 msec
voiced stop heard
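The VOT cutoffs on the two cards above can be sketched as a simple labeling rule. This is only an illustration: the function name `label_stop` is hypothetical, and treating 20-25 msec as an "ambiguous" zone is an assumption made here to cover the gap between the two stated cutoffs.

```python
def label_stop(vot_ms):
    """Label a stop from its voice onset time (VOT) in milliseconds.

    Cutoffs follow the cards: under 20 msec is heard as voiced,
    beyond roughly 20-25 msec is heard as voiceless.
    The 20-25 msec zone is left unlabeled (assumption, not from the notes).
    """
    if vot_ms < 20:
        return "voiced"
    if vot_ms > 25:
        return "voiceless"
    return "ambiguous"

# Hypothetical VOT values in msec.
labels = [label_stop(v) for v in (5, 22, 60)]
```

Other cues (F1 transition, burst loudness, F0) also contribute to the voicing percept, so a single-cue rule like this is a deliberate simplification.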
there is a ___ ___ relationship between VOT and the presence or absence of a significant F1 transition following voicing onset
perceptual trading
the peak intensity & duration of the burst of frication noise are ___ at the release of a voiceLESS plosive
the F0 averaged over the duration of a vowel placed between voiceless plosives is about __% higher than in a vowel placed between voiced plosives
a vowel followed by a voiceLESS stop is significantly __ in duration than it would be before a voiced stop
a plosive will be ___ when vocal folds are positioned for voicing
utterance initial voiced stops are usually not
voiced stops preceded by a vowel or voiced consonant are
voiced characteristics
presence of F1 transition; small VOT interval (<20 msec)
voiceless characteristics
absence of F1 transition; large VOT interval (>= 30 msec)
the presence of F1 overrides
a long VOT overcomes presence of
F1 transition
perception is continuous when
discrimination of stimuli that belong to the same category is as good as it is between stimuli that belong to different categories
perception is categorical when
there is a sharpening of discrimination ability across the boundaries of adjacent categories, i.e. stimuli that belong to different categories are discriminated better than those belonging to the same category even when the physical differences between stimuli are the same
A person's ability to discriminate is much more ___ than their ability to absolutely identify a stimulus
sensitive (better)
In speech, you can only discriminate what you can
absolutely identify
When the acoustic and articulation systems go their separate ways, perception follows what?
articulation (start towards motor based theory of speech perception)
categorical perception
basis of motor theory
the critical acoustic correlate of stop place identity is the
F2 transition
voiceless stimulus series
VOT varied in 10 msec steps along bilabial, alveolar, velar series. Subjects asked to identify CVs as either voiced or voiceless stop.
phonetic boundary
50%, guessing bc don't know what you heard
labeling function creates an
identification curve
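A rough sketch of how a phonetic boundary could be read off an identification curve: find where the proportion of "voiceless" labels crosses 50%. The response proportions below are made-up illustration data, and `phonetic_boundary` is a hypothetical helper that uses plain linear interpolation between neighboring steps.

```python
def phonetic_boundary(vot_ms, prop_voiceless):
    """Return the VOT (msec) where labeling crosses 50%, by linear interpolation."""
    points = list(zip(vot_ms, prop_voiceless))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 < 0.5 <= p1:
            # Interpolate between the two steps that straddle 50%.
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    return None  # the curve never crosses 50%

# Hypothetical series: VOT varied in 10 msec steps.
vot_steps = [0, 10, 20, 30, 40, 50]
# Made-up proportions of "voiceless" responses at each step.
id_curve = [0.00, 0.02, 0.10, 0.90, 0.98, 1.00]

boundary = phonetic_boundary(vot_steps, id_curve)  # about 25 msec for this data
```

The steep jump between two adjacent steps is what makes the curve look categorical; a shallow slope would place the 50% point less sharply.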
discrimination function shows
how listeners can distinguish between members of a stimulus series
A-B-X method
A= one speech sound
B = another speech sound
X = A or B

was X A or B?
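One commonly used way to connect labeling to ABX performance is to predict discrimination from identification probabilities alone: if listeners can only tell apart what they label differently, expected ABX accuracy is 0.5 + 0.5 * (pA - pB)^2. This formula is brought in here as an assumption; the notes do not state it. The probabilities are made-up values.

```python
def predicted_abx(p_a, p_b):
    """Predicted proportion correct in ABX if discrimination relies only on labels.

    p_a, p_b: probability that stimulus A (or B) is labeled as the same
    category, e.g. "voiceless". 0.5 is chance performance.
    """
    return 0.5 + 0.5 * (p_a - p_b) ** 2

# Hypothetical pair from the same category (both mostly labeled "voiced").
within = predicted_abx(0.02, 0.10)
# Hypothetical pair straddling the phonetic boundary.
across = predicted_abx(0.10, 0.90)
```

With these numbers the within-category pair sits near chance while the across-category pair is discriminated well, which is the peak-at-the-boundary pattern expected under categorical perception.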
In every ABX triad, the A and B stimuli are always ___ ___ but not necessarily ___
physically different, perceptually
anything within category is
vowel stimuli yield a more ___ perception than ___ perception
continuous, categorical
discrimination data do not show
any peaks along the continuum
During the perception of vowels, listeners can detect differences between adjacent stimuli, regardless of whether the pair is
within a category or across a category
stops are produced
vowels are different than consonants bc
F1 and F2 are steady
why is categorical perception useful?
assesses the perceptual abilities of children diagnosed with various types of speech/language disorders
ID of a typically developing child
steep, sharp boundary functions
ID of Developmental Apraxia of Speech
unusual labeling of stop place categories and more gradual transitions from one category to another
Children with developmental apraxia of speech experience
acoustic confusion; internal phonological representations are fuzzy; can't recognize rhymes
the brain doesn't pay attention to (inhibits) things that
aren't phonetically important
what is a theory for DAS?
maybe their brains don't inhibit things that aren't phonetically important, and there is too much going on
(Formal tenet of Motor Theory) the objects of speech perception are the intended phonetic gestures of the speaker which are represented in the brain as
invariant motor commands
(Formal tenet of Motor Theory) speech production and speech perception are ___ linked
(Formal tenet of Motor Theory) perception of the gestures occurs in a specialized __ hemisphere of the brain
(Formal tenet of Motor Theory) you can't define a phonetic category in purely acoustic terms; invariance cannot be found at the acoustic level of speech; it has to exist somewhere; that somewhere is the
speech module
(Formal tenet of Motor Theory) the specialized human perception module is very efficient at
using the acoustic speech pattern resulting from coarticulation to recover the discrete and invariant gestures
(Formal tenet of Motor Theory) stimulus variation is a source of information about articulation that provides
important guidance to the perceptual process in determining a representation of the gesture
(Formal tenet of Motor Theory) Acoustic patterns that conform to possible articulation engage the specialized phonetic module, those that don't conform are
heard as nonspeech through auditory (not speech) areas of brain
What is the McGurk effect?
video recording of "ga" paired with audio of "ba"; listener hears "da"

if the listener closes their eyes, they hear "ba"
McGurk effect conclusions
motor/articulatory info is used to establish perception even though it doesn't appear in audio signal
Duplex perception
Synthesized spectrogram of "da" is spliced. Subject hears a "da" in RE and a chirp in LE.
Duplex perception conclusion
two types of perception exist; a speech mode (phonetic) and an auditory mode (non-speech)
If shown that nonspeech stimuli can induce categorical perception in listeners (ID with steep and abrupt boundary shifts between opposing categories) then
Motor theory is weakened and speech is not as special as thought
nonspeech sounds had ___ boundary as speech continuum
nothing is purely speech-like in categorical dichotomy, but rather
based on acoustic structure of sounds per se
if animals show human-like category boundaries between phonemes then the perceptual processes must be dependent
on natural acoustic/auditory processing in the brain
Miller/Kuhl experiment with the chinchilla
does a chinchilla have a VOT boundary for +/-V distinctions that is similar to the human voicing boundary?

used operant conditioning (learn when to jump based on shocks and treats)
Conclusion of the experiment with the chinchilla
phonetic boundaries are the same for humans & chinchillas, ID function boundaries are not "special" to human auditory properties, but rather to general psycho-physical attributes of the acoustic signals and auditory transduction systems in general, across species
the experiment with the chinchilla was a ___ to motor theory
innate view/nativist position of speech perception
we are "pre wired" to be especially sensitive to the sound patterns of human speech. genetically endowed.
learning view/empiricist view of speech perception
developmentally, the human auditory system is complete at birth. the cochlea of a 20-week (5 mo) fetus is developed similarly to an adult cochlea.
Heart Beat Deceleration Procedures
1. find normal heart rate
2. present first stimulus, cardiac deceleration occurs
3. recovery of heart rate back to baseline (habituation)
4. present 2nd stimulus.
If the heart rate slows down to 2nd stimulus, the interpretation is that infant perceived the difference between the sounds.
Results of Heart Beat Deceleration
Group 1 babies never habituated, didn't show statistically significant effect.

Other two groups normal. Nobody knows why
Criticisms of heart beat deceleration
1. group 1 fail
2. they weren't responding to speech
3. heart deceleration changes to heart acceleration, change-over is variable; technique is unreliable
high amplitude sucking paradigm
a pressure transducer measures the air pressure inside the pacifier and sends the signal to a computer. the infant's sucking rate correlates with arousal.
high amplitude sucking paradigm results
20D group - increased response, heard phonemes!
20S group - drop in responses, heard allophones

1 month old had immature phonological response (small increase to allophonic shift)
what is the flaw to high amplitude sucking paradigm?
there is another reinforcer with sucking & how do you know they aren't just tired of sucking?
why is conditioned head turning a good paradigm?
the reinforcer is independent of the dependent variable
Head turn technique
kid sits on lap, listens to sound over and over, new sound, if turn head, rewarded by puppet show (operant conditioning)
the intrusion sound is the ___ variable and the puppet show is the ___
dependent, reinforcer
Infant vowel normalization study results
by the age of 6 months, infant has already developed the neural algorithm to normalize across vocal tract size differences to keep vowel perception constant
Janet Werker: non-native contrasts study
at 6 months, babies distinguish between sounds of any language. at 8 months, they lose discrimination of other languages.

demonstrates "use it or lose it"
what are feature detectors?
groups of specialized neurons in the brain that are especially sensitive to specific physical parameters of complex input sensory signals that are important to the organism
example of feature detectors
"bug detectors" in frogs - moving, curved black spots resemble flies and make ganglion cells fire

"edge detectors" in pigeons - cells detect horizontalness
Tolerance limits
sub-groups of cells (feature detectors) responded to a specific & limited range of variation. neurons having a range of stimuli they respond to is important for feature detectors for human speech; allophones
lateral inhibition
ex: horseshoe crab light detectors, sense of touch

lateral inhibition is valuable to influence perception (sharpens)
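A toy sketch of how lateral inhibition can sharpen a boundary, assuming a row of detectors where each one is inhibited by its two immediate neighbors. The inhibition strength `k` and the input values are arbitrary illustration choices, not measurements from the horseshoe crab work.

```python
def lateral_inhibition(inputs, k=0.2):
    """Each unit's output is its own input minus k times its neighbors' inputs.

    Units at the ends treat the missing neighbor as a copy of themselves
    (an assumption made here to keep the edges of the row well behaved).
    """
    out = []
    for i, x in enumerate(inputs):
        left = inputs[i - 1] if i > 0 else x
        right = inputs[i + 1] if i < len(inputs) - 1 else x
        out.append(x - k * (left + right))
    return out

# Hypothetical stimulus: a dim region next to a bright region.
step = [1, 1, 1, 1, 5, 5, 5, 5]
response = lateral_inhibition(step)

# The unit just on the dim side of the edge responds less than other dim units,
# and the unit just on the bright side responds more than other bright units,
# so the contrast at the boundary is exaggerated.
edge_contrast = response[4] - response[3]
flat_contrast = response[7] - response[0]
```

The larger `edge_contrast` relative to `flat_contrast` is the sharpening effect described on the card.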
meow detectors in a cat
single cell recording at inferior colliculus of midbrain

speech sounds change in frequency over time, just like a cat's meow
every possible frequency change is coded by
feature detector definition
groupings of neurons innately (stimulated but natural) predisposed to respond to particular parameters of sensory changes over time
what is important about "bug" detectors in a frog?
what the frog's eyes tell the brain / ganglion cell recording / movement essential for response
what's important for edge detectors in pigeons?
prescribed limits of stimulus = tolerance limits (resemble allophone)
what's important about lateral inhibition in horseshoe crab?
sharpens perception at boundaries
what's important about the cat's inferior colliculus?
single neurons responsive to specific frequency/intensity changes "meow detectors"

resembles formant transitions
what's important about auditory fibers of VIIIth nerve of bullfrog to species
specific mating calls
tolerance limits and lateral inhibition account for
phenomenon of categorical perception (where within-category allophonic differences are ignored & only phonemic differences detected)
stabilized retinal images
cause the image to no longer be seen, a "fatigue" type of depletion due to repeat stimulation

cogitate - cut the stake
phonetic boundary moves toward
the end of the adapting stimulus (boundaries are VARIABLE)
Selective Adaptation: cooper study
to see what happens when three speech categories are used for selective adaptation
how did they test where fatigue is happening (brain or cochlea)?
do adaptation monotically, "ta" in RE over and over, then ID boundary
what was the result of testing fatigue localization?
same new shifted boundary; so fatigue has to be central b/c LE was never adapted (fatigued)
cross-series effects
subsequent selective adaptation experiments showed that boundary shifts could occur even when the "adaptor" stimulus was from a different category relative to ID test series
cross-series interpretation
FDs becoming fatigued were voicing VOT detectors, not specifically a given VOT msec interval but a CLASS of VOTs being short or long
cross-series conclusion
adaptation is a phonetic phenomenon, not simply acoustic
Ades split formant study
tried to resolve controversy about acoustic vs. phonetic
binaural fusion
auditory pathways fuse two segments of signal into a naturally sounding speech signal
If two segments are delivered asynchronously, arriving at slightly different times, then the subject hears
two non-speech chirp-like sounds
what is the conclusion of the split formant study
adaptation occurs at both levels; a lower acoustic/auditory level + a higher phonetic level, that receives projections from lower neurons
contrast effect
decision criteria, based on context
if you know y-int. you can ____% accuracy predict category