A subproblem: getting from raw input to "meanings", getting from
"meanings" to output
Language is (roughly) compositional: minimal meaningful units
(morphemes) combine to form larger meaningful chunks
A smaller subproblem: getting from raw input to morphemes,
getting from morphemes to output
Morphemes are sometimes words (e.g., cat) and sometimes
less than words (e.g., the -s in cats, the un- in untie,
and the PAST (?) in cut when it's past).
The problem of recognizing and producing morphemes
Morphemes that are words
Each of the tens of thousands of words needed has a potentially
infinite number of shapes.
We need an efficient coding scheme for words.
Words need an invariant form which is not specific to recognition
or production.
Phonology: a solution to the problem of representing words
(Monomorphemic) words share internal structure.
Each word is coded in terms of a small alphabet of abstract elements,
which are not specific to perception or production.
There is a hierarchy of several levels of elements of different
sizes: phonemes, syllables, metrical feet.
Elements at one level combine to form elements at the next level up.
Polymorphemic words
nilimwona ->
ni + li + m + on + a ->
SUBJ=1s + PAST + DIR-OBJ=3s + see + INDIC
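The last step above (from a segmented form to a gloss) can be sketched as a lookup over position slots. This is only an illustration: the SLOTS table and the gloss function are hypothetical names, the table holds just the five morphemes from this example, and the step from nilimwona to the segmented form is assumed to be done already.

```python
# Hypothetical slot table: one entry per position in the verb.
# A label of None means the gloss is printed without a "LABEL=" prefix.
SLOTS = [
    ("SUBJ", {"ni": "1s"}),
    (None, {"li": "PAST"}),
    ("DIR-OBJ", {"m": "3s"}),
    (None, {"on": "see"}),
    (None, {"a": "INDIC"}),
]


def gloss(segmented):
    """Map a '+'-separated string of morphemes to a gloss string."""
    morphs = [m.strip() for m in segmented.split("+")]
    if len(morphs) != len(SLOTS):
        raise ValueError("expected one morpheme per slot")
    parts = []
    for morph, (label, table) in zip(morphs, SLOTS):
        value = table[morph]  # KeyError if the morpheme is not in this slot
        parts.append(f"{label}={value}" if label else value)
    return " + ".join(parts)


print(gloss("ni + li + m + on + a"))
```

The lookup is by position rather than by shape alone because the same shape can fill different slots; the sketch does not attempt the harder problem of finding the segmentation itself.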
Languages combine morphemes in a number of ways: suffixation,
prefixation, infixation, mutation, templates, reduplication,
(deletion), (metathesis).
In many cases (infixation, mutation, templates, reduplication,
metathesis, deletion), it is not possible to simply segment.
This has led phonologists and morphologists to posit abstract representations
on different tiers which combine to yield a surface representation.
Complex changes can take place in the shape of morphemes when
they combine.
The "surface" form may differ greatly from the presumed "underlying"
one.
Traditional phonology: a sequence of context-sensitive rules
to get from underlying to surface forms
A simple example: Japanese past tense
Some underlying and surface forms
ATE: tabe + ta -> tabeta
WON: kat + ta -> katta
DIED: sin + ta -> sinda
BORROWED: kas + ta -> kasita
CRIED: nak + ta -> naita
SWAM: oyog + ta -> oyoida
Rules
Voicing: t -> d / <b m n g> ___
Epenthesis: 0 -> i / <s k g> ___ <t d>
Velar deletion: <k g> -> 0 / ___ i <t d>
(in some environments only)
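The three rules can be tried out as an ordered sequence of string rewrites. The sketch below is a simplification under stated assumptions: the function names are hypothetical, each rule is a regular expression applied to the whole concatenated string, velar deletion is applied wherever its context matches (the "some environments only" restriction is not modeled), and the order Voicing, Epenthesis, Velar deletion is assumed because it yields the surface forms listed above.

```python
import re


def voicing(form):
    # t -> d / <b m n g> ___
    return re.sub(r"(?<=[bmng])t", "d", form)


def epenthesis(form):
    # 0 -> i / <s k g> ___ <t d>
    return re.sub(r"([skg])(?=[td])", r"\1i", form)


def velar_deletion(form):
    # <k g> -> 0 / ___ i <t d>   (applied everywhere: a simplification)
    return re.sub(r"[kg](?=i[td])", "", form)


# Assumed rule order; voicing must see the stem-final consonant
# before epenthesis separates it from the suffix.
RULES = [voicing, epenthesis, velar_deletion]


def past(stem, suffix="ta"):
    """Derive a surface past-tense form from an underlying stem."""
    form = stem + suffix
    for rule in RULES:
        form = rule(form)
    return form


for stem in ["tabe", "kat", "sin", "kas", "nak", "oyog"]:
    print(stem, "+ ta ->", past(stem))
```

Tracing oyog + ta shows why the order matters: voicing gives oyogda, epenthesis gives oyogida, and velar deletion gives oyoida. If epenthesis applied first, the t would no longer follow g and voicing would not apply.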