Lecture given in Birmingham, Tuesday 6 March, 2001, in a series marking the retirements of Tim Johns and Tony Dudley Evans from the English for Overseas Students Unit of Birmingham University.
Tim and I first met in 1971 at the final board for a vacancy in the English Department of the University of Birmingham. I don’t think the EOSU (English for Overseas Students Unit) had been christened then, but these were its birth pangs. As we all know, Birmingham had the great wisdom to appoint Tim. He has always told me that he attributes his success to the fact that I was wearing for the interview a suit in a shade of deep cinnamon red that was too advanced for the sartorial tastes of the University at that time. I disagree; I am sure that Birmingham made the right appointment purely on academic grounds.
Meanwhile I joined the British Council, and was sent to lie abroad for my country in Turkey, Thailand, Egypt and Yugoslavia. I am reminded of the couplet in the house of Master Crane Robe in Robert van Gulik’s novel, The Chinese Maze Murders, quoted from an anonymous 6th century Zen Buddhist writer:
There are but two roads that lead to the gate of eternal life:
either one bores his head in the mud like a worm,
or like a dragon flies high in the sky.
Van Gulik’s book makes it clear that boring one’s head in the mud is a much better solution than flapping around like one of these darned dragons. Not that Tim has stayed at home all the time; he seems to be well known and admired in all the corners of the world.
Our next meeting was in 1980 at a BAAL conference in Colchester. Tim was demonstrating his Jumbler program and the program that later became Textbag, and I had just started to teach myself a little BASIC on the British Council’s office mainframe. Inspired by his talk, I showed him some of my efforts, and invited him to give a paper at the British Council and to contribute to a collection of articles (Johns, 1982). Later that year we were invited to a British Council/Goethe Institute seminar in Paris, at which we got to know kindred spirits Peter Roe and Burkhard Leuschner. We both had our unexpanded ZX81s at that point, and Tim had already written S-ENDING, the program which can add an appropriate –s, -es or -ies ending to any input word, real or nonsense. I remember challenging him to do something similar with the rules for A and AN, which he did between midnight and breakfast.
In 1982 I took advantage of something that the British Council calls a “training year” to plant myself on Tim and John Sinclair. I went through old photo albums looking for pictures of us at that time, but the only one I could find is this one of Tim impressing Randolph Quirk with something on a Newbrain at an English-Speaking Union conference in London; the program was probably Two Sticks. We both loved the Newbrain and wrote quite a lot of stuff for it, not all of which has resurfaced on the PC.
We had by then been asked by Annette Capel of Collins to do a book on CALL, but we made little progress between September and Christmas. After that I invited Tim to our home in Essex where for two weeks we spent each morning in separate rooms drafting, each early afternoon reading each other’s stuff, and the rest of the day revising and extending, with family standing by to produce coffee or whisky on demand. By mid-January we had the whole book finished apart from the programming examples. Several people asked me afterwards: “How did you get a book out of Tim? We’ve been trying for years.” My answer was: “I kidnapped him.” But Tim will be remembered not so much for his book publications as for his personal presentations, his software, his pioneering of classroom concordancing, and for the astonishing work he has done with his web site. To my mind, his Kibbitzer pages are worth a dozen books of grammar and methodology.
Belonging to an older generation of applied linguists, I was trained in the days when linguistics meant phonetics first, followed by morphology and some syntax, with a bit of semantics if there was any time left. Real linguists didn’t eat quiche. They bought themselves tickets to Indonesia or Burma, and found a hill tribe and a student who had left the tribe and got enough education to act as translator and informant. For six months the linguist would sit at the feet of the tribal chief and record creation myths and other moral tales, or make recordings of his wives naming various animals and plants and saying how to cook them. Every week there would be a quick dash back to the nearest city to buy fresh batteries for the Uher tape recorder and post a parcel of 3 inch tape reels home. Then, with the data assembled, there would be the narrow transcription of the recordings and, with the help of the informant, the furious hunt for minimal pairs with which to establish the phoneme inventory.
Those were also the days when we were supposed to do the same with English. All those linguistic textbooks by people like Charles Hockett and Robert A Hall started with a chapter in which we had to imagine ourselves as invaders from Mars, trying to make sense of the jumble of noises we heard around us. This, too, would lead us towards making a phoneme inventory. We would be told about the problem of /h/ versus /ŋ/ which never occur in the same environments so could be considered as mere allophones of each other, were they not so dissimilar. We would also learn about assimilation and neutralisation which turned distinct pairs into indistinguishable homophones, “I scream” and “ice-cream”, or “the sky” and “this guy”. It was fascinating but it was also concrete and assimilable.
My specific interest in minimal pairs has three likely origins. Firstly it may be partly due to my name; I am so often asked, jocularly or sometimes sincerely, how much I have been influenced by Shaw’s Professor Henry Higgins, that I have stopped denying that he was my grandfather.
Secondly, it may be due to an accident of geography. Working in Turkey in 1971, my wife and I lived in a village ten miles outside Istanbul called Etiler, on a cliff overlooking the Bosphorus. From the living-room window of our flat we could look out ac
But thirdly, and most important, I have a real phonetician ancestor. My great-great-grandfather, David Cargill, was sent to Tonga by the Church Missionary Society in 1834 to preach and translate. He was a graduate of Aberdeen University, the only graduate in the team, and seems to have mastered the Tongan language well enough to preach in it within a year or so. In 1835 he was sent with a colleague, William Cross, to Fiji to establish a mission, the first one in the islands. The Tongan headquarters was supported by several shipfuls of tradespeople, one of the most important of whom was a printer equipped with a press, producing bible extracts and prayer books as fast as the missionaries could give him the translations. The local legend, unfortunately not supported by any first-hand documentation, says that when Cargill had finished his first Fijian document, a catechism, he told the printer that he would need Greek thetas to represent the dental fricative in Fijian. “Sorry, Mr Cargill,” the printer is supposed to have said. “Haven’t got any of them. But here’s a whole case of letter C; nobody seems to want them.” The outcome of this is that /θ/ became <C> in Fijian orthography from then on. Cargill also noted that /b/ and /mb/, /d/ and /nd/, and /g/ and /ŋg/ were never in contrast; the oral versions occurred initially and the nasalised versions occurred medially. (Fijian has CV structure, so there are no final consonants.) Since these were allophones, they did not need to be represented. Consequently a name like “CAKOBAU” is pronounced /θækmbau/. This is something for which the memory of Cargill has been blessed by Fijian schoolchildren learning to read and cursed by colonial administrators ever since (Fiji Legislative Council, 1937). Cargill seems to have stumbled on something close to the phoneme principle, some 35 years before it was formalised and christened in Europe by the grandly named French poet and scholar A. Dufriche-Desgenettes (see Abercrombie, 1991: 24).
So, for whatever reasons, I have spent some of my career thinking about and playing with these things. In East Africa in 1970 I prepared a report on pronunciation teaching that, among other things, enumerated almost everything that could be done in a classroom with minimal pairs (Higgins, 1970). More recently, I have worked on test formats in which, using minimally distinct sentences randomly selected and printed out by computer, learners ‘test’ teachers and thereby test themselves (Higgins, 1989). I made up some material for the British Council which set learners playing with sounds in pairs. I am not sure how much real difference this has made to anyone’s listening and speaking ability, but I suspect that if there has been fun there has been learning. Minimal pair exercises which are funless and extended are unlikely to have much effect.
In all of this it occurred to me to wonder how much of a problem any particular minimal pair would set. This must have some relation to the frequency or otherwise of the constituent sounds in the language and the frequency of actual minimal pairs. In 1973 J.D.O’Connor wrote:
A language can tolerate quite a lot of homophones provided they do not get in each other’s way, that is provided they are not likely to occur in the same contexts. This may be a grammatical matter: if the homophones are different parts of speech they are not likely to turn up in the same place in a sentence … If they are the same part of speech, e.g. site sight; pear, pair they can be tolerated unless they occur in the same area of meaning and in association with a similar set of other words. Site may be ambiguous in It’s a nice site though a wider context will usually make the choice plain. … If homophones do interfere with each other the language may react by getting rid of one or by modifying one.
The main justification for all the ear-training exercises using minimal pairs is that a failure to distinguish two phonemes will overload a learner’s language with a number of new and false homophones. When you tell a German speaker that the crowd jeered David Beckham’s recent performance, will they know whether the crowd approved or disapproved? But how many extra homophones constitute an overload? O’Connor’s argument seems to suggest that language is self-repairing, and that repair mechanisms will start to apply in particular cases. For instance, the word quean for a prostitute has dropped out of the language, and the Americans have replaced the innocent senses of cock and rubber with rooster and eraser. An example I like is the way that the homophone pair oral and aural survives in the language at large but has been modified by us applied linguists. After all, the public has very little use for the word aural, and uses oral mainly for dentistry and promises. We language teachers, needing to talk about aural discrimination and oral production in the same paragraph, tend to use a Germanic pronunciation for aural, /aurəl/, to make sure we are not misunderstood.
But such examples do not answer the quantity question. How many homophones does the native speaker have to cope with? How many extra homophones are added when a learner confuses two phonemes? How many of each minimal pair are there? These are questions that would be extremely tedious to answer by hand, but should be solvable with a little help from a computer, so I gave some of my spare time to thinking about a solution. Obviously what I needed was a wordlist and an algorithm, and around 1992 I managed to find one and devise the other.
Selecting a word list should be a principled matter. The principle I used was the simple one of availability. No-one was paying for this, so I had no administrative assistant who could have looked after correspondence if I had applied to a commercial dictionary for permission to use their source files. Therefore I went for the one dictionary which existed as a plain ASCII text, with pronunciations in a low ASCII transcription that could be processed in old-fashioned QuickBASIC, and which was available then as a free download with no restrictions other than the obvious one that one should not make any money by re-publishing it. This was Text 710.DAT, now re-christened Text 0154, in the Oxford Text Archive. It is the wordlist of the 1974 edition of the Oxford Advanced Learner’s Dictionary, revised by Roger Mitton, a rather cumbersome download of nearly 9 megabytes, three-quarters of which is blank space. Never mind. Once you have got it, never let it go, as it is a gem. Roger Mitton created the document as part of a research project into spelling correction, and released it “so that researchers who need a reasonably large computer-usable dictionary do not need to spend months, as I did, putting one together.” God bless him, and do buy his excellent and very readable book (Mitton 1996).
The original headword list was around 35,000 words, rather limited by today’s standards. It is worth remembering, though, that a larger dictionary is not always a better dictionary, especially for such purposes as spelling correction or the measurement of confusability. Take a word such as flong. This is not even in the Shorter OED, though it is in the full dictionary. To know its meaning, “a rubberised paper used as an intermediate stage in the printing of newspapers on rotary presses”, you will probably need to have been a printer or, like me, to have been brought up next door to one who occasionally gave us discarded sheets of flong to insulate our hen-house. The flong supplied a fascinating and time-wasting diversion to a bookish boy in the shape of news stories in mirror-writing. However, any good spelling checker will highlight the word flong and rightly so; outside the printing trade it is far more likely to be a spelling mistake for flung than a meaningful word.
Mitton extended the list by adding all the inflected forms of nouns and verbs, and about 2500 proper names including common personal names, countries, nationalities, states, major world cities and British towns. He then examined words occurring in the LOB Corpus, and added several thousand which were not in the original dictionary, bringing the total up to just over 70,000. Each entry has five fixed-length fields padded out with spaces where necessary, 23 characters for spelling, 23 for pronunciation, 23 for grammatical tags, 1 for syllable count, and 58 for verb patterns, giving a total record length of 128 characters. The pronunciation field uses a form of low ASCII phonetics related to the alphabet proposed by John Wells for the Alvey Project. (Appendix A)
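For readers who want to try this themselves, the record layout just described can be unpacked with a few lines of Python. This is only a sketch under stated assumptions: the field widths come from the description above, but the names Entry and parse_record are mine, and the sample record is invented for illustration rather than copied from the dictionary file.

```python
from typing import NamedTuple

# Field widths as described in the text: spelling, pronunciation,
# grammatical tags, syllable count, verb patterns (128 characters in all).
FIELD_WIDTHS = (23, 23, 23, 1, 58)


class Entry(NamedTuple):
    spelling: str
    pronunciation: str
    tags: str
    syllables: str
    verb_patterns: str


def parse_record(record: str) -> Entry:
    """Split one fixed-length record into its five space-padded fields."""
    # Pad short lines so that slicing never runs off the end.
    padded = record.rstrip("\r\n").ljust(sum(FIELD_WIDTHS))
    fields = []
    start = 0
    for width in FIELD_WIDTHS:
        fields.append(padded[start:start + width].strip())
        start += width
    return Entry(*fields)


# An invented record, for illustration only: the tag string "XX" is a
# placeholder, not a real tag from the dictionary.
sample = "pat".ljust(23) + "p&t".ljust(23) + "XX".ljust(23) + "1" + "".ljust(58)
entry = parse_record(sample)
print(entry.spelling, entry.pronunciation, entry.syllables)
```

Stripping the padding at this stage is a convenience; anyone who needs to write records back in the original layout would have to pad the fields out again.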
When I first downloaded the dictionary in 1992, I used a form of BASIC called PDS 7, which incorporated database support called ISAM (indexed sequential access method). The first thing I did was to make a table of overall phoneme frequencies (Appendix B). This does not, of course, show the frequencies of the sounds in connected speech, only in the list of available words, but that seems to me to be the right basis for looking at the potential for confusion. If you compare the frequencies in speech, the obvious difference is the ranking for /ð/, which is pretty rare in the dictionary but very common in high-frequency function words like the and that. This list is posted on my web site.
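The original table was produced in BASIC, and I have not kept that code to hand, so what follows is a modern sketch in Python rather than a record of what I actually ran. It counts, for each sound, both the total number of occurrences and the number of words containing it, as in Appendix B. The symbol inventory is the one in Appendix A; the function names, and the decision simply to skip any character that is not in that inventory (stress marks and the like), are my assumptions.

```python
from collections import Counter

# Two-character symbols from Appendix A (diphthongs and affricates).
DIGRAPHS = ("eI", "aI", "oI", "@U", "aU", "I@", "e@", "U@", "tS", "dZ")
# Single-character vowels and consonants from Appendix A.
SINGLES = set("iIe&A0OUuV3@pbtdkgfvTDszSZhmnNlrjw")


def tokenize(pron):
    """Split a transcription into phonemes, trying two-character symbols first."""
    phonemes = []
    i = 0
    while i < len(pron):
        pair = pron[i:i + 2]
        if pair in DIGRAPHS:
            phonemes.append(pair)
            i += 2
        elif pron[i] in SINGLES:
            phonemes.append(pron[i])
            i += 1
        else:
            i += 1  # assumption: anything else is a stress mark or similar
    return phonemes


def phoneme_table(prons):
    """Return (total occurrences, number of words containing) for each phoneme."""
    total = Counter()
    words = Counter()
    for pron in prons:
        phonemes = tokenize(pron)
        total.update(phonemes)
        words.update(set(phonemes))  # count each word once per phoneme
    return total, words


total, words = phoneme_table(["pIt", "beI", "dZVdZ", "@baUt"])
print(total["dZ"], words["dZ"])
```

Greedy matching is a simplification: a /t/ followed by /S/ across a morpheme boundary would be read as the affricate, so a careful count would need to check such cases by hand.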
Next I used my ISAM indexes to find all the homophones and homographs in the dictionary, and anyone who has seen my internet pages may know those. The list of homographs, in particular, is by now about as complete as could be. I am aware, by the way, of recent discussion of the nomenclature homophone, homograph, homonym and heteronym, but am very unconvinced by the argument for introducing the term allonym that Fred Riggs proposes (1999). Having flagged all the homophones, the next stage was to write a program which would substitute a dummy character for every occurrence of two phonemes in the pronunciation field. I could then re-index the pronunciation field and list all the additional homophones so created, since these would be the minimal pairs. This was not as automatic as it sounds, and a lot of editorial decisions had to be taken on the hoof. These included the following.
· Linking –r: The dictionary includes a symbol for potential linking -r, so cheetah/heater did not show up as minimal, while cheetahs/heaters did. I wondered whether to delete this symbol, and did so in the second trawl.
· Secondary stress: Are super and suture minimal?
· Two occurrences of the same distinction within one word, such as purple versus burble: Should that be counted minimal? (In practice, what we have here is a single distinctive feature, breath versus voice, extended over the whole word; the vowel and the [l] are also quite different in purple and burble, though the differences would be listed as allophonic in any description while the [p] and [b] differences are counted as phonemic.)
· Inflected forms of pairs like pat and pad, which will not show up in the list since in theory we add a -s inflection to pat and a -z to pad? A student of mine once investigated this phenomenon with some sound analyser software and found that the fortis inflection in pats has a lot of zedness about it, while the lenis /z/ in pads is under close analysis indistinguishable from an /s/ (Torikian, 1992).
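The dummy-substitution idea described before this list can be sketched in a few lines of Python. This is an illustration of the principle, not the program I wrote: the function name minimal_pairs, the toy lexicon, and the choice to hold each pronunciation as a tuple of phonemes are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def minimal_pairs(lexicon, a, b, dummy="*"):
    """Find word pairs that become identical when phonemes a and b are merged.

    lexicon maps each spelling to a tuple of phonemes.
    """
    groups = defaultdict(list)
    for spelling, phonemes in lexicon.items():
        if a in phonemes or b in phonemes:
            # Replace every occurrence of either phoneme with the dummy.
            key = tuple(dummy if p in (a, b) else p for p in phonemes)
            groups[key].append(spelling)
    pairs = []
    for group in groups.values():
        for w1, w2 in combinations(sorted(group), 2):
            # Words that were already identical are homophones, not minimal pairs.
            if lexicon[w1] != lexicon[w2]:
                pairs.append((w1, w2))
    return sorted(pairs)


# A toy lexicon in the Appendix A transcription, for illustration only.
lexicon = {
    "pat": ("p", "&", "t"),
    "bat": ("b", "&", "t"),
    "purple": ("p", "3", "p", "l"),
    "burble": ("b", "3", "b", "l"),
    "heal": ("h", "i", "l"),
    "heel": ("h", "i", "l"),
    "hole": ("h", "@U", "l"),
    "whole": ("h", "@U", "l"),
}
pb_pairs = minimal_pairs(lexicon, "p", "b")
vowel_pairs = minimal_pairs(lexicon, "i", "@U")
print(pb_pairs)
print(len(vowel_pairs))
```

Two consequences of this sketch are worth noticing. Because every occurrence is replaced, purple/burble is reported as a pair, which is exactly the editorial question raised in the list. And because the sketch pairs spellings rather than pronunciations, a homophone set contributes every spelling combination, so heal/heel against hole/whole yields four pairs rather than one.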
Making the full set of 466 lists took me from 1993 to 1998, working at a rather desultory rate, at which point I was able to publish on the web, along with the lists themselves, a tentative figure for a grand total of minimal pairs, 75,615. But I had misgivings. My algorithm supplied only one minimal pair regardless of whether either word was a member of a homophone set. Thus heal/hole would appear only once, even though it represents four potential confusions: heal/hole, heal/whole, heel/hole and heel/whole. If the totals were to represent some kind of index of confusability, surely they must take account of such pairs. I went back to the drawing board and taught myself just enough about Mic
· The spellings (since a great deal of what passes for pronunciation problems are the direct result of our spelling system). For this I have used Edward Carney’s magisterial work (1994), as well as going back to Axel Wijk (1966) occasionally.
· Which learners might, because of their first language, have a problem with any given distinction. For this I have consulted Swan and Smith (1987) and O’Connor (1967) as well as my own experience.
· Any particular cases of difficulty such as sets involving homographs or problems with linking –r.
· Problems with rude words like pinnace/penis or crept/crapped.
· Finally the pairs I judged “interesting”, which I shall have more to say about later.
I also calculated two statistics. Firstly I worked out the density of minimal pairs. If every word containing one of the sounds were matched by one containing the other, that would be a density of 100%. In practice, other than the special case of two inflections in contrast, I have found no densities greater than 5%, with most less than 2%.
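The text above gives the idea of density but not an exact formula, so the following Python sketch shows just one possible reading of it: the share of words containing either sound that take part in at least one minimal pair. The function name density and the sample words are assumptions for illustration, and the figures in Appendix C may have been computed on a somewhat different basis.

```python
def density(pairs, words_with_a, words_with_b):
    """Percentage of candidate words that appear in at least one minimal pair.

    This is one reading of the definition; the exact formula is an assumption.
    """
    paired = {word for pair in pairs for word in pair}
    candidates = set(words_with_a) | set(words_with_b)
    if not candidates:
        return 0.0
    return 100.0 * len(paired & candidates) / len(candidates)


# Invented sample data: one minimal pair among eight candidate words.
pairs = [("bat", "pat")]
words_with_p = {"pat", "pit", "spin", "top"}
words_with_b = {"bat", "rib", "baby", "bed"}
result = density(pairs, words_with_p, words_with_b)
print(result)
```

On this reading, a density of 100% is reached only when every word containing either sound has a partner, which matches the limiting case described above.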
I also lemmatised the lists, so that where both members of the pair were the same part of speech or took the same inflections, they are grouped into one lemma. I have used the term semantic loading to describe the ratio of headwords or lemmas to all words in the list. Each pair of lemmas represents only one contrast of meaning. However, the sets that can be grouped in this way are probably more confusable than those that cannot, by O’Connor’s original conjecture. A low semantic loading, therefore, say around 50% or less, probably indicates rather a lot of same-part-of-speech pairs and could be taken as a danger sign.
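As a small worked illustration of the ratio, here is a Python sketch that takes the lemmatisation as given. The function name semantic_loading and the hand-made lemma table are assumptions; grouping the words into lemmas is the editorial part of the job and is not automated here.

```python
def semantic_loading(words, lemma_of):
    """Ratio of distinct lemmas to distinct words, as a percentage.

    lemma_of maps an inflected form to its headword; words that are
    missing from the mapping are treated as their own lemma.
    """
    distinct_words = set(words)
    if not distinct_words:
        return 0.0
    lemmas = {lemma_of.get(word, word) for word in distinct_words}
    return 100.0 * len(lemmas) / len(distinct_words)


# Invented example: two pairs, pat/bat and pats/bats, sharing two lemmas.
words = ["pat", "pats", "bat", "bats"]
lemma_of = {"pats": "pat", "bats": "bat"}
loading = semantic_loading(words, lemma_of)
print(loading)
```

Here four words reduce to two lemmas, giving a loading of 50%, which on the argument above would count as a warning that the list is dominated by same-part-of-speech pairs.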
And for his sheep he doth a steak.
which Quirk discusses in The Use of English. (Errors of this type are known as mondegreens, from a famous mis-hearing of the Scottish ballad:
For they have slain the Earl of Moray, and laid him on the green.
as
For they have slain the Earl of Moray, and Lady Mondegreen.)
Some more examples:
· The lady varnishes. (Title of a BBC program on DIY for women)
· End of the Pierre show. (Times headline about Pierre Y. Gerbeau’s withdrawal from bidding for the millennium dome.)
· Beck in business. (Headline over story about a David Beckham goal in an England football win.)
· “A postcard from Devon” heard as “A coastguard from Devon”
· A newspaper story in 1997 about a Japanese lady with a plane ticket for Turkey who asked for directions at Paddington Station and was put into a train for Torquay.
· One of the most exasperating came recently from my 2-year-old grand-daughter who on her high chair clamoured “I want beeper din”, and had been offered virtually everything in the larder before we worked out that what she meant was “I want to be pushed in.”
If language is self-repairing, then we would expect the sounds which are highly distinct to tolerate more minimal pairs than those which are similar in place or manner of articulation. I put the density figures into two Excel tables to see what they might show. I shuffled the columns and rows, trying to bring the cells with the greatest density towards the top left. What shows up with the consonants (Appendix C) is that the greatest density is between and among the plosives and affricates, and that sounds that are themselves rare tend to have low densities. Much the same is true of the vowels, apart from the special case of the schwa and /ɪ/, which seem to be affected by their use in unstressed syllables (Appendix D). What this seems to suggest is that, far from being self-repairing, language is self-sabotaging. The cheer/jeer and beer/bare pairs that bother Germans and New Zealanders are not an aberration; they are the norm. However, the figures need a bit more work before one can draw any strong conclusions about language in general.
I still have a few problems to work on. One intriguing question is whether there can be a minimal pair involving the contrast of a sound with a null. Is bank/back a minimal pair? My new algorithm makes it easy to collect such sets, so they are now included, even though I am not sure of their status. Another question is whether one can have a minimal pair between a vowel and a consonant. Vowels and consonants have a different role in syllable structure so the syllable count and stress pattern will necessarily be different. There are interesting cases like screen/serene or sprees/cerise in which replacing part of a complex cluster with a vowel creates an apparent minimal pair. All such pairs that the algorithm finds are going to be included on the site. A related question is whether syllabic and non-syllabic forms of a consonant can be treated together. Is the contrast between nail and sail the “same” as between button and butts? The computer algorithm will put them into the same list, though our instincts as English speakers would probably lead us to reject them.
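The contrast of a sound with a null can be collected by deleting one occurrence of the sound at a time and looking up what is left. The Python sketch below shows that idea; it is not my actual program, and the name null_pairs, the tuple representation and the toy lexicon are assumptions made for the example.

```python
from collections import defaultdict


def null_pairs(lexicon, phoneme):
    """Find pairs where deleting one occurrence of phoneme yields another word.

    lexicon maps each spelling to a tuple of phonemes.
    """
    by_pron = defaultdict(list)
    for spelling, phonemes in lexicon.items():
        by_pron[tuple(phonemes)].append(spelling)
    pairs = set()
    for spelling, phonemes in lexicon.items():
        for i, p in enumerate(phonemes):
            if p == phoneme:
                # Drop this one occurrence and see whether a word remains.
                shorter = tuple(phonemes[:i]) + tuple(phonemes[i + 1:])
                for other in by_pron.get(shorter, []):
                    pairs.add((spelling, other))
    return sorted(pairs)


# A toy lexicon in the Appendix A transcription, for illustration only.
lexicon = {
    "bank": ("b", "&", "N", "k"),
    "back": ("b", "&", "k"),
    "sunk": ("s", "V", "N", "k"),
    "suck": ("s", "V", "k"),
    "sing": ("s", "I", "N"),
}
found = null_pairs(lexicon, "N")
print(found)
```

Deleting one occurrence at a time, rather than all of them, keeps each reported pair to a single point of difference; whether such pairs deserve the name minimal is the open question raised above.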
One thing you will find noted in each list is the set of “interesting pairs”, which might prompt the question “what makes them interesting?” or more caustically “Get a life!” But I have noticed that, when I am editing a list, particular pairs tend to stand out as being pairs to share, pairs you want to read aloud to anybody who is in the room with you. What is more, on a small amount of anecdotal evidence, it seems the same pairs will strike different readers in the same way. So it is worth asking, why those? They tend to be the polysyllables. Obviously the great majority of minimal pairs are monosyllables; the chance of minimalness will reduce as the number of segments increases. But the interesting pairs also tend to be polysyllables which are not the results of inflecting the same root. In fact they are often different parts of speech. As far as meaning is concerned, they seem either to be coincidentally related in meaning like cheer/jeer, or else widely different, so that a ludicrous association is formed: Caesar’s/scissors. It seems that the same kind of delight arises from these pairs as from the outrageous rhymes of Thomas Hood, W.S.Gilbert, P.G.Wodehouse, Ogden Nash and Ira Gershwin. Something inside us makes us laugh when words sound the same but mean wildly different things. Judge for yourself from the samples in Appendix E.
I am continuing to work on the revised and extended set of lists, and hope it will take me less than five years to finish this time. Minimal pairs have a small role in raising learners’ awareness of pronunciation problems, though I doubt whether masses of minimal pair exercises are either necessary or sufficient to solve those problems. They clearly have some relevance to linguists in sorting out the phonology of languages. But for me there is only one thing that makes them worth the time I have put into them; they are fun.
References:
Abercrombie, David (1991). “Phoneme; the concept and the word”. In Fifty years in phonetics. Edinburgh University Press.
Carney, Edward (1994). A survey of English spelling. Routledge.
Crystal, David (1995). The Cambridge Encyclopedia of the English Language. Cambridge University Press.
Fiji Legislative Council (1937). Proposal for changes to Fijian orthography. Council Paper No. 37.
Fry, D.B. (1947). “The frequency of occurrence of speech sounds in Southern English.” Archives Néerlandaises de Phonétique Expérimentale, 20.
Hall, Robert A. Jr. (1964). Introductory linguistics. Chilton Books.
Higgins, J.J. (1970). Pronunciation teaching; practical suggestions for English teachers. Issued by the English Language Panel of the Institute of Education, Dar es Salaam, Tanzania.
Higgins, John (1989). “I speak analogue, you hear digital”. Paper given at the Canadian CALL Conference, Guelph, 1989. Now published on the web at http://myweb.tiscali.co.uk/wordscape/wordlist/analogue.html.
Hockett, Charles (1958). A course in modern linguistics. New York: Macmillan.
Hornby, A.S. (1974). Oxford Advanced Learner’s Dictionary of Current English. Oxford: Oxford University Press.
Johns, T.F. (1982). “Exploratory CAL: an alternative use of the computer in teaching foreign languages.” In Higgins, J.J. (ed), Computers and ELT: British Council Inputs. London: The British Council.
Mitton, Roger (1992). “A description of a computer-usable dictionary file based on the OALDCE”. Documentation lodged with the Oxford Text Archive. http://ota.ahds.ac.uk/.
Mitton, Roger (1996). English spelling and the computer. Longman.
O’Connor, J.D. (1967). Better English pronunciation. CUP.
O’Connor, J.D. (1973). Phonetics. Harmondsworth: Penguin.
Quirk, Randolph (1962). The use of English. Longman.
Riggs, Fred W. (1999). “Homonyms, heteronyms, and allonyms; a semantic/onomantic puzzle.” Web document at http://www2.hawaii.edu/~fredr/homonomy.htm.
Swan, Michael and Smith, Bernard (1987). Learner English; a teacher’s guide to interference and other problems. CUP.
Torikian, Merwyn (1992). “Watch your language; an account of Soundedit with reference to the validity of phonological rules.” System, 20, 4, p. 471-480.
Wijk, Axel (1966). Rules of pronunciation for the English language. Oxford: OUP.
Appendix A
ASCII phonetic transcription system used by Roger Mitton (based on John Wells’s recommendations to the Alvey Committee).
| Vowels | Keyword | Transcribed | Consonants | Keyword | Transcribed |
| i | key | ki | p | pea | pi |
| I | pit | pIt | b | bee | bi |
| e | pet | pet | t | toe | t@U |
| & | pat | p&t | d | doe | d@U |
| A | hard | hAd | k | cap | k&p |
| 0 | pot | p0t | g | get | get |
| O | raw | rO | f | fat | f&t |
| U | put | pUt | v | vet | vet |
| u | coo | ku | T | thin | TIn |
| V | hut | hVt | D | then | Den |
| 3 | cur | k3 | s | sack | s&k |
| @ | about/mother | @baUt/mVD@ | z | zoo | zu |
| eI | bay | beI | S | ship | SIp |
| aI | buy | baI | Z | measure | meZ@ |
| oI | boy | boI | h | hide | haId |
| @U | go | g@U | m | man | m&n |
| aU | cow | kaU | n | no | n@U |
| I@ | peer | pI@ | N | sing | sIN |
| e@ | pair | pe@ | l | lie | laI |
| U@ | poor | pU@ | r | red | red |
| | | | j | year | jI@ |
| | | | w | wet | wet |
| | | | tS | chin | tSIn |
| | | | dZ | judge | dZVdZ |
Appendix B
Frequency of RP phonemes in the Advanced Learner's Dictionary
Total number of dictionary entries: 70646
Figures for running words from D.B.Fry, 1947, cited in Crystal, 1995.
| Vowels | Keyword | Total | Words | % in dictionary | Freq. rank | % in spoken text | Freq. rank |
| i | bead | 6721 | 6525 | 9.24% | 9 | 1.65% | 7 |
| I | bid | 51830 | 37729 | 53.41% | 1 | 8.33% | 2 |
| e | bed | 11312 | 10940 | 15.49% | 4 | 2.97% | 3 |
| & | bad | 11603 | 11149 | 15.78% | 3 | 1.45% | 9 |
| A | bard | 4215 | 4141 | 5.86% | 14 | 0.79% | 14 |
| 0 | pot | 7960 | 7747 | 10.97% | 6 | 1.37% | 10 |
| O | port | 4730 | 4627 | 6.55% | 12 | 1.24% | 11 |
| U | put | 1977 | 1959 | 2.77% | 17 | 0.86% | 13 |
| u | boot | 4794 | 4743 | 6.71% | 11 | 1.13% | 12 |
| V | bud | 7124 | 6917 | 9.79% | 8 | 1.75% | 5 |
| 3 | bird | 3095 | 3083 | 4.36% | 15 | 0.52% | 16 |
| @ | about | 31009 | 26813 | 37.95% | 2 | 10.74% | 1 |
| eI | bait | 10234 | 10029 | 14.20% | 5 | 1.71% | 6 |
| aI | bite | 7441 | 7236 | 10.24% | 7 | 1.83% | 4 |
| oI | boy | 788 | 784 | 1.11% | 20 | 0.14% | 19 |
| aU | cow | 2179 | 2135 | 3.02% | 16 | 0.61% | 15 |
| @U | no | 6685 | 6416 | 9.08% | 10 | 1.51% | 8 |
| I@ | beer | 4174 | 4034 | 5.71% | 13 | 0.21% | 18 |
| e@ | bear | 965 | 962 | 1.36% | 19 | 0.34% | 17 |
| U@ | poor | 1053 | 1053 | 1.49% | 18 | 0.06% | 20 |
Notes:
· Column 1 contains vowel transcription in Alvey-style ASCII phonetics.
· Column 2 shows an illustrative keyword.
· Column 3 shows the total number of occurrences of the sound in the dictionary and column 4 the number of words in which it occurred. The difference between these two corresponds to the number of words in which the sound occurs more than once.
· Column 5 is column 4 as a percentage of 70646, the total of words in the dictionary.
· Column 6 shows the frequency rank of the sound.
· Columns 7 and 8 are frequency and rank for transcribed running speech.
| Consonants | Keyword | Total | Words | % in dictionary | Freq. rank | % in spoken text | Freq. rank |
| p | pop | 15553 | 14569 | 20.62% | 9 | 1.78% | 15 |
| b | bib | 10907 | 10420 | 14.75% | 11 | 1.97% | 13 |
| t | teat | 34260 | 29441 | 41.67% | 1 | 6.42% | 2 |
| d | died | 21275 | 19125 | 27.07% | 7 | 5.14% | 3 |
| k | cake | 22453 | 20308 | 28.75% | 6 | 3.09% | 9 |
| g | go | 6239 | 6079 | 8.60% | 14 | 1.05% | 18 |
| tS | chin | 2672 | 2639 | 3.74% | 21 | 0.41% | 22 |
| dZ | judge | 3869 | 3802 | 5.38% | 18 | 0.60% | 21 |
| f | fine | 8839 | 8606 | 12.18% | 13 | 1.79% | 14 |
| v | vine | 6007 | 5859 | 8.29% | 16 | 2.00% | 12 |
| T | think | 1602 | 1591 | 2.25% | 22 | 0.37% | 23 |
| D | then | 596 | 593 | 0.84% | 23 | 3.56% | 6 |
| s | see | 33922 | 28548 | 40.41% | 2 | 4.81% | 4 |
| z | zoo | 19972 | 18808 | 26.62% | 8 | 2.46% | 11 |
| S | shy | 6117 | 6039 | 8.55% | 15 | 0.96% | 19 |
| Z | treasure | 334 | 334 | 0.47% | 24 | 0.10% | 24 |
| m | my | 14823 | 13988 | 19.80% | 10 | 3.22% | 8 |
| n | near | 31934 | 27020 | 38.25% | 3 | 7.58% | 1 |
| N | sing | 9181 | 8958 | 12.68% | 12 | 1.15% | 17 |
| l | low | 27373 | 25435 | 36.00% | 4 | 3.66% | 5 |
| r | raw | 23069 | 21434 | 30.34% | 5 | 3.51% | 7 |
| w | west | 4600 | 4523 | 6.40% | 17 | 2.81% | 10 |
| j | year | 3560 | 3518 | 4.98% | 20 | 0.88% | 20 |
| h | high | 3699 | 3625 | 5.13% | 19 | 1.46% | 16 |
Notes:
· Column 1 contains transcription in Alvey-style ASCII phonetics.
· Column 2 shows an illustrative keyword.
· Column 3 shows the total number of occurrences of the sound in the dictionary and column 4 the number of words in which it occurred. The difference between these two corresponds to the number of words in which the sound occurs more than once.
· Column 5 is column 4 as a percentage of 70646, the total of words in the dictionary.
· Column 6 shows the frequency rank of the sound.
· Columns 7 and 8 are frequency and rank for transcribed running speech.
Average vowel phonemes per word: 2.55
Average consonant phonemes per word: 4.43
Average length of word in phonemes: 6.98
Balance of vowels and consonants in connected speech sample: 39.21%, 60.79%.
Notice the difference in frequencies of consonants between the dictionary list and the speech text, partly accounted for by the high frequency of the function words with /D/ such as the and that. The data for transcribed running speech are affected by the transcription used. The research was done a long time ago (1947) so it may be that a careful style of speech was recorded and a broad transcription used. Some evidence for this is the relatively high ranking for /h/, suggesting that the words he, his, her, have, has and had have always been transcribed with initial /h/. I have not seen the original research so cannot be sure.
Appendix C
Density of consonant minimal pairs (%)
| Symbol | p | b | d | k | g | f | h | m | t | l | w | tS | dZ | S | v | n | s | r | T | z | N | j | D | Z |
| p | | 1.7 | 1.2 | 2.3 | 1.32 | 1.78 | 1.2 | 1.4 | 1.5 | 1.01 | 1.2 | 1.28 | 0.8 | 1.05 | 0.78 | 0.9 | 1 | 0.7 | 0.6 | 0.46 | 0.2 | 0.2 | 0.3 | 0 |
| b | 1.7 | | 1.4 | 1.5 | 2.12 | 2.16 | 1.6 | 1.6 | 1.1 | 0.96 | 1.3 | 1.22 | 1.3 | 1.13 | 0.79 | 0.7 | 0.9 | 0.9 | 0.5 | 0.27 | 0.2 | 0.5 | 0.3 | 0 |
| d | 1.2 | 1.4 | | 1.2 | 0.99 | 1.2 | 0.8 | 1.3 | 1.4 | 1.14 | 0.6 | 0.95 | 0.9 | 0.96 | 1.14 | 1.1 | 1 | 1.1 | 0.6 | 7.01 | 5.8 | 0.2 | 0.3 | 0 |
| g | 1.3 | 2.1 | 1 | 1.3 | | 1.33 | 1.3 | 1.2 | 0.9 | 0.66 | 1 | 1.11 | 1.1 | 1.2 | 0.66 | 0.7 | 0.6 | 0.6 | 0.7 | 0.22 | 0.4 | 0.3 | 0.3 | 0 |
| k | 2.3 | 1.5 | 1.2 | | 1.29 | 1.6 | 1.1 | 1.2 | 1.5 | 1.03 | 0.8 | 0.92 | 0.6 | 0.81 | 0.67 | 1 | 1 | 0.6 | 0.5 | 0.55 | 0.3 | 0.2 | 0.2 | 0 |
| f | 1.8 | 2.2 | 1.2 | 1.6 | 1.33 | | 1.5 | 1.4 | 1.1 | 0.8 | 1.4 | 1.39 | 1.4 | 0.94 | 0.9 | 0.7 | 1 | 0.7 | 0.5 | 0.27 | 0.1 | 0.4 | 0.4 | 0 |
| h | 1.2 | 1.6 | 0.8 | 1.1 | 1.29 | 1.51 | | 1.3 | 0.7 | 0.74 | 2.3 | 1.52 | 1.4 | 1.33 | 0.7 | 0.5 | 0.7 | 0.9 | 0.7 | 0.11 | 0 | 1 | 0.4 | 0 |
| m | 1.4 | 1.6 | 1.3 | 1.2 | 1.19 | 1.38 | 1.3 | | 1 | 1.3 | 0.8 | 1.03 | 1 | 0.89 | 0.94 | 0.9 | 0.9 | 0.7 | 0.4 | 0.48 | 0.3 | 0.3 | 0.4 | 0.1 |
| t | 1.5 | 1.1 | 1.4 | 1.5 | 0.9 | 1.06 | 0.7 | 1 | | 1.05 | 0.6 | 0.74 | 0.8 | 0.7 | 0.66 | 0.9 | 2.2 | 0.6 | 0.4 | 0.79 | 0.3 | 0.1 | 0.2 | 0 |
| w | 1.2 | 1.3 | 0.6 | 0.8 | 1.03 | 1.36 | 2.3 | 0.8 | 0.6 | 0.68 | | 0.85 | 1.1 | 0.99 | 0.5 | 0.5 | 0.5 | 0.8 | 0.7 | 0.07 | 0 | 0.6 | 0.4 | 0 |
| tS | 1.3 | 1.2 | 1 | 0.9 | 1.11 | 1.39 | 1.5 | 1 | 0.7 | 0.65 | 0.9 | | 1.4 | 1.33 | 0.74 | 0.5 | 0.6 | 0.5 | 1 | 0.48 | 0.2 | 0.5 | 0.7 | 0.1 |
| S | 1.1 | 1.1 | 1 | 0.8 | 1.2 | 0.94 | 1.3 | 0.9 | 0.7 | 0.57 | 1 | 1.33 | 1.1 | | 0.41 | 0.5 | 0.6 | 0.6 | 0.5 | 0.26 | 0.6 | 0.4 | 0.3 | 0.1 |
| dZ | 0.8 | 1.3 | 0.9 | 0.6 | 1.09 | 1.38 | 1.4 | 1 | 0.8 | 0.69 | 1.1 | 1.43 | | 1.05 | 0.96 | 0.5 | 0.6 | 0.6 | 0.7 | 0.42 | 0.6 | 0.6 | 0.4 | 0 |
| l | 1 | 1 | 1.1 | 1 | 0.66 | 0.8 | 0.7 | 1.3 | 1.1 | | 0.7 | 0.65 | 0.7 | 0.57 | 0.74 | 1.3 | 0.9 | 1.3 | 0.2 | 0.57 | 0.2 | 0.2 | 0.2 | 0 |
| s | 1 | 0.9 | 1 | 1 | 0.58 | 1 | 0.7 | 0.9 | 2.2 | 0.87 | 0.5 | 0.58 | 0.6 | 0.64 | 0.59 | 0.7 | | 0.6 | 0.3 | 0.49 | 0.1 | 0.1 | 0.1 | 0 |
| n | 0.9 | 0.7 | 1.1 | 1 | 0.73 | 0.66 | 0.5 | 0.9 | 0.9 | 1.3 | 0.5 | 0.51 | 0.5 | 0.45 | 0.68 | | 0.7 | 0.5 | 0.2 | 0.69 | 0.2 | 0.1 | 0.2 | 0 |
| v | 0.8 | 0.8 | 1.1 | 0.7 | 0.66 | 0.9 | 0.7 | 0.9 | 0.7 | 0.74 | 0.5 | 0.74 | 1 | 0.41 | | 0.7 | 0.6 | 0.4 | 0.3 | 0.6 | 0.6 | 0.1 | 0.5 | 0 |
|
r |
0.7 |
0.9 |
1.1 |