DON’T ASK THE ADMIRAL TO SHOW YOU HIS PINNACE

Delights and wrongs of minimal pairs

by John Higgins, Centre for English Language Teaching, University of Stirling (retired)

E-mail: marlodge1@tiscali.co.uk
Web: http://www.wordscape.net

Lecture given in Birmingham, Tuesday 6 March, 2001, in a series marking the retirements of Tim Johns and Tony Dudley-Evans from the English for Overseas Students Unit of Birmingham University.

Tim and I first met in 1971 at the final board for a vacancy in the English Department of the University of Birmingham. I don’t think the EOSU (English for Overseas Students Unit) had been christened then, but these were its birth pangs. As we all know, Birmingham had the great wisdom to appoint Tim. He has always told me that he attributes his success to the fact that I was wearing for the interview a suit in a shade of deep cinnamon red that was too advanced for the sartorial tastes of the University at that time. I disagree; I am sure that Birmingham made the right appointment purely on academic grounds.

Meanwhile I joined the British Council, and was sent to lie abroad for my country in Turkey, Thailand, Egypt and Yugoslavia. I am reminded of the couplet in the house of Master Crane Robe in Robert van Gulik’s novel, The Chinese Maze Murders, quoted from an anonymous 6th century Zen Buddhist writer:

There are but two roads that lead to the gate of eternal life:
either one bores his head in the mud like a worm,
or like a dragon flies high in the sky.

Van Gulik’s book makes it clear that boring one’s head in the mud is a much better solution than flapping around like one of these darned dragons. Not that Tim has stayed at home all the time; he seems to be well known and admired in all the corners of the world.

Our next meeting was in 1980 at a BAAL conference in Colchester. Tim was demonstrating his Jumbler program and the program that later became Textbag, and I had just started to teach myself a little BASIC on the British Council’s office mainframe. Inspired by his talk, I showed him some of my efforts, and invited him to give a paper at the British Council and to contribute to a collection of articles (Johns, 1982). Later that year we were invited to a British Council/Goethe Institute seminar in Paris, at which we got to know kindred spirits Peter Roe and Burkhard Leuschner. We both had our unexpanded ZX81s at that point, and Tim had already written S-ENDING, the program which can add an appropriate –s, -es or -ies ending to any input word, real or nonsense. I remember challenging him to do something similar with the rules for A and AN, which he did between midnight and breakfast.

In 1982 I took advantage of something that the British Council calls a “training year” to plant myself on Tim and John Sinclair. I went through old photo albums looking for pictures of us at that time, but the only one I could find is this one of Tim impressing Randolph Quirk with something on a Newbrain at an English-Speaking Union conference in London; the program was probably Two Sticks. We both loved the Newbrain and wrote quite a lot of stuff for it, not all of which has resurfaced on the PC.

ESU conference 1982 Tim Johns and Randolph Quirk

We had by then been asked by Annette Capel of Collins to do a book on CALL, but we made little progress between September and Christmas. After that I invited Tim to our home in Essex where for two weeks we spent each morning in separate rooms drafting, each early afternoon reading each other’s stuff, and the rest of the day revising and extending, with family standing by to produce coffee or whisky on demand. By mid-January we had the whole book finished apart from the programming examples. Several people asked me afterwards: “How did you get a book out of Tim? We’ve been trying for years.” My answer was: “I kidnapped him.” But Tim will be remembered not so much for his book publications as for his personal presentations, his software, his pioneering of classroom concordancing, and for the astonishing work he has done with his web site. To my mind, his Kibbitzer pages are worth a dozen books of grammar and methodology.


Belonging to an older generation of applied linguists, I was trained in the days when linguistics meant phonetics first, followed by morphology and some syntax, with a bit of semantics if there was any time left. Real linguists didn’t eat quiche. They bought themselves tickets to Indonesia or Burma, and found a hill tribe and a student who had left the tribe and got enough education to act as translator and informant. For six months the linguist would sit at the feet of the tribal chief and record creation myths and other moral tales, or make recordings of his wives naming various animals and plants and saying how to cook them. Every week there would be a quick dash back to the nearest city to buy fresh batteries for the Uher tape recorder and post a parcel of 3 inch tape reels home. Then, with the data assembled, there would be the narrow transcription of the recordings and, with the help of the informant, the furious hunt for minimal pairs with which to establish the phoneme inventory.

Those were also the days when we were supposed to do the same with English. All those linguistic textbooks by people like Charles Hockett and Robert A Hall started with a chapter in which we had to imagine ourselves as invaders from Mars, trying to make sense of the jumble of noises we heard around us. This, too, would lead us towards making a phoneme inventory. We would be told about the problem of /h/ versus /ŋ/ which never occur in the same environments so could be considered as mere allophones of each other, were they not so dissimilar. We would also learn about assimilation and neutralisation which turned distinct pairs into indistinguishable homophones, “I scream” and “ice-cream”, or “the sky” and “this guy”. It was fascinating but it was also concrete and assimilable.

My specific interest in minimal pairs has three likely origins. Firstly it may be partly due to my name; I am so often asked, jocularly or sometimes sincerely, how much I have been influenced by Shaw’s Professor Henry Higgins, that I have stopped denying that he was my grandfather.

Secondly, it may be due to an accident of geography. Working in Turkey in 1971, my wife and I lived in a village ten miles outside Istanbul called Etiler, on a cliff overlooking the Bosphorus. From the living-room window of our flat we could look out across a meadow and see and hear small liners, cargo vessels and even Russian submarines sailing past. It was one of the very few places in the world where one might have said “Look, there’s a sheep!” and expected to be misunderstood. I am afraid I couldn’t find a picture with both a ship and a sheep, but this one may show how easily they could enter the same field of vision.

But thirdly, and most important, I have a real phonetician ancestor. My great-great-grandfather, David Cargill, was sent to Tonga by the Church Missionary Society in 1834 to preach and translate. He was a graduate of Aberdeen University, the only graduate in the team, and seems to have mastered the Tongan language well enough to preach in it within a year or so. In 1835 he was sent with a colleague, William Cross, to Fiji to establish a mission, the first one in the islands. The Tongan headquarters was supported by several shipfuls of tradespeople, one of the most important of whom was a printer equipped with a press, producing bible extracts and prayer books as fast as the missionaries could give him the translations. The local legend, not supported by any first-hand documentation unfortunately, says that when Cargill had finished his first Fijian document, a catechism, he told the printer that he would need Greek thetas to represent the dental fricative in Fijian. “Sorry, Mr Cargill,” the printer is supposed to have said. “Haven’t got any of them. But here’s a whole case of letter C; nobody seems to want them.” The outcome of this is that /θ/ became <C> in Fijian orthography from then on. Cargill also noted that /b/ and /mb/, /d/ and /nd/, and /g/ and /ŋg/ were never in contrast; the oral versions occurred initially and the nasalised versions occurred medially. (Fijian has CV structure, so there are no final consonants.) Since these were allophones, they did not need to be represented. Consequently a name like “CAKOBAU” is pronounced /θækombau/. This is something for which the memory of Cargill has been blessed by Fijian schoolchildren learning to read and cursed by colonial administrators ever since (Fiji Legislative Council, 1937). Cargill seems to have stumbled on something close to the phoneme principle, some 35 years before it was formalised and christened in Europe by the grandly named French poet and scholar A. Dufriche-Desgenettes (see Abercrombie, 1991: 24).

So, for whatever reasons, I have spent some of my career thinking about and playing with these things. In East Africa in 1970 I prepared a report on pronunciation teaching that, among other things, enumerated almost everything that could be done in a classroom with minimal pairs (Higgins, 1970). More recently, I have worked on test formats in which, using minimally distinct sentences randomly selected and printed out by computer, learners ‘test’ teachers and thereby test themselves (Higgins, 1989). I made up some material for the British Council which set learners playing with sounds in pairs. I am not sure how much real difference this has made to anyone’s listening and speaking ability, but I suspect that if there has been fun there has been learning. Minimal pair exercises which are funless and extended are unlikely to have much effect.

In all of this it occurred to me to wonder how much of a problem any particular minimal pair would set. This must have some relation to the frequency or otherwise of the constituent sounds in the language and the frequency of actual minimal pairs. In 1973 J.D.O’Connor wrote:

A language can tolerate quite a lot of homophones provided they do not get in each other’s way, that is provided they are not likely to occur in the same contexts. This may be a grammatical matter: if the homophones are different parts of speech they are not likely to turn up in the same place in a sentence … If they are the same part of speech, e.g. site sight; pear, pair they can be tolerated unless they occur in the same area of meaning and in association with a similar set of other words. Site may be ambiguous in It’s a nice site though a wider context will usually make the choice plain. … If homophones do interfere with each other the language may react by getting rid of one or by modifying one.

The main justification for all the ear-training exercises using minimal pairs is that a failure to distinguish two phonemes will overload a learner’s language with a number of new and false homophones. When you tell a German speaker that the crowd jeered David Beckham’s recent performance, will they know whether the crowd approved or disapproved? But how many extra homophones constitute an overload? O’Connor’s argument seems to suggest that language is self-repairing, and that repair mechanisms will start to apply in particular cases. For instance the word quean for a prostitute has dropped out of the language, and the Americans have replaced the innocent senses of cock and rubber with rooster and eraser. An example I like is the way that the homophone pair oral and aural survives in the language at large but has been modified by us applied linguists. After all, the public has very little use for the word aural, and uses oral mainly for dentistry and promises. We language teachers, needing to talk about aural discrimination and oral production in the same paragraph, tend to use a Germanic pronunciation for aural, /aurəl/, to make sure we are not misunderstood.

But such examples do not answer the quantity question. How many homophones does the native speaker have to cope with? How many extra homophones are added when a learner confuses two phonemes? How many of each minimal pair are there? These are questions that would be extremely tedious to answer by hand, but should be solvable with a little help from a computer, so I gave some of my spare time to thinking about a solution. Obviously what I needed was a wordlist and an algorithm, and around 1992 I managed to find one and devise the other.

Selecting a word list should be a principled matter. The principle I used was the simple one of availability. No-one was paying for this, so I had no administrative assistant who could have looked after correspondence if I had applied to a commercial dictionary for permission to use their source files. Therefore I went for the one dictionary which existed as a plain ASCII text, with pronunciations in a low ASCII transcription that could be processed in old-fashioned QuickBASIC, and which was available then as a free download with no restrictions other than the obvious restriction that one should not make any money by re-publishing it. This was Text 710.DAT, now re-christened Text 0154, in the Oxford Text Archive. It is the wordlist of the 1974 edition of the Oxford Advanced Learner’s Dictionary, revised by Roger Mitton, a rather cumbersome download of nearly 9 megabytes, three-quarters of which is blank space. Never mind. Once you have got it, never let it go, as it is a gem. Roger Mitton created the document as part of a research project into spelling correction, and released it “so that researchers who need a reasonably large computer-usable dictionary do not need to spend months, as I did, putting one together.” God Bless Him, and do buy his excellent and very readable book (Mitton, 1996).

The original headword list was around 35,000 words, rather limited by today’s standards. It is worth remembering, though, that a larger dictionary is not always a better dictionary, especially for such purposes as spelling correction or the measurement of confusability. Take a word such as flong. This is not even in the Shorter OED, though it is in the full dictionary. To know its meaning, “a rubberised paper used as an intermediate stage in the printing of newspapers on rotary presses”, you will probably need to have been a printer or, like me, to have been brought up next door to one who occasionally gave us discarded sheets of flong to insulate our hen-house. The flong supplied a fascinating and time-wasting diversion to a bookish boy in the shape of news stories in mirror-writing. However, any good spelling checker will highlight the word flong and rightly so; outside the printing trade it is far more likely to be a spelling mistake for flung than a meaningful word.

Mitton extended the list by adding all the inflected forms of nouns and verbs, and about 2500 proper names including common personal names, countries, nationalities, states, major world cities and British towns. He then examined words occurring in the LOB Corpus, and added several thousand which were not in the original dictionary, bringing the total up to just over 70,000. Each entry has five fixed-length fields padded out with spaces where necessary, 23 characters for spelling, 23 for pronunciation, 23 for grammatical tags, 1 for syllable count, and 58 for verb patterns, giving a total record length of 128 characters. The pronunciation field uses a form of low ASCII phonetics related to the alphabet proposed by John Wells for the Alvey Project. (Appendix A)
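For anyone who wants to experiment with a file laid out like this, a record can be split into its fields in a few lines of Python. This is only a sketch: the widths are the ones given above, but the field names and the sample record are invented for illustration, and the work described here was done in BASIC rather than Python.

```python
from typing import NamedTuple

# Field widths as described in the text: spelling, pronunciation,
# grammatical tags, syllable count, verb patterns (23+23+23+1+58 = 128).
FIELD_WIDTHS = (23, 23, 23, 1, 58)
RECORD_LENGTH = sum(FIELD_WIDTHS)


class Entry(NamedTuple):
    # Field names are illustrative, not taken from the dictionary file.
    spelling: str
    pronunciation: str
    tags: str
    syllables: str
    verb_patterns: str


def parse_record(record: str) -> Entry:
    """Split one fixed-length record into its five space-padded fields."""
    if len(record) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} characters, got {len(record)}")
    fields = []
    start = 0
    for width in FIELD_WIDTHS:
        fields.append(record[start:start + width].strip())
        start += width
    return Entry(*fields)


# An invented record; the tag and verb-pattern strings are placeholders.
sample = ("pat".ljust(23) + "p&t".ljust(23) + "TAGS".ljust(23)
          + "1" + "PATTERNS".ljust(58))
entry = parse_record(sample)
print(entry.spelling, entry.pronunciation, entry.syllables)
```

Keeping the syllable count as a string avoids guessing how blank or unusual values are stored in the real file.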

When I first downloaded the dictionary in 1992, I used a form of BASIC called PDS 7, which incorporated database support called ISAM (indexed sequential access method). The first thing I did was to make a table of overall phoneme frequencies (Appendix B). This does not, of course, show the frequencies of the sounds in consecutive speech, only in the list of available words, but that seems to me to be the right basis for looking at the potential for confusion. If you compare these with the frequencies in speech, the obvious difference is the ranking for /ð/, which is pretty rare in the dictionary but very common in high-frequency function words like the and that. This list is posted on my web site.
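A frequency table of this kind could be rebuilt roughly as follows. The sketch assumes the symbol inventory of Appendix A, reads two-character symbols such as eI or tS greedily, and silently skips anything else (stress marks, for instance); those choices are mine, not a description of the original BASIC program. Greedy reading cannot tell the affricate tS from a /t/ followed by /S/, so the counts it gives are approximate.

```python
from collections import Counter

# Two-character symbols from Appendix A, tried before single characters.
DIGRAPHS = ("eI", "aI", "oI", "@U", "aU", "I@", "e@", "U@", "tS", "dZ")
SINGLES = set("iIe&A0OUuV3@pbtdkgfvTDszSZhmnNlrjw")


def tokenize(pron):
    """Split an ASCII transcription into a list of phoneme symbols."""
    phonemes = []
    i = 0
    while i < len(pron):
        if pron[i:i + 2] in DIGRAPHS:
            phonemes.append(pron[i:i + 2])
            i += 2
        elif pron[i] in SINGLES:
            phonemes.append(pron[i])
            i += 1
        else:
            i += 1  # assumed to be a stress mark or other non-phoneme symbol
    return phonemes


def phoneme_frequencies(prons):
    """Return (total occurrences, number of words containing) per phoneme."""
    total = Counter()
    words = Counter()
    for pron in prons:
        phonemes = tokenize(pron)
        total.update(phonemes)        # every occurrence counts
        words.update(set(phonemes))   # each word counts once per phoneme
    return total, words


total, words = phoneme_frequencies(["dZVdZ", "p&t", "pIt"])
print(total["dZ"], words["dZ"])
```

The two counters correspond to the "Total" and "Words" columns of Appendix B; judge contributes two occurrences of dZ but only one word.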

            Next I used my ISAM indexes to find all the homophones and homographs in the dictionary, and anyone who has seen my internet pages may know those. The list of homographs, in particular, is by now about as complete as could be. I am aware, by the way, of recent discussion of the nomenclature homophone, homograph, homonym and heteronym, but am very unconvinced by the argument for introducing the term allonym that Fred Riggs proposes (1999). Having flagged all the homophones, the next stage was to write a program which would substitute a dummy character for every occurrence of two phonemes in the pronunciation field. I could then re-index the pronunciation field and list all the additional homophones so created, since these would be the minimal pairs. This was not as automatic as it sounds, and a lot of editorial decisions had to be taken on the hoof. These included the following.

·        Linking –r: The dictionary includes a symbol for potential linking -r, so cheetah/heater did not show up as minimal, while cheetahs/heaters did. I wondered whether to delete this symbol, and did so in the second trawl.

·        Secondary stress:  Are super and suture minimal?

·        Two occurrences of the same distinction within one word, such as purple versus burble: Should that be counted minimal? (In practice, what we have here is a single distinctive feature, breath versus voice, extended over the whole word; the vowel and the [l] are also quite different in purple and burble, though the differences would be listed as allophonic in any description while the [p] and [b] differences are counted as phonemic.) 

·        Inflected forms of pairs like pat and pad, which will not show up in the list since in theory we add a -s inflection to pat and a -z to pad? A student of mine once investigated this phenomenon with some sound analyser software and found that the fortis inflection in pats has a lot of zedness about it, while the lenis /z/ in pads is under close analysis indistinguishable from an /s/. (Torikian, 1992).
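The substitution method described above can be sketched in Python. This is an illustration under stated assumptions, not the original program: pronunciations are supplied already split into phoneme symbols, the toy lexicon is invented, and keying the lexicon by spelling means that homographs with two pronunciations are ignored.

```python
from collections import defaultdict
from itertools import combinations


def minimal_pairs(lexicon, a, b, dummy="*"):
    """Find word pairs that become identical once a and b are both masked."""
    groups = defaultdict(list)
    for spelling, phonemes in lexicon.items():
        if a in phonemes or b in phonemes:
            # Replace every occurrence of either phoneme with the dummy.
            key = tuple(dummy if p in (a, b) else p for p in phonemes)
            groups[key].append(spelling)
    pairs = []
    for spellings in groups.values():
        for w1, w2 in combinations(sorted(spellings), 2):
            if lexicon[w1] != lexicon[w2]:  # true homophones are not pairs
                pairs.append((w1, w2))
    return sorted(pairs)


# Invented toy lexicon in the Appendix A transcription.
lexicon = {
    "pat": ["p", "&", "t"],
    "bat": ["b", "&", "t"],
    "pit": ["p", "I", "t"],
    "pop": ["p", "0", "p"],
    "bob": ["b", "0", "b"],
    "cap": ["k", "&", "p"],
    "cab": ["k", "&", "b"],
}
print(minimal_pairs(lexicon, "p", "b"))
```

Because every occurrence is masked, pop and bob come out as a pair even though they differ in two places, which is exactly the purple/burble question raised in the list above.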

Making the full set of 466 lists took me from 1993 to 1998, working at a rather desultory rate, at which point I was able to publish on the web, along with the lists themselves, a tentative figure for a grand total of minimal pairs, 75,615. But I had misgivings. My algorithm supplied only one minimal pair regardless of whether either word was a member of a homophone set. Thus heal/hole would appear only once, even though it represents four potential confusions: heal/hole, heal/whole, heel/hole and heel/whole. If the totals were to represent some kind of index of confusability, surely they must take account of such pairs. I went back to the drawing board and taught myself just enough about Microsoft Access to use it to retrieve a new set of lists which did include all minimal pairs including the homophones. Comparing these lists with the originals, I found that the totals in individual lists were going up by an average of over 30%. I started blending the new lists with the old, and set about annotating each list including comments on the following features:

·        The spellings (since a great deal of what passes for pronunciation problems are the direct result of our spelling system). For this I have used Edward Carney’s magisterial work (1994), as well as going back to Axel Wijk (1966) occasionally.

·        Which learners might, because of their first language, have a problem with any given distinction. For this I have consulted Swan and Smith (1987) and O’Connor (1967) as well as my own experience.

·        Any particular cases of difficulty such as sets involving homographs or problems with linking –r.

·        Problems with rude words like pinnace/penis or crept/crapped. 

·        Finally the pairs I judged “interesting”, which I shall have more to say about later. 
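The heal/hole problem, where one contrast between two pronunciations stands for several spelling pairs, can be illustrated with a short Python sketch. It assumes pronunciations already split into phoneme symbols and is not a reconstruction of the Access queries actually used; the function name and the four-word lexicon are hypothetical.

```python
from collections import defaultdict
from itertools import combinations, product


def expanded_pairs(lexicon, a, b, dummy="*"):
    """List every spelling pair behind each a/b contrast, homophones included."""
    # Step 1: collect homophone sets (pronunciation -> all its spellings).
    by_pron = defaultdict(list)
    for spelling, phonemes in lexicon.items():
        by_pron[tuple(phonemes)].append(spelling)
    # Step 2: group distinct pronunciations that match once a and b are masked.
    by_key = defaultdict(list)
    for pron in by_pron:
        if a in pron or b in pron:
            key = tuple(dummy if p in (a, b) else p for p in pron)
            by_key[key].append(pron)
    # Step 3: pair every spelling of one pronunciation with every spelling
    # of the other.
    pairs = []
    for prons in by_key.values():
        for p1, p2 in combinations(prons, 2):
            pairs.extend(product(by_pron[p1], by_pron[p2]))
    return pairs


lexicon = {
    "heal": ["h", "i", "l"],
    "heel": ["h", "i", "l"],
    "hole": ["h", "@U", "l"],
    "whole": ["h", "@U", "l"],
}
pairs = expanded_pairs(lexicon, "i", "@U")
print(len(pairs), pairs)
```

With two spellings on each side the single contrast expands to 2 x 2 = 4 pairs, the four potential confusions listed above.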

I also calculated two statistics. Firstly I worked out the density of minimal pairs. If every word containing one of the sounds were matched by one containing the other, that would be a density of 100%. In practice, other than the special case of two inflections in contrast, I have found no densities greater than 5% with most less than 2%.
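One way to make the density figure concrete is sketched below. The text gives only the 100% case, so the formula is an assumption on my part here: the share of words containing either sound that take part in at least one minimal pair. Other readings are possible, and the word sets are invented.

```python
def pair_density(words_with_a, words_with_b, pairs):
    """Percentage of words containing either sound that occur in some pair.

    This is one possible reading of 'density'; the exact formula behind
    the published figures is not stated in the text.
    """
    candidates = set(words_with_a) | set(words_with_b)
    if not candidates:
        return 0.0
    paired = {word for pair in pairs for word in pair}
    return 100.0 * len(paired & candidates) / len(candidates)


# Invented example: four words with /p/, four with /b/, two minimal pairs.
words_with_p = {"pat", "pit", "cap", "spin"}
words_with_b = {"bat", "cab", "rib", "bun"}
pairs = [("bat", "pat"), ("cab", "cap")]
density = pair_density(words_with_p, words_with_b, pairs)
print(density)
```

If every word on one side had a partner on the other, every candidate would be paired and the function would return 100.0, matching the limiting case described above.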

            I also lemmatised the lists, so that where both members of the pair were the same part of speech or took the same inflections, they are grouped into one lemma. I have used the term semantic loading to describe the ratio of headwords or lemmas to all words in the list.  Each pair of lemmas represents only one contrast of meaning. However the sets that can be grouped in this way are probably more confusable than those that cannot, by O’Connor’s original conjecture. A low semantic loading, therefore, say around 50% or less, probably indicates rather a lot of same-part-of-speech pairs and could be taken as a danger sign.
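The semantic loading ratio is simple to compute once each word has been assigned to a lemma. In this sketch the lemma table is supplied by hand and the word list is invented; how the real lists were lemmatised is not something the code attempts to reproduce.

```python
def semantic_loading(words, lemma_of):
    """Percentage of distinct lemmas among all the words in one list."""
    if not words:
        return 0.0
    # Words missing from the table are treated as their own lemma.
    lemmas = {lemma_of.get(word, word) for word in words}
    return 100.0 * len(lemmas) / len(words)


# Invented list for the tS/dZ contrast, with a hand-made lemma table.
lemma_of = {
    "cheers": "cheer", "cheered": "cheer",
    "jeers": "jeer", "jeered": "jeer",
}
words = ["cheer", "cheers", "cheered", "jeer", "jeers", "jeered", "choke", "joke"]
loading = semantic_loading(words, lemma_of)
print(loading)
```

Eight words reduce to four lemmas here, a loading of 50%, which on the argument above would count as a danger sign because the inflected pairs all repeat one contrast of meaning.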

I was interested in seeing how far these statistics might confirm or refute O’Connor’s conjecture that language is self-repairing. Even though minimal pairs are not homophones, the ones which are close in articulation can cause confusions even with native speakers, and are certainly the stuff of spelling mistakes, such as a notice I saw recently, “RUGBY: Stirling verses Liverpool”; jokes like “double gins lead to double chins”; and punning headlines like “clothes encounters of the third kind” (over a story about a fashion show). Even some not-quite-minimal pairs give trouble. One classic example is

And for his sheep he doth a steak.

which Quirk discusses in The Use of English. Errors of this type are known as mondegreens, from a famous mis-hearing of the Scottish ballad:

            For they have slain the Earl of Moray, and laid him on the green.

as

            For they have slain the Earl of Moray, and Lady Mondegreen.

Some more examples:

·        The lady varnishes. (Title of a BBC program on DIY for women)

·        End of the Pierre show. (Times headline about Pierre Y. Gerbeau’s withdrawal from bidding for the millennium dome.)

·        Beck in business. (Headline over story about a David Beckham goal in an England football win.)

·        “A postcard from Devon” heard as “A coastguard from Devon”

·        A newspaper story in 1997 about a Japanese lady with a plane ticket for Turkey who asked for directions at Paddington Station and was put into a train for Torquay.

·        One of the most exasperating came recently from my 2-year-old grand-daughter who on her high chair clamoured “I want beeper din”, and had been offered virtually everything in the larder before we worked out that what she meant was “I want to be pushed in.”

If language is self-repairing, then we would expect the sounds which are highly distinct to tolerate more minimal pairs than those which are similar in place or manner of articulation. I put the density figures into two Excel tables to see what they might show. I shuffled the columns and rows, trying to bring the cells with the greatest density towards the top left. What shows up with the consonants (Appendix C) is that the greatest density is between and among the plosives and affricates, and that sounds that are themselves rare tend to have low densities. Much the same is true of the vowels, apart from the special case of the schwa and /ɪ/ which seem to be affected by their use in unstressed syllables (Appendix D). What this seems to suggest is that, far from being self-repairing, language is self-sabotaging. The cheer/jeer and beer/bare pairs that bother Germans and New Zealanders are not an aberration; they are the norm. However, the figures need a bit more work before one can draw any strong conclusions about language in general.

            I still have a few problems to work on. One intriguing question is whether there can be a minimal pair involving the contrast of a sound with a null. Is bank/back a minimal pair? My new algorithm makes it easy to collect such sets, so they are now included, even though I am not sure of their status. Another question is whether one can have a minimal pair between a vowel and a consonant. Vowels and consonants have a different role in syllable structure so the syllable count and stress pattern will necessarily be different. There are interesting cases like screen/serene or sprees/cerise in which replacing part of a complex cluster with a vowel creates an apparent minimal pair. All such pairs that the algorithm finds are going to be included on the site. A related question is whether syllabic and non-syllabic forms of a consonant can be treated together. Is the contrast between nail and sail the “same” as between button and butts? The computer algorithm will put them into the same list, though our instincts as English speakers would probably lead us to reject them.
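A contrast with a null, as in bank/back, can be collected by deleting one occurrence of a sound and looking the result up. The sketch below is an assumed approach, not the algorithm referred to above; it takes pronunciations already split into phoneme symbols, and the lexicon is invented.

```python
from collections import defaultdict


def null_contrast_pairs(lexicon, phoneme):
    """Find pairs where deleting one occurrence of phoneme yields another word."""
    by_pron = defaultdict(list)
    for spelling, phonemes in lexicon.items():
        by_pron[tuple(phonemes)].append(spelling)
    pairs = set()
    for pron, spellings in by_pron.items():
        for i, p in enumerate(pron):
            if p != phoneme:
                continue
            shorter = pron[:i] + pron[i + 1:]  # drop this one occurrence
            for other in by_pron.get(shorter, []):
                for spelling in spellings:
                    pairs.add((spelling, other))
    return sorted(pairs)


# Invented toy lexicon in the Appendix A transcription.
lexicon = {
    "bank": ["b", "&", "N", "k"],
    "back": ["b", "&", "k"],
    "think": ["T", "I", "N", "k"],
    "thick": ["T", "I", "k"],
    "sing": ["s", "I", "N"],
}
print(null_contrast_pairs(lexicon, "N"))
```

Each pair is listed with the longer word first; sing finds no partner because nothing in the toy lexicon is pronounced /sI/.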

            One thing you will find noted in each list is the set of “interesting pairs”, which might prompt the question “what makes them interesting?” or more caustically “Get a life!” But I have noticed that, when I am editing a list, particular pairs tend to stand out as being pairs to share, pairs you want to read aloud to anybody who is in the room with you. What is more, on a small amount of anecdotal evidence, it seems the same pairs will strike different readers in the same way. So it is worth asking, why those? They tend to be the polysyllables. Obviously the great majority of minimal pairs are monosyllables; the chance of minimalness will reduce as the number of segments increases. But the interesting pairs also tend to be polysyllables which are not the results of inflecting the same root. In fact they are often different parts of speech. As far as meaning is concerned, they seem either to be coincidentally related in meaning like cheer/jeer, or else widely different, so that a ludicrous association is formed: Caesar’s/scissors. It seems that the same kind of delight arises from these pairs as from the outrageous rhymes of Thomas Hood, W.S.Gilbert, P.G.Wodehouse, Ogden Nash and Ira Gershwin. Something inside us makes us laugh when words sound the same but mean wildly different things. Judge for yourself from the samples in Appendix E.

            I am continuing to work on the revised and extended set of lists, and hope it will take me less than five years to finish this time. Minimal pairs have a small role in raising learners’ awareness of pronunciation problems, though I doubt whether masses of minimal pair exercises are either necessary or sufficient to solve those problems. They clearly have some relevance to linguists in sorting out the phonology of languages. But for me there is only one thing that makes them worth the time I have put into them; they are fun.

 

 

Bibliography

Abercrombie, David (1991). “Phoneme; the concept and the word”. In Fifty years in phonetics. Edinburgh University Press.

Carney, Edward (1994). A survey of English spelling. Routledge.

Crystal, David (1995). The Cambridge Encyclopedia of the English Language. Cambridge University Press.

Fiji Legislative Council (1937). Proposal for changes to Fijian orthography. Council Paper No. 37.

Fry, D.B. (1947). “The frequency of occurrence of speech sounds in Southern English.” Archives Néerlandaises de Phonétique Expérimentale, 20.

Hall, Robert A. Jr. (1964). Introductory linguistics. Chilton Books.

Higgins, J.J. (1970). Pronunciation teaching; practical suggestions for English teachers. Issued by the English Language Panel of the Institute of Education, Dar es Salaam, Tanzania.

Higgins, John (1989). “I speak analogue, you hear digital”. Paper given at the Canadian CALL Conference, Guelph, 1989. Now published on the web at http://myweb.tiscali.co.uk/wordscape/wordlist/analogue.html.

Hockett, Charles (1958). A course in modern linguistics. New York: Macmillan.

Hornby, A.S. (1974). Oxford Advanced Learner’s Dictionary of Current English. Oxford: Oxford University Press.

Johns, T.F. (1982). “Exploratory CAL: an alternative use of the computer in teaching foreign languages.” In Higgins, J.J. (ed), Computers and ELT: British Council Inputs. London: The British Council.

Mitton, Roger (1992). “A description of a computer-usable dictionary file based on the OALDCE”. Documentation lodged with the Oxford Text Archive. http://ota.ahds.ac.uk/.

Mitton, Roger (1996). English spelling and the computer. Longman.

O’Connor, J.D. (1967). Better English pronunciation. CUP.

O’Connor, J.D. (1973). Phonetics. Harmondsworth: Penguin.

Quirk, Randolph (1962). The use of English. Longman.

Riggs, Fred W. (1999). “Homonyms, heteronyms, and allonyms; a semantic/onomantic puzzle.” Web document at http://www2.hawaii.edu/~fredr/homonomy.htm.

Swan, Michael and Smith, Bernard (1987). Learner English; a teacher’s guide to interference and other problems. CUP.

Torikian, Merwyn (1992). “Watch your language; an account of Soundedit with reference to the validity of phonological rules.” System, 20, 4, pp. 471-480.

Wijk, Axel (1966). Rules of pronunciation for the English language. Oxford: OUP.



Appendix A

 

ASCII phonetic transcription system used by Roger Mitton (based on John Wells’s recommendations to the Alvey Committee).

 

 

Vowels  Keyword       Transcribed  Consonants  Keyword  Transcribed
i       key           ki           p           pea      pi
I       pit           pIt          b           bee      bi
e       pet           pet          t           toe      t@U
&       pat           p&t          d           doe      d@U
A       hard          hAd          k           cap      k&p
0       pot           p0t          g           get      get
O       raw           rO           f           fat      f&t
U       put           pUt          v           vet      vet
u       coo           ku           T           thin     TIn
V       hut           hVt          D           then     Den
3       cur           k3           s           sack     s&k
@       about/mother  @baUt/mVD@   z           zoo      zu
eI      bay           beI          S           ship     SIp
aI      buy           baI          Z           measure  meZ@
oI      boy           boI          h           hide     haId
@U      go            g@U          m           man      m&n
aU      cow           kaU          n           no       n@U
I@      peer          pI@          N           sing     sIN
e@      pair          pe@          l           lie      laI
U@      poor          pU@          r           red      red
                                   j           year     jI@
                                   w           wet      wet
                                   tS          chin     tSIn
                                   dZ          judge    dZVdZ

Appendix B

Frequency of RP phonemes in the Advanced Learner's Dictionary

 

Total number of dictionary entries: 70646

Figures for running words from D.B.Fry, 1947, cited in Crystal, 1995.

 

Vowels  Keyword  Total  Words  Freq. in dictionary  Rank  Freq. in spoken text  Rank
i       bead     6721   6525   9.24%                9     1.65%                 7
I       bid      51830  37729  53.41%               1     8.33%                 2
e       bed      11312  10940  15.49%               4     2.97%                 3
&       bad      11603  11149  15.78%               3     1.45%                 9
A       bard     4215   4141   5.86%                14    0.79%                 14
0       pot      7960   7747   10.97%               6     1.37%                 10
O       port     4730   4627   6.55%                12    1.24%                 11
U       put      1977   1959   2.77%                17    0.86%                 13
u       boot     4794   4743   6.71%                11    1.13%                 12
V       bud      7124   6917   9.79%                8     1.75%                 5
3       bird     3095   3083   4.36%                15    0.52%                 16
@       about    31009  26813  37.95%               2     10.74%                1
eI      bait     10234  10029  14.20%               5     1.71%                 6
aI      bite     7441   7236   10.24%               7     1.83%                 4
oI      boy      788    784    1.11%                20    0.14%                 19
aU      cow      2179   2135   3.02%                16    0.61%                 15
@U      no       6685   6416   9.08%                10    1.51%                 8
I@      beer     4174   4034   5.71%                13    0.21%                 18
e@      bear     965    962    1.36%                19    0.34%                 17
U@      poor     1053   1053   1.49%                18    0.06%                 20

 

Notes:

·         Column 1 contains vowel transcription in Alvey-style ASCII phonetics.

·         Column 2 shows an illustrative keyword.

·         Column 3 shows the total number of occurrences of the sound in the dictionary and column 4 the number of words in which it occurred. The difference between these two corresponds to the number of words in which the sound occurs more than once.

·         Column 5 is column 4 as a percentage of 70646, the total of words in the dictionary.

·         Column 6 shows the frequency rank of the sound.

·         Columns 7 and 8 are frequency and rank for transcribed running speech.


 


Consonants  Keyword   Total  Words  Freq. in dictionary  Rank  Freq. in spoken text  Rank
p           pop       15553  14569  20.62%               9     1.78%                 15
b           bib       10907  10420  14.75%               11    1.97%                 13
t           teat      34260  29441  41.67%               1     6.42%                 2
d           died      21275  19125  27.07%               7     5.14%                 3
k           cake      22453  20308  28.75%               6     3.09%                 9
g           go        6239   6079   8.60%                14    1.05%                 18
tS          chin      2672   2639   3.74%                21    0.41%                 22
dZ          judge     3869   3802   5.38%                18    0.60%                 21
f           fine      8839   8606   12.18%               13    1.79%                 14
v           vine      6007   5859   8.29%                16    2.00%                 12
T           think     1602   1591   2.25%                22    0.37%                 23
D           then      596    593    0.84%                23    3.56%                 6
s           see       33922  28548  40.41%               2     4.81%                 4
z           zoo       19972  18808  26.62%               8     2.46%                 11
S           shy       6117   6039   8.55%                15    0.96%                 19
Z           treasure  334    334    0.47%                24    0.10%                 24
m           my        14823  13988  19.80%               10    3.22%                 8
n           near      31934  27020  38.25%               3     7.58%                 1
N           sing      9181   8958   12.68%               12    1.15%                 17
l           low       27373  25435  36.00%               4     3.66%                 5
r           raw       23069  21434  30.34%               5     3.51%                 7
w           west      4600   4523   6.40%                17    2.81%                 10
j           year      3560   3518   4.98%                20    0.88%                 20
h           high      3699   3625   5.13%                19    1.46%                 16


Notes:

· Column 1 contains transcription in Alvey-style ASCII phonetics.

· Column 2 shows an illustrative keyword.

· Column 3 shows the total number of occurrences of the sound in the dictionary and column 4 the number of words in which it occurs. The difference between the two is the number of repeat occurrences, that is, occurrences beyond the first within a word.

· Column 5 is column 4 as a percentage of 70646, the total number of words in the dictionary.

· Column 6 shows the frequency rank of the sound in the dictionary.

· Columns 7 and 8 are the frequency and rank of the sound in transcribed running speech.

 

Average vowel phonemes per word: 2.55

Average consonant phonemes per word: 4.43

Average length of word in phonemes: 6.98

Balance of vowels and consonants in connected speech sample: 39.21% vowels, 60.79% consonants.
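These summary figures appear to follow from the Appendix B tables. The Python sketch below rests on two assumptions of mine rather than on anything stated in the lecture: that each average is the sum of the total-occurrence column divided by the 70646 entries, and that the vowel share of connected speech is the sum of the vowels' spoken-text percentages. All names in it are invented for illustration.

```python
# Sketch only: figures are copied from the Appendix B tables.
DICTIONARY_ENTRIES = 70646

# Column 3 (total occurrences), in table order: 20 vowels, then 24 consonants.
VOWEL_TOTALS = [6721, 51830, 11312, 11603, 4215, 7960, 4730, 1977, 4794, 7124,
                3095, 31009, 10234, 7441, 788, 2179, 6685, 4174, 965, 1053]
CONSONANT_TOTALS = [15553, 10907, 34260, 21275, 22453, 6239, 2672, 3869,
                    8839, 6007, 1602, 596, 33922, 19972, 6117, 334,
                    14823, 31934, 9181, 27373, 23069, 4600, 3560, 3699]

# Column 7 (percentage of running speech) for the 20 vowels, in table order.
VOWEL_SPEECH_PCT = [1.65, 8.33, 2.97, 1.45, 0.79, 1.37, 1.24, 0.86, 1.13, 1.75,
                    0.52, 10.74, 1.71, 1.83, 0.14, 0.61, 1.51, 0.21, 0.34, 0.06]


def average_per_word(totals):
    """Assumed derivation: total occurrences divided by dictionary entries."""
    return sum(totals) / DICTIONARY_ENTRIES


vowel_avg = average_per_word(VOWEL_TOTALS)
consonant_avg = average_per_word(CONSONANT_TOTALS)

# Assumed derivation: vowel share is the column sum, consonants the remainder.
vowel_share = sum(VOWEL_SPEECH_PCT)
consonant_share = 100 - vowel_share

print(f"vowel phonemes per word:     {vowel_avg:.2f}")
print(f"consonant phonemes per word: {consonant_avg:.2f}")
print(f"connected speech balance:    {vowel_share:.2f}% / {consonant_share:.2f}%")
```

On these assumptions the sketch reproduces 2.55, 4.43, 39.21% and 60.79%. The overall figure of 6.98 matches the sum of the two rounded averages; dividing all occurrences by 70646 in one step gives 6.97.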

Notice the difference in consonant frequencies between the dictionary list and the speech text. It is partly accounted for by the high frequency of function words with /D/, such as 'the' and 'that': /D/ ranks 23rd in the dictionary but 6th in running speech. The data for transcribed running speech are also affected by the transcription used. The research was done a long time ago (1947), so it may be that a careful style of speech was recorded and a broad transcription used. Some evidence for this is the relatively high ranking of /h/, which suggests that the words 'he', 'his', 'her', 'have', 'has' and 'had' were always transcribed with initial /h/. I have not seen the original research, so I cannot be sure.
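To see which consonants shift most between the two rankings, the rank columns of the consonant table can be compared directly. The sketch below is an illustration only: the ranks are copied from the table, and `rank_gain` and `by_gain` are names made up for it.

```python
# Sketch only: (dictionary rank, running-speech rank) from the consonant table.
RANKS = {
    "p": (9, 15), "b": (11, 13), "t": (1, 2), "d": (7, 3),
    "k": (6, 9), "g": (14, 18), "tS": (21, 22), "dZ": (18, 21),
    "f": (13, 14), "v": (16, 12), "T": (22, 23), "D": (23, 6),
    "s": (2, 4), "z": (8, 11), "S": (15, 19), "Z": (24, 24),
    "m": (10, 8), "n": (3, 1), "N": (12, 17), "l": (4, 5),
    "r": (5, 7), "w": (17, 10), "j": (20, 20), "h": (19, 16),
}


def rank_gain(symbol: str) -> int:
    """Places gained in running speech relative to the dictionary ranking."""
    dictionary_rank, speech_rank = RANKS[symbol]
    return dictionary_rank - speech_rank


# Consonants ordered from the largest gain in running speech to the largest loss.
by_gain = sorted(RANKS, key=rank_gain, reverse=True)

for symbol in by_gain[:5]:
    print(f"/{symbol}/ gains {rank_gain(symbol)} places in running speech")
```

/D/ heads the list with a gain of 17 places, followed by /w/ with 7; /h/ gains 3.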


Appendix C

Density of consonant minimal pairs

 

 

Symbol       p     b     d     k     g     f     h     m     t     l     w    tS    dZ     S     v     n     s     r     T     z     N     j     D     Z
p            -   1.7   1.2   2.3  1.32  1.78   1.2   1.4   1.5  1.01   1.2  1.28   0.8  1.05  0.78   0.9     1   0.7   0.6  0.46   0.2   0.2   0.3     0
b          1.7     -   1.4   1.5  2.12  2.16   1.6   1.6   1.1  0.96   1.3  1.22   1.3  1.13  0.79   0.7   0.9   0.9   0.5  0.27   0.2   0.5   0.3     0
d          1.2   1.4     -   1.2  0.99   1.2   0.8   1.3   1.4  1.14   0.6  0.95   0.9  0.96  1.14   1.1     1   1.1   0.6  7.01   5.8   0.2   0.3     0
g          1.3   2.1     1   1.3     -  1.33   1.3   1.2   0.9  0.66     1  1.11   1.1   1.2  0.66   0.7   0.6   0.6   0.7  0.22   0.4   0.3   0.3     0
k          2.3   1.5   1.2     -  1.29   1.6   1.1   1.2   1.5  1.03   0.8  0.92   0.6  0.81  0.67     1     1   0.6   0.5  0.55   0.3   0.2   0.2     0
f          1.8   2.2   1.2   1.6  1.33     -   1.5   1.4   1.1   0.8   1.4  1.39   1.4  0.94   0.9   0.7     1   0.7   0.5  0.27   0.1   0.4   0.4     0
h          1.2   1.6   0.8   1.1  1.29  1.51     -   1.3   0.7  0.74   2.3  1.52   1.4  1.33   0.7   0.5   0.7   0.9   0.7  0.11     0     1   0.4     0
m          1.4   1.6   1.3   1.2  1.19  1.38   1.3     -     1   1.3   0.8  1.03     1  0.89  0.94   0.9   0.9   0.7   0.4  0.48   0.3   0.3   0.4   0.1
t          1.5   1.1   1.4   1.5   0.9  1.06   0.7     1     -  1.05   0.6  0.74   0.8   0.7  0.66   0.9   2.2   0.6   0.4  0.79   0.3   0.1   0.2     0
w          1.2   1.3   0.6   0.8  1.03  1.36   2.3   0.8   0.6  0.68     -  0.85   1.1  0.99   0.5   0.5   0.5   0.8   0.7  0.07     0   0.6   0.4     0
tS         1.3   1.2     1   0.9  1.11  1.39   1.5     1   0.7  0.65   0.9     -   1.4  1.33  0.74   0.5   0.6   0.5     1  0.48   0.2   0.5   0.7   0.1
S          1.1   1.1     1   0.8   1.2  0.94   1.3   0.9   0.7  0.57     1  1.33   1.1     -  0.41   0.5   0.6   0.6   0.5  0.26   0.6   0.4   0.3   0.1
dZ         0.8   1.3   0.9   0.6  1.09  1.38   1.4     1   0.8  0.69   1.1  1.43     -  1.05  0.96   0.5   0.6   0.6   0.7  0.42   0.6   0.6   0.4     0
l            1     1   1.1     1  0.66   0.8   0.7   1.3   1.1     -   0.7  0.65   0.7  0.57  0.74   1.3   0.9   1.3   0.2  0.57   0.2   0.2   0.2     0
s            1   0.9     1     1  0.58     1   0.7   0.9   2.2  0.87   0.5  0.58   0.6  0.64  0.59   0.7     -   0.6   0.3  0.49   0.1   0.1   0.1     0
n          0.9   0.7   1.1     1  0.73  0.66   0.5   0.9   0.9   1.3   0.5  0.51   0.5  0.45  0.68     -   0.7   0.5   0.2  0.69   0.2   0.1   0.2     0
v          0.8   0.8   1.1   0.7  0.66   0.9   0.7   0.9   0.7  0.74   0.5  0.74     1  0.41     -   0.7   0.6   0.4   0.3   0.6   0.6   0.1   0.5     0

r

0.7

0.9

1.1