Minimal pairs for English RP: lists by John Higgins

Vowels and diphthongs

i ɪ e æ ɑ ɒ ɔ ʊ u ʌ ɜ ə eɪ aɪ ɔɪ əʊ aʊ ɪə eə ʊə null cons
i * 466 331 391 312 361 476 77 370 300 298 66 549 525 98 518 154 125 139 40 172 64
ɪ 4 * 449 639 227 438 327 64 235 492 194 365 368 296 62 380 98 24 29 9 1348 978
e 4 5 * 305 148 249 238 50 134 250 153 36 281 241 59 239 118 33 30 11
æ 2 3 5 * 184 438 202 58 172 436 173 11 284 275 33 269 118 24 33 9 93
ɑ 3 2 3 4 * 184 225 39 92 177 156 11 209 146 48 201 64 62 73 33 61
ɒ 2 3 3 4 4 * 174 72 150 323 161 3 231 190 24 231 100 27 19 8 46
ɔ 2 1 1 2 4 4 * 66 186 193 237 21 322 272 93 287 131 131 168 40 88
ʊ 1 3 3 3 2 5 4 * 18 20 46 1 66 50 3 29 14 6 8 3
u 2 1 1 1 2 3 4 4 * 134 85 15 280 260 50 275 118 53 66 19
ʌ 2 4 4 5 4 4 3 4 2 * 134 4 234 180 30 205 92 19 24 8
ɜ 4 3 3 4 5 3 4 3 3 4 * 8 214 175 35 179 75 45 54 14 20
ə 3 5 4 5 3 4 2 4 2 5 5 * 90 22 4 67 3 1 8 4
eɪ 4 4 5 4 3 1 1 1 1 3 4 4 * 405 108 417 187 77 82 22
aɪ 3 2 3 4 4 3 3 2 2 4 4 4 4 * 59 341 192 81 96 19
ɔɪ 3 1 1 2 3 4 5 4 4 3 3 3 3 4 * 92 39 29 18 11 17
əʊ 2 1 2 2 3 4 4 4 3 3 4 4 3 3 3 * 134 77 96 20 147 17
aʊ 1 1 2 3 4 4 4 3 3 4 3 3 2 4 3 5 * 41 33 12 41
ɪə 4 5 4 3 2 1 1 1 1 2 3 3 4 2 1 2 1 * 100 27 53
eə 3 4 5 4 3 2 1 1 1 3 4 4 5 3 1 3 2 5 * 26 53
ʊə 1 1 1 1 3 4 4 5 5 3 3 2 2 1 4 3 3 2 2 * 4

In the table of vowels each cell links to a list of minimal pairs involving the phonemes in the relevant column and row. The numbers in the north-eastern half of the table are the actual numbers of pairs identified. The numbers in the south-western half give an indication of the importance or difficulty of the pair, calculated as follows: from a maximum of 6, deduct 1 for a difference between a monophthong and a diphthong, 1 for a difference of length between monophthongs, 1 for a difference of direction between diphthongs, and 1 for a difference in lip-rounding; then, for the distance between the starting tongue positions, deduct 1 for a distance of up to one cardinal vowel, 2 for up to two cardinal vowels and 3 for any wider distance. Thus a score of 4 or 5 would show two very similar sounds, a contrast likely to cause difficulty for some or all learners, while a score of 1 or 2 would be unlikely to cause problems.
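
The vowel scoring rule can be sketched as a small Python function. This is a minimal sketch under stated assumptions, not the program used to build the table: the `Vowel` record, its feature names and the cardinal-vowel distances passed in are hypothetical inputs that the user has to supply. It also assumes that a distance of zero counts as "up to one cardinal vowel" (a deduction of 1), a reading consistent with the highest score in the table being 5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Vowel:
    """Hypothetical feature record for one vowel phoneme."""
    diphthong: bool = False
    long: bool = False            # length; only compared between two monophthongs
    rounded: bool = False         # lip-rounding (assumed to be a single yes/no value)
    glide: Optional[str] = None   # glide direction; only compared between two diphthongs


def distance_deduction(cardinal_distance: float) -> int:
    """Deduction for the distance between starting tongue positions, in cardinal-vowel steps."""
    if cardinal_distance <= 1:
        return 1
    if cardinal_distance <= 2:
        return 2
    return 3


def vowel_score(a: Vowel, b: Vowel, cardinal_distance: float) -> int:
    score = 6
    if a.diphthong != b.diphthong:
        score -= 1  # monophthong vs diphthong
    elif not a.diphthong and a.long != b.long:
        score -= 1  # length difference between two monophthongs
    elif a.diphthong and a.glide != b.glide:
        score -= 1  # different glide direction between two diphthongs
    if a.rounded != b.rounded:
        score -= 1  # lip-rounding differs
    return score - distance_deduction(cardinal_distance)


# Assumed feature values and distances, for illustration only.
i_vowel = Vowel(long=True)
short_i = Vowel()
u_vowel = Vowel(long=True, rounded=True)
print(vowel_score(i_vowel, short_i, 1))  # 4
print(vowel_score(i_vowel, u_vowel, 3))  # 2
```

With these assumed inputs, i against ɪ (length differs, one cardinal vowel apart) scores 4 and i against u (rounding differs, wider distance) scores 2, the same values as the corresponding cells of the table.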

Consonants
p b t d k g f v θ ð s z ʃ ʒ h m n ŋ l r j w ʧ ʤ null vowel
p * 612 882 524 1009 401 570 227 129 73 614 222 296 3 378 640 563 84 684 374 87 433 296 197 916 139
b 5 * 518 446 577 415 525 144 72 46 453 87 240 2 337 476 321 38 418 387 96 284 226 213 995
t 5 4 * 867 822 396 469 298 128 78 1352 446 276 9 274 559 687 140 738 367 89 278 271 274
d 4 5 5 * 590 275 402 303 158 68 548 2941 257 9 241 484 542 1620 585 501 50 197 222 211
k 4 3 4 3 * 444 533 231 121 48 573 262 230 4 331 558 598 120 617 267 67 253 236 183
g 3 4 4 4 5 * 244 91 73 20 242 58 155 1 149 274 265 65 237 196 37 150 111 115
f 4 3 2 1 2 1 * 153 61 46 432 76 147 2 228 381 290 30 345 259 70 237 176 194
v 3 4 1 3 1 2 5 * 30 30 238 164 50 2 71 213 259 81 272 141 13 65 75 106
θ 3 2 4 3 3 2 5 4 * 11 96 73 46 2 39 70 90 10 78 46 12 45 46 41 208
ð 2 3 3 4 1 3 4 5 5 * 36 38 13 2 20 75 54 7 52 17 5 27 22 15 67
s 3 2 4 3 2 1 4 3 5 4 * 281 219 11 305 453 496 51 544 418 66 267 219 224
z 2 3 3 4 1 2 3 4 3 5 5 * 67 11 26 186 396 1143 322 59 11 24 124 106
ʃ 2 1 4 2 2 1 4 3 5 3 5 4 * 9 150 219 162 86 212 234 41 136 131 125
ʒ 1 2 3 4 2 3 3 4 4 5 4 5 5 * 4 10 7 none 6 none none 1 3 2 25 5
h 2 1 2 1 4 2 4 3 4 3 3 2 4 3 * 281 186 none 280 277 84 257 111 112 172
m 3 4 1 2 1 2 3 4 2 3 1 2 1 2 1 * 425 61 590 322 60 204 214 192
n 2 2 3 4 1 2 2 3 3 4 3 4 3 3 1 4 * 99 823 286 53 199 182 173
ŋ 1 2 2 2 3 4 1 2 2 3 1 2 1 3 2 4 5 * 60 4 none none 23 78 543
l 2 2 3 4 1 2 2 3 3 4 3 4 3 4 1 3 4 3 * 657 92 257 215 226 17
r 2 2 2 4 1 3 2 3 3 4 3 4 3 4 1 3 4 3 5 * 79 290 147 171 937
j 1 2 2 3 2 4 2 3 2 3 3 3 2 4 3 2 4 3 3 3 * 65 40 54 105
w 3 4 1 2 1 2 3 5 2 2 1 2 1 2 2 4 2 2 3 3 4 * 91 121 428
ʧ 3 2 5 3 3 2 3 2 4 3 4 3 5 4 2 1 3 2 3 2 2 1 * 106 217
ʤ 2 3 4 4 2 3 2 3 3 4 3 4 4 5 1 2 4 3 4 3 3 2 5 * 215

In the table of consonants each cell links to a list of minimal pairs involving the phonemes in the relevant column and row. The numbers in the north-eastern half of the table are the actual numbers of pairs identified. The numbers in the south-western half give an indication of the importance or difficulty of the pair, calculated as follows: from a maximum of 6, deduct 1 for a difference of voicing, 1 or 2 for a difference of manner of articulation, and 1 or 2 for the distance between the points of contact. Thus a score of 4 or 5 would show two very similar sounds, a contrast likely to cause difficulty for some or all learners, while a score of 1 or 2 would be unlikely to cause problems.
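
The consonant rule reduces to simple arithmetic once the editorial judgements are made. In this minimal Python sketch the function name is hypothetical, and the manner and place deductions are passed in rather than derived, because the text allows either 1 or 2 for each; a deduction of 0 is assumed when manner or place is the same, which is what gives p/b its score of 5.

```python
def consonant_score(voicing_differs: bool, manner_deduction: int, place_deduction: int) -> int:
    """Difficulty score for a consonant contrast: 6 minus the deductions.

    manner_deduction and place_deduction are editorial judgements of 0, 1 or 2
    (0 when the two consonants share that feature).
    """
    for name, value in (("manner_deduction", manner_deduction), ("place_deduction", place_deduction)):
        if value not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1 or 2, got {value}")
    return 6 - int(voicing_differs) - manner_deduction - place_deduction


# Assumed judgements: p/b differ only in voicing; b/k also differ in place,
# taken here as a deduction of 2 (bilabial vs velar).
print(consonant_score(True, 0, 0))  # 5  (p vs b)
print(consonant_score(True, 0, 2))  # 3  (b vs k)
```

Both results agree with the corresponding cells of the table, but other cells depend on how the 1-or-2 judgements are made.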

The phonetic transcription key is given in the Keywords section below.


What are minimal pairs?

Minimal pairs are pairs of words whose pronunciation differs at only one segment, such as sheep and ship or lice and rice. They are often used in listening tests and pronunciation exercises. Theoretically it is the existence of minimal pairs which enables linguists to build up the phoneme inventory for a language or dialect, though the process is not without difficulty.
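
The definition can be made concrete with a short Python sketch. It assumes each pronunciation has already been split into a sequence of phoneme symbols, so that a diphthong such as aɪ counts as one segment, and it handles substitution only; contrasts in the "null" column of the tables, where one word simply lacks the sound, would also need an insertion/deletion check. The function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def minimal_pair_contrast(a: Sequence[str], b: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return the contrasting phonemes if a and b differ in exactly one segment.

    Returns None for identical pronunciations (homophones), for sequences of
    different lengths, and for sequences that differ in more than one place.
    """
    if len(a) != len(b):
        return None
    differences = [(x, y) for x, y in zip(a, b) if x != y]
    if len(differences) != 1:
        return None
    return differences[0]


sheep = ["ʃ", "i", "p"]
ship = ["ʃ", "ɪ", "p"]
lice = ["l", "aɪ", "s"]
rice = ["r", "aɪ", "s"]

print(minimal_pair_contrast(sheep, ship))  # ('i', 'ɪ')
print(minimal_pair_contrast(lice, rice))   # ('l', 'r')
print(minimal_pair_contrast(sheep, rice))  # None
```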

Each cell in the tables above is a link to a list of minimal pairs derived from a dictionary. Use the tables of vowels and consonants to retrieve the relevant lists. All the vowel and consonant lists have now been edited and commented on. Earlier versions of the lists included only one pair for each pronunciation, such as heal/hole. Newly revised versions have been added which include all the pairs which arise when one or both members of the pair have a homophone, so giving a better indication of how much confusion a given pair may cause. In the case of heal/hole, for instance, the new version of the list would include all of the following:

Please note that, as you move the mouse over a link, the name of the relevant document should appear at the bottom of the browser window and this gives a further indication of which sound contrast is featured in the list.

Source of the lists: Roger Mitton and the Advanced Learner's Dictionary

Hal Gleason (1955, p. 19), writing about minimal pairs before the era of widespread computing, said "Presumably by diligent search through the total vocabulary, minimal pairs might be found for all English consonant phonemes. But there is no guarantee that all will be found, and in any case it is hardly a feasible procedure."

I have not tried to search the total vocabulary, but I have tried to search a vocabulary which includes most of the words available in non-specialist contexts to everyday users of English. In putting together these lists I have used Roger Mitton's machine-readable version of the 1974 edition of the Advanced Learner's Dictionary, incorporating Mitton's 1990 additions to the word list (see Mitton 1996). The minimal pair lists have been prepared from the dictionary by means of a program which sorts the pronunciation field, identifies identical pronunciations (homophones), substitutes dummy characters for the symbols of the minimal pair, and then flags all the additional homophone pairs created by the process. This generates (fairly) complete lists of minimal pairs, though a certain amount of rather tedious post-editing is needed.
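
The dummy-substitution idea can be approximated in a few lines of Python. This is a sketch under stated assumptions, not Mitton's data format or the program used for these lists: the lexicon is a hypothetical mapping from each spelling to a tuple of phoneme symbols (so homographs with different pronunciations would need a list of entries instead), the placeholder `#` is assumed never to occur as a real symbol, and entries are grouped with a dictionary rather than by sorting the pronunciation field. Because every spelling is its own entry, homophones on either side of the contrast each produce their own pair.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

DUMMY = "#"  # assumed never to appear as a real phoneme symbol


def find_minimal_pairs(lexicon: Dict[str, Sequence[str]], x: str, y: str) -> List[Tuple[str, str]]:
    """Return (word with x, word with y) pairs whose pronunciations differ only by x/y."""
    # Step 1: replace both contrasting symbols with the dummy, so that any two
    # words differing only by x/y end up with the same key.
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for word, phonemes in lexicon.items():
        if x in phonemes or y in phonemes:
            key = tuple(DUMMY if p in (x, y) else p for p in phonemes)
            groups[key].append(word)

    # Step 2: within each group keep pairs that differ in exactly one segment.
    # Homophones (no difference at all) are not paired with each other, but
    # each of them is paired with every word on the other side of the contrast.
    pairs: List[Tuple[str, str]] = []
    for words in groups.values():
        for i, first in enumerate(words):
            for second in words[i + 1:]:
                diffs = [(a, b) for a, b in zip(lexicon[first], lexicon[second]) if a != b]
                if len(diffs) == 1:
                    # Orient the pair so that the word containing x comes first.
                    pairs.append((first, second) if diffs[0][0] == x else (second, first))
    return sorted(pairs)


lexicon = {
    "sheep": ("ʃ", "i", "p"),
    "ship": ("ʃ", "ɪ", "p"),
    "peel": ("p", "i", "l"),
    "peal": ("p", "i", "l"),
    "pill": ("p", "ɪ", "l"),
}
print(find_minimal_pairs(lexicon, "i", "ɪ"))
# [('peal', 'pill'), ('peel', 'pill'), ('sheep', 'ship')]
```

The homophones peel and peal are not paired with each other, but both are paired with pill, which is the kind of expansion the revised lists include.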

Geography and interference

I have added to the lists some notes on which nationalities would potentially have problems with each contrast. For this I have used Swan and Smith's invaluable Learner English, as well as drawing on my own experience of teaching in the Far East, East and North Africa and Europe. I have added tracking code to the pages, and will gradually incorporate information about which countries have figured most prominently on the visitor statistics to see if significant patterns emerge.

Semantic loading and density

When this project (collecting and editing minimal pair lists for all the 510 theoretically possible contrasts) is complete, I hope to be in a position to measure the functional load of a pronunciation error, i.e. how much potential for confusion is created by a particular vowel or consonant error and therefore how important it is. Naturally this is not just a matter of counting the number of pairs, but also depends on other factors. One of these is the part of speech of the words and therefore their potential for appearing in the same contexts. Two nouns, such as beer and pier, are much more confusable than a noun and a preposition, such as frog and from. For this reason the edited lists distinguish between the number of pairs and the number of semantic contrasts realised by the pairs, and calculate a "semantic loading" figure. Thus if there were 100 pairs but they belonged to only 70 different pairs of headwords, the semantic loading would be 70%. For the longer lists the semantic loading tends to fall within the range 48% to 60%, but the very short lists involving rare sounds often score higher. Paradoxically, the lower the semantic loading, the more confusable pairs may exist for that contrast, since a lower figure shows that there are many inflected forms in the list and so signals a large number of words in the open classes: noun, verb or adjective. To some extent the figure depends arbitrarily on editorial decisions. I have, for instance, treated agent nouns as separate headwords from their verb roots, since there is often a large shift of meaning, as in wait/waiter.
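
The semantic loading figure is a ratio of distinct headword contrasts to pairs. A minimal Python sketch, assuming a hypothetical `headword` mapping from inflected forms to the headword they are counted under; words missing from the mapping count as their own headwords, so an editorial decision such as treating waiter separately from wait simply means leaving that form out of the mapping.

```python
from typing import Dict, Iterable, Tuple


def semantic_loading(pairs: Iterable[Tuple[str, str]], headword: Dict[str, str]) -> float:
    """Percentage of pairs that represent a distinct headword contrast."""
    pair_list = list(pairs)
    if not pair_list:
        raise ValueError("no pairs to measure")
    contrasts = {(headword.get(a, a), headword.get(b, b)) for a, b in pair_list}
    return 100 * len(contrasts) / len(pair_list)


pairs = [
    ("sheep", "ship"),
    ("leap", "lip"),
    ("leaps", "lips"),
    ("peel", "pill"),
    ("peels", "pills"),
]
headword = {"leaps": "leap", "lips": "lip", "peels": "peel", "pills": "pill"}
print(semantic_loading(pairs, headword))  # 60.0
```

Five pairs realising three headword contrasts give 60%, on the same basis as 100 pairs and 70 headword contrasts giving 70%.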

It is also important to take into account the density of the minimal pairs for a contrast, namely how the actual total relates to the theoretically possible number if every word containing one of the sounds were matched by a word containing the other. This would show how the distribution of minimal pairs relates to the overall phoneme frequencies in the same dictionary. A 100% match could only occur if there were exactly the same number of words with each sound in the language, and that is clearly unlikely. But if the numbers are unequal, the density depends on which sound you start with. There are 37,729 words in the dictionary containing the vowel /ɪ/ and only 784 containing the diphthong /ɔɪ/, and there are 62 minimal pairs between them. For the diphthong this is a density of 7.9%, but for the monophthong the density is only 0.16%. Averaged over diphthong and monophthong, it is 0.32% (calculated using the harmonic mean, of course). What I have decided to do is report the mean density, pointing out where, as in this case, there is a large discrepancy in frequency.
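
The arithmetic can be checked with a short Python sketch using the figures given above; the function name is hypothetical and the word counts are simply passed in.

```python
from typing import Tuple


def pair_density(n_pairs: int, n_words_a: int, n_words_b: int) -> Tuple[float, float, float]:
    """Density (%) of a contrast relative to each sound, plus their harmonic mean."""
    density_a = 100 * n_pairs / n_words_a
    density_b = 100 * n_pairs / n_words_b
    harmonic = 2 * density_a * density_b / (density_a + density_b)
    return density_a, density_b, harmonic


# /ɔɪ/ (784 words) against /ɪ/ (37,729 words), 62 minimal pairs
oi, ih, mean = pair_density(62, 784, 37729)
print(f"{oi:.1f}% {ih:.2f}% {mean:.2f}%")  # 7.9% 0.16% 0.32%
```

The harmonic mean of the two densities reduces to 2 × pairs / (words with one sound + words with the other), here 124 / 38,513, so it stays close to the smaller of the two densities.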

The O'Connor conjecture

It is also my ambition to examine the statistical data coming out of these lists and to see if they offer any evidence for or against what I call "the O'Connor conjecture", that language is self-repairing. I don't know if J. D. O'Connor was the first person to express this, but he presents a very simple and clear statement of it in his book Phonetics.
A language can tolerate quite a lot of homophones provided they do not get in each other’s way, that is provided they are not likely to occur in the same contexts. This may be a grammatical matter: if the homophones are different parts of speech they are not likely to turn up in the same place in a sentence … If they are the same part of speech, e.g. site sight; pear, pair they can be tolerated unless they occur in the same area of meaning and in association with a similar set of other words. Site may be ambiguous in It’s a nice site, though a wider context will usually make the choice plain. … If homophones do interfere with each other the language may react by getting rid of one or by modifying one.
What minimal pairs do is increase the potential number of homophones in a learner's speech, or the potential for misunderstanding between speakers of different dialects. What we would expect, therefore, is for there to be more minimal pairs between sounds which differ greatly, such as peat/part or shake/wake, and fewer between sounds which are close enough to create problems for learners, such as cot/caught or pie/buy. So far the evidence I have collected does not support a strong form of the conjecture.
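
One possible way to test the conjecture against the tables, offered only as a sketch and not as the method behind the conclusion above, is a rank correlation between the difficulty score in the south-western half of a table and the pair count in the matching north-eastern cell. On the conjecture, closer sounds (higher scores) should have fewer pairs, giving a negative coefficient. The Python below computes Spearman's rho as the Pearson correlation of average ranks, so tied scores are handled; the function names are hypothetical and the sample values are made up to show the expected sign, not taken from the tables.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation (Pearson correlation of the average ranks)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation undefined when one variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Made-up scores and pair counts, only to show the sign the conjecture predicts.
scores = [1, 2, 2, 4, 5]
counts = [600, 420, 450, 90, 30]
print(spearman_rho(scores, counts))
```

A weak or positive coefficient on the real table values would count against the strong form of the conjecture.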

Problems

There are a number of problems waiting to be resolved:

You will find two related lists derived from the same dictionary source at the following links:

A note about the overall total and which words enter the largest number of pairings can be found here.

References

Gleason, Hal (1955). An Introduction to Descriptive Linguistics. Holt, Rinehart and Winston.
Mitton, Roger (1996). English Spelling and the Computer. Longman.
O'Connor, J. D. (1973). Phonetics. Penguin Books.
Swan, Michael and Smith, Bernard (1987). Learner English: A Teacher's Guide to Interference and Other Problems. Cambridge University Press.
Torikian, Merwyn (1992). "Watch your language: an account of Soundedit with reference to the validity of phonological rules." System, 20(4), pp. 471-480.

Keywords:

Vowels Keyword Transcribed
i key ki
ɪ pit pɪt
e pet pet
æ pat pæt
ɑ hard hɑd
ɒ pot pɒt
ɔ raw rɔ
ʊ put pʊt
u coo ku
ʌ hut hʌt
ɜ cur kɜ
ə about/mother əbaʊt/mʌðə
eɪ bay beɪ
aɪ buy baɪ
ɔɪ boy bɔɪ
əʊ go gəʊ
aʊ cow kaʊ
ɪə peer pɪə
eə pair peə
ʊə poor pʊə

Consonants Keyword Transcribed
p pea pi
b bee bi
t toe təʊ
d doe dəʊ
k cap kæp
g get get
f fat fæt
v vet vet
θ thin θɪn
ð then ðen
s sack sæk
z zoo zu
ʃ ship ʃɪp
ʒ measure meʒə
h hide haɪd
m man mæn
n no nəʊ
ŋ sing sɪŋ
l lie laɪ
r red red
j year jɪə
w wet wet
ʧ chin ʧɪn
ʤ judge ʤʌʤ

I would be grateful if teachers using this page could send me emails (to minpairsatvictorcanning.com, replace at with @ when mailing) telling me which lists they are using and what specific problems their learners experience. I am gradually re-editing all the lists, and this information would be useful to incorporate.

Links to revisions of some of John Higgins's other articles
Fuel for learning
… if I can without strain find 555 paraphrases of an 8-word sentence, then several thousand million paraphrases of a 50- to 60-word sentence is reasonable. … Why has Mother Language showered us with so many ways of expressing meanings?
I speak analogue; you hear digital
… In effect what we are doing here is to have the candidate give the assessor a listening test. We are certainly making the assessor behave more like a listener dealing digitally with the question "What is the candidate trying to tell me?" rather than like a judge dealing in an analogue way with the question "How well can the candidate make that sound?"
A note on quantities
…We seldom stop to ask, "What kind of 200 word text in real life is self-contained and interesting?" …
Artificial unintelligence
…While computers possess randomness, they can to some extent do without intelligence…

Page maintained by John Higgins. Last updated 22 July, 2014.