On the tip of my tongue

Say these words out loud.

‘The tip of the tongue, the lips and the teeth.’

Whilst you were speaking, what were your tongue and lips doing? How were you breathing? Can you breathe in and still speak?

Now try reciting this (rather peculiar) poem. It contains every sound (phoneme) used in spoken English.

The pleasure of Shawn’s company
Is what I most enjoy.
He put a tack on Ms. Yancey’s chair
When she called him a horrible boy.
At the end of the month he was flinging two kittens
Across the width of the room.
I count on his schemes to show me a way now
Of getting away from my gloom.

This ‘panphonic’ poem was written by linguist Neal Whitman, and used in the film Mission: Impossible 3 (2006).

Did you notice that whilst you were talking, your tongue and lips never moved sideways?

Reaching out across a divide: Israeli Prime Minister Yitzhak Rabin, U.S. president Bill Clinton, and PLO chairman Yasser Arafat at the White House in 1993 (Image: Wikimedia Commons)

Talking connects us across all cultural boundaries, and sets us apart from other animals. Our diverse vocal repertoire, contributing to nearly 7,000 languages worldwide, has no equivalent anywhere in the animal kingdom. We arrange complex sounds into phrases with rhythm, stress and intonation (prosody), and deliver them with visual emphasis using facial expressions and gestures.

Traditionally, linguists have considered that our speech is too complex to have arisen by natural selection, suggesting instead that it results from a sudden event such as a ‘freak’ genetic mutation.

This view is at odds with what we understand about the rest of our biology. Evolution works by selecting from existing variation in forms and behaviours. Adaptations are not a ‘best-design’ solution to a survival problem, but a balance of innovations that arrive with inherited constraints.

During our first seven years of life we learn to articulate the sounds unique to the language(s) we hear. We structure these sounds into syllables, and use them to build words, phrases and sentences that convey meaning. We also use these sounds to coin new words and invent new meanings for them. For our ancestors to begin to do this, however, the physical ability to make this diversity of sounds must already have been in place.

How do we physically produce speech sounds?

X-rays of a human jaw, taken by Dr H. Trevelyan George at St. Bartholomew’s Hospital, London, in January, 1917. The black dashed line in these x-rays is a thin metal chain on the tongue’s upper surface. The pictures reveal how it changes position when producing the ‘cardinal’ vowels [i, u, a, ɑ]. These sounds are voiced by controlling the flow of breath and causing vibrations in the vocal cords. English consonants are voiced or voiceless, and are made using five main movements. We touch the lips together or against the teeth, place the tongue between or onto the back of the teeth, or against the hard palate, and lift the tongue against the soft palate. We also use an open-mouthed turbulent flow of air to make aspirated sounds as in the ‘h’ of ‘hair’ (Image: Wikimedia Commons)

Speech is musical; whilst we produce precisely articulated sounds, our voices also use pitch, tone and timbre to emphasise words and give rhythm and shape to our phrases. This requires precise coordination of multiple muscles in the chest, larynx, throat, mouth and face. These movements literally do not ‘come to mind’. Instead we focus on what we are saying, and how the listener responds.

Sound generation (phonation) in almost all mammals involves coordinating their breathing with the control of tension in the vocal folds of the larynx. We selectively process the harmonics from these basic sounds, and articulate precise sound sequences using rapid and rhythmic movements of the tongue, lips and associated structures.

At the simplest level, speaking involves alternating open-closed movements of the jaw at the same time as generating sound in the larynx. This produces an alternating stream of open (resonant) and closed (muted) sounds. We build words and phrases from these alternating ‘segments’, using the lips and tongue to produce precisely articulated consonants and control pitch, timbre, tone and stress.

We vocalise as other mammals do, using our feeding and breathing apparatus. The muscle movements that operate both chewing and speaking are controlled by rhythmic nerve impulses from ‘Central Pattern Generators’. These are autonomous nerve ‘modules’ in the lower brain and spinal cord. They co-ordinate all our repetitive movements, from walking to vomiting.

Which aspects of our speaking abilities are found in other animals?

Jack, a military working dog, barking during his training; Rochester, New York, 2009. Dogs are unusual amongst vocal mammals; their calls include barking sequences (bow-wow-wow) that alternate between mostly identical open-close jaw movements. In humans, this alternation is a universal speaking mode. The dog barking in this video is also making other coupled rhythmic communication signals – note the tail wagging and ear movements (Image: Wikimedia Commons)

There is no animal equivalent to the combined movements that make up our vocal cycle, although other animals make all of these movements. Most mammals call with their mouths open, using coupled Central Pattern Generators that link the out-breath with sound production (phonation) in the larynx. A few animals such as dogs make occasional calls using a partial open-close oscillating jaw movement, although they typically repeat the same sound (bow-wow-wow). We coordinate the circuits for breath control and phonation with another set of pattern generators that operate the rhythmic movements of our jaw, lips and tongue.

These movements have other functions, as in the suckling of newborns. This ability defines us as mammals. But these movements may also be linked to talking. James Lund and co-workers suggest that the human Central Pattern Generators controlling chewing, licking and sucking also participate in speech.

Peter MacNeilage goes on to suggest that the rhythmic repetitive movements used for eating have been coordinated in mammals since the clade arose some 200 million years ago, and that they lie at the root of our articulation skills. As we speak, our tongue moves up and down, and front to back in the mouth. Chewing also includes sideways motions of the tongue and jaw. These are not included in our vocal movements; indeed, they would leave us more prone to biting our tongue. MacNeilage proposes that coupling our pre-existing capability for making vocal calls with this subset of the movements used during eating gave our more immediate ancestors the capacity to articulate simple ‘proto-syllables’.

Baby chimpanzee (Pan troglodytes) at Beijing zoo. Chimpanzees can be taught to recognise several hundred human words, but none have reproduced these verbally, even if raised in a human environment. Chimpanzees make occasional calls in a series (something like syllables in speech), although repetition of one sound is more usual. In the variable-sound calls, the arrangement of components does not seem to be significant. In contrast, we can say e.g. ‘cat’, ‘tack’ and ‘act’, using varied sequences of the same sounds to convey different information (Image: Wikimedia Commons)
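The ‘cat’, ‘tack’, ‘act’ example can be sketched in a few lines of Python. This is only an illustration: the three-entry lexicon, the broad phoneme symbols and the function name `words_from_phonemes` are assumptions made for the sketch, not part of any linguistic toolkit. It shows that the same three sounds, placed in different orders, pick out different words, while most orderings pick out no word at all.

```python
from itertools import permutations

# Assumed toy lexicon: ordered phoneme sequences (broad transcription) -> word.
LEXICON = {
    ("k", "æ", "t"): "cat",
    ("t", "æ", "k"): "tack",
    ("æ", "k", "t"): "act",
}


def words_from_phonemes(phonemes):
    """Return {ordering: word} for every ordering of `phonemes` found in LEXICON."""
    found = {}
    for order in permutations(phonemes):
        word = LEXICON.get(order)
        if word is not None:
            found[order] = word
    return found


if __name__ == "__main__":
    # Print each ordering of the three sounds that spells a word in the toy lexicon.
    for order, word in words_from_phonemes(["k", "æ", "t"]).items():
        print("/" + "".join(order) + "/", "->", word)
```

Three sounds can be ordered in six ways; in this toy lexicon only three of those orderings are words, and each carries a different meaning. In a call system where the order of components is not significant, all six orderings would count as the same signal.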

Does this then apply to our nearest relatives, the chimps? Philip Lieberman has shown that their vocal apparatus is anatomically suitable to produce a range of human syllables, and yet they do not speak. This suggests that they cannot coordinate their Central Pattern Generator signals for vocal sound production and chewing. The reason for this appears to be linked to differences in cognition. Recent work comparing the neurology underpinning chimp calls and human word-based speech has found that the neural circuits driving these respective vocalisations originate from different parts of the brain.

However, chimps and other primates do make rhythmical face and jaw movements, producing lipsmacks, tongue smacks, teeth-chatters and other facial gestures. Lipsmacks involve moving the jaw without the teeth coming together, as in human speech, and are often made by juveniles as they approach their mother to suckle. Primate grooming is a one-to-one interaction using touch, eye contact and other positive exchanges, which often involve ‘taking turns’.

We are able to precisely control the pitch and resonance, as well as the rhythm of speech and the articulation of our sounds, thanks to our flexible and dextrous tongue. The tongues of new-born babies lie flat in their mouths. The permanent descent of our larynx occurs early during our development; this raises the tongue in the mouth, allowing it to move freely.

Male Koala (Phascolarctos cinereus) at Billabong Koala and Aussie Wildlife Park, Port Macquarie, New South Wales, Australia. Almost all mammals lower their larynx to vocalise. Koalas are unusual; along with lions, deer and humans they have a permanently descended larynx. In these non-human animals this results in a dramatically resonant deep male mating call. Unlike us, these other animals have pronounced differences in body form between males and females (Image: Wikimedia Commons). Watch a video of male koala vocalisation here.

The ability to learn and reproduce complex sounds in the form of song has arisen independently in whales, humans, and at least three times in birds. Only male songbirds make complex learned song calls; they sing to attract mates. In contrast, our language ability is gender-balanced. No other primates have elaborated male mating calls. However, the second descent of the larynx in boys during puberty suggests that sexual selection may have refined our control of the resonant qualities of our vocal tract.

How could natural selection have favoured our ancestors’ ability to produce these sounds?

Our vocal flexibility comes at a price: an increased individual risk of choking. Our ancestors’ ability to form ‘proto-words’ and ‘proto-song’ must have given their tribal group a selective advantage that outweighed this risk.

Strong bonds bring advantages to social groups including relative safety in numbers from predators, collective understanding of their environment, higher levels of parental care from the extended family (resulting in better juvenile survival), and coordination of hunting and foraging. Primates use grooming, which combines touch with emotionally-coded facial and vocal sound gestures, to make and maintain social bonds.

Chimpanzees (Pan troglodytes) grooming at Gombe Stream National Park, Tanzania. Grooming in primates cleans the outer body, decreases stress, allows acceptance into the group and reveals the social hierarchy. Chimps and other higher primates may utter pleasure-indicating sounds during grooming, particularly if being attended to by a higher status member of the tribe. Chimpanzee tribes range from 15 to 120 individuals. In their “fission-fusion society” all members know each other but feed, travel, and sleep in smaller groups of six or fewer. The membership of these small groups changes frequently (Image: Wikimedia Commons)

Robin Dunbar points out that the extent of primate grooming time can be predicted from their combined neocortex (‘thinking’ brain) size and social group size. It is thought our large-brained hominid ancestors lived in tribes of up to around 150 individuals, which would mean that they would need to spend up to 40% of their time manually grooming each other to maintain social bonds. Speech may have provided a time-saving alternative; a means of ‘vocally grooming’ others. A speaker may have been able to connect to and bond simultaneously with multiple individuals.

Human babies making new or unusual sounds quickly receive their parents’ attention. Ulrike Griebel and Kimbrough Oller suggest that our hominin ancestors’ babies may have produced sounds that provoked more parental attention and so more effective bonding. Their better survival would select for babies with vocal variation and flexibility.

A large pod of over 80 dusky dolphins (Lagenorhynchus obscurus) swimming together in South Bay, Kaikoura, New Zealand. Although there is some physical touching amongst individuals in the pod, dolphins and killer whales use vocal grooming to coordinate with each other when hunting, during migrations, and ‘play’ (Image: Wikimedia Commons)

Involuntary pleasure sounds encourage continued social interactions between primates of all ages. Perhaps our hominid ancestors learned to make diverse and pleasurable sounds as babies, and as a result were better equipped as adults to vocally groom their extended families.

Conclusions

Fine motor control of the tongue, lips and jaw allows us to produce a huge repertoire of diverse sounds. This control likely comes from combining the movements used for eating with the production of vocal sounds.
Our ancestors may have evolved this capability at a time when their social group size increased, with a more time-efficient form of grooming required if the cohesion of the ‘tribe’ was to be maintained.
Selection for diverse and flexible speech sounds may have begun with babies using suckling and other sounds to gain more parental attention. These individuals would be more effective as adults at ‘vocally grooming’ their wider social group.
Our ability to produce speech sounds is balanced between genders, suggesting that group selection rather than sexual selection has driven the evolution of this ability.

References

Arbib, M.A. (2005)  From monkey-like action recognition to human language: an evolutionary framework for neurolinguistics.  Behavioral and Brain Sciences 28, 105-124.

Bouchet, H. et al. (2013)  Social complexity parallels vocal complexity: a comparison of three non-human primate species.  Frontiers in Psychology 4, article 390.

Charlton, B. et al. (2011)  Perception of male caller identity in koalas (Phascolarctos cinereus): acoustic analysis and playback experiments.  PLoS ONE 6, e20329.

Fitch, W.T. (2010)  The Evolution of Language. Cambridge University Press.

Green, S. and Marler, P. (1979)  The analysis of animal communication.  In Social Behavior and Communication (P. Marler, ed.) pp. 73-158.  Springer.

Hauser, M. et al. (2002)  The faculty of language: what is it, who has it, and how did it evolve?  Science 298, 1569-1579.

Kimbrough Oller, D. and Griebel, U. (eds) (2008)  Evolution of Communicative Flexibility: Complexity, Creativity, and Adaptability in Human and Animal Communication.  Vienna Series in Theoretical Biology, MIT Press.

Lehmann, J., Korstjens, A.H. and Dunbar, R.I.M. (2007)  Group size, grooming and social cohesion in primates.  Animal Behaviour 74, 1617-1629.

Lieberman, P. (2006)  Toward an Evolutionary Biology of Language. Harvard University Press.

Lund, J.P. and Kolta, A. (2006)  Brainstem circuits that control mastication: do they have anything to say during speech?  Journal of Communication Disorders 39, 381-390.

MacNeilage, P. (2008)  The Origin of Speech. Oxford University Press.

MacNeilage, P.F. and Davis, B.L. (2000)  On the origin of internal structure of word forms.  Science 288, 527-531.

Parr, L.A. et al. (2007)  Classifying chimpanzee facial expressions using muscle action.  Emotion 7, 172-181.

Pearson, K.G. (2000) Neural adaptation in the generation of rhythmic behaviour.  Annual Review of Physiology 62, 723-753.

Redican, W.K. and Rosenblum, L.A. (1975)  Facial expressions in nonhuman primates. Stanford Research Institute.

Titze, I. R. (1989)  Physiologic and acoustic differences between male and female voices.  The Journal of the Acoustical Society of America 85, 1699-1707.

Traxler, M.J. et al. (2012)  What's special about human language?  The contents of the "Narrow Language Faculty" revisited.  Language and Linguistics Compass 6, 611-621.

van Wassenhove, V. (2013)  Speech through ears and eyes: interfacing the senses with the supramodal brain.  Frontiers in Psychology 4, article 388.

Weusthoff, S. et al. (2013)  The siren song of vocal fundamental frequency for romantic relationships.  Frontiers in Psychology 4, article 439.

Willemet, R. (2013)  Reconsidering the evolution of brain, cognition, and behaviour in birds and mammals.  Frontiers in Psychology 4, article 396.
