On the tip of my tongue

Say these words out loud.

‘The tip of the tongue, the lips and the teeth.’

Whilst you were speaking, what were your tongue and lips doing?  How were you breathing?  Can you breathe in and still speak?

Now try reciting this (rather peculiar) poem.  It contains every sound (phoneme) used in spoken English.

The pleasure of Shawn’s company
Is what I most enjoy.
He put a tack on Ms. Yancey’s chair
When she called him a horrible boy.
At the end of the month he was flinging two kittens
Across the width of the room.
I count on his schemes to show me a way now
Of getting away from my gloom.

This ‘panphonic’ poem was written by the linguist Neal Whitman and was used in the film Mission: Impossible III (2006).

Did you notice that whilst you were talking, your tongue and lips never moved sideways?


Reaching out across a divide; Israeli Prime Minister Yitzhak Rabin, U.S. president Bill Clinton, and PLO chairman Yasser Arafat at the White house in 1993 (Image: Wikimedia Commons)

Talking connects us across all cultural boundaries, and sets us apart from other animals.  Our diverse vocal repertoire, contributing to nearly 7,000 languages worldwide, has no equivalent anywhere in the animal kingdom.  We arrange complex sounds into phrases with rhythm, stress and intonation (prosody), and deliver them with visual emphasis using facial expressions and gestures.

Traditionally, linguists have considered that our speech is too complex to have arisen by natural selection, suggesting instead that it results from a sudden event such as a ‘freak’ genetic mutation.

This view is at odds with what we understand about the rest of our biology.  Evolution works by selecting from existing variation in forms and behaviours.  Adaptations are not a ‘best-design’ solution to a survival problem, but a balance of innovations that arrive with inherited constraints.

During our first seven years of life we learn to articulate the sounds unique to the language(s) we hear.  We structure these sounds into syllables, and use them to build words, phrases and sentences that convey meaning.  We also use these sounds to coin new words and invent new meanings.  For our ancestors to begin to do this, however, the physical ability to make this diversity of sounds must already have been in place.

How do we physically produce speech sounds?

X-rays of a human jaw, taken by Dr H. Trevelyan George at St. Bartholomew's Hospital, London, in January, 1917.  The black dashed line in these x-rays is a thin metal chain on the tongue’s upper surface.  The pictures reveal how it changes position when producing the ‘cardinal’ vowels [i, u, a, ɑ].  These sounds are voiced by controlling the flow of breath and causing vibrations in the vocal cords.   English consonants are voiced or voiceless, and are made using five main movements.  We touch the lips together or against the teeth, place the tongue between or onto the back of the teeth, or against the hard palate, and lift the tongue against the soft palate.  We also use an open-mouthed turbulent flow of air to make aspirated sounds as in the ‘h’ of ‘hair’ (Image: Wikimedia Commons)

Speech is musical; whilst we produce precisely articulated sounds, our voices also use pitch, tone and timbre to emphasise words and give rhythm and shape to our phrases. This requires precise coordination of multiple muscles in the chest, larynx, throat, mouth and face.  These movements literally do not ‘come to mind’.  Instead we focus on what we are saying, and how the listener responds.

Sound generation (phonation) in almost all mammals involves coordinating their breathing with the control of tension in the vocal folds of the larynx.  We selectively process the harmonics from these basic sounds, and articulate precise sound sequences using rapid and rhythmic movements of the tongue, lips and associated structures.

At the simplest level, speaking involves alternating open-closed movements of the jaw at the same time as generating sound in the larynx.  This produces an alternating stream of open (resonant) and closed (muted) sounds.  We build words and phrases from these alternating ‘segments’, using the lips and tongue to produce precisely articulated consonants and control pitch, timbre, tone and stress.

We vocalise as other mammals do, using our feeding and breathing apparatus.  The muscle movements that operate both chewing and speaking are controlled by rhythmic nerve impulses from ‘Central Pattern Generators’.  These are autonomous nerve ‘modules’ in the lower brain and spinal cord.  They co-ordinate all our repetitive movements, from walking to vomiting.

Which aspects of our speaking abilities are found in other animals?

Jack, a military working dog, barking during his training; Rochester, New York, 2009.  Dogs are unusual amongst vocal mammals; their calls include barking sequences (bow-wow-wow) that alternate between mostly identical open-close jaw movements.  In humans, this alternation is a universal speaking mode.  The barking dog is also making other coupled rhythmic communication signals – note the tail wagging and ear movements (Image: Wikimedia Commons)

There is no animal equivalent to the combined movements that make up our vocal cycle, although other animals make all of these movements.  Most mammals call with their mouths open, using coupled Central Pattern Generators that link the out-breath with sound production (phonation) in the larynx.  A few animals such as dogs make occasional calls using a partial open-close oscillating jaw movement, although they typically repeat the same sound (bow-wow-wow).  We coordinate the circuits for breath control and phonation with another set of pattern generators that operate the rhythmic movements of our jaw, lips and tongue.
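The coupling described above can be caricatured with a toy model (my own illustrative sketch, not taken from the sources cited here): two phase oscillators stand in for a phonation rhythm and a jaw open-close rhythm, and weak coupling is enough to make them entrain. All parameter values are arbitrary.

```python
import math

# Toy sketch of two coupled Central Pattern Generators, modelled as
# Kuramoto-style phase oscillators: one for phonation, one for the
# jaw's open-close cycle.  All parameters are illustrative.

def simulate(w_phonation=5.0, w_jaw=5.5, K=2.0, dt=0.001, steps=20000):
    th1, th2 = 0.0, 1.5                      # initial phases (radians)
    for _ in range(steps):
        d = th2 - th1
        th1 += (w_phonation + K * math.sin(d)) * dt
        th2 += (w_jaw - K * math.sin(d)) * dt
    return (th2 - th1) % (2 * math.pi)       # final phase difference

# When the coupling K exceeds half the frequency mismatch, the two
# rhythms phase-lock: jaw cycle and phonation settle at a fixed offset.
phase_gap = simulate()
```

Uncoupled (K=0), the phase difference drifts indefinitely; with coupling it converges to a constant, which is the sense in which one pattern generator can entrain another.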

These movements have other functions, as in the suckling of newborns.  This ability defines us as mammals.  But these movements may also be linked to talking.  James Lund and co-workers suggest that the human Central Pattern Generators controlling chewing, licking and sucking also participate in speech.

Peter MacNeilage goes on to suggest that the rhythmic repetitive movements used for eating have been coordinated in mammals since the clade arose some 200 million years ago, and that they lie at the root of our articulation skills.  As we speak, our tongue moves up and down, and front to back in the mouth.  Chewing also includes sideways motions of the tongue and jaw.  These are not included in our vocal movements; indeed, they would leave us more prone to biting our tongue.  MacNeilage proposes that coupling our pre-existing capability for making vocal calls with this subset of eating movements gave our more immediate ancestors the capacity to articulate simple ‘proto-syllables’.

Baby chimpanzee (Pan troglodytes) at Beijing zoo. Chimpanzees can be taught to recognise several hundred human words, but none have reproduced these verbally, even if raised in a human environment. Chimpanzees make occasional calls in a series (something like syllables in speech), although repetition of one sound is more usual.  In these variable-sound calls, the arrangement of components does not seem to be significant.  In contrast, we can say e.g. ‘cat’, ‘tack’ and ‘act’, using varied sequences of the same sounds to convey different information (Image: Wikimedia Commons)

Does this then apply to our nearest relatives, the chimps?  Philip Lieberman has shown that their vocal apparatus is anatomically suitable to produce a range of human syllables, and yet they do not speak.  This suggests that they cannot coordinate their Central Pattern Generator signals for vocal sound production and chewing.  The reason for this appears to be linked to differences in cognition.  Recent work comparing the neurology underpinning chimp calls and human word-based speech has found that the neural circuits driving these respective vocalisations originate from different parts of the brain.

However, chimps and other primates do make rhythmical face and jaw movements, producing lipsmacks, tongue smacks, teeth-chatters and other facial gestures.  Lipsmacks involve moving the jaw without the teeth coming together, as in human speech, and are often made by juveniles as they approach their mother to suckle.  Primate grooming is a one-to-one interaction using touch, eye contact and other positive signals, and often involves ‘taking turns’.

We can precisely control the pitch, resonance and rhythm of our speech, and the articulation of our sounds, thanks to our flexible and dextrous tongue.  The tongues of new-born babies lie flat in their mouths.  The permanent descent of our larynx occurs early during our development; this raises the tongue in the mouth, allowing it to move freely.

Male Koala (Phascolarctos cinereus) at Billabong Koala and Aussie Wildlife Park, Port Macquarie, New South Wales, Australia.  Almost all mammals lower their larynx to vocalise.  Koalas are unusual; along with lions, deer and humans they have a permanently descended larynx.  In these non-human animals this results in a dramatically resonant deep male mating call.  Unlike us, these other animals have pronounced differences in body form between males and females (Image: Wikimedia Commons)

The ability to learn and reproduce complex sounds in the form of song has arisen independently in whales, humans, and at least three times in birds.  Only male songbirds make complex learned song calls; they sing to attract mates.  In contrast our language ability is gender-balanced.  No other primates have elaborated male mating calls.  However the second descent of the larynx in boys during puberty suggests that sexual selection may have refined our control of the resonant qualities of our vocal tract.

How could natural selection have favoured our ancestors’ ability to produce these sounds?

Our vocal flexibility comes at a price; an increased individual risk of choking.  Our ancestors’ ability to form ‘proto-words’ and ‘proto-song’ must have given their tribal group a selective advantage that outweighed this risk.

Strong bonds bring advantages to social groups including relative safety in numbers from predators, collective understanding of their environment, higher levels of parental care from the extended family (resulting in better juvenile survival), and coordination of hunting and foraging.   Primates use grooming, which combines touch with emotionally-coded facial and vocal sound gestures, to make and maintain social bonds.

Chimpanzees (Pan troglodytes) grooming at Gombe Stream National Park, Tanzania.  Grooming in primates cleans the outer body, decreases stress, allows acceptance into the group and reveals the social hierarchy.  Chimps and other higher primates may utter pleasure-indicating sounds during grooming, particularly if being attended to by a higher status member of the tribe. Chimpanzee tribes range from 15 to 120 individuals. In their “fission-fusion society” all members know each other but feed, travel, and sleep in smaller groups of six or less.  The membership of these small groups changes frequently (Image: Wikimedia Commons)

Robin Dunbar points out that the extent of primate grooming time can be predicted from neocortex (‘thinking’ brain) size and social group size.  It is thought that our large-brained hominid ancestors lived in tribes of up to around 150 individuals, meaning they would need to spend up to 40% of their time manually grooming each other to maintain social bonds.  Speech may have provided a time-saving alternative: a means of ‘vocally grooming’ others.  A speaker could connect with and bond simultaneously with multiple individuals.
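Dunbar's regression of grooming time on group size makes the arithmetic here easy to check. A minimal sketch, using coefficients that approximate his published figures (treat them as illustrative, not authoritative):

```python
# Linear relationship between primate group size and the percentage of
# time spent grooming, with coefficients approximating Dunbar's (1993)
# regression -- illustrative values, not definitive ones.

def grooming_time_percent(group_size, intercept=-0.772, slope=0.287):
    """Predicted percentage of time spent in social grooming."""
    return intercept + slope * group_size

# A hominid 'tribe' of ~150 would need to groom for over 40% of its
# time -- roughly double the maximum observed in any living primate.
demand = grooming_time_percent(150)
```

The point of the calculation is the mismatch: no primate actually spends anywhere near that fraction of its day grooming, which is what makes a cheaper bonding channel such as vocal grooming attractive.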

Human babies making new or unusual sounds quickly receive their parents’ attention.  Ulrike Griebel and Kimbrough Oller suggest that our hominin ancestors’ babies may have produced sounds that provoked more parental attention and so more effective bonding.  Their better survival would select for babies with vocal variation and flexibility.

A large pod of over 80 dusky dolphins (Lagenorhynchus obscurus) swimming together in South Bay, Kaikoura, New Zealand. Although there is some physical touching amongst individuals in the pod, dolphins and killer whales use vocal grooming to coordinate with each other when hunting, during migrations, and ‘play’  (Image: Wikimedia Commons)

Involuntary pleasure sounds encourage continued social interactions between primates of all ages.  Perhaps our hominid ancestors learned to make diverse and pleasurable sounds as babies, and as a result were better equipped as adults to vocally groom their extended families.

Conclusions

  • Fine motor control of the tongue, lips and jaw allow us to produce a huge repertoire of diverse sounds.  This control likely comes from combining the movements used for eating with the production of vocal sounds.
  •  Our ancestors may have evolved this capability at a time when their social group size increased, with a more time-efficient form of grooming required if the cohesion of the ‘tribe’ was to be maintained.
  • Selection for diverse and flexible speech sounds may have begun with babies using suckling and other sounds to gain more parental attention.  These individuals would be more effective as adults at ‘vocally grooming’ their wider social group.
  • Our ability to produce speech sounds is balanced between genders, suggesting that group selection rather than sexual selection has driven the evolution of this ability.

Text copyright © 2015 Mags Leighton. All rights reserved.

References
Arbib, M.A. (2005)  From monkey-like action recognition to human language: an evolutionary framework for neurolinguistics.  Behavioral and Brain Sciences 28, 105-124.
Bouchet, H. et al. (2013)  Social complexity parallels vocal complexity: a comparison of three non-human primate species.  Frontiers in Psychology 4, article 390.
Charlton, B. et al. (2011)  Perception of male caller identity in koalas (Phascolarctos cinereus): acoustic analysis and playback experiments.  PLoS ONE 6, e20329.
Fitch, W.T. (2010)  The Evolution of Language. Cambridge University Press.
Green, S. and Marler, P. (1979)  The analysis of animal communication.  In Social Behavior and Communication (P. Marler, ed.) pp. 73-158.  Springer.
Hauser, M. et al. (2002)  The faculty of language: what is it, who has it, and how did it evolve?  Science 298, 1569-1579.
Kimbrough Oller, D. and Griebel, U. (eds) (2008)  Evolution of Communicative Flexibility: Complexity, Creativity, and Adaptability in Human and Animal Communication.  Vienna Series in Theoretical Biology, MIT Press.
Lehmann, J., Korstjens, A.H. and Dunbar, R.I.M. (2007)  Group size, grooming and social cohesion in primates.  Animal Behaviour 74, 1617-1629.
Lieberman, P. (2006)  Toward an Evolutionary Biology of Language. Harvard University Press.
Lund, J.P. and Kolta, A. (2006)  Brainstem circuits that control mastication: do they have anything to say during speech?  Journal of Communication Disorders 39, 381-390.
MacNeilage, P. (2008)  The Origin of Speech. Oxford University Press.
MacNeilage, P.F. and Davis, B.L. (2000)  On the origin of internal structure of word forms.  Science 288, 527-531.
Parr, L.A. et al. (2007)  Classifying chimpanzee facial expressions using muscle action.  Emotion 7, 172-181.
Pearson, K.G. (2000) Neural adaptation in the generation of rhythmic behaviour.  Annual Review of Physiology 62, 723-753.
Redican, W.K. and Rosenblum, L.A. (1975)  Facial expressions in nonhuman primates. Stanford Research Institute.
Titze, I. R. (1989)  Physiologic and acoustic differences between male and female voices.  The Journal of the Acoustical Society of America 85, 1699-1707.
Traxler, M.J. et al. (2012)  What's special about human language?  The contents of the "Narrow Language Faculty" revisited.  Linguistics and Language Compass 6, 611-621.
van Wassenhove, V. (2013)  Speech through ears and eyes: interfacing the senses with the supramodal brain.  Frontiers in Psychology 4, article 388.
Weusthoff, S. et al. (2013)  The siren song of vocal fundamental frequency for romantic relationships.  Frontiers in Psychology 4, article 439.
Willemet, R. (2013)  Reconsidering the evolution of brain, cognition, and behaviour in birds and mammals.  Frontiers in Psychology, 4, article 396.

Putting things into words

The dolphin calf is barely three weeks old. 

His mother nuzzles at him, calling softly.  The calf responds, mimicking her call. 

This unique set of sounds is his signature; the name that mother is teaching him to recognise, and which she will use to call to her calf as she teaches her baby to hunt.  Later he will use this signature whistle so that others in his own pod will recognise him. 

Again and again as his mother calls this name, he repeats it back. 


Words hold ideas in code.  As in all communications, the meaning of a signal must be agreed between the sender and receiver.  We give our words their meaning by shared agreement.

A bonobo (Pan paniscus) ‘fishing’ for termites using a stick tool, at San Diego Zoo (Image: Wikimedia Commons)

As we remember and recall words, we access the information they hold.  Collectively we use our words as tools to store information in symbolic form, and so bring our memories ‘to mind’.

Human languages, whether sung or spoken, produce words by controlling the pitch and articulation of these distinct sets of sounds with the lips and tongue. We process our words in the brain through the same fine motor control circuits as we (and our primate cousins) use to coordinate our hands and fingers.

This means that we use words almost as if they are tools in our hands.  At the neurological level, words are gestures to which we give meaning, and then use as tools to share that meaning.

Why did our ancestors need words?

Hearing words brings ideas to mind, coordinating the thoughts of our social group.  New words are coined when that group agrees to associate a syllable sequence (a word), distinct from existing words, with a new unique meaning.  Some other animals, e.g. dogs, can learn to associate human word sounds or gestures with simple meanings.

 

Dogs (Canis lupus familiaris) have lived alongside humans for over 33,000 years.  We have selectively bred these animals to shepherd, hunt, sniff out and retrieve our targets for us upon demand.  Human-directed selection has ‘evolved’ working dogs that can be trained to recognise around 200 human words. Commands, or proxies for them such as whistled signals, are tools we use to coordinate their activity with ours (Image: Wikimedia Commons)

However our understanding and use of words – as symbolic tools – is highly flexible.  Our application of words is often playful, for example puns and ambiguities can extend the meaning of a word, or apply it in a new way.  This furthers our use of these tools for social interaction.

Perhaps starting around 2.5 Ma, our ancestors began to experience selective forces that ultimately promoted a remarkable mental flexibility, resulting in the development of elaborate and multi-purpose manual tools.  This expanded tool use corresponds with the onset of cultural learning.  The making and development of tools is learned from our social group, as is our speech.

Are we then unique in our ability to coin new words?  Dolphins and some other whales broadcast ‘signature calls’ when hunting in murky and deep water, enabling them to stay connected with their pod.  Vocal self-identifying calls would have provided similar benefits to our hominin ancestors in dense, low visibility forest habitats, and perhaps also across large distances in open grassland habitats.  Specific word tools for sharing information, e.g. warning of snakes or poisonous fruit, would enable this group to collectively navigate their world more effectively than they could alone.

We often talk whilst using our hands to make additional gestures, or operate tools.  Here those tools are knives, forks and wineglasses (Image: Wikimedia Commons)

As with manual tools, the act of using words provides immediate feedback.  Our language may have a gestural basis in the brain, but our vocal-auditory speech mode is much more efficient.  Although we often move our hands when we talk, we can speak whilst conducting other manual tasks.

How did our ancestors begin to use words as tools?

– Peter MacNeilage suggests that our language arose directly as vocal speech.  Our ancestors’ circumstances may have selected for specific vocal signals, received using their auditory communication channel, whilst their hands were busy with other tasks.  This could include hunting with manual tools, foraging or attending to their young.

– William Stokoe and others argue instead that sign came first.  Hand gestures use the visual channel as a receiver.  They suggest that vocal gestures emerged later, perhaps as a combined visual and auditory signal.

This photograph of the New York Curb Association market (c. 1916) shows brokers and clients signalling from street to offices using stylised gestures.  Similar manual signs have arisen in many open-floor stock exchanges around the world, where they made it possible to broker rapid ‘face-to-face’ deals across a noisy room (known as the ‘open outcry’ method of trading).  Today these manual languages have been largely superseded by the advent of telephones and electronic trading through the 1980s and 1990s (Image: Wikimedia Commons)

In practice, we often use manual and vocal channels synchronously, but they don’t mix; we never create words that oblige us to combine hand movements with mouth sounds.  Sign languages based on gestures do arise ‘naturally’ (i.e. much like a pidgin language), usually in response to a constraint, such as where deafness is present in some or all of the population, or where other forms of common language are not available within the group.  The emergence of manual languages under such circumstances reveals just how flexible and adaptable our speech function really is.

Before our ancestors could assign meaning to words, however, they had to learn how to copy and reproduce the unique movements of the lips and tongue that each new word requires.

What might those first words have been?

Babbling sounds are learned.  Hearing-impaired infants start to babble later, are less rhythmical and use more nasal sounds than babies with normal hearing.  Children exposed only to manual sign language also ‘babble’ with their hands.   Language-learning robots that hear and interact with adult humans quickly pick out relevant one-syllable words from a stream of randomly generated babble.  These initial syllables act as ‘anchors’, allowing the machines to more quickly distinguish new syllables from the human sounds they hear (Image: Wikimedia Commons)

Babies begin to control the rhythmical movements involved with both eating and vocalising as they start to babble, at around 4-6 months.  Making these movements involves coordinating the rhythmical nerve outputs of multiple Central Pattern Generator neural circuits.

Central Pattern Generators operate various repetitive functions of the body, including breathing, walking, the rhythmic arm movements that babies often make as they babble, and the baby hand-to-mouth grab reflex.

Babies begin to babble simply by moving their lower jaw at the same time as making noises with the larynx.  These sounds are actually explorations of syllables the child has already heard; around half are the simple syllables used in ‘baby talk’.

Learning to make these sounds involves mastering the simplest of our repetitive vocal movements; typically this involves opening and closing the jaw with the tongue in one position (front, central or back) inside the mouth.

To suckle, babies raise the soft palate in the roof of their mouths to close off the nasal cavity. This creates a vacuum in the mouth that enables them to obtain milk from the breast.  After swallowing, the infant opens the soft palate to take a breath through the nose; this often results in an ‘mmm’ sound in the nasal cavity (Image: Wikimedia Commons)

Say ‘Mmmmm’, then ‘ma-ma’…  Where do you feel this sound resonating?  A suckling child’s murmuring sounds have this same nasal resonance.

Our first vocal sounds as babies show our desire to connect with our parents.  This connection is two-way; neural and hormonal responses are triggered in human parents upon hearing the cries of their child.

A baby makes nasal murmuring sounds when its lips are pressed to the breast and its mouth is full.  Perhaps as a mother repeats these soothing sounds back to her child, they become a signal that the infant associates with its mother and later mimics to call to her.  Selection may have favoured hominins able to connect with their offspring using vocal sounds.

Unlike young chimps who cling to their mothers, human babies need to be carried.  A hominin mother soothing her child by voice sounds would be able to put down her baby and forage with both hands.

There is more.  Consider walking.  Adopting an upright posture provoked a re-structuring of our hominin ancestors’ body plan.

Breathing and swallowing whilst standing up required a re-orientation of the larynx.  This organ acts as a valve controlling the out-breath, prevents food entering the trachea, and houses the vocal folds (vocal cords) controlling the pitch and volume of the voice.

The nasal resonant consonants of ‘mama’ are made with the tongue at rest and soft palate open (above).  In this position the nasal cavity is continuous with the mouth and throat.  To produce ‘dada’, the soft palate elevates (below), closing off the nasal cavity and limiting resonance to the oral chamber (Image: Modified from Wikimedia Commons)

Breathing and eating whilst standing upright also requires that the soft palate (velum) in the roof of the mouth can move up and down, closing off the nasal cavity from the throat when swallowing.  Moving the soft palate also changes the size and connection between the resonating chambers in our heads.

‘Ma-ma’ sounds are made with the soft palate in an open position, opening and closing the jaw to bring the lips together.  Closing the soft palate shifts resonance into the mouth, producing ‘pa-pa’ from the same movement.

Most world languages have nasal and oral resonance in their ‘baby talk’ names for ‘mother’ and ‘father’.  Peter MacNeilage highlights this as the only case of two contrasting phonetic forms being regularly linked with their opposing meanings.  The desire of hominin infants to connect specifically with one or the other parent may have resulted in them producing the first deliberately contrasted sounds.

Could these sounds, perhaps along with sung tones, have been part of our first ‘words’?

How do we apply meanings to these vocal gestures? 

Chimpanzees make spontaneous vocal sounds that express emotions such as fear and pleasure, much as we do.  They also communicate intentionally, using non-verbal facial gestures e.g. lipsmacks.

We gesture with our faces, hands and voices across all languages and cultures.  Human baby babbling is also voluntary, combining sound from the larynx with lipsmack-like movements to create simple syllables.

Mandarin Chinese is a tonal language; words with different meanings are coded into the same syllable using changes in pitch (Image: Wikimedia Commons).  Click to hear the four main tones of standard Mandarin, pronounced with the syllable "ma".


The initiation of vocal sounds arises from different regions of the brain in humans and other primates.  Primate calls arise from emotional centres within the brain (associated with the limbic system), whereas human speech circuits are focussed around the lateral sulcus (Sylvian fissure).

Within the lateral sulcus, a zone of the macaque brain (area ‘F5’) thought to be the equivalent of Broca’s area in humans houses cells called ‘mirror neurons’.  These mirror circuits are involved in producing and understanding grasping actions associated with obtaining food, decoding others’ facial expressions, and making mouth movements related to eating.

These circuits reveal that neurological routes link hand-and-mouth action commands.  Broca’s area in humans is essential for speech, hand gestures and producing fine movements in the fingers.

A female blackbird (Turdus merula) with nest building materials. As well as speaking and manipulating food in a precise way, many animals and birds use their mouths as a tool to manipulate objects.  The materials with which this blackbird builds her nest are also tools.  Using the eating apparatus to perform other tasks is common amongst vertebrates (Image: Wikimedia Commons)


This and other higher brain areas control Central Pattern Generator circuits in the lower brain which coordinate eating movements and voice control.  The same circuits that operate grasping gestures with the hands also trigger the mouth to open and close in humans and other higher primates (the automatic hand-to-mouth reflex of babies).

Mirror neuron networks in humans interconnect with the insula and amygdala; components of the limbic system that are involved in emotional responses.  Maurizio Gentilucci and colleagues at the University of Parma suggest that mirror neurons which link these components of the emotional brain with higher brain circuits for understanding the intention of food grasping gestures may have enabled our ancestors to associate hand or mouth gestures with an emotional content.  Tagging our observations with an emotional response is how we code our own vocal and other gestures with meaning.

Pronunciation chart for Toki Pona, a simple constructed language designed by Toronto-based linguist Sonja Lang.  Toki Pona has 14 main sounds (phonemes) and around 120 root words, and was designed as a means to express maximal meaning with minimal complexity.  Each human language uses a unique selection of sounds from the syllables which are possible to make using our vocal apparatus.  As our children learn their first words, they replicate the spoken sounds they hear.  In this way, the sounds we learn as part of our first languages are specific to our location and circumstances (our environment), and reproduce local nuances in pronunciation.  As we become fluent in our native language, producing these sounds becomes ‘automatic’.  We are rarely conscious of the syllables we choose, focussing instead on what we want to say.  People learning a foreign language as adults tend to speak that language using the sound repertoire of their native tongue.  To listen to the same panphonic phrase repeated in a variety of accents (and to contribute your own if you wish), visit the speech accent archive here (Image: Wikimedia Commons)


Many primates vocalise upon discovering food.  Gestures, then, may be a bridge linking body movement to objects and associated vocal sounds.  Hearing but not seeing an event take place allows the hearer to construct an idea of the associated experience in their mind.  Once an uttered sound could trigger an associated memory, our hominin ancestors could then revisit that experience.

When we hear or think of words that describe an object or a movement, the same mirror neuron circuits are activated as when we encounter that object or make the movement.  Thinking of the words for walking or dancing also triggers responses in our mirror neuron network that are involved with walking or dancing movements.  When we think of doing something, and then do it, we are literally ‘walking our talk’.

Conclusions

  • Words are tools produced by unique sets of movements in the vocal apparatus.  They may have developed in our hominin ancestors as a sound-based form of gesture.
Young chimpanzees from the Jane Goodall sanctuary of Tchimpounga (Congo Brazzaville).  Wild chimpanzees (Pan troglodytes) make ‘pant hoot’ calls upon finding food, such as a tree laden with fruit.  These calls are recognisable by other members of their group.  Adjacent groups of wild chimps with overlapping territories adjust and re-model their pant hoot calls so that their group call signature is distinctive from that of the other tribe.  These remodelled calls seem to indicate group learning amongst these animals (Image: Wikimedia Commons)


  • Studying how our babies learn to speak gives us some insights into how hominins may have made the transition to talking.  Our ancestors’ first word tools may have been parental summoning calls.  The vocal calls of babies help them to bond strongly with their parents.
  • Words inside the brain replicate our physical experience of the phenomena that they symbolise.
  • The mental flexibility to agree new sound combinations and associate these with meaning provided our hominin ancestors with a powerful resource of vocal tools that allow us to share our learning.  This ability to share learning has many potentially selectable survival advantages.

Text copyright © 2015 Mags Leighton. All rights reserved.

References
Davis, B.L. and MacNeilage, P.F. (1995)  The articulatory basis of babbling.  Journal of Speech, Language, and Hearing Research 38, 1199-1211.
Eisen, A. et al. (2013)  Tools and talk: an evolutionary perspective on the functional deficits associated with amyotrophic lateral sclerosis.  Muscle and Nerve 49, 469-477.
Falk, D. (2004)  Prelinguistic evolution in early hominins: whence motherese? Behavioural and Brain Sciences 27, 491-541.
Gentilucci, M. and Dalla Volta, R. (2008) Spoken language and arm gestures are controlled by the same motor control system. Quarterly Journal of Experimental Psychology 61, 944-957.
Gentilucci, M. et al. (2008) When the hands speak. Journal of Physiology-Paris 102, 21-30.
Goldman, H.I. (2001)  Parental reports of “mama” sounds in infants: an exploratory study.  Journal of Child Language 28, 497-506.
Jakobson, R. (1960)  Why “Mama” and “Papa”?’  In Essays in Honor of Heinz Werner (R. Jakobson, ed.) pp. 538-545.  Mouton.
Johnson-Frey, S.H. (2003)  What's so special about human tool use?  Neuron 39, 201-204.
Johnson-Frey, S.H. (2004)  The neural bases of complex tool use in humans.  Trends in Cognitive Sciences 8, 71-78.
Jürgens, U. (2002)  Neural pathways underlying vocal control.  Neuroscience and Biobehavioural Reviews 26, 235–258.
King, S.L. and Janik, V.M. (2013)  Bottlenose dolphins can use learned vocal labels to address each other.  Proceedings of the National Academy of Sciences, USA 110, 13216-13221.
King, S. et al. (2013) Vocal copying of individually distinctive signature whistles in bottlenose dolphins. Proceedings of the Royal Society of London, B 280, 20130053.
Lieberman. P. (2006)  Toward an Evolutionary Biology of Language. Harvard University Press.
Lyon, C. et al. (2012)  Interactive language learning by robots: the transition from babbling to word forms.  PLoS ONE 7, e38236.
MacNeilage, P. (2008)  The Origin of Speech. Oxford University Press.
MacNeilage, P.F. and Davis, B.L. (2000)  On the origin of internal structure of word forms.  Science 288, 527-531.
MacNeilage, P.F. et al. (2000) The motor core of speech: a comparison of serial organization patterns in infants and languages. Child Development 71, 153–163.
MacNeilage, P.F. et al. (1999)  Origin of serial output complexity in speech.  Psychological Science 10, 459-460.
Matyear, C.L. et al. (1998)  Nasalization of vowels in nasal environments in babbling: evidence for frame dominance.  Phonetica 55, 1-17.
Mitani, J.C. et al. (1992)  Dialects in wild chimpanzees?  American Journal of Primatology 27, 233-243.
Petitto, L.A. and Marentette, P. (1991)  Babbling in the manual mode: evidence for the ontogeny of language.  Science 251, 1483-1496.
Savage-Rumbaugh, E.S. (1993)  Language comprehension in ape and child.  Monographs of the Society for Research into Child Development 58, 1-222.
Stokoe, W.C. (2001)  Language in hand: Why sign came before speech. Gallaudet University Press.
Tomasello, M. (1999)  The Human Adaptation for Culture.  Annual Review of Anthropology 28, 509-529.

Telling Tales

Scissors cut paper.

Paper wraps rock.

Rock blunts scissors.

These sentences are not only the ‘unspoken rules’ of the game; they are the game.

The players also understand other unspoken rules.  Let us consider them as a ‘rule of three’.

(i) Each sentence must contain two items and an action.

(ii) The order of words shows which item is acting (‘the subject’) and which is acted upon (‘the object’).

(iii) The rules alone are not enough; the meaning must make sense to both players.


As we speak, we create ‘story’.  We do this using the unspoken rules of our language (the grammar and syntax) to ‘make sense’ of our words and those of others.  These rules are specific to each language; there is no underlying universal structure (a ‘universal grammar’) of the kind Chomsky has suggested.  Instead, a universal principle applies: all languages, even the simplest of creoles, quickly evolve their own patterns of grammar and syntax.  These patterns produce the attributes of ‘story’, the structure which allows a language to work.

Stories have a structure.

Stories should have a beginning, a middle and an end…

But not necessarily in that order

(Jean-Luc Godard)

The mimed game of ‘rock, paper, scissors’ tells the story of how these objects interact.  The sentences build relationships between the objects.  The rules of grammar and syntax allow us to organise the words into sequences that show what these relationships are.  We use this structure to code meaning into our speech.

As we learn words, so we ‘imprint’ the structure of our language.  We also acquire common expressions, such as sayings and other phrases or ideas with metaphorical meanings (e.g. characters from fairy tales) that are understood by our cultural group.

Many songbirds and other animals also produce imprinted learned calls made of ordered sequences (phrases) of sounds.  We recognise these repeating patterns in the speech we hear and ‘imprint’ as children.  The differences in our childhood learning environments give rise to variations in the way we produce sounds, such as regional accents.

A male Zebra Finch (Taeniopygia guttata) at Dundee Wildlife Park, South Australia. The complex calls of Zebra finches and many other songbirds are learned.  Juvenile birds have a period of imprinting, in which they mimic the adult calls of their ‘tribe’.  This learning window closes as they reach adult age, at which point their song pattern becomes ‘fixed’ (Image: Wikimedia Commons)


Speaking involves learning and repeating these sound-generating movement patterns, along with other movements such as facial expressions and hand gestures.  In this sense, our speech is formed from a set of stylised vocal and non-vocal movement patterns.  We learn this ‘dance’ from our cultural group.

Most of our words, phrases, sayings, metaphors and inferences result from inherited patterns in our cultural context.  This ‘local effect’ has also been observed in ‘dialects’ (local pattern variations) of birdsong.

Stories ‘move us’ physically and through emotions.

Movement is part of all types of communication.  Within a spoken phrase, the action (the verb) moves the ‘subject’ (the active ‘character’) from one physical state to another.

We understand that this change has taken place through the emotional shift we associate with this change.  The very word ‘emotion’ contains the Latin root movere, to move, and the prefix ‘e-’ means ‘out of’.

We find that we are able to ‘read’ emotion in the face of this male Barbary Macaque (Macaca sylvanus) and his offspring.  All primates use facial expressions and involuntary vocal calls to  communicate emotions. Macaques and other monkeys change their facial expression involuntarily, revealing information about their emotional state even when these animals are not interacting directly with each other.  For instance, these and other primates are sensitive to audible rhythms, and upon hearing them, produce changes in facial expression which is matched by a shift in the neural circuitry of the brain (Image: Wikimedia Commons).   Watch this happen.


We code our words with symbolic meaning by tagging our understanding of what they represent with how we feel.  Our words are therefore symbols that contain both semantic and emotional content.  This content is conveyed into our speech through the pitch, timbre, tone and pace of our voice, as well as the rhythm and stress patterns within our speech.

Vital as this is, usually we are not conscious of this musical content, known as ‘prosody’, which transmits emotional information.  Beyond ourselves, we can also discern and infer emotional cues in the involuntary vocal calls and facial expressions (‘facial gestures’) of other primates.

Producing speech with mouths, hands, or even through remote means such as writing, involves movement.  Indeed, the same neural pathways, involving the so-called ‘mirror neurons’, are activated both when we think of or hear a word and when we encounter the experience which the word symbolises.  This neural network extends into the emotional centres of our brains, allowing us to code the meaning of words by relating emotions to our experiences.

Spoken words are rhythmical ‘vocal gestures’ made with our eating apparatus (the mouth and throat), and are coordinated with the rhythm of our breathing.  Controlling these movements involves coupling and modifying rhythmic outputs from multiple Central Pattern Generator circuits.

These ‘neural metronomes’ generate autonomous rhythmic nerve impulses that drive repetitive movements like chewing and walking.  Higher brain centres initiate and coordinate these signals, integrating them via the basal ganglia into ordered sequences of finely controlled motor movements.  The order, pace and musicality of our speech results from combining sets of these neural patterns with pattern modulations and interruptions, in a way which works with the rules that structure our language.

We walk as we speak; with intention.  The rhythmic motor movements that both of these processes involve are learned patterns with a cultural basis.  The distinctive walking gait of the Maasai people of East Africa has a very low impact on the body.  Their traditional nomadic life involves walking with their cattle for hundreds of miles between grazing areas.  Their walking style means that they suffer little or none of the wear-and-tear we would expect from this level of activity (Image: Wikimedia Commons)


Our body movements, like our language use and accents, follow patterns that we learn as children from ‘our tribe’. The unique rules and patterns of a language, like its vocabulary, are indicators of the cultural content and perspective they express. A speaker and listener must agree the meaning of the words and phrase structure they use. Communication then, involves creating ‘story’ using movements which are simultaneously a whole body and a whole society activity.

Stories have something to say.

In the game ‘rock, paper, scissors’, both players understand the meaning of the symbolic gestures used and the relationship between them.  Their interaction is understood as an event in time and space.  In language, we ‘tell stories’ using ordered words and phrases in order to convey an intended meaning.

‘Scissors cut paper’

The sentence describes an event which may have happened, is happening or could happen.  An object acts upon another object in time and space, until there is a resolution.  The objects are represented as gestures.  Considering words as gestures in sound, when we hear or read a simple sentence from the game, we internally reproduce (‘represent’) the associated objects, actions and representing gestures in our mirror neuron circuits along with the experiences they symbolise.  The mirror system, associated with areas of motor activity, allows us to revisit the embodied experience of movements associated with these ideas.

What stories have to say involves a journey (a beginning, a middle and an end).  The characters in a story are animated.  They appear in the plot with a discernible ‘motivation’ revealed by their actions.  This action results in a change of state for that character.  This movement is true at different levels of resolution, such as the object acting within a single phrase, or a character in an epic tale.  The narrative created by grammar and syntax ‘animates’ objects into a change of state in the phrase.  This codes for a change of meaning (Image: Wikimedia Commons)


We perceive the word symbols and emotional content of speech as patterns.  Mammals may be particularly competent at recognising patterns and edges, but our pattern recognition ability is exceptional.  When we speak, we use sound pattern motifs (words) coded with symbolic ideas, and then order these into sequences.

Forming our words and phrases involves making a series of movement patterns.  Our speech and body posture reveal how we understand ourselves.  As our children progress in learning to speak, they reveal the progress of their capacity to take charge of their thoughts.  The story that emerges through our spoken words reveals what we think and how we feel.  We craft this narrative from the meaning we assign to our observations, and use music and movement to mirror this back to ourselves and others.  How could this story mechanism have begun, and how has it evolved?

Did our stories begin with singing? 

Amongst the apes, our ability to sing is unique.  The musicality of our speech has many components, including variations in rhythm, phrasing, pitch and tone.

(i) Rhythm

Basic ‘rock’ drum rhythm pattern, notated for bass, snare and cymbal.  Listen to this being played here.  Rhythm is a basic component of the music of our speech, and in English this rhythm often has priority over other factors in the way we pronounce words.  For example, we typically pronounce the word thirteen as ‘thir-TEEN’, with stress on the second syllable.  If this word comes ahead of another word with stress on the first syllable, such as ‘WO-men’, we pronounce this ‘THIR-teen WO-men’.  This shift in the stress peak maintains a ‘beat, offbeat, beat, offbeat’ rhythm pattern in our speech, similar to this rock drum riff.


Speech is inherently rhythmical. Indeed in English, we even shift the stress of a word to maintain that rhythm. Andrew Carstairs-McCarthy has suggested that syntax evolved from the same basic (open-close) ‘oscillating motor’ mechanism by which we also organise our articulated sounds into ‘consonant-vowel-consonant’ syllables.  Alternation is discernible in the rhythm of speech, and appears also in word order, such as the object-action-object structure of ‘paper wraps rock’.
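This stress-shift (sometimes called the ‘rhythm rule’) can be sketched as a toy algorithm.  The representation below (words as lists of syllable stresses, with a simple clash-avoidance condition) is an illustrative simplification, not a complete linguistic rule.

```python
# Toy model of the English stress-shift ('rhythm rule').
# Each word is a list of syllables: 1 = stressed, 0 = unstressed.
# This is a deliberately minimal sketch for illustration only.

def apply_rhythm_rule(words):
    """Shift a word's stress to its first syllable when its final
    stressed syllable would clash with a following word's initial stress."""
    result = [list(w) for w in words]
    for i in range(len(result) - 1):
        w, nxt = result[i], result[i + 1]
        # Clash: this word ends stressed and the next word begins stressed.
        if w[-1] == 1 and nxt[0] == 1 and 1 not in w[:-1]:
            w[-1], w[0] = 0, 1  # retract the stress to the first syllable
    return result

# 'thir-TEEN' + 'WO-men'  ->  'THIR-teen WO-men'
thirteen, women = [0, 1], [1, 0]
print(apply_rhythm_rule([thirteen, women]))  # [[1, 0], [1, 0]]
```

With no clash (for example ‘thir-TEEN bal-LOONS’), the function leaves the stress patterns unchanged, mirroring the way English only retracts stress when two beats would collide.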

Every human culture has some form of music with a beat.  Rhythm determines how we perceive and process musical information.  Syllables and small clauses have a bimodal rhythmic structure which helps to articulate the sounds within the phrase; for example:

‘PAper wraps ROCK’.

Asif Ghazanfar suggests that rhythmic communication is found throughout the higher primates, in the form of behaviours such as chimpanzee pant-hoots and rhythmical facial gestures such as lipsmacks.  This implies that a similar bimodal rhythm mechanism was present in the communication behaviours of our shared ancestors.

(ii) Pitch

William Tecumseh Fitch suggests that speaking began with prosody; he proposes that our hominin ancestors’ ‘proto-language’ may initially have used intonation (controlling the pitch of sounds) rather than word-based syllables.  Many of the world’s languages still use tone, alongside word forms, to encode meaning.

Organ pipes from the old church at Pellworm, Schleswig-Holstein.  Organ pipes produce their note as air passes across the pipe and resonates the air in the cylinder.  Longer pipes produce lower pitch notes.  Male mammals drop their larynx to vocalise, producing a lower pitched call.  The second descent of the human male larynx during puberty deepens a man’s voice (Image: Wikimedia Commons)


Fitch argues that the vocal control which allowed our ancestors to sing may have been later ‘exapted’ to produce such pitch-based proto-syllables, and that articulation of vowel and consonant sounds came later as a means of expanding the diversity of this sound repertoire.  This makes a plausible case for the origins of our ability to articulate syllables arising from song.

The human male larynx descends during puberty, giving a deeper formant (resonant frequency) which enables us to distinguish male from female voices mainly by their pitch and tone.  In contrast to most songbirds, our ability to speak (and sing) is balanced between the sexes.  This suggests that however human speech arose, it was not primarily to attract mates, as is the case in most songbirds and vocal mammals.

Our mode of communication is adaptable to our context.  The Yoruba from west Africa traditionally use ‘talking drums’ to communicate with villages up to 5 miles away.  The pitch of these drums can be varied when played, mimicking words from the Yoruba language, which is based on tonal shifts (Image: Wikimedia Commons)


This may be true, but David Puts and colleagues have found that the pitch of a man’s voice alters in the presence of other males, suggesting an element of between-male competition in the establishment of dominance.  The second descent of the human male larynx may therefore have evolved as a cue to signal the individual’s status within the tribe, or to defend territorial boundaries.

In addition, low sounds travel further.  Deeper male voices may have proven more effective at coordinating the tribe when hunting in low visibility conditions such as dense forest or over long distances in the open savannah.

(iii)  Musical phrasing

In English, we use variations in relative pitch to shape our phrases and code emotional meanings.  These ‘melodies’ overlay variations in pitch on to the rhythmic patterns within words.   Shifts in pitch give stress and emphasis, so shaping these phrases.  This musicality is crucial to reveal our intended meaning to others, while intonation makes it easier for a listener to distinguish the endings of our phrases.  As children we learn to make these musical (prosodic) sound patterns alongside our capacity to articulate the vowels and consonants.

 

European Starlings (Sturnus vulgaris) taking an opportunity to feed.  These birds are accomplished vocal mimics, and can add new sounds to their song repertoire throughout life.  Their calls comprise repeating syllable sequences; each bird’s song is distinctive, and seems to enable individuals to recognise others from their flock (Image: Wikimedia Commons)


Both young songbirds and human children have sensitive periods of vocal learning that require social feedback from adults.  Like human babies, songbirds have a ‘babbling’ phase in which they try out sounds prior to rehearsing and imprinting their adult call.  These birds build phrased sequences from repeating syllable elements, punctuated with explicit pauses.

Repeated elements may add emphasis, although modifying the order does not seem to alter the meaning of the call, which advertises their suitability as a mate.  Sequence variations may however serve to identify individuals within the flock for some species, such as starlings.

Steven Brown shows that the most complex song forms in birds and other primates arise amongst monogamous duetting pairs: tropical songbirds and gibbons.  These calls appear to be significant in defending territories and maintaining social bonds.  He notes that courtship calls are rare to non-existent in our sub-clade of the higher primates; neither chimpanzees nor bonobos vocalise complex learned song.  Although in theory their vocal tract anatomy would enable them to produce some vowel and consonant sounds, no chimpanzee has ever done so.

In contrast, territorial calls are found throughout the primate clade.  A deeper male voice may provide a means of signalling authority and hierarchy within the tribe, and may have been influential in ‘vocal grooming’ amongst our hominin ancestors.

Babies progress from early babbling sounds, through repeating simple naming words, to simple abstracted words such as pronouns (he, she, they) and simple sentences (e.g. ‘carry me’) by age two, and talk simply about their day by age four.  Chimpanzees growing up in close human contact cannot speak, although they can be taught sign gestures.  However, their ability to combine these gestures into sequences is limited (Image: Wikimedia Commons)

In order to produce sequences of symbolically coded, articulated sounds, however, our ancestors must have been capable of organising their proposed actions into sequences.  Neurological studies show that organising our thoughts into sequences requires a high degree of brain connectivity.

This would mean that the developmental shift in the degree of connectedness between neurons that allowed hominins to enact organised sequences of behaviours may also have provided the circuitry that was later ‘exapted’ for language syntax.

How did our storytelling behaviour evolve? 

Other animals’ behaviours are driven broadly by instinctual need.  Whilst humans do operate at this level, what is distinctive about our behaviour is the capacity to produce actions to achieve an intended purpose.  Our speech is unique because we can intentionally use it to order our thoughts and tasks in time.

Illustration by Grandville (1803-1847) for one of the Fables of Aesop.   This is one of a number of tales credited to Aesop, a slave and story-teller believed to have lived in Ancient Greece between 620 and 560 BCE.  These cautionary tales use metaphors of unbelievable scenarios (here a fox and a crow having a conversation) to code much bigger meanings.  This tale cautions about being susceptible to flattery.   A fox was walking through the forest when he saw a crow sitting on a tree branch with a fine piece of cheese in her beak. The fox wanted the cheese and decided he would be clever enough to outwit the bird.       “What a noble and gracious bird I see in the tree!" proclaimed the fox, "What exquisite beauty! What fair plumage! If her voice is as lovely as her beauty, she would no doubt be the jewel of all birds."      The crow was so flattered by all this talk that she opened her beak and gave a cry to show the fox her voice.      "Caw! Caw!" she cried, as the cheese dropped to the ground for the fox to grab (Image: Wikimedia Commons)

At the cognitive level, this is in essence the same type of task as ordering a sequence of actions using manual tools to achieve a goal.  We use words as tools for sharing information: we must construct an idea before we can share it with others.

In both cases, we construct the tasks in a sequence, formulating the goal (the completed manual task, or the delivery of the information) before we begin.  This structuring is an active process that takes place even for a remembered event.

As we revisit our memories, we include only certain details in our narrative; these details trigger pattern recognition in our own thinking (and that of our listener).  This processing and editing of information is in essence a structuring of patterned information.  The establishment of a ‘narrative’ (a sequence of events) applies equally to using a manual tool and to telling a ‘story’.

Our repertoire of stories, folk tales and fairy tales is amongst the tools by which we culturally share what is important to us and how we think.  Using story in this way reveals patterns; we discern deeper meanings and lessons from the presented information, which is a mechanism for sharing understanding.

Being able to project different interpretations onto cues from the environment provides a means of assessing risk, resulting in different types of behavioural choice.  The means to evaluate effectively a cue such as rustling in the undergrowth, as either an opportunity (a potential food item such as a small mammal) or a threat (a wolf), would quickly prove a selectable advantage.

These marks in sand reveal the recent passage of a grey wolf (Canis lupus). Understanding such cues from the environment would have been useful to the survival of our hominin ancestors (Image: Wikimedia Commons)

Many animals recognise cues which index the presence of another animal (e.g. a scent, or footprints).  It is easy to imagine how our ancestors’ capacity to evaluate these cues strategically using their ‘thinking tools’ would quickly change their chances of survival.

Being able to share these thoughts and coordinate their responses with others would radically shift the ecology of the tribe into a mode where sensory awareness and experience are held and understood collectively by the ‘quorum’.

Words then are tools that define boundaries between ideas, and help to structure our collective thinking.  The making and using of tools requires the execution of sequences of patterns, in the form of intentional fine motor movements.

The gymnast Jade Barbosa, competing at the Mediterraneo Gym Cup in Rome, 5th July 2008.  Working on the bar requires highly developed balance and precise coordination.  Standing upright is a whole-body activity.  From Lieberman’s research, it appears that this posture has remodelled our ancestors’ entire physiology, from breathing to childbirth.  Balancing on two legs requires higher levels of motor control than other primates use to walk on all fours.  Walking therefore may be the behaviour that selected for our precise coordination.  In addition, a bipedal posture frees up the hands, allowing us to perform new manual tasks with tools.  The precise sequences of fine motor movement used to manipulate tools may have established the neurological networks through which our ancestors were better able to sequence their thoughts as well as their actions (Image: Wikimedia Commons)

Philip Lieberman suggests that our upright posture may have pre-adapted our ancestors for the enhanced motor control needed for tool-making, tool use and speech.  Walking itself is a simple patterned, repetitive movement.

Our basic ‘walking instinct’ initially activates a Central Pattern Generator circuit driving movement in all four limbs. Our newborns’ initial locomotion usually involves crawling.  The subcortical basal ganglia of the language network in the brain also regulate the muscles controlling our upright posture.

Walking, and other sequences of behaviour such as speech, are learned gradually over years.  Heel strike, which marks efficient bipedal locomotion, takes years to develop.

Learning to use words as symbols requires repetition of the word and an internal coding of it as a pattern motif.  The characteristic of any pattern is that it contains repeating elements, and repetition stimulates our memory in proportion to the strength of (i.e. our familiarity with) that pattern.  The sequence of word order determined by a language’s grammar and syntax provides, in essence, a framework for creating patterns.

This ability to associate gestures with ideas and order them into sequences is present to a limited degree in our primate relatives.  Wild bonobos combine call types together into longer mixed sequences upon finding a food cache.  Other tribe members understand information about the quality of this find from these different sound sequences, and respond with appropriate types of foraging behaviour.

Bonobos (Pan paniscus) use four tonal calls when finding food (barks, peeps, peep-yelps, and yelps).   These animals forage in dense forest.  Upon finding a cache of fruit, their calls bring the group together to feed.   Playing recordings of calls made upon encountering a good quality food find in a location where only a poor quality crop was available, prompted these animals to forage as though in the presence of the better food source (Image: Wikimedia Commons)

Higher primates have a mirror neuron network which is triggered by facial expressions and grasping movements associated with obtaining food, although they cannot mirror mimed actions.

Why do we tell stories? 

Communication involves transmitting and receiving a message that is understood in the same way by sender and recipient.  Human speech works in the same way, but is driven by the intention to share a meaning.  As well as a repertoire of pre-arranged signals with agreed meanings, we need to have something to say.  We tell stories, therefore, to communicate an intention.  This means that our ancestors must have felt compelled to convey the contents of their thoughts to others.

This diagram is adapted from Claude Shannon’s paper, ‘A mathematical theory of communication’, produced whilst he was working for the Bell Telephone Company in 1948.  In conjunction with Warren Weaver, the idea was popularised in a book (The Mathematical Theory of Communication), published in 1949, and is now known as the Shannon-Weaver communication model.  In this model, information is transformed multiple times between the source and destination:
• An information source that produces a message
• A transmitter that operates on the message to create a signal which can be sent through a channel
• A channel, through which the message travels (and during which process it may be distorted by ‘noise’ interference or otherwise modified by the environment)
• A receiver, which transforms the signal back into the message intended for delivery
• A destination, which can be a person or a machine, for whom or which the message is intended
Messages are transmitted through the body by some means (for example the spoken word, mimed gestures or writing an email) and then through an external medium (e.g. the air, by sight, or cyberspace) until received by another human through their senses.  The sensory information is then decoded, and an understanding of the message is crafted by the signal receiver.

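
The stages of the model can be sketched as a toy pipeline.  This is a minimal illustration only; the encoding scheme, the noise model and the test message are invented for this sketch, and are not from Shannon’s paper.

```python
import random

def transmit(message: str) -> list[int]:
    # Transmitter: encode the message into a signal (here, Unicode code points)
    return [ord(ch) for ch in message]

def channel(signal: list[int], noise_rate: float = 0.0) -> list[int]:
    # Channel: each element of the signal may be corrupted by random 'noise'
    return [s + 1 if random.random() < noise_rate else s for s in signal]

def receive(signal: list[int]) -> str:
    # Receiver: transform the signal back into a message for the destination
    return "".join(chr(s) for s in signal)

# A noiseless channel delivers the source's message to the destination intact
assert receive(channel(transmit("hello"))) == "hello"
```

With a non-zero noise_rate the received message can differ from the one sent, which is why real communication systems add redundancy to their codes.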

We speak to communicate the contents of our minds.  It is perhaps then the patterns of our thinking, rather than our speaking, which are the truly unique feature of human communication.  Our ability to combine ideas into a syntactic structure and create new associations demarcates a boundary between our thinking and that of our closest relatives, the chimps.

Captive chimpanzees and bonobos can be taught to understand some signed words or use pictorial symbols, and the most accomplished of these learners can combine certain of their vocabulary into ‘small clause’ forms, for example ‘agent + action’ or ‘action + object’.

Ljiljana Progovac argues that these ‘small clauses’ are the basic units of word combination that all languages share.  She proposes these forms as the ‘proto-syntax’ from which our more complex structures have evolved.  If this is correct, it suggests that the most basic components of our ability to combine words into language were present in our common ancestors with chimpanzees.
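
How few ingredients a ‘small clause’ needs can be seen by generating the two forms from tiny word lists.  This is a toy sketch; the vocabulary is invented for illustration, not drawn from Progovac’s data.

```python
from itertools import product

agents = ["scissors", "fox", "crow"]
actions = ["cut", "grab"]
objects = ["paper", "cheese"]

# 'agent + action' small clauses, e.g. 'scissors cut'
agent_action = [f"{a} {v}" for a, v in product(agents, actions)]

# 'action + object' small clauses, e.g. 'cut paper'
action_object = [f"{v} {o}" for v, o in product(actions, objects)]

assert "scissors cut" in agent_action
assert "cut paper" in action_object
```

Even with no further grammar, these two-slot combinations already carry usable meanings, which is what makes them plausible candidates for a proto-syntax.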

A bonobo (Pan paniscus) at Cincinnati Zoo communicates using a gesture.  Arguably the most accomplished of the language-trained apes was the male bonobo Kanzi, raised by Sue Savage-Rumbaugh and others at Georgia State University.  Kanzi learned to use some 400 words, using a board of pictorial (abstract) symbols.  He regularly used around 30 to 40 of these in a typical day, and occasionally combined them into pairs.  However, although Kanzi was able to understand simple phrases (such as ‘subject + verb + object’), could follow simple instructions, and could identify animals and humans by name, he did not use these language structures to talk about himself (Image: Wikimedia Commons)

The way we acquire language reveals the process by which we develop a sense of self-identity.  Our children’s awareness of ‘self’ and ‘other’ is expressed in their language, although it is not dependent upon it. An understanding that others share similar experiences (known as a ‘theory of mind’) develops gradually during their first five years.  The ability to organise words into small clauses such as ‘scissors cut’, or ‘cut paper’ appears in human children at between 1 and 2 years of age.

The more accomplished language-learning primates, such as the bonobo Kanzi, associate sounds and symbols with objects, and even seem to understand abstract concepts such as ‘happy’.  Bonobos at the Georgia State University primate language research project use a lexigram (symbol board) to create two-word small clauses.  None of these animals, however, has ever attempted a self-description such as ‘I think…’, ‘I feel…’ or ‘I want…’.  This ability develops in human children between two and three years of age; the bonobos’ use of human symbol-based language, in contrast, arrests at a stage roughly equivalent to a human child of around two.

Adults and children pretending to be bears and following a trail of ‘footprints’ (Image: Wikimedia Commons)

All young mammals play, but perhaps the most intriguing feature of human communication is that play is incorporated into our information sharing.  The simple story of scissors, rock and paper is fiction, in that the items do not need to exist in order to be ‘present’ and interact in the game.  Fiction captivates our attention and holds it far more easily than factual narrative.

Within a story, our thoughts project information based upon past experience into the future, allowing us to ‘play out’ the actions in drama or mime before attempting the task.  Studies on primate mirror neuron responses report that these animals typically do not respond to such mimed gestures.

The act of telling stories shifts our ecology into the ‘cognitive niche’; we operate in a world of endlessly combined and recombined ideas.  Emotions are the source of meaning for these ideas.  Perhaps our evolved story mechanism can be considered as a means of evolved emotional language, conveying higher orders of meaning and inferred understanding to our experiences.  This combining of ideas, putting one into another to make a new meaning which is different from that of the ideas on their own, is known as ‘recursion’.  It is considered to be a defining characteristic of our capacity for creativity in our thoughts.
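
Recursion can be made concrete with a toy sentence-builder that embeds one clause inside another.  This is a sketch for illustration only; the reporting frame and the clauses are invented.

```python
def embed(clause: str, depth: int) -> str:
    # Base case: no further embedding, return the bare clause
    if depth == 0:
        return clause
    # Recursive case: nest the result one level deeper inside a reporting frame
    return "she said that " + embed(clause, depth - 1)

assert embed("the storm calmed", 0) == "the storm calmed"
assert embed("the storm calmed", 2) == "she said that she said that the storm calmed"
```

Each level of embedding yields a sentence whose meaning differs from that of the bare clause, which is the sense in which recursion creates new meanings from a finite stock of parts.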

In summary then, our ‘story mechanism’ translates thoughts into actions, and enables us to bring new things and events into being.  Coding information in story form makes it possible to share past experiences, ‘reword’ them into new sequences, and by communicating with others, project these imaginings into the future.

The West Tofts handaxe (also called a ‘biface’) is believed to date to around 400 thousand years ago. This particular handaxe is intriguing because it has a shell fashioned into one side. Given that the cortex (the outer layer of the stone) around the fossil is intact, it seems likely that it was intentionally fashioned into the centre. This is significant because it could represent one of the earliest examples of an appreciation of aesthetics which extends beyond utilitarian function (Image: Reproduced by the permission of the University of Cambridge Museum of Archaeology & Anthropology. Accession no. 1916.82)

It is clear that our hominin ancestors developed neural pathways that allowed them to copy and learn movement sequences, including vocal movements, and used these skills to share their intentions with others.  What is less clear is how they came to understand that others had minds like their own, prompting their yearning for connection.

Whatever its cause, however, an expansion of conscious awareness drove our ancestors to share their ideas and understanding, and to begin to tell each other their story.

Conclusions

  • Our speech comprises complex, coded sounds that we are able to order and organise.  This organisation has ‘rules’ that a listener can use to perceive and translate what they hear into meaning.
  • Using speech-based language allows us to put our thoughts in an order, and then control our actions in a defined and directed way.  Neurologically there is no difference between using manual tools and word tools in an ordered sequence; the brain codes both of these as a set of gesture-based motor movements.
  • Some birds and other animals are able to learn complex sound sequence patterns, mostly as a display signal for sexual selection.  In contrast, human speech and other forms of communication are gender balanced.  We use our communication sequences to bond socially and carry and transmit collectively held ideas.
  • The behavioural choices of other animals and birds are the result of reactions to their circumstances.  Humans put their thinking into words, and voice their intentions.
  • There is nothing biologically unique about the behaviours that allow us to speak.  Our language function suggests instead a greater level of connection between these abilities than is found in other animals.  This has allowed us to assume fine control over the movements that produce vocalisations and other actions, and by implication over our thoughts.  This may be a result of the physical changes needed to allow our ancestors to walk upright.
  • What human language enables us to express is a sense of self-identity; it is a means of defining ‘our story’.

Text copyright © 2015 Mags Leighton. All rights reserved.

References
Amador, A. et al. (2013)  Elemental gesture dynamics are encoded by song premotor cortical neurons.  Nature 495, 59-64.
Astington, J. W. and Edward, M.J. (2010)  The development of theory of mind in early childhood.  Encyclopedia on Early Childhood Development, 1-6. Edward M.J (Ed)  Published online at www.child-encyclopedia.com
Atran, S. (1982)  Constraints on a theory of hominid tool-making behaviour.  L’Homme 22, 35-68.
Bouwer, F.L. et al. (2014)  Beat processing is pre-attentive for metrically simple rhythms with clear accents: an ERP study.  PLoS ONE 9, e97467.
Boyd, B. (2009)  On the Origin of Stories: Evolution, Cognition and Fiction.  Harvard.
Brown, R. (1973)  Development of the first language in the human species.  American Psychologist 28, 97-106.
Brown, S. (2000)  Evolutionary models of music: from sexual selection to group selection.  Perspectives in Ethology 13, 231-281.
Bruner, J.S. (1975)  From communication to language – a psychological perspective.  Cognition 3, 255-287.
Clark, K.B. and Clark, M.K. (1939)  The development of consciousness of self and the emergence of racial identification in negro preschool children.  Journal of Social Psychology, S.P.S.S.I. Bulletin 10, 591-599.
Corballis, M.C. (2007)  Recursion, language, and starlings.  Cognitive Science 31, 697-704.
Dittrich, F. et al. (2013)  Maximized song learning of juvenile male zebra finches following BDNF expression in the HVC.  European Journal of Neuroscience 38, 3338-3344.
Donald, M (2001)  A mind so rare: the evolution of human consciousness.   Norton
Dunbar, R.I.M. (2003)  The social brain: mind, language, and society in evolutionary perspective.  Annual Review of Anthropology 32, 163-181.
Eisen, A. et al. (2014)  Tools and talk: an evolutionary perspective on the functional deficits associated with amyotrophic lateral sclerosis.  Muscle & Nerve 49, 469-477.
Evans, N. and Levinson, S. (2009)  The myth of language universals: language diversity and its importance for cognitive science.  Behavioral and Brain Sciences 32, 429-448.
Everett, D. L. (2009)  Don't Sleep, There are Snakes: Life and Language in the Amazonian Jungle.  Random House.
Everett, D. L. (2012)  Language: The Cultural Tool.  Random House.
Everett, D. L. (in press)  The role of culture in the emergence of language 1.  In The Handbook of Language Emergence (W. O’Grady and B. MacWhinney, eds).  Wiley-Blackwell.
Everett, D. L. (in press) Sculpting language: A review of the David McNeill Gesture Trilogy.  In The Handbook of Language Emergence (W. O’Grady and B. MacWhinney, eds).  Wiley-Blackwell.
Fitch, W.T. (2005)  The evolution of language: a comparative review.  Biology and Philosophy 20, 193-230.
Fitch, W.T. (2011)  The evolution of syntax: an exaptationist perspective.  Frontiers in Evolutionary Neuroscience 3, article 9.
Fitch, W.T. (2012)  Evolutionary developmental biology and human language evolution: constraints on adaptation.  Evolutionary Biology 39, 613-637.
Gallistel, C.R. (2011)  Prelinguistic thought.  Language Learning and Development 7, 253–262.
Gentner, T.Q. et al. (2006)  Recursive syntactic pattern learning by songbirds.  Nature 440, 1204-1207.
Ghazanfar, A. (2013)  Multisensory communication in primates and the evolution of rhythmic speech.  Behavioural Ecology and Sociobiology 67, 1441-1448.
Gould, S.J. and Vrba, E.S. (1982)  Exaptation – a missing term in the science of form.  Paleobiology 8, 4-15.
Iverson, J.M. and Thelen, E. (1999)  Hand, mouth and brain.  Journal of Consciousness Studies 6, 19-40.
Jürgens, U. (2002)  Neural pathways underlying vocal control.   Neuroscience and Biobehavioural Reviews 26, 235–258.
Lai, J. and Poletiek, F.H. (2011)  The impact of adjacent-dependencies and staged-input on the learnability of center-embedded hierarchical structures.  Cognition 118, 265-273.
Lieberman, P. (1984)  The Biology and Evolution of Language.  Harvard.
Lieberman, P. (2001)  Human language and our reptilian brain: the subcortical bases of speech, syntax, and thought.  Perspectives in Biology and Medicine 44, 32-51.
Lieberman, P. (2006)  Toward an Evolutionary Biology of Language.  Harvard.
Lieberman, P. (2009)  Human language and our reptilian brain: The subcortical bases of speech, syntax, and thought.  Harvard.
Naoi, N. et al. (2012)  Prosody discrimination by songbirds (Padda oryzivora).  PLoS ONE 7, e47446.
Patel, A.D. and Iversen, J.R. (2014)  The evolutionary neuroscience of musical beat perception: the Action Simulation for Auditory Prediction (ASAP) hypothesis.  Frontiers in Systems Neuroscience 8, article 57.
Petitto, L.A. et al. (2001)  Language rhythms in baby hand movements.  Nature 413, 35-36.
Petkov, C.I. and Wilson, B. (2012)  On the pursuit of the brain network for proto-syntactic learning in non-human primates: conceptual issues and neurobiological hypotheses.  Philosophical Transactions of the Royal Society of London, B 367, 2077-2088.
Progovac, L. (2010)  Syntax: its evolution and its representation in the brain.  Biolinguistics 4, 234-254.
Puts, D.A. et al. (2006)  Dominance and the evolution of sexual dimorphism in human voice pitch.  Evolution and Human Behavior 27, 283-296.
Puts, D.A. et al. (2007)  Men’s voices as dominance signals: vocal fundamental and formant frequencies influence dominance attributions among men.  Evolution and Human Behavior 28, 340-344.
Rey, A. et al. (2012)  Centre-embedded structures are a by-product of associative learning and working memory constraints: evidence from baboons (Papio papio).  Cognition 123, 180-184.
Roy, A.C. et al. (2013)  Syntax at hand: common syntactic structures for actions and language.  PLoS ONE 8, e72677.
Savage-Rumbaugh, S. et al. (1998)  Apes, Language, and the Human Mind.  Oxford.
Selezneva, E. et al. (2013)  Rhythm sensitivity in macaque monkeys.  Frontiers in Systems Neuroscience 7, article 49.
Suddendorf, T. and Corballis, M.C. (2007)  The evolution of foresight: what is mental time travel, and is it unique to humans?  Behavioral and Brain Sciences 30, 299-351.
Vaesen, K. (2012)  The cognitive bases of human tool use.  Behavioral and Brain Sciences 35, 203-262.
Yip, M.J. (2006)  The search for phonology in other species.  Trends in Cognitive Sciences 10, 442-446.
Zawidzki, T.W. (2006)  Sexual selection for syntax and kin selection for semantics: problems and prospects.  Biology and Philosophy 21, 453-470.
Ziegler, W. (2013)  The rhythmic organisation of speech gestures and the sense of it.  Language, Cognition and Neuroscience 29, 38-40.

 

What’s so different about human speech?

Upon leaving the island, Odysseus is warned that storms lie ahead. His route home to Ithaca passes the Sirens: monsters whose beautiful, haunting voices lure sailors to their deaths.

He sets a course and explains his intention to the crew. At his command they fasten him to the mast, seal their ears with wax, and prepare for their encounter.

They reach treacherous waters. The siren song reaches into Odysseus’ mind, resonating with his deepest longings. The storm rages within. He struggles, but his bindings, the result of his clear intention, secure him tightly to the mast.

Forced to stand still and listen, he finds that he starts to hear the voices for what they really are; the empty fears of his own soul. He relinquishes his fight and hears the voice at his own still centre. The storm calms.

The crew notice that he has returned to his senses. They cut him loose.

He is indeed a wise and worthy captain.


This ancient footprint, first made in soft mud, is an index which shows us the passing of a three-toed theropod dinosaur.  Denver, Colorado (Image: Wikimedia Commons)

Speaking involves transmitting and interpreting intentional signs, some of which are also used in the instinctual communications of animals. These signals are of three kinds:

1. An ‘index’ physically shows the presence of something, e.g. wolves tracking their prey by scent.

2. An ‘icon’ resembles the thing it stands for, like a photograph or a painting. Dolphins, apes and elephants recognise their own reflection; we assume that they interpret this two-dimensional image as representing their three-dimensional physical selves.

3. A symbol associates an unrelated form with a meaning. Our words are symbols, linking an idea with unique sound-and-movement sequences. They do not resemble the things they represent.
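
The three kinds of sign can be captured in a small taxonomy.  This is a toy sketch; the example mapping simply restates the cases above.

```python
from enum import Enum

class SignKind(Enum):
    INDEX = "physically shows the presence of something"
    ICON = "resembles the thing it stands for"
    SYMBOL = "associates an unrelated form with a meaning"

examples = {
    "scent trail": SignKind.INDEX,   # wolves tracking their prey
    "photograph": SignKind.ICON,     # resembles its subject
    "spoken word": SignKind.SYMBOL,  # arbitrary sound paired with an idea
}

# Only a symbol's form is unrelated to what it represents
assert examples["spoken word"] is SignKind.SYMBOL
```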

The ‘Union Jack’, a symbol of the United Kingdom since the union of Great Britain and Ireland in 1801.  It is made up of three other flag symbols: the Cross of St George for England (inset, top), St Andrew’s Saltire for Scotland (centre) and, for Northern Ireland, St Patrick’s Saltire (below) (Image: Wikimedia Commons)

Symbolism is almost unknown amongst animals, with a few rare exceptions. One much-studied exception is the ‘waggle dance’ of honeybees, a stereotyped signal which communicates the direction and distance of a food source.

Although chimpanzees can be taught to use some sign gestures, they do not naturally communicate using symbols. In contrast, we use our symbolic language intentionally.

Our uses of speech are unique. We revisit our memories, order our thoughts and make future plans. With a destination in mind, we can listen for our ‘inner voice’, map out our route, take a stand against the storm of inner and outer distractions, and find our way home.

How is our speech unique?

Normal speech is already multi-channel; our words are accompanied by the musicality of our speaking, and our facial expressions and other physical gestures transmit many layers and levels of complex meaning.  Writing is another mode of communicating our language.  Social media transmits our language into virtual worlds.  The online social networking service Facebook commissioned ‘Facebook Man’ to commemorate their 150 millionth user   (Image: Wikimedia Commons)

Aspects of our language ability are found in other animals, but the way we have combined and developed these traits is uniquely human.

1.  We use any available channel.

Most human languages use vocal speech. Under circumstances where speaking is not possible, we find other ways, e.g. sign languages and Morse code.

2.  We build our words from parts that gain meaning as they are combined.

Most of the syllables we use to build words lack meaning on their own. Combining them together (as in English) or adding tonal shifts (as in Chinese) creates words.

3.  We code our words with meanings, making them into symbols.

The chimpanzee (Pan troglodytes) known as Washoe (1965-2007) was the first non-human animal to be taught American Sign Language. She lived from birth with a human family, and was taught around 350 sign words. It was reported that upon seeing a swan, Washoe signed "water" and "bird". Chimpanzees are capable of learning simple symbols. However, Washoe did not make the transition to combining these symbols together into new meanings (Image: Wikimedia Commons)

Symbols are ‘arbitrary’, i.e. they do not need to resemble the thing they represent. Our words symbolise ideas, experiences and things.

4.  We combine these symbols to make new meanings.

We build words into phrases and stories, use these to revisit and share our memories, combine them into new forms, and communicate this information to others in various ways. Combining different symbols brings us a new understanding, which changes how we respond.

Look at this painting. As you do, consider what feelings it provokes.

‘Wheatfield with crows’ by Vincent Van Gogh, 1890 (Image: Wikimedia Commons)

It is, of course, by Vincent Van Gogh. As you may know, his choices of colour and subject matter formed a personal symbolic code. He often used vibrant yellows, considering this colour to represent happiness.

His doctor noted that during his many attacks of epilepsy, anxiety and depression, Van Gogh tried to poison himself by swallowing paint and other substances.

As a consequence, he may have ingested significant amounts of toxic ‘chrome yellow’, which contains lead(II) chromate (PbCrO4).

Now consider this statement.

“This is the last picture that Van Gogh painted before he killed himself” (John Berger 1972, p28)

Look again at the picture.

What do you feel this time?

‘Wheatfield with crows’ by Vincent Van Gogh, 1890 (Image: Wikimedia Commons)

Certainly our response has changed, though it is difficult to articulate precisely what is different. The image now seems to illustrate the sentence. Its symbolic content has altered for us. This example shows how combining two types of information, an image and a text, can change the meaning it symbolises.

Some animals can be trained to recognise simple symbols. The psychologist Irene Pepperberg taught her African Grey parrot ‘Alex’ to count; he learned to use numbers as symbols, and could identify quantities of up to 6 items.

An African Grey Parrot (Psittacus erithacus).  Irene Pepperberg’s parrot, Alex, learned very basic grammar, could identify objects by name, and could count  (Image: Wikimedia Commons)

5.  The order in which we combine symbols defines their meaning.

We put word symbols together into phrases, sentences, descriptions, sayings, stories, poems, documents, manuals, plays, oaths, promises, parodies, pantomimes….

The ordering of words follows rules (grammar and syntax). Animals such as dogs and dolphins show some form of syntactical ability, but there is no evidence that they are on the verge of using what we understand as language. The order of words shows us their relationships, allowing us to understand how they interact. We change the order of our words and phrases to change the meaning we wish to communicate.

For instance, this makes sense.

‘Jane asked Simon to give these flowers to you.’

This doesn’t quite fit our normal understanding of reality…

‘These flowers asked Simon to give Jane to you.’

This works, but the meaning has changed.

‘Simon asked you to give these flowers to Jane.’

However, grammar is not enough. The words in combination need to ‘make sense’ for us to understand the meaning the speaker wishes to communicate.
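The dependence of meaning on word order can be illustrated with a toy subject-verb-object reading of English sentences. This is a deliberately simplified sketch; the role names and the mini-grammar are illustrative, not a real parser:

```python
# Toy illustration: in English, grammatical roles are assigned largely
# by position, so reordering the same word symbols changes the meaning.

def assign_roles(sentence):
    """Naively map the first noun to 'agent' and the word after the
    verb to 'recipient' -- a crude stand-in for English SVO order."""
    words = sentence.rstrip(".").split()
    # Hypothetical mini-grammar: <agent> asked <recipient> to give <theme> to <goal>
    return {"agent": words[0], "recipient": words[2]}

print(assign_roles("Jane asked Simon to give these flowers to you."))
# -> {'agent': 'Jane', 'recipient': 'Simon'}
print(assign_roles("Simon asked Jane to give these flowers to you."))
# -> {'agent': 'Simon', 'recipient': 'Jane'}
```

The same five word symbols, in a different order, yield a different assignment of who does what to whom.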

What does this enable us to say?

‘The Treachery of Images’ by Belgian surrealist painter René Magritte (1928-9). Much of Magritte’s work explored the combination of words and images, and the way that this challenges the meaning that we understand from the components on their own. This combination of words and image has been deliberately chosen so that they contradict each other. What the artist says is true. However, it isn’t a pipe! It is a two-dimensional representation of a pipe (Image: Wikimedia Commons)

When we make new combinations of words, or add words to a visual signal such as a gesture, we create a new meaning.

We can add adjectives and qualifiers to a description, combine phrases into a sentence, and make statements one after the other so that our listener associates these ideas. This process is known as ‘recursion’, a linguistic term borrowed from mathematics.

Our ideas about time vary between cultures, but we all mentally ‘time travel’ by revisiting our memories. For instance, the scent of something can evoke a memory that transports us back into an earlier event; suddenly we experience again the emotions and sensations we felt at that time. Putting our current selves into the past memory, or imagining a future scenario and inserting ourselves into that story, is a form of recursion.

Memory allows us to link speaking and listening with the meanings of our words. Our language is well structured to express recursive ideas easily, which suggests that our thinking itself uses recursion.

Why are we able to do this?

An illustration by Randolph Caldecott (1887) for ‘The House that Jack Built’. This traditional British nursery rhyme uses recursion to build up a cumulative tale. The sentence is expanded by adding to one end (end recursion). Each addition adds an increasingly emphatic meaning to the final item of the sentence (i.e. the house that Jack built) (Image: Wikimedia Commons). One final version of combined phrases ends like this:

This is the horse and the hound and the horn
That belonged to the farmer sowing his corn
That kept the cock that crowed in the morn
That woke the priest all shaven and shorn
That married the man all tattered and torn
That kissed the maiden all forlorn
That milked the cow with the crumpled horn
That tossed the dog that worried the cat
That killed the rat that ate the malt
That lay in the house that Jack built.
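The rhyme’s end recursion can be mimicked in a few lines of code: each verse embeds the whole of the previous verse, so a recursive function reproduces the cumulative structure. This is a minimal sketch, and the clause list is abridged:

```python
# End recursion: each verse is a new clause wrapped around the whole
# of the previous verse, as in 'The House that Jack Built'.

CLAUSES = [
    "the house that Jack built",
    "the malt that lay in",
    "the rat that ate",
    "the cat that killed",
    "the dog that worried",
]

def verse(n):
    """Build verse n recursively: clause n followed by all of verse n-1."""
    if n == 0:
        return CLAUSES[0]                   # base case: the innermost phrase
    return CLAUSES[n] + " " + verse(n - 1)  # recursive step: wrap the rest

print("This is " + verse(4) + ".")
# -> "This is the dog that worried the cat that killed the rat
#     that ate the malt that lay in the house that Jack built."
```

Each recursive call pushes the final item one step further away, which is exactly what gives the last line its cumulative emphasis.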

Our thinking capacity, through which we learn and remember, means that we can copy and learn to use language. Although some brain regions appear specialised for roles in memory and language, our ‘language function’ uses our entire brain, and cannot be dissociated from our minds.

Our ‘language brain’ includes the ‘basal ganglia’; these are clusters of nerve cells which connect the outer cortex and thalamus with lower brain regions.

We need this connectedness to coordinate movements in our fingers, to understand the relationships between words that are inferred by their order in our phrases, and to solve abstracted (theoretical) problems. This network interacts with ‘mirror neurons’ which allow us to relate to and decode the posture, speech and emotional cues of others.

The  basal ganglia that influence our speech also regulate the muscles controlling our posture. Standing is therefore more than just balancing on two legs; it is a whole body activity and requires much finer muscle control than walking on all fours. It also frees the hands, which allows us to manipulate tools. Lieberman suggests that it is the fine motor control required to maintain our upright posture which pre-adapted our ancestors for manipulating hand tools as well as the tongue, lips and other structures that make speech possible.  This upright posture is linked with a remodelling of our breathing apparatus, giving us more control over our larynx.

Philip Lieberman’s work with people suffering from Parkinson’s disease suggests that it is the ability to remember that makes speaking possible. Parkinson’s patients have degraded nerve circuits in their basal ganglia, so these patients have short term memory problems and difficulties with balancing and making precise finger movements. They also struggle with understanding and using metaphors and longer word sequences. This suggests that when we speak we are using the circuitry for sorting and remembering movement sequences, irrespective of whether these are producing words or actions.

Our posture has remodelled the evolution of our entire physiology from breathing to childbirth.  It frees the hands, allowing us to perform delicate and precise sequences of tasks.  Selection for the ability to precisely sequence our manual motor skills may have provided our ancestors the means to better sequence their thoughts (Image: Wikimedia Commons)


The nerve networks that control our limbs and voices are linked across all vertebrates. Our basic ‘walking instinct’ initially activates Central Pattern Generator circuits driving movement in all four limbs. These are the same neural outputs that control our lips, tongue and throat.

Conclusions: What does this say about our language?

Captain Odysseus stands upright against the mast. This posture is distinct to our species, and has many implications for our speech, language and other actions  (Image: Wikimedia commons)


  • Our hominin ancestors evolved to use symbolic words and stories as a code to store and share memories, develop new skills and ideas, and coordinate their intentions and actions with their tribe.
  • When we revisit our memories or ‘reword’ our experiences into new sequences, we remodel the past, and project our thoughts into the future.
  • The control we have over our vocal sounds is linked with our neural circuits for movement. The ability to balance ideas and manipulate our tongues is linked to our ability to stand upright, balance on two feet and manipulate tools with our hands.
  • Language, then, is a cultural tool that allows us to order our thoughts, go beyond our instincts, share our intentions, and choose our own story.

Text copyright © 2015 Mags Leighton. All rights reserved.

References
Berger J (1972) ‘Ways of Seeing’ Penguin Books Ltd, London, UK
Bickerton D and Szathmáry E (2011) ‘Confrontational scavenging as a possible source for language and cooperation’ BMC Evolutionary Biology 11:261 doi:10.1186/1471-2148-11-261
Corballis MC (2007) ‘The uniqueness of human recursive thinking’ American Scientist 95(3); 240-248
Corballis MC (2007) ‘Recursion, language, and starlings’ Cognitive Science 31(4); 697-704
Everett D (2008) ‘Don’t sleep, there are snakes: Life and language in the Amazonian jungle’ Pantheon Books, New York, NY
Everett D (2012) ‘Language: the cultural tool’ Profile Books Ltd, London, UK
Gentner TQ et al (2006) ‘Recursive syntactic pattern learning by songbirds’ Nature 440; 1204-1207

Riddles in code; is there a gene for language?

‘I have…’

Words are like genes; on their own they are not very powerful.  But apply them with others in the right phrase, at the right time and with the right emphasis, and they can change everything.

‘I have a dream…’

Genes are coded information.  They are like the words of a language, and can be combined into a story which tells us who we are.

The stories we choose to tell are powerful; they can change who we become, and also change the people with whom we share them.

‘I have a dream today!’


Language is a means for coding and passing on information, but it is cultural, and definitely non-genetic.  Nevertheless, for our speech capacity to have evolved, our ancestors must have had a body equipped to make speech sounds, along with the mental capacity to generate and process this language ‘behaviour’.  Our body’s development is orchestrated through the actions of relevant genes.  If the physical aspects of language ultimately have a genetic basis, this implies that speech must derive, at least in part, from the actions of our genes.

The hunt for genes involved with language led researchers at the University of Oxford to investigate an extended family (known as family KE).  Some family members had problems with their speech.  The pattern of their symptoms suggested that they inherited these difficulties as a ‘dominant’ character, and through a single gene locus.

The FOXP2 gene encodes the ‘Forkhead-Box Protein-2’; a transcription factor. This is a type of protein that interacts with DNA (shown here as a pair of brown spiral ladders), and influences which genes are turned on in the cell, and which remain silent. This diagram shows two Forkhead box proteins, which associate with each other when active. This bends the DNA strand and makes critical areas of the genetic code more accessible (Image: Wikimedia Commons)

Discovery of another unrelated patient with the same symptoms confirmed that the condition was linked to a gene known as FOXP2  (short for ‘Forkhead Box Protein-2’).  This locus encodes a ‘transcription factor’; a protein that influences the activation of many other genes.  FOXP2 was subsequently dubbed ‘the gene for language’.  Is that correct?

Not really. FOXP2 affects a range of processes, not just speech. The mutation which inactivates the gene causes difficulties in controlling the muscles of the face and tongue, problems with compiling words into sentences, and a reduced understanding of language. Neuroimaging studies showed that these patients have reduced nerve activity in the basal ganglia region of the brain. Their symptoms are similar to some of the problems seen in patients with debilitating conditions such as Parkinson’s disease and Broca’s aphasia, which also involve impairment of the basal ganglia.

Genes code for proteins using a four-letter alphabet of adenine, thymine, guanine and cytosine (abbreviated to A, T, G and C), read in three-letter ‘words’. These nucleotides are known as ‘bases’ (they are alkaline in solution) and make matched pairs which form the ‘rungs of the ladder’ of the DNA helix. Substituting one base for another (as happens in many mutations) can change the amino acid sequence of the protein a gene encodes. Changes may make no impact on survival, allowing the DNA sequence to alter over time. Changes that affect critical sections of the protein (e.g. an enzyme’s active site), or critical proteins like FOXP2, are rare (Image: Wikimedia Commons)

Genes provide the code to build proteins.  Proteins are assembled from this coding template (the famous triplets) as a sequence of amino acids, strung together initially like the carriages of a train and then folded into their finished form.  The amino acid sequences of the FOXP2 protein show very few differences across all vertebrate groups.  This strong conservation of sequence suggests that this protein fulfils critical roles for these organisms.  In mice, chimpanzees and birds, FOXP2 has been shown to be required for the healthy development of the brain and lungs.  Reduced levels of the protein affect motor skills learning in mice and vocal imitation in song birds.

The human and chimpanzee forms of FOXP2 protein differ by only two amino acids. We also share one of these changes with bats.  Not only that, but there is only one amino acid difference between FOXP2 from chimpanzees and mice.  These differences might look trivial but they are probably significant.  FOXP2 has evolved faster in bats than any other mammal, hinting at a possible role for this protein in echolocation.
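Comparing protein sequences position by position, as was done when human and chimpanzee FOXP2 were aligned, amounts to counting mismatches between two equal-length strings. The sequences below are invented for illustration and are not real FOXP2 fragments:

```python
def count_differences(seq_a, seq_b):
    """Count positions where two aligned, equal-length sequences differ."""
    assert len(seq_a) == len(seq_b), "sequences must be aligned"
    return sum(a != b for a, b in zip(seq_a, seq_b))

# Hypothetical aligned fragments (NOT real FOXP2 data), differing at
# two positions -- mirroring the two human/chimp amino acid changes.
human_like = "MTNSSACLQM"
chimp_like = "MTNSTACLQV"
print(count_differences(human_like, chimp_like))  # -> 2
```

Real comparisons first align the sequences to allow for insertions and deletions, but the principle of counting per-position differences is the same.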

Mouse brain slice, showing neurons from the somatosensory cortex (20X magnification) producing green fluorescent protein (GFP).  Projections (dendrites) extend upwards towards the pial surface from the teardrop-shaped cell bodies. Humanised Foxp2 in mice causes longer dendrites to form on specific brain nerve cells, lengthens the recovery time needed by some neurons after firing, and increases the readiness of these neurons to make new connections with other nerves (synaptic plasticity).  The degree of synaptic plasticity indicates how efficiently neurons code and process information (Image: Wikimedia Commons)

Changing the form of mouse FOXP2 to include these two human-associated amino acids alters the pitch of these animals’ ultrasonic calls, and affects their degree of inquisitive behaviour.  Differences also appear in their neural anatomy.  Altering the number of working copies (the genetic ‘dose’) of FOXP2 in mice and birds affects the development of their basal ganglia.

Mice with ‘humanised’ FOXP2 protein show changes in their cortico-basal ganglia circuits, along with altered exploratory behaviour and reduced levels of dopamine (a neurotransmitter that affects our emotional responses). Likewise, human patients with damage to the basal ganglia show reduced levels of initiative and motivation for tasks.

This suggests that FOXP2 is part of a general mechanism that affects our thinking, particularly our initiative and mental flexibility. These are critical components of human creativity and are, as it happens, essential for our speech.

Basal ganglia circuits process and organise signals from other parts of the brain into sequences.  Speaking involves coordinating a complex sequence of muscle actions in the mouth and throat, and synchronising these with the out-breath.  We use these same muscles and anatomical structures to breathe, chew and swallow;  our ability to coordinate them affects our speech, although this is not their primary role.

Family KE’s condition, caused by a dominant mutation in the FOXP2 gene, follows an autosomal (not sex-linked) pattern of inheritance, as shown here. Dominant mutations are visible when only one gene copy is present. In contrast, a recessive trait is not seen in the organism unless both chromosomes of the pair carry the mutant form of the gene. The FOXP2 transcription factor protein is required in precise amounts for normal function of the brain. The loss of one working FOXP2 gene copy reduces this ‘dose’, which is enough to cause the problems that emerged as family KE’s symptoms (Image: Annotated from Wikimedia Commons)

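The autosomal dominant pattern can be sketched by enumerating the possible offspring of one affected heterozygous parent and one unaffected parent. This is a minimal illustration; the allele letters are arbitrary labels:

```python
from itertools import product

# Autosomal dominant inheritance: 'D' = mutant (dominant) allele,
# 'd' = working copy. One affected heterozygous parent (Dd) crossed
# with one unaffected parent (dd).
parent1, parent2 = "Dd", "dd"

# Each child inherits one allele from each parent.
offspring = ["".join(sorted(a + b)) for a, b in product(parent1, parent2)]
affected = [g for g in offspring if "D" in g]  # one mutant copy suffices

print(offspring)  # -> ['Dd', 'Dd', 'dd', 'dd']
print(f"{len(affected)}/{len(offspring)} offspring affected")  # -> 2/4
```

Half of the offspring inherit the single mutant copy, which is why the condition reappears in every generation of family KE’s pedigree.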
In practice, very few of our roughly 25,000 genes are individually responsible for noticeable characteristics. Most genetically inherited diseases result from the effects of multiple gene loci. FOXP2 is unusual because of its ‘dominant’ genetic character. It does not give us our language abilities, but it is involved in the neural basis of our mental flexibility and our agility at controlling the muscles of our mouths, throats and fingers.

In addition, genes are only part of the story of our development.  The way we think and subsequently behave alters our emotional state.  Feeling stressed or calm affects which circuits are active in our brain.  This alters the biochemical state of body organs and tissues, particularly of the immune system, modifying which genes they are using.

The dance between the code stored in our genes and the consequences of our thoughts builds us into what we are mentally, physically and socially.  This story is ours to tell.  By our experience, and with this genetic vocabulary, we create what we become.

Text copyright © 2015 Mags Leighton. All rights reserved.

References
Chial H (2008)  ‘Rare genetic disorders: Learning about genetic disease through gene mapping, SNPs, and microarray data’ Nature Education 1(1):192  http://www.nature.com/scitable/topicpage/rare-genetic-disorders-learning-about-genetic-disease-979
Clovis YM et al. (2012) ‘Convergent repression of Foxp2 3′UTR by miR-9 and miR-132 in embryonic mouse neocortex: implications for radial migration of neurons’  Development 139, 3332-3342.
Enard, W (2011) ‘FOXP2 and the role of cortico-basal ganglia circuits in speech and language evolution’  Current Opinion in Neurobiology  21; 415–424
Enard, W et al (2009)  A Humanized Version of Foxp2 Affects Cortico-Basal Ganglia Circuits in Mice  Cell 137 (5); 961–971  http://www.sciencedirect.com/science/article/pii/S009286740900378X
Feuk L et al (2006) ‘Absence of a paternally inherited FOXP2 gene in developmental verbal dyspraxia’ The American Journal of Human Genetics 79; 965-972
Fisher SE and Scharff C (2009) ‘FOXP2 as a molecular window into speech and language’  Trends in Genetics 25 (4); 166-177
Lieberman P  (2009)  ‘FOXP2 and Human Cognition’  Cell 137; 800-803
Marcus GF & Fisher SE (2003) ‘FOXP2 in focus: what can genes tell us about speech and language?’ Trends in Cognitive Sciences 7(6); 257-262
Reimers-Kipping S et al. (2011) ‘Humanised Foxp2 specifically affects cortico-basal ganglia circuits’ Neuroscience 175; 75-84
Scharff C & Haesler S (2005) ‘An evolutionary perspective on Foxp2; strictly for the birds?’ Current opinion in Neurobiology 15:694-703
Vargha-Khadem F et al. (2005) ‘FOXP2 and the neuroanatomy of speech and language’  Nature Reviews Neuroscience 6, 131-138 http://www.nature.com/nrn/journal/v6/n2/full/nrn1605.html
Wapshott N (2013)  ‘Martin Luther King's 'I Have A Dream' Speech Changed The World’ Huffington post, 28th August 2013  http://www.huffingtonpost.com/2013/08/28/i-have-a-dream-speech-world_n_3830409.html
Webb DM & Zhang J (2005) ‘Foxp2 in song learning birds and vocal learning mammals’  Journal of Heredity 96(3);212-216

Upon reflection; what can we really see in mirror neurons?

“Mirror, mirror on the wall,
who is the fairest of them all?”

“Fair as pretty, right or true;
what means this word ‘fair’ to you?
Fair in manner, moods and ways,
fair as beauty ‘neath a gaze…
Meaning is a given thing.
I cannot my opinion bring
to validate your plain reflection!
You must make your own inspection.”


Mirrors shift our perspective, enabling us to see ourselves directly, and reflect ideas back to us symbolically.  But how do we really see ourselves?

Quite recently, neuroscientists discovered a new type of nerve cell in the brains of macaques. These cells form a network across the motor regions of the cortex, the brain areas controlling body movements. These nerves are intriguing; they become active not only when the monkey makes purposeful movements such as grasping food, but also when it watches others do the same. As a result, these cells were named ‘mirror neurons’.

Comparison of human (above) and chimpanzee brain sizes (Image: Wikimedia Commons)

In monkeys and other non-human primates, these nerves fire only in response to movements with an obvious ‘goal’, such as grabbing food. In contrast, our mirror network is active when we observe any human movement, whether it is purposeful or not. Our brains ‘mirror’ both our own actions and those of others, from speaking to dancing.

However, it is difficult to interpret what function these cells are performing. Different researchers suggest that mirror neurons enable us to:

– assign meaning to actions;

– copy and store information in our short term memory (allowing us to learn gestures including speech);

– read other people’s emotions (empathy);

– be aware of ourselves relative to others (giving us a ‘theory of mind’, i.e. we have a mind, and the contents of other people’s minds are similar to our own).

Whilst these opinions are not necessarily exclusive, they do seem to reflect the different priorities of these experts.  Their varied interpretations highlight how difficult it is to be aware of how our beliefs and assumptions affect what our observations can and cannot tell us.

What we can say is that the behaviour of these nerve cells shows that our mirror neuron responses are very different from those of our closest primate relatives.

What do we know about mirror neurons from animals?

A tribe of stump-tailed macaques (Macaca arctoides) watch their alpha male eating. When these macaques observe a meaningful gesture, such as grabbing for food, the same mirror neuron network becomes active in the brains of the observers as in the animal performing the action. This ‘mirroring’ is found in other primates, including humans (Image: Wikimedia Commons)

Researchers at the University of Parma first discovered mirror neurons in an area of the macaque brain which is equivalent to Broca’s area in humans.  This brain region assembles actions into ordered sequences, e.g. operating a tool or arranging our words into a phrase.  Later studies showed that these neurons connect right across the monkey motor cortex, and respond to many intentional movements, including facial gestures.

The macaque mirror system is activated when they watch other monkeys seize and crack open nuts, when they grab nuts for themselves, or even when they just hear the sound of this happening.  Their neurons do not respond to ‘pantomime’ (i.e. a grabbing action made without food present), to casual movements, or to vocal calls.

Song-learning birds also have mirror-like neurons in the motor control areas in their brains.  Male swamp sparrows’ mirror network becomes active when they hear and repeat their mating call.  Their complex song is learned by imitating other calls, suggesting a possible role for mirror neurons in learning.  This is tantalising, as we do not yet know the extent to which mirror neurons are present in other animals.

Our visual cortex receives information from the eyes, which is then relayed around the mirror network and mapped onto the motor output to the muscles.   First, [purple] the upper temporal cortex (1) receives visual information and assembles a visual description which is sent to parietal mirror neurons (2).  These compile an in-body description of the movement, and relay it to the lower frontal cortex (3) which associates the movement with a goal.   With these observations complete, inner imitation (red) of the movement is now possible. Information is sent back to the temporal cortex (4) and mapped onto centres in the motor cortex which control body movement (5) (Image: Annotated from Wikimedia Commons)

The primate research team at Parma suggest the mirror system’s role is in action recognition, i.e. tagging ‘meaning’ to deliberate and purposeful gestures by activating an ‘in-body’ experience of the observed gesture.  The mirror network runs across the sensori-motor cortex of the brain, ‘mapping’ the gesture movement onto the brain areas that would operate the muscles needed to make the same movement.

An alternative interpretation is that mirror neurons allow us to understand the intention of another’s action.  However, as monkey mirror neurons are not triggered by mimed gestures, the intention of the observed action presumably must be assessed in a higher brain centre before the mirror network is activated.

How is the human mirror system different?

Watching another human or animal grabbing some food creates a similar active neural circuit in our mirror network.

The difference is that our neurons are activated when we observe any kind of movement.  Unlike monkeys, when we see a mimed movement, we can infer what the gesture means.  Even when we stay still we cannot avoid communicating; the emotional content of our posture is readable by others.  In particular, we readily imitate others’ facial expressions.

As we return a smile, our face ‘gestures’.  Marco Iacoboni and co-workers have shown that as this happens, our mirror system activates along with our insula and amygdala.  This shows that our mirror neurons connect with the limbic system which handles our emotional responses and memories.  This suggests that emotion (empathy) is part of our reading of others’ actions.  As we see someone smile and smile back, we feel what they feel.

Spoken words deliver more articulated information than can be resolved by hearing alone.  Our ability to read and copy the movements of others as they speak may be how we really distinguish and understand these sounds.  This ‘motor theory of speech perception’ is an old idea.  The discovery of mirror-like responses provides physical evidence of our ability to relate to other people’s movements, suggesting a possible mechanism for this hypothesis.

Sound recording traces of the words ‘nutshell’, ‘chew’ and ‘adagio’.  Our speech typically produces over 15 phonemes a second.  Our vowels and consonants ‘overlap in time’, and blur together into composite sounds.  This means that simply hearing spoken sounds does not provide us with enough information to distinguish words and syllables.  In practice we decode words from this sound stream, along with emotional information transmitted through the tone and timbre of phrases, facial expressions and posture (Images: Wikimedia Commons)

Further studies suggest that these mirror neurons are part of a brain-wide network made up of various cell types.  Alongside the mirror cells are so-called ‘canonical neurons’, which fire only when we move.  In addition, ‘anti-mirror’ neurons activate only when we observe others’ movements.  Brain imaging techniques show that frontal and parietal brain regions (beyond the ‘classic’ mirror network) are also active during action imitation.  It is not yet clear how the system operates, but in combination these cells let us relate to others’ actions through the same nerve and muscle circuits we would use to make the observed movements ourselves.  In this way we relate to what is happening in someone else’s mind.

Are mirror neurons our mechanism of language in the brain?

Japanese macaques (Macaca fuscata) grooming in the Jigokudani hot spring in Nagano Prefecture, Japan.  Human and monkey vocal sounds arise from different regions of the brain.  Primate calls are mostly involuntary, and express emotion.  They are processed by inner brain structures, whereas the human speech circuits are located on the outer cortex (Image: Wikimedia Commons)

Mirror-like neurons activate whether we are dancing or speaking.  Patients with brain damage that disrupts these circuits have difficulties understanding all types of observed movements, including speech.  This suggests that we use our extended mirror network to understand complex social cues.

Our mirror neuron responses to words map onto the same brain circuits that other primates use for gestures.  However, the signals producing our speech and monkey vocal calls arise from different brain areas.  This suggests that our speech sounds are coded in the brain not as ‘calls’ but as ‘vocal gestures’, and highlights the possible origin of speaking as a form of ‘vocal grooming’ which socially bonded the tribe.

When we think of or hear words, our mirror network activates the sensory, motor and emotional areas of the brain.  We thus embody what we think and say.  Michael Corballis and others consider that mirror neurons are part of the means by which we have evolved to understand words and melodic sounds as ‘gestures’.

‘Woman Grasping Fruit’ by Abraham Brueghel, 1669; Louvre, Paris.  The precision control of the grasping gesture she uses to pluck a fig from the fruit bowl is unique to humans.  The intensity of her expression implies many layers of meaning in what we understand from this picture (Image: Wikimedia Commons)

What is unclear is how we put meaning into these words.  Some researchers have suggested that mirror neurons anchor our understanding of a word into sensory information and emotions related to our physical experience of its meaning.  This would predict that our ‘grasp’ of the meaning of our experiences arises from our bodily interactions with the world.

Vocal gestures would have provided our ancestors with an expanded repertoire of movements to encode with this embodied understanding.  Selection could then have elaborated these gestures to include visual, melodic, rhythmical and emotional information, giving us a route to the symbolic coding of our modern multi-modal speech.

We produce different patterns of mirror neuron activity in relation to different vowel and consonant sounds, as well as to different sound combinations.  Also, the same mirror neuron patterns appear when we watch someone moving their hands, feet and mouth, or when we read word phrases that mention these movements.

Wilder Penfield used the ‘homunculus’ or ‘little man’ of European folklore to produce his classic diagram of the body as being mapped onto the brain.  A version of this is shown here; mirror neurons map incoming information onto the somatosensory cortex (shown left) and outputs to the muscles from the motor cortex (right).  These brain regions lie adjacent to each other (Image: Wikimedia Commons)

We process word sequences in higher brain centres at the same time as lower brain circuits coordinate the movements required for speech production and non-verbal cues.  Greg Hickok suggests that our speech function operates by integrating these different levels of thinking into the same multi-modal gesture.

Mirror neurons connecting the brain cortex and limbic system may allow us to synchronously process our understanding of an experience with our emotional responses to it.  This allows us to consciously control our behaviour, adapt flexibly to our world, and communicate our understanding to others and to ourselves.

Smoke and mirrors; what do these nerves really show and tell?

People floating in the Dead Sea.   Our ability to read emotional information from postures means that we can intuit information about people’s emotional state even when they are not visibly moving (Image: Wikimedia Commons)

The word ‘mirror’ conjures up strong images in our minds.  This choice of name may have influenced what we are looking for in our data on mirror neurons.  However they appear crucial for language.  This and other evidence suggests that our ability to speak and to read meaning into movement is a property of our whole brain and body.

Single nerve measurements show that the mirror neuron network is a population of individual cells with distinct firing thresholds.  Different subsets of these neurons are active when we see similar movements made for different purposes.  This suggests that the network responds flexibly to our experience.

Cecilia Heyes’ research shows that our mirror network is a dynamic population of cells, modified by the sensory stimulus our brain receives throughout life.  She suggests that these mirror cells are ‘normal’ neurons that have been ‘recruited’ to mirroring, i.e. adopted for a specialised role; to correlate our experience of observing and performing the same action.

This gives us a possible evolutionary route for the appearance of these mirror neurons.  Recruitment of brain motor cortex cells to networks used for learning by imitation would create a population of mirror cells.  This predicts that:

i. Mirror-like networks will be found in animals which learn complex behaviour patterns, such as whales. (They are already known in songbirds.)

One of the many dogs Ivan Pavlov used in his experiments (possibly Baikal); Pavlov Museum, Ryazan, Russia. Note the saliva catching container and tube surgically implanted in the dog's muzzle.   These dogs were regularly fed straight after hearing a bell ring.  In time, the sound of the bell alone made them salivate in anticipation of food.  This experience had trained them to code the bell sound with a symbolic meaning, i.e. to indicate the imminent arrival of food (Image: Wikimedia Commons)

ii. It should be possible to generate a mirror-like network in other animals by training them to associate a stimulus with a meaning, perhaps a symbolic meaning as in Pavlov’s famous ‘conditioned reflex’ experiments with dogs.
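
Heyes’ recruitment proposal is, at its core, associative learning of the Pavlovian kind.  As a purely illustrative sketch (a toy model, not taken from any of the studies cited here), a single Hebbian learning rule shows how repeatedly pairing ‘doing’ with ‘seeing’ could convert an ordinary motor neuron into a mirror-like one:

```python
# Toy Hebbian sketch of Heyes' 'recruitment' hypothesis (illustrative only).
# A sensory unit (active when a gesture is observed) and a motor unit
# (active when the same gesture is performed) start out unconnected.
# Whenever the two are active together, their connection strengthens.

def train(weight, trials, rate=0.2):
    """Hebbian update: strengthen the sensory->motor link on each paired trial."""
    for _ in range(trials):
        seeing, doing = 1.0, 1.0       # observation and performance co-occur
        weight += rate * seeing * doing
    return weight

w = 0.0                                # naive neuron: observation alone does nothing
w = train(w, trials=10)                # correlated seeing-and-doing experience
mirror_response = 1.0 * w              # response to observation *alone* after training
print(mirror_response > 1.0)           # prints True: the motor cell now 'mirrors'
```

On this (hypothetical) reading, prediction (ii) follows naturally: an animal whose experience pairs a stimulus with an action often enough should develop mirror-like responses to that stimulus.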

Mirror neurons, then, show us that something unusual is going on in our brains.  They reveal that we use all of our senses to relate physically to movement and emotion in others, and to understand our world.  They are part of the system we use to learn and imitate words and actions, to communicate through language, and to interact with our world as an embodied activity.

However beyond this, we cannot yet see what else they reveal.  Until we do, our conclusions about these neurons must remain ‘as dim reflections in a mirror’.

Conclusions

  • Monkey mirror neurons relate the observations of intentional movements to a sense of meaning.
  • The human mirror network activates in response to all types of human movements, including the largely ‘hidden’ movements of our vocal apparatus when we speak.

    Double rainbow. The second rainbow results from a double reflection of sunlight inside the raindrops, which act like a mirror as well as a prism.  The colours of this extra bow are in reverse order to the primary bow, and the unlit sky between the bows is called Alexander’s band, after Alexander of Aphrodisias who first described it (Image: Wikimedia Commons)

  • These neurons are a component of the neural network that allows us to internally code meaning into our words, and ‘embody’ our memory of the idea they symbolise.
  • The mirror network neurons seem to be part of an expanded empathy mechanism that connects higher and lower brain areas, allowing us to understand our diverse experiences from objects to ideas.
  • These cells are recruited into the mechanism by which we learn symbolic associations between items (such as words and their meanings).  This suggests that it is our thinking process, rather than the cells of our brain, that makes us uniquely human.

Text copyright © 2015 Mags Leighton. All rights reserved.

References
Aboitiz, F & García V R (1997) ‘The evolutionary origin of the language areas in the human brain: a neuroanatomical perspective’  Brain Research Reviews 25(3); 381-396  doi: 10.1016/S0165-0173(97)00053-2
Aboitiz, F et al. (2005) ‘Imitation and memory in language origins’  Neural Networks 18(10); 1357  doi: 10.1016/j.neunet.2005.04.009
Arbib, M (2005) ‘The mirror system hypothesis; how did protolanguage evolve?’  Ch 2 (p21-47 ) in Language Origins  -  Tallerman M (ed), Oxford University Press, Oxford.
Arbib, M A (2005) ‘From monkey-like action recognition to human language; an evolutionary framework for neurolinguistics’  Behavioral and Brain Sciences 28(2); 105-124
Aziz-Zadeh L et al (2006) ‘Congruent Embodied Representations for Visually Presented Actions and Linguistic Phrases Describing Actions’  Current Biology 16(18); 1818-1823
Braadbaart, L (2014) ‘The shared neural basis of empathy and facial imitation accuracy’  NeuroImage 84; 367 – 375
Bradbury J (2005) ‘Molecular Insights into Human Brain Evolution’  PLoS Biology 3(3); e50  doi: 10.1371/journal.pbio.0030050
Carr, L.et al. (2003) ‘Neural mechanisms of empathy in humans: A relay from neural systems for imitation to limbic areas’    Proceedings of the National Academy of Sciences of the United States of America  100(9); 5497-5502
Catmur, C. et al (2011) ‘Making mirrors: Premotor cortex stimulation enhances mirror and counter-mirror motor facilitation’  (2011) Journal of Cognitive Neuroscience, 23 (9), pp. 2352-2362.  doi: 10.1162/jocn.2010.21590  http://www.mitpressjournals.org/doi/pdf/10.1162/jocn.2010.21590
Catmur C et al. (2007)  ‘Sensorimotor Learning Configures the Human Mirror System’  Current Biology 17(17) 1527-1531  http://www.sciencedirect.com/science/journal/09609822
Corballis MC (2002)  ‘From Hand to Mouth: The Origins of Language’ Princeton University Press, Princeton, NJ, USA
Corballis MC (2003)  ‘From mouth to hand: Gesture, speech, and the evolution of right-handedness’   Behavioral and Brain Sciences 26(2); 199-208
Corballis, M (2010) ‘Mirror neurons and the evolution of language’ Brain and Language 112(1); 25-35  doi: 10.1016/j.bandl.2009.02.002
Corballis, M.C. (2012) ‘How language evolved from manual gestures’ Gesture 12(2); PP. 200 – 226
Ferrari PF et al. (2003) ‘Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex’ European Journal of Neuroscience 17 (8); 1703–1714
Ferrari, P.F. et al (2006) ‘Neonatal imitation in rhesus macaques’  PLoS Biology 4(9); 1501-1508  doi: 10.1371/journal.pbio.0040302
Galantucci, B et al (2006) ‘The motor theory of speech perception reviewed’  Psychonomic Bulletin & Review 13(3); 361-377  PMCID: PMC2746041  http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2746041/
Gallese V et al (1996)  ‘Action recognition in the premotor cortex’  Brain 119; 593-609
Gentilucci M & Corballis MC (2006) ‘From manual gesture to speech: A gradual transition’  Neuroscience & Biobehavioral Reviews 30(7); 949-960
Hage, S.R., Jürgens, U. (2006) Localization of a vocal pattern generator in the pontine brainstem of the squirrel monkey  European Journal of Neuroscience 23(3); 840 – 844  doi: 10.1111/j.1460-9568.2006.04595.x
Heyes C (2010) ‘Where do mirror neurons come from?’  Neuroscience and Biobehavioral Reviews 34(4); 575-583  http://www.sciencedirect.com/science/article/pii/S0149763409001730
Heyes CM (2001) ‘Causes and consequences of imitation’ Trends in Cognitive Sciences 5; 245–261
Hickok G (2012)  ‘Computational neuroanatomy of speech production’ Nature Reviews Neuroscience 13, 135-145  doi:10.1038/nrn3158
Jürgens, U (2003) From mouth to mouth and hand to hand: On language evolution Behavioral and Brain Sciences 26(2); 229-230
Kemmerer D and Gonzalezs-Castillo J (2008) ‘The Two-Level Theory of verb meaning: An approach to integrating the semantics of action with the mirror neuron system’  Brain Lang. 112(1);54-76  doi: 10.1016/j.bandl.2008.09.010. Epub 2008 Nov 8.
Keysers C & Gazzola V (2009)  ‘Expanding the mirror: vicarious activity for actions, emotions, and sensations’ Curr Opin Neurobiol. 2009 Dec;19(6):666-71. doi: 10.1016/j.conb.2009.10.006. Epub 2009 Oct 31. Review.
Kohler E et al (2002) ‘Hearing Sounds, Understanding Actions: Action Representation in Mirror Neurons’  Science 297(5582); 846-848  doi: 10.1126/science.1070311
Molenberghs P et al (2012) ‘Activation patterns during action observation are modulated by context in mirror system areas’  NeuroImage 59(1); 608-615
Molenberghs P et al. (2012)  ‘Brain regions with mirror properties: A meta-analysis of 125 human fMRI studies’  Neuroscience & Biobehavioral Reviews 36(1); 341-349  http://dx.doi.org/10.1016/j.neubiorev.2011.07.004
Molenberghs P, et al. (2009)  Is the mirror neuron system involved in imitation? A short review and meta-analysis.  Neurosci Biobehav Rev. 2009 Jul;33(7):975-80. doi: 10.1016/j.neubiorev.2009.03.010. Epub 2009 Apr 1.
Mukamel R et al (2010) Single-neuron responses in humans during execution and observation of actions Current biology 20(8); 750–756  http://dx.doi.org/10.1016/j.cub.2010.02.045
Pohl, A et al.  (2013) ‘Positive Facial Affect – An fMRI Study on the Involvement of Insula and Amygdala’  PLoS One 8(8): e69886. doi:10.1371/journal.pone.0069886  PMCID: PMC3749202
Prather JF, Peters S, Nowicki S & Mooney R (2008) ‘Precise auditory-vocal mirroring in neurons for learned vocal communication’  Nature 451; 305-310
Pulvermüller F (2005) ‘Brain mechanisms linking language and action’  Nature Reviews Neuroscience 6; 576-582
Pulvermüller F et al (2005) ‘Brain signatures of meaning access in action word recognition’  Journal of Cognitive Neuroscience 17; 884-892
Pulvermüller F et al (2006)  ‘Motor cortex maps articulatory features of speech sounds’ PNAS 103 (20); 7865–7870  doi: 10.1073/pnas.0509989103
Rizzolatti G & Luppino G  (2001)  ‘The cortical motor system’ Neuron 31:889–901
Rizzolatti G (1996)  ‘Premotor cortex and the recognition of motor actions’  Cogn. Brain Res. 3; 131-41
Pavlov IP (1927)  ‘Conditioned Reflexes; an investigation of the physiological activity of the cerebral cortex’  OUP, London (republished 2003 as ‘Conditioned Reflexes’ Dover Publications Ltd, NY, USA)
Rizzolatti G, et al (1996)  ‘Localization of grasp representation in humans by PET: 1. Observation versus execution’  Exp. Brain Res. 111; 246-52
Rizzolatti G, Fogassi L, Gallese V. 2002.  ‘Motor and cognitive functions of the ventral premotor Cortex’  Curr. Opin. Neurobiol. 12:149–54
Rizzolatti G. et al (2001) ‘Neurophysiological mechanisms underlying the understanding and imitation of action’  Nature Reviews Neuroscience,2(9);661-670. doi: 10.1038/35090060
Umilta et al (2001) ‘I know what you are doing.  A neurophysiological study’  Neuron 31:155-165