Putting things into words

The dolphin calf is barely three weeks old.

His mother nuzzles at him, calling softly. The calf responds, mimicking her call.

This unique set of sounds is his signature: the name that his mother is teaching him to recognise, and which she will use to call him as she teaches him to hunt. Later he will use this signature whistle so that others in his pod will recognise him.

Again and again as his mother calls this name, he repeats it back.

Words hold ideas in code. As in all communications, the meaning of a signal must be agreed between the sender and receiver. We give our words their meaning by shared agreement.

A bonobo (Pan paniscus) ‘fishing’ for termites using a stick tool, at San Diego Zoo (Image: Wikimedia Commons)

As we remember and recall them, we access the information they hold. Collectively we use our words as tools to store information in symbolic form, and so bring our memories ‘to mind’.

Human languages, whether sung or spoken, produce words by controlling the pitch and articulation of these distinct sets of sounds with the lips and tongue. We process our words in the brain through the same fine motor control circuits as we (and our primate cousins) use to coordinate our hands and fingers.

This means that we use words almost as if they are tools in our hands. At the neurological level, words are gestures to which we give meaning, and then use as tools to share that meaning.

Why did our ancestors need words?

Hearing words brings ideas to mind, coordinating the thoughts of our social group. New words are coined when that group agrees to associate a syllable sequence (a word), itself distinct from existing words, with a new unique meaning. Some other animals, e.g. dogs, can learn to associate human word sounds or gestures with simple meanings.

Dogs (Canis lupus familiaris) have lived alongside humans for over 33,000 years. We have selectively bred these animals to shepherd, hunt, sniff out and retrieve our targets for us upon demand. Human-directed selection has ‘evolved’ working dogs that can be trained to recognise around 200 human words. Commands, or proxies for them such as whistled signals, are tools used to coordinate their activity with ours (Image: Wikimedia Commons)

However, our understanding and use of words as symbolic tools is highly flexible. Our application of words is often playful; for example, puns and ambiguities can extend the meaning of a word, or apply it in a new way. This furthers our use of these tools for social interaction.

Perhaps starting around 2.5 Ma, our ancestors began to experience selective forces that ultimately promoted a remarkable mental flexibility, resulting in the development of elaborate and multi-purpose manual tools. This expanded tool use corresponds with the onset of cultural learning. The making and development of tools is learned from our social group, as is our speech.

Are we then unique in our ability to coin new words? Dolphins and some other whales broadcast ‘signature calls’ when hunting in murky and deep water, enabling them to stay connected with their pod. Vocal self-identifying calls would have provided similar benefits to our hominin ancestors in dense, low-visibility forest habitats, and perhaps also across large distances in open grassland habitats. Specific word tools for sharing information, e.g. warning of snakes or poisonous fruit, would enable a group to navigate their world more effectively together than its members could alone.

We often talk whilst using our hands to make additional gestures, or operate tools. Here those tools are knives, forks and wineglasses (Image: Wikimedia Commons)

As with manual tools, the act of using words provides immediate feedback. Our language may have a gestural basis in the brain, but our vocal-auditory speech mode is much more efficient. Although we often move our hands when we talk, we can speak whilst conducting other manual tasks.

How did our ancestors begin to use words as tools?

– Peter MacNeilage suggests that our language arose directly as vocal speech. Our ancestors’ circumstances may have selected for specific vocal signals, received using their auditory communication channel, whilst their hands were busy with other tasks. This could include hunting with manual tools, foraging or attending to their young.

– William Stokoe and others argue instead that sign came first. Hand gestures use the visual channel as a receiver. They suggest that vocal gestures emerged later, perhaps as a combined visual and auditory signal.

This photograph of the New York Curb Association market (c1916) shows brokers and clients signalling from street to offices using stylised gestures. Similar manual signs have arisen in many open floor stock exchanges around the world, where they made it possible to broker rapid ‘face-to-face’ deals across a noisy room (known as the ‘open outcry’ method of trading). Today these manual languages have been largely superseded by the advent of telephones and electronic trading through the 1980s and 1990s (Image: Wikimedia Commons)

In practice, we often use manual and vocal channels synchronously, but they don’t mix; we never create words that oblige us to combine hand movements with mouth sounds. Sign languages based on gestures do arise ‘naturally’ (i.e. much like a pidgin language), usually in response to a constraint, such as where deafness is present in some or all of the population, or where other forms of common language are not available within the group. Manual languages arising under such circumstances reveal just how flexible and adaptable our speech function really is.

Before our ancestors could assign meaning to words, however, they had to learn how to copy and reproduce the unique movements of the lips and tongue that each new word requires.

What might those first words have been?

Babbling sounds are learned. Hearing-impaired infants start to babble later, are less rhythmical and use more nasal sounds than babies with normal hearing. Children exposed only to manual sign language also ‘babble’ with their hands. Language learning robots that hear and interact with adult humans quickly pick out relevant one-syllable words from a stream of randomly generated babble. These initial syllables act as ‘anchors’, allowing the machines to more quickly distinguish new syllables from the human sounds they hear (Image: Wikimedia Commons)

Babies begin to control the rhythmical movements involved with both eating and vocalising as they start to babble, at around 4-6 months. Making these movements involves coordinating the rhythmical nerve outputs of multiple Central Pattern Generator neural circuits.

Central Pattern Generators operate various repetitive functions of the body, including breathing, walking, the rhythmic arm movements that babies often make as they babble, and the baby hand-to-mouth grab reflex.

Babies begin to babble simply by moving their lower jaw at the same time as making noises with the larynx. These sounds are actually explorations of syllables the child has already heard; around half are the simple syllables used in ‘baby talk’.

Learning to make these sounds involves mastering the simplest of our repetitive vocal movements; typically this involves opening and closing the jaw with the tongue in one position (front, central or back) inside the mouth.

To suckle, babies raise the soft palate in the roof of their mouths to close off the nasal cavity. This creates a vacuum in the mouth that enables them to obtain milk from the breast. After swallowing, the infant opens the soft palate to take a breath through the nose; this often results in an ‘mmm’ sound in the nasal cavity (Image: Wikimedia Commons)

Say ‘Mmmmm’, then ‘ma-ma’… Where do you feel this sound resonating? A suckling child’s murmuring sounds have this same nasal resonance.

Our first vocal sounds as babies show our desire to connect with our parents. This connection is two-way; neural and hormonal responses are triggered in human parents upon hearing the cries of their child.

A baby makes nasal murmuring sounds when its lips are pressed to the breast and its mouth is full. Perhaps as a mother repeats these soothing sounds back to her child, they become a signal that the infant associates with its mother and later mimics to call to her. Selection may have favoured hominins able to connect with their offspring using vocal sounds.

Unlike young chimps, who cling to their mothers, human babies need to be carried. A hominin mother able to soothe her child with vocal sounds could put down her baby and forage with both hands.

There is more. Consider walking. Adopting an upright posture provoked a re-structuring of our hominin ancestors’ body plan.

Breathing and swallowing whilst standing up required a re-orientation of the larynx. This organ acts as a valve controlling the out-breath, prevents food entering the trachea, and houses the vocal folds (vocal cords) controlling the pitch and volume of the voice.

The nasal resonant consonants of ‘mama’ are made with the tongue at rest and soft palate open (above). In this position the nasal cavity is continuous with the mouth and throat. To produce ‘dada’, the soft palate elevates (below), closing off the nasal cavity and limiting resonance to the oral chamber (Image: Modified from Wikimedia Commons)

Breathing and eating whilst standing upright also requires that the soft palate (velum) in the roof of the mouth can move up and down, closing off the nasal cavity from the throat when swallowing. Moving the soft palate also changes the size and connection between the resonating chambers in our heads.

‘Ma-ma’ sounds are made with the soft palate open, while the jaw opens and closes to bring the lips together. Closing the soft palate shifts resonance into the mouth, producing ‘pa-pa’ from the same movement.

Most world languages have nasal and oral resonance in their ‘baby talk’ names for ‘mother’ and ‘father’. Peter MacNeilage highlights this as the only case of two contrasting phonetic forms being regularly linked with their opposing meanings. The desire of hominin infants to connect specifically with one or the other parent may have resulted in them producing the first deliberately contrasted sounds.

Could these sounds, perhaps along with sung tones, have been part of our first ‘words’?

How do we apply meanings to these vocal gestures?

Chimpanzees make spontaneous vocal sounds that express emotions such as fear and pleasure, much as we do. They also communicate intentionally, using non-verbal facial gestures e.g. lipsmacks.

We gesture with our faces, hands and voices across all languages and cultures. Human baby babbling is also voluntary, combining sound from the larynx with lipsmack-like movements to create simple syllables.

Mandarin Chinese is a tonal language; words with different meanings are coded into the same syllable using changes in pitch. The four main tones of standard Mandarin can each be pronounced with the syllable “ma” (Image: Wikimedia Commons)

The initiation of vocal sounds arises from different regions of the brain in humans and other primates. Primate calls arise from emotional centres within the brain (associated with the limbic system), whereas human speech circuits are focussed around the lateral sulcus (Sylvian fissure).

Within the lateral sulcus, a zone of the macaque brain (area ‘F5’), thought to be the equivalent of Broca’s area in humans, houses nerve pathways called ‘mirror neurons’. The mirror circuits are involved with producing and understanding grasping actions associated with obtaining food, decoding others’ facial expressions, and making mouth movements related to eating.

These circuits reveal that neurological routes link hand-and-mouth action commands. Broca’s area in humans is essential for speech, hand gestures and producing fine movements in the fingers.

A female blackbird (Turdus merula) with nest building materials. As well as speaking and manipulating food in a precise way, many animals and birds use their mouths as a tool to manipulate objects. The materials with which this blackbird builds her nest are also tools. Using the eating apparatus to perform other tasks is common amongst vertebrates (Image: Wikimedia Commons)

This and other higher brain areas control Central Pattern Generator circuits in the lower brain which coordinate eating movements and voice control. The same circuits that operate grasping gestures with the hands also trigger the mouth to open and close in humans and other higher primates (the automatic hand-to-mouth reflex of babies).

Mirror neuron networks in humans interconnect with the insula and amygdala; components of the limbic system that are involved in emotional responses. Maurizio Gentilucci and colleagues at the University of Parma suggest that mirror neurons which link these components of the emotional brain with higher brain circuits for understanding the intention of food grasping gestures may have enabled our ancestors to associate hand or mouth gestures with an emotional content. Tagging our observations with an emotional response is how we code our own vocal and other gestures with meaning.

Pronunciation chart for Toki Pona, a simple constructed language designed by Toronto-based linguist Sonja Lang. Toki Pona has 14 main sounds (phonemes) and around 120 root words, and was designed as a means to express maximal meaning with minimal complexity. Each human language uses a unique selection of sounds from the syllables which are possible to make using our vocal apparatus. As our children learn their first words, they replicate the spoken sounds they hear. In this way, the sounds we learn as part of our first languages are specific to our location and circumstances (our environment), and reproduce local nuances in pronunciation. As we become fluent in our native language, producing these sounds becomes ‘automatic’. We are rarely conscious of the syllables we choose, focussing instead on what we want to say. People learning a foreign language as adults tend to speak that language using the sound repertoire of their native tongue. To listen to the same panphonic phrase repeated in a variety of accents (and to contribute your own if you wish), visit the speech accent archive (Image: Wikimedia Commons)

Many primates vocalise upon discovering food. Gestures then may be a bridge linking body movement to objects and associated vocal sounds. Hearing but not seeing an event take place allows the hearer to visually construct an idea of the associated experience in their mind. Once an uttered sound could trigger an associated memory, our hominin ancestors could then revisit that experience.

When we hear or think of words that describe an object or a movement, the same mirror neuron circuits are activated as when we encounter that object or make the movement. Thinking of the words for walking or dancing also triggers responses in our mirror neuron network that are involved with walking or dancing movements. When we think of doing something, and then do it, we are literally ‘walking our talk’.

Conclusions

Words are tools produced by unique sets of movements in the vocal apparatus. They may have developed in our hominin ancestors as a sound-based form of gesture.

Young chimpanzees from the Jane Goodall sanctuary of Tchimpounga (Congo Brazzaville). Wild chimpanzees (Pan troglodytes) make ‘pant hoot’ calls upon finding food, such as a tree laden with fruit. These calls are recognisable by other members of their group. Adjacent groups of wild chimps with overlapping territories adjust and re-model their pant hoot calls so that their group call signature is distinct from that of the other group. These remodelled calls seem to indicate group learning amongst these animals (Image: Wikimedia Commons)

Studying how our babies learn to speak gives us some insights into how hominins may have made the transition to talking. Our ancestors’ first word tools may have been parental summoning calls. The vocal calls of babies assist them to bond strongly with their parents.

Words inside the brain replicate our physical experience of the phenomena that they symbolise.

The mental flexibility to agree new sound combinations and associate these with meaning provided our hominin ancestors with a powerful resource of vocal tools that allow us to share our learning. This ability to share learning has many potentially selectable survival advantages.

References

Davis, B.L. and MacNeilage, P.F. (1995)  The articulatory basis of babbling.  Journal of Speech, Language, and Hearing Research 38, 1199-1211.

Eisen, A. et al. (2013)  Tools and talk: an evolutionary perspective on the functional deficits associated with amyotrophic lateral sclerosis.  Muscle and Nerve 49, 469-477.

Falk, D. (2004)  Prelinguistic evolution in early hominins: whence motherese? Behavioural and Brain Sciences 27, 491-541.

Gentilucci, M. and Dalla Volta, R. (2008) Spoken language and arm gestures are controlled by the same motor control system. Quarterly Journal of Experimental Psychology 61, 944-957.

Gentilucci, M. et al. (2008) When the hands speak. Journal of Physiology-Paris 102, 21-30.

Goldman, H.I. (2001)  Parental reports of “mama” sounds in infants: an exploratory study.  Journal of Child Language 28, 497-506.

Jakobson, R. (1960)  Why “Mama” and “Papa”?  In Essays in Honor of Heinz Werner (R. Jakobson, ed.) pp. 538-545.  Mouton.

Johnson-Frey, S.H. (2003)  What's so special about human tool use?  Neuron 39, 201-204.

Johnson-Frey, S.H. (2004)  The neural bases of complex tool use in humans.  Trends in Cognitive Sciences 8, 71-78.

Jürgens, U. (2002)  Neural pathways underlying vocal control.  Neuroscience and Biobehavioural Reviews 26, 235-258.

King, S.L. and Janik, V.M. (2013)  Bottlenose dolphins can use learned vocal labels to address each other.  Proceedings of the National Academy of Sciences, USA 110, 13216-13221.

King, S. et al. (2013) Vocal copying of individually distinctive signature whistles in bottlenose dolphins. Proceedings of the Royal Society of London, B 280, 20130053.

Lieberman, P. (2006)  Toward an Evolutionary Biology of Language. Harvard University Press.

Lyon, C. et al. (2012)  Interactive language learning by robots: the transition from babbling to word forms.  PLoS ONE 7, e38236.

MacNeilage, P. (2008)  The Origin of Speech. Oxford University Press.

MacNeilage, P.F. and Davis, B.L. (2000)  On the origin of internal structure of word forms.  Science 288, 527-531.

MacNeilage, P.F. et al. (2000) The motor core of speech: a comparison of serial organization patterns in infants and languages. Child Development 71, 153-163.

MacNeilage, P.F. et al. (1999)  Origin of serial output complexity in speech.  Psychological Science 10, 459-460.

Matyear, C.L. et al. (1998)  Nasalization of vowels in nasal environments in babbling: evidence for frame dominance.  Phonetica 55, 1-17.

Mitani, J.C. et al. (1992)  Dialects in wild chimpanzees?  American Journal of Primatology 27, 233-243.

Petitto, L.A. and Marentette, P. (1991)  Babbling in the manual mode: evidence for the ontogeny of language.  Science 251, 1493-1496.

Savage-Rumbaugh, E.S. (1993)  Language comprehension in ape and child.  Monographs of the Society for Research into Child Development 58, 1-222.

Stokoe, W.C. (2001)  Language in hand: Why sign came before speech. Gallaudet University Press.

Tomasello, M. (1999)  The Human Adaptation for Culture.  Annual Review of Anthropology 28, 509-529.