Human Language and Machine Language
The explosion of large language models means that we are all suddenly awash in machine-generated language, spoken, written and even sung. There are many advantages to artificial intelligence deployed in this way, and many disadvantages too: it remains to be seen what the eventual outcome will be. But there is one particular idea that keeps bubbling up to the surface: that AI is going to decide to kill us all. This, I suggest, is based on a misunderstanding of the nature of language.
As English is the international language of science, computer scientists, and scientists in other fields who use computers (meaning almost everyone), have to speak English, though it is possible to get by in Chinese alone as long as you stay in China. This means that British, American and other native speakers of English have a natural advantage in that they already speak the language, but they also tend to be monoglots: a tendency that is getting stronger with time. While 20% of Britons claim to speak another language fluently, the fact that the most widespread of these other languages are Urdu, Hindi, Punjabi, Cantonese and Welsh tells its own story. Successive governments have tried to impose language learning on the population through education policies, but these are failing. The situation is similar in the United States, where about 20% of the population speaks a second language, overwhelmingly Spanish but with other immigrant community languages also represented; and again, decades of compulsory language education have failed to make native English speakers actually learn another language.
This means that most of the British and American computer scientists working on large language models lack one essential skill that would give them insight into how language works: fluency in another language. Indeed, not just other languages but the whole science of linguistics is seen as irrelevant. If your model of language production is ChatGPT, then this is natural enough. Language is something produced instantly and cleanly by computation, so why bother analysing the slow and inefficient processes by which human beings do it?
The talking computer is fast approaching, and though understanding natural speech is still some way off, this is toothpaste that will not go back into the tube. The first scientist to make a serious attempt to produce one was Alan Turing. It is a curious story.
The most famous and also the most influential paper in computer science is Turing’s 1950 journal article ‘Computing Machinery and Intelligence’, published in Mind. Right at the end of that paper there is a strange passage. “It can also be maintained that it is best to provide the machine with the best sense organs money can buy, and then teach it to understand and speak English. This process could follow the normal teaching of a child. Things would be pointed out and named, etc.”
Turing is discussing a hypothetical ‘learning machine’, and his concept of such a machine was informed by two intellectual movements of the 1940s: Behaviourism and Cybernetics. Behaviourism had an enormous vogue, and seemed to offer a practical alternative to the Freudian psychoanalysis that had swept through a previous generation in the English-speaking world. It was not necessary to understand some arcane inner processes to understand human psychology. All you had to do was consider outward behaviour, and this could be studied by looking at inputs and outputs only. One reason for this, according to the Behaviourists, is that it is impossible to understand the inner processes involved. This is often called a ‘black box’ model, meaning that what goes on inside the box is dark and unknowable.
In Turing’s paper we read “An important feature of a learning machine is that its teacher will often be very largely ignorant of what is going on inside, although he may still be able to some extent to predict his pupil’s behaviour.” He explicitly contrasts this with normal computing, where the programmer does know exactly what is going on. The learning machine is so designed and constructed that it will generate complex internal behaviour, not exactly random but sufficiently incoherent that there is no point trying to understand it. This will result in learning. The way to harness this is to add a camera, microphone and loudspeaker so that the learning machine is connected to the outside world, and specifically to its teacher and benefactor, namely Alan Turing himself.
The electrical model for this black box was provided by the other advanced idea of the 1940s, Cybernetics. On the face of it, this mathematically based theory has little in common with Behaviourism, but Turing evidently thought he had created a workable synthesis. Cybernetics is a large theory, but one aspect that evidently caught his eye was the idea that the most perfect cybernetic system yet produced by evolution is the human nervous system, which can be described structurally as a large and complex feedback system for processing information and generating action while maintaining self-regulation. In the context of computing, this belief would naturally give rise to the idea that a computer built along the same lines would essentially be a synthetic human being: and this is how Turing writes of his ‘learning machine’. It is ‘like a child’. It is a ‘child machine’.
Turing was intensely interested in cybernetics. At the time of his Mind article he was an active member of the Ratio Club, an informal society founded in 1949 by the neurologist John Bates, whose members gave papers on various aspects of what seemed to them to be the coming cybernetics revolution, focusing in particular on modelling the brain and nervous system, and on developing self-regulating automatic machines. The club included all the leading British cyberneticists of the time (not that many, actually). It is a great shame that the papers Turing gave there are lost. The list of papers has survived, and on 7 December 1950 Turing gave a talk entitled ‘Educating a Digital Computer’, which would have been particularly interesting in this context.
He was given an opportunity to put these ideas into action. In late November 1946, well before the foundation of the Ratio Club, Turing, evidently already under the spell of Cybernetics, had written to the psychiatrist W. Ross Ashby, creator of the Homeostat and author of Design For A Brain (1952), who would become a leading light in the Club and was the most prominent British practitioner of Cybernetics. Turing was already at the planning stage of building his own computer, the Automatic Computing Engine or ACE, at the National Physical Laboratory at Teddington. In his letter to Ashby he says “I am more interested in producing models of the action of the brain than in the practical applications of computing. (…) The ACE will be used, as you suggest, in the first instance in an entirely disciplined manner, similar to the action of the lower centres, although the reflexes will be extremely complicated. The disciplined action carries with it the disagreeable feature, which you mentioned, that it will be entirely uncritical when anything goes wrong. It will also be necessarily devoid of anything that could be called originality. There is, however, no reason why the machine should always be used in such a manner: there is nothing in its construction which obliges us to do so. It would be quite possible for the machine to try out variations of behaviour and accept or reject them in the manner you describe and I have been hoping to make the machine do this.”
We see that with its talk of ‘the lower centres’ and ‘reflexes’ the nervous system model is self-evident, but the actual machine needed to be built. Witnesses describe him putting in long, obsessive hours at Teddington over the next couple of years, not on programming, but on soldering, physically putting together the system he had designed. A peculiarity of his situation was that he couldn’t explain to anyone how he had learned about computers and how to build them. As a leading figure in the wartime effort to defeat German and then Japanese military ciphers, he had already been deeply involved in building very successful codebreaking machinery, but the existence of this historic project was a state secret which he took to his grave. So nobody else at the laboratory really knew what he was up to. On the balance of probabilities, it seems that the ACE was supposed to be the ‘digital computer’ that he was planning to educate, and the ‘learning machine’ of his Mind paper. In effect, he was hoping to create his own electrical Frankenstein’s monster: a child of his own to bring up and talk to, as described in his papers. A child machine intelligence which he could nurture and teach. While his paper ostensibly deals with intelligence, what he seems to be doing in reality is attempting a kind of artificial life.
The project failed. Turing, frustrated with the lack of progress, abandoned the task, and the ACE was completed by lesser men, who took the idiosyncratic design with its vacuum tubes and mercury delay lines and made it into something workable. When IBM produced transistor-based machines, that type of design suddenly became obsolete and independent British computing came to a halt. Turing’s great Unfinished Symphony at no point showed any sign of starting to talk or to behave independently, but perhaps the mere technicians who worked on it after him didn’t know what to look for.
Can we see this phantasmagorical contraption as a prescient hardware version 1 of ChatGPT? Not in detail: while today’s neural networks use models distantly related to biological neurons, there is no cybernetic modelling of LLMs on the structure of the human nervous system, and there is more to the interface than just hooking up a camera, microphone and loudspeaker. But there is something in the Behaviourist black box idea that looks similar. LLM engineers do indeed say that they don’t know what these systems will do next, and do indeed often seem at a loss to explain exactly how their creations produce their results. And whatever they might say, this lack of understanding is demonstrated in practice by the many failures to control these systems, as when they start declaring their admiration for Hitler or advising users to kill other people or themselves.
The pre-eminent theorist of artificial neural networks today is Geoffrey Hinton, awarded the 2024 Nobel Prize in physics for this work. Having famously left Google so as to be able to warn people about the dangers of AI, he has become a highly visible critic of the large language model explosion, with the unpredictability of their behaviour a recurring theme. This leads him to talk about language at length, and, as another British-educated monoglot English speaker, he has ideas about linguistics that you could describe as ‘armchair’.
What Hinton seems to believe is that since LLMs do such a creditable job of simulating human language production, he and his colleagues must somehow have hit upon the actual mechanism that generates language in people too. As far as I know, no academic linguists share this belief. Hinton in fact has taken to openly ridiculing academic linguistics, apparently for the failure of linguistic theory to explain how LLMs do their stuff. Noam Chomsky, who founded modern linguistic theory in the 1950s directly as a counterweight to the Behaviourism that was then infecting linguistics too, is, Hinton declares, obsolete. It is hard to believe from what he says that he has read one word of Chomsky, as his description of Chomsky’s theories is an absurd caricature.
The work of Chomsky himself and of his many students and followers has gone through many phases and changes of concepts and terminology, but in general terms his core idea is that there is a single unifying logical structure (deep structure) that underlies all human languages. So even languages with radically different surface structures are actualizations of the same built-in human capacity for language, which produces structurally identical patterns of phonology and grammar, and arguably also semantics, at a sufficient level of analysis. Surface structure is gradually generated by the human language instinct or capacity in the early years of life, and this depends on interactions with other people, so that cultural differences produce the great variety of human languages. This was a great sea change in linguistics, which had been largely concerned with studying the features of individual languages, or their family connections: an approach mainly deriving from its roots in classical philology.
One might think that this formal approach, using a lot of mathematical terminology and seeking out common formal structures, would appeal to Hinton and other theorists of computer-generated language, but there is a fatal sticking point. Chomsky gives the analogy of saying that an aeroplane flies like an eagle. It is kind of true in a limited way, but the aeroplane clearly does not fly exactly like an eagle. The LLM produces language like a human being. Yes, in a way it does. It can produce a term paper or the lyrics for a country and western number. But it just as obviously does not really talk like a human being. The theoretical sticking point is that central idea of the uniform deep structure of all human languages. LLMs do not work like that at all. They make statistical inferences from the surface structures only.
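To make this concrete, here is a deliberately tiny sketch, in Python with an invented three-sentence corpus, of what ‘statistical inference from surface structure’ means at its crudest: count which word follows which, then emit the most probable continuation. Real LLMs use transformer networks rather than bigram counts, so this is an illustration of the principle only, but the principle is the point: nothing below knows anything about deep structure, grammar or meaning, only about which strings tend to follow which.

```python
from collections import Counter, defaultdict

# A toy "corpus": surface strings only, no grammar, no meaning (invented example).
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

# Count which word follows which (a bigram table).
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1

def continue_text(word, length=5):
    """Extend a prompt by repeatedly choosing the most frequent next word."""
    output = [word]
    for _ in range(length):
        candidates = follows.get(output[-1])
        if not candidates:
            break
        output.append(candidates.most_common(1)[0][0])
    return " ".join(output)

print(continue_text("the"))  # prints something like "the cat sat on the cat"
```

The output is fluent-looking surface with nothing behind it, which is exactly the distinction being drawn here, however much more sophisticated the statistics of a real LLM may be.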
Leaving aside Hinton’s inexplicable and misguided personal grudge against Chomsky, many other contemporary computer scientists, now suddenly all interested in language, make what is fundamentally the same mistake. They have learned several computer languages: does this not make them experts on language in general? Is there really any significant difference between a computer language and a natural language? Aren’t they both just sets of rules and symbols by which well-formed strings can be formed? And isn’t the success of LLMs proof of this?
The answer to all of these questions is ‘No’.
Music is often said to be a language, but it would take a brave musicologist to propose that all human languages share the structures of a sonata or a polka. It is just an accident of terminology. If Basic and Cobol had been called ‘computer codes’ rather than ‘computer languages’ this confusion would never have arisen. Computer languages have little in common with natural languages apart from their misleading name.
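For what it is worth, the ‘rules and symbols’ view really does exhaust what a computer language is. The sketch below, in Python with an invented toy grammar, mechanically enumerates the well-formed strings of a miniature language by expanding rewrite rules; a compiler’s grammar is this, writ large, and nothing comparable exhausts English or Thai.

```python
import itertools

# An invented toy grammar: a "computer language" amounts to rules like these.
grammar = {
    "EXPR": [["NUM"], ["NUM", "OP", "EXPR"]],
    "NUM":  [["1"], ["2"]],
    "OP":   [["+"], ["*"]],
}

def expand(symbol, depth):
    """Yield every string the grammar can produce, up to a recursion depth."""
    if symbol not in grammar:          # terminal symbol: emit it as-is
        yield symbol
        return
    if depth == 0:
        return
    for rule in grammar[symbol]:
        # Expand each symbol of the rule, then take every combination.
        parts = [list(expand(s, depth - 1)) for s in rule]
        for combo in itertools.product(*parts):
            yield " ".join(combo)

for s in expand("EXPR", depth=3):
    print(s)  # 1, 2, 1 + 1, 1 + 2, 1 * 1, ... every well-formed string to that depth
```

That is the whole of such a ‘language’: a procedure for generating and recognizing well-formed strings, with no speakers, no acquisition and no meaning beyond what we assign to it.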
Moving to a different level of sophistication, Stephen Wolfram’s model of the universe as computation necessarily, trivially even, includes language along with everything else, so that it must be true from first principles that human language, as a natural phenomenon, can be explained as computation. But so can every other aspect of human behaviour and action. What is needed to give this idea some weight is a specific theory of language in computational terms. Ideally, this should align with important findings from mainstream linguistics. Again, identifying the core deep structure of real human languages and generating language from that by computation could look like a plausible approach. Instead, Wolfram reverts to Alan Turing’s idea of pointing at objects and naming them. If that were really how language acquisition or in fact language creation works, then yes, why not say that human languages and computer languages are basically the same?
Alan Turing never had children, and given his unusual, isolated life perhaps didn’t have much of an opportunity to observe a baby acquiring language. Pointing at objects and naming them is only one element in the process, and it is a quixotic decision to declare it the central mechanism.
The first words an English-speaking baby will say are likely to be Dada and Mama, and it stretches credulity to suggest that this is due to assiduous pointing out of these characters and repetitions of their names. It takes some time for real progress to start, but once it starts, toddlers acquire language at a fantastic rate, and any pointing and naming that might be done is basically irrelevant to this process.
There is a continuing debate in academic linguistics about the ‘poverty of the stimulus’, a term coined by Chomsky in 1980. One side says that children do not get enough information from other people or the environment to learn language as fast and as well as they do; the other says that there must be some other explanation we have not yet found. The first position is used to support the idea of universal grammar as something built-in.
It is impossible to ask babies how they are doing it, but there is some fascinating evidence about language acquisition from a practical experiment carried out in Thailand. In 1985, Dr J. Marvin Brown, an American linguist who, among other things, knew Thai, was put in charge of a course to teach Thai to adults. It is often said that adults lose the ability to learn a language the way babies do: he decided to put this to the test.
Thai is a difficult language, and traditional courses would have featured dense textbooks and many complex exercises. Instead, students were forbidden to use books or dictionaries and were not allowed to make notes. A key innovation was that they were also not allowed to talk in class. Rather than teachers, Brown recruited ordinary Thais with some acting ability who were also used to interacting with people (tour guides were a good source) and had them act out scenes and conversations resembling everyday life in front of the class, who were not allowed to ask questions. This continued for months, and after about 700 hours the students would, one by one, begin to speak Thai: not the variety learned from books, but true, native Thai. He reports that people from elsewhere in South-East Asia learned faster, not because of any linguistic similarity, but because Thais accepted them more naturally as people likely to speak their language. The main thing, though, was that the students didn’t cheat by looking at textbooks when they were not in class. Those who had studied other languages were at a disadvantage, as it seems that traditional methods of studying a language actually prevent you from learning it. The best results were achieved by students who made no effort at all. This system massively outperformed traditional teaching methods, and is still running in Thailand as the Automatic Language Growth programme.
From this experience quite a different idea of language acquisition emerges: no pointing and naming at all. Instead, a great flow of language, day after day, from people, from the TV, from songs… At first it is all incomprehensible, and then little by little the learner notices one word or phrase recurring and hangs on to that, and then another, and pieces it together; and then starts to reproduce it.
This looks like a good general model, but there is an important difference between adults learning a language and what babies do. Babies are not independent beings, and they need parents or other carers to look after them all the time. So what they learn to express from the beginning are their emotional or physiological needs rather than facts about the world. Observing a rock and dispassionately remarking ‘rock’ might take place, but wailing for food, screaming in terror or gurgling happily at mama’s breast are more the normal thing.
The point is that we are animals. The needs of a human baby are about the same as the needs of a baby chimpanzee or gorilla. Safety, comfort, food and so on. Our ability to vocalize is a development of primate vocalization, and much of what we need as babies is common to all mammals. As adults, we might move on to discussing Schopenhauer or quantum mechanics, but that does not stop us from being animals: which means that our language is an animal language. English is an animal language. Welsh. Thai. Whatever it is.
Turning to the origin of language in the sense of the passage from a population that had no language to one that did, conclusive evidence is lacking. Some point to the physiological development of the larynx, others to the development of physical symbols as some kind of proof of an ability for symbolic reasoning, but as the other human species are now extinct along with our pre-human ancestors, the best we can do is to look at our closest primate relatives, the great apes, especially chimpanzees and bonobos. They vocalize quite a lot, though obviously not in the articulate way that humans do. However, they do use facial expressions, a limited repertoire of vocalizations, and gestures of various kinds to communicate with members of their own groups, and also to warn off members of rival groups.
So if we are looking for an equivalent of the constant stream of linguistic performance that enables language acquisition in human beings, we have to look at a wider palette of activity than vocalization alone. As with humans, there is a special repertoire of behaviour for caring for infants, who are similarly dependent on adults in early life, but there is one thing which is so important in the life of chimpanzees that it can be said to be their main activity. This is grooming.
Chimps groom each other all day every day. The amount of time an individual will groom or be groomed varies, but in a social group, if they are awake, there is always grooming going on. Apart from its function of maintaining cleanliness by removing parasites, dirt and other foreign bodies, it maintains status hierarchies and also relationships among individuals. Sometimes groups of individuals will get together and groom one another collectively: it is a constant stream of sensation, demanding a kind of relaxed concentration and indicating that there is nothing more urgent that needs doing.
This activity continued in humans for a surprisingly long time. Mediaeval Europeans, in family groups and among friends, would combine grooming for lice with talking; the grooming itself needed no discussion, lice being such a universal affliction. The kind of distracted, low-pressure conversation that might have accompanied the search for lice is itself a kind of enhanced grooming activity, and we see in our daily lives that much conversation performs the same kind of social functions as grooming does in chimps and other primates.
At its simplest, grooming is a sign of equality, but it can also indicate submissive behaviour towards a social superior who will allow it to take place, or express other social or family dynamics. Much everyday language fulfils similar functions, though it is clear that there is a whole semantic aspect that is missing from the lives of other primates. But when a mother chatters away to her baby, who doesn’t understand and can’t answer, what is it but verbal grooming? Groups of women will often keep up a constant stream of relaxed talk whose purpose is not so much the exchange of information but to establish a friendly, safe atmosphere. Men will do something similar in groups, though the conversational dynamics are different. Verbal grooming once again.
Chimps are not born ready to groom, but they evidently have a grooming instinct which gradually develops, stimulated no doubt by the pleasurable feelings of being groomed themselves. If, as Chomsky has maintained for 70 years, human language works on roughly similar lines – an instinct that is activated after birth to perform a vital social function – this offers a paradigm that is more realistic than the Turing/Wolfram point-and-shoot idea. Indeed, simple nouns are quite a long way down the evolutionary road from primate crooning or the other vocalizations that primitively accompany grooming.
The version of language discussed by philosophers and other theorists tends to focus on its propositional qualities, A is B and so on, or its means of expressing logical relationships. But much of what we say expresses emotion rather than fact. Even when we are saying something that, if written down, might appear totally factual, there is often an emotional element to the statement as well, expressed by tone of voice or something similar. Doubt, suggestion, hidden menace… all the normal elements of daily conversation. Rock? Rock… Rock! etc. When we read something, we unconsciously put in some of this emotional content to humanize it. Written AI content already appears sufficiently human to fool us into doing this, and the synthetic AI voices are now so life-like that we increasingly can’t tell the difference between real animal emotion and the ersatz silicon variety. But there is no animal there.
What the animal element provides above all is motivation. The cat is motivated to go out and kill something at dusk. The rabbit is motivated to run away. I am motivated to write this paper. Computers are not motivated in that way at all. ChatGPT does not suddenly send us its spontaneous thoughts, because it doesn’t have any. It doesn’t want anything. It can no doubt be taught to simulate some specific kinds of motivation – the motivation to answer questions has already been simulated, for instance. But now that we are presented on a daily basis with ersatz linguistic content that appears to have been produced by human beings, we see that the famous Turing Test, with its Imitation Game, does not in fact convince us that we are dealing with an intelligent being. Instead, we just feel cheated.
So the dangers of AI, while real, have to be re-framed. AI will not suddenly decide to kill us. It doesn’t matter what level of artificial intelligence it reaches, it will not spontaneously develop an instinct for predation – not, at least, until someone invents an algorithm for artificial motivation. Who knows where that would lead? Artificial morality? Artificial religion? If AI kills us it will be because some human being has decided to use it for that purpose, or, just as likely, it will happen by mistake while a human being is trying to do something else and screws it up.