Measuring Human Intelligence
October 17, 2018 - Monocle Research Department
The term “artificial intelligence” was coined in 1956 by computer scientist John McCarthy, who used it to refer to the idea of defining all aspects of human intelligence in such detail that a computer could be programmed to simulate each aspect, and thus give the appearance of intelligence. But human intelligence is an inscrutable thing that cannot easily be defined, let alone replicated.
Intelligence is largely accepted to be a context-dependent construct that evolves over time and varies in different cultures, making it notoriously hard to define and measure. Western theorists have generally agreed that there are multiple areas of capability that make up intelligence, but they have often been divided on exactly what these areas are and how they relate to one another. Charles Spearman, for example, proposed the idea of generalised intelligence in 1904, positing that children who showed intelligence in one academic area – such as numerical reasoning – tended to display intelligence in other areas as well – such as verbal proficiency and spatial reasoning. Other psychologists argued in favour of specialised intelligence, with Howard Gardner famously outlining seven distinct areas of intelligence in 1983 (he later added an eighth), which included musical, bodily/kinesthetic, and interpersonal intelligences. He argued that a person was more likely to excel in one or some of these areas, but very rarely in all. In 1995, Daniel Goleman further suggested that it was emotional intelligence that was the most important factor for determining a person’s success, calling into question the long-standing emphasis on cognitive abilities.
Even as the theoretical study of intelligence has grown more complex, we have clung to the over-simplified notion of using a single metric to measure intelligence, conveniently reducing this multidimensional phenomenon to a neat, comparable number known as an Intelligence Quotient (IQ). Efforts to quantify human intelligence began in the 1800s through the work of Sir Francis Galton, who was the half-cousin of Charles Darwin. Following the publication of Darwin’s On the Origin of Species (1859), Galton became obsessed with recording variation in physical human traits, including variation in mental abilities or “genius”, as he referred to it in his book Hereditary Genius (1869). As a statistician, he was determined to quantify genius and to track its variation, as well as to demonstrate that intelligence – like other human characteristics, such as height and chest size – was biologically inherited and normally distributed in a population. His use of mathematical methods to analyse the data he had collected made him a pioneer in the field of psychometrics, but his commitment to eugenicist principles skewed his research significantly. He argued that intelligence was highly correlated with eminence in a profession, such as law or medicine, and concluded that eminence, and therefore high intelligence, ran in families – especially wealthy Victorian ones.
In 1905 the idea of measuring intelligence was revisited when psychologists Alfred Binet and Théodore Simon received a request from the French Ministry of Education to develop a test that would identify children likely to struggle at school, so that they could be separated from those with “normal” intelligence. The Binet-Simon test consisted of 30 tasks of increasing difficulty and aimed to measure a child’s mental abilities in relation to those of their peers of the same age. This essentially allowed educators to compare a child’s chronological age with their “mental age”. An average child would have a mental age equal to their chronological age, whilst a less intelligent child would have a mental age lower than their actual age. An extraordinarily intelligent child, in contrast, would have a higher mental age than their actual age, matching the average intelligence of an older child. German psychologist William Stern introduced a formula for calculating an intelligence quotient that would make this comparison even simpler. The IQ score is calculated by dividing mental age by chronological age and multiplying this by 100 – thus, a child whose mental age matches their chronological age scores exactly 100. Modern IQ tests no longer use this ratio; they score a person relative to others of the same age, on a scale set so that the average is 100 and the standard deviation is 15.
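Stern’s ratio is simple enough to work through by hand, and a short Python sketch makes the arithmetic concrete. The function name ratio_iq and the three example children are hypothetical, chosen only for illustration; this is not how any modern IQ test is scored.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ as described above: mental age / chronological age x 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    # Multiply first so that whole-number ages divide cleanly.
    return 100 * mental_age / chronological_age


# Hypothetical ten-year-olds with mental ages of 8, 10 and 12.
for mental_age in (8, 10, 12):
    print(f"mental age {mental_age}, actual age 10: IQ {ratio_iq(mental_age, 10):.0f}")
```

On this formula, a ten-year-old performing at the level of an average eight-year-old scores 80, one performing at their own age level scores 100, and one performing at the level of an average twelve-year-old scores 120.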
The Binet-Simon IQ test was constructed to measure a variety of cognitive processes, including visual and spatial processing, fluid reasoning, working and short-term memory, vocabulary and reading comprehension skills, and quantitative reasoning. However, despite the seemingly wide scope of the test, Binet himself highlighted its shortcomings, maintaining that something as multifaceted as intelligence could not be accurately captured by a quantitative test. He noted that intelligence not only encompassed certain difficult-to-measure aspects, such as creativity and emotional intelligence, but that it was also influenced by a child’s upbringing and not purely the result of genetic coding. Intelligence was not a fixed or singular thing that people possessed – it was highly malleable and could develop at different rates and in different ways in different people, he argued. Giving a child a test and assuming that the result provided any kind of concrete information about their mental abilities – or their potential for success in life – was therefore short-sighted.
However, the lure of being able to screen, sort and compare people proved far too tempting. With the development of an IQ test specifically for adults by David Wechsler in the 1930s, intelligence tests quickly gained popularity in a variety of educational and vocational settings in Europe and the US. And despite Binet’s warnings about the limitations of quantitative intelligence tests, they continued to form the backbone of many ethnocentric and eugenicist arguments. These arguments advanced Galton’s idea of biological differences between race groups, often claiming that higher intelligence among the dominant white group – as evidenced by generally higher IQ scores than those achieved by other race groups – was proof of superior genes. The assumption that was made was that a low IQ score indicated inherently low levels of intelligence – rather than a lack of access to education, the foreignness of the IQ testing process, low English proficiency or unfamiliarity with the types of tasks being tested. And these scores had some very real consequences, as they were used to justify programmes that aimed to accelerate natural selection through “selective breeding”. In the US, forcible sterilisation of “feeble-minded people” and “imbeciles” (often determined by IQ tests) occurred up until the 1960s. The majority of these individuals were black, female and from a low socio-economic background.
Despite the historical misuse of IQ tests, in many ways they remain the most advanced option we have for measuring or predicting intelligence. Modern IQ tests have undergone significant development and have been shown to strongly predict scholastic achievement, making them a useful tool that is still commonly used to identify children who could benefit from specialised academic assistance. Crucially, these tests aim not to measure how much a person already knows, but rather to gauge their ability to learn – in other words, to minimise the advantage of having prior knowledge in any particular field and to test the ability of the person to make generalisations that will enable them to deduce new information from abstract rules. Learning has thus become synonymous with intelligence – and this is true for both humans and machines, as it is on this fundamental learning capability that developments in artificial intelligence are focused. Big Tech companies, such as Amazon, Google and IBM, are using the model of human reasoning to guide the improvement of the products and services they offer, rather than to produce a perfect replication of the human mind. In these organisations, the terms “machine learning” and “deep learning” are commonly used, as they describe the companies’ goals more accurately than “artificial intelligence” does.
But even learning is no easy feat. In July 2018, AI company DeepMind provided some insight into how far the learning abilities of AI technology have advanced, by developing a test for abstract reasoning that was based on pattern-recognition questions from a typical human IQ test. The questions feature several rows of images, with the final image in each sequence missing. The test-taker is required to determine what the last image in each sequence should be, by detecting patterns in the images preceding it. The pattern could be related to the number of images, the colour, the shape, or their placement. DeepMind trained AI systems on these types of questions using a program that can generate unique image sequences. The AI systems were then tested, with some image sequences that were the same as in the training set, and some that had never been seen by the system before. And it quickly became clear that whilst the computers did fairly well at identifying the missing image when they had seen the pattern before, they were unable to extrapolate this prior information to determine the image in new patterns. This held true even when the test sequence only varied slightly from the training sequence – such as when dark-coloured images were used instead of light-coloured images.
Although the initial goal of AI may have been to replicate human intelligence in its entirety, the enormity of such a task has become resoundingly clear. Human intelligence is not a stable idea – despite our best attempts, we still cannot agree on exactly what it is or how to test if a person possesses it. The only thing we seem to be able to agree on is that intelligence encompasses the ability to learn. And even in this singular aspect, machines currently still pale in comparison to humans.