Deep Learning

Most people can recall what they were doing on September 11, 2001, when their otherwise normal Tuesday was interrupted as they were drawn to a nearby television to watch in horror and disbelief as two hijacked aeroplanes – one a United Airlines flight and the other an American Airlines carrier – crashed into the World Trade Centre in Lower Manhattan, New York. Within an hour and 42 minutes both the North and South towers had collapsed. Still reeling, the world was then informed that a third plane had crashed into the Pentagon. A fourth, which was headed towards San Francisco, crashed into a field, with the heroic passengers having managed to thwart the plans of their hijackers. Nearly 3 000 people were killed, and a further 6 000 were injured.


The 9/11 attack is one of the most shocking events in modern history – in mere moments, the entire population of the world’s most powerful country had been reduced to a state of fear, panic, and confusion. People evacuated business buildings unsure of whether theirs might be the next target. Families desperately tried to make contact with one another to ensure their loved ones were safe. Alarm spread throughout the world – for if America was not safe, who was? To think that anyone could have planned to benefit financially from such an event is almost beyond human comprehension. And yet, in the days leading up to the attack, trading in derivatives on both United Airlines’ and American Airlines’ stock – and only these two airlines’ stock – was unusually erratic. At one point, the put-to-call ratio in United Airlines was at 12:1, where it would normally be about 1:1. It was self-evident that certain parties had known that the attack was going to happen and used this knowledge to make a profit.


An investigation by the National Commission on Terrorist Attacks Upon the United States determined in 2004 that the unusual financial activity was coincidental and not the result of informed trading. But articles published in The Journal of Business in 2006 and The Foreign Policy Journal in 2010 outlined how researchers had uncovered new evidence using statistical analyses that confirmed that put buying in the airlines was consistent with informed investors having traded their stock before the attack. This evidence stretched beyond just the airlines, suggesting also that traders from around the world had colluded in manipulating international markets by trading shares in insurance companies that would be liable to pay out billions of dollars after the attack. The Wall Street Journal also noted unusual activity in the trading of Treasury bonds.


In piecing together the strange movements in financial markets leading up to 9/11, researchers had to sift through an inordinate amount of seemingly unrelated information to be able to construct a stable argument that demonstrates that there were people who knew about the attacks before they happened and that they planned to benefit from them. The reason that it was so difficult to unravel these transactions is that they were often performed through shell companies based in offshore domiciles and through over-the-counter trading. These transactions were not regulated through an exchange, ensuring that those who benefitted from them were hidden behind an opaque screen that protected their anonymity and concealed their involvement in terrorist activities. The uncomfortable reality is that the financial activities of specific persons in the days before 9/11 may have been enough to indicate, before that tragic Tuesday morning, that the attacks were going to happen, if anyone had been watching closely enough to identify and connect the clues.


The Role of Banks

The 9/11 case highlighted the fundamental fact that financial institutions can play a central role in safeguarding not only the international monetary system, but the physical and social well-being of the general public as well. As a result, the responsibilities of banks have changed substantially in recent years. Banks are no longer simply institutions where we keep our money or where we obtain loans, and they are no longer only responsible for the business of banking. They have become paramount in the detection of terrorist, criminal or otherwise unethical behaviour through their focus on preventing money laundering, terrorist financing, and tax evasion. Through anti-money laundering (AML) and counter-terrorist financing regulations, banks have become responsible for investigating the origin of funds that are transferred through their systems, as well as for flagging suspicious transactions that may be connected to terrorist activities. In addition, regulations such as IT3B, Automatic Exchange, and the Foreign Account Tax Compliance Act (FATCA) aim to reduce the effects of tax evasion, essentially rendering banks extensions of the State, responsible for ensuring that tax laws are adhered to, and reporting on cases where they are not.


This has not always been the case, however. Historically, in most countries, a sectoral model of regulation has dominated the financial sector, resulting in banks being regulated separately from other kinds of financial institutions, such as insurance companies. In 1995, Dr Michael Taylor – an official with the Bank of England – suggested that this model was outdated, given the overlap that was regularly occurring in the work of different regulators. He proposed a “Twin Peaks” structure for regulation in the financial industry: one regulator to oversee prudential regulation – which is concerned with elements such as capital levels and liquidity – and one regulator for ensuring good market conduct and consumer protection – through regulations concerned with AML and counter-terrorist financing, for example. Both of these regulators would be responsible for both banks and other financial institutions. After the 2007/2008 financial crisis, this model was applied in many countries and South Africa is currently in the process of migrating to the Twin Peaks system.


The Twin Peaks approach to regulation represents a fundamental shift in the role banks play in society. In the past, banks were simply expected to manage the credit and market risk inherent in their positions. They were not expected to monitor the origins and purposes of funds, or identify the people who moved them, and this made offshore structures an especially easy target for criminal activity. However, new market conduct regulations require that banks evaluate their customers and their transactions on a much more personal level, culminating in the need for a substantial amount of personalised information about the people who use a bank’s services. Through the Know-Your-Customer process, banks are required, on an ongoing basis, to verify the identities of potentially high-risk clients, such as politically exposed persons, to assess the potential risk of accepting them as clients. They do this through a detailed analysis of documentation and other external data, and continually monitor their transactions for any suspicious behaviour.


Banks have essentially become crucial watchdogs in larger State efforts to predict and quell criminal activity. And the need for banks to ensure effective client background checks and to perform ongoing surveillance of the funds that are channelled through their business is more pressing than we may, until recently, have realised. For whilst an analysis of the unusual trades leading up to 9/11 may provide an extreme example of the ability of greed or destructive impulses to eclipse all sense of morality, the reality is that unethical financial activities are disturbingly commonplace – as the Panama Papers case clearly demonstrated.


How the Rich Stay Rich

In March 2018, Panamanian law firm Mossack Fonseca announced that it was closing its doors. The firm, which was established in 1977, had operated up until 2016 as the fourth largest provider of offshore financial services in the world, with many of its offices located in tax havens such as Jersey, Cyprus, and Luxembourg. But then came its great fall, when German newspaper Süddeutsche Zeitung revealed the central role that the company had played in helping its clients to circumvent tax regulations and commit fraud. These clients were not just a handful of gangsters or smugglers – but included respected businessmen, current and former heads of state, public officials, and celebrities. And this was not just a case of a newspaper trying to punt sensationalist news, based on a few shaky pieces of evidence. The shocking story was based on one of the largest data leaks of all time – 2.6 terabytes of data comprising 11.5 million confidential documents that detailed the transactions of 214 488 offshore entities from the 1970s up until 2016.


The description of the events that took place leading up to the publication of the story – as recounted by Süddeutsche Zeitung journalists Bastian Obermayer and Frederik Obermaier in the book The Panama Papers (2016) – reads much like a John Grisham novel. Late one night, Obermayer is contacted by a mysterious “concerned citizen” who offers him access to a well of data so enormous that it will take nearly 400 journalists from over 80 countries over a year to sift through it and follow up on the leads it provides. The source will not give any information about him or herself, and fears for their life if they are ever found out. A secret encrypted method for sharing the files must be devised, and the journalists must piece together many disparate pieces of information to discover how the names are linked to the extraordinarily large monetary figures that are recorded.


Mossack Fonseca denied any wrongdoing after the story broke. And indeed, much of the data the journalists received detailed the use of offshore accounts for purposes that are considered perfectly legal. Business people in politically or economically volatile countries may, for example, hold financial assets offshore to protect them from being frozen or seized. Others may choose to make use of offshore accounts for estate planning and inheritance purposes. But, as the German journalists note in their book, “the fact is that people often have recourse to an anonymous offshore company because they want to hide something – from the taxman, their ex-wife, their former business partner or the prying eyes of the public.” And the data painted a clear picture of unethical and exploitative behaviour on a global scale. With Mossack Fonseca’s help, the wealthy were routinely maximising the advantages of offshore tax havens to anonymously hide vast sums of money from the governments of their home countries.


The method was relatively simple: contact was made with Mossack Fonseca through an intermediary, such as a bank or lawyer, who was then technically Mossack Fonseca’s actual client. Mossack Fonseca then acted as an incorporation agent, as the firm was licensed to register companies in Panama – a desirable business domicile with low rates of taxation and a high degree of secrecy. It was quick and relatively cheap to set up the company and its offshore bank account, and to close both when they had served their purpose. Moreover, the owner’s identity remained hidden as Mossack Fonseca appointed three directors from within its own staff base to act as the official representatives of the shell company. The real owner, or their lawyer, was given a power of attorney by the directors to access the company’s bank account – a perfectly legal loophole that most people were unaware of.


Though establishing this scheme may have been a relatively easy task, its effects were devastating, with these offshore accounts essentially working collectively to widen the already substantial gaps that exist between the very rich and the very poor through tax evasion and money laundering, and the further facilitation of criminal activities and anti-competitive business practices. And the scope of the data informing the exposé indicated just how routinely the laws and obligations of the international monetary system are manipulated for these selfish purposes.


In many ways, the exposing of the Panama Papers ushered in a new wave of investigative journalism – one that relies on huge amounts of data, sophisticated software, and mobile collaboration between various international publications. The use of computer software called Nuix – which is routinely used by international investigators – was central to the journalists’ ability to sift through the enormous amount of data they received, and to find important connections in this information. Optical character recognition was first applied to the various documents – which included shareholder registers, bank statements, passport photos, and emails – to alter the format of the data to searchable text. This enabled the journalists to search through the data using a list of keywords – which included the names of prominent politicians, criminals and celebrities – much as one would conduct a Google search. A more detailed investigation was then conducted into each of the people whose names appeared in the leaked documents.


The problem with the statistical analyses of the trades that occurred just before 9/11 and the journalists’ computer-aided investigation into the Panama Papers is that, in both cases, the patterns and connections in the data that provided the key evidence of wrongdoing could only be detected after a significant amount of damage had already been done. But what if we could monitor and detect these patterns in such a way as to predict these types of events before they occur?


In the final pages of The Panama Papers, there is a statement written by the anonymous source responsible for leaking the Mossack Fonseca data to the Süddeutsche Zeitung. In the final paragraph he or she claims: “Historians can easily recount how issues involving taxation and imbalances of power have led to revolutions in the past. Then, military might was necessary to subjugate peoples, whereas now, curtailing information access is just as effective, or more so, since the act is often invisible. Yet we live in a time of inexpensive, limitless digital storage and fast Internet connections that transcend national boundaries. It doesn’t take much to connect the dots: from start to finish, inception to global media distribution, the next revolution will be digitised.” This is the possibility that unsupervised deep-learning AI presents us with, through its ability to isolate potentially dangerous transactions and connect this to other information that would enable us to answer questions such as “Where did the money come from?”, “Where is it going?” and “Who will benefit from this?” AI can collect and process vast amounts of data and find – in moments – connections that a human would take hours, days, weeks, or even years to see, if they see it at all. And one of its most important applications is undoubtedly in the world of banking, where the detection of patterns or trends that indicate suspicious or irregular activity could not only prevent an expensive financial fallout, but protect people on a mass scale from the various criminal forces that threaten their safety.


But this momentous opportunity also presents a critical new challenge, because how do we decide who is allowed access to such unlimited and intimate information about people? Just over a month after 9/11, with an overwhelming majority in Congress, the USA PATRIOT Act was signed into law, giving the government unprecedented powers to potentially invade personal lives through monitoring and surveillance activities. A number of legal challenges to the Act ensued, with courts agreeing that certain of its provisions were unconstitutional, and amendments were made. But the US’s eagerness to spy on its citizens was highlighted again in 2013 as a result of the Edward Snowden National Security Agency leak, which revealed that the US was monitoring even innocent civilians through various surveillance programs. More recently, spying has moved beyond government surveillance and gone digital, as the extent to which Big Tech companies are monitoring our lives has become evident. From the websites we visit, to our likes, dislikes and moods, to our most intimate messages and pictures – everything we do is being examined, recorded and used to predict what our future actions and intentions may be. And as AI advances, the inferences that will be made will only grow more invasive.


If banks were allowed to collect data about customers and counterparties and share it with each other, as well as access information from media and Big Tech companies, we would undoubtedly be living in a safer world. But it would be one that has been deeply sanitised and that risks undermining the personal rights and freedoms of all people. For the moment, at least, AI does offer individual institutions the possibility of accessing the data that could be used – under the guidance of strict international regulations – to predict the actions of corrupt parties before they can commit grave injustices, or perhaps even to prevent the most tragic world events.


Deep Learning

We say that necessity is the mother of invention, and that certainly was the case between 1942 and 1944 at Bletchley Park, the secret headquarters of British codebreakers during the Second World War. In the 2014 historical drama The Imitation Game, Benedict Cumberbatch plays the role of Alan Turing, one of the most brilliant mathematicians and cryptanalysts of his time. Turing, along with a crack team of codebreakers, was tasked by the British military with deciphering the supposedly unbreakable Nazi code encrypted by the infamous Enigma machine. Whilst thousands of Allied soldiers and civilians died as a result of the secret messages and co-ordinates sent to German U-boats and the Luftwaffe via Enigma, Turing and his team had to race against time to come up with a solution.


Within just a few weeks of working on the Enigma code, Turing had radically altered the course of the military’s efforts. The plan he proposed was to make use of a cryptanalytic machine that could help break the German cypher. Whilst in the film it is insinuated that Turing and his team conceptualised and built the machine from scratch, it was in fact modelled on a Polish machine called the Bomba – albeit with some very important alterations insisted upon by Turing. This code-breaking machine was named the Bombe – as a nod to its predecessor and because of the ominous ticking sound made by the dozens of indicator drums continuously testing possible outcomes – and it would significantly change the course of history.


With the help of this electromechanical cryptanalytic machine that effectively automated and optimised the trial of different possibilities in the code-breaking process, Turing and his team managed to crack the previously unbreakable Enigma code. Many historians argue that this breakthrough was critical for the Allies to eventually go on to win the war, with hundreds of intercepted German messages being decoded to give their forces a distinct strategic advantage going into battle. After the war, Turing made great strides in advancing early computing developments, and to this day, many call him the father of modern computing.


In many ways, the history of AI begins with the very first manifestations of the digital electronic computer, dating back to Turing’s earliest research in the 1930s. But in the strictest sense, the highly effective and decorated Bombe machine built by Turing’s team at Bletchley Park could not truly be called a computer. For one thing, the Bombe could only solve one problem. And secondly, it could not store or retrieve data, these being the critical functions that allow modern computers to achieve the level of programmability that makes them so powerful today.


An AI Winter

Despite the Bombe not quite being classified as the first ever computer, Turing’s truly visionary work before the war demonstrated incredible foresight into the future of computing. In a paper called “On Computable Numbers, with an Application to the Entscheidungsproblem” (1936), Turing showed mathematically that there could exist a machine capable of carrying out any conceivable computation, provided that it was representable in the form of an algorithm. These theoretical machines were to be called Universal Turing Machines (UTM), a seminal idea that would later influence John von Neumann’s design for the Electronic Discrete Variable Automatic Computer (EDVAC), which was delivered in 1949. Built at the University of Pennsylvania for the US Army’s Ballistic Research Laboratory, EDVAC was one of the first electronic stored-program computers, and unlike previous manifestations, used a binary numbering system as opposed to a decimal system – the format still used in modern computer programming today.


As was the case with the EDVAC, the first ever machine intended to “learn” was also funded by the US Military, this time through the Office of Naval Research and built by Frank Rosenblatt at the Cornell Aeronautical Laboratory in 1957. The Perceptron, as it was called, was an early prototype for machine learning, making use of a rudimentary neural network for image recognition. Unlike modern AI, the Perceptron was a machine, not a program. And although the “learning” aspect of the machine works similarly to neural networks of today, with neurons processing incoming data and altering the weights (or relative importance of inputs) attached to these neurons depending on the resultant output, the weightings connected to neurons of the Perceptron were physically altered (as opposed to digitally) via small electrical motors. This early form of AI was called connectionism. But what seemed at first to be a significant breakthrough in machine learning and artificial intelligence, would ultimately, but unintentionally, be a massive burden to the entire field of study.


After a very promising and fruitful period for artificial intelligence research and development from the mid-1950s to late-1960s, what ensued was to be called the “AI Winter”, largely catalysed by the reception and review of the Perceptron machine by one single book in particular – Perceptrons: An Introduction to Computational Geometry (1969). The famous work – produced by American cognitive scientist Marvin Minsky and the South African-born American mathematician Seymour Papert – focused on the limitations of the Perceptron system, specifically providing mathematical proofs that such a neural network was not capable of learning an exclusive disjunction (XOR) function.
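
The XOR limitation can be illustrated with a minimal sketch. The code below is not Rosenblatt's machine or Minsky and Papert's proof; it is a hypothetical single-layer perceptron with two inputs and a hard threshold, trained with the textbook perceptron update rule. The function names and the choice of 100 training passes are assumptions made for illustration. It learns the AND function perfectly, but can never classify all four XOR cases correctly, because no single straight line separates them.

```python
def train_perceptron(samples, epochs=100, lr=1):
    """Train a two-input, single-layer perceptron with the classic update rule."""
    w1 = w2 = b = 0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            # Hard threshold: the neuron fires only if the weighted sum is positive
            pred = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
            err = target - pred
            # Nudge each weight in the direction that reduces the error
            w1 += lr * err * x1
            w2 += lr * err * x2
            b += lr * err
    return w1, w2, b


def accuracy(params, samples):
    """Fraction of samples the trained perceptron classifies correctly."""
    w1, w2, b = params
    correct = sum(
        (1 if w1 * x1 + w2 * x2 + b > 0 else 0) == target
        for (x1, x2), target in samples
    )
    return correct / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_accuracy = accuracy(train_perceptron(AND), AND)
xor_accuracy = accuracy(train_perceptron(XOR), XOR)
print("AND accuracy:", and_accuracy)
print("XOR accuracy:", xor_accuracy)
```

However long the XOR training runs, at least one of the four cases remains misclassified, which is the limitation that a single layer of weights cannot escape.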


So influential was this book that it would change the course of AI research for decades to come. The result was a significant slowdown in sponsorships and a general feeling of pessimism around the discipline, with most experts on the matter espousing the limited capabilities of the earliest forms of neural networks – in the form of connectionist systems such as the Perceptron – resulting in an industry-killing funding freeze. Between the release of Perceptrons in 1969 and the eventual revival of AI research in the mid-1980s, funding for connectionism-type projects – as the earliest forms of neural networks – was near-impossible to attain. It would not be until the advent of multi-layered neural networks (capable of deep learning) that artificial intelligence research and optimism surrounding machine learning would make a revival, thanks in no small part to a few stubborn and dedicated researchers on the ground who battled through the AI Winter without much support and often under much criticism.


Unfortunately, the inventor of the Perceptron, Frank Rosenblatt, would not live to see the revival of his field – having died in a boating accident not long after the release of Perceptrons – but the late Marvin Minsky would, living long enough at least to swallow his words and completely change his mind. Minsky, who had co-founded the Massachusetts Institute of Technology’s (MIT) AI laboratory a decade before the book appeared and who for many years doubted the capability of neural networks, remained one of the foremost experts in the field and became a great believer in the massive potential of the “learning machine.”


A New Dawn

What Marvin Minsky and Seymour Papert did not account for in their industry-altering book was that neural networks would become multi-layered – a breakthrough that, along with the significant developments in the processing power of computers, would eventually end the AI Winter and open up a world of possibilities for the implementation of artificial intelligence and machine learning. Like many of mankind’s greatest technological triumphs, the most significant technique propelling artificial intelligence into the future is inspired by nature. The concept of the artificial neural network (ANN), as perhaps the most advanced system in the realm of machine learning at present, is loosely based on the neural circuits that occur naturally in the brain. In a biological neural network, chemical and electrical synapses connect unimaginably intricate circuits of neurons that link together to make up the central nervous system. Each of these neurons has dendrites (receptors) and axons (transmitters) that respectively receive and send signals across a network of neurons, each of which translates various signals and stimuli into meaningful information for use in the brain.


Whilst much simpler in design compared to their biological counterparts, artificial neural networks work in much the same way. At the most conceptual level, neural networks can “learn” through considering many inputs via their neurons and adjusting the translation or processing of the data, based on the relevance of the output to the desired result – which may or may not be dictated by the user. This is what makes this system of machine learning so powerful – the ability to learn and self-correct without the need for continuous manual intervention by the programmer. And critical to the evolution of neural networks in the quest for true self-actualising artificial intelligence has been the advent of deep learning.


Deep Learning

Deep learning involves the training of artificial neural networks that are several layers of neurons deep – known as deep nets. Instead of one node processing all incoming data and producing a final result, deep nets rely on sequentially filtering data through multiple layers to refine the output. One can think of these layers of neurons as a stack of sieves or nets, each with a different sized mesh, allowing some particles through whilst blocking others. In natural neural networks this filtering process is similar – individual neurons decide which stimuli are most relevant, and which are not, in determining whether or not the synaptic connection will fire to pass on the signal to the next layer of neurons. Crucially, however, this filtering process is not a binary yes-no system, but rather relies on the calibration of a finely-tuned weighting mechanism for each neuron. The adjustment of these weights on the inputs to the neuron is the key capability that allows a neural network to “learn”.


To better understand this concept, let us imagine a common use for neural networks in the real world – image recognition. To narrow this down even further, let us just focus on a system that can recognise hand-written numbers. In this example, as in all deep learning neural networks, there are multiple layers of neurons making up the system. The first layer receives the external input, whilst the last layer delivers the prediction – in this case, a number from zero to nine. The set of layers wedged in between the first and last layers is where the calibration and filtering process happens, and these layers are often called the hidden layers. The activation of various neurons in these hidden layers will determine the final prediction in the last layer of neurons.
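
The layered structure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference design: the two hidden layers of 16 neurons, the small random starting weights, the sigmoid activation, and the names build_network and forward are all hypothetical choices. The input is a list of random numbers standing in for a real image, so the prediction is meaningless until the network has been trained.

```python
import math
import random


def sigmoid(z):
    """Squash any number into the range between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def build_network(layer_sizes, rng):
    """Create one (weights, biases) pair for every layer after the input layer."""
    network = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        network.append((weights, biases))
    return network


def forward(network, inputs):
    """Pass the inputs through each layer in turn; return the last layer's activations."""
    activations = inputs
    for weights, biases in network:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


rng = random.Random(0)
# 784 input neurons, two hidden layers of 16 (an arbitrary choice), 10 output neurons
network = build_network([784, 16, 16, 10], rng)
image = [rng.random() for _ in range(784)]  # stand-in for a real handwritten digit
outputs = forward(network, image)
# The prediction is the digit whose output neuron is the most strongly activated
prediction = max(range(10), key=lambda digit: outputs[digit])
print("Predicted digit (untrained, so effectively a guess):", prediction)
```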


The example of image recognition for handwritten text is fairly complex, since the data being fed into the system is not in a neat numerical format – yet this is where neural networks have an advantage over other machine learning processes. In the case of recognising a number, for example, the image would typically be input in the form of a grid, where each block of the grid would represent a pixel. In a 28 by 28 grid, there would then be 784 blocks, and each block would be represented by a neuron in the first layer of the network. The first layer would then be 784 neurons long, each capturing the grayscale value of its corresponding pixel, often as a value from zero to one, where zero is pure white and one is pure black, for example.
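
As a small illustration of this conversion, the sketch below flattens a 28 by 28 grid into 784 input values between zero and one. It assumes, purely for the example, that each pixel arrives as an 8-bit ink intensity in which 0 is blank white paper and 255 is solid black ink; real image files often use the opposite convention and would need inverting first. The function name image_to_inputs is hypothetical.

```python
def image_to_inputs(grid):
    """Flatten a 2-D grid of 8-bit ink intensities into input-layer values from 0 to 1."""
    return [pixel / 255 for row in grid for pixel in row]


# A blank 28 x 28 page with a single fully black pixel near the centre
grid = [[0] * 28 for _ in range(28)]
grid[14][14] = 255

inputs = image_to_inputs(grid)
print("Number of input neurons:", len(inputs))
print("Value at the black pixel:", inputs[14 * 28 + 14])
```

Each row of the grid is laid end to end, so the pixel in row r and column c ends up at position r * 28 + c in the first layer.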


Now that the system has converted an image into numerical data, it can begin the process of trying to recognise which number is being depicted in the image. In different implementations of neural networks, this step will vary greatly, but in this case, the hidden layers within the net will usually try to identify various shapes in the image by analysing the hard edges of the picture. By analysing the grid in a way that distinguishes between the black markings and white spaces, various regions of the grid can be given scores that may correspond to a specific shape – a curve or a straight line, for example. Across the several hidden layers of neurons, the various shapes recognised will trigger different combinations of neurons, eventually signalling to the last layer of neurons which number it is most likely to be. Which neurons are triggered depends on the weights attached to the connections between them: these weights determine to what extent a given input is relevant to a certain neuron. Since each neuron receives multiple inputs, the weights serve as the filter for these inputs, to let the neuron know what factors should be regarded as most important – much in the same way that dendrites in the biological neural network filter the multitude of stimuli attempting to make their way to the processing centre contained in the cell body.
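
To make the idea of weights acting as a filter more concrete, here is a deliberately simplified sketch. It uses a hypothetical neuron that looks at a 3 by 3 patch of the grid with hand-picked weights: negative on the left, zero in the middle, positive on the right. A trained network arrives at its weights by itself and is not guaranteed to learn exactly this pattern, so the values here are assumptions chosen only to show how a weighted sum can score a region for a particular shape.

```python
def neuron_score(patch, weights):
    """Weighted sum of a patch of pixel values (bias and activation left out for clarity)."""
    return sum(
        p * w
        for patch_row, weight_row in zip(patch, weights)
        for p, w in zip(patch_row, weight_row)
    )


# Hypothetical hand-picked weights that respond to a white-to-black vertical edge
vertical_edge_weights = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# 0 is white and 1 is black, as in the grid described above
edge_patch = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]  # white on the left, black on the right
flat_patch = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # solid black, no edge at all

edge_score = neuron_score(edge_patch, vertical_edge_weights)
flat_score = neuron_score(flat_patch, vertical_edge_weights)
print("Score for the edge patch:", edge_score)
print("Score for the flat patch:", flat_score)
```

The same pixels produce a high score or no score at all depending purely on the weights, which is why adjusting the weights changes what a neuron responds to.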


Training the Network

Thus far, however, the actual analysis and learning process has not yet begun, since the manner in which the system decides on a score for each region of the grid, or any other input for that matter, is based on equally or randomly weighting each neuron input at each level in the deep net. The system will not be successful in recognising handwritten numbers unless it optimises its recognition capability by re-weighting each of the neuron input weights throughout the network. In order to do so, a process of reverse engineering takes place on a continuous basis in an advanced form of trial and error. This process involves determining mathematical parameters for each input, based on the success of predicting the output in a particular run. The specific mathematical calculations of these parameters involve relatively simple calculus techniques, specifically the calculation of partial derivatives for each input. Through a process called backpropagation, the system is able to repeatedly re-weight the inputs into each and every neuron at each level in the network (in a backward fashion from the last to the first layer), in order to achieve what is now commonly known as deep learning.
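
A single round of backpropagation can be sketched on a network small enough to follow by hand. Everything specific below is an assumption made for illustration: the tiny 2-2-1 layout, the sigmoid activation, the squared-error cost, the starting weights, and the learning rate of 0.5. The sketch computes the partial derivative of the cost with respect to every weight, working from the last layer back to the first, and then takes one small step against those derivatives.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Return the hidden activations and the output of a 2-2-1 network."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output


def cost(params, x, target):
    """Squared-error cost for one labelled example."""
    _, output = forward(params, x)
    return 0.5 * (output - target) ** 2


def backprop(params, x, target):
    """Partial derivatives of the cost, computed from the last layer backwards."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden, output = forward(params, x)
    # Output layer: (output - target) times the slope of the sigmoid
    delta_out = (output - target) * output * (1.0 - output)
    grad_w_out = [delta_out * h for h in hidden]
    # Hidden layer: the output error is passed back through the outgoing weights
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    grad_w_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_w_hidden, delta_hidden, grad_w_out, delta_out


def step(params, grads, learning_rate):
    """Move every weight and bias a small distance against its partial derivative."""
    w_hidden, b_hidden, w_out, b_out = params
    g_wh, g_bh, g_wo, g_bo = grads
    return (
        [[w - learning_rate * g for w, g in zip(wr, gr)] for wr, gr in zip(w_hidden, g_wh)],
        [b - learning_rate * g for b, g in zip(b_hidden, g_bh)],
        [w - learning_rate * g for w, g in zip(w_out, g_wo)],
        b_out - learning_rate * g_bo,
    )


# Arbitrary starting weights and one labelled training example
params = ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2], [0.7, -0.6], 0.05)
x, target = [1.0, 0.0], 1.0

cost_before = cost(params, x, target)
grads = backprop(params, x, target)
params_after = step(params, grads, learning_rate=0.5)
cost_after = cost(params_after, x, target)
print("Cost before the update:", cost_before)
print("Cost after the update: ", cost_after)
```

Repeating this update over many labelled examples is what gradually moves the weights from their starting values towards ones that produce useful predictions.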


Integral to this process of refining the weightings and biases between neurons to improve their predictions is the activation function. Think of an activation function as being at the heart of what the neuron does to transform the inputs it receives into an output. The signals that the neuron receives are first converted into a single value that is the weighted sum of all the inputs received from neurons in the previous layer, plus the addition of a bias factor. This number is then essentially ready to be processed by the activation function that sits at the heart of the neuron. The function itself could be simplistically linear in nature, a hyperbolic tangent function, a threshold function, or most commonly, a sigmoid function. What is important is that it is a function that converts a linear weighted sum value into a new value, which then becomes an input for the next layer of neurons. The input to the next neuron is itself then taken through this process again, until finally the neural network’s last layer produces its output, which will then be compared with the expected result.
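
The calculation performed by a single neuron can be written out directly. The sketch below is a minimal illustration with made-up inputs, weights and bias; the function names neuron_output and threshold are hypothetical. It forms the weighted sum plus bias and then passes it through three of the activation functions mentioned above.

```python
import math


def sigmoid(z):
    """Smoothly squash the weighted sum into a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def threshold(z):
    """Fire fully (1) if the weighted sum is zero or above, otherwise not at all (0)."""
    return 1.0 if z >= 0 else 0.0


def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs, plus a bias, passed through an activation function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


inputs = [1.0, 0.5]    # values arriving from the previous layer
weights = [0.4, -0.8]  # relative importance of each input

sigmoid_out = neuron_output(inputs, weights, bias=0.0)
tanh_out = neuron_output(inputs, weights, bias=0.0, activation=math.tanh)
step_out = neuron_output(inputs, weights, bias=0.0, activation=threshold)
biased_out = neuron_output(inputs, weights, bias=2.0)
print(sigmoid_out, tanh_out, step_out, biased_out)
```

With these particular numbers the two weighted inputs cancel out, so the bias alone decides how strongly the neuron responds.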


When first training a network, the weightings and biases that are meant to distinguish important information from less important information are set at random. And naturally, because these weightings are random at first, the network will initially be very bad at predicting correct outcomes. To improve these predictions, the network must be trained through backpropagation, often using well-established mathematical optimisation techniques, such as “gradient descent”, which make use of a cost function to evaluate the outcomes of the network and to steer it in the right direction as it refines its weightings. This cost function, in simple terms, determines how far off the network is with its predictions.


Let us think back to the example of the image recognition network for handwritten numbers. Initially, when using random weightings, the network may light up or activate totally incorrect neurons in the last layer, where each neuron is meant to represent a number from zero to nine. With random weightings, when fed the handwritten number “3”, the network may at first light up the corresponding neurons for “8”, “6”, “5” and “3”, for example. To train a network using supervised learning, as is the case here, the cost function will penalise the incorrect outputs using training data that is labelled with the correct output.
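As a rough illustration of how a cost function might score the handwritten “3” example, the Python sketch below uses a simple sum of squared errors against a label that marks “3” as the only correct neuron. Both the choice of squared error and the activation values are assumptions made for this sketch; real networks often use other cost functions.

```python
def one_hot(digit, num_classes=10):
    """Label vector: 1.0 for the correct digit's neuron, 0.0 for all others."""
    return [1.0 if i == digit else 0.0 for i in range(num_classes)]

def squared_error_cost(outputs, target):
    """Sum of squared differences between what lit up and what should have."""
    return sum((o - t) ** 2 for o, t in zip(outputs, target))

target = one_hot(3)

# Hypothetical last-layer activations (index = digit).
# Untrained: neurons for 3, 5, 6 and 8 all light up strongly.
untrained = [0.0, 0.0, 0.0, 0.7, 0.0, 0.6, 0.8, 0.0, 0.9, 0.0]
# After training: mostly just the neuron for 3.
trained = [0.0, 0.0, 0.0, 0.95, 0.0, 0.05, 0.0, 0.0, 0.1, 0.0]

cost_untrained = squared_error_cost(untrained, target)
cost_trained = squared_error_cost(trained, target)
print("cost with random weightings:", round(cost_untrained, 4))
print("cost after training        :", round(cost_trained, 4))
```

The larger the cost, the further the network is from the labelled answer, which gives training a single number to drive down.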


To improve the network’s prediction accuracy, this cost function must be minimised. This is where the “gradient descent” methodology is actioned. The best way to visualise this method is to imagine standing in a valley (in the shape of a “U”). Your goal is to find the lowest point of the valley, representing the local minimum of the cost function. To do this, you must calculate the slope of your current position, in order to determine in which direction you must travel to find the bottom of the valley. Using your random input (representing your random or unknown location on the hill of the valley), you can calculate the slope of your current position on the cost function. If this slope is negative, then you know you are on the left hill of the valley and need to step to the right to get closer to the bottom. Conversely, if the slope is positive, you know that you are on the right hill of the valley and need to step to the left to reach the local minimum. Depending on the steepness of the slope, you know how close you are to reaching the bottom, as the slope flattens out near the bottom.
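The valley analogy translates almost directly into code. The Python sketch below assumes a hypothetical U-shaped cost function, (x - 2)², whose lowest point sits at x = 2, and an arbitrary step size (the "learning rate"); neither comes from a real network.

```python
def cost(x):
    """A hypothetical U-shaped valley with its lowest point at x = 2."""
    return (x - 2.0) ** 2

def slope(x):
    """Derivative of the cost: negative on the left hill, positive on the right."""
    return 2.0 * (x - 2.0)

def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly step against the slope; steps shrink as the slope flattens."""
    x = start
    for _ in range(steps):
        x -= learning_rate * slope(x)
    return x

# Start on the left hill (negative slope, so we step right) ...
print("slope at -3:", slope(-3.0), "-> ends near", round(gradient_descent(-3.0), 4))
# ... and on the right hill (positive slope, so we step left).
print("slope at  7:", slope(7.0), "-> ends near", round(gradient_descent(7.0), 4))
```

Because each step is proportional to the slope, the walker takes large strides on the steep hillside and ever smaller ones as the ground flattens near the bottom.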


It is this autonomous iterative process that makes modern-day neural networks so powerful. It is worth noting, however, that these mathematical techniques, whilst not in and of themselves particularly complex, could not until recently be performed effectively enough for real progress to be made in simulating intelligence. This is mainly owing to two important factors. Firstly, the enormous data sets required to effectively train these systems did not exist before the explosion of the internet and social media, and secondly, the processing capability required to perform the many rounds of backpropagation across these enormous data sets was not yet available to AI researchers. Thus, it was only when these two requirements were met that artificial intelligence was kick-started to the point where it could have a significant impact on society. This is especially true of more complex, unsupervised neural networks that do not make use of user-defined training sets as a guide, but rather rely on large volumes of data to refine their own training sets and outcomes through continuous refinement of predictions without the guidance of an external source. The benefit of these more data-intensive unsupervised models is that the neural net can identify previously unrecognised paths or tactics to a desired outcome far better than a human, and can be used for more than one strictly-designed task because of its open and more generally applicable methodology.


The AI Revolution

Even though the earliest roots of artificial intelligence can be found as far back as the 1930s with Alan Turing’s various research papers concerning the ideas and proofs for an intelligent machine, it would not be until the late 1990s and early 2000s that the mainstream media and big industry players would take AI seriously. Whilst there existed some useful applications in the technology sector before this time, especially in image and voice recognition, they were very much behind the scenes and out of the public eye. But despite the less-than-enthusiastic attitude of big corporations and government towards funding artificial intelligence projects, especially given the underwhelming results it had provided in the twentieth century, there were always a few isolated believers in AI who truly understood the potential of deep learning.


One such important group of researchers, who would struggle through the fallow times in artificial intelligence research and whose steady belief in these methodologies would ultimately be justified, is known to the AI community as the Canadian Mafia. This tightly-knit group of artificial intelligence evangelists – including such luminaries as Geoffrey Hinton, Yoshua Bengio and Yann LeCun – are today considered to be the rockstars of the AI field. They, for example, were the researchers that would make great strides in developing the critically important backpropagation method that would significantly advance the learning capabilities of neural networks.


Geoffrey Hinton, as an example – an English-Canadian cognitive psychologist and computer scientist – is regarded by many as the godfather of neural networks. And whilst his research has today been recognised as fundamental to the success of AI in recent times, this was not always the case. For many years, Hinton and those who studied under him, including LeCun and Bengio, were considered academics in a dying field of study. As the funding freeze of the AI Winter set in and all other researchers set their sights on what were considered more promising areas of speciality, Hinton’s group pressed on regardless with their research into mathematical methodologies to improve neural networks.


Their continued research, however, was ultimately to pay off when, in 1997, a massive turning point came for AI, especially in the mind of the public. This was the year that IBM’s Deep Blue beat the world chess champion, Garry Kasparov, in a televised event that captured the imaginations of millions of onlookers worldwide. It was the first time that many people realised the potential of machines to mimic intelligence and this sparked mass interest in the field of artificial intelligence. The Deep Blue event, which attracted more than 70 million viewers, was also indicative of the progress hardware had made in significantly shortening processing times, allowing for speeds never seen before, with IBM’s chess-playing machine being able to run through a reported 200 million moves per second.


This excitement, combined with the progress made in raw computing power, led to many significant milestones for artificial intelligence in the coming years. Such successes included the driving of a completely autonomous vehicle for 131 miles – on a route previously unknown to the vehicle – in the DARPA challenge, won by a team from Stanford University in 2005. Then in 2011, in another highly publicised event, IBM’s Watson beat two champions at the quiz show game of Jeopardy! by a significant margin, demonstrating that artificial intelligence had moved beyond simple brute force number-crunching and could now process written language within complex contexts. And possibly most impressively to date, in 2016 a neural network trained largely by playing against itself beat the world champion Go player, Lee Sedol, at an ancient strategy game that has so many possible combinations that a computer cannot run through every possible board position, as it can in chess, but instead has to learn intuitively how to improve its own gameplay in an autonomous fashion.


Importantly, these are just some of the most publicly exposed examples of artificial intelligence. Many of the more practical and industry-important applications go very much unnoticed by the common user, as the AI is often hidden in technologies and software that we use every day, such as our laptops and smartphones, as well as within the apps and social media platforms that consume so much of our attention. We simply need to think of recent developments in facial recognition that allow us to unlock our phones, or that recognise and tag our friends in our uploaded pictures, and it becomes apparent that we unwittingly use deeply complex artificial intelligence technologies almost every day. In fact, avoiding interaction with artificial intelligence has become near-impossible in today’s connected world, especially given the omnipresence of so-called Big Tech. These behemoth firms, namely Facebook, Apple, Google and Amazon, have so utterly pervaded our daily lives that trust has become a default setting for the users of their services. And in allowing them free access to our lives, we provide them with one of the key components to success in the development of even more powerful and intrusive AI capabilities – our data.


Whilst the artificial intelligence revolution promises to change the world in many positive ways, there is the risk that this exciting field may be wholly absorbed by Big Tech and subsequently used in any way that a handful of powerful executives wish. Unfortunately, as recent history has shown, these mega-corporations’ agendas are not aligned to the best interests of their users, but rather to the maximisation of profits at all costs. What is even more disturbing is that the individuals who once held the torch as AI purists – academics who had always looked at the bigger picture and wanted to use artificial intelligence to solve real pressing problems in the world – have now been lured into the research labs of Big Tech. This includes even the die-hard Canadian researchers who brought AI from the backrooms of academia to the forefront of modern technology, with Hinton working for Google and LeCun for Facebook. And whilst the third and youngest member of the Canadian Mafia, Yoshua Bengio, has managed to resist the extravagant salaries given to AI experts by Big Tech, it does seem as if he is fighting against the tide. One can only hope that such an important field of research will not continue to be overly dominated by a few large corporations, bringing to mind a Terminator-esque future controlled by the likes of an all-powerful Skynet. And in this sense, the story of artificial intelligence has just begun.




Garry Kimovich Kasparov is considered by many to be the greatest chess player of all time. But the Russian grandmaster, despite his incomparable genius, is not a man without controversy. In 1993, Kasparov was the reigning world champion of the Fédération Internationale des Échecs or World Chess Federation (FIDE), but he had grown frustrated with the organisation’s pedantic bureaucracy and seemingly arbitrary changes in the code of conduct for players. In what he later described as the biggest mistake of his career, as it caused great disunity in the chess community, Kasparov broke away from FIDE to create a rival chess body called the Professional Chess Association (PCA), declaring irreconcilable differences in opinion and even accusing FIDE of corruption.


When considering this erratic and somewhat stubborn behaviour, Kasparov’s infamous claims that the IBM Deep Blue team cheated to win against him in 1997 suddenly seem a little less credible. But 20 years later, in a 2017 interview with Demis Hassabis, CEO of Google’s DeepMind, it seems that Kasparov had somewhat refined his argument, saying that “When signing a contract, you always have to read the fine print. When people ask me if IBM cheated, no, they just bent the rules in their favour. They followed the letter but not the spirit of the agreement.” It seems almost sad, and somewhat pitiable, how passionately Kasparov stands by his convictions two decades after his humbling loss – but he may actually have a point.


If you are not familiar with Garry Kasparov’s appearance, it would be hard to imagine him as a chess grandmaster. The Russian is tan with thick dark hair, has a propensity for nicely tailored suits and is very charismatic. He speaks articulately and certainly lacks no confidence in himself or his abilities. Speaking to journalists before his famous match against IBM’s Deep Blue in 1997, he did not even seem to contemplate the possibility of losing to the computer. And why would he? Since the age of 22, he had never lost a match against a human or computer opponent. He had even beaten an earlier version of Deep Blue the previous year, and before the match, he scoffed at the organisers’ suggestion to split the prize money of half a million dollars 60-40 between the winner and the loser. He wanted it all, and in 1996, he won it all. Kasparov was the best in the world, and he knew it. In fact, in the 19 years between 1986 and his retirement from competitive chess in 2005, Kasparov was ranked first in the world for 225 out of 228 months. He was virtually unbeatable.


The match in 1997 was different in so many ways. The six-game challenge started well for a confident Kasparov, and he easily beat Deep Blue in the first game, thanks to what would later be called a glitch in the machine’s programming. All was going to plan for the world champion, but it was to be a pivotal moment in the second game that would change everything.


In that iconic game, Kasparov intended to set a trap for his opponent using a variation of an opening strategy known as the Ruy Lopez or Spanish Opening. As the strategy played out, a pivotal point required the placement of the “poisoned pawn” to entice the opponent into a compromising situation – a technique the Russian had used against countless opponents, especially against computers, which often could not recognise the danger they were playing themselves into. But to Kasparov’s great surprise, at the critical moment, Deep Blue did not take the bait. Instead, the machine played a move that was far more subtle and forward-thinking than anyone – except the IBM team – could have ever anticipated. A grandmaster who was commentating on the game, John Nunn, called it “a stunning move played by a computer”. It was something no professional had seen before from chess-playing programs. Kasparov was visibly shaken.


That move was the beginning of the end of chess grandmasters beating computers. And at that point, Kasparov’s emotions took hold of him. He vigorously rubbed his face, stood up, and walked around the room rather aimlessly. At one point, Kasparov stood alone in the middle of the room staring at his mother – who was seated in the audience – shaking his head. In several interviews after the match, the Russian called the move in question “human-like”. And for years to come, there was suspicion around the true source of the move – with Kasparov and many others suspecting there was some kind of human intervention from the IBM team on behalf of Deep Blue.


In part, the suspicion was justified. The IBM team did not exactly fight fair in 1997. Before the match, although initially agreeing to let Kasparov study games played by Deep Blue in training, they rescinded their agreement on the premise of a technicality – the contract stated that Kasparov was entitled to any games played in official tournaments, but Deep Blue had technically not played any games in official tournaments. IBM therefore did not provide him with any games to study. As Kasparov still put it 20 years later, “It was a black box,” as he had no idea of what he was up against compared to the previous version he had beaten in 1996.


This was just one of numerous distracting tactics the IBM team used to ultimately defeat their opponent. Other subtle psychologically targeted elements built into the programming of Deep Blue included the manipulation of timing in processing the output for a move. When playing a human opponent, one can attempt to read the body language and timings of a player to gauge the various emotions and thought processes they may be experiencing. A quick move after your own, for example, could indicate excitement in a plan coming together, or in contrast, a slow move could be a sign that they are perhaps unsure how the next few moves may turn out. With a computer, there is no body language, but in chess playing programs of years gone by, a longer processing time could mean the computer was having to recalculate a strategy. The IBM team recognised that such timings could psychologically affect Kasparov and opted to programme Deep Blue to sometimes take longer than it strictly needed, to give the idea that a move may be more complex than it seemed at first glance.


All these factors combined to eventually break Kasparov down psychologically. Even he admits that certain moves – a number of which were later revealed as mistakes by Deep Blue – played heavily on his mind, as to whether they were brilliant or idiotic, visionary or a glitch. And all such uncertainty was further compounded when IBM refused to release the log files of Deep Blue’s activities, creating even more doubt around the already rather devious tactics employed by the multinational corporation. Whilst the victory was a milestone for computing power, Deep Blue would not strictly be seen as a “learning machine” by today’s artificial intelligence standards. Where modern AI can learn and adjust, Deep Blue simply searched through the possibilities in its tree search schema, running through a reported 200 million moves per second to find the best outcome according to its pre-programmed rules. What Deep Blue did prove, however, was that brute force and cold hard steel can outlast, and eventually outplay, even the most gifted human counterpart. The computer was ultimately not smarter or more strategic than Kasparov, but it was more resilient, more composed, and less emotional – simply less human.
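For readers curious about what a brute-force tree search looks like in principle, the Python sketch below shows the textbook minimax idea on a tiny made-up game tree. It is an assumption-laden illustration only: Deep Blue's real search was vastly more elaborate, and the tree and scores here are invented.

```python
def minimax(node, maximizing):
    """Score a position by exploring every branch beneath it.

    A node is either a number (the pre-programmed score of a final
    position) or a list of child nodes (the positions reachable next).
    """
    if isinstance(node, (int, float)):
        return node
    scores = [minimax(child, not maximizing) for child in node]
    # We pick our best option; the opponent is assumed to pick our worst.
    return max(scores) if maximizing else min(scores)

def best_move(tree):
    """Index of the move whose worst-case outcome is the best available."""
    scores = [minimax(child, False) for child in tree]
    return scores.index(max(scores))

# Hypothetical game: three possible moves, each answered by two replies.
tree = [[3, 5], [2, 9], [0, 7]]
print("best guaranteed score:", minimax(tree, True))
print("move to play (index) :", best_move(tree))
```

Nothing here is learned from experience: the program simply looks ahead and applies fixed scores, which is the sense in which such a system is not a "learning machine".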



Lee Sedol has an air of genius about him, although not an air of command or confidence. His thick, dark hair is cut in a bowl-like style and his voice is rather high pitched with an almost childish tone. Born on the South Korean island of Bigeumdo, when Sedol arrived in the capital of Seoul at the age of eight to attend the Korean International Baduk Academy (KIBA), he was given the nickname “Bigeumdo Boy” by his classmates because of his rural upbringing and his subsequent naive and deeply curious nature in relation to his new environment. The KIBA school – dedicated to training up professional Go players from an early age – was founded by Kweon Kab-yong, a legendary Go teacher in Korea who has produced many of the greatest players of modern times. As a pupil of the academy, Sedol attended classes from 9am to 9pm, seven days a week, and eventually ended up living with Master Kweon as one of the most promising students he had ever come to recognise.


For a young Lee Sedol, Go instantly captured his imagination as wildly fun and something that came very naturally to him. At just 12 years and 4 months old, he became the fifth youngest ever professional Go player in South Korean history and enjoyed the thrill of thoroughly beating international professionals who were double, triple and even quadruple his age. Sedol was a child prodigy, regarded as a genius by many, including his teacher Master Kweon, who had taught thousands of young aspiring Go players, and who commented that “unlike the other children, his eyes shone brightly.” By February 2016, a 33-year-old Lee Sedol had won the second highest number of international titles in Go history and was generally considered the greatest player in the world at the time.


Dating back to the 4th century BC, as recorded in the ancient historical commentaries contained in the Zuo zhuan manuscripts, Go – or weiqi, as it is known in China – is considered one of the oldest board games in existence, having been played consistently for at least 2 500 years. Known by the name baduk in Korea, Go reached the nation’s borders by the 5th century CE and has held a special place in the culture of the Korean people for over a millennium. So deeply is the game entrenched in Korean culture that it is traditionally considered one of the core pursuits for higher literacy along with similarly noble disciplines such as music, poetry and painting. And because of this special place in Korean culture, those who excel at Go are generally regarded as some of the most intelligent individuals amongst their peers.


In 2016, DeepMind, a London-based artificial intelligence company that had been acquired by Google in 2014, proposed to Sedol an exhibition match against its Go-playing computer program for a grand prize of $1 million. The program was called AlphaGo, and Sedol agreed. The first of five games was scheduled for 9 March 2016, broadcast live to the world from Seoul, South Korea – the televised event ended up attracting over 200 million viewers.


Building up to the spectacle, AlphaGo had trained itself on hundreds of thousands of recorded online Go games between amateurs and semi-professional players, studying statistical probabilities of moves in relation to winning outcomes. This foundation of learning, known as the “policy network”, was one of three knowledge systems used by the program to become a better player, and its role was to identify what a good move looks like. The second system is called the “value network”, built up through reinforcement learning by playing thousands of games against itself, each time becoming better at evaluating how a certain board position would affect the odds of a winning outcome.


With 9 March approaching, Sedol was confident in his chances, believing that if the computer program managed to take even one of the five games off him, he would consider it a great success for the developers. The fact was, Sedol had challenged many Go-playing programs in the past, and none had come close to defeating him – why would this one be any different? And in many ways, Sedol was not misguided in doubting the ability of a computer to reach the levels of complex gameplay capably displayed by the highest-ranked Go players. This handful of so-called “9 dan” players on the international Go rating system, including Lee Sedol, could be compared to Roger Federer, Michael Schumacher, or the chess grandmaster Garry Kasparov – each widely considered the best in their discipline at the time, or even the best of all time.


So, when AlphaGo comprehensively beat Sedol 4-1 in a five-game match, the world took notice. Young Korean children cried as their hero was defeated by a British computer, Go experts were flabbergasted by the intricate gameplay of the machine, Lee Sedol was near inconsolable, and one of South Korea’s biggest daily newspapers stated, “Last night was very gloomy […] Many people drank alcohol.” Like Deep Blue’s victory against Kasparov in 1997, AlphaGo had beaten a human world champion at their own game. But this time was different. Unlike the controversy and blame games that resulted in 1997, AlphaGo had won fair and square. Even Sedol, in post-match interviews, was humbled and impressed by the gameplay and tactics employed by the program. And Go is far more complex than chess.


Whilst Go has just two rules, there are more possible board configurations than there are atoms in the observable universe. A chess program can combine a tree search schema with a value policy per piece to map hundreds of thousands of possible combinations in seconds and quickly select the best possible and most strategic move, but AlphaGo does not have this option. At any point, Go has far too many possible board positions to map out completely, and therefore artificial Go-playing programs cannot rely on the brute force manner in which a chess move could be solved.
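The scale of that comparison can be checked with some back-of-the-envelope arithmetic. The Python sketch below assumes the standard 19-by-19 board, treats every point as empty, black or white (which overcounts, since some arrangements are illegal), and uses the commonly quoted rough figure of 10^80 atoms in the observable universe; both numbers are order-of-magnitude assumptions rather than exact counts.

```python
# Each of the 19 x 19 points can be empty, black or white.
BOARD_POINTS = 19 * 19
upper_bound = 3 ** BOARD_POINTS          # crude upper bound on board layouts

# Commonly quoted rough estimate, used here only for scale.
ATOMS_ESTIMATE = 10 ** 80

digits = len(str(upper_bound))
print("points on the board      :", BOARD_POINTS)
print("digits in the upper bound:", digits)
print("exceeds the atom estimate:", upper_bound > ATOMS_ESTIMATE)
print("times larger (approx.)   : 10^%d" % (digits - 1 - 80))
```

Even allowing for the overcount, the gap is so many orders of magnitude wide that listing every position is out of the question, however fast the hardware.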


Neural networks and deep learning have, however, changed the way programs like AlphaGo play the games they are taught. Instead of searching through every branch of possibilities in its catalogue, AlphaGo learns from its mistakes, using reinforcement learning and backpropagation to improve its understanding of the game. And the greatest advantage of the computer is that it never tires – with AlphaGo playing 300+ million games against itself in a matter of days, each time making incremental improvements on its gameplay strategy.


The result of this tireless and near-unlimited learning capability is what Lee Sedol and other champions have likened to a Go god. Whilst most Go players replicate a style passed on by masters or craft their own through the adaptation of well-known strategies, AlphaGo had strayed from the path. During the epic battle with Sedol, there were times when expert commentators were left confused by wholly unconventional moves that looked almost ridiculous at the time, but in hindsight were unbelievably intricate in a far-reaching strategy that was almost unimaginable to human minds. It was these moments and the humbling defeat Sedol experienced that prompted him to say that not only had a computer opened his eyes to a new way of Go, but even a new way of life. And perhaps the most astonishing thing is, AlphaGo’s gameplay has since been surpassed by a new version of itself – AlphaGo Zero – which in a hundred-game match beat its predecessor 100-0.



In mathematics, a singularity is a point at which the normal logic of mathematics breaks down and an object does not behave in the manner that is typically expected. As the simplest example of a singularity, when the x in the function f(x) = 1/x is equal to zero, the answer cannot be defined. This is because, as x approaches zero, the function’s outcome explodes towards positive infinity from one side and negative infinity from the other, so no single value can be assigned at x = 0. In physics, a singularity can theoretically occur in a similar manner, most commonly exemplified in the physics related to black holes. Karl Schwarzschild defined this in his famous Schwarzschild radius equation in 1916, where every object with mass possesses a physical parameter corresponding to the event horizon of a black hole. According to this equation, if an object were compressed to a physical radius smaller than its Schwarzschild radius, whilst maintaining the same mass, nothing inside that radius would be able to escape, as the velocity needed to escape the object’s gravitational pull would exceed the speed of light. In this scenario, not even light would be able to escape the Schwarzschild radius – creating a black hole.
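Both ideas can be explored numerically. The Python sketch below first shows 1/x running away as x closes in on zero from either side, and then evaluates the standard Schwarzschild radius formula, r = 2GM/c², for the Sun and the Earth. The physical constants and masses are rounded textbook values, and the function names are this sketch's own.

```python
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C = 299_792_458.0    # speed of light, m/s
SUN_MASS = 1.989e30      # kg (rounded)
EARTH_MASS = 5.972e24    # kg (rounded)

def schwarzschild_radius(mass_kg):
    """Radius below which a given mass would form a black hole: 2GM / c^2."""
    return 2.0 * G * mass_kg / C ** 2

def escape_velocity(mass_kg, radius_m):
    """Speed needed to escape from the given distance: sqrt(2GM / r)."""
    return (2.0 * G * mass_kg / radius_m) ** 0.5

# The mathematical singularity: 1/x grows without limit near x = 0.
for x in (0.1, 0.001, 0.00001, -0.00001, -0.001, -0.1):
    print(f"x = {x:>9}: 1/x = {1.0 / x:>12.1f}")

# The physical one: how small would the Sun or the Earth have to be?
r_sun = schwarzschild_radius(SUN_MASS)
r_earth = schwarzschild_radius(EARTH_MASS)
print(f"Sun:   {r_sun / 1000:.2f} km")
print(f"Earth: {r_earth * 1000:.2f} mm")

# At exactly that radius, the escape velocity equals the speed of light.
print("escape velocity / c:", escape_velocity(SUN_MASS, r_sun) / C)
```

With these rounded inputs the Sun would need to be squeezed to a radius of roughly 3 km, and the Earth to roughly 9 mm, before light could no longer escape.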


There is, however, another kind of singularity that has become a favourite topic of debate – that of the technological singularity. This theory is based on the notion that, one day, an artificial super-intelligence will be created that is so far superior to its creators that it will begin a cycle of self-learning and self-improvement, spiralling beyond the control of human intervention. But the opinions on how close we are to this technological singularity, or if it is even possible, vary greatly.


The first mention of a technological singularity in the 1950s was aptly, and perhaps somewhat tellingly, uttered by the Hungarian-American mathematician, physicist, and computer scientist, John von Neumann, who is widely recognised as a founding figure in the world of computing. Von Neumann was no stranger to potentially world-ending technological advancements: as one of the leading scientists in the Manhattan Project during World War II, he helped to produce the nuclear weapons that were dropped on Hiroshima and Nagasaki in August of 1945. In the 1950s, a peer of von Neumann, Stanislaw Ulam, recalled a conversation with him that “centred on the accelerating progress of technology and changes in the mode of human life, which gives the appearance of approaching some essential singularity in the history of the race beyond which human affairs, as we know them, could not continue.”


In recent years, perhaps the most prominent voice of the singularity has been that of computer scientist and futurist, Raymond Kurzweil. Among Kurzweil’s predictions were the disintegration of the Soviet Union because of the advancement of technologies such as cellphones, and the explosion in internet usage from the 1990s. He also foresaw that chess software would beat the best human player by the year 2000 – a feat that was achieved in 1997 when IBM’s Deep Blue beat world champion Garry Kasparov in a globally-broadcast match. And in terms of the concept of a technological singularity, Kurzweil predicts that by the year 2045, “the pace of change will be so astonishingly quick that we won’t be able to keep up, unless we enhance our own intelligence by merging with the intelligent machines we are creating.”


Kurzweil’s prediction, as described in his book The Singularity is Near (2005), relies heavily on a theory called “the law of accelerating returns.” He argues that the singularity is closer than many think, because humans tend to reason in terms of linear progression. Yet, as he describes in his book, technology, as with many of our most important advancements, is progressing at an exponential rate – a reality observed by Gordon Moore, co-founder of Intel, in 1965. Moore observed that the number of transistors per square inch on integrated circuits had doubled every year since the integrated circuit was invented, and he predicted that this would continue to be the case for the foreseeable future. In recent years, the pace of technological development has slowed down but only slightly, with the capacity of computer chips roughly doubling every two years, according to what has become known as “Moore’s Law”. At certain times the rate of this growth seems linear, Kurzweil explains, because when looking back at the first half of the curve, it is much flatter than what comes after the “elbow” of the curve. At that point, beyond the elbow, advancements that previously took decades to see major progress, could suddenly double and then quadruple in effectiveness, usability and adoption. And one such advancement – that many have deemed slow and laborious in its development and practical applications in the last few decades – is artificial intelligence. Just as Kurzweil explains, when we stand at this point in time and look back at the rate of progress in the field of AI since the beginning of the 20th century, it certainly can seem linear in nature, if not pedestrian.
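The difference between linear and exponential thinking is easy to see with a few lines of Python. The sketch below assumes a quantity that doubles every two years, in the spirit of Moore's Law, and compares it with one that grows by a fixed amount each year; the starting value of 1 and the yearly increase are arbitrary choices made for illustration.

```python
def doubling_growth(start, years, doubling_period=2.0):
    """Value after `years` if it doubles once every `doubling_period` years."""
    return start * 2 ** (years / doubling_period)

def linear_growth(start, years, yearly_increase):
    """Value after `years` if it rises by the same fixed amount every year."""
    return start + yearly_increase * years

# Hypothetical starting point of 1 unit, growing for up to 40 years.
print(f"{'years':>5} {'linear':>10} {'doubling':>12}")
for years in (0, 10, 20, 30, 40):
    linear = linear_growth(1, years, yearly_increase=1)
    exponential = doubling_growth(1, years)
    print(f"{years:>5} {linear:>10.0f} {exponential:>12.0f}")
```

For the first decade the two columns stay within the same order of magnitude, which is the flat stretch before the "elbow"; by the fourth decade the doubling column has passed a million whilst the linear one has only reached 41.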


However, what has predominantly been holding AI back is not a lack of ideas or useful implementations, but a shortage of both computing power and the data necessary to achieve deep learning in AI. In recent years, both of these necessities have experienced substantial growth, providing the major players who have collected these vast amounts of data a seemingly endless number of possibilities for the penetration of artificial intelligence into every industry imaginable, as well as into almost every sphere of our daily lives. In many ways, if we are indeed currently situated at the “elbow of the curve”, the conditions do seem perfect for AI to accelerate exponentially in the coming years – perhaps even in time to realise Kurzweil’s expectation of a technological singularity by the year 2045.


The idea of technological singularity is not, however, one that everybody views with as much optimism as Kurzweil, owing to the widespread fear that machines may gain the intelligence to one day rise up and, without empathy or compassion, overcome and annihilate the human species. This fear has a long history and has been manifested in many cultural expressions – from literature, to film, to art – and it has only grown more intense as the power of artificial intelligence has accelerated.


This fear, however, is predicated on the belief that humans and machines are completely separate and competing entities. It ignores the possibility that we could be moving towards the singularity that Kurzweil describes – one that envisions the ultimate advancement of human intelligence through the corporeal merging of human and machine. Kurzweil believes that as humans continue to evolve, we will inevitably reach a point where computational capacity will supersede the raw processing power of the human brain, enabling us to move beyond the present limits of our biological bodies, and our minds.


Kurzweil’s enthusiasm for the singularity is echoed in a 2017 article in Informatica: An International Journal of Computing and Informatics, where researchers Mikhail Batin, Alexey Turchin, Sergey Markov, Alisa Zhila and David Denkenberger assert that there will be three stages of AI development, and that we are currently only in the first stage of “narrow” AI. They predict that what will follow is artificial general intelligence and then super-intelligence, by which point the possibility of uploading human minds and creating disease-fighting nanotechnological bodies will lower the probability of human death to close to zero. Biomedical gerontologist and chief scientist at the Strategies for Engineered Negligible Senescence (Sens) Research Foundation Dr Aubrey de Grey raised eyebrows when he proclaimed that the first person who will live to be 1 000 is probably already alive today. De Grey’s long white beard and sometimes eccentric opinions have perhaps – to some extent – made him easy to dismiss, especially among the scientific community. But the acceleration of medical research as a result of AI – and the possibilities for achieving radically improved approaches to health and medical care – are not easy to disregard. Companies such as Insilico Medicine, IBM Medical Sieve and Google DeepMind Health are already involved in projects that will aid in advancing disease detection and treatment.


The enhancement of the human body through technological means has intrigued us for years and has been explored extensively in fiction through the character of the cyborg – from the James Bond supervillain Dr No and his bionic metal hands, to the replicants in Ridley Scott’s Blade Runner, Molly Millions in William Gibson’s Neuromancer, and Tony Stark in the Iron Man Marvel comics, to name a few. Literature scholars have highlighted that the recurring use of the cyborg character reflects our concerns about the changes in human nature and identity that are taking place through the blending of technology and corporeality. The willingness of writers to mix elements that are human with those that are not has been hailed as a potentially significant transgressive act, producing characters whose identities are fluid and permeable – for they are neither strictly human, nor machine. The power of the cyborg was articulated perhaps most famously by Donna Haraway in her academic essay A Cyborg Manifesto (1985), where she argues that the figure of the cyborg allows for the possibility of envisioning a world where the human and the non-human merge seamlessly.


Haraway’s argument is an important one to consider, as fiction increasingly becomes reality. Humans have been augmenting themselves for years and it could be argued that, in fact, almost all of us are already cyborgs to some extent. We use synthetic drugs to improve our health, to stave off life-threatening disease, and to enhance our performance both mentally and physically. Artificial devices are routinely used to improve our eyesight or hearing, to give people new limbs, or to keep hearts beating. As a species, we are getting smarter, running faster, and living longer thanks to artificial augmentation. We have always been altering the limits of what the human body is capable of. It is therefore perplexing that we should fear the more advanced physical enhancements that artificial intelligence is likely to facilitate in the not-too-distant future. Perhaps it is because in the case of pace-makers and prosthetics, we feel that technology is only restoring a normal level of physical functionality, rather than enhancing the natural body. But this is not entirely true.


Before he made headlines for his involvement in a dramatic murder, South African double-amputee Oscar Pistorius was the subject of an international debate of a rather different kind. In 2008, the International Association of Athletics Federations (IAAF) banned Pistorius from competing against able-bodied runners, as it claimed that his prosthetic limbs gave him an unfair advantage over human legs. These artificial appendages were reported to make Pistorius more energy efficient than normal sprinters and to reduce the time between strides to such a degree that researchers estimated that he would have as much as a seven second advantage in a 400m race. These small, but significant, advancements had given a damaged human body more functionality than the average human body, and collectively, we generally view these enhancements with suspicion.


A far more advanced development in prosthetics is robotic limbs, which rely on brain-computer interfaces to help amputees regain an unprecedented level of movement and control over their bodies. Jesse Sullivan – who had both of his arms amputated following an electrocution accident – underwent a nerve graft to join his shoulder muscles to his pectoral muscles, and a computerised prosthesis was joined to his body where his right arm used to be. Using thought control, Sullivan is able to contract the muscles in his chest, and the computer in the arm is able to interpret these signals to perform the desired motion. When he thinks “close hand”, the chain of communication through his body – and its artificial addition – works seamlessly and his prosthetic hand closes. Researchers in Utah, moreover, announced in 2017 that they had developed a hand that can simulate over 1 000 unique touch sensations in the brain of a user, enabling them to interact with their environment in a tactile sense. These prosthetics are far more technologically advanced than Pistorius’s legs – and yet, society views these as medical marvels.


Generally, our first response is fear when futurists like Kurzweil talk about singularity, or when businessmen like Elon Musk talk about a neural lace enabling human brains to interface directly with computers. These ideas may seem outlandish, and yet to consider them so is to draw a fairly arbitrary line between what are acceptable augmentations to the human body and what are not. Humans have always found ways to enhance themselves, and as AI advances, the possibilities for pushing beyond our present physical limits are only growing. What is perhaps more interesting than the debate currently taking place about the potential impact of AI is to question ourselves as human beings more deeply – to ask ourselves what we mean when we speak of human intelligence. After all, if we believe that we can create an artificial intelligence, then it is incumbent upon us to have a better grip on the meaning and history of what we understand as human intelligence itself.


Deep Learning

By all accounts, Phineas Gage was a friendly, professional and level-headed person before his accident. But on the afternoon of 13 September 1848, everything changed. At the age of 25, Phineas was a construction foreman heading up a small team of railway workers, primarily involved in blasting and clearing ground for tracks to be laid. This job was achieved by drilling a thin vertical hole into the rock that was to be blasted and sprinkling a sufficient amount of gunpowder into the hole. A fuse was then added and the top of the hole was tightly packed with sand or clay – a practice called tamping – to direct the blast into the surrounding rock and to contain the explosion safely below ground.

As foreman, Phineas was tasked with first loading and gently packing the gunpowder into the hole with an iron tamping rod. His assistant would then pour sand or clay on top of the powder and Phineas would more vigorously tamp the sand into the hole to ensure a tight fit. Much as a writer may have a favourite pen, Phineas used a personalised tamping rod with a tapered tip for ease of holding, a piece he had commissioned from a local blacksmith. On that Wednesday, Phineas used his personalised rod to perform a task that he had completed thousands of times before, almost unconsciously.


Late that afternoon, after a long day of hard work, something went wrong. Witnesses claimed that Phineas was distracted by a commotion as some members of the team were noisily packing blasted debris into the back of a truck for removal. Phineas turned his head to see what the fuss was about and upon returning to his tamping, he failed to notice that his assistant had not yet poured sand into the blast hole. As Phineas struck down into the gunpowder-filled cavity with more force than normal, his personalised rod set off a spark against the drilled rock. Without the buffer of the sand or clay, the gunpowder was ignited and the subsequent explosion propelled the javelin-like tamping rod from the narrow hole, like a bullet from a gun.


The iron rod – which was over a metre long and weighed close to six kilograms – shot straight through Phineas’ skull. According to several accounts, although these are unlikely to be verifiable, the rod landed tip first over 20 metres away, covered in blood and an oily layer, presumably from the fat-rich cells of the brain. The force had thrown Phineas Gage onto his back and yet, even with the rod passing through the front left part of his brain, he miraculously survived. Even more remarkable was the fact that he never lost consciousness, sitting up and talking as he was taken into town by his co-workers, and even greeting the arriving physician, as recorded in The Boston Medical and Surgical Journal, saying, “Doctor, I have some business for you.”


The case of Phineas Gage has lived on in university textbooks for decades, as a favourite example amongst lecturers across a broad range of disciplines. What was special about this case was Phineas’ dramatic change of personality after the accident. Where before he was described as a gentle and considerate soul, after the accident his friends no longer knew Gage as Gage, as he was fitful, irreverent and prone to the crudest profanities. And although he made a miraculous recovery in all cognitive abilities of memory, language, motor skills and reasoning, Gage had changed on a personal and social level, as was attested to by close friends and family. So drastic was this negative change to his personality that the railroad company that had employed him turned him away after the accident, despite Gage displaying full functionality in all physical and mental endeavours related to his work.


As the first medically recorded incident of the type, the case of Phineas Gage marked a turning point in the study of the human brain and its relation to who we are as human beings. Whilst retaining basic motor and cognitive functions, the damage to Gage’s frontal lobe caused a radical change in his higher cognitive functions that relate to social interaction and inter-personal behaviours. Since this instance, there have been hundreds of cases where an illness, accident or trauma to the brain has significantly altered the thoughts and behaviours of individuals in strange and unexplainable ways, with particular kinds of epilepsy making people more religious, for example, certain cases of Parkinson’s making people lose their faith, and the medication for Parkinson’s subsequently turning patients into compulsive gamblers.


Similarly, when we ingest drugs or alcohol, we may act in ways that are said to be “out of character”, or we even claim that “we were not ourselves” when perpetrating certain unfavourable actions. But what exactly is it then to be you? If you can become “not you” through a forced change in your physical and chemical makeup, either permanently or temporarily, then surely the conception of our own being is artificially created, beyond the realm of physical observances – as some abstract, unchangeable and incorruptible version of the “I”.


This theoretical conception of our own being has a long and complicated history in Western thinking, with the 17th Century French philosopher René Descartes proposing a type of metaphysical dualism that separates the world into physical and non-physical states. For Descartes, the abstract workings of the mind – such as thoughts and feelings relating to love and our ideas of morality, for example – belonged to the non-physical realm, as core characteristics that make us who we are. Thus, although our physical bodies may change, there still remains an incorruptible essence that constitutes what it means to be “me” or “you”, as a unique being that cannot be replicated.


How unduplicatable this consciousness is, however, is up for debate. Certain fields of modern neuroscience research, for example, aim to digitally map and replicate every neural connection and interaction of a living organism, to begin to better understand, among other things, how our consciousness operates. One such project – which may serve as a launchpad for a wider investigation into consciousness – is OpenWorm. The goal of the OpenWorm project was to build the first ever comprehensive computational model of a living organism, digitally replicating each cell and every neuron, to become the world’s most detailed virtual lifeform. The particular organism in question is Caenorhabditis elegans (C. elegans), a microscopic nematode, or roundworm, composed of fewer than a thousand cells. Something of a hermaphroditic superstar in the scientific community – with research on C. elegans going back over 40 years – it is biologically perhaps the best understood creature in existence. To date, it is the only organism with every one of its 959 cells and 302 neurons completely mapped out in what is known as a connectome – a wiring diagram of sorts, detailing the connections and interactions of the neural system.


Despite its simple biology, C. elegans is a relatively complex creature compared to its microscopic worm counterparts. Unlike many its size, this roundworm is constantly reacting to stimuli, solving problems such as finding food, locating a mate and even avoiding deadly predators like Pristionchus pacificus, a fellow nematode that preys on the slightly smaller 1mm long C. elegans. It is this combination of physical simplicity and sufficiently complex behaviour that makes it the perfect subject for a project such as OpenWorm, as a first step in understanding the ephemeral relationships and interactions between individual neurons and how these biochemical reactions ultimately affect behaviour.


Once every neural interaction was captured in code, as a virtual brain of sorts, the next natural step for the OpenWorm project was to give the digitally replicated C. elegans a body. And once the software containing the encoded neural interactions was linked to some motors and sensors attached to a Lego body with wheels, without prompting, the robot came to life – it began to move on its own, in ways that the scientists described as characteristic of a nematode. They had created a virtual lifeform and the digitally replicated worm began to interact with its environment as if it were a real, organic organism.


What the scientists at OpenWorm aimed to prove – although only on a minuscule level – is that it is possible to model and replicate organic lifeforms in a digital format. And if it is true that the chemical pathways between neurons can be mapped, understood and exactly replicated, then there is conceptually nothing stopping the eventual recreation of a functioning human brain – either digitally or physically – given the fullness of time and the advancement of technology.


Deep Learning

The term “artificial intelligence” was coined in 1956 by computer scientist John McCarthy, who used it to refer to the idea of defining all aspects of human intelligence in such detail that a computer could be programmed to simulate each aspect, and thus give the appearance of intelligence. But human intelligence is an inscrutable thing that cannot easily be defined, let alone replicated.


Intelligence is largely accepted to be a context-dependent construct that evolves over time and varies in different cultures, making it notoriously hard to define and measure. Western theorists have generally agreed that there are multiple areas of capability that make up intelligence, but they have often been divided on exactly what these areas are and how they relate to one another. Charles Spearman, for example, proposed the idea of generalised intelligence in 1904, positing that children who showed intelligence in one academic area – such as numerical reasoning – tended to display intelligence in the other areas as well – such as verbal proficiency and spatial reasoning. Other psychologists argued in favour of specialised intelligence, with Howard Gardner famously outlining eight distinct areas of intelligence in 1983, which included musical, bodily/kinesthetic, and interpersonal intelligences. He argued that a person was more likely to excel in one or some of these areas, but very rarely in all. In 1995, Daniel Goleman further suggested that it was emotional intelligence that was the most important factor for determining a person’s success, calling into question the long-standing emphasis on cognitive abilities.


As the theoretical study of intelligence has grown more complex, we have clung to the over-simplified notion of using a single metric to measure intelligence, conveniently reducing this multidimensional phenomenon to a neat, comparable number known as an Intelligence Quotient (IQ). Efforts to quantify human intelligence began in the 1800s through the work of Sir Francis Galton, who was the half-cousin of Charles Darwin. Following the publication of Darwin’s On the Origin of Species (1859), Galton became obsessed with recording variation in physical human traits, including variation in mental abilities or “genius”, as he referred to it in his book Hereditary Genius (1869). As a statistician, he was determined to quantify genius and to track its variation, as well as to demonstrate that intelligence – like other human characteristics, such as height and chest size – was biologically inherited and normally distributed in a population. His use of mathematical methods to analyse the data he had collected made him a pioneer in the field of psychometrics, but his commitment to eugenicist principles skewed his research significantly. He argued that intelligence was highly correlated with eminence in a profession, such as law or medicine, and concluded that eminence, and therefore high intelligence, ran in families – especially wealthy Victorian ones.


In 1905 the idea of measuring intelligence was revisited when psychologists Alfred Binet and Théodore Simon received a request from the French Ministry of Education to develop a test that would identify children likely to struggle at school, so that they could be separated from those with “normal” intelligence. The Binet-Simon test consisted of 30 tasks of increasing difficulty and aimed to measure a child’s mental abilities in relation to those of their peers of the same age. This essentially allowed educators to compare a child’s chronological age with their “mental age”. An average child would have a mental age equal to their chronological age, whilst a less intelligent child would have a mental age lower than their actual age. An extraordinarily intelligent child, in contrast, would have a higher mental age than their actual age, matching the average intelligence of an older child. German psychologist William Stern introduced a formula for calculating an intelligence quotient that would make this comparison even simpler. The IQ score is calculated by dividing mental age by chronological age and multiplying this by 100 – thus, an average IQ score is 100. Modern tests no longer use this ratio directly, but are instead scaled so that the average score is 100, with a standard deviation of 15.
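
Stern’s quotient, as described above, is simple enough to write out as a short calculation. The sketch below is illustrative only: the function name ratio_iq and the example ages are assumptions made for this example, and the formula is the historical ratio of mental age to chronological age rather than the scoring method of any modern test.

```python
def ratio_iq(mental_age, chronological_age):
    """Historical ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return 100.0 * mental_age / chronological_age


if __name__ == "__main__":
    # Hypothetical ten-year-olds with mental ages of 8, 10 and 12.
    for mental_age in (8, 10, 12):
        print(mental_age, ratio_iq(mental_age, 10))
```

A child whose mental age matches their chronological age scores exactly 100, while a mental age two years ahead or behind at the age of ten moves the score by 20 points in either direction.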


The Binet-Simon IQ test was constructed to measure a variety of cognitive processes, including visual and spatial processing, fluid reasoning, working and short-term memory, vocabulary and reading comprehension skills, and quantitative reasoning. However, despite the seemingly wide scope of the test, Binet himself highlighted its shortcomings, maintaining that something as multifaceted as intelligence could not be accurately captured by a quantitative test. He noted that intelligence not only encompassed certain difficult-to-measure aspects, such as creativity and emotional intelligence, but that it was also influenced by a child’s upbringing and not purely the result of genetic coding. Intelligence was not a fixed or singular thing that people possessed – it was highly malleable and could develop at different rates and in different ways in different people, he argued. Giving a child a test and assuming that the result provided any kind of concrete information about their mental abilities – or their potential for success in life – was therefore short-sighted.


However, the lure of being able to screen, sort and compare people proved far too tempting. With the development of an IQ test specifically for adults by David Wechsler in the 1930s, intelligence tests quickly gained popularity in a variety of educational and vocational settings in Europe and the US. And despite Binet’s warnings about the limitations of quantitative intelligence tests, they continued to form the backbone of many ethnocentric and eugenicist arguments. These arguments advanced Galton’s idea of biological differences between race groups, often claiming that higher intelligence among the dominant white group – as evidenced by generally higher IQ scores than those achieved by other race groups – was proof of superior genes. The assumption that was made was that a low IQ score indicated inherently low levels of intelligence – rather than a lack of access to education, the foreignness of the IQ testing process, low English proficiency or unfamiliarity with the types of tasks being tested. And these scores had some very real consequences, as they were used to justify programmes that aimed to accelerate natural selection through “selective breeding”. In the US, forcible sterilisation of “feeble-minded people” and “imbeciles” (often determined by IQ tests) occurred up until the 1960s. The majority of these individuals were black, female and from a low socio-economic background.


Despite the historical misuse of IQ tests, in many ways they remain the most advanced option we have for measuring or predicting intelligence. Modern IQ tests have undergone significant development and have been shown to strongly predict scholastic achievement, making them a useful tool that is still commonly used to identify children who could benefit from specialised academic assistance. Crucially, these tests aim not to measure how much a person already knows, but rather to gauge their ability to learn – in other words, to minimise the advantage of having prior knowledge in any particular field and to test the ability of the person to make generalisations that will enable them to deduce new information from abstract rules. Learning has, thus, become synonymous with intelligence – and this is true for both humans and machines, as it is on this fundamental learning capability that developments in artificial intelligence are focused. Big Tech companies, such as Amazon, Google and IBM are using the model of human reasoning to guide the improvement of the products and services they offer, rather than to produce a perfect replication of the human mind. In these organisations, the terms “machine learning” and “deep learning” are commonly used, providing a more accurate description of their goals than “artificial intelligence” provides.


But even learning is no easy feat. In July 2018, AI company DeepMind provided some insight on how far the learning abilities of AI technology have advanced, by developing a test for abstract reasoning that was based on pattern-recognition questions from a typical human IQ test. The questions feature several rows of images, with the final image in each sequence missing. The test-taker is required to determine what the last image in each sequence should be, by detecting patterns in the images preceding it. The pattern could be related to the number of images, the colour, the shape, or their placement. DeepMind trained AI systems on these types of questions using a program that can generate unique image sequences. The AI systems were then tested, with some image sequences that were the same as in the training set, and some that had never been seen by the system before. And it quickly became clear that whilst the computers did fairly well at identifying the missing image when they had seen the pattern before, they were unable to extrapolate this prior information to determine the image in new patterns. This held true even when the test sequence only varied slightly from the training sequence – such as when dark-coloured images were used instead of light-coloured images.


Although the initial goal of AI may have been to replicate human intelligence in its entirety, the enormity of such a task has become resoundingly clear. Human intelligence is not a stable idea – despite our best attempts, we still cannot agree on exactly what it is or how to test if a person possesses it. The only thing we seem to be able to agree on is that intelligence encompasses the ability to learn. And even in this singular aspect, machines currently still pale in comparison to humans.


Deep Learning

One of the most famous German legends of all time is a story about a scholar named Faust. Having excelled in all areas of learning, Faust becomes bored and frustrated by the limits of his knowledge and is tempted by a demon called Mephistopheles to make a perilous deal with the devil. Faust agrees to trade his soul in the afterlife for infinite knowledge and power while he is alive. The disastrous outcomes of his willingness to commit himself to eternal damnation in exchange for a higher understanding of earthly matters differ from one version of the story to another. But in every form – from Elizabethan English playwright Christopher Marlowe’s The Tragical History of the Life and Death of Dr Faustus to German writer Johann Wolfgang von Goethe’s two-part play, Faust – Faust’s tale remains a cautionary one, warning people of the tragic downfall that awaits when moral integrity is sacrificed in the pursuit of intellectual ambition.


The moral of the story is one that is reiterated in Mary Shelley’s famous 1818 tale of Victor Frankenstein and his monster. From a young age, Victor displays an insatiable thirst for knowledge, but to his father’s dismay, this leads to a fascination with the mystical philosophies of alchemy. Whilst advancing his learning at university, however, Victor is convinced to turn his attention to modern science and chemistry. In an ambitious merging of the fantastical dreams of the alchemists with the logic of hard science, he seeks to create a new race of beings – but his experiment goes awry, producing a monster so deformed and terrifying in appearance that Victor runs away from it, leaving his creation unsupervised. Though he tries, the monster is unable to successfully integrate himself into human society, a reality that results in the death of many people – including Victor’s younger brother and his new wife. Like Faust, Victor Frankenstein is ultimately brought down by his quest for super-human knowledge.


Though hundreds of years old, these stories are particularly pertinent in the age of AI, where the drive to know more has gained unprecedented momentum and led to an attempt to produce a thinking creature, created in our own image. The goal of unsupervised deep learning is to use vast amounts of data and advanced processing algorithms modelled on the human brain to make machines that are capable of doing things more accurately and more efficiently than we are capable of in our normal human state – and to drive us, like Faust, towards a place of infinite knowledge. But like Frankenstein’s monster, modern AI has emerged out of an ambitious combination of mysticism and science, with little regard for the moral implications of such a pairing. The pioneers of Big Tech were, after all, once the hippies that drove an LSD-laden counterculture movement.


The idea that humans – in their normal, mortal state – are somehow limited in their ability to access and understand infinite knowledge is a theme that has been explored throughout time and across cultures. Psychedelic drugs have often been at the centre of the pursuit to access a higher level of consciousness, from the peyote used by the medicine men of North America, to the iboga used in initiation ceremonies by the Babongo in Gabon, the kava that features in the sacred rituals in the Pacific Islands, and the yagé used by tribes in the South American rainforest. And in contemporary Western society, we have lysergic acid diethylamide (LSD).


LSD was first synthesised by the Swiss chemist Albert Hofmann in 1938 and produces a range of perceptual, emotional and cognitive effects in varying degrees amongst different users. Common experiences include visual and auditory hallucinations, vivid mental imagery, synaesthesia, a broadening and intensification of access to one’s emotions, and increased cognitive flexibility. In the most powerful “trips”, users report a total sense of fluidity between the self and the external environment – a so-called oneness with the universe.


Neuroscientific investigations into the link between the pharmacological and phenomenological effects of LSD have produced insights about the drug’s biological effect on users. These studies have largely focused on the drug’s ability to improve communication between different parts of the brain through increased synaptic connections. In humans, the more synapses between neurons that are stimulated, the better our ability to learn. LSD has a particularly strong activation effect on serotonin receptors in the prefrontal cortex, which plays a key role in enabling the brain to process and integrate information from other regions and in making decisions. These findings have been integrated with generally accepted neurodynamic understandings of the mind, which suggest that the brain makes use of filtering or constraining mechanisms in our perceptual and cognitive systems to manage the overwhelmingly large amount of information it continuously receives from the external environment. This prevents us from becoming incapacitated by large amounts of information and facilitates efficient information-processing and decision-making in our daily lives. Researchers suggest that drugs such as LSD interfere with this information-processing limiting mechanism, literally expanding our perceptual, emotional, and cognitive capabilities. Neuroimaging studies have also revealed that LSD increases neural communication across synaptic connections between the parts of the brain that are involved in introspection and those responsible for sensory and perceptual processes. This accounts for the loss of boundaries between the self and the world that LSD users often report.


But the long-term effects of LSD use can be devastating, and can result in ongoing hallucinations, paranoia, cognitive disorganisation, and mood disturbances. Use of the drug can also trigger severe mental illnesses, such as bipolar disorder and schizophrenia. In The Doors of Perception (1954), Aldous Huxley – who took inspiration from William Blake’s famous work The Marriage of Heaven and Hell – describes the brain as a “reducing valve” that limits our ability to access full consciousness. Recent neuroscientific research on the brain’s constraining mechanisms provides uncanny biological support for Huxley’s suspicion that the brain, in its normal state, limits thought. But what Huxley and so many others have failed to consider is that this may be a crucial development in our evolution, rather than a flaw. During childhood, the brain streamlines its functions through a process of synaptic pruning, and this is what enables us to learn. Neural connections are formed and those that are repeatedly activated – because they have proven to be useful or rewarding patterns of thought – are physically reinforced, whilst those that are unused are dissolved. Cognitive biases also pervade our thinking on a daily basis, providing a set of heuristics that make our decision-making faster and more effective. In short, there is a reason our brains have developed these constraining mechanisms, and without them a person can become so overwhelmed by the sensory information they receive that they become dysfunctional. LSD may enhance the number of connections the brain makes, but these are rarely useful and can often lead to long-term psychological damage.


Among users unaware of, or perhaps unconcerned about, the high cost of accessing higher knowledge using LSD, Hofmann’s “sacred drug” – as he referred to it in his memoir LSD: My Problem Child (1980) – was popularised during the counterculture movement as an aid for accessing “the mystical experience of a deeper, comprehensive reality.” During this time, Timothy Leary – a clinical psychologist working at Harvard – famously developed a theory of consciousness expansion through psychedelic substances, after experimenting with their controlled use in the treatment of alcohol addiction and criminal behaviour. He argued that the drug allowed people to gain unprecedented insight into themselves and increased their alertness to the external world, leading to consciousness-expansion and releasing them from the consciousness-narrowing effects that result from the ritualistic compulsions of addiction. Leary was dismissed from Harvard for failing to give his required lectures, although his pressuring of students to take psychedelics, and his taking the drugs with them, were more likely the true causes of his employer’s discontent. But he continued his research off-campus, hosting retreats at a mansion in New York that combined psychedelic experiences with meditation, yoga and group therapy sessions. His most famous catchphrase was “Turn on, tune in, drop out”, and through this he urged people to make use of psychedelic drugs to sensitise their brains to the world (turn on), to engage with the new perceptions they could access as a result of these drugs (tune in), and then to question established social norms and authorities (drop out).


Leary’s theory quickly became intertwined with more spiritual and mystical ideas as his patients described their accounts of the life-altering awakenings they had experienced whilst taking LSD. These mystical experiences became central in the pursuit of knowledge across multiple fields, owing to the commonplace use of the drug among university students. The archetypal hippie was not only free-thinking and politically liberal, but often intellectual and highly educated as well. The counterculture movement produced some of the most famous art, literature and music of our time, and it was also the era in which many influential modern civil rights movements gained a foothold. After the Kent State shootings – which left four students dead after a protest against bombings in Cambodia by US military forces – the movement became explicitly political, with protests against the US’s involvement in the Vietnam War spreading general dissatisfaction with the government, and the traditional values it had promoted.


The role of LSD use among the future leaders of Big Tech during the 1960s has also been well-documented in John Markoff’s What the Dormouse Said: How the Sixties Counterculture Shaped the Personal Computer Industry (2005) and Ryan Grim’s This is Your Country on Drugs: The Secret History of Getting High in America (2009). Innovators who seem to owe their ideas, at least in part, to LSD include Doug Engelbart (the inventor of the computer mouse), Kevin Herbert (an early engineer at Cisco Systems), and Steve Jobs, who openly claimed that “taking LSD was a profound experience, one of the most important things in my life. LSD shows you that there’s another side to the coin.” Like Victor Frankenstein bringing alchemy and chemistry together to create his monster, so the pioneers of the tech industry have merged the other-worldly imaginings produced by LSD with the logic of science in their attempts to create a robot with human-like cognitive abilities. But there is a chilling irony in the fact that these inventors were attempting to replicate the functions of the human brain, whilst physically altering their own. Given what we have recently learnt about the manner in which personal data has been used by Big Tech companies for commercial gain, it could also be argued that, like Victor Frankenstein, they have been blinded by their ambition, unleashing their creation on the world, imperfect though it is, and with little regard for the damaging impact it may have. And whilst Victor Frankenstein was something of an anomaly in his time, the mad scientists of our generation wield incredible influence in today’s society.


Timothy Leary re-emerged in the 1980s, describing computers and the internet as “the LSD of the 1990s” and altered his catchphrase to urge people to “turn on, boot up, jack in.” But just as LSD users of the 1960s punted the benefits of the drug with little regard for the damage it could cause, we are not yet fully aware of the implications that our digital LSD could have in the future. Certainly, through the power of smart technologies, we have gained access to a vast amount of information – but this access is worryingly restricted to a select few, monopolised by Big Tech firms such as Facebook, Google, Amazon and Apple. And it seems that whilst the forerunners of this industry may have once marched for the noble political ideals espoused by the counterculture movement, the legacy they have left is one driven by profits and devoid of an ethical code or a sense of social responsibility. That knowledge is power is a truism that has echoed through history, and it is deeply concerning to realise that those who currently have access to the most knowledge may have sold their souls to acquire it.


Deep Learning

A key area of interest in AI development is natural language processing (NLP), which aims to program computers to communicate with humans through natural language recognition, natural language understanding, and natural language generation. AI developers have had several successes in this area – just think of the popularity of digital assistants such as Siri and Alexa, for example – but they have also faced many challenges. And this is unsurprising, for although we use it every day, language remains a highly mysterious feature of human intelligence.


Language has long been a key focus in the debate about human intelligence, because it is only through language that we are able to communicate our thoughts and advance our knowledge collectively. For years, linguists have engaged in heated debates about how it is that we have acquired this crucial skill, with their arguments often falling on one side or the other of a nature versus nurture dispute. In Verbal Behavior (1957), behaviourist B. F. Skinner famously suggested that language is a learned skill, acquired through a process of operant conditioning. He maintained that children learn language by being exposed to it, imitating it, and then receiving positive reinforcement when they use it correctly. For example, a child learns that if he says “up” when he wants to be picked up, he will achieve the desired outcome. The achievement of the goal – being picked up – reinforces in his mind what the word “up” means, and he will be more likely to use it again in the future in the same scenario.


Many people agreed with Skinner’s theory or developed their own similar theories, which supported the idea that language was not innate in humans but learned through our engagement with other people and our environment. American linguist Noam Chomsky, however, took a strongly opposing view, arguing that the human ability to acquire and use language is a biological feature, hard-wired in the neural networks of the brain. This is not to say that children are born with the ability to fluently speak their mother tongue, but rather that children are bestowed with a natural syntactic knowledge, a fundamental ability to understand and apply the rules of language – regardless of whether they will learn to speak English, Spanish, French or any other language from their parents. Chomsky argued that humans are naturally programmed with a universal grammar, and that we need only learn the parochial features of any particular language in order to speak it.


Chomsky’s theory gained credibility as it accounted for the fact that children do not have to be explicitly taught every specific word and sentence that could possibly exist in order to use them. Rather, children are exposed to a limited vocabulary in their early years and learn very quickly to apply the rules of their language to construct an infinite number of sentences. Chomsky highlighted that children often utter forms they are highly unlikely to have heard from any adult, for example by erroneously applying the “ed” suffix to irregular verbs to produce past tense forms such as “runned”. This overgeneralisation of the “ed” suffix shows that the child is clearly attempting to apply a rule, rather than simply imitating something they have heard – pointing to an internal aptitude for language.
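The difference between imitating stored forms and applying a rule can be illustrated with a small sketch. The verb table and function names below are hypothetical and chosen purely for illustration; this is a toy contrast under simplified spelling assumptions, not a model of how children actually process language.

```python
# Toy illustration only: the table and both functions are hypothetical.
# A handful of irregular forms that must simply be memorised.
IRREGULAR_PAST = {"run": "ran", "go": "went", "eat": "ate"}


def rule_only_past(verb: str) -> str:
    """Apply the regular 'ed' rule to every verb, as an overgeneralising learner might."""
    if verb.endswith("e"):
        return verb + "d"  # bake -> baked
    # Simplified spelling rule: double the final consonant of short
    # consonant-vowel-consonant verbs (run -> runned).
    if (
        len(verb) == 3
        and verb[-1] not in "aeiouwxy"
        and verb[-2] in "aeiou"
        and verb[-3] not in "aeiou"
    ):
        return verb + verb[-1] + "ed"
    return verb + "ed"  # walk -> walked, go -> goed


def adult_past(verb: str) -> str:
    """Prefer a memorised irregular form; fall back on the rule otherwise."""
    return IRREGULAR_PAST.get(verb, rule_only_past(verb))


if __name__ == "__main__":
    for verb in ["walk", "run", "go"]:
        print(verb, rule_only_past(verb), adult_past(verb))
```

A learner that only imitated what it had heard could never produce “runned”, because no adult says it; the form only appears once a general rule is being applied to a verb whose exception has not yet been memorised.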


Chomsky’s argument led to a wave of linguistic research into the biological mechanisms that might facilitate language acquisition, and investigation into which aspects of language may be consistent across different languages. If similarities in the structure of all languages known to exist in the world could be proven, this would further support the idea that all humans are endowed with universal grammar.


The idea of universal grammar was further popularised through the work of cognitive psychologist Steven Pinker, who offered an evolutionary perspective on Chomsky’s theory in his 1994 book The Language Instinct. Chomsky had described universal grammar as something innate, a part of our very DNA, and the special feature that elevates humans above other animals. But, unwilling to accept that language is simply part of us, Pinker sought to provide a biological explanation for why humans had acquired this skill, arguing that universal grammar is the result of evolution and natural selection. He agreed that language is a uniquely human ability but posited that it has developed as a specialised adaptation that ensures that we can survive and thrive in our environment – much like a spider’s natural instinct to create a web. He described universal grammar as being representative of the structures in the brain that recognise the language rules and patterns in another person’s speech. This natural affinity speeds up the process of language acquisition in children when they are exposed to language in their external environment and is disassembled to some degree as the child grows up – having mastered language, the brain frees up capacity for other functions, no longer prioritising the learning of language. This is a phenomenon that would be well understood by anyone who has tried to learn a language as an adult.


However, it is important to note that the theory of universal grammar has not gone uncontested. Linguistic anthropologist Daniel Everett, for example, claimed that the language of the Pirahã people of the Amazon does not display the key evidence for universal grammar – namely, recursion, which enables a limited number of words to be combined in infinitely many ways. He maintained that this was sufficient evidence to dismantle the possibility of language being innate, and many other linguists have followed suit, arguing against universal grammar in increasingly technically nuanced debates. That linguistic theory almost always orientates itself in relation to Chomsky’s universal grammar is, however, telling.
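Recursion, in the sense used above, can be shown in a few lines. The sketch below is purely illustrative: the speaker names and the example sentence are invented, and it says nothing about how any natural language, Pirahã included, is actually structured.

```python
# Illustrative only: the names and the base sentence are invented.
SPEAKERS = ["John", "Mary"]


def embed(depth: int) -> str:
    """Build a sentence by embedding a clause inside another clause `depth` times."""
    if depth == 0:
        return "the dog barked"
    speaker = SPEAKERS[depth % len(SPEAKERS)]
    # The rule refers to itself: a sentence may contain another sentence.
    return f"{speaker} said that {embed(depth - 1)}"


if __name__ == "__main__":
    for depth in range(4):
        print(embed(depth))
```

Because the rule can be applied to its own output, a vocabulary of fewer than ten words yields a distinct grammatical sentence for every value of `depth`, with no upper limit in principle.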


In fact, academic debate as to whether or not a universal grammar exists has reached fever pitch and, if one reads deeply into the arguments, they can sometimes appear quite petulant and almost personal. The reasons for this may lie not only in the fact that people’s careers would be somewhat undermined if the existence of universal grammar were to be definitively proven, but also in the fact that the existence of a universal grammar is viewed by many as somewhat akin to a linguistic argument for the uniqueness of human intelligence. The debate therefore appears to be much more about whether humans are special and distinct from other species, which are not bestowed with the so-called gift of language.


Learning language

In contemplation of one of the great unsolved problems of AI – that of natural language acquisition – it is fundamentally important to note that if a universal grammar were to exist, then no amount of data or neural network engineering or hardware or cloud-space would ever enable a machine to acquire language as humans do. The only way that a machine could ever acquire language would be to solve the problem in a so-called closed-form parametric solution. To be precise, if a universal grammar exists in humans, then neuroscientists would need to understand exactly how the human brain works in this specific regard, and exactly where in the brain this universal grammar actually resides. And this would then have to be replicated exactly in a machine. However, this presents something quite distinct and different from the goals of unsupervised deep learning neural networks, which seek to replicate the process of learning in general, rather than endowing machines with an architectural structure containing the specific, preordained rules and parameters that are only accessible to human beings and to no other species.


On the other hand, if we suppose that people do not have some kind of “language acquisition device” in their brain, as Chomsky referred to it, and that a child learns language by hearing it, then it would seem logical that it would be possible for machines to similarly perfect human language, simply by analysing enough data. But despite the wealth of data that is currently at our disposal, there are a number of examples that show that natural language processing (NLP) remains a challenging area for AI. NLP research has certainly made great strides, not only teaching computers the words that exist in a language and the rules that govern their grammatical combination, but also teaching them when to use certain sentences. One of the earliest NLP computer programs was created in the 1960s and was called Eliza. Although Eliza could hold a conversation with humans, she lacked any understanding of the exchange, using a pattern-recognition methodology to select responses from a pre-determined script. More recently, virtual assistants have advanced voice-recognition AI, performing tasks as directed by voice commands from users. However, these programs have not been without their problems and have been criticised for misinterpreting instructions and for requiring that commands be given with an unnatural degree of stiffness. These devices can certainly hear, interpret and respond to language – which in the most basic terms means they can communicate – but this communication is limited, has been prone to problems, and is certainly nothing akin to what even young children are capable of.
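To see why selecting responses from a script involves no understanding, consider a minimal sketch in the spirit of Eliza. The patterns and replies below are invented for illustration and are not taken from the original program; the point is only that the reply is assembled by matching surface text against templates.

```python
import re

# Hypothetical script: (pattern, reply template) pairs, checked in order.
# These rules are invented for illustration, not taken from the original Eliza.
SCRIPT = [
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."


def respond(message: str) -> str:
    """Return the reply for the first matching pattern, echoing the matched words."""
    for pattern, template in SCRIPT:
        match = pattern.search(message)
        if match:
            # The captured text is pasted into the template without being interpreted.
            return template.format(match.group(1).rstrip(".!?"))
    return DEFAULT_REPLY


if __name__ == "__main__":
    print(respond("I feel tired today."))
    print(respond("The weather is nice."))
```

The program has no representation of what “tired” means; it would respond just as confidently to “I feel purple on Tuesdays”. Anything outside the script falls through to a stock phrase that keeps the conversation moving.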


In 2011, a chatbot called Cleverbot was able to fool 59.3% of the human participants at the Techniche festival at the Indian Institute of Technology Guwahati into thinking they were chatting, using text messages, with another human. The chatbot had been trained on millions of conversations with humans and worked by searching through these conversations to select the most fitting responses to the messages it received. This marked a key development for chatbots, with Cleverbot proving itself capable of communicating in a far more conversational manner than the NLP devices that had preceded it. But it still lacked any real understanding of context or social propriety, a shortfall that would likely become more obvious in a more spontaneous or lengthy real-world interaction.
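The general idea of replying by searching past conversations can be sketched as follows. This is an assumption-laden illustration rather than a description of Cleverbot’s actual method: the stored exchanges are invented, and word overlap (Jaccard similarity) is simply one plausible way of scoring how well a past message fits a new one.

```python
# Hypothetical stored exchanges: (what a human once said, what was said in reply).
PAST_EXCHANGES = [
    ("how are you today", "I'm fine, thanks. How are you?"),
    ("what is your favourite film", "I like old comedies."),
    ("do you like music", "Yes, mostly jazz."),
]


def tokens(text: str) -> set:
    """Lower-case the text, strip basic punctuation and split it into a set of words."""
    for mark in "?.,!":
        text = text.replace(mark, " ")
    return set(text.lower().split())


def jaccard(a: set, b: set) -> float:
    """Share of words in common; an assumed similarity measure, not Cleverbot's own."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_reply(message: str) -> str:
    """Return the stored reply whose original prompt overlaps most with the message."""
    words = tokens(message)
    _, reply = max(PAST_EXCHANGES, key=lambda pair: jaccard(words, tokens(pair[0])))
    return reply


if __name__ == "__main__":
    print(best_reply("Do you like jazz music?"))
```

A system of this kind can sound natural because every reply was originally written by a person, but the same design explains the shortfall noted above: the reply is chosen for its resemblance to earlier text, not for its fit with the current context.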


A 2018 WIRED article further explains how an American company active in the digital assistant market is trying to create a chatbot that can schedule meetings for busy professionals. But even this seemingly straightforward task is proving immensely complex, thanks to the quirks of natural human language. The developers have found that often people send meeting requests that are muddled by conversational niceties and “small talk”, or that are ambiguous when describing their availability. The chatbots are programmed to send responses asking for clarification, but it can be arduous and frustrating for the user if they must respond to several emails just to schedule a meeting. In a bid to prepare for every conversational possibility, the company’s trainers are feeding a vast amount of data into the system, in a tireless quest to refine the algorithm to the point where it will be able to communicate with all the nuance and flexibility that a human does. Whether or not the machines will ever be able to communicate like a human remains uncertain. And this is because language does not only perform a practical function – enabling us to share content with one another – but a far more abstract social one as well.


The dual functions of language

In The Stuff of Thought: Language as a Window into Human Nature (2007), Steven Pinker cements the idea that language is a distinctly human trait, by focusing on how our use of language reveals insights into the internal dynamics of how we think. As arbitrary – and at times, irksome – as using correct grammar may seem, Pinker argues that we have come to a consensus to use certain words in certain ways for a particular reason. The way we speak reflects the way we think, betraying the intuitive physics that underpin our understandings of the world. Prepositions, for example, reflect our conceptions of space, whilst nouns reflect our conceptions of matter, tenses our conception of time and verbs our conception of causality. The words we use are anything but arbitrary – they are intended to communicate very specific meanings, which are aligned with the mental models and cognition processes we use to make sense of the world.


Pinker does not, however, advocate the idea that language is merely the tool by which we communicate our thoughts to others, much like a memory stick that can take information from one computer and transfer it to another. After all, we do not only use language to describe when an event occurred, or where an object is currently placed. He reminds us that we are always communicating with someone else, explaining that language therefore has a dual purpose: it must convey content whilst at the same time negotiating a social relationship. The phrase “if you could pass me the salt, that would be awesome,” for example, does not make much literal sense and is a highly inefficient way to communicate, if all the speaker wants is the salt. But the speaker is aware that they are making a request of another person and will not wish to sound overly demanding – so they hedge the request in a way that is more polite. Language, then, is doing something far more complex than simply transferring our internal thoughts to one another. It is continuously confirming that we belong to a social group.


The social function of language is evident in the fact that when we communicate, very rarely do we do so in a manner that would convey content in the simplest and clearest way. Often, we use language in a very abstract way, relying on the listener’s human ability to understand what is being said in context. Metaphor, together with the combinatorial power of language – what Pinker refers to as “the infinite use of finite means” – makes possible infinite creativity in language, and infinite meaning. When we interpret an utterance, we draw on a vast body of knowledge about the world and about people, combining this with our own experiences to instinctively interpret what is being said. In addition, we temper our interpretation with a host of other information – such as the person’s tone, the length of their pauses, their body language and facial expressions – which helps us to understand what the person actually means.


It is unlikely that neuroscientists will ever be able to prove the existence of Chomsky’s universal grammar – for this was a concept that he invented to produce a general theory of language acquisition, rather than to describe a particular biological structure. But, if AI research and development is to replicate one of the crucial pillars of intelligence – the acquisition of natural language – it would, in any event, have to do it without human intervention in the form of scripts or templates. Whether universal grammar exists or not, AI would still have to find a way to address not only the transfer of content, but the ongoing negotiation of the social function that language performs – with all the verbal and physical nuances and contextual variations that affect the meaning of the words we utter. Until AI can achieve this, it has not demonstrated artificial intelligence at all – no matter how many Chess or Go champions it beats.


In May 2018, Bloomberg Businessweek published an article entitled “These tech revolutions are further away than you think.” The list included a practical use for blockchain technology, the mainstream use of augmented reality, the death of cable television, the full-scale implementation of renewable energy sources, total data portability, and fully robotic factories. Natural language processing was, however, curiously absent from the list. Given the limitations that virtual assistants and chatbots have displayed in performing even the most menial communication-based tasks, it seems clear that whilst vast amounts of data and highly refined neural network structures may threaten jobs, entire industries, and even our values, it is highly improbable that machines will be capable of the distinctly human capability to acquire and use language effectively in the near future.


Deep Learning

In the pursuit of constructing artificial intelligence, AI researchers, and indeed the general public, clearly assume that the qualities of intelligence include rational thinking, problem solving and the ability to divine general truths from data and a series of facts. This would be true, for example, in the construction of an accurate historical description of what had led up to a particular event, such as the 2007/2008 financial crisis. Ironically, it may be this very veracity expected of artificial intelligence that distinguishes machine learning from a very specific and important quality in human beings, and therefore makes it impossible for machines to achieve true intelligence. That quality, unique to and inherent in humans, is the ability to lie.


The ability to lie, or to construct a fictional story, according to American cognitive scientist Elizabeth Loftus, is an important capability that is directly involved in the way we find meaning in a complex world. It is also integral to how we store this meaning in the form of reconstructive memory. And whilst this fictional construction of memories serves a positive role in the healthy mental functioning of the individual creating those memories, these “false memories”, as Loftus calls them, can in certain instances be detrimental in determining the course of someone else’s life – as was the case with the falsely accused Steve Titus.


In the early evening of 12 October 1980, on the desolate outskirts of Seattle, Washington, a teenage girl was raped. Upon being questioned by the police about the details of the incident, the young victim described the rapist as a twenty-something white male, about 6 feet tall with brown hair and a beard, wearing a light-coloured three-piece suit, driving what she remembered as a dark blue car with vinyl-covered bucket seats and a temporary number plate beginning with either 776 or 667.


That same night, a young restaurant manager in the Seattle area, Steve Titus, was driving home after a romantic dinner with his fiancée. On the way back to his house, Titus was pulled over by the police and arrested. Unfortunately for Titus, the description of the rapist’s car given by the teenage girl slightly resembled the car he was driving that evening. What made the situation even more dire for Titus was the fact that his car had temporary paper number plates, since he had bought the car not long before the incident. The final straw was that Steve Titus also rather closely matched the physical description of the perpetrator, being 5 foot 8 inches tall with brown hair and a beard. This left the police little choice but to arrest him based on these similarities.


When given a photo line-up of suspects, the victim pointed out Steve Titus, saying that he was the closest match to what she remembered. Months later in court, the victim claimed under oath that she was absolutely positive that Titus was the man who had raped her. But he was not. From the start, Titus’ arrest had been a miscarriage of justice. Whilst the rapist was described as wearing a three-piece suit, Titus owned no suits. The car that the victim described and the tyre tracks observed by the police were those of a Honda Accord, but Titus drove a new Chevrolet Chevette. Furthermore, witnesses attested to the fact that Titus had been with family, friends and his fiancée for most of the day and night, making it impossible for him to have been near the scene of the crime at the time it was committed.


The case was rightly overturned and Titus only stayed in jail for one night, but the trauma would weigh heavily on him for years. Shortly after his release, Titus lost his job and broke up with his fiancée, who was disturbed by his extreme bitterness towards the authorities conducting the trial and struggled to cope with the negative turn his life had taken. Titus decided to dedicate his life to suing the state department for damages, but sadly died of a stress-related heart attack shortly before proceedings began, at just 35 years of age.


Cases that involve mis-memory or false memories are the specialist domain of Elizabeth Loftus, who cites the Steve Titus example as just one of many times false memories have influenced the outcome of important events. According to Loftus, our memories are very fragile, to the extent that implanting false memories is far more easily achieved than one may think. In one experiment, Loftus explains that when subjects were shown a video of a car crash, their answers when recalling the scene varied widely. This was especially true when the questions led the subjects to make logical leaps, asking one sample group, for example, “How fast were the cars going when they bumped into each other?” and then asking the other sample group, “How fast were the cars going when they smashed into each other?” In the answers to the second question, the estimated speed rose by as much as 20% on average, simply by inserting one emotive word into the sentence – even though the subjects had watched precisely the same video.


For neuroscientist David Eagleman, this phenomenon makes for an interesting investigation into the brain’s function in retaining memories. According to Eagleman, this fallibility of memory is a result of the pressure the neural matrix is put under to hold onto old memories, whilst simultaneously having to constantly experience new external stimuli. The memory of an event, for example, requires different groupings of neurons to work together to compose the larger scene – where different groups of neurons retain different details. Yet, as time passes, each neural group, and indeed each neuron, must begin to multitask in the sense of building new memories as well as attempting to retain old ones. The enemy of memory then, as Eagleman puts it, is not time, but new memories fighting for space in the brain and gradually causing the vividness of the old memories to fade.


One way that we preserve these memories – both Eagleman and Loftus agree – is in the reconstruction of details into abstract stories, or even myth. Memories and their details are thus, over time, cemented or refuted to form a mental narrative based on the supplementation of new experiences. As many scientists have found, this natural storytelling function of the brain is an important human trait, even if often premised on false memories that are empirically irreconcilable with objective observations. Recent studies have also shown that the ability of children to tell fictional stories, or even explicitly lie, is a strong indicator of healthy social development at an early age – where children who fail to display this trait often struggle to fit in socially and more frequently exhibit delinquent behaviour in the future.


One very influential thinker who wrote extensively about the importance of lying, or more specifically, the importance of storytelling in relation to the way in which humans solve problems, is Austrian-born philosopher Karl Popper – regarded by many as one of the 20th century’s greatest philosophers of science. In one of his later books, Knowledge and the Body-Mind Problem: In Defence of Interaction (1995), Popper describes how this storytelling ability of humans is an integral part of what he calls the problem-solving schema. To extend the sphere of our knowledge and understanding, he explains, we as humans instinctively guess at a likely answer to difficult questions, although no empirical rationalisation can be given to support the statement at that time. From this point of assumption, it is then up to us, or those around us, to test the soundness of that story – eventually either proving or disproving the truth of the claim. Through this hypothetical solutioning, we can use our scattered memories and observations as a resource for forward-looking analysis, to predict a likely future outcome despite a lack of currently observable evidence for such a presumption.


When considering the possibilities or limits of artificial intelligence then, it is perhaps this ability to form a story, or to simply lie about the reality of our experiences, that sets us apart from our machine counterparts. For whilst humans and machines can both process the data or experiences inputted into our systems – machines often faring far better than ourselves at this task – we alone can purposely construct an unproven or even false analysis. The ability to lie is to find meaning in disparate facts, not only so that we are able to store these stories in our brains more effectively from a biological point of view, but also so that we are able to socially integrate and convey this new meaning to our fellow human beings. This quality is not currently a requirement of artificial intelligence, but it may prove to be one of AI’s most telling shortcomings when eventually attempting to measure up to the natural intelligence of the human brain.