AlphaGo vs Lee Sedol
October 17, 2018 - Monocle Research Department
Lee Sedol has an air of genius about him, although not an air of command or confidence. His thick, dark hair is cut in a bowl-like style and his voice is rather high-pitched, with an almost childish tone. Sedol was born on the South Korean island of Bigeumdo. When he arrived in the capital, Seoul, at the age of eight to attend the Korean International Baduk Academy (KIBA), his classmates nicknamed him “Bigeumdo Boy” because of his rural upbringing and his naive, deeply curious response to his new environment. The KIBA school – dedicated to training professional Go players from an early age – was founded by Kweon Kab-yong, a legendary Go teacher in Korea who has produced many of the greatest players of modern times. As a pupil of the academy, Sedol attended classes from 9am to 9pm, seven days a week, and eventually lived with Master Kweon, who regarded him as one of the most promising students he had ever taught.
Go instantly captured the young Lee Sedol’s imagination: it was wildly fun and came very naturally to him. At just 12 years and 4 months old, he became the fifth-youngest professional Go player in South Korean history and enjoyed the thrill of thoroughly beating international professionals who were double, triple and even quadruple his age. Sedol was a child prodigy, regarded as a genius by many, including Master Kweon, who had taught thousands of aspiring young Go players and who commented that “unlike the other children, his eyes shone brightly.” By February 2016, a 33-year-old Lee Sedol had won the second-highest number of international titles in Go history and was widely considered one of the greatest players in the world.
Dating back to at least the 4th century BCE, as recorded in the ancient historical commentaries of the Zuo zhuan, Go – or weiqi, as it is known in China – is considered one of the oldest board games in existence, having been played continuously for at least 2,500 years. Known as baduk in Korea, Go had reached the country by the 5th century CE and has held a special place in Korean culture for over a millennium. So deeply is the game entrenched in Korean culture that it is traditionally considered one of the core pursuits of a cultured education, alongside similarly noble disciplines such as music, poetry and painting. Because of this, those who excel at Go are generally regarded as some of the most intelligent individuals amongst their peers.
In 2016, DeepMind, a London-based artificial intelligence company that Google had acquired in 2014, proposed an exhibition match between Sedol and its Go-playing computer program for a grand prize of $1 million. The program was called AlphaGo, and Sedol agreed. The first of five games was scheduled for 9 March 2016 and broadcast live to the world from Seoul, South Korea. The televised event ended up attracting over 200 million viewers.
In the build-up to the spectacle, AlphaGo had been trained on hundreds of thousands of recorded online Go games between amateur and semi-professional players, learning the statistical relationship between moves and winning outcomes. This foundation produced the first of three systems the program used to play well, known as the “policy network”, which identifies what a good move looks like. The second system, the “value network”, was built up through reinforcement learning as the program played thousands of games against itself, each time becoming better at evaluating how a given board position affects the odds of winning. The third is a tree search, which uses the two networks to look ahead through promising sequences of moves.
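The division of labour between the policy and value networks can be illustrated with a toy sketch in Python. It is not DeepMind's implementation: `choose_move`, `policy`, `value`, `play` and `top_k` are all hypothetical names, the "networks" are plain functions, and the real program looks many moves ahead rather than one. The sketch shows only the general idea that a policy narrows the candidate moves and a value estimate ranks the positions they lead to.

```python
from typing import Callable, Dict, List


def choose_move(
    position: int,
    legal_moves: List[int],
    policy: Callable[[int, List[int]], Dict[int, float]],
    value: Callable[[int], float],
    play: Callable[[int, int], int],
    top_k: int = 3,
) -> int:
    """Pick a move by one-step lookahead over the policy's top_k suggestions."""
    priors = policy(position, legal_moves)
    # The policy narrows the search to the moves that "look" most promising.
    candidates = sorted(legal_moves, key=lambda m: priors[m], reverse=True)[:top_k]
    # The value function scores the position each candidate leads to.
    return max(candidates, key=lambda m: value(play(position, m)))


# Toy stand-ins (assumptions, not a Go engine): a position is an integer,
# a move adds to it, and positions closer to 5 count as better.
def toy_policy(position: int, legal_moves: List[int]) -> Dict[int, float]:
    return {1: 0.05, 2: 0.40, 3: 0.30, 4: 0.20, 5: 0.05}


def toy_value(position: int) -> float:
    return 1.0 / (1.0 + abs(position - 5))


def toy_play(position: int, move: int) -> int:
    return position + move


best = choose_move(0, [1, 2, 3, 4, 5], toy_policy, toy_value, toy_play)
print(best)
```

In this toy run the sketch picks 4 rather than the ideal 5, because the policy ranked 5 too low for it to be considered. Widening `top_k` to 5 recovers the better move, at the cost of evaluating more positions, which is the trade-off a policy is meant to manage.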
With 9 March approaching, Sedol was confident in his chances, believing that if the computer program managed to take even one of the five games off him, he would consider it a great success for the developers. The fact was, Sedol had challenged many Go-playing programs in the past, and none had come close to defeating him – why would this one be any different? And in many ways, Sedol was not misguided in doubting the ability of a computer to reach the levels of complex gameplay capably displayed by the highest-ranked Go players. This handful of so-called “9 dan” players on the international Go rating system, including Lee Sedol, could be compared to Roger Federer, Michael Schumacher, or the chess grandmaster Garry Kasparov – each widely considered the best in their discipline at the time, or even the best of all time.
So, when AlphaGo comprehensively beat Sedol 4-1 in the five-game match, the world took notice. Young Korean children cried as their hero was defeated by a British computer, Go experts were flabbergasted by the intricate gameplay of the machine, Lee Sedol was near inconsolable, and one of South Korea’s biggest daily newspapers stated, “Last night was very gloomy […] Many people drank alcohol.” Like Deep Blue’s victory against Kasparov in 1997, AlphaGo had beaten a human world champion at his own game. But this time was different. Unlike in 1997, when the result was followed by controversy and blame, AlphaGo had won fair and square. In post-match interviews, even Sedol was humbled and impressed by the gameplay and tactics the program had employed. And Go is far more complex than chess.
Whilst Go has just two basic rules, there are more possible board configurations than atoms in the observable universe. A chess program can combine tree search with a hand-crafted evaluation that assigns a value to each piece, examining many millions of positions in seconds to select a strong, strategic move. AlphaGo does not have this option. With roughly 250 legal moves available at a typical turn, compared with about 35 in chess, Go has far too many possible continuations to map out to any useful depth, and therefore Go-playing programs cannot rely on the brute-force approach that serves chess programs so well.
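A rough back-of-the-envelope calculation makes the gap concrete. The sketch below assumes commonly quoted approximations: about 35 legal moves per turn over roughly 80 turns for chess, about 250 moves over roughly 150 turns for Go, and around 10^80 atoms in the observable universe. The function name and constants are illustrative only, and the figure for board configurations is an upper bound that ignores the rules on which arrangements are legal.

```python
import math


def log10_power(base: int, exponent: int) -> float:
    """Return log10(base ** exponent) without building the huge number."""
    return exponent * math.log10(base)


# Crude game-tree sizes: (moves per turn) ** (turns per game).
chess_tree = log10_power(35, 80)    # about 10^123.5
go_tree = log10_power(250, 150)     # about 10^359.7

# Upper bound on Go board configurations: each of the 19 x 19 = 361
# points is empty, black or white. Not all of these are legal positions.
go_boards = log10_power(3, 19 * 19)  # about 10^172.2

ATOMS_LOG10 = 80  # assumed order of magnitude for the observable universe

print(f"chess game tree  ~ 10^{chess_tree:.1f}")
print(f"Go game tree     ~ 10^{go_tree:.1f}")
print(f"Go board layouts < 10^{go_boards:.1f}")
print(f"Go layouts exceed the atom count by ~10^{go_boards - ATOMS_LOG10:.0f}")
```

Even on these rough numbers, the Go game tree is larger than the chess game tree by more than two hundred orders of magnitude, which is why exhaustive lookahead is not a realistic strategy.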
Neural networks and deep learning have, however, changed the way programs like AlphaGo play the games they are taught. Instead of searching through every branch of possibilities, AlphaGo learns from its mistakes: reinforcement learning rewards the choices that lead to wins, and backpropagation adjusts the networks accordingly. And the greatest advantage of the computer is that it never tires – with AlphaGo playing 300+ million games against itself in a matter of days, each time making incremental improvements to its gameplay strategy.
The result of this tireless and near-unlimited capacity to learn is what Lee Sedol and other champions have likened to a Go god. Whilst most Go players replicate a style passed on by masters or craft their own by adapting well-known strategies, AlphaGo strayed from the path. During the epic battle with Sedol, there were times when expert commentators were left confused by wholly unconventional moves that looked almost ridiculous at the time but, in hindsight, formed part of an intricate, far-reaching strategy that was almost unimaginable to human minds. It was these moments, and the humbling defeat Sedol experienced, that prompted him to say that a computer had opened his eyes not only to a new way of Go, but even to a new way of life. And perhaps the most astonishing thing is that AlphaGo’s gameplay has since been surpassed by a new version of itself – AlphaGo Zero – which beat its predecessor 100-0 in a hundred-game match.