What is the Turing test?
In his 1950 paper Computing Machinery and Intelligence, Alan Turing (1912–1954), considered by many the father of Artificial Intelligence, posed the following question:
Can machines think?
This question, despite its brevity and age, remains a frequent source of discussion at the frontier between technology, philosophy, neuroscience and theology.
However, more than half a century ago Turing proposed an indirect way to answer it: the famous Turing Test.
Turing believed that for us to answer this question without ambiguity, the question itself must be rephrased, specifying or replacing the meaning of ‘think’ and ‘machines’.
Let’s first see how we can take the ‘think’ out of the equation. Turing proposed to do this by changing the question from “Can machines think?” to:
“Can a machine do what we as thinking entities can do?”
In other words, can a machine mimic or imitate a person? The answer to this question lies within The Imitation Game.
The Imitation Game
The new form of the problem, the one that lets us answer the previous question, is formulated in the following terms.
Imagine we have a man (A), a woman (B), and a neutral interrogator (C). Each of them is in a separate locked room with no windows or any other visual connection, only a screen and a keyboard through which C can communicate and interact with A and B.
The goal of the game is for C, the interrogator, to work out which of the other two is the man (A) and which is the woman (B). He does this by asking each of them questions, for example by sending A a message (without knowing that A is the man) that says ‘What is the length of your hair?’.
The goal of A is to make the interrogator fail, that is, to fool him into thinking that A is the woman. Because of this, his answer to the previous question could be something like ‘My hair is shingled, and the longest strands are about nine inches long’, or something even more ambiguous.
You might be wondering what all of this has to do with computers and machine intelligence. Let’s now ask the truly relevant question: what will happen when a machine takes the role of A in this game?
This is the essence of the Turing Test: an interrogator has a conversation, through the screen and keyboard described above, with an entity that can be either a human or a machine. This conversation is limited in time and completely open-ended.
If after this conversation the interrogator cannot tell whether the entity was a machine or a human or, even better, is certain it was a human, and the entity turns out to have been a machine, then this machine is said to have passed the Turing Test.
In the rest of the paper, Turing clarifies what he means by a machine, describes the characteristics of digital computers, and tries to dismantle a number of objections to his answer to the question of whether machines can think.
This highlights how few positive arguments he has to defend his views, which leads him to put more effort into pointing out the weaknesses of the objections to his proposal than into defending it.
Before leaving the paper to discuss the Turing Test further, we will see how Turing takes some of the first steps in conceptualising what would become one of the main drivers of Artificial Intelligence: Machine Learning.
In the final sections of Computing Machinery and Intelligence, Turing discusses how a machine with a chance of succeeding at the imitation game could be constructed.
He argues that writing a program that fills even a moderate percentage of the storage capacity of the brain would take an experienced group of programmers decades.
If we want to imitate an adult human mind, we should somehow replicate the process that produced it, which, he says, is made up of three components:
- The initial state of the mind, at birth.
- The education to which this mind has been subjected.
- Experience other than education to which the mind has been exposed, such as that coming from the environment.
The goal here is to create a program that replicates the mind of a child, and then educate it (make it learn) so that it develops the characteristics of the adult brain.
He goes on to discuss the importance of punishment and reward as part of the learning process, similar to what we know today as Reinforcement Learning, but clarifies that this by itself is not enough. He highlights the need for learning through some sort of inference system from which reasoning and logical insights can be extracted, loosely similar to supervised learning.
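To make the idea of learning from punishment and reward more concrete, here is a minimal Python sketch. It is not Turing's mechanism, only a modern toy illustration: the function name `train_child_machine`, the `teacher` callback and the action names are all hypothetical, and the update rule (add the reward to a running preference score) is an assumption chosen for simplicity.

```python
def train_child_machine(teacher, actions, rounds=20, step=1.0):
    """Toy reward/punishment loop (illustrative assumption, not Turing's design).

    teacher: function returning +1 (reward) or -1 (punishment) for an action.
    actions: list of hypothetical behaviours the "child machine" can try.
    """
    preferences = {action: 0.0 for action in actions}
    for i in range(rounds):
        if i < len(actions):
            # First pass: try every action once.
            action = actions[i]
        else:
            # Afterwards: repeat whichever action has the best score so far.
            action = max(preferences, key=preferences.get)
        reward = teacher(action)
        # Rewards raise the preference for an action, punishments lower it.
        preferences[action] += step * reward
    return preferences
```

For example, with a teacher that rewards only "say hello" among the actions "shout", "say hello" and "stay silent", the machine ends up preferring "say hello", since that is the only behaviour whose score ever increases.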
Isn’t it delightful how, by trying to answer the question of whether machines can think, we arrive at a glimpse of what would become the most disruptive and central field of Artificial Intelligence?
Discussion and evolution of The Turing Test
Summed up in a sentence, the Turing test credits a machine with the ability to think if it cannot be distinguished from a human in a typed conversation in more than one third of the runs of the test (or some other established threshold).
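The pass criterion above boils down to a simple proportion check. The following Python sketch shows it under stated assumptions: the function name `passes_turing_test` is hypothetical, the default threshold of one third is the figure used in this article, and treating the comparison as strict ("more than") is a convention rather than an established rule.

```python
from fractions import Fraction


def passes_turing_test(judge_verdicts, threshold=Fraction(1, 3)):
    """Return (passed, fooled_rate) for a list of judge verdicts.

    judge_verdicts: booleans, True when a judge took the machine for a human.
    threshold: share of fooled judges that must be exceeded (assumed 1/3).
    """
    if not judge_verdicts:
        raise ValueError("at least one verdict is required")
    fooled = sum(judge_verdicts)  # True counts as 1, False as 0
    fooled_rate = Fraction(fooled, len(judge_verdicts))
    return fooled_rate > threshold, fooled_rate
```

With 12 fooled judges out of 30 the rate is 2/5, which exceeds one third, so the machine would pass. Using exact fractions avoids rounding surprises when the rate sits right at the threshold.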
Some of the strongest criticisms of attributing intelligence to machines that succeed in the Turing test stem from the narrow domain of intelligence that it probes.
The test only evaluates textual communication capabilities: comprehension and expression. Is mastering these two enough for us to grant intelligence? Most critics think not.
Many modern Artificial Intelligence applications, a large share of them built using Machine Learning, achieve outstanding performance in very narrow, domain-specific tasks, even beating some of the best humans at them, as Deep Blue did at chess and AlphaGo at the game of Go.
Does this mean that the machines understand the games? If they do not understand, can we say that they are intelligent?
One of the most famous criticisms of the Turing Test is known as The Chinese Room. Let’s take a look at it.
The Chinese Room
The Chinese room is a thought experiment, proposed in 1980 by John Searle and popularised by the famous physicist Roger Penrose, that challenges the validity of the Turing test by arguing that computation cannot give rise to ‘thinking’, or at least not in the proposed manner.
The Chinese room tries to break the analogy between mind and computer by arguing that, while the computer is restricted to the manipulation of symbols, the mind possesses a semantic capacity to understand, or be conscious of, the meaning of such symbols.
The experiment goes like this: imagine that we have some sort of black-box machine or computer to which we can feed different texts in Chinese, and which responds with a very appropriate textual reply in the same language.
We have no idea what kind of process is going on inside it. However, this machine can hold a perfect conversation, so it passes the Turing test with ease.
The question that Searle asks is:
Does this machine understand Chinese?
Now, instead of this black box, imagine that the machine is a room with a person inside who does not speak a word of Chinese but has at his or her disposal a large collection of instructions and manuals that can be used to produce perfect replies to the input texts. These instructions contain rules like ‘if these symbols are included in the input text, reply this and this’.
This raises three more questions:
Does the person inside the room understand Chinese?
Do the manuals of instructions and books understand Chinese?
Does the whole system understand Chinese?
The Chinese room can pass the Turing test, as we have said. Does this mean it can think? The experiment tries to dismantle this conclusion by arguing that thinking and understanding go hand in hand, and therefore that a system that passes the Turing Test without understanding is not doing anything similar to thinking.
The Legacy of the Turing Test
To finish off, let’s explore some of the most recent versions of the Turing test, along with when and how it has been passed, and the conclusions from such experiments.
ELIZA was a conversational bot developed at MIT in 1966 by Joseph Weizenbaum, and one of the first programs capable of processing natural language. This program, with which people could chat by typing messages, mimicked a psychotherapist, encouraging the person at the other end to talk more and more about their problems.
It used sentences like ‘I see’ or ‘tell me more about this’ (see the last line of the following conversation) and repeatedly bombarded the human at the other end with questions, as shown in the following interaction.
Person: Do you think I can use this computer tomorrow?
ELIZA: Why do you ask?
Person: I should make more on-line sales.
ELIZA: Why are you not sure?
Person: My potential clients have never seen a demonstration of this system.
ELIZA: That does mean much for you?
ELIZA: You seem very enthusiastic about it
Person: This is ridiculous
ELIZA: What do you mean this is ridiculous?
Person: Is there any reason for which I can not use the computer now?
ELIZA: Tell me more about your potential clients.
In this case, the person speaking to ELIZA had been told beforehand that he was speaking to another human, and after the conversation he did not suspect at all that he had instead been talking to a machine.
ELIZA used a dictionary, or keyword, approach: it looked for specific words in the input sentence that, if found, triggered a pre-made response, typically a question. If none of the keywords was found, a generic response was given. Nowadays, much more sophisticated systems exist.
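The keyword approach described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Weizenbaum's actual program: the keyword table, the generic reply and the function name `eliza_reply` are hypothetical, with a couple of responses borrowed from the transcript above.

```python
import re

# Hypothetical keyword table, checked in order. The real ELIZA script was
# considerably larger and also transformed parts of the user's sentence.
KEYWORD_RESPONSES = [
    ("computer", "Do computers worry you?"),
    ("clients", "Tell me more about your potential clients."),
    ("ridiculous", "What do you mean this is ridiculous?"),
]

# Fallback used when no keyword is present (assumed wording).
GENERIC_RESPONSE = "Please go on."


def eliza_reply(sentence: str) -> str:
    """Return the pre-made response for the first keyword found in the input."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    for keyword, response in KEYWORD_RESPONSES:
        if keyword in words:
            return response
    return GENERIC_RESPONSE
```

Calling `eliza_reply("This is ridiculous")` returns the "ridiculous" response, while an input with no known keyword, such as "Nice weather today", falls back to the generic reply. Nothing in the function models the meaning of the input, which is why such a system can seem attentive while understanding nothing.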
The Loebner Prize
The Loebner Prize is a competition hosted every year since 1990. It has been held at many different locations, such as MIT, Cambridge University and the Science Museum in London.
Its goal is to evaluate the state of the art of conversational machines aspiring to pass the Turing Test and to promote Artificial Intelligence and Natural Language Processing research.
The procedure for this competition is the same as for a normal Turing test: 30 different judges sit with two screens each and hold two separate conversations, one with a computer program and one with a real person.
The goal is to correctly assess which screen belongs to the machine and which to the person. Judges have several interactions with machine/human pairs, and at the end of the day the artificial system that has fooled the highest percentage of judges is crowned the winner.
An improved version of ELIZA, known as the PC Therapist, won this competition in its first three editions.
Despite being able to assess, to some extent, the Turing Test performance of state-of-the-art Artificial Intelligence systems, this competition has created some controversy regarding the role of the judges, the way they can be tricked by approaches like ELIZA’s or other sorts of mimicry, and the short and sometimes whimsical conversations that take place between judges and machines.
One of the biggest criticisms of this contest is the following, made by Hayes and Ford in 1995:
“the imitation game conditions say nothing about the judge, but the success of the game depends crucially on how clever, knowledgeable and insightful the judge is”
Also, the competing systems have very rarely come close to beating the threshold of 1/3 of judges fooled that is taken as the limit for accepting that a system has passed the Turing Test. We will finish the article by exploring one of the systems that has reached this threshold: Eugene.
Eugene Goostman is a chatbot designed in 2001 that came second in the Loebner Prize in 2005 and 2008. In 2012, in a separate competition, it obtained the highest score up to then, fooling 29% of the judges and coming very close to the Turing Test threshold.
Eugene works by imitating a 13-year-old Ukrainian boy, with grammatical mistakes, gaps in knowledge that can be attributed to his being a kid, and a very particular way of speaking.
Again, we can see the trickery in these kinds of systems. They avoid facing a natural conversation and instead try to exploit the weaknesses of the test by imitating a very specific character, thus driving the conversation to narrow domains in which they can comfortably have a chat.
In 2014, with this strategy, Eugene passed the Turing test, fooling 33% of the judges it spoke to in an event organised to honor the 60th anniversary of Turing’s death. Let’s see a transcript of a conversation with Eugene.
Judge: what music do you like?
Eugene: I like Eminem — because he doesn’t have a thing to do with “polite-correctness”.
Judge: OMG i love eminem, he’s my favourite artist!
Eugene: I wonder what Eminem’s nick mean. I associate it only with a word “enema”. But he is a cool guy, I like his “The Real Slim Shady”.
Judge: i know what his nick name means, his real name is Marshall Mathers. From the initials M&M phynetically sounding like Eminem.
Eugene: Eminem is a famous crapper. No, raper… ripper… Yeah, I mean — rapper. I like his “Stan” and “The Real Slim Shady”.
Judge: I like “Hailies song”
Eugene: Interesting tastes. I like science fiction books, by the way. Where do you came from, by the way? Could you tell me about the place where you live?
Judge: Epsom, home of the derby. yourself?
Eugene: My guinea pig says that name Derby sounds very nice.
Judge: i shopuld talk with him sometime
Eugene: I think you should take some typing lessons. Or just type more accurately.
Judge: that’s very rude?
Eugene: Ummm… Frankly, I didn’t get your question:-(
Judge: “Ummm” isnt a word. i think you need typing lessons
Eugene: My thoughts are same. By the way, I still don’t know your specialty — or, possibly, I’ve missed it?
This article has explored the origin, nature, and evolution of the Turing test, one of the first attempts to define a measurable target for Artificial Intelligence.
I hope it has not only helped you understand the origins of AI, but has also raised several questions and sparked some internal debate.
If not, I will give you one last nudge by inviting you to answer the following question:
Can this test really be a proxy for answering the question of whether machines can think?
There are various other factors to consider when trying to answer this question, besides those already discussed.
First, does it matter whether we know that one of the entities we are interacting with might be a machine?
As we saw in the transcript of the conversation with ELIZA, the human did not know beforehand that this was a possibility, so for him it was unquestionably an interaction with a person.
Would this have changed if he had known that he could be speaking either to a machine or to a human?
Secondly, should the time limitations or unrestricted character of the conversations be addressed in some way? Also, what should we do about the previously discussed problem with the judges?
All of these considerations only further underline the last sentence written by Alan Turing in Computing Machinery and Intelligence:
We can only see a short distance ahead, but we can see plenty there that needs to be done.