The Turing test is an experiment designed to determine whether a computer program can exhibit behavior indistinguishable from a human's, which is often treated as the benchmark for saying it possesses artificial intelligence (AI). Here's what the Turing test is all about, including its definition, history, limitations, variations, and primary applications.
The Turing test aims to see if a human can tell the difference between another human and a computer program.
The original test consists of two human subjects and a computer. One person (the questioner) sits in a room with a keyboard and a monitor. The other person (the responder) is in a separate room with a keyboard and a monitor. The computer is in a third room by itself.
The questioner types a series of questions, which both the human responder and the computer receive. Each sends back its answers, and the questioner must decide which responses came from the human.
If, over repeated rounds, the questioner can't tell the human and the computer apart more than half the time, the computer is said to have passed the test and to demonstrate artificial intelligence, as the sketch below illustrates.
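To make that pass criterion concrete, here is a minimal sketch in Python. The function name, the way trials are recorded, and the reading of "more than half the time" as a chance-level threshold are illustrative assumptions for this example, not details Turing specified.

```python
import random

def machine_passes(trial_results: list[bool]) -> bool:
    """trial_results[i] is True when the questioner correctly
    identified the human responder in trial i.

    Under this common reading of the test, the machine passes when
    the questioner does no better than chance, i.e. is correct no
    more than half the time across repeated trials."""
    correct = sum(trial_results)
    return correct <= len(trial_results) / 2

# Illustrative run: 20 trials in which the questioner is effectively
# guessing at random, as would happen if the machine's answers were
# indistinguishable from the human's.
random.seed(42)
results = [random.random() < 0.5 for _ in range(20)]
print("Correct identifications:", sum(results))
print("Machine passes:", machine_passes(results))
```

The point of the sketch is simply that "passing" is a statistical judgment over many rounds, not a single clever answer.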
The Turing test was invented by English computer scientist Alan Turing, who first proposed the idea in his 1950 paper "Computing Machinery and Intelligence." Turing based his idea on a party game called the imitation game, played by a man, a woman, and an interrogator. The interrogator has to determine which of the other two players is the man and which is the woman based solely on their written responses to questions. Turing's test switches out one of the responders with a computer to see if the questioner can determine which responder is human.
In 1966, Joseph Weizenbaum developed a chatbot called ELIZA that met the basic conditions of the Turing test: it could generate responses that some users couldn't distinguish from those written by a human.
From 1991 to 2020, the Loebner Prize was awarded annually to the AI that performed best in a Turing test. In 2014, a chatbot named Eugene Goostman won a Turing test competition at the University of Reading held to commemorate the 60th anniversary of Alan Turing's death. The chatbot, which had the personality of a teenage boy, convinced a third of the judges that it was human.
In popular culture, "passed the Turing test" has come to mean that an AI can pass for a human. For example, in a 2018 live demonstration, AI engineers used Google Duplex to make a reservation over the phone without the human on the other end realizing they were talking to a computer. Some media outlets reported that Google Duplex had passed the Turing test, but no controlled test had actually taken place.
In the early days of AI research, Turing test questions had to be restricted to specific topics. They also had to be limited to "Yes" or "No" responses. If the questions were too open-ended, the test subject could quickly tell if the answers came from a computer.
Because it's limited to written text and must be conducted in a controlled environment, the traditional Turing test is a very narrow measure of intelligence. The standards for human-like intelligence have shifted over the years, and the philosophical debate over what "intelligence" means continues today.
Instead of trying to convince users that they are talking to another human, AI engineers have shifted their focus to finding practical applications and improving interactions between humans and AI. Turing's work continues to inspire AI chatbots like ChatGPT and Bing AI, both of which have passed versions of the Turing test in controlled studies.
The concept behind the Turing test has been expanded and tweaked over the years, producing a number of variations.
No standardized set of Turing test questions exists, partly because programmers could then design their software specifically to pass it. Instead, judges develop their own questions. Some ask about childhood memories, while others pose questions that require more creative, lateral thinking (e.g., "Describe yourself without using adjectives").
The Turing test is mostly useful for gauging the progress of natural-language AIs. Ones that pass, or even come close to passing, are considered especially advanced. The test also provides data that developers can use to improve their programs by showing where the programs have difficulty.