IBM’s Newest AI Can Probably Argue Better Than You

IBM shows off an artificial intelligence program that can engage in a debate, possibly pointing the way to the future of talking machines.
A display representing IBM's Project Debater and Israeli debater Dan Zafrir, ahead of Monday's debate. TED CHIN/IBM

“Fighting technology means fighting human ingenuity,” an IBM software program admonished Israeli debating champion Dan Zafrir in San Francisco Monday. The program, dubbed Project Debater, and Zafrir were debating the value of telemedicine, but the point could also apply to the future of the technology itself.

Software that processes speech and language has improved enough to do more than tell you the weather forecast. You may not be ready for machines capable of conversing or arguing, but tech companies are working to find uses for them. IBM’s demo of Project Debater comes a month after Google released audio of a bot called Duplex booking restaurants and haircuts over the phone.

IBM’s stunt Monday was a sequel of sorts to the triumph of its Watson computer over Jeopardy! champions in 2011. Project Debater, in the works for six years, took on two Israeli student debating champions, Zafrir and Noa Ovadia. In back-to-back bouts each lasting 20 minutes, the software first argued that governments should subsidize space exploration, then that telemedicine should be used more widely.

Debater was represented by a freestanding black display roughly the height and width of a person. In each debate, the system’s mellow, synthetic, female voice made a four-minute opening statement, before responding to its opponent’s own opener with a second four-minute spiel, and letting them respond in kind. Each competitor then got another two minutes to sum up.

Anyone who has spent much time talking to Siri or Alexa will appreciate the challenge taken on by researchers at IBM’s labs in Haifa, Israel. Although speech recognition has become fairly reliable, the nuances of language remain extremely hard for computers. Conversation, in which each utterance adds to the complexity of the interaction, has proven especially difficult.

But a solid performance by Debater Monday showed how, in carefully designed scenarios, computers that talk are ready to do much more for us. In informal polls of the audience, which included journalists and IBM staffers, Debater was rated more informative than the human on stage in both debates. Despite having no need for healthcare, it swayed more audience members to its arguments for telemedicine than its flesh-and-blood opponent did.

Debater’s strategy is built on two core components. The software can pull sentences and quotes to support its position from a corpus of hundreds of millions of documents. It also has a framework of pre-built arguments, and even jokes, that it seeks opportunities to deploy. At one point, the system prefaced its response to Zafrir by noting that its blood would boil if it had blood.

Chris Reed, a professor of computer science and philosophy at the University of Dundee not affiliated with the project, watched Monday’s debates. He told WIRED afterwards that they showed technology of impressive maturity. Reed liked how Debater sometimes tried to preempt an opponent’s arguments, a strategy known as procatalepsis. “She might say that subsidies are needed, but not for space exploration,” Debater said of Ovadia at one point. “If she can present data on what is needed for subsidy I would love to see it.”

Debater didn’t perform perfectly. It didn’t seem capable of rebutting opponents’ claims in the precise way a human could, and failed to register some of the points fired at it. The system’s case for space exploration included a non sequitur claiming that nuclear-powered space exploration is intended to quash concerns about nuclear weapons in orbit. A Google search suggests the claim originates from an op-ed in a British socialist newspaper, the Morning Star. The facts and figures that IBM’s algorithmic arguer recited—apparently sourced from news articles—sometimes lacked necessary context, such as what country or region might expect to save a particular dollar amount quoted.

Despite those glitches, Debater’s performance suggested such technology could make its way into the lives of consumers and corporations, whether via IBM or its competitors. As with Google’s Duplex demo, it showed that computers can do powerful and surprising things with language in a carefully constrained situation.

That Debater struck audience members as more informative than human debaters might presage assistants like Alexa doing more than recalling facts from Wikipedia. Imagine asking for a comparison of the pros and cons of Hawaii and Fiji as vacation destinations, or an explanation of different views on a topic in the news.

Reed, the Dundee professor, says he can imagine technology like Debater’s software helping people evaluate claims made in fake news online. He can also see IBM and others creating virtual assistants that contribute to discussions among executives or lawyers in board rooms or even courtrooms. IBM says it has tested some Debater technology to advise investment professionals, and is interested in having it help politicians consider the different sides to tricky issues.

The company also plans to keep working on the existing version of Debater as a research challenge. If you find yourself having to take it on, Reed suggests employing irony and sarcasm. “I think it wouldn’t be difficult to bamboozle if you wanted,” he says.

