IBM's Watson: wonder of the analytics age

When fans of Jeopardy!, America's favourite TV trivia game show, tune in this Valentine's Day, they might wonder if they have mistakenly entered the Twilight Zone. At the center podium, where the smiling face of an expectant contestant would normally be, they will see a refrigerator-sized black box, silent and unassuming save for the luminous avatar at its core: a brain-like armillary sphere orbited by a score of glowing satellites. On February 14th, for the first time since Jeopardy!'s 1964 debut, two human contestants will compete against a machine. This is no hoax, incredulous readers. This... is... Watson! It is IBM's latest breakthrough in Artificial Intelligence (AI), and in a few days it will make history. If Deep Blue, the computer that beat Garry Kasparov at chess, was AI's Sputnik, Watson is about to become its moon landing.

Here is a Q&A on this prodigy of the analytics age.

What is Watson?

Watson is the answer to the Grand Challenge taken on by Dr. David Ferrucci and his team at IBM: to build a computer system that could rival the best players of the popular quiz show Jeopardy!. Watson unites massively parallel hardware with highly sophisticated question-answering technology (DeepQA). Watson's guts are a cluster of IBM Power 750 servers running on POWER7 processor cores; Watson's mind is software built on Apache UIMA (Unstructured Information Management Architecture). Together, the hardware and software enable Watson to make sense of unstructured natural language in real time over an inconceivably sweeping range of content. In other words, it can understand what the questionmaster says to it and it can answer him back. Whether the response is the right one, we eager viewers shall have to wait to find out.

Watson is named after IBM founder Thomas J. Watson, Sr., not after Sherlock Holmes's sidekick, the invention of Sir Arthur Conan Doyle who, unlike this marvel of DeepQA, was notoriously at a loss.

What does Watson do?

It plays Jeopardy! brilliantly. Watson can take an expression in idiomatic, often playful and punning English and, in under 5 seconds, correctly return the question that this expression answers. Because the answer categories in any round of play can be as far apart as "US Presidents" and "Chicks dig me", the scope of Watson's knowledge has to be immense.

How does Watson do it?

The DeepQA technology behind Watson follows a scientist's model. When a clue is presented, Watson turns into a hypothesis-generating machine: it proposes hundreds of candidate answers from all the possible leads that the language and context of the clue suggest. This productive stage is followed by a phase of extensive evidence acquisition and experimentation, in which candidate answers are fed back into the clue and tested for plausibility. The troves of evidence are entirely self-contained; there is no cheating by dipping into the web. Machine learning algorithms then use the results of the experimental stage to separate the wheat from the chaff, assigning a confidence score to each hypothesis. Unfortunately for statisticians, the details of this complex network of algorithms, like those of Google's ranking system, are not available to the public, although the IBM team has expressed plans to publish technical papers about its methodology.
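Since the real algorithms are not public, the generate, score and rank loop described above can only be illustrated with a toy. The Python sketch below is a minimal stand-in, not IBM's method: the evidence names (`keyword_overlap`, `answer_type_match`), the weights, the bias and the logistic scoring formula are all assumptions invented for this example.

```python
import math
from dataclasses import dataclass

# Hypothetical evidence weights and bias. A real system would learn such
# values from training data; these are made up for illustration.
WEIGHTS = {"keyword_overlap": 2.0, "answer_type_match": 3.0}
BIAS = -2.5


@dataclass
class Candidate:
    answer: str
    evidence: dict  # evidence name -> score between 0 and 1


def confidence(candidate):
    """Combine evidence scores into one confidence value (logistic model)."""
    z = BIAS + sum(weight * candidate.evidence.get(name, 0.0)
                   for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank(candidates):
    """Order hypotheses from most to least confident."""
    return sorted(candidates, key=confidence, reverse=True)


# Toy clue: "This computer beat Garry Kasparov at chess."
# The evidence scores are invented, not computed from any corpus.
CANDIDATES = [
    Candidate("Garry Kasparov", {"keyword_overlap": 0.8, "answer_type_match": 0.0}),
    Candidate("HAL 9000", {"keyword_overlap": 0.1, "answer_type_match": 1.0}),
    Candidate("Deep Blue", {"keyword_overlap": 0.9, "answer_type_match": 1.0}),
]

for candidate in rank(CANDIDATES):
    print(f"{candidate.answer:15s} {confidence(candidate):.0%}")
```

In this toy, "Garry Kasparov" matches the words of the clue well but is the wrong type of answer (a person, not a computer), so it ends up ranked below both machines.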

During the televised challenge, viewers will get a peek into the mystery of Watson's inner workings. With each clue, a visual display will appear showing Watson's top three hypotheses for the correct answer. For each candidate, a horizontal bar and a percentage value will indicate the computer's confidence that it is the right choice. Each graph will also have a vertical line marking the certainty threshold that must be met before Watson is willing to buzz in. The threshold is always 50% or greater but varies with the game conditions. Viewers can have fun trying to discern Watson's trigger strategy by noting how the confidence threshold changes with player earnings and the number and values of questions remaining.
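The buzz-in rule itself is simple to state in code. Here is a minimal Python sketch, assuming the confidence values are already known; the function names and the example numbers are hypothetical, and how Watson actually moves the threshold with game conditions is not modelled.

```python
def top_three(confidences):
    """Return the three most confident (answer, confidence) pairs."""
    ordered = sorted(confidences.items(), key=lambda pair: pair[1], reverse=True)
    return ordered[:3]


def buzz_decision(confidences, threshold):
    """Return the answer to buzz in with, or None to stay silent."""
    if not 0.5 <= threshold <= 1.0:
        raise ValueError("the threshold is described as 50% or greater")
    if not confidences:
        return None
    best_answer, best_confidence = top_three(confidences)[0]
    return best_answer if best_confidence >= threshold else None


# Invented confidence values for a single clue.
scores = {"Deep Blue": 0.91, "HAL 9000": 0.67, "Garry Kasparov": 0.29, "ENIAC": 0.05}
print(top_three(scores))
print(buzz_decision(scores, threshold=0.5))   # confident enough to buzz
print(buzz_decision(scores, threshold=0.95))  # stays silent
```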

Why was Watson the next Grand Challenge for IBM? Wasn't the man-against-machine problem already solved with Deep Blue?

Watson is not a rehash of Deep Blue. To master the game of chess, Deep Blue had to be excellent at essentially one task: determining the optimal next move given the current state of the game. Because chess follows well-defined rules, a solution could be determined by mathematical logic. The feat Deep Blue's designers achieved was constructing a machine with sufficient processing power to quickly enumerate a huge set of game scenarios and then search for the best strategy.
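To see what "enumerate the scenarios and search for the best strategy" means, consider a game far smaller than chess. The Python sketch below exhaustively searches a toy stone-taking game (players alternately remove one to three stones, and whoever takes the last stone wins). The game and the function names are assumptions chosen for illustration; this is not Deep Blue's program, which had to cope with a vastly larger search space.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # a player may take one, two or three stones per turn


@lru_cache(maxsize=None)
def can_win(stones):
    """True if the player about to move can force a win."""
    # With no stones left, the previous player took the last one and won,
    # so any() over no legal moves correctly yields False.
    return any(not can_win(stones - take) for take in MOVES if take <= stones)


def best_move(stones):
    """Return a winning number of stones to take, or None if none exists."""
    for take in MOVES:
        if take <= stones and not can_win(stones - take):
            return take
    return None


for pile in range(1, 9):
    print(pile, best_move(pile))
```

Because the rules are fixed and fully known, every position can be settled by logic alone. That is exactly what cannot be done with a punning Jeopardy! clue.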

Watson's task is far more open-ended and substantially more complex. To meet the Jeopardy! challenge, Watson had to overcome the hurdle that has proved hardest for computers: it had to learn to think more like a human. In the end, the machine-based solution IBM researchers devised is a brain that works like a dynamic über-Wikipedia, capable of returning meaningful responses on the fly to almost any natural language query. Though Watson's cognition is ultimately distinct from man's, the system's intellectual feats might still provide insights into how humans process information and make decisions in the face of uncertainty.

I will let readers ponder the philosophical question of whether we should ask "Who", not "What", is Watson.

What must Watson do to win?

IBM researchers measured top Jeopardy! performance by plotting the percentage of correct responses against the buzz-in frequency, using the winners' performance in over 2,000 Jeopardy! games. When they highlighted the data for the best performers, they found that these players' stats clustered in the region corresponding to a 40-50% buzz-in rate and an 85-95% response accuracy. They called this region the "winner's cloud".
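The two axes of that plot are easy to compute from a player's game record. The Python sketch below is only an illustration under two assumptions: that buzz-in rate means clues attempted divided by clues played, and that accuracy means correct responses divided by clues attempted. The function name and the sample numbers are hypothetical.

```python
def in_winners_cloud(clues_played, clues_attempted, correct_responses):
    """Check a game record against the 40-50% / 85-95% region quoted above."""
    if clues_played <= 0 or clues_attempted <= 0:
        return False
    buzz_in_rate = clues_attempted / clues_played
    accuracy = correct_responses / clues_attempted
    return 0.40 <= buzz_in_rate <= 0.50 and 0.85 <= accuracy <= 0.95


# Invented game records, each over 60 clues.
print(in_winners_cloud(60, 27, 24))  # 45% buzz-in rate, about 89% accuracy
print(in_winners_cloud(60, 27, 18))  # same buzz-in rate, about 67% accuracy
```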

But Watson's competitors are not just any former winners. Watson will be pitted against Ken Jennings and Brad Rutter, the two greatest money earners in American game show history. Jennings' performance stats alone show that, to win, Watson will have to skyrocket beyond the winner's cloud. During his 74-game winning streak, Jennings had an accuracy of 92%, won the race to the buzzer 62% of the time, and averaged 35 correct responses per game out of a possible 60, excluding the Final Jeopardy! round.
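These figures hang together, as a quick back-of-the-envelope check shows. The short Python calculation below assumes that accuracy means correct responses divided by clues attempted; the variable names are my own.

```python
correct_per_game = 35   # average correct responses per game
accuracy = 0.92         # share of attempted clues answered correctly
clues_per_game = 60     # clues per game, excluding the final round

# Clues Jennings must have attempted to reach 35 correct at 92% accuracy.
attempts_per_game = correct_per_game / accuracy
attempt_rate = attempts_per_game / clues_per_game

print(f"attempts per game: {attempts_per_game:.1f}")
print(f"share of clues attempted: {attempt_rate:.0%}")
```

Roughly 38 attempts per game is about 63% of the board, which agrees closely with the 62% buzzer figure and sits well beyond the 40-50% range of the winner's cloud.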

Will Watson win?

Beginning with the February 14 show, Watson, Jennings and Rutter will face off on three consecutive nights. Although this leaves open the possibility that the Jeopardy! challenge ends in a draw, there is reason to expect a 3-0 victory for machine over man. In a 15-question practice round for game day, Watson won the buzz 7 times, Jennings 5 times and Rutter 3 times. No player gave an incorrect response. Speed and wagering strategy will likely be the deciding factors in the challenge.

Want your own Watson?

It took a dedicated team of 20 IBM engineers over four years of around-the-clock work to build Watson. Labor costs alone must have been in the tens of millions of dollars. Current labor and hardware expenses mean that a personal myWatson is unlikely to appear on Amazon any time soon. Smaller-scale implementations of Watson's technology have been adopted by some institutions and businesses. For example, biomedical researchers at Rice University are using a POWER7 system to perform computationally intensive procedures in genomic sequencing, protein folding and drug modeling.

Anxious to meet Watson?

For an early introduction before the big show, watch NOVA's special on Watson, The Smartest Machine on Earth, premiering on February 9, 2011.


Related Reading

  • Ferrucci, David, et al. (2010). Building Watson: An Overview of the DeepQA Project, AI Magazine, 59-79.

  • Crowley, Jason, Curley, Brenna, and Osthus, Dave (2011). What is Jeopardy? A graphical exploration from 1984-2009, Chance, 23 (4).

  • Thompson, Clive (June 16, 2010). What Is I.B.M.'s Watson? New York Times Magazine.