My hunch is that 90-99% of all Jeopardy questions can be answered with informati...

aristus · on Feb 24, 2011

"properly understood" is the whole point. That's the hard problem they are trying to solve.

sliverstorm · on Feb 24, 2011

Which is especially interesting, considering the other competitors are coming from exactly the opposite direction. They are playing the same game, but facing completely different challenges.

pwnguin · on Feb 24, 2011

But properly understanding what? You can diagram the question all you want, but if you don't know the answer it's pointless, no? Wikipedia is just a source of data.

Given that Jeopardy focuses a lot on literature, I'd also throw in gutenburg project books. And probably a newspaper archive going way back, like the NYT.

gojomo · on Feb 24, 2011

Sure, but it does put somewhat of a cap on the amount of reference material you need to import. And a fairly low cap in the tens of GB: Wikipedia/Wiktionary/WordNet/Freebase/J!Archive is probably enough.

Beyond that, you want software/heuristics. You might find far more data helpful to initially create that software, but once it's created, the reference material to have at hand can come from a small set of sources.

knowledgesale · on Feb 24, 2011

there is a project on formalization of the human "common sense" with (partly) open source database. take a look: http://en.wikipedia.org/wiki/Cyc

jokermatt999 · on Feb 24, 2011

For those who were intrigued by the story of Eurisko (that space fleet battle playing bot that completely crushed the competition and then was impossible to find decent info on) the developer of Eurisko, Douglas Lenat, is the one who started this program.

tel · on Feb 24, 2011

I'm not certain, but I'd bet Watson was explicitly not allowed to use something like J!Archive as training data. For one, the questions used in the Jeopardy games it played were drawn randomly from previous questions. More importantly, though, learning a stilted, domain specific language model to play Jeopardy isn't anywhere near as challenging, impressive, or worth pursuing than generating something that includes Jeopardy as a subset of its capacity.

Now Watson was tuned on Jeopardy questions. I'm sure the learning processes were adjusted in light of mistakes made on the Jeopardy corpus, but interpolation is far less big a deal than a full language model.

gojomo · on Feb 24, 2011

the questions used in the Jeopardy games it played were drawn randomly from previous questions

I've not heard that, and if true, it would have given Jennings and Rutter, both excellent crammers, a knowledge advantage.

Further, human contestants absolutely review the J!Archive before competing, so why wouldn't Watson?

We don't yet know for sure Jeopardy is only one subset of all the impressive things Watson can do. Notably, in the 'Ask Reddit' answers, the Watson team says: "At this point, all Watson can do is play Jeopardy and provide responses in the Jeopardy format."

So it seems like they're trying to claim the accolades for solving a bigger problem, when in fact they've only done well on a very constrained problem.

jamesbkel · on Feb 24, 2011

Can't find the quote this moment, but IIRC the questions(or technically, answers) were drawn from previously prepared questions, but not previously used questions. The point being that aside from eliminating audio/video based questions, these had been designed with humans in mind and there was no tailoring of the content to be "Watson friendly/unfriendly".

That may help explain the confusion.

jsvine · on March 9, 2011

For what it's worth, I scraped J-Archive.com and wrote a couple of articles for Slate Magazine about what I found. More for the purpose of learning about Jeopardy than learning how to win.

- The first piece: http://www.slate.com/id/2284678/

- The follow-up, in which I answer reader questions: http://www.slate.com/id/2287705/