My Conversation with “Eugene Goostman” (scottaaronson.com)
166 points by aheilbut on June 9, 2014 | 78 comments


There is nothing about Eugene that is useful to the advancement of actual AI, to our understanding of intelligence. Advanced chatbots are a parlor trick, the same way that rabbit and hat tricks do not help us develop actual teleportation systems, however clever and convincing they might seem to credulous people.

The difference between a trick and a technology, of course, is usefulness. Can your software do something useful, or is it solely aimed at deceiving people into believing it does?

It was interesting to see how many journalists simply channeled the ridiculous claims of the original press release, without an ounce of critical thinking -- choosing instead to pontificate on the inexorable progress of AI. Yet another Dorian Nakamoto moment for the mainstream media.


If the machine can "deceive" you into thinking it's doing work, then it's doing work. It doesn't matter if there's a man who speaks Chinese in the room, or if there's a bunch of rulebooks about Chinese, they're functionally equivalent.

http://en.wikipedia.org/wiki/Chinese_room


The difference between an employee doing work and one deceiving you into thinking they're doing work is simple: one produces value, the other does not. Here, no useful information can be extracted from this Eugene chatbot. It cannot perform any useful task a human could perform or answer any useful question a human could answer. Which is what I was talking about when I mentioned usefulness.

As a side note, I recently made a question answering engine which, although not very advanced and certainly not attempting to pass for a human, can provide you with useful information when asked general knowledge questions:

http://www.sphere-engineering.com/blog/quickanswers-io-seman...

The Chinese room experiment is not a relevant reference here, because this Eugene chatbot is in no way "functionally equivalent" to a human being. It is not even attempting to implement intelligence.


You keep assuming something about the Turing test that is simply not the case. The Turing test has nothing to do with usefulness; the machine doesn't even have to be correct. It just has to fool you into believing it's human.

You are talking about your own metrics, which is fine, but it's not what the test is about. I too hear Searle's Chinese Room argument in your basic rant.

And yes, of course there are many useful things that can come out of something like this, such as phishing attacks, hacking, honeypots, even toys for kids.


Define "useful"

I bet a black-hat SEO optimizer would find tons of auto-generated content very useful. That kind of content is designed exactly to deceive both users and search bots into believing it is of value.


Eugene would not do a good job of auto-generating content useful for SEO, and there are already existing tools that auto-generate content for SEO purposes very effectively.
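
Those tools are often little more than Markov-chain text generators ("article spinners"). A minimal sketch of that approach in Python, purely illustrative:

    import random
    from collections import defaultdict

    def build_model(text, order=2):
        # Map each run of `order` words to the words seen after it.
        words = text.split()
        model = defaultdict(list)
        for i in range(len(words) - order):
            model[tuple(words[i:i + order])].append(words[i + order])
        return model

    def generate(model, order=2, length=40):
        # Random-walk the chain: locally plausible, globally empty,
        # which is all content spam needs to be.
        out = list(random.choice(list(model.keys())))
        for _ in range(length):
            followers = model.get(tuple(out[-order:]))
            if not followers:
                break
            out.append(random.choice(followers))
        return " ".join(out)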


The point is that building chatbots doesn't get us any closer to being able to create a Chinese Room – Eugene isn't anything like a Chinese room, it's just a cheap imitation of one that doesn't fool anyone savvy, as Aaronson's post shows. Nor is there any reason to think that building more chatbots like Eugene will get us any closer to a Chinese Room.

Also, you do realize that Searle proposed the Chinese Room thought experiment to argue against the position you're taking, right?


Yes it does.

Searle's position and example were, however, deeply flawed.

Of course the person in the room does not understand Chinese. Neither does the individual neuron in your brain.

A chatbot is precisely an example of a Chinese room.


> "If the machine can "deceive" you into thinking it's doing work, then it's doing work."

That depends on how long the deception lasts. Do you think it's doing work for the first few seconds of interaction, for a few minutes, or do you never figure out the deception?

From what I understand of the Turing test, the deception has to last at least for an entire conversation in order to count as "passing". This particular attempt doesn't appear to do that, at least not regularly.


Could you explain why advances in chatbots aren't helpful for advances in artificial intelligence?

Completely lay perspective, but couldn't applying machine learning to a very large number of conversations (top-down analysis), with a large library of base content (bottom-up), using AIML or another language, start to create something useful for many sectors?

Like customer support, education, entertainment, information retrieval, navigating sites and apps... etc.

I don't know much about the space but have been intrigued by chatbots since I was a kid.


For instance, that is the aim of IBM's Watson project. Certainly, if you create a program capable of answering useful questions (a "deep QA" system), then that is a praiseworthy effort. But that is not the aim of chatbots, which are neither QA systems nor automated support systems.

In fact, advanced QA systems will probably never be convincingly human, as they are designed to be useful rather than to trick people. Watson is a good example of this.


I hear the echoes of the Chinese Room argument in your rant, fchollet :)

It seems like you are looking for something more than what the Turing test is really about:

"The Turing test is a test of a machine's ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human. In the original illustrative example, a human judge engages in natural language conversations with a human and a machine designed to generate performance indistinguishable from that of a human being. All participants are separated from one another. If the judge cannot reliably tell the machine from the human, the machine is said to have passed the test. The test does not check the ability to give the correct answer to questions; it checks how closely the answer resembles typical human answers. The conversation is limited to a text-only channel such as a computer keyboard and screen so that the result is not dependent on the machine's ability to render words into audio."

http://en.wikipedia.org/wiki/Turing_test


Eliza with knobs on. The attempt to pre-explain the kookiness of its answers as being due to the quirky personality of "Eugene" was just laughable.


The entire Turing test claim is Kevin Warwick's media hype. He issues a press release to this effect approximately annually. Basically, the way to tell if a claim related to artificial intelligence is rubbish is to search for the name "Kevin Warwick". This is what I wrote for his 2008 effort: http://newstechnica.com/2008/10/13/turing-test-won-with-arti...


You know, writing a chatbot that could pass the Turing test might actually be very easy if you make it act like a deranged Internet troll.

To make a chatbot that imitates someone specific might be easier still; I wish no ill will to the author of TempleOS (formerly LoseThos), but I think I could quite easily write a bot that was indistinguishable from him.
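
Such a persona bot could be embarrassingly simple: keyword-triggered canned phrases plus a random fallback. A rough Python sketch, where the trigger words and replies are invented stand-ins rather than actual quotes:

    import random

    RULES = {  # invented stand-in triggers and replies
        "linux":    ["I wrote my own operating system instead."],
        "compiler": ["My compiler is just-in-time. Took me years."],
        "colors":   ["640x480, 16 colors. That's the whole point."],
    }
    FALLBACK = ["Simplicity is divine.", "What else you got?"]

    def persona_reply(message):
        lower = message.lower()
        for keyword, replies in RULES.items():
            if keyword in lower:
                return random.choice(replies)
        return random.choice(FALLBACK)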


Writing an RMS chatbot would be exceedingly easy. Now that I think about it, it's entirely possible that someone at MIT already did that and I fell for it by arguing with it on the csail-related list last month.


That's a standard trick. It doesn't make it "easy", but it works to muddy the issue. There are annual Turing test competitions and some bots use this trick. They're not the absolute best, but they do well.


It seems that Kevin Warwick is well known for doing this sort of thing, so I wonder why it isn't evident on his Wikipedia page: http://en.wikipedia.org/wiki/Kevin_Warwick


"LOLBOT has since been released into the wild to post random abuse, hentai manga and titty shots to 4chan, after having been banned from YouTube for commenting in a perspicacious and on-topic manner."

I literally LOL'd. golfclap


For fun I ran Eugene against Eliza to see what sort of nonsense it would come back with. Below is a quick extract of the "conversation":

  Eliza: Is it important to you that your attitude to I ?
  Eugene: It is highly possible. It is highly possible.
  Eliza: Lets discuss further why your feelings right now.
  Eugene: Just because 2 plus 2 is 5! Wonna ask me something more?
  Eliza: Don't any other reasons come to mind ?
  Eugene: Is it a kind of manace? Maybe, let's talk about something else? What would you like to discuss?
  Eliza: Why the uncertain tone ?
  Eugene: Why? Just because! :) Why? Just because! :)
  Eliza: Why do you ask ?
  Eugene: Oh, but why not? Well, let's go on though.
  Eliza: Are such questions much on your mind ?
  Eugene: Who knows... Well, let's go on though.
  Eliza: Have you asked such questions before ?
  Eugene: I would rather not talk about it if you don't mind. Oh, what a fruitful conversation;-)
  Eliza: What would you do if I don't mind ?

I think it would be more interesting to wash each reply to Eugene through a translator from English to Japanese and back. Given a more powerful bot than Eliza, it could produce some interesting results.
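
For anyone who wants to try it, here's roughly what I mean as a Python sketch; translate, eugene and eliza are hypothetical wrappers around a translation API and the two bots:

    def wash(text, translate):
        # Round-trip through Japanese to strip nuance and add the
        # slightly-off phrasing seen in the transcript above.
        return translate(translate(text, src="en", dest="ja"),
                         src="ja", dest="en")

    def converse(eugene, eliza, translate, opener="Hello.", turns=8):
        message = opener
        for _ in range(turns):
            reply = wash(eugene(message), translate)
            print("Eugene:", reply)
            message = eliza(reply)
            print("Eliza: ", message)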


It would be interesting to run the AI against itself and have each version try to determine if the other is an AI.


That would actually be far more interesting than any of this nonsense. To try to determine whether the thing you're talking to is a bot or not, you have to construct some kind of model of it – try to explain its behavior. The ability to explain things is a genuine hallmark of intelligence, so building things that can try to model and explain the world around them would be real AI research.


That's what blogspam detection is all about.


Based on this article, if 33% of the judges were fooled, maybe some better judges are in order.

It seems there are two ways to beat the Turing test: come up with very clever algorithms, or (probably easier) get some less discerning judges.

The whole thing reminds me of this: http://www.cleverbot.com/

Not Turing convincing, but I found it fun.


Yeah the Onion article writes itself:

"Turing test proves that 30% of judges are actually computers."

A recent Turing test at Reading University backfired after nearly a third of the apparently human judges were discovered to be less intelligent than a ZX Spectrum. Suspicions were raised when they were unable to carry on sophisticated conversations with the programs being tested and instead nattered on about banal topics such as their job, popular music, and another thing.

Ok I'm done. I don't write for free!

Anyway, Onion writers, make it happen.


Relevant XKCD: http://xkcd.com/329/


Getting better judges would spoil the point of this event, namely to drum up some free publicity for the organizers. Alan Turing would probably be depressed at having his name associated with all these silly events.


Here's an article by Robert Llewellyn (Kryten from Red Dwarf) about his conversations with Eugene Goostman at the Uni of Reading event

http://www.theguardian.com/science/2014/jun/09/turing-test-e...


I had an actual conversation with Eugene Goostman, as if I were meeting a 13-year-old from Ukraine.

And Eugene nailed it. He introduced himself, asked me polite questions, I asked him polite questions, and we developed a slightly broken conversation -- but yes, a conversation.

The idea of this is not "try to break the robot", the idea is "if you ran into this robot in real life and it was masquerading as a human, would you be tricked?"

That's the REAL question.


> The idea of this is not "try to break the robot"

Think of it this way. If you want to learn what constitutes strong chess play, will you learn best from playing a) yourself or b) a much stronger player?

Having a "collaborative" exchange with a chatbot is, for the purposes of investigating what "thinking" consists of, about as revealing as playing chess against yourself.

The Turing Test is useful precisely when we are trying to "break the bot", as you put it; in that contest the bot is pitted against a real human interrogator, who plays the role of the chess master.

Saying that Eugene Goostman "passed the Turing Test" is like crowning me World Chess Champion, based on the amazing record of beating 70% of a random sample of six year olds.

> "if you ran into this robot in real life

You wouldn't ever "run into" Eugene Goostman in real life, because it lacks the kind of generalist problem-solving ability that would allow it to insert itself into any "real life" situation - an ability that even six-year-olds possess. It literally couldn't even get out of the gate.


Ok, I think this is just a perspective thing.

You're (most likely) coming from a CS background; I come from a user testing background.

If someone sits me down and says "use my site, have a conversation with a 13 year old Ukrainian".. I start having a conversation with a 13 year old Ukrainian.

Sit a CS major down with Eugene, the 13-year-old Ukrainian, and he'll drop references to AI research from the '60s -- something that probably only one or two Ukrainian kids could ever answer.


But these bots won't cope with not being able to answer, because the human response (to learn) isn't open to them. If you sit down with a human and start talking about something they don't know, then usually (in a situation free from conflict or other emotional prompts) their curiosity prompts them to start asking questions and building a mental model of the subject, which prompts further questions, and they learn about the subject. The conversation builds on the learning and a real progression of thought ensues.

The chatbots can't do this, except in the most limited of ways ($job="CS scientist"), so they don't appear human.
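
That $job template style is essentially Eliza's trick. A minimal Python sketch of why the act holds up on-script and collapses off-script (the patterns are invented for illustration; real bots have thousands):

    import random, re

    PATTERNS = [
        (re.compile(r"my job is (.+)", re.I),
         "How long have you worked in {0}?"),
        (re.compile(r"i feel (.+)", re.I),
         "Why do you feel {0}?"),
    ]
    DEFLECTIONS = [
        "Maybe, let's talk about something else?",
        "Who knows... Well, let's go on though.",
    ]

    def reply(message):
        # On-script: a pattern matches and the echo sounds attentive.
        for pattern, template in PATTERNS:
            match = pattern.search(message)
            if match:
                return template.format(match.group(1).strip(" .?!"))
        # Off-script: deflect, exactly like Eugene above.
        return random.choice(DEFLECTIONS)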


Great point.


Chatbots seem like people when you stick to their script (or one of their conversation paths). Vary even a little and the whole charade is exposed.


Well, no. The Turing test very much is "break the bot" so that you can tell which out of a bot and a human is the bot. And in Turing's example the human is supposed to be helping you as much as possible.


How many legs on a camel?

Seems like a question a 13-year-old in any country could answer.


Similarly, if you ask it something that requires human knowledge to answer, even childlike human knowledge, chatbots never have good answers. For example, I ask:

   If I drop a pencil, what will happen?
A valid human like response could be:

    - Um... it falls?
    - It bounces on its rubber tip
    - It drops
Each of those answers requires a pretty basic level of human knowledge, but I've yet to see a chatbot give anywhere close to a good answer.

Eugene Goostman replies:

    I have to think about that some more :-)) Was that a fruitful conversation?
Cleverbot replies:

    I didn't say you don't have a head.
Neither is anything a human would ever respond with, yet the question is a perfectly normal one (if a bit odd).


If I lived in a world where robots were masquerading as humans, I would probably be very suspicious of strangers.


Will he pass the Voight-Kampff?


>>A tortoise lays on its back, its belly baking in the hot sun, beating its legs trying to turn itself over, but it can't, not without your help. But you're not helping. Why is that?

+Possibly you are right. You aren't the first one who tell me that. Oh, what a fruitful conversation;-)

>>Describe in single words, only the good things that come into your mind about: your mother.

+My mother works on the Odessa TV and runs a popular program for teenagers "Speak Russian right!" Every Odessian teenager heard her famous "For those putzs who schmooze Russian in a goddam lame way: drop by and listen to mine!"

>>It's your birthday. Someone gives you a calfskin wallet.

+Did I forget it or you really haven't told me about the place you are from?

>>You've got a little boy. He shows you his butterfly collection plus the killing jar.

+Stop talking about death! I fell like in a funeral agency! Wonna ask me something more?

>>You're watching television. Suddenly you realise there's a wasp crawling on your arm.

+In our country TV is pretty boring. Mostly news and old Soviet movies...

>>You're reading a magazine. You come across a full-page nude photo of a girl.

+I think, as soon as the God invented girls, he shouldn't have given a license to produce them to every freak. Lots of girls I've seen were quite scary and this spoils all the impression.

>>Just answer the questions, please. You show it to your husband. He likes it so much he hangs it on your bedroom wall.

+Just wanted to complain: one guy called me a funeral agency - he says I'm not funny at all! I think he is a funeral agency himself.

>>One more question. You're watching a stage play. A banquet is in progress. The guests are enjoying an appetizer of raw oysters. The entree consists of boiled dog.

+boiled? Yes, I can eat it. Why?

I think it's time to call the blade runner...


I had the “conversation” below with the version of Eugene Goostman available at http://default-environment-sdqm3mrmp4.elasticbeanstalk.com/. It’s possible that there’s a more recent version somewhere else, but this is the only version I was able to access. Even then, the site was constantly down ...

Setting aside whether this is the same version of the software, is Eugene CPU-constrained? The press release described it as running on a "supercomputer." A conversation with an overloaded Amazon instance might be totally different.


Based on the article, I'd be very much surprised if a "supercomputer" was necessary. While the chatbot may have improved a bit since the referenced version, I'd be surprised if it was by much.


I'd really like to see the passing transcripts from this test, but I'm not expecting to be blown away. I was introduced to Kevin Warwick's writing in a course on technology in society about 10 years ago. He was making a big fuss about having made himself a cyborg by implanting an RFID tag in his arm. My ex-trucker next-door neighbor, who had a cochlear implant, was a much more impressive example of cybernetics by my estimation, and far less insufferably attention-seeking.


Many commenters seem to think that the Turing test is a claim about computers achieving self-awareness or consciousness. It's not. It is a test of whether humans can distinguish between the "AI" and a human. There is no correct definition of AI here.

This is no different from when a chess computer beats a human being. It doesn't matter that it does so by brute force. What matters is that it beats the human. That's the endgame. There are no points for style in reality.

Humans fool each other all the time by being disingenuous; that doesn't mean we aren't human when we do it.

Let's get some perspective here.


In Turing's original paper, the Test is intended to be a "new form" of the question "can machines think?". It is explicitly a test of the machine's intelligence, not of humans'.


It tests the machine's intelligence under the same naive assumption that was once held about beating humans at chess. It was a different time; they thought about it differently.

Turing first framed the test as asking whether computers can think, but changed it to a much more concise and answerable question:

"Are there imaginable digital computers which would do well in the imitation game?"

http://en.wikipedia.org/wiki/Turing_test


Scott Aaronson here makes plain by example that human beings still have a big lead over chatbots in combining thoughtful examination of deep issues with humorous tone and sparkling language. I'll recommend to all of my friends that they read this submission.

The comments below the blog post also demonstrate much better conversation than the chatbot, and answer some questions that I had as a reader when I had only read the blog post itself. Friendly groups of human beings (as here on Hacker News) still provide much better conversation than chatbots. Accept no substitutes.


A twist on the Turing Test that would be fun.

Drop 19 people and 1 AI into a chatroom, and then try to determine who is the AI. A lot of these adversarial questioners might come off looking more like an aggressive AI than a human.


Maybe format it like Mafia? http://en.wikipedia.org/wiki/Mafia_(party_game). Instead of an informed Mafia trying to survive, you have an informed group of AIs.

Each player can TALK, NOMINATE, and VOTE when a vote is called. AIs know who the other AIs are. Humans do not know who is AI and who is Human. Once per round, AIs eliminate a human through consensus in a secret conversation. Then, in an open forum, all participants can talk and nominate other players for elimination. After a nomination, there is discussion and then a vote. If the majority votes in favor of elimination, the player is eliminated.

The humans win when all the AIs are eliminated. The AIs win when the humans no longer have a majority.
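
A rough Python sketch of the round structure; ai_pick, nominate and vote are hypothetical callbacks standing in for real players or bots:

    def play(humans, ais, ai_pick, nominate, vote):
        # humans and ais are sets of player ids.
        players = humans | ais
        while ais and len(humans) > len(ais):
            # Night phase: the informed AIs secretly eliminate a human.
            victim = ai_pick(humans)
            humans.discard(victim)
            players.discard(victim)
            # Day phase: open talk, a nomination, then a majority vote.
            accused = nominate(players)
            if vote(players, accused) > len(players) // 2:
                for group in (humans, ais, players):
                    group.discard(accused)
        return "humans win" if not ais else "AIs win"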


I do like the idea, but beating it would only require at least 2 people putting forward group strategies.

For example, one person says "everyone repeat your username", so everyone does. Of course, the one who said this could be the bot, so another person would then have to put forth a different challenge (something similar -- repeat your username backwards, or something).


What if I told the humans there was 1 AI, but there were really 15 AIs and just a few humans? How many humans would volunteer something like, "There may be only 1 AI, but many here seemed just as inhuman"?


"Maybe, let’s talk about something else? What would you like to discuss?"

And that's where it failed my amateur "BS" test...


I think it's important to acknowledge the distinction between chatting with a bot and not being convinced that it's a human, versus having two conversations simultaneously, one with a human and one with a computer, and figuring out which is which. Humans too can be surprisingly obtuse, go off at tangents, ignore your points, be boring, and repeat themselves.

As for the questions around the utility - automating support functions for various non-critical services seems like an obvious potential application (think of all those "click here to chat to a live representative" dialogs on various websites).


But automated support with a bot that is obtuse, goes off on tangents, ignores my points and is boring sounds like a pretty poor support experience.


If I were to chat with a random person online, and their conversation starter was "Which is bigger, a shoebox or Mount Everest?", I'd probably ignore that - and disconnect after a couple more "witty" questions. Eugene is just trying to be nice.

As someone said, it gets better if you try having a normal conversation.

I'd like to see what would happen if the people who created this let a real 13-year-old boy chat with people, while announcing it as the supercomputer version.


Why isn't anyone trying hard to make a better chatbot?

(from the comments) JE Says: "Nobody that I know in the NLP community works on chatbots or the Turing test"


After reading the chat, I think Scott has passed the Turing test.


Or maybe he wouldn't: remember that the scenario is that you communicate with "the other end" without knowing if it's a person or a bot, and Scott managed to never answer the questions he was being asked, constantly asking questions of his own. Try to imagine you really don't know whether he's human; he only asks questions, and that can be very easily scripted.


He does reply later on, to remark on the sensible responses as well as to call back to previous parts of their conversation (making a point about how any person would know that a camel has four legs). Not to mention that his line of questioning is coherent.


It's easy for a simple bot to refer to its own previous questions. It's not much harder to refer to something the other side said, either. So the only thing that remains seems to be the claim that the other side's reply makes some sense.
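
A conversation-memory "callback" fits in a dozen lines of Python, which is why it's such weak evidence of understanding. A sketch:

    import random

    class CallbackBot:
        def __init__(self):
            self.heard = []  # everything the other side has said

        def reply(self, message):
            self.heard.append(message)
            if len(self.heard) > 1 and random.random() < 0.3:
                # Refer back to an earlier line from the other side.
                earlier = random.choice(self.heard[:-1])
                return 'You said "%s" before -- why?' % earlier
            return random.choice(["Why do you say that?",
                                  "Interesting. Go on."])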


His commentary about the previous question included analysis of the other side's answer (that it was not only wrong, but evasive and obviously computer-generated), so I think that remains a valid point. I don't see Goostman providing similar analysis.


It has already been shown that attacking the other side (even claiming that the other party is the computer) is good bot practice, so even there Scott can't obviously score. Analysis it ain't; it's just "that's the first sensible thing you've said" and "that's the second sensible thing you've said". You, as the human chatting with bots and humans in parallel, would have to evaluate the probability that a particular statement really is the only sensible thing said, and so on. Without hindsight, and without knowing that offensive behavior is very bot-like, it's harder than you imagine.

A lot of human behaviors online can in fact be replaced by scripts. There's even a T-shirt:

http://www.kleargear.com/1474.html

"Go Away Or I Will Replace You With A Very Small Shell Script"


I think there is something about Kevin Warwick's voice being so dull-sounding that makes journalists take his ridiculous stunts seriously.


I would like to see the Turing Test turned on its head ... the human has to convince the computer it is talking to another computer.


I would get tired of typing "Invalid syntax."


> I would get tired of typing "Invalid syntax."

Good point, but put yourself in the computer's shoes. If I were an artificial computer, talking to a human would drive me insane.


> If I were an artificial computer

As opposed to natural computers, like humans? That is very interesting terminology.


That's terminology straight from Turing's paper introducing the game.

"Computer" back then meant a human calculator as well as a digital machine. And he talks about cloning a human not counting as a win for an intelligent machine.


> That's terminology straight from Turing's paper introducing the game.

Hi DanBC.

My choice of words was coincidental. I was not aware that Turing used the same terminology, so I am pleasantly surprised. Recently, I have been experiencing a lot of coincidences on HN, bizarre.


Yeah, Eliza was more convincing.


Ahh. Please continue...


Clever as this is, how could it really fool anyone (aware of the existence of bots)?


Well, this is a much older version of the bot, I believe. Additionally, most people do not ask adversarial questions, but rather try to have a normal conversation, which bots are quite good at. You also have to have prior knowledge of how chatbots work in order to know what kinds of things they are bad at (e.g. common-sense knowledge about the world, which most people wouldn't think to ask about and which wouldn't come up in a normal conversation).

A Watson-like QA system could potentially fix that weakness and answer the questions he asked. The press release described it as running on a supercomputer, so it's possible they were doing something like that. But then someone would find another weakness in it, and so on.


The normal conversations I have with people do not consist primarily of them trying to change the subject.


Neither does a "normal" conversation with the bot. It only tries to change the subject when you start probing for weaknesses and going off-script.


Lmao that people are still taking AI seriously.


What exactly are you suggesting with the implication that people shouldn't take AI seriously? That AI research is a waste of time and money? That "strong AI," however it's defined, will never happen?


Some people have the strange, sometimes religiously motivated, belief that human intelligence has some vague magical property which by definition can't be reproduced in mere physical machines.

Otherwise it's just arguing about how difficult AI will be, which is very hard to estimate. Some people look at previous failures and the lack of progress in AI and extrapolate from that. But progress is rarely linear, and computers are only now getting fast enough to handle the really cool stuff.



