Hacker News

I don't know about Twitter, but determining gender (on average) from English language text is pretty much a solved problem in natural language processing.

http://www.hackerfactor.com/GenderGuesser.html

See this paper:

http://u.cs.biu.ac.il/~koppel/papers/male-female-text-final....

Sidenote: this was an epic win for me in Gender Studies class. The professor claimed that the genders communicate "essentially identically but with some individual variation," and I said that not only was this wrong, but I would demonstrate it constructively by blind-assigning genders to any stack of papers she cared to give me. (I also gave her a copy of the above paper, which she read with interest.) The class then retreated from the scary notion of binary decisions made on the basis of the scientific method to the more comfortable one of arguing endlessly about whether the differences were socially constructed or biological.



There's a similar service at genderanalyzer.com, based on UClassify's API - http://blog.uclassify.com/gender-text-analysis/.

It thinks Sergey Brin writes like a woman: http://www.genderanalyzer.com/?url=too.blogspot.com


Maybe that ad for 23andme was actually written by Sergey's wife.


What a load of crap. According to this tool, Linus Torvalds writes like a woman, and so do Zed Shaw and Steve Yegge.


But does this work the same way with a 140-character limit, since people are often forced to express themselves differently?


As long as you have an appropriate training set, I don't see why it would be significantly different. You might run into problems if you train the classifier on some radically different dataset but then try to run it against twitter accounts.
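For illustration, here is a minimal sketch of what "an appropriate training set" would feed into: a multinomial naive Bayes classifier with add-one smoothing. The labels "A" and "B" and the toy texts are made up; in the scenario above they would be the two gender labels and real tweets. The crude whitespace tokenizer is also an assumption, not what any of the linked tools use.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(docs_by_label):
    """docs_by_label maps each label to a list of training texts.

    Returns (log_priors, word_counts, vocab) for multinomial naive Bayes.
    """
    total_docs = sum(len(docs) for docs in docs_by_label.values())
    log_priors, word_counts = {}, {}
    for label, docs in docs_by_label.items():
        log_priors[label] = math.log(len(docs) / total_docs)
        word_counts[label] = Counter(w for d in docs for w in tokenize(d))
    vocab = {w for counts in word_counts.values() for w in counts}
    return log_priors, word_counts, vocab


def classify(model, text):
    """Return the label with the highest smoothed log-posterior for text."""
    log_priors, word_counts, vocab = model
    scores = {}
    for label, counts in word_counts.items():
        # Add-one (Laplace) smoothing so an unseen word doesn't zero out a class.
        denom = sum(counts.values()) + len(vocab)
        scores[label] = log_priors[label] + sum(
            math.log((counts[w] + 1) / denom) for w in tokenize(text)
        )
    return max(scores, key=scores.get)


# Toy, made-up training data for two hypothetical labels.
model = train({"A": ["cats purr softly", "cats nap"],
               "B": ["dogs bark loudly", "dogs run"]})
print(classify(model, "the cats nap"))  # prints "A"
```

Naive Bayes is just one common choice; the paper and services linked above may use different models and features.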


The Twitter API gives access to a large number of a user's latest tweets. So you get a whole series of 140-character messages, and all that extra text should make the gender classifier more accurate.
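One way to see why more tweets help: if each tweet is treated as independent evidence (a simplifying assumption, since one person's tweets are correlated), the per-tweet log-likelihood ratios simply add up, so many weak signals become a confident one. A sketch under that assumption; the per-tweet value of 0.2 in the usage is hypothetical.

```python
import math


def account_probability(tweet_llrs, prior_log_odds=0.0):
    """Pool per-tweet evidence into one probability for the whole account.

    tweet_llrs: per-tweet log-likelihood ratios,
    log P(tweet | class 1) - log P(tweet | class 0).
    Assumes tweets are conditionally independent given the class, so the
    ratios add; the prior log-odds are counted exactly once.
    """
    total = prior_log_odds + sum(tweet_llrs)
    # Numerically stable logistic function (avoids overflow in math.exp).
    if total >= 0:
        return 1.0 / (1.0 + math.exp(-total))
    z = math.exp(total)
    return z / (1.0 + z)


print(round(account_probability([0.2]), 2))       # 0.55
print(round(account_probability([0.2] * 20), 2))  # 0.98
```

A single weakly informative tweet barely moves the estimate off 50/50, while twenty of them pointing the same way give roughly 98%.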


Yeah, but the API doesn't give you access to those users' gender (since Twitter doesn't ask/store that info), which means you have no way to tell the classifier "Here are 500 males' tweets, here are 500 females' tweets."

I suppose you could manually find males and females and train based on their body of work, but it won't be great; you'll likely run into selection bias.

But if you seeded with that approach, and then used a SpamAssassin-style auto-learner...maybe you'd have a chance?

I suppose this is a case where you don't want "Perfect" to get in the way of "Good enough", especially since it will never be perfect...
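A rough sketch of the seed-then-auto-learn idea with a toy word-count model: train on a small hand-labeled seed, label only the unlabeled texts the model is very confident about, add those to the training set, and repeat. This is generic self-training in the spirit of SpamAssassin's auto-learning, not its actual implementation; the scoring function, the 0.9 threshold, and the round limit are all assumptions.

```python
from collections import Counter


def fit(labeled):
    """labeled: list of (text, label) pairs, label 0 or 1. Toy per-class word counts."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in labeled:
        counts[label].update(text.lower().split())
    return counts


def score(model, text):
    """Share of the text's known-word evidence pointing at class 1 (0.5 if none)."""
    words = text.lower().split()
    ones = sum(model[1][w] for w in words)
    zeros = sum(model[0][w] for w in words)
    return 0.5 if ones + zeros == 0 else ones / (ones + zeros)


def self_train(seed, unlabeled, threshold=0.9, rounds=5):
    """Grow the training set with confidently auto-labeled texts.

    Returns the final model and the texts that were never confidently labeled.
    """
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(rounds):
        model = fit(labeled)
        confident, rest = [], []
        for text in pool:
            p = score(model, text)
            if p >= threshold:
                confident.append((text, 1))
            elif p <= 1 - threshold:
                confident.append((text, 0))
            else:
                rest.append(text)
        if not confident:  # nothing new to learn from; stop early
            break
        labeled.extend(confident)
        pool = rest
    return fit(labeled), pool
```

With the seed `[("cats purr", 1), ("dogs bark", 0)]` and unlabeled texts `["cats nap", "nap time", "something else"]`, "something else" stays unlabeled, while "nap time" lands in class 1 purely because "nap" was auto-learned from "cats nap" in the previous round. That is exactly how an initial selection bias can compound.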


That was great. Gender Studies class, huh?

By the way, Gender Guesser gave me a good laugh when I ran it on a comment from HN. It said, "Verdict: Weak MALE Weak emphasis could indicate European."

A scientific basis for the effeminate European stereotype??

Edit: I was joking about the stereotype.


Gender Studies class, huh?

It was a requirement for graduation with an Arts & Sciences degree (I dual-degreed with CS from Engineering) that I demonstrate an interest in subjects covering things other than dead white males. For reasons which were never really clear to me, "I'm an East Asian Studies major!" was not sufficient, so my choices were either Gender Studies or The History of Jazz.

I will say this for required common curricula: if you only take courses which you're interested in or whose built-in intellectual biases are flattering to yours, then you're missing out on a good deal of the educational experience.

I will also say this against required common curricula: there were no gender studies majors in AI class.


Oh, I wasn't criticizing your choice. But surely your experience led you to question certain things.


Thanks for the links!

I find it pretty cool that the first site bases its guess on a fairly small list of words; you can see the list by viewing the page source.

Some words are more common in male writing and others in female writing, and that looks like all it is doing.
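The word-list approach described above can be sketched as a weighted sum: each listed word carries a weight, unlisted words count zero, and the sign of the total gives the verdict. The words, weights, and sign convention below are placeholders, not Gender Guesser's actual list.

```python
import re


def word_list_score(text, weights):
    """Sum the weight of every occurrence of a listed word; unlisted words add 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(weights.get(w, 0) for w in words)


def verdict(score):
    """Map the total to a label (hypothetical convention: positive -> MALE)."""
    if score > 0:
        return "MALE"
    if score < 0:
        return "FEMALE"
    return "UNKNOWN"


# Placeholder words and weights, NOT taken from Gender Guesser.
EXAMPLE_WEIGHTS = {"alpha": 2, "beta": -3}
s = word_list_score("Alpha alpha beta gamma", EXAMPLE_WEIGHTS)
print(s, verdict(s))  # 1 MALE
```

A "Weak" verdict like the one mentioned above would presumably correspond to a total close to zero, though how the real tool draws that line isn't visible here.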


I submitted this HN post and it says you're Male for both formal and informal.

So, what's the verdict, patio11? Are you Male or Female?

Great link nonetheless.


Gender Guesser is less accurate (60%-70%) for HN than a Perl one-liner (>95%):

    perl -E "say 'Male'"


Maybe they integrated that bias into their system?


patio11 is male. I submitted a few texts too; it's quite accurate.


I have always found it kind of... (weird? An amusing quirk of the culture?) that my HN account is tied very closely with my real identity and yet I'm always addressed by login name rather than real name here. I guess you can file that away as yet another example of the power of defaults.

On other business-related forums, like the Business of Software board, everyone either calls me Patrick or "that bingo guy". That might be because of the display name alone.


Oddly enough, given the post subject, I knew who you were immediately after reading a comment of yours about a week ago (and it had nothing to do with bingo, though it may have been entrepreneurial). I read it and immediately thought it sounded like your writing style, but I didn't recognize the username. I clicked through, and lo and behold, there you were. Just had to register and mention that after your comment.


I figured you had the choice of username, so even though you're Patrick in your profile and if I ever met / emailed you I would use that, you had chosen to be patio11 in this forum.

I also find it odd at times - my username is my full name, yet people responding to me will call me JacobAldridge not Jacob.


Just looked at a few dozen tweets; the use of exclamation points seems to be a dead giveaway.
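If you wanted to turn that observation into a classifier feature, a minimal sketch is the average number of exclamation points per tweet. Whether this feature actually separates the genders is one commenter's impression from a few dozen tweets, not a measured result.

```python
def exclamation_rate(tweets):
    """Average number of '!' characters per tweet (0.0 when there are no tweets)."""
    if not tweets:
        return 0.0
    return sum(t.count("!") for t in tweets) / len(tweets)


print(exclamation_rate(["Wow!!", "ok", "yes!"]))  # 1.0
```

On its own this is a weak signal; it would be one input among many to a classifier like the ones discussed above.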


I hate that!!!!!! It's one of my myriad pet peeves!!!!!!! Along wif spellen like ur a morun, wevver it's on perpose or not!!!!! It makes you look like a drama queen!!!!!!! Which, to me at least, is a huge insult!!!!!!!!!!!!!! And it's some inane piece of misspelled, new-age bullshit far too often!!!!!!! Shock!!! Horror!!!!!!! Shock!!!!! Horror!!!!!!!! <slap>

Sorry. Has anyone built the total perspective vortex yet?


that is a neat link. thank you for posting it.



