Are you suggesting that new submissions route through Mechanical Turk?
LOL.
One approach might be that when humans detect dupes, they could report them: click a "dupe" link, specify the URL(s) of the dupe(s), and submit. The oldest submission "wins", and the reports could be used to train a Bayesian dupe detector. I imagine you could start with a URL text match (it's the ends of the string that tend to differ), along with a check of the <title> of the supposed dupe page, and maybe the first 128 characters of the story text or something.
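Roughly, a first cut might look like the sketch below (Python just for illustration; the tracking-param list, the 60/40 weighting, and the 0.85 threshold are made-up placeholders, and a real Bayesian detector would learn its weights from the reported dupe pairs rather than hard-coding them like this):

```python
import difflib
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Query params that commonly vary between otherwise identical links
# (hypothetical list; in practice this would be learned from reported dupes).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref", "fbclid"}

def normalize_url(url):
    """Strip tracking params, fragments, and trailing slashes so that
    superficially different URLs compare equal."""
    parts = urlparse(url.lower())
    query = urlencode([(k, v) for k, v in parse_qsl(parts.query)
                       if k not in TRACKING_PARAMS])
    path = parts.path.rstrip("/")
    return urlunparse((parts.scheme, parts.netloc, path, "", query, ""))

def looks_like_dupe(sub_a, sub_b, threshold=0.85):
    """Crude dupe score: exact match on normalized URL wins outright;
    otherwise blend title similarity with similarity of the first 128
    characters of story text. Submissions are dicts with 'url', 'title',
    and 'text' keys (assumed shape)."""
    if normalize_url(sub_a["url"]) == normalize_url(sub_b["url"]):
        return True
    title_sim = difflib.SequenceMatcher(
        None, sub_a["title"].lower(), sub_b["title"].lower()).ratio()
    text_sim = difflib.SequenceMatcher(
        None, sub_a["text"][:128].lower(), sub_b["text"][:128].lower()).ratio()
    return (0.6 * title_sim + 0.4 * text_sim) >= threshold
```

The human "dupe" reports would give you labeled pairs, so the fuzzy-match part could eventually be replaced with whatever the training data says actually discriminates.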
It actually sounds like a fun project.