
I think just getting the data is the hardest part; there are several corpora available in academia, but not many outside it. That said, you wouldn't need the full text of each paper, just its bibliography entries, which could be much easier to get. (My PhD research was initially on scientific publication citations, but the corpus we had was small, ~200 documents.)

Scientific papers are also interesting because you can use co-author relationships as a secondary graph. You could probably infer all kinds of things from that!
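
To make the co-author idea concrete, here is a rough Python sketch of building both graphs from bibliography-style records. The record layout (paper id, author list, cited ids), the sample data, and the name `build_graphs` are all made up for illustration; a real corpus would need author-name disambiguation first.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (paper id, authors, ids of cited papers).
entries = [
    ("p1", ["Ada", "Bob"], ["p2"]),
    ("p2", ["Bob", "Cy"], []),
    ("p3", ["Ada", "Cy", "Dee"], ["p1", "p2"]),
]

def build_graphs(entries):
    """Return (citation graph, weighted co-author graph)."""
    citations = defaultdict(set)  # paper -> set of papers it cites
    coauthors = defaultdict(int)  # (author, author) -> number of shared papers
    for paper_id, authors, cited in entries:
        citations[paper_id].update(cited)
        # Sort so each unordered author pair maps to a single key.
        for a, b in combinations(sorted(set(authors)), 2):
            coauthors[(a, b)] += 1
    return citations, coauthors

citations, coauthors = build_graphs(entries)
```

The edge weights in `coauthors` count shared papers, which is one simple starting point for inferring things like collaboration strength.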


