Hacker News | vkhuc's comments

Absolutely. This dataset search was first introduced in September 2018. It came out of beta in January of last year: https://www.kdnuggets.com/2020/01/google-dataset-search.html.


Although I'm working on deep neural nets, this material is too advanced for me. Looks like deep nets + Bayesian reasoning is the next big thing.



The title sounds familiar; it's also a course on Coursera:

https://www.coursera.org/course/pgm

Last session was in 2013 though.


Thanks. That's the book I'm reading :)


Why use NLTK when we can use spaCy instead? http://spacy.io/


English only.


Nice. As someone who has been involved in OpenNLP's development, I'd like to see the comparison too.

Also, it would be great if you included SENNA in your benchmark: http://ml.nec-labs.com/senna/


I think they used the methods described in http://www.cs.berkeley.edu/~rbg/papers/r-cnn-cvpr.pdf


Thanks, interesting paper.


Yes, Noah Smith's NLP group at CMU is awesome. Btw, Noah is moving to UWash.


The whole project is based on various libraries. In particular, the POS tagger itself uses the OWLQN optimizer from Stanford NLP (licensed under GPL).

However, it's possible to remove the GPL libraries from the POS tagger, as mentioned here: https://github.com/brendano/ark-tweet-nlp/blob/master/LICENS...


You may want to look at Factorie (https://github.com/factorie/factorie), which has a decent POS tagger and isn't crippled by the license. It also has dependency parsing, which works reasonably well.


I've been looking at Factorie for a while but haven't actually done anything heavy with it.

I planned to replace the optimizer in CMU's POS tagger with the one implemented in OpenNLP to make the tagger fully Apache-licensed. Unfortunately, I'm too busy right now. Currently, I'm running the tagger on AWS, so the GPL doesn't hurt me much.

BTW, besides the POS tagger, CMU's TweeboParser depends on TurboParser, which again is licensed under the GPL.


Great, I've been waiting for the release of TweeboParser. Just before EMNLP14 gets started.


There are some (free) good books that haven't been mentioned yet:

1) "Data Mining and Analysis: Fundamental Concepts and Algorithms" by Zaki and Meira http://www.cs.rpi.edu/~zaki/PaperDir/DMABOOK.pdf

This book covers many ML topics with concrete examples.

2) "Computer Vision: Models, Learning, and Inference" by Simon Prince: http://web4.cs.ucl.ac.uk/staff/s.prince/book/book.pdf

Despite being a CV book, its first half reads like a statistics book, with CV examples that are very easy to follow.


I skimmed Mohri's book, and I think the range of topics it covers is quite narrow.

For mathematical foundations of ML, I would recommend the book "Understanding Machine Learning: From Theory to Algorithms" by Shai Shalev-Shwartz.

A brief version of the book is available to download on the author's website: http://www.cs.huji.ac.il/~shais/Handouts.pdf


Yes, Mohri's book takes a strong learning theory approach.

At the same time, it's the only book I've seen that covers online learning well. Can you think of any others?

