vkhuc · on May 6, 2021

Absolutely. This dataset search was first introduced in September 2018. It came out of beta in January of last year: https://www.kdnuggets.com/2020/01/google-dataset-search.html.

vkhuc · on Oct 22, 2015

Although I'm working on deep neural nets, this material is too advanced for me. Looks like deep nets + Bayesian reasoning is the next big thing.

imh · on Oct 22, 2015

Pick up this book. It's fantastic.

https://mitpress.mit.edu/books/probabilistic-graphical-model...

draven · on Oct 23, 2015

The title sounds familiar. It's also a course on Coursera:

https://www.coursera.org/course/pgm

Last session was in 2013 though.

vkhuc · on Oct 23, 2015

Thanks. That's the book I'm reading :)

vkhuc · on Oct 1, 2015

Why use NLTK when we can use spaCy instead? http://spacy.io/

vasco · on Oct 1, 2015

English only.

vkhuc · on Jan 25, 2015

Nice. As someone who has been involved in OpenNLP's development, I'd like to see the comparison too.

Also, it would be great if you included SENNA in your benchmark: http://ml.nec-labs.com/senna/

vkhuc · on Jan 8, 2015

I think they used the methods described in http://www.cs.berkeley.edu/~rbg/papers/r-cnn-cvpr.pdf

zwieback · on Jan 8, 2015

Thanks, interesting paper.

vkhuc · on Oct 22, 2014

Yes, Noah Smith's NLP group at CMU is awesome. Btw, Noah is moving to UWash.

vkhuc · on Oct 22, 2014

The whole project is based on various libraries. In particular, the POS tagger itself uses the OWLQN optimizer from Stanford NLP (licensed under GPL).

However, it's possible to remove the GPL libraries from the POS tagger, as mentioned here: https://github.com/brendano/ark-tweet-nlp/blob/master/LICENS...

hnriot · on Oct 22, 2014

You may want to look at Factorie (https://github.com/factorie/factorie), which has a decent POS tagger and isn't crippled by the license. It also has dependency parsing, which works reasonably well.

vkhuc · on Oct 22, 2014

I've been looking at Factorie for a while but haven't actually done anything heavy with it.

I planned to replace the optimizer in CMU's POS tagger with the one implemented in OpenNLP to make the tagger fully Apache-licensed. Unfortunately, I'm too busy right now. Currently, I'm running the tagger on AWS, so the GPL doesn't hurt me much.

BTW, besides the POS tagger, CMU's TweeboParser depends on Turbo Parser, which is also licensed under the GPL.

vkhuc · on Oct 21, 2014

Great, I've been waiting for the release of TweeboParser. Just before EMNLP14 gets started.

vkhuc · on July 22, 2014

There are some good (free) books that haven't been mentioned yet:

1) "Data Mining and Analysis: Fundamental Concepts and Algorithms" by Zaki and Meira http://www.cs.rpi.edu/~zaki/PaperDir/DMABOOK.pdf

This book covers many ML topics with concrete examples.

2) "Computer Vision: Models, Learning, and Inference" by Simon Prince: http://web4.cs.ucl.ac.uk/staff/s.prince/book/book.pdf

Despite being a CV book, its first half reads like a statistics book, with CV examples that are very easy to follow.

vkhuc · on July 22, 2014

I skimmed over Mohri's book and I think the topics it covers are quite narrow.

For the mathematical foundations of ML, I would recommend the book "Understanding Machine Learning: From Theory to Algorithms" by Shai Shalev-Shwartz and Shai Ben-David.

A brief version of the book is available to download on the author's website: http://www.cs.huji.ac.il/~shais/Handouts.pdf

achompas · on July 22, 2014

Yes, Mohri's book takes a strong learning theory approach.

At the same time, it's the only book I've seen that covers online learning well. Can you think of any others?