On the project I work on, we have the usual periodic crawls and also use ScrapyRT to let the frontend trigger real-time scrapes of specific items, all with the same spider code.
Edit: Worth noting that we trigger the real-time scrapes via AMQP.
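For a sense of how that wiring can look, here's a minimal sketch of an AMQP consumer that forwards scrape requests to ScrapyRT's HTTP endpoint. The queue name, spider name, and message format are made up for illustration; the /crawl.json endpoint with spider_name/url parameters is ScrapyRT's documented API.

```python
# Sketch: consume scrape requests from AMQP and forward them to ScrapyRT.
# Queue name, spider name, and message format are hypothetical.
import json

import pika      # AMQP client
import requests

SCRAPYRT_URL = "http://localhost:9080/crawl.json"  # ScrapyRT's default HTTP endpoint

def on_request(channel, method, properties, body):
    msg = json.loads(body)  # e.g. {"url": "https://example.com/item/42"}
    resp = requests.get(
        SCRAPYRT_URL,
        params={"spider_name": "items", "url": msg["url"]},  # spider name is made up
        timeout=30,
    )
    print(resp.json().get("items", []))

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="realtime_scrapes")
channel.basic_consume(queue="realtime_scrapes", on_message_callback=on_request, auto_ack=True)
channel.start_consuming()
```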
These are rough guidelines based on the current best practices I know of, and obviously shouldn't be treated as doctrine. Numerical analysis/linear algebra is actually a pretty fast-evolving field as far as applied math goes, though statistics is a bit less dynamic at the moment, I'd say.
Honestly, after a certain amount of time, I'd expect a new hire would be able to teach me what is state-of-the-art in the field based on new literature.
But the questions I'd have in mind would be couched like this:
Linear system (a code sketch of these choices follows the list):
- Is it small, square, and numerically well-conditioned? Use LU - it's pretty fast to write and pretty fast to use in practice.
- Is it small, but rectangular (i.e. overdetermined), or not as well conditioned? QR is a good choice.
- Is it small, but terribly conditioned? Do you want rank-revealing behavior or a low-rank approximation while you're at it? Will you be using this matrix to solve many problems (multiple right-hand sides)? SVD fits the bill.
- Is it large, or sparse, or implicitly defined (i.e. you don't actually have access to the elements of the matrix defining the system - you just have a surrogate function that gives you vectors in its range, or something)? Use an iterative algorithm. Krylov subspace methods (MINRES, GMRES, conjugate gradient, etc.) are your friends here.
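To make that concrete, here's a minimal sketch of how those choices map onto library calls, assuming Python with NumPy/SciPy. The matrices are random stand-ins and the tolerance is arbitrary.

```python
# Sketch of how the decision points above map to NumPy/SciPy calls.
import numpy as np
from scipy.linalg import lu_factor, lu_solve, solve_triangular
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
b = rng.standard_normal(50)

# Small, square, well-conditioned: LU (factor once, reuse for many right-hand sides).
lu, piv = lu_factor(A)
x_lu = lu_solve((lu, piv), b)

# Small but rectangular (overdetermined) or not as well conditioned: QR least squares.
A_rect = rng.standard_normal((80, 50))
b_rect = rng.standard_normal(80)
Q, R = np.linalg.qr(A_rect)                # reduced QR: Q is 80x50, R is 50x50
x_qr = solve_triangular(R, Q.T @ b_rect)

# Terribly conditioned, rank-revealing, or low-rank approximation: SVD.
U, s, Vt = np.linalg.svd(A_rect, full_matrices=False)
keep = s > s.max() * 1e-10                 # drop directions with tiny singular values
x_svd = Vt.T[:, keep] @ ((U.T @ b_rect)[keep] / s[keep])

# Large, sparse, or implicitly defined: Krylov methods only need a matrix-vector product.
spd = A @ A.T + 50 * np.eye(50)            # stand-in symmetric positive definite matrix
op = LinearOperator((50, 50), matvec=lambda v: spd @ v, dtype=float)
x_cg, info = cg(op, b)                     # conjugate gradient; info == 0 means converged
```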
Pattern matching (more specific in question formulation; a sketch of the periodic case follows the list):
- If you wanted to determine the "strength" of a waveform (in a finite uniform sampling of data) that recurs in a fairly regular way (like the arterial pulse in an array of data taken from an oximeter), what type of transformation would you use, and how would you use the resulting information in the transform domain?
- What if you wanted to determine the strength of a waveform that is short-lived/impulsive in nature, but recurs without any known periodicity (e.g. eye-blink artifacts in a sample of EEG data)?
- How would your answers to the above questions change if there were n separate channels of data collected simultaneously (i.e. sampled in different locations), which may be analyzed together?
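For the periodic case, one flavor of answer a candidate might sketch is a power-spectral-density estimate; here's a minimal example using scipy.signal.welch, with a made-up sampling rate and a synthetic ~1.2 Hz "pulse". The impulsive, aperiodic case would point toward wavelets or matched filtering, and the n-channel case toward something like PCA/ICA across channels; neither is shown here.

```python
# Sketch of the periodic case: gauge the "strength" of a recurring waveform
# via its power spectral density. Sampling rate, frequencies, and noise level
# are made up for illustration.
import numpy as np
from scipy.signal import welch

fs = 250.0                                   # assumed sampling rate in Hz
t = np.arange(0, 30, 1 / fs)
rng = np.random.default_rng(0)
x = 0.8 * np.sin(2 * np.pi * 1.2 * t) + 0.5 * rng.standard_normal(t.size)  # "pulse" + noise

f, psd = welch(x, fs=fs, nperseg=1024)       # Welch-averaged periodogram
band = (f > 0.8) & (f < 2.0)                 # band where the pulse is expected
strength = psd[band].sum() * (f[1] - f[0])   # integrated band power as a "strength" measure
peak_hz = f[band][np.argmax(psd[band])]
print(f"dominant frequency ~{peak_hz:.2f} Hz, band power {strength:.3f}")
```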
Statistical analysis (might seem vague; I'd be more interested in a good discussion with a candidate here than in actual whiteboard writing):
- What does statistical significance mean in the context of decision making? Is it a property of the test you perform, or a property of your data? (Sort of a trick question; this basically rehashes the Fisher vs. Neyman & Pearson debates of the 20th-century stats community.)
- Some canned problems on when to use z, t, and F tests. Basically, you use them when your situation matches the appropriate inference model (comparing the means of two normally distributed samples with identical variances? t-test.)
- How do you construct an optimal test from scratch, if one doesn't already exist for your particular situation? (Basically, if minimizing type II error at a fixed type I error rate makes sense for your problem, can you use the Neyman-Pearson lemma to construct a likelihood ratio test correctly? A worked sketch follows the list.)
- What does a p-value actually mean? What if you instead wanted actual probabilities for your hypotheses, or you had a priori information that you wanted to use? (Bayesian inference is the winner here.)
- Probably something from point estimation: least squares, the minimax criterion, Bayesian MAP estimates, general model fitting, that sort of thing. This brings it all back around to numerics (where I'm most comfortable). For example, when you get down to implementation, applying ridge regression is just knowing how to code up Tikhonov regularization (see the second sketch below).
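As one worked instance of that Neyman-Pearson construction, here's a sketch assuming the simplest possible setting: two simple hypotheses about a Gaussian mean with known variance, where the likelihood ratio test reduces to thresholding the sample mean. All the numbers are made up.

```python
# Sketch: Neyman-Pearson test for H0: mu = 0 vs H1: mu = 1 with i.i.d. Gaussian
# data and known sigma. The likelihood ratio is monotone in the sample mean, so
# fixing the type I error rate alpha gives a threshold on x_bar directly.
import numpy as np
from scipy.stats import norm

sigma, n, alpha = 1.0, 25, 0.05
mu0, mu1 = 0.0, 1.0

# Reject H0 when x_bar > c, with c chosen so P(x_bar > c | H0) = alpha.
c = mu0 + norm.ppf(1 - alpha) * sigma / np.sqrt(n)
power = 1 - norm.cdf((c - mu1) / (sigma / np.sqrt(n)))  # P(reject H0 | H1)

rng = np.random.default_rng(0)
x = rng.normal(mu1, sigma, size=n)  # made-up data drawn under H1
print(f"threshold c = {c:.3f}, power = {power:.3f}, reject H0: {x.mean() > c}")
```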
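And for that last point, a minimal sketch, assuming Python/NumPy and made-up data, of ridge regression written out as Tikhonov-regularized least squares, both via the regularized normal equations and via the equivalent augmented least-squares system:

```python
# Sketch: ridge regression as Tikhonov regularization. Data and lambda are
# made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
beta_true = np.array([1.0, 0.0, -2.0, 0.5, 3.0])
y = X @ beta_true + 0.1 * rng.standard_normal(100)

lam = 0.1  # regularization strength (Tikhonov parameter)

# minimize ||X b - y||^2 + lam * ||b||^2  =>  (X^T X + lam I) b = X^T y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Equivalent, numerically nicer route: stack [X; sqrt(lam) I] and solve the
# augmented least-squares problem with an orthogonal factorization.
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(X.shape[1])])
y_aug = np.concatenate([y, np.zeros(X.shape[1])])
beta_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

print(np.allclose(beta_ridge, beta_aug))  # same estimate, different numerics
```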