One thing is probably the huge number of statistical packages it has (see: http://cran.r-project.org/web/views/), including (static) graphics (e.g., ggplot2, lattice, and the base graphics).
Hi chimeracoder,
I am very curious to better understand why you find Python better for the data pre-processing stage.
I use only R, and would love to know what "I am missing" here.
A simple example would also make it easier to understand.
Well explained by chimeracoder. Data-table centric operations are much more natural in R while sequential objects (lists, tuples, and strings) are quickly manipulated in Python (there are more string/regex methods there).
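Since a simple example was asked for, here is a rough sketch of the kind of string/regex clean-up I mean, using only the standard library. The input lines, the field names (date, user, amount) and the helper parse_lines are all made up for illustration; real data will need its own pattern.

```python
import re

# Hypothetical raw lines; the format is invented for illustration only.
RAW_LINES = [
    "2011-03-01  user=alice  amount=$1,200.50",
    "2011-03-02 user=bob amount=$75",
    "garbage line that should be skipped",
    "2011-03-04   user=carol   amount=$3,000.00",
]

# Named groups keep the extraction readable.
LINE_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"user=(?P<user>\w+)\s+"
    r"amount=\$(?P<amount>[\d,]+(?:\.\d+)?)"
)


def parse_lines(lines):
    """Return (date, user, amount) tuples, skipping lines that do not match."""
    rows = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # silently drop malformed lines in this sketch
        # Strip thousands separators before converting to a number.
        amount = float(match.group("amount").replace(",", ""))
        rows.append((match.group("date"), match.group("user"), amount))
    return rows


rows = parse_lines(RAW_LINES)
for row in rows:
    print(row)
```

Once the rows are clean tuples like this, writing them out as CSV and doing the table-style work in R is straightforward.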
I am a heavy Python user, but when I use NumPy/SciPy I don't feel like I'm using Python much anymore, so at that point I switch to R (or Fortran)... though I'm quite optimistic that at some point the pandas DataFrame can become my default storage structure from which I can parse out R tasks through Rpy, SQLite, HDF5, or possibly Redis.
matplotlib is very verbose though; I almost prefer Matlab's graphics model... though less so than R's base and lattice graphics.
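To make the "DataFrame as default storage" idea a bit more concrete, here is a minimal sketch of the SQLite leg only, assuming pandas is installed. The table name payments and the columns are hypothetical, and the in-memory database stands in for a file that R (or anything else) could open later.

```python
import sqlite3

import pandas as pd

# Hypothetical data; the column names are invented for illustration.
df = pd.DataFrame(
    {
        "user": ["alice", "bob", "carol"],
        "amount": [1200.5, 75.0, 3000.0],
    }
)

# An in-memory database keeps the sketch self-contained;
# a file path would be used to share the data with other tools.
con = sqlite3.connect(":memory:")

# Write the frame out as a table (index=False skips the row index column).
df.to_sql("payments", con, index=False)

# Read a filtered subset back into a new DataFrame.
big = pd.read_sql_query(
    "SELECT user, amount FROM payments WHERE amount > 100 ORDER BY amount",
    con,
)
con.close()

print(big)
```

The same table could then be read from R with a SQLite driver, which is roughly the hand-off I have in mind.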
True,
Though if you need to run a rare or uncommon stat procedure, SAS is not likely to have it in the core, and then you are back to using what "some grad student wrote".
In what metric?
The GUI of SPSS is better than R's (R doesn't have a GUI of its own, though interesting competitors are available, like Rcmdr and Deducer).
In terms of everything else (performance, graphics, statistical tools, programming language), from what I understand, R is the winner without a doubt...
I learned R some time ago in university. I've heard of lots of newly graduated colleagues that now work doing "SPSS consulting" (whatever that means) for big businesses. But I can't really see how you could use R to parse financial data, because I lack the financial background to understand what it means. Maybe that's what SPSS provides, and what singingfish means with "you don't need to know what you're doing".
SPSS comes from the school of "you don't need to know what you're doing in order to analyse data". R requires that you understand what it is that you want to do in order to make it work.
I quite like the middle ground of JMP (kind of SAS lite), and it's a damned sight cheaper than SPSS too.