It's Drew Conway. The guy is good, and he makes complex things very accessible. Sometimes you need to understand the matrices behind the process, and that's great; there are lots and lots of books out there about the math. But sometimes you just need to group some data and want a quick way to get it done mostly well. There aren't as many books that do this as there should be, and I look forward to this one.
I like (and use) R, but calling it "for hackers" seems strange to me; it's not really a "hacker" language. Tossing that word into the title seems like a bit of a marketing strategy (or gimmick). Looks like a promising book, though I'd much prefer it be in a scripting language like Python.
I haven't used R, but I've heard good things about it. Why isn't it for "hackers"?
If anything, it seems like a language better suited to hackers than Python for math-heavy tasks: it looks like it has a more integrated environment for fast iteration, particularly for generating nice plots of the data you're working with. It also seems designed and optimized for exactly this sort of activity.
The problem with domain-specific systems like R is that, as software systems, you can't "hack" them far outside of their constraints.
Integration into other systems (like the web, or robust big-data pipelines) can't be taken for granted. Doing "standard" stuff becomes nonstandard, so hacking can hit a wall.
I use R almost daily, and sure, for tasks involving math (and statistical analysis in particular) R is fantastic. But to me, "for hackers" implies a less domain-specific language, like a scripting language (Python or something similar) that has statistical analysis capabilities. Perhaps it's a matter of semantics, but in a similar vein, I imagine a book with the same title that used Octave would not be considered "for hackers" by most, because of its specific, constrained functionality.
It just seems like "hacker" is such a buzzword now that it gets tacked onto a lot of stuff just to generate interest.
This looks like it primarily uses R to teach. I'm partial to "Programming Collective Intelligence" (Toby Segaran's book): it's written entirely with Python examples and appears to have much of the same content and approach (practical application over theoretical underpinnings).
I just received my copy of "Programming Collective Intelligence" in the mail this morning and immediately sat down and started reading. I've been reading non-stop for the past several hours. It's that good. Perhaps the best part is the chapter at the end that neatly and concisely sums up all the major ML algorithms: neural networks, SVMs, kNN, k-means, decision trees, Bayesian classifiers, etc. That chapter alone is worth the price of the book. I only wish it were longer (and it's not a short book).
What would you say is the level of the book? I'm intrigued by your enthusiasm, but I think I'd be wasting my time reading an introductory book.
I'd say it's introductory up to the chapter that discusses SVMs. Then it becomes incomprehensible unless you have prior knowledge of SVMs and kernel methods.
If you're using Python, what are you using for your stats libraries?
R is, generally, the go-to language for stats work, I've found. It's certainly used a lot in the financial world for statistical modelling, and all the libraries in it are well tested, which is a big plus in my book.
Scikits.statsmodels [1] is the main statistical and econometric library for Python. It is usually used with pandas [2], which provides nice interfaces to your data, particularly for time-series. You can also call R functions with the rpy2 bridge [3]; pandas provides a higher-level API for using rpy2, though last I checked it's not fully fleshed out yet.
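A minimal sketch of the statsmodels + pandas combination, assuming current package names (the project has since dropped the `scikits.` prefix and is imported as plain `statsmodels`) and made-up toy data. The formula interface pulls columns straight out of a DataFrame by name:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up toy data: y is exactly 1 + 2*x, so OLS should recover
# an intercept of 1 and a slope of 2.
df = pd.DataFrame({"x": np.arange(10, dtype=float)})
df["y"] = 1.0 + 2.0 * df["x"]

# "y ~ x" refers to DataFrame columns by name; an intercept is added automatically.
result = smf.ols("y ~ x", data=df).fit()
print(result.params)  # a pandas Series indexed by "Intercept" and "x"
```

Because the fitted parameters come back as a pandas Series, they slot directly into whatever DataFrame-based workflow you already have.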
Wes McKinney, the developer of pandas, writes a blog [4] that provides a good look at data analysis in Python, particularly if you're in finance.
Check out Enthought (http://www.enthought.com) for "everything and the kitchen sink". NumPy and SciPy will generally have you covered, and they often wrap low-level Fortran libraries such as LAPACK. YMMV, but I've used Python exclusively in both scientific computing and "web startup" data analysis environments. I love being able to write my data analysis and web server code in the same language. :)
Interesting. I'm not well versed in the Python stats ecosystem; it's a side effect of working in finance, where every place I've worked already has large, well-tested libraries in C/C++/C#.
I liked "Programming Collective Intelligence," as it fairly clearly introduces how the algorithms operate by implementing them from scratch.
I found the coding style a bit non-Pythonic, though (two-space indentation?), and in practice you'd be better served using one of the many ML libraries (e.g., scikit-learn [1]), which the book doesn't introduce.
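To give a flavor of the from-scratch style, here's a minimal k-means sketch in plain Python. It's my own illustration, not Segaran's code; the function names and the toy points are hypothetical. In practice, scikit-learn's `sklearn.cluster.KMeans` does the same job with smarter initialization.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, iterations=100, seed=0):
    """Minimal k-means on a list of 2-D tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k randomly chosen input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid where it is
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return centroids, labels

# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The loop alternates the two classic k-means steps (assign, then re-average) until the centroids stop moving; a library version adds things like k-means++ seeding and multiple restarts.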