The Art of R Programming: A Tour of Statistical Software Design
By Norman Matloff
(No Starch Press, list price $39.95, paperback)
What? You haven’t heard of R, the programming language?
“R is a scripting language for statistical data manipulation and analysis,” writes Norman Matloff, an experienced and widely published writer who is a professor of computer science at the University of California, Davis. He is also a former statistics professor.
R, he notes in this excellent overview of the programming language, has a rather complicated past.
“It was inspired by, and is mostly compatible with the statistical language S developed by AT&T. The name S, for statistics, was an allusion to another programming language with a one-letter name developed at AT&T—the famous C language. S later was sold to a small firm, which added a graphical user interface (GUI) and named the result S-plus.”
According to Matloff, “R has become more popular than S or S-plus, both because it’s free and because more people are contributing to it. R is sometimes called GNU S, to reflect its open source nature. (The GNU Project is a major collection of open source software.)”
So much for its history. Who uses R? A lot of people involved in statistics and data science. “It is widely used,” Matloff reports, “in every field where there is data—business, industry, government, medicine, academia, and so on.”
Here’s the good news about his good book. If you’ve never heard of R or if it’s something you’ve only recently considered trying, Matloff shows you how to get started quickly both in interactive mode and batch mode.
And you don’t begin by tiresomely displaying “Hello, world.” You start at the heart of R. You make a simple data set, which, in R parlance, is called a vector. You concatenate three numbers, in this case 1, 2 and 4.
“More precisely,” Matloff states, “we are concatenating three one-element vectors that consist of those numbers.” He adds: “It’s hard to imagine R code, or even an interactive R session, that doesn’t involve vectors.”
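For readers who have not seen R, the concatenation described above might look roughly like this in an interactive session. This is a minimal sketch rather than a listing from the book, and the variable names x and y are assumptions chosen here for illustration.

```r
# Concatenate three one-element vectors into a single vector with c().
# The names x and y are illustrative; they do not come from the review.
x <- c(1, 2, 4)

# A lone number is itself a vector of length 1.
length(4)   # 1
length(x)   # 3

# c() also joins longer vectors, and indexing starts at 1.
y <- c(x, 8)
y[3]        # 4
```

Because even a single number is a one-element vector, the same c() call works for joining values and for joining whole vectors.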
From there, his book smoothly delves into a wide range of R topics, including basic types, data structures, closures, recursion, anonymous functions, object-oriented programming, and interfacing R to other programming languages.
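To give a flavor of three of those topics, here is a short sketch of a closure, an anonymous function, and a recursive function in R. It is written for this review, not taken from the book, and the names make_counter, counter, squares, and fact are hypothetical.

```r
# Closure: make_counter() returns a function that remembers `count`
# between calls. All names here are illustrative.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # <<- assigns in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()   # first call returns 1
counter()   # second call returns 2

# Anonymous function: defined inline and applied to each vector element.
squares <- sapply(c(1, 2, 4), function(v) v^2)

# Recursion: factorial defined in terms of itself.
fact <- function(n) if (n <= 1) 1 else n * fact(n - 1)
fact(5)     # 120
```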
The Art of R Programming is rich with short, instructive code examples, including some that start out with bugs and are then corrected, with explanations of why the first try went awry.
The book’s marketing materials note that archaeologists use R to trace how ancient civilizations spread, and drug companies use it to try to figure out which medications are safe and effective. And actuaries use it, of course, to “assess financial risks and keep markets moving smoothly.”
But R can be used in much more commonplace settings, as well. You don’t have to know statistics, and you don’t have to be a professional programmer. You can be a beginner who wants to become an expert. Or you can be, and remain, a hobbyist programmer.
R commands typically are submitted “by typing in a terminal window rather than clicking a mouse in a GUI, and most R users do not use a GUI,” Matloff cautions.
But: “This doesn’t mean that R doesn’t do graphics. On the contrary, it includes tools for producing graphics of great utility and beauty, but they are used for system output, such as plots, not for user input.”
Never fear, however. A number of free GUIs are available for R, and Matloff gives links to several.
Two appendices in Matloff’s book cover downloading, installing and running R. The place to begin is the Comprehensive R Archive Network (CRAN), where “thousands of user-written packages” are available. And there are “precompiled binaries for Windows, Linux, and Mac OS X on CRAN,” Matloff points out.
No Starch Press, the book’s publisher, pledges that it delivers “the finest in geek entertainment.” Many readers likely will say this handsome, well-structured and well-written R overview meets that promise.
– Si Dunn