R in a Nutshell, 2nd Edition – A welcome update to an excellent reference guide – #programming #bookreview

R in a Nutshell: 2nd Edition
Joseph Adler
(O’Reilly, paperback, Kindle)

Attention, statisticians, data scientists, data journalists, mathematicians, graphics specialists, and others who use the R programming language.  Joseph Adler has updated his popular “desktop quick reference guide” to R.

If you aren’t familiar with R, it is a “free software environment for statistical computing and graphics,” according to the R-Project website.  Some of the world’s biggest corporations and news organizations are now using R. But there also are numerous ways individual users can work with R, including using it inside Microsoft Excel by running RExcel.

The new edition offers some nice improvements over the 2009 first edition, but it is not a full-scale rewrite.  After all, R itself generally doesn’t change much from one release to the next.

Here’s what is new in the new edition:

  • New information on ggplot2 and using R with Hadoop.
  • Formatting changes to make the code examples easier to read.
  • Plotting chapters have been grouped together.
  • “Minor updates” that “reflect changes in R 2.14 and R 2.15.”
  • New sections offering how-to information on “useful tools for manipulating data in R, such as plyr and reshape.”

The author says that while his 699-page book “is designed to be a concise guide to R,” it is “not intended to be a book about statistics or an exhaustive guide to R.”

Chapter 3, however, provides a friendly “short R tutorial” with plenty of basic examples.  And Chapter 5 presents a helpful “Overview of the R Language.” The book’s other chapters are packed with code examples, illustrations, and well-written explanations, as well.

R in a Nutshell’s chapters are organized into six parts:

  • Part I – R Basics
  • Part II – The R Language
  • Part III – Working with Data
  • Part IV – Data Visualization
  • Part V – Statistics with R
  • Part VI – Additional Topics (including using R with Hadoop)

Whether you are (1) new to R, (2) trying to land a job where R skills are required, (3) working on projects that could benefit from R’s excellent statistical and graphics capabilities, or (4) an old hand at R, you should have this updated “desktop quick reference” manual on hand.

Si Dunn


Exploring Everyday Things with R and Ruby – An entertaining & challenging guide to learning 2 languages – #programming #bookreview

Exploring Everyday Things with R and Ruby
Sau Sheong Chang
(O’Reilly, paperback, Kindle)

Sau Sheong Chang has embarked joyfully on Mission Next-to-Impossible. With his new book, he wants to inspire everyone to recapture at least some of their childhood passion for exploring and discovering.

“For many professional programmers,” he writes, “coding is a job. It’s drudgery, low-level work that brings food to the table. We have forgotten the promise of computers and the power of programming for discovery. This book is an attempt to bring back that wonder and sense of discovery.”

His new book is indeed full of opportunities for exploration and discovery. If you have a basic understanding of computer programming, a playful curiosity, and a willingness to learn new things, you can have some real fun with this entertaining, well-written how-to guide.

Exploring Everyday Things with R and Ruby provides a basic introduction to both programming languages and shows how to use them in simulations that can create solutions to several practical problems.

A few examples:

  • You must set up a new office with 70 employees. How can you accurately determine the number of restroom stalls that will be needed?
  • How can you do data mining and pattern analysis within your own email, accumulated over years? (Caution: You may discover things about yourself that you haven’t yet realized.)
  • What is the process for building a homemade stethoscope and extracting useful data from a WAV file of your heartbeat?
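To give a flavor of the first question, here is a minimal Monte Carlo sketch in Ruby. It is not the book's code. The method names and every parameter (a 540-minute workday, three visits per person per day, two-minute visits, a fixed random seed) are assumptions made up for illustration, and a real study would vary them.

```ruby
# Hypothetical sketch: simulate one workday, minute by minute, and report
# the longest restroom queue seen. All default values are assumptions.
def simulate_max_queue(employees:, stalls:, minutes: 540,
                       visits_per_day: 3, visit_length: 2, seed: 42)
  rng = Random.new(seed)
  # Chance per minute that an employee at their desk needs a stall.
  p_need = visits_per_day.to_f / minutes
  occupied = []   # minutes remaining for each stall currently in use
  queue = 0       # people waiting for a free stall
  max_queue = 0

  minutes.times do
    # Count down the stalls in use and free the ones whose occupant is done.
    occupied = occupied.map { |t| t - 1 }.reject { |t| t <= 0 }

    # Everyone not in a stall and not already waiting may decide to go.
    idle = employees - occupied.size - queue
    queue += idle.times.count { rng.rand < p_need }

    # Move waiting people into free stalls.
    while queue > 0 && occupied.size < stalls
      occupied << visit_length
      queue -= 1
    end

    max_queue = queue if queue > max_queue
  end

  max_queue
end

# Smallest stall count whose simulated day never exceeds the acceptable queue.
def stalls_needed(employees:, max_acceptable_queue: 0, **opts)
  (1..employees).find do |stalls|
    simulate_max_queue(employees: employees, stalls: stalls, **opts) <= max_acceptable_queue
  end
end

puts stalls_needed(employees: 70)
```

Because the result comes from a single simulated day with one seed, it is only a rough indication. Rerunning with several seeds and looking at the spread would give a more trustworthy answer.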

Author of two previous books on Ruby, Sau Sheong Chang is director of applied research for HP Labs in Singapore. In this new work, he shows how to use “simulations to create experiments, isolate factors, and propose hypotheses to explain the results of the experiments.” And you learn how to work with both Ruby and R in the exercises.

In his view, “…Ruby is a programming language for human beings. Yukihiro ‘Matz’ Matsumoto, the creator of Ruby, often said that he tried to make Ruby natural, not simple, in a way that mirrors life. Ruby programming is a lot like talking to your good friend, the computer. Ruby was designed to make programming fun and to put the human back into the equation for programming.”

Meanwhile, “R offers a powerful and appealing interactive environment for exploring data, and using that interactive environment is part of its appeal. The other reason why R is getting increasingly popular is that it is free [like Ruby]. The existing batch of tools for data analysis—S, MATLAB, SPSS, and SAS—can be quite expensive, and R is a cost-effective way to achieve the same goals. Also, R has a very vibrant and active community of domain experts and developers, including statisticians and data scientists who contribute many very useful packages that enhance its overall capabilities.”

The 233-page book is nicely organized and adequately illustrated. There are, however, two minor dings that may briefly irritate some beginners.

First, in his introduction to R, Sau Sheong Chang describes the virtues of using a graphics package known as ggplot2 and states that it will be used extensively in the book’s exercises. But he doesn’t, at that point, specifically instruct readers how to get it, with install.packages("ggplot2"), or how to verify that it has been downloaded and installed, by checking the list returned by installed.packages(). So a teaching moment is missed. Instead, you have to remember to turn back about 20 pages to the “Installing Packages” discussion and figure out that you now need to download ggplot2. (But that’s just part of “discovery,” it could be argued.)

Second, a few of the code examples in Chapter 2 require tedious amounts of command-line typing. You don’t get code you can download from the author’s site until Chapter 3—just a nitpick.

You won’t become an R or Ruby expert by reading Exploring Everyday Things with R and Ruby. But this excellent book can show you how to install the software, learn the basics of using it, and actually put it to work in some practical ways.

From there, you can launch your own journeys of exploration and discovery—and use R and Ruby as you go.

Si Dunn

Machine Learning for Hackers – Analyzing & displaying data using R – #bookreview #in #programming

Machine Learning for Hackers
By Drew Conway and John Myles White
(O’Reilly, paperback, list price $39.99; Kindle edition, list price $31.99)

The word “hacker” has a very bad reputation in many parts of the computer world.

This book’s two authors, however, offer a different and much more positive view. “Far from the stylized depictions of nefarious teenagers or Gibsonian cyber-punks portrayed in pop culture,” they write, “we believe a hacker is someone who likes to solve problems and experiment with new technologies.”

In their view: “If you’ve ever sat down with the latest O’Reilly book on a new computer language and knuckled out code until you were well past ‘Hello, World,’ then you’re a hacker.” You’re also a hacker, in their view, “if you’ve dismantled a new gadget until you understood the entire machinery’s architecture….”

As for machine learning, they define it “[a]t the highest level of abstraction…as a set of tools and methods that attempt to infer patterns and extract insight from a record of the observable world.” In more concrete terms, machine learning “blends concepts and techniques from many different traditional fields, such as mathematics, statistics, and computer science.” At the computer programming level, machine learning is defined as “a toolkit of algorithms that enables computers to train themselves to automate useful tasks.”

Conway and White’s new book, Machine Learning for Hackers, is rich with challenges for experienced programmers who love to crunch data. Its code examples use the R programming language, a “software environment for statistical computing and graphics.” It can be downloaded free for Windows, MacOS, or a variety of UNIX platforms from The R Project for Statistical Computing.

What you don’t get in this book is an R language tutorial. Instead of “Hello, World!” in the introductory chapter, you jump straight into working with a very interesting data set and generating histograms dealing with distributions of UFO sightings.

It is assumed that you have done some programming, and the authors note that you can find basic R tutorials online or in other books.

With a case-studies approach, each chapter of the 303-page book focuses on a particular problem in machine learning, and the authors show how to analyze sample databases and create simple machine learning algorithms.

The chapters are:

  1. Using R
  2. Data Exploration
  3. Classification: Spam Filtering
  4. Ranking: Priority Inbox
  5. Regression: Predicting Page Views
  6. Regularization: Text Regression
  7. Optimization: Breaking Codes
  8. PCA [principal components analysis]: Building a Market Index
  9. MDS [multidimensional scaling]: Visually Exploring US Senator Similarity
  10. kNN [The k-Nearest Neighbors algorithm]: Recommendation Systems
  11. Analyzing Social Graphs
  12. Model Comparison

Some of the other projects the authors present include: using linear regression to predict the number of page views for 1,000 top websites; doing statistical comparisons and contrasts of U.S. Senators based on their voting records; and building “a ‘who to follow’ recommendation engine” for Twitter that doesn’t violate Twitter’s terms of service or its API’s “strict rate limit.”

Conway and White offer some fairly heady and challenging learning experiences for those who would like to work with pattern recognition algorithms and big piles of data.

“The notion of observing data, learning from it, and then automating some process of recognition is at the heart of machine learning,” the authors note, “and forms the primary arc of this book.”


Si Dunn is a novelist, screenwriter, freelance book reviewer, and former software technical writer and software/hardware QA test specialist. He also is a former newspaper and magazine photojournalist. His latest book is Dark Signals, a Vietnam War memoir. He is the author of an e-book detective novel, Erwin’s Law, now also available in paperback, plus a novella, Jump, and several other books and short stories.