R IN ACTION: Data Analysis and Graphics with R, 2nd Edition – #bookreview

R in Action

Data Analysis and Graphics with R

Robert I. Kabacoff

Manning – paperback

Whether data analysis is your field, your current major or your next career-change ambition, you likely should get this book. Free and open source, R is one of the world’s most popular languages for data analysis and visualization. And Robert I. Kabacoff’s updated new edition is, in my opinion, one of the top books out there for getting a handle on R. (I have used and previously reviewed several R how-to books.)

R is relatively easy to install on Windows, Mac OS X and Linux machines. But it is generally considered difficult to learn. Much of that is because of its rich abundance of features and packages, as well as its ability to create many types of graphs. “The base installation,” Kabacoff writes, “provides hundreds of data-management, statistical, and graphical functions out of the box. But some of its most powerful features come from the thousands of extensions (packages) provided by contributing authors.”

Kabacoff concedes: “It can be hard for new users to get a handle on what R is and what it can do.” And: “Even the most experienced R user is surprised to learn about features they were unaware of.”

R in Action, Second Edition, contains more than 200 pages of new material. And it is nicely structured to meet the needs of R beginners, as well as those of us who have some experience and want to gain more.

The book (579 pages in print format) is divided into five major parts. The first part, “Getting Started,” takes the beginner from installing and trying R to creating data sets, working with graphs, and managing data. Part 2, “Basic Methods,” focuses on graphical and statistical techniques for obtaining basic information about data.

Part 3, “Intermediate Methods,” moves the reader well beyond “describing the relationship between two variables.” It introduces regression, analysis of variance, power analysis, intermediate graphs, and resampling statistics and bootstrapping. Part 4 presents “Advanced Methods,” including generalized linear models, principal components and factor analysis, time series, cluster analysis, classification, and advanced methods for missing data.

Part 5, meanwhile, offers how-to information for “Expanding Your Skills.” The topics include: advanced graphics with ggplot2, advanced programming, creating a package, creating dynamic reports, and developing advanced graphics with the lattice package.

A key strength of R in Action, Second Edition is Kabacoff’s use of generally short code examples to illustrate many of the ways that data can be entered, manipulated, analyzed and displayed in graphical form.

The first thing I did, however, was start at the very back of the book, Appendix G, and upgrade my existing version of R to 3.2.1, “World-Famous Astronaut.” The upgrade instructions could have been a little bit clearer, but after hitting a couple of unmentioned prompts and changing a couple of wrong choices, the process turned out to be quick and smooth.

Then I started reading chapters and keying in some of the code examples. I had not used R much recently, so it was fun again to enter some commands and numbers and have nicely formatted graphs suddenly pop open on the screen.

Even better, it is nice to have a LOT of new things to learn, with a well-written, well-illustrated guidebook in hand.

Si Dunn


R in a Nutshell, 2nd Edition – A welcome update to an excellent reference guide – #programming #bookreview

R in a Nutshell: 2nd Edition
Joseph Adler
(O’Reilly, paperback, Kindle)

Attention, statisticians, data scientists, data journalists, mathematicians, graphics specialists, and others who use the R programming language.  Joseph Adler has updated his popular “desktop quick reference guide” to R.

If you aren’t familiar with R, it is a “free software environment for statistical computing and graphics,” according to the R-Project website.  Some of the world’s biggest corporations and news organizations are now using R. But there also are numerous ways individual users can work with R, including using it inside Microsoft Excel by running RExcel.

The new edition offers some nice improvements over the 2009 first edition, but it is not a full-scale rewrite.  After all, R itself generally doesn’t change much from one release to the next.

Here’s what is new in the new edition:

  • New information on ggplot2 and using R with Hadoop.
  • Formatting changes to make the code examples easier to read.
  • Plotting chapters have been grouped together.
  • “Minor updates” that “reflect changes in R 2.14 and R 2.15.”
  • New sections offering how-to information on “useful tools for manipulating data in R, such as plyr and reshape.”

The author says that while his 699-page book “is designed to be a concise guide to R,” it is “not intended to be a book about statistics or an exhaustive guide to R.”

Chapter 3, however, provides a friendly “short R tutorial” with plenty of basic examples.  And Chapter 5 presents a helpful “Overview of the R Language.” The book’s other chapters are packed with code examples, illustrations, and well-written explanations, as well.

R in a Nutshell’s chapters are organized into six parts:

  • Part I – R Basics
  • Part II – The R Language
  • Part III – Working with Data
  • Part IV – Data Visualization
  • Part V – Statistics with R
  • Part VI – Additional Topics (including using R with Hadoop)

Whether you are (1) new to R, (2) trying to land a job where R skills are required, (3) working on projects that could benefit from R’s excellent statistical and graphics capabilities, or (4) an old hand at R, you should have this updated “desktop quick reference” manual on hand.

Si Dunn


The Data Journalism Handbook – Get new skills for a new career that’s actually in demand – #bookreview

The Data Journalism Handbook: How Journalists Can Use Data to Improve the News
Edited by Jonathan Gray, Liliana Bounegru, and Lucy Chambers
(O’Reilly, paperback, Kindle)

Arise, ye downtrodden, unemployed newspaper and magazine writers and editors yearning to be working again as journalists. Data journalism apparently is hiring.

Data journalism? I didn’t know, either, until I read this intriguing and hopeful collection of essays, how-to reports, and case studies written by journalists now working as, or helping train, data journalists in the United States and other parts of the world.

Data journalism, according to Paul Bradshaw of Birmingham City University, combines “the traditional ‘nose for news’ and ability to tell a compelling story with the sheer scale and range of digital information now available.”

Traditional journalists should view that swelling tide of information not as a mind-numbing, overwhelming flood but “as an opportunity,” says Mirko Lorenz of Deutsche Welle. “By using data, the job of journalists shifts its main focus from being the first ones to report to being the ones telling us what a certain development actually means.”

He adds: “Data journalists or data scientists… are already a sought-after group of employees, not only in the media. Companies and institutions around the world are looking for ‘sense makers’ and professionals who know how to dig through data and transform it into something tangible.”

So, how do you transform yourself from an ex-investigative reporter now working at a shoe store into a prizewinning data journalist?

A bit of training. And, a willingness to bend your stubborn brain in a few new directions, according to this excellent and eye-opening book.

Yes, you may still be able to use the inverted-pyramid writing style and the “five W’s and H” you learned in J-school. But more importantly, you will now need to show you have some good skills in (drum roll, please)…Microsoft Excel.

That’s it? No, not quite.

Google Docs, SQL, Python, Django, R, Ruby, Ruby on Rails, screen scrapers, graphics packages – these are just a few more of the working data journalists’ favorite things. Skills in some of these, plus a journalism background, can help you become part of a team that finds, analyzes and presents information in a clear and graphical way.

You may dig up and present accurate data that reveals, for example, how tax dollars are being wasted by a certain school official, or how crime has increased in a particular neighborhood, or how extended drought is causing high unemployment among those who rely on lakes or rivers for income.

You might burrow deep into publicly accessible data and come up with a story that changes the course of a major election or alters national discourse.

Who are today’s leading practitioners of data journalism? The New York Times, the Texas Tribune, the Chicago Tribune, the BBC, Zeit Online, and numerous others are cited in this book.

The Data Journalism Handbook grew out of MozFest 2011 and is a project of the European Journalism Centre and the Open Knowledge Foundation.

This book can show you “how data can be either the source of data journalism or a tool with which the story is told, or both.”

If you are looking for new ways to use journalism skills that you thought were outmoded, The Data Journalism Handbook can give you both hope and a clear roadmap toward a possible new career.

Si Dunn

Exploring Everyday Things with R and Ruby – An entertaining & challenging guide to learning 2 languages – #programming #bookreview

Exploring Everyday Things with R and Ruby
Sau Sheong Chang
(O’Reilly, paperback, Kindle)

Sau Sheong Chang has embarked joyfully on Mission Next-to-Impossible. With his new book, he wants to inspire everyone to recapture at least some of their childhood passion for exploring and discovering.

“For many professional programmers,” he writes, “coding is a job. It’s drudgery, low-level work that brings food to the table. We have forgotten the promise of computers and the power of programming for discovery. This book is an attempt to bring back that wonder and sense of discovery.”

His new book is indeed full of opportunities for exploration and discovery. If you have a basic understanding of computer programming, a playful curiosity, and a willingness to learn new things, you can have some real fun with this entertaining, well-written how-to guide.

Exploring Everyday Things with R and Ruby provides a basic introduction to both programming languages and shows how to use them in simulations that can create solutions to several practical problems.

A few examples:

  • You must set up a new office with 70 employees. How can you accurately determine the number of restroom stalls that will be needed?
  • How can you do data mining and pattern analysis within your own email, accumulated over years? (Caution: You may discover things about yourself that you haven’t yet realized.)
  • What is the process for building a homemade stethoscope and extracting useful data from a WAV file of your heartbeat?
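
As a rough illustration of how the first question might be approached by simulation, here is a minimal Ruby sketch. It is not the book's model: the method name simulate_max_queue and every number in it (a 540-minute workday, roughly three visits per employee per day, two-minute visits, the random seed) are assumptions chosen only for illustration.

```ruby
# Minute-by-minute sketch of restroom demand in an office.
# All defaults are hypothetical; none of them come from the book.
def simulate_max_queue(employees:, stalls:, minutes: 540,
                       visit_chance: 3.0 / 540, visit_length: 2, seed: 42)
  rng = Random.new(seed)   # seeded so repeated runs give the same result
  in_use = []              # remaining minutes for each occupied stall
  waiting = 0              # people currently standing in line
  max_waiting = 0          # longest line seen during the day

  minutes.times do
    # People whose visit has ended free up their stalls.
    in_use = in_use.map { |left| left - 1 }.reject { |left| left <= 0 }

    # Each employee still at a desk may decide to head to the restroom.
    at_desk = employees - waiting - in_use.size
    waiting += at_desk.times.count { rng.rand < visit_chance }

    # Free stalls are filled from the front of the line.
    entering = [waiting, stalls - in_use.size].min
    entering.times { in_use << visit_length }
    waiting -= entering

    max_waiting = waiting if waiting > max_waiting
  end

  max_waiting
end

# Compare the longest line for several stall counts in a 70-person office.
(1..6).each do |stalls|
  longest = simulate_max_queue(employees: 70, stalls: stalls)
  puts "#{stalls} stall(s): longest line was #{longest}"
end
```

Running the loop with different stall counts, and then with different seeds or visit rates, gives a feel for how sensitive the answer is to the assumptions, which is the kind of experimenting the book encourages.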

Author of two previous books on Ruby, Sau Sheong Chang is director of applied research for HP Labs in Singapore. In this new work, he shows how to use “simulations to create experiments, isolate factors, and propose hypotheses to explain the results of the experiments.” And you learn how to work with both Ruby and R in the exercises.

In his view, “…Ruby is a programming language for human beings. Yukihiro “Matz” Matsumoto, the creator of Ruby, often said that he tried to make Ruby natural, not simple, in a way that mirrors life. Ruby programming is a lot like talking to your good friend, the computer. Ruby was designed to make programming fun and to put the human back into the equation for programming.”

Meanwhile, “R offers a powerful and appealing interactive environment for exploring data, and using that interactive environment is part of its appeal. The other reason why R is getting increasingly popular is that it is free [like Ruby]. The existing batch of tools for data analysis—S, MATLAB, SPSS, and SAS—can be quite expensive, and R is a cost-effective way to achieve the same goals. Also, R has a very vibrant and active community of domain experts and developers, including statisticians and data scientists who contribute many very useful packages that enhance its overall capabilities.”

The 233-page book is nicely organized and adequately illustrated. There are, however, two minor dings that may briefly irritate some beginners.

First, in his introduction to R, Sau Sheong Chang describes the virtues of using a graphics package known as ggplot2 and states that it will be used extensively in the book’s exercises. But he doesn’t, at that point, specifically instruct readers how to get it, with install.packages("ggplot2"), and verify that it has been downloaded and installed, with installed.packages(). So a teaching moment is missed. Instead, you have to remember to turn back about 20 pages to the “Installing Packages” discussion and figure out that you now need to download ggplot2. (But that’s just part of “discovery,” it could be argued.)

Second, a few of the code examples in Chapter 2 require tedious amounts of command-line typing. You don’t get code you can download from the author’s site until Chapter 3—just a nitpick.

You won’t become an R or Ruby expert by reading Exploring Everyday Things with R and Ruby. But this excellent book can show you how to install the software, learn the basics of using it, and actually put it to work in some practical ways.

From there, you can launch your own journeys of exploration and discovery—and use R and Ruby as you go.

Si Dunn

Machine Learning for Hackers – Analyzing & displaying data using R – #bookreview #in #programming

Machine Learning for Hackers
By Drew Conway and John Myles White
(O’Reilly, paperback, list price $39.99; Kindle edition, list price $31.99)

The word “hacker” has a very bad reputation in many parts of the computer world.

This book’s two authors, however, offer a different and much more positive view. “Far from the stylized depictions of nefarious teenagers or Gibsonian cyber-punks portrayed in pop culture,” they write, “we believe a hacker is someone who likes to solve problems and experiment with new technologies.”

In their view: “If you’ve ever sat down with the latest O’Reilly book on a new computer language and knuckled out code until you were well past ‘Hello, World,’ then you’re a hacker.” You’re also a hacker, in their view, “if you’ve dismantled a new gadget until you understood the entire machinery’s architecture….”

As for machine learning, they define it “[a]t the highest level of abstraction…as a set of tools and methods that attempt to infer patterns and extract insight from a record of the observable world.” In more concrete terms, machine learning “blends concepts and techniques from many different traditional fields, such as mathematics, statistics, and computer science.” At the computer programming level, machine learning is defined as “a toolkit of algorithms that enables computers to train themselves to automate useful tasks.”

Conway and White’s new book, Machine Learning for Hackers, is rich with challenges for experienced programmers who love to crunch data. Its code examples use the R programming language, a “software environment for statistical computing and graphics.” It can be downloaded free for Windows, MacOS, or a variety of UNIX platforms from The R Project for Statistical Computing.

What you don’t get in this book is an R language tutorial. Instead of “Hello, World!” in the introductory chapter, you jump straight into working with a very interesting data set and generating histograms dealing with distributions of UFO sightings.

It is assumed that you have done some programming, and the authors note that you can find basic R tutorials online or in other books.

With a case-studies approach, each chapter of the 303-page book focuses on a particular problem in machine learning, and the authors show how to analyze sample databases and create simple machine learning algorithms.

The chapters are:

  1. Using R
  2. Data Exploration
  3. Classification: Spam Filtering
  4. Ranking: Priority Inbox
  5. Regression: Predicting Page Views
  6. Regularization: Text Regression
  7. Optimization: Breaking Codes
  8. PCA [principal components analysis]: Building a Market Index
  9. MDS [multidimensional scaling]: Visually Exploring US Senator Similarity
  10. kNN [The k-Nearest Neighbors algorithm]: Recommendation Systems
  11. Analyzing Social Graphs
  12. Model Comparison

Some of the projects the authors present include: using linear regression to predict the number of page views for 1,000 top websites; doing statistical comparisons and contrasts of U.S. Senators based on their voting records; and building “a ‘who to follow’ recommendation engine” for Twitter that doesn’t violate Twitter’s terms of service or its API’s “strict rate limit.”

Conway and White offer some fairly heady and challenging learning experiences for those who would like to work with pattern recognition algorithms and big piles of data.

“The notion of observing data, learning from it, and then automating some process of recognition is at the heart of machine learning,” the authors note, “and forms the primary arc of this book.”


Si Dunn is a novelist, screenwriter, freelance book reviewer, and former software technical writer and software/hardware QA test specialist. He also is a former newspaper and magazine photojournalist. His latest book is Dark Signals, a Vietnam War memoir. He is the author of an e-book detective novel, Erwin’s Law, now also available in paperback, plus a novella, Jump, and several other books and short stories.