Machine Learning for Hackers – Analyzing & displaying data using R – #bookreview #in #programming

Machine Learning for Hackers
By Drew Conway and John Myles White
(O’Reilly, paperback, list price $39.99; Kindle edition, list price $31.99)

The word “hacker” has a very bad reputation in many parts of the computer world.

This book’s two authors, however, offer a different and much more positive view. “Far from the stylized depictions of nefarious teenagers or Gibsonian cyber-punks portrayed in pop culture,” they write, “we believe a hacker is someone who likes to solve problems and experiment with new technologies.”

In their view: “If you’ve ever sat down with the latest O’Reilly book on a new computer language and knuckled out code until you were well past ‘Hello, World,’ then you’re a hacker.” You’re also a hacker, they add, “if you’ve dismantled a new gadget until you understood the entire machinery’s architecture….”

As for machine learning, they define it “[a]t the highest level of abstraction…as a set of tools and methods that attempt to infer patterns and extract insight from a record of the observable world.” In more concrete terms, machine learning “blends concepts and techniques from many different traditional fields, such as mathematics, statistics, and computer science.” At the computer programming level, machine learning is defined as “a toolkit of algorithms that enables computers to train themselves to automate useful tasks.”

Conway and White’s new book, Machine Learning for Hackers, is rich with challenges for experienced programmers who love to crunch data. Its code examples use the R programming language, a “software environment for statistical computing and graphics.” R can be downloaded free for Windows, MacOS, or a variety of UNIX platforms from The R Project for Statistical Computing.

What you don’t get in this book is an R language tutorial. Instead of “Hello, World!” in the introductory chapter, you jump straight into working with a very interesting data set and generating histograms dealing with distributions of UFO sightings.

It is assumed that you have done some programming, and the authors note that you can find basic R tutorials online or in other books.

With a case-studies approach, each chapter of the 303-page book focuses on a particular problem in machine learning, and the authors show how to analyze sample databases and create simple machine learning algorithms.

The chapters are:

  1. Using R
  2. Data Exploration
  3. Classification: Spam Filtering
  4. Ranking: Priority Inbox
  5. Regression: Predicting Page Views
  6. Regularization: Text Regression
  7. Optimization: Breaking Codes
  8. PCA [principal components analysis]: Building a Market Index
  9. MDS [multidimensional scaling]: Visually Exploring US Senator Similarity
  10. kNN [k-nearest neighbors]: Recommendation Systems
  11. Analyzing Social Graphs
  12. Model Comparison

Some of the projects the authors present include: using linear regression to predict the number of page views for the top 1,000 websites; doing statistical comparisons and contrasts of U.S. Senators based on their voting records; and building “a ‘who to follow’ recommendation engine” for Twitter that doesn’t violate Twitter’s terms of service or its API’s “strict rate limit.”
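For readers who have not met it before, the page-view project rests on fitting a straight line to data. The sketch below is not from the book, which works in R; it is a minimal Python illustration of ordinary least squares. The function names, the choice of “visitors” as the predictor, and the toy numbers are all assumptions made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of paired x/y deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Apply the fitted line to a new x value."""
    return slope * x + intercept


if __name__ == "__main__":
    # Made-up numbers purely for illustration; not data from the book.
    visitors = [1.0, 2.0, 3.0, 4.0]
    page_views = [3.0, 5.0, 7.0, 9.0]
    slope, intercept = fit_line(visitors, page_views)
    print(slope, intercept)
```

Real traffic data is rarely this tidy, so treat this only as a picture of the basic idea.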

Conway and White offer some fairly heady and challenging learning experiences for those who would like to work with pattern recognition algorithms and big piles of data.

“The notion of observing data, learning from it, and then automating some process of recognition is at the heart of machine learning,” the authors note, “and forms the primary arc of this book.”


Si Dunn is a novelist, screenwriter, freelance book reviewer, and former software technical writer and software/hardware QA test specialist. He also is a former newspaper and magazine photojournalist. His latest book is Dark Signals, a Vietnam War memoir. He is the author of an e-book detective novel, Erwin’s Law, now also available in paperback, plus a novella, Jump, and several other books and short stories.


Continuous Testing with Ruby, Rails, and JavaScript – #bookreview

Continuous Testing with Ruby, Rails, and JavaScript
By Ben Rady and Rod Coffin
(Pragmatic Bookshelf, $33.00, paperback)

I used to test software for a living. It was seldom a pretty sight.

Patches to customized software sometimes would be released to particular customers on an emergency basis. Then I would be asked to test what had just been shipped.

Often, I found bugs – serious bugs. And often, it was Friday afternoon, and the programmers had gone home. Frequently, I had no idea which customer had received the buggy patches, and I had no way to fix the code myself and issue a new release.

So the customers installed bad software over the weekend and quickly called in to complain. But the software development manager had my report. So the programmers then were lashed until morale improved, as the old saying goes. A new load was created — and this time tested before it was shipped to the customer, along with profuse apologies (and who knows what else) by the sales department.

To murder an old saying, this was no way to run a software railroad.

Continuous Testing with Ruby, Rails, and JavaScript shows how programmers can set up and run automated tests continuously while they are writing code.

The book, illustrated with code examples and screen shots, shows how to set up and maintain a quick and powerful test suite and also how to use inline assertions and other continuous-testing (CT) techniques, rather than old-fashioned debugging or printing out piles of paper so you can search frantically for that missing semicolon or extra parenthesis.
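The core idea, rerunning tests automatically whenever a source file changes, can be pictured with a small sketch. This is not how Autotest or Watchr work internally, and it is written in Python rather than the book’s Ruby. The function names and the polling approach are assumptions chosen only to illustrate the loop.

```python
import os


def snapshot(paths):
    """Map each existing path to its last-modified time in nanoseconds."""
    return {p: os.stat(p).st_mtime_ns for p in paths if os.path.exists(p)}


def changed_paths(before, after):
    """Return paths that are new, removed, or modified between snapshots."""
    return sorted(
        p for p in set(before) | set(after)
        if before.get(p) != after.get(p)
    )


def check_once(paths, before, run_tests):
    """Take a fresh snapshot and call run_tests(changed) if anything changed.

    Returns the fresh snapshot so the caller can pass it to the next check.
    """
    after = snapshot(paths)
    changed = changed_paths(before, after)
    if changed:
        # run_tests is a hypothetical callback, e.g. one that starts a test runner.
        run_tests(changed)
    return after
```

A continuous-testing loop would call check_once repeatedly, with a short pause between checks, so that a failing test shows up seconds after the edit that caused it.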

Rady and Coffin’s 139-page work is divided into three parts. Part I covers Ruby and Autotest. Part II focuses on Rails, JavaScript, and Watchr. Part III contains three appendices.

The chapter line-up shows the topic focus in each part.

  • Chapter 1: Why Test Continuously?

Part I: Ruby and Autotest

  • Chapter 2: Creating Your Environment
  • Chapter 3: Extending Your Environment
  • Chapter 4: Interacting with Your Code

Part II: Rails, JavaScript, and Watchr

  • Chapter 5: Testing Rails Apps Continuously
  • Chapter 6: Creating a JavaScript CT Environment
  • Chapter 7: Writing Effective JavaScript Tests

Part III: Appendices

  • Appendix 1: Making the Case for Functional JavaScript
  • Appendix 2: Gem Listing (This is a listing of all the gems installed while testing the book’s examples.)
  • Appendix 3: Bibliography

The goal of the book is to show you how to use a combination of techniques, tests, and tools to catch software problems while you are initially coding, not later in the process when you’re up against the wall of development and delivery deadlines.

“A continuous testing environment validates decisions as soon as we make them,” the authors state. “In this environment, every action has an opposite, automatic, and instantaneous reaction that tells if what we just did was a bad idea. This means that making certain mistakes becomes impossible and making others is more difficult. The majority of the bugs that we introduce into our code have a very short lifespan. They never make their way into source control. They never break the build. They never sneak out into the production environment. Nobody ever sees them but us.”

Sounds good to this ex-software tester! (Although I do remain suspicious of the word “never” in anything related to software.) Sure wish the programmers in my groups had had these tools.

“Continuous testing is our first line of defense,” the authors point out. “Failure is extremely cheap here, so this is where we want things to break down most frequently.”

They also describe some drawbacks and limitations to continuous testing and ways to blend CT with continuous integration, before moving into the coding and testing examples.

The authors “suggest” using the following to run the examples in this book:

  • A *nix operating system (such as Linux or MacOS)
  • Ruby 1.9.2
  • Rails 3.0.4

The book provides a link to the online source code for the examples.

“The examples may work in other environments (such as Windows) and with other versions of these tools,” they add, “but this is the configuration that we used while writing the book.”

Si Dunn