The word “hacker” has a very bad reputation in many parts of the computer world.
This book’s two authors, however, offer a different and much more positive view. “Far from the stylized depictions of nefarious teenagers or Gibsonian cyber-punks portrayed in pop culture,” they write, “we believe a hacker is someone who likes to solve problems and experiment with new technologies.”
In their view: “If you’ve ever sat down with the latest O’Reilly book on a new computer language and knuckled out code until you were well past ‘Hello, World,’ then you’re a hacker.” You’re also a hacker, in their view, “if you’ve dismantled a new gadget until you understood the entire machinery’s architecture….”
As for machine learning, they define it “[a]t the highest level of abstraction…as a set of tools and methods that attempt to infer patterns and extract insight from a record of the observable world.” In more concrete terms, machine learning “blends concepts and techniques from many different traditional fields, such as mathematics, statistics, and computer science.” At the computer programming level, machine learning is defined as “a toolkit of algorithms that enables computers to train themselves to automate useful tasks.”
Conway and White’s new book, Machine Learning for Hackers, is rich with challenges for experienced programmers who love to crunch data. Its code examples use the R programming language, a “software environment for statistical computing and graphics.” R can be downloaded free for Windows, MacOS, or a variety of UNIX platforms from The R Project for Statistical Computing.
What you don’t get in this book is an R language tutorial. Instead of “Hello, World!” in the introductory chapter, you jump straight into working with a very interesting data set and generating histograms dealing with distributions of UFO sightings.
The authors assume that you have done some programming, and they note that you can find basic R tutorials online or in other books.
With a case-studies approach, each chapter of the 303-page book focuses on a particular problem in machine learning, and the authors show how to analyze sample databases and create simple machine learning algorithms.
The chapters are:
- Using R
- Data Exploration
- Classification: Spam Filtering
- Ranking: Priority Inbox
- Regression: Predicting Page Views
- Regularization: Text Regression
- Optimization: Breaking Codes
- PCA [principal components analysis]: Building a Market Index
- MDS [multidimensional scaling]: Visually Exploring US Senator Similarity
- kNN [the k-nearest neighbors algorithm]: Recommendation Systems
- Analyzing Social Graphs
- Model Comparison
Some of the other projects the authors present include: using linear regression to predict the number of page views for 1,000 top websites; doing statistical comparisons and contrasts of U.S. Senators based on their voting records; and building “a ‘who to follow’ recommendation engine” for Twitter that doesn’t violate Twitter’s terms of service or its API’s “strict rate limit.”
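For readers who have not met linear regression before, here is a minimal sketch of the general idea behind the page-view project. It is not the authors’ code: the book works in R, while this sketch uses plain Python, and the function names (`fit_line`, `predict`) and the visitor and page-view numbers are invented purely for illustration.

```python
# Ordinary least squares with a single predictor, standard library only.
# All data below is hypothetical; it does not come from the book.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept

# Hypothetical sites: unique visitors and page views, both in millions.
visitors = [1.0, 2.0, 3.0, 4.0, 5.0]
page_views = [12.0, 19.0, 32.0, 41.0, 48.0]

slope, intercept = fit_line(visitors, page_views)
estimate = predict(slope, intercept, 6.0)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, estimate at 6.0={estimate:.2f}")
```

The fitted line summarizes how page views tend to rise with visitors in the sample, and `predict` extends that trend to a site that was not in it. Real traffic data is usually far messier than these five points, which is part of what makes the book’s case study worthwhile.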
Conway and White offer some fairly heady and challenging learning experiences for those who would like to work with pattern recognition algorithms and big piles of data.
“The notion of observing data, learning from it, and then automating some process of recognition is at the heart of machine learning,” the authors note, “and forms the primary arc of this book.”
– Si Dunn is a novelist, screenwriter, freelance book reviewer, and former software technical writer and software/hardware QA test specialist. He also is a former newspaper and magazine photojournalist. His latest book is Dark Signals, a Vietnam War memoir. He is the author of an e-book detective novel, Erwin’s Law, now also available in paperback, plus a novella, Jump, and several other books and short stories.