R IN ACTION: Data Analysis and Graphics with R, 2nd Edition – #bookreview

R in Action

Data Analysis and Graphics with R

Robert I. Kabacoff

Manning – paperback

Whether data analysis is your field, your current major or your next career-change ambition, you likely should get this book. Free and open-source R is one of the world’s most popular languages for data analysis and visualization. And Robert I. Kabacoff’s updated new edition is, in my opinion, one of the top books out there for getting a handle on R. (I have used and previously reviewed several R how-to books.)

R is relatively easy to install on Windows, Mac OS X and Linux machines. But it is generally considered difficult to learn. Much of that is because of its rich abundance of features and packages, as well as its ability to create many types of graphs. “The base installation,” Kabacoff writes, “provides hundreds of data-management, statistical, and graphical functions out of the box. But some of its most powerful features come from the thousands of extensions (packages) provided by contributing authors.”

Kabacoff concedes: “It can be hard for new users to get a handle on what R is and what it can do.” And: “Even the most experienced R user is surprised to learn about features they were unaware of.”

R in Action, Second Edition, contains more than 200 pages of new material. And it is nicely structured to meet the needs of R beginners, as well as those of us who have some experience and want to gain more.

The book (579 pages in print format) is divided into five major parts. The first part, “Getting Started,” takes the beginner from installing and trying R to creating data sets, working with graphs, and managing data. Part 2, “Basic Methods,” focuses on graphical and statistical techniques for obtaining basic information about data.

Part 3, “Intermediate Methods,” moves the reader well beyond “describing the relationship between two variables.” It introduces regression, analysis of variance, power analysis, intermediate graphs, and resampling statistics and bootstrapping. Part 4 presents “Advanced Methods,” including generalized linear models, principal components and factor analysis, time series, cluster analysis, classification, and advanced methods for missing data.

Part 5, meanwhile, offers how-to information for “Expanding Your Skills.” The topics include: advanced graphics with ggplot2, advanced programming, creating a package, creating dynamic reports, and developing advanced graphics with the lattice package.

A key strength of R in Action, Second Edition is Kabacoff’s use of generally short code examples to illustrate many of the ways that data can be entered, manipulated, analyzed and displayed in graphical form.

The first thing I did, however, was start at the very back of the book, Appendix G, and upgrade my existing version of R to 3.2.1, “World-Famous Astronaut.” The upgrade instructions could have been a little bit clearer, but after hitting a couple of unmentioned prompts and changing a couple of wrong choices, the process turned out to be quick and smooth.

Then I started reading chapters and keying in some of the code examples. I had not used R much recently, so it was fun again to enter some commands and numbers and have nicely formatted graphs suddenly pop open on the screen.

Even better, it is nice to have a LOT of new things to learn, with a well-written, well-illustrated guidebook in hand.

Si Dunn

 


BIG DATA: A well-written look at principles & best practices of scalable real-time data systems – #bookreview

 

 

Big Data

Principles and best practices of scalable real-time data systems

Nathan Marz, with James Warren

Manning – paperback

Get this book, whether you are new to working with Big Data or now an old hand at dealing with Big Data’s seemingly never-ending (and steadily expanding) complexities.

You may not agree with all that the authors offer or contend in this well-written “theory” text. But Nathan Marz’s Lambda Architecture is well worth serious consideration, especially if you are now trying to come up with more reliable and more efficient approaches to processing and mining Big Data. The writers’ explanations of some of the power, problems, and possibilities of Big Data are among the clearest and best I have read.

“More than 30,000 gigabytes of data are generated every second, and the rate of data creation is only accelerating,” Marz and Warren point out.

Thus, previous “solutions” for working with Big Data are now getting overwhelmed, not only by the sheer volume of information pouring in but by greater system complexities and failures of overworked hardware that now plague many outmoded systems.

The authors have structured their book to show “how to approach building a solution to any Big Data problem. The principles you’ll learn hold true regardless of the tooling in the current landscape, and you can use these principles to rigorously choose what tools are appropriate for your application.” In other words, they write, you will “learn how to fish, not just how to use a particular fishing rod.”

Marz’s Lambda Architecture also is at the heart of Big Data, the book. It is, the two authors explain, “an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to Big Data systems that can be built and run by a small team.”

The Lambda Architecture has three layers: the batch layer, the serving layer, and the speed layer.

Not surprisingly, the book likewise is divided into three parts, each focusing on one of the layers:

  • In Part 1, chapters 4 through 9 deal with various aspects of the batch layer, such as building a batch layer from end to end and implementing an example batch layer.
  • Part 2 has two chapters that zero in on the serving layer. “The serving layer consists of databases that index and serve the results of the batch layer,” the writers explain. “Part 2 is short because databases that don’t require random writes are extraordinarily simple.”
  • In Part 3, chapters 12 through 17 explore and explain the Lambda Architecture’s speed layer, which “compensates for the high latency of the batch layer to enable up-to-date results for queries.”
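To make the three layers a little more concrete, here is a toy sketch in Python. It is not code from the book, and every name in it (master_dataset, build_batch_view, SpeedLayer, query) is hypothetical. It only illustrates the general idea, assuming a simple page-view counter: the batch layer recomputes a view from all stored data, the speed layer tracks recent events, and a query merges the two.

```python
from collections import Counter

# Hypothetical master dataset: an append-only list of page-view events.
master_dataset = [
    {"url": "/home", "ts": 1},
    {"url": "/about", "ts": 2},
    {"url": "/home", "ts": 3},
]


def build_batch_view(events):
    """Batch layer: recompute the view from scratch over all stored events."""
    return Counter(event["url"] for event in events)


class SpeedLayer:
    """Speed layer: incrementally count events the batch view has not absorbed yet."""

    def __init__(self):
        self.realtime_view = Counter()

    def record(self, event):
        self.realtime_view[event["url"]] += 1


def query(batch_view, realtime_view, url):
    """Serving side: merge the precomputed batch view with the realtime view."""
    return batch_view[url] + realtime_view[url]


batch_view = build_batch_view(master_dataset)

# An event arrives after the last batch run; only the speed layer sees it.
speed = SpeedLayer()
speed.record({"url": "/home", "ts": 4})

print(query(batch_view, speed.realtime_view, "/home"))  # 3
```

In a real system, the batch view would be rebuilt periodically and the realtime view trimmed once the batch layer caught up; this sketch leaves that step out.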

Marz and Warren contend that “[t]he benefits of data systems built using the Lambda Architecture go beyond just scaling. Because your system will be able to handle much larger amounts of data, you’ll be able to collect even more data and get more value out of it. Increasing the amount and types of data you store will lead to more opportunities to mine your data, produce analytics, and build new applications.”

This book requires no previous experience with large-scale data analysis, nor with NoSQL tools. However, it helps to be somewhat familiar with traditional databases. Nathan Marz is the creator of Apache Storm and originator of the Lambda Architecture. James Warren is an analytics architect with a background in machine learning and scientific computing.

If you think the Big Data world already is too much with us, just stick around a while. Soon, it may involve almost every aspect of our lives.

Si Dunn

GIT IN PRACTICE: A fine how-to guide, with 66 techniques for greater effectiveness in individual & team settings – #programming #bookreview

 

 

Git in Practice

Mike McQuaid

Manning Books – paperback

 

I have taken Git how-to classes and read several how-to books on the Git distributed version control system. But I don’t use Git every day. Therefore, I tend to forget how to do certain tasks when I once again start bumbling around with my various local and remote Git repositories.

Git in Practice is exactly the book I have been needing at my computer. Git in Practice gives clear how-to steps, plus descriptions of ways to be more efficient and effective with Git in individual and team settings. And the well-written book even provides interesting background on how Git came to be–and be the way that it is.

For Git newcomers (and for those like me who tend to get rusty fairly quickly), the book’s appendices include how to install Git, how to create a GitHub account and repository and how to benefit from the author’s heavily commented Git configuration files. There also is a handy index of Git methods for those times when you think you remember a particular command-line entry but aren’t sure exactly what is supposed to happen and what options, if any, may appear.

It matters not if you are new to Git, or someone who uses Git sporadically, or someone who uses Git daily as part of a software development or software test team. Git in Practice is a fine and useful book to keep within reach.

 

– Si Dunn

Mastering Gamification – A 30-day strategy to enhance customer engagement – #business #bookreview

 

Mastering Gamification

Customer Engagement in 30 Days

Scot Harris and Kevin O’Gorman

(Impackt Publishing – Kindle, paperback)

Gamification is now a popular buzzword in many parts of the business world. This book wisely does not try to cover every angle, but stays focused on one application: “Marketing and sales people are using gamification to improve customer loyalty and engagement, knowing that it will lead to increased profitability,” the authors write.

They emphasize that “gamifying does not mean turning your business or website into a game. As Gamification.org defines it, gamifying is:

‘The presence or addition of game-like characteristics in anything
that has not been traditionally considered a game.’

“Take particular note of the word ‘characteristics’ in this phrase,” the authors point out. “The purpose of gamifying is not to turn something into a game, but to apply understanding and knowledge about the basic human desires we all have that make us like games to a non-gaming environment, and hopefully to improve our businesses.”

 You may not finish all of the exercises, nor follow all of the suggestions in this well-written book. Yet the well-structured, 30-day plan offered by Harris and O’Gorman still can help you think harder about your business, how customers see it and how they engage–or don’t engage–with the products or services you offer.

 Even if you operate a small enterprise where you are the entire staff, this book can offer some good ideas and useful tips that can help you make more sales and keep customers coming back.

 What the authors aim to do is help you create and “launch a long-range, ongoing, continuous process of attracting the attention of a target audience, drawing them into a social space built around you and your products or services, encouraging them to evangelize about your products or services, and instilling in them an unshakable sense of loyalty.”

 In other words, you learn how to use some gamification techniques to get customers’ attention, keep their attention, and keep them coming back for more of whatever you are selling–three major keys to long-term survival and growth in business.

Si Dunn

Mule in Action, 2nd Edition – Want to be an integration developer? Here’s a good start – #bookreview

 

Mule in Action, Second Edition

David Dossot, John D’Emic, Victor Romero

(Manning – paperback)

 

An enterprise service bus (ESB) can help you link together many different types of platforms and applications–old and new–and keep them communicating and passing data between each other.

“Mule,” this book’s authors note, “is a lightweight, event-driven enterprise service bus and an integration platform and broker. As such, it resembles more a rich and diverse toolbox than a shrink-wrapped application.”

Mule in Action, Second Edition, is a comprehensive and generally well-written overview of Mule 3 and how to put its open-source building blocks together to create integration solutions and develop them with Mule. The book provides very good focus on sending, receiving, routing, and transforming data, key aspects of an ESB.

More attention, however, could have been paid to clarity and detail in Chapter 1, the all-important chapter that helps Mule newcomers get started and enthused.

This second edition is a recent update of the 2009 first edition. Unfortunately, the Mule screens have changed a bit since the book’s screen shots were created for the new edition. Therefore, some of the how-to instructions and screen images do not match what the user now sees. This gets particularly confusing while trying to learn how to configure a JMS outbound endpoint for the first time, using Mule Studio’s graphical editor. The instructions seem insufficient, and the mismatch of screens can leave a beginner unsure how to proceed.

The same goes for configuring the message setting in the Logger element. The text instructs: “You’ll set the message attribute to print a String followed by the payload of the message, using the Mule Expression Language.” But no example is given. Fortunately, a reviewer on Amazon has posted a correct procedure. In his view, the message attribute should be: We received a message: #[message.payload] –without any quote marks around it. (It works.)

Of course, this book is not really aimed at beginners–it’s for developers, architects, and managers (even though there will be Mule “beginners” in those ranks). Fortunately, it soon moves away from relying solely on Mule Studio’s graphical editor. The book’s examples, as the authors note, “mostly focus on the XML configurations of flows.” Thus, there are many XML code examples to work with, plus occasional screen shots of the flows as they appear in Mule Studio. And you can use other IDEs to work with the XML, if you prefer.

Indeed, the authors note, “no functionality in the CE version of Mule is dependent on Mule Studio.”

Overall, this is a very good book, and it definitely covers a lot of ground, from “discovering” Mule to becoming a Mule developer of integration applications, and using certain tools (such as business process management systems) to augment the applications you develop. I just wish a little more how-to clarity had been delivered in Chapter 1.

Si Dunn

Software Requirements, Third Edition – A major, long-needed update of a classic book – #software #business #bookreview

Software Requirements, Third Edition

Karl Wiegers and Joy Beatty
(Microsoft Press – paperback, Kindle)

A lot changes in 10 years, particularly in the world of software development. The previous edition of this book appeared in 2003, and I never knew about it while I struggled over software requirements documents and user manuals as a technical writer for several big and small companies.

In those days, pulling information out of software engineers was on par with pulling their wisdom teeth using needle-nosed pliers. And management seldom was helpful. Sometimes, I would be sitting at my desk, working on some project, and a high-level delegation suddenly would arrive.

“We are releasing a new software update tomorrow,” the delegation leader would announce. “And we need some documentation written. Here is the latest requirements document. We need for you to expand it into a release document. Oh, and some kind of user manual.”

Fortunately and unfortunately, the software release almost always slipped from tomorrow to the next week and then to the next month as bugs emerged during final testing. While the customer grumbled or screamed, I had time to produce new documents from the software requirements, plus interviews with any engineer I could grab and threaten to name in the materials that I would send out to customers.

It was all seat-of-the-pants stuff. Now, after retiring several years ago, I can only wish I had had this well-written “best practices” guide to creating, managing, and making best use of software requirements documents.

Software Requirements, Third Edition covers a lot of ground in its 637 (print-edition) pages. The 32 chapters are organized into five major parts:

  • Part I – Software Requirements: What, Why, and Who
  • Part II – Requirements Development
  • Part III – Requirements for Specific Project Classes
  • Part IV – Requirements Management
  • Part V – Implementing Requirements Engineering

The book’s two authors, each an expert in software requirements development, emphasize that a software requirements document can be a shining beacon of guidance and clarity or a confusing array of ill-defined features and functions–or it can be something that hovers perilously between good and bad.

The writers emphasize: “Many problems in the software world arise from shortcomings in the ways that people learn about, document, agree upon and modify the product’s requirements….[C]ommon problem areas are information gathering, implied functionality, miscommunicated assumptions, poorly specified requirements, and a casual change process. Various studies suggest that errors introduced during requirements activities account for 40 to 50 percent of all defects found in a software product….Inadequate user input and shortcomings in specifying and managing customer requirements are major contributors to unsuccessful projects. Despite this evidence,” they warn, “many organizations still practice ineffective requirements methods.”

Indeed, they add: “Nowhere more than in the requirements do the interests of all the stakeholders in a project intersect….These stakeholders include customers, users, business analysts, developers, and many others. Handled well, this intersection can lead to delighted customers and fulfilled developers. Handled poorly, it is the source of misunderstanding and friction that undermine the product’s quality and business value.”

The intended primary readership for the book includes “business analysts and requirements engineers, along with software architects, developers, project managers, and other stakeholders.”

In my view, Software Requirements, Third Edition should be read by an even bigger audience. This includes anyone who works in software development, anyone who manages software developers, anyone who sells software development services, plus other key personnel in companies that create, sell, or buy specialized or customized software products or services. The buyer must understand the software requirements process just as keenly as the seller. Otherwise, the software development company may try to hide behind certain jargon or definitions or introduce new processes or changes previously undefined as a delaying tactic, particularly if it has fallen behind schedule or otherwise is failing to deliver what it has promised.

A well-structured, well-worded, well-managed requirements document can help save time, money and, most importantly, the reputations of the companies and people on all sides of a software project. This important, newly updated book shows exactly how such documents can be created, managed, and maintained.

Si Dunn

Data Science for Business – A serious guide for those who need to know – #bigdata #bookreview

Data Science for Business

What You Need to Know about Data Mining and Data-Analytic Thinking
Foster Provost and Tom Fawcett
(O’Reilly – paperback, Kindle)

This is not an introductory text for casual readers curious about the hoopla over data science and Big Data.

And you definitely won’t find code here for simple screen scrapers written in Python 2.7 or programs that access the Twitter API to scoop up messages containing certain hashtags.

Data Science for Business is based on an MBA course Foster Provost teaches at New York University, and it is aimed at three specific, serious audiences:

  • “Aspiring data scientists”
  • “Developers who will be implementing data science solutions…”
  • “Business people who will be working with data scientists, managing data science-oriented projects, or investing in data science ventures….”

Provost and Fawcett’s book “concentrates on the fundamentals of data science and data mining,” the two authors state. But it specifically avoids “an algorithm-centered approach” and instead focuses on “a relatively small set of fundamental concepts or principles that underlie techniques for extracting useful knowledge from data. These concepts serve as the foundation for many well-known algorithms of data mining,” the authors note.

“Moreover, these concepts underlie the analysis of data-centered business problems, the creation and evaluation of data science solutions, and the evaluation of general data science strategies and proposals.”

The book is well-written and adequately illustrated with charts, diagrams, mathematical equations and mathematical examples. And the text, while technical and dense in some places, is organized into short sections. Most of the chapters end with insightful summaries that help the lessons stick.

Both authors are veterans in the use of data science in business. Their new book includes two helpful appendices. One shows how to “assess potential data mining projects” and “uncover potential flaws in proposals.” The second appendix presents a sample proposal and discusses its flaws.

“If you are a business stakeholder rather than a data scientist,” the authors caution, “don’t let so-called data scientists bamboozle you with jargon: the concepts of this book plus knowledge of your own business and data systems should allow you to understand 80% or more of the data science at a reasonable enough level to be productive for your business.”

They also challenge data scientists to “think deeply about why your work is relevant to helping the business and be able to present it as such.”

Si Dunn