Hadoop is hot! Three new how-to books for riding the Big Data elephant – #programming #bookreview

In the world of Big Data, Hadoop has become the hard-charging elephant in the room.

Its big-name users now span the alphabet and include such notables as Amazon, eBay, Facebook, Google, the New York Times, and Yahoo. Not bad for software named after a child’s toy elephant.

Computer systems that run Hadoop can store, process, and analyze large amounts of data that have been gathered up in many different formats from many different sources.

According to the Apache Software Foundation’s Hadoop website: “The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.”

The (well-trained) user defines the Big Data problem that Hadoop will tackle. Then the software handles all aspects of completing the job, including splitting the problem into small pieces and spreading them across many different computers, or nodes, in the distributed system for more efficient processing. Hadoop also handles individual node failures, and collects and combines the calculated results from each node.

But you don’t need a collection of hundreds or thousands of computers to run Hadoop. You can learn it, write programs, and do some testing and debugging on a single Linux machine, Windows PC, or Mac. The open-source software can be downloaded here. (Do some research first. You may have to use web searches to find detailed installation instructions for your specific system.)

Hadoop is open-source software that is often described as “a Java-based framework for large-scale data processing.” It has a lengthy learning curve that includes getting familiar with Java, if you don’t already know it.

But if you are now ready and eager to take on Hadoop, Packt Publishing recently has unveiled three excellent how-to books that can help you begin and extend your mastery: Hadoop Beginner’s Guide, Hadoop MapReduce Cookbook, and Hadoop Real-World Solutions Cookbook.

Short reviews of each are presented below.

Hadoop Beginner’s Guide
Garry Turkington
(Packt Publishing – paperback, Kindle)

Garry Turkington’s new book is a detailed, well-structured introduction to Hadoop. It covers everything from the software’s three modes–local standalone mode, pseudo-distributed mode, and fully distributed mode–to running basic jobs, developing simple and advanced MapReduce programs, maintaining clusters of computers, and working with Hive, MySQL, and other tools.

“The developer focuses on expressing the transformation between source and result data sets, and the Hadoop framework manages all aspects of job execution, parallelization, and coordination,” the author writes.

He calls this capability “possibly the most important aspect of Hadoop. The platform takes responsibility for every aspect of executing the processing across the data. After the user defines the key criteria for the job, everything else becomes the responsibility of the system.”

The 374-page book is written well and provides numerous code samples and illustrations. But it has one drawback for some beginners who want to install and use Hadoop. Turkington offers step-by-step instructions for how to perform a Linux installation, specifically Ubuntu. However, he refers Windows and Mac users to an Apache site where there is insufficient how-to information. Web searches become necessary to find more installation details.

Hadoop MapReduce Cookbook
Srinath Perera and Thilina Gunarathne
(Packt Publishing – paperback, Kindle)

MapReduce “jobs” are an essential part of how Hadoop is able to crunch huge chunks of Big Data. The Hadoop MapReduce Cookbook offers “recipes for analyzing large and complex data sets with Hadoop MapReduce.”

MapReduce is a well-known programming model for processing large sets of data. Typically, MapReduce is used within clusters of computers that are configured to perform distributed computing.

In the “Map” portion of the process, a problem is split into many subtasks that are then assigned by a master computer to individual computers known as nodes. (Nodes also can have sub-nodes.) During the “Reduce” part of the task, the master computer gathers up the processed data from the nodes, combines it, and outputs an answer to the problem that was posed. (MapReduce libraries are now available for many different programming languages; Hadoop is a Java-based implementation.)
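The map-then-reduce flow described above can be sketched on a single machine. This is only an illustration in plain Python, not Hadoop code and not taken from the book: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example, and each string in the input list stands in for a piece of data that a real cluster would hand to a separate node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk stands in for input that a separate node would process.
    pairs = []
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big elephant", "elephant in the room"]))
```

In a real Hadoop job, the framework itself does the splitting, grouping, and collecting; the developer mainly supplies the map and reduce logic.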

“Hadoop is the most widely known and widely used implementation of the MapReduce paradigm,” the two authors note.

Their 284-page book initially shows how to run Hadoop in local mode, which “does not start any servers but does all the work within the same JVM [Java Virtual Machine]” on a standalone computer. Then, as you gain more experience with MapReduce and the Hadoop Distributed File System (HDFS), they guide you into using Hadoop in more complex, distributed-computing environments.

Echoing the Hadoop Beginner’s Guide, the authors explain how to install Hadoop on Linux machines only.

Hadoop Real-World Solutions Cookbook
Jonathan R. Owens, Jon Lentz and Brian Femiano
(Packt Publishing – paperback, Kindle)

The Hadoop Real-World Solutions Cookbook assumes you already have some experience with Hadoop. So it jumps straight into helping “developers become more comfortable with, and proficient at solving problems in, the Hadoop space.”

Its goal is to “teach readers how to build solutions using tools such as Apache Hive, Pig, MapReduce, Mahout, Giraph, HDFS, Accumulo, Redis, and Ganglia.”

The 299-page book is packed with code examples and short explanations that help solve specific types of problems. A few randomly selected problem headings:

  • “Using Apache Pig to filter bot traffic from web server logs.”
  • “Using the distributed cache in MapReduce.”
  • “Trim Outliers from the Audioscrobbler dataset using Pig and datafu.” 
  • “Designing a row key to store geographic events in Accumulo.”
  • “Enabling MapReduce jobs to skip bad records.”

The authors use a simple but effective strategy for presenting problems and solutions. First, the problem is clearly described. Then, under a “Getting Ready” heading, they spell out what you need to solve the problem. That is followed by a “How to do it…” heading where each step is presented and supported by code examples. Then, paragraphs beneath a “How it works…” heading sum up and explain how the problem was solved. Finally, a “There’s more…” heading highlights more explanations and links to additional details.

If you are a Hadoop beginner, consider the first two books reviewed above. If you have some Hadoop experience, you likely can find some useful tips in book number three.

Si Dunn

Programming C# 5.0 – Excellent how-to guide for experienced developers ready to learn C# – #bookreview

Programming C# 5.0
Ian Griffiths
(O’Reilly, paperback, Kindle)

Ian Griffiths’ new book is for “experienced developers,” not for beginners hoping to learn the basics of programming while also learning C#. The focus is “Building Windows 8, Web, and Desktop Applications for the .NET 4.5 Framework.”

Earlier editions in the Programming C# series have “explained some basic concepts such as classes, polymorphism, and collections,” Griffiths notes. But C# also keeps growing in power and size, which means the page counts of its how-to manuals must keep growing, too, to cover “everything.”

The paperback version of Programming C# 5.0 weighs in at 861 pages and more than three pounds. So Griffiths’ choice to sharpen the book’s focus is a smart one. Beginners can learn the basics of programming in other books and other ways before digging into this edition. And experienced developers will find that the author’s explanations and code examples now have space to go “into rather more detail” than would have been possible if chapters explaining the basics of programming had been packed in, as well.

If you have done some programming and know a class from an array, this book can be your well-structured guide to learning C#. The “basics” are gone, but you still are shown how to create a “Hello World” program—primarily so you can see how new C# projects are created in Visual Studio, Microsoft’s development environment.

C# has been around since 2000 and “can be used for many kinds of applications, including websites, desktop applications, games, phone apps, and command-line utilities,” Griffiths says.

“The most significant new feature in C# 5.0,” he emphasizes, “is support for asynchronous programming.” He notes that “.NET has always offered asynchronous APIs (i.e., ones that do not wait for the operation they perform to finish before returning). Asynchrony is particularly important with input/output (I/O) operations, which can take a long time and often don’t require any active involvement from the CPU except at the start and end of an operation. Simple, synchronous APIs that do not return until the operation completes can be inefficient. They tie up a thread while waiting, which can cause suboptimal performance in servers, and they’re also unhelpful in client-side code, where they can make a user interface unresponsive.”

In the past, however, “the more efficient and flexible asynchronous APIs” have been “considerably harder to use than their synchronous counterparts. But now,” Griffiths points out, “if an asynchronous API conforms to a certain pattern, you can write C# code that looks almost as simple as the synchronous alternative would.”
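The general idea of asynchronous code that reads almost like synchronous code can be sketched outside C#. The example below is Python, not C#, and is not from Griffiths’ book; it uses Python’s own async and await keywords with the standard asyncio module. The fake_io function is a hypothetical stand-in for a slow I/O call, and the 0.2-second delays are arbitrary.

```python
import asyncio
import time


async def fake_io(name, seconds):
    """Stand-in for a slow I/O call; awaiting it frees the thread for other work."""
    await asyncio.sleep(seconds)
    return f"{name} done"


async def main():
    start = time.perf_counter()
    # Both operations wait at the same time instead of one after the other.
    results = await asyncio.gather(fake_io("read", 0.2), fake_io("write", 0.2))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results, round(elapsed, 1))
```

Because neither call blocks the thread while it waits, the two overlap, and the total time is close to that of one operation rather than the sum of both.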

If you are an experienced programmer hoping to add C# to your language skills, Ian Griffiths’ new book covers much of what you need to know, including how to use XAML (pronounced “zammel”) “to create applications of the [touch-screen] style introduced by Windows 8” but also applications for desktop computers and Windows Phone.

Yes, Microsoft created C#, but there are other ways to run it, too, Griffiths adds.

“The open source Mono project (http://www.mono-project.com/) provides tools for building C# applications that run on Linux, Mac OS X, iOS, and Android.”

Si Dunn

For more information:  paperback – Kindle

Switching to the Mac, Mountain Lion Edition – David Pogue scores again – #bookreview

Switching to the Mac: Mountain Lion Edition
David Pogue
(O’Reilly, paperback, Kindle)

David Pogue will have to pry Windows PCs out of my cold, dead fingers.

That being said, his new book makes a very compelling case for why you other Windows users should switch from PCs to Macs right away.

As I’ve previously noted, I use three battle-scarred Windows PCs during a typical work day. Yet sometimes (don’t ask why), I am forced – forced, I tell you – to use my wife’s Macintosh, too.

Frankly, I have hated Macs for a long, long time. No, actually, I have hated the smug, “Everything’s milk and honey on a Mac!” attitude that peppy-preppy Mac users (my wife excluded) seem to radiate each time they get around us gray-haired Windows types.

I happen to think the Blue Screen of Death is a lovely work of art, easily on par with Thomas Gainsborough’s The Blue Boy and Edvard Munch’s The Scream, thank you very much. And what is life without the daily excitement of battling evil spyware and sinister viruses from Eastern Europe?

Seriously, I continue to be a huge fan of New York Times tech columnist David Pogue and “The Missing Manual” book series he created. I use several of O’Reilly’s “Missing” manuals on a regular basis.

His new book has convinced me that, okay, maybe it finally might be time to replace one of my combat-scarred PCs with a shiny new Mac. Then I, too, can radiate some of that lustrous “Everything’s sunshine and bunnies!” glow instead of merely gnashing my teeth at the need to download a new patch or service pack.

“OS X has a spectacular reputation for stability and security,” Pogue assures readers. “At this writing, there hasn’t been a single widespread OS X virus—a spectacular feature that makes Windows look like a waste of time.” (David, David, David. “Waste of time”? Tsk, tsk.)

If you are contemplating making the switch or have already switched from Windows to Mac – one that’s running OS X (Mountain Lion) – you need this book. It is well written and nicely illustrated, and it has a strong focus on helping Windows users feel comfortably at home on a new Mac.

“Be glad you waited so long to get a Mac,” Pogue writes in a chapter titled “Special Software, Special Problems.”

“By now, all the big-name programs look and work almost exactly the same on the Mac as they do on the PC.”

You will encounter situations where a favorite Windows program is not available in a Mac equivalent. But there usually are Mac equivalents that offer similar functions. Or, you often can run Windows programs on an OS X Mac in Windows format, Pogue points out.

He also shows how to transfer documents and other files from Windows machines to Macs. Usually, the transfers go smoothly. “It turns out that communicating with a Windows PC is one of the Mac’s most polished talents,” Pogue notes. Sometimes, there are problems, of course, even in “infallible” Mac Land. But Pogue’s huge book (743 pages) gives clear procedures or suggestions for dealing with most of them. And: “Most big-name programs are sold in both Mac and Windows flavors, and the documents they create are freely interchangeable.”

Switching to the Mac: Mountain Lion Edition is organized into five parts:

  • Part 1, Welcome to the Macintosh – Covers the differences between what you see on a Macintosh screen and a Windows screen. Pogue notes that “OS X offers roughly the same features as Windows. That’s the good news. The bad news is that these features are called different things and parked in different spots.”
  • Part 2, Making the Move – Covers how to move software, data and peripherals such as printers and scanners from a Windows PC to a Mac. Includes steps for running Windows on Macs, using Apple Boot Camp. “The only downsides: Your laptop battery life isn’t as good, and you have to restart the Mac again to return to the familiar world of OS X.”
  • Part 3, Making Connections – Shows how to set up web, iCloud, and email connections on a Mac and use Apple’s Internet software suite.
  • Part 4, Putting Down Roots – Covers user accounts, parental controls, security, networking, file sharing, screen sharing, system preferences, and OS X’s “freebie” programs, such as Calendar, Photo Booth, and QuickTime Player.
  • Part 5…(Hello? Why is Part 5 missing from the table of contents and the pages of the printed version?)
  • Part 6, Appendixes – Two of the four appendixes cover installing OS X Mountain Lion and troubleshooting. The third appendix is “The Windows-to-Mac Dictionary,” especially useful for Windows people who have to use a Macintosh once in a while. “It’s an alphabetical listing of every common Windows function and where to find it in OS X,” Pogue says. And the fourth appendix offers a “master keyboard-shortcut list for the entire Mac OS X universe.”

Switching to the Mac, Mountain Lion Edition offers sound reasons (1) why you may prefer to stick with certain Windows for Mac programs on your new Mac and (2) why you may want to abandon certain Windows programs written for Macs and learn to use the Mac programs that are, in Pogue’s estimation, “better.”

You won’t be alone if you become (as I likely will) a user who moves back and forth between Mac world and Windows world, for a long time if not “forever.” In that case, you’ll definitely want Switching to the Mac: Mountain Lion Edition on your reference shelf.

Version Control with Git, 2nd Ed. – Bring order to software development’s collaborative chaos – #bookreview #programming

Version Control with Git, 2nd Edition
Jon Loeliger and Matthew McCullough
(O’Reilly, paperback, Kindle)

When I first took a job in software development, individual programmers controlled code versions themselves, and they jealously guarded their releases with back-ups on multiple diskettes – 5.25” diskettes. The real floppies. (Yep, I’m so old I actually worked with a few 8-inch floppies, too.)

It’s a different world now. Code for one project often is developed, modified, tested and controlled by groups of people, sometimes big groups. And many of those who work with the project’s code are scattered all over the planet.

Thus, maintaining version control and keeping good backups are major management challenges for software developers today. There’s no more going home after work with 10 big floppies in your briefcase as a hedge against your office burning down overnight.

Git is a popular, if somewhat difficult, tool for tracking, branching, merging, and managing code revisions. The authors of Version Control with Git favor the term “version control system (VCS)” for this and other software packages that perform similar functions. (“Source code manager (SCM)” is another popular label.)

In their updated and expanded 2nd edition, here is how they sum up the imperative for strong version control:

“No cautious, creative person starts a project nowadays without a back-up strategy. Because data is ephemeral and can be lost easily – through an errant code change or catastrophic disk crash, say – it is wise to maintain a living archive of all work. For text and code projects, the back-up strategy typically includes version control, or tracking and managing revisions. Each developer can make several revisions per day, and the ever-increasing corpus serves simultaneously as repository, project narrative, communication medium, and team and project management tool. Given its pivotal role, version control is most effective when tailored to the working habits and goals of the project team.”

Whether you do or do not yet have experience with a version control system, you can glean important information and numerous useful tips from this book’s 21 chapters and 434 pages. Version Control with Git covers a lot of vital ground in a well-organized how-to fashion, with plenty of code samples and related illustrations.

One example out of its many key lessons: “As the developer of content for a project using Git, you should create your own private copy, or clone, of the repository to do your development. This development repository should serve as your own work area where you can make changes without fear of colliding with, interrupting, or otherwise interfering with another developer.”

In another key lesson, they show how to use git stash, “the mechanism for capturing your work in progress, allowing you to save it and return to it later when convenient….the stash is a quick convenience mechanism that allows a complete and thorough capturing of your index and working directory in one simple command. It leaves your repository clean, uncluttered, and ready for an alternate development direction. Another single command restores that index and working directory state completely, allowing you to resume where you left off.”

In a software development environment where everything is a crisis and priorities change hourly on what should have been finished yesterday, git stash save and git stash pop may become two of your favorite commands.

The book describes installing versions of Git for Linux and Microsoft Windows, and for running within Cygwin. It also can be run on Mac OS X and Solaris systems. Meanwhile, most of the book’s chapters focus on using the Git command line tool. But the new 2nd edition also devotes a chapter to what many Git users consider the most vital tool that has emerged from the big online community that now surrounds Git: GitHub.com.

Developers often clone a repository from GitHub. Several types of public and private repositories also can be created there. And so-called “social coding” is available. Indeed, many open source projects are hosted on GitHub, and some of them attract people who simply watch the coding, while others do coding in personal “forks” that may or may not prove helpful to those more officially involved in the project. Yet another popular use of GitHub is finding useful code examples in particular programming languages.

Whether Git is in your working future or it’s already here, or if you’re still wondering if it can help you, definitely check out Version Control with Git.

Si Dunn

Build Awesome Command-Line Applications in Ruby – #programming #bookreview

Build Awesome Command-Line Applications in Ruby
David Bryant Copeland
(Pragmatic Bookshelf, paperback)

The word “awesome” now is grossly overused in contemporary culture. And I hate it in book titles.

That being said, Build Awesome Command-Line Applications in Ruby is an excellent how-to guide, particularly if you have a little bit of UNIX and some basic Ruby programming in your background.

The book is “aimed at both developers and system administrators who have some familiarity with Ruby and who find themselves automating things on the command line (or wish they could),” David Bryant Copeland writes. And he adds: “Writing command-line apps in Ruby is also a great way to really learn Ruby and become a better programmer, since you can apply it directly to your day-to-day tasks.”

Mac and Linux users will have the easiest time with this book’s code examples. Things get a little bit more complicated for Windows users, especially those with no UNIX experience and not much programming background, either. The author, fortunately, lays out some workarounds.

For example, on UNIX systems, the first line of code commonly is called the shebang. In a piece of Ruby code, the shebang might look something like this: #!/usr/bin/ruby. (That example tells where the Ruby interpreter is installed.) But, at a Windows command prompt, if Ruby has been installed correctly and is in the path, the # character simply will be interpreted as the start of a comment line, and the rest of the shebang will be ignored when code is run directly, such as: ruby hello_world.rb.

In this book, David Bryant Copeland’s focus definitely is code. “There is a lot of code,” he says, “and we’ll do our best to take each new bit of it step by step.” As the book progresses, two command-line applications are developed, enhanced, and improved. One is a database-backup app, and the other is a command suite, “an app that provides a set of commands, each representing a different function of a related concept.”
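To make the “command suite” idea concrete, here is a small sketch. It is written in Python rather than Ruby and is not code from Copeland’s book; the “todo” program name and its “add” and “list” commands are hypothetical, chosen only to show one app exposing several related commands. It uses the standard argparse module.

```python
import argparse


def build_parser():
    # "todo", "add", and "list" are made-up names for illustration only.
    parser = argparse.ArgumentParser(prog="todo", description="Hypothetical command suite")
    commands = parser.add_subparsers(dest="command", required=True)

    add = commands.add_parser("add", help="add a task")
    add.add_argument("text")

    commands.add_parser("list", help="list tasks")
    return parser


def run(argv, tasks):
    """Dispatch one command line against an in-memory task list."""
    args = build_parser().parse_args(argv)
    if args.command == "add":
        tasks.append(args.text)
        return f"added: {args.text}"
    return "\n".join(tasks)


if __name__ == "__main__":
    tasks = []
    print(run(["add", "back up database"], tasks))
    print(run(["list"], tasks))
```

Each subcommand gets its own arguments and help text, which is the same shape of interface that tools such as git present on the command line.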

This is not a Ruby primer, so get some experience in that language first before tackling this book. But if you are now reasonably comfortable with Ruby coding on a graphical user interface (GUI) and want some new challenges, consider moving to the command line and use this excellent book as your guide.

The requirements are minimal: a free Ruby download and a text editor or a UNIX-like shell. But the payoff is very good.

In his 10 chapters, the author discusses and illustrates “every detail of command-line application development, from user input, program output, and code organization to code handling, testing, and distribution” while the two example applications are created, tested, and enhanced.

There is plenty to learn, and Build Awesome Command-Line Applications in Ruby does a fine job of leading you through the process in short-chapter steps.

Si Dunn

Learn the Kinect API – New Microsoft ‘Start Here!’ guide shows how – #bookreview

Learn the Kinect™ API
Rob Miles
(Microsoft Press, paperback, Kindle)

The Kinect sensor is a popular peripheral for Microsoft’s Xbox 360 video game systems and Windows PCs. The device contains a video camera, a directional microphone system, and a depth sensor.

Software developers are using the device “to advance the field of computer interaction in all kinds of exciting ways,” the author notes. “It is now possible to create programs that use the Kinect sensor to create a computer interface with the ability to recognize users and understand their intentions using a ‘natural’ user interface consisting of gestures and spoken commands. In addition, the device’s capabilities have a huge range of possible applications, from burglar alarms to robot controllers.”

If you want to learn how to program with the Kinect application programming interface (API), this new book in the popular Microsoft “Start Here!” series can get you moving along the right path toward becoming a developer.

But there are three key assumptions that may slow your start. You are expected to “have a reasonable understanding of .NET development using the C# programming language.” And: “You should be familiar with the Visual Studio 2010 development environment and object-oriented programming development.”

Also, “if you are a C++ developer who wishes to learn how to interact with the Kinect sensor from unmanaged C++ programs, you will find that the code samples supplied will not [emphasis added] provide this information.” All of the code samples are written in C#.

Rob Miles, a programming professor at the United Kingdom’s University of Hull, has organized his well-written, 250-page book into four parts:

  • Part I: Getting Started – Provides an overview of the Kinect and how to hook it up and get it working with your PC.
  • Part II: Using the Kinect Sensor – Covers sensor initialization and introduces each of Kinect’s data sources – video, depth, and sound – and how to use them in programs.
  • Part III: Creating Advanced User Interfaces – Illustrates how the Kinect SDK performs body tracking and how programs can use this information. Also shows how Kinect data can be combined to create augmented-reality applications.
  • Part IV: Kinect in the Real World – Focuses on how the Kinect can interact with external devices, such as MIDI devices and robots.

Learn the Kinect™ API offers several ideas for how you can use the Kinect’s video, sound, and depth-response capabilities in your own programs. One example is using the Kinect’s directional microphone feature so that a spoken password “only works when you say it in one part of [a] room, or you could have different [spoken] passwords for different parts of the room,” Miles points out.

It’s a bit of understatement to say that Rob Miles enjoys working with the Kinect device. “I’ve had,” he writes, “more wow moments with this little sensor bar than I’ve had with much more expensive toys that I’ve played with over time.”

Si Dunn

Understanding IPv6, 3rd Edition – Welcome to the new, improved & BIGGER Internet – #bookreview #microsoft #windows

Understanding IPv6, 3rd Edition
Joseph Davies
(Microsoft Press, paperback, list price $49.99; Kindle edition, list price $39.99)

The Internet can now expand into a much bigger realm than was possible before the worldwide launch of IPv6 (Internet Protocol version 6) on June 6, 2012.

The web most of us use has long relied on IPv4, the circa-1981 Internet Protocol built around 32-bit addresses. This scheme can accommodate approximately 4.3 billion unique addresses worldwide. On a planet where (1) the population now has surpassed 7 billion and (2) many of us now have multiple devices connected to the Web, Internet Protocol version 4 recently has been in dire danger of running out of unique addresses.

IPv6 will fix that problem and offer several important new enhancements, as long as we don’t find ways to expand the Internet to parallel universes or to the people on a few trillion distant planets. IPv6 uses a 128-bit addressing scheme that can accommodate more than 340 trillion trillion trillion unique addresses. So go ahead. Get online with that second iPad, third smart phone or fourth laptop.
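The address counts above follow directly from the number of bits in each scheme. Here is a quick check in Python, offered as an illustration rather than anything from Davies’ book; the variable names are arbitrary, and the standard ipaddress module is used only to confirm the arithmetic.

```python
import ipaddress

IPV4_BITS = 32
IPV6_BITS = 128

# Every extra bit doubles the number of possible addresses.
ipv4_total = 2 ** IPV4_BITS
ipv6_total = 2 ** IPV6_BITS

# The standard library reports the same sizes for the "all addresses" networks.
assert ipaddress.ip_network("0.0.0.0/0").num_addresses == ipv4_total
assert ipaddress.ip_network("::/0").num_addresses == ipv6_total

print(f"IPv4: {ipv4_total:,} addresses")
print(f"IPv6: {ipv6_total:.3e} addresses")
```

The IPv4 total is 4,294,967,296, which is the “approximately 4.3 billion” figure. The IPv6 total is roughly 3.4 followed by 38 digits, or about 340 trillion trillion trillion.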

IPv4 and IPv6 are now running in a dual stack that supports both addressing schemes. The transition from IPv4 to IPv6 is not seamless, however. A lot of work remains to be done by major Internet service providers (ISPs), web companies, hardware manufacturers, network equipment providers and many others to enable IPv6 on their products and services.

Joseph Davies, author of Understanding IPv6, has been writing about IPv6 since 1999. His new 674-page third edition provides both a detailed overview of IPv6 and a detailed focus on how to implement it, within a limited range of Windows products.

“There are,” he notes, “different versions of the Microsoft IPv6 protocol for Windows….I have chosen to confine the discussion to the IPv6 implementation in Windows Server 2012, Windows Server 2008 R2, Windows Server 2008, Windows 8, Windows 7, and Windows Vista.”

This well-written and well-organized book is not for beginners. Its intended audience includes:

  • Windows networking consultants and planners
  • Microsoft Windows network administrators
  • Microsoft Certified Systems Engineers (MCSEs) and Microsoft Certified Trainers (MCTs)
  • General technical staff
  • Information technology students

Davies and Microsoft offer downloadable companion content for this book: Microsoft Network Monitor 3.4 (a network sniffer for capturing and viewing frames); and PowerPoint 2007 training slides that can be used along with the book to teach IPv6.

If you need a guide to best practices for using IPv6 in a Windows network, definitely consider getting Understanding IPv6, 3rd Edition.

Si Dunn