MapReduce Design Patterns – For solving Big Data problems – #bookreview #programming #hadoop

MapReduce Design Patterns
Donald Miner and Adam Shook
(O’Reilly –
paperback, Kindle)

“MapReduce is a computing paradigm for processing data that resides on hundreds of computers,” the authors point out. It has been “popularized recently by Google, Hadoop, and many others.”

The MapReduce paradigm is “extraordinarily powerful, but does not provide a general solution to what many are calling ‘big data,” they add, “so while it works particularly well on some problems, some are more challenging.” The authors’ focus in their new book is on using MapReduce design patterns as “templates or general guides to solving problems.”

Their new book definitely can help solve some time-crunch problems for new MapReduce developers. It brings together a variety of solutions that have emerged over time in a patchwork of blogs, books, and research papers and explains them in detail, with code samples, illustrations, and cautions about potential pitfalls.

You can’t simply cut and paste solutions from the chapters. But the two writers do “hope that you will find a pattern to get you at least 90% of the way for just about all of your challenges.”

Six of the book’s eight chapters focus on specific types of design patterns:

  • Summarization Patterns
  • Filtering Patterns
  • Data Organization Patterns
  • Join Patterns
  • Metapatterns
  • Input and Output Patterns

“The MapReduce world is in a state similar to the object-oriented world before 1994,” the authors point out. “Patterns today are scattered across blogs, websites such as StackOverflow, deep inside other books, and inside very advanced technology teams at organizations across the world.”

They add that “[t]he intent of this book is not to provide some groundbreaking new ways to solve problems with MapReduce….” but to offer, instead, a collection of “patterns that have been developed by veterans in the field so they can be shared with everyone else.”

The book’s code samples are all written in Hadoop, and the two writers deal with the question of “why should we use Java MapReduce in Hadoop at all when we have options like Pig and Hive,” which reduce the need for MapReduce patterns.

There is “conceptual value,” they state, “in understanding the lower level workings of a system like MapReduce.” Furthermore, “Using Pig or Hive without understanding MapReduce can lead to some dangerous situations.” And, Pig and Hive cannot yet “tackle all of the problems in the ways that Java MapReduce can. This will surely change over time….”

If you are new to MapReduce, this useful and informative book from Donald Miner and Adam Shook can be the next best thing to having MapReduce experts at your side.

MapReduce Design Patterns can save you time and effort, steer you away from dead ends, and help give you solid understandings of the powerful MapReduce paradigm.

Si Dunn

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s