Programming MapReduce with Scalding – Using Hadoop & Scala to do some Big Data – #programming #bookreview

Programming MapReduce with Scalding

Programming MapReduce with Scalding

A practical guide to designing, testing, and implementing complex MapReduce applications in Scala

Antonios Chalkiopoulos

(Packt Publishing – paperback, Kindle)

 

Antonio Chalkiopoulos’s new book has three key goals, and it meets each of them in good, readable fashion.

It describes how MapReduce, Hadoop, and Scalding can work together. It suggests some useful design patterns and idioms. And, it provides numerous code examples of “real implementations for common use cases.”

The book also briefly introduces the Scala programming language and the Cascading platform, two elements vital to the Scalding framework.

Right here, a few brief definitions need to be offered.

According to a Wikipedia definition, MapReduce is both a programming model and “an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.”

Meanwhile, the Apache Hadoop website states: “Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.”

And: “The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware” and “is highly fault-tolerant….”

Continuing, the Cascading.org website promotes Cascading as “the proven application development platform for building data applications on Hadoop.” Plus, Scalding, it explains, is  “an extension to Cascading that enables application development with Scala, a powerful language for solving functional problems.” Indeed, Scalding is “[a] Scala API for Cascading,” and it provides functionality from custom join algorithms to multiple APIs (Fields-based, Type-safe, Matrix) for developers to build robust data applications. Scalding is built and maintained by Twitter.”

Scalding “makes MapReduce computations look very similar to Scala’s collection API. It’s also a wrapper for Cascading to simplify jobs, tests and data sources on HDFS or local disk.”

Okay, that’s a lot to digest, especially if you are making some of your first forays into the world of Big Data.

Fortunately, Programming MapReduce with Scalding offers clear, well-illustrated, smoothly paced how-to steps, as well as easy-to-digest definitions and descriptions. It takes the reader from setting up and running a Hadoop mini-cluster and local-development environment to applying  Scalding to real-use cases, as well as developing good test and test-driven development methodologies, running Scalding in production, using external data stores, and applying matrix calculations and machine learning.

The book is written for developers who have “a basic understanding” of Hadoop and MapReduce, but is also intended for experienced Hadoop developers who may be “enlightened by this alternative methodology of developing MapReduce applications with Scalding.”

In this book, “[a] Linux operating system is the preferred environment for Hadoop.” And the author includes instructions for how to install and use a Kiji Bento Box, “a zero-configuration bundle that provides a suitable environment for testing and prototyping projects that use HDFS, MapReduce, and HBase with minimal setup time.”  It’s an easy way to get Apache Hadoop up and running in as little as five minutes or so.

Or, if you prefer, you can manually install the required software packages. Either way, you can learn a lot and do a lot with a Hadoop mini-cluster. And, with this book, you can get a very good handle on the Scalding API.

It does help to be somewhat familiar with MapReduce, Scalding, Scala, Hadoop, Maven, Eclipse and the Linux environment.  But Antonio Chalkiopoulo does a good job of keeping the examples accessible even when readers are new to some of the packages. Still, be prepared to take your time and be prepared to do some additional research on the web and ask questions in forums, particularly if any of the required software is new to you.

(The book also can be purchased direct from Packt Publishing at http://goo.gl/Tyw4Sh.)

 

Si Dunn

 

 

 

 

Scala for Java Developers – A practical guide to building apps and integrating code – #programming #bookreview

Scala for Java Developers

Build reactive, scalable applications and integrate Java code with the power of Scala

Thomas Alexandre

(Packtpaperback, Kindle)

 

The increasingly popular Scala programming language runs on the Java Virtual Machine (JVM). And “Java and Scala stacks can be freely mixed for totally seamless integration,” Scala’s website proudly trumpets.

The Scala site goes on to note: “Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise, elegant, and type-safe way. It smoothly integrates features of object-oriented and functional languages.”

Indeed, in Scala, “every value is an object” and “every function is a value.”

Scala’s continuing  rise poses something of a dilemma for many Java developers. Their questions include:

  • Should I learn Scala and blend it with Java?
  • Should I take up Scala and use it instead of  Java?
  • Should I just keep my head down, ignore Scala, and focus on getting better at developing with Java?

Thomas Alexandre’s well-written new book, Scala for Java Developers, clearly is aimed at those who would much rather add Scala to their skills and resumes than attempt to hide from the changes Scala can offer to Java development.

He notes in his preface: “When I tell people around me that I now program in Scala rather than Java, I often get the question, ‘So, in simple words, what is the main advantage of using Scala compared to Java?’  I tend to respond with this: ‘With Scala, you reason and program closer to the domain, closer to plain English.’ “

However, Alexandre’s new how-to guide does not try to browbeat developers into abandoning Java and pledging allegiance to Scala alone. Instead, Alexandre shows how the two programming languages can be used together. And he demonstrates how Scala solutions often can be shorter and less complex than their Java equivalents.

He focuses on “exploring progressively some of the new concepts brought by Scala, in particular, how to unify the best of Object-Oriented and functional programming without giving away any of the established and mature technologies built around Java for the past fifteen years.”

Decisions regarding how far to go with Scala remain with the reader. Yet Alexandre and his numerous code examples make a compelling case for becoming at least reasonably familiar with the language and experiencing how easily it can integrate with Java.

And, yes, he does evangelize a bit; this is, after all, a book about Scala. “Scala is the only language that has it all,” Alexandre touts. “It is statically typed, runs on the JVM and is totally Java compatible, is both object-oriented and functional, and is
not verbose, thereby leading to better productivity, less maintenance, and therefore more fun.”

His book is divided into 10 chapters:

  • Chapter 1: Programming Interactively within Your Project – Includes advantages of using Scala, learning Scala via the REPL, and performing operations on collections.
  • Chapter 2: Code Integration – Focuses on creating a REST API from an existing database, adding a test in Scala, setting up Scala within a Java Maven project, and showing how Scala and Java work together despite some differences in code style.
  • Chapter 3: Understanding the Scala Ecosystem – Includes inheriting Java Integrated Development Environments (IDEs), building with Simple Build Tool (SBT), using Scala worksheets, working with HTTP, and using Typesafe Activator.
  • Chapter 4: Testing Tools – Writing tests with ScalaTest and testing with ScalaCheck.
  • Chapter 5: Getting Started with the Play Framework – Getting started with the classic Play distribution, getting started with the Typesafe Activator, the architecture of a Play application, authentication, and practical tips when using Play.
  • Chapter 6: Database Access and the Future of ORM – In this case, ORM is Object Relational Modeling. Chapter topics include integrating an existing ORM – Hibernate and JPA, dealing with persistence in the Play framework, replacing ORM, learning about Slick, and scaffolding a Play application.
  • Chapter 7: Working with Integration and Web Services – Includes binding XML data in Scala, working with XML and JSON, and handling Play requests with XML and JSON.
  • Chapter 8: Essential Properties of Modern Applications – Asynchrony and Concurrency – Covers the pillars of concurrency, the async library – SIP-22-Async, and getting started with Akka.
  • Chapter 9: Building Reactive Web Applications – Includes describing reactive applications, handling streams reactively, experimenting with WebSockets and Iteratees in Play, learning from activator templates, and playing with Actor Room.
  • Chapter 10: Scala Goodies – Covers exploring MongoDb and scratching the surface of Big Data. Also introduces DSLs in Scala and introduces Scala.js, which compiles Scala to JavaScript.

Alexandre makes a strong pitch for using the Play framework in Scala web development. And he again speaks out for Scala at the conclusion of his well-structured book:

“The concise and expressive syntax of the Scala language should make your code not only more readable but also more maintainable for yourself and other developers,” he writes. “You don’t have to give up any of the libraries of the very large and mature Java ecosystem as all the APIs can be reused directly within Scala. Moreover, you benefit from many additional great Scala-specific libraries. Our recommendation is to take a piece of Java code from a domain you understand well, maybe because you wrote it in the first place one or several times before. Then, try to convert it to Scala code and refactor it to get rid of the boilerplate and to make it in a more functional style.”

Si Dunn

 

 

 

 

 

 

 

 

Testing in Scala – How to Test First, Then Develop Effective Code – #programming #bookreview

Testing in Scala
Daniel Hinojosa
(O’Reilly – paperback, Kindle)

In test-driven development (TDD), a software developer first creates some specific tests that are intended to fail and then writes code that is good enough to pass the tests. After that, the code is refactored, improved to make it better and easier to maintain and extend.

A key goal of TDD is to reduce the time and costs required to develop software.

Daniel Hinojosa’s well-written Testing in Scala effectively introduces test-driven development basics to Scala newcomers, as well as to developers already familiar with Scala or other programming languages, including Java, Ruby or Python.

The scala-lang.org website describes Scala as “a general purpose programming language designed to express common programming patterns in a concise, elegant, and type-safe way. It smoothly integrates features of object-oriented and functional languages, enabling Java and other programmers to be more productive. Code sizes are typically reduced by a factor of two to three when compared to an equivalent Java application.”

Both TDD and Scala have been around for a number of years, but each is now gaining new traction with corporations, software companies, and individual developers seeking faster results at lower costs.

One big reason for Scala’s rising popularity, the Scala website proclaims, is Scala’s close ties to Java:

“Existing Java code and programmer skills are fully re-usable. Scala programs run on the Java VM, are byte code compatible with Java so you can make full use of existing Java libraries or existing application code. You can call Scala from Java and you can call Java from Scala; the integration is seamless. Moreover, you will be at home with familiar development tools, Eclipse, NetBeans or IntelliJ for example, all of which support Scala.”

The Spring Tool Suite also can support Scala using the Scala IDE for Eclipse, but there recently were a few “caveats” if you have the Java 7 JDK installed. Meanwhile, the Spring Scala project, announced last October, is underway.

The new book Testing in Scala is structured as six chapters that utilize different testing frameworks while an example application is tested and developed from scratch:

  1. Setup
  2. Structure and Configuration of Simple Build Tool (SBT)
  3. ScalaTest
  4. Spec2
  5. Mocking
  6. ScalaCheck

The book and its code examples, Hinojosa says, are “organized in a TDD fashion: test first, fail; test again, succeed maybe; test again, succeed, and so on.”

If you’ve never tried TDD, Testing in Scala may help you learn how to become a better, more efficient Scala developer.

It also can introduce you to a development style that you may be able to adapt quickly and effectively to other programming languages, as well.

Si Dunn