Cloudera Administration Handbook
The explosive growth in the use of Big Data in business, government, science and other arenas has fueled a strong demand for new Hadoop administrators. Their key duty is to set up and maintain the Hadoop clusters that help process and analyze massive amounts of information.
New Hadoop administrators, and especially those looking to join their ranks, will want to give serious consideration to The Cloudera Administration Handbook by Rohit Menon. This is a well-organized, well-written and solidly illustrated guide to building and maintaining large Apache Hadoop clusters using Cloudera Manager and CDH5.
The author has an extensive computer science background and is a Cloudera Certified Apache Hadoop Developer. He notes that “Cloudera Inc., is a Palo Alto-based American enterprise software company that provides Apache Hadoop-based software, support and services, and training to data-driven enterprises. It is often referred to as the commercial Hadoop company.”
CDH, Menon points out, is the easy shorthand name for a rather awkward software title: “Cloudera’s Distribution Including Apache Hadoop.” CDH is “an enterprise-level distribution including Apache Hadoop and several components of its ecosystem such as Apache Hive, Apache Avro, HBase, and many more. CDH is 100 percent open source,” Menon writes.
Cloudera Manager, meanwhile, “is a web-browser-based administration tool to manage Apache Hadoop clusters. It is the centralized command center to operate the entire cluster from a single interface. Using Cloudera Manager, the administrator gets visibility for each and every component in the cluster.”
Cloudera Manager is not explored until nearly halfway into the book, and some readers may wish it had been covered sooner, since they may need to start using it on day one of a new job. However, Menon wants readers first to become familiar with “all the steps and operations needed to set up a cluster via the command line” at a terminal. Mastering those steps is, of course, important to becoming an effective, knowledgeable and versatile Hadoop administrator. (You may not always have access to Cloudera Manager while setting up or troubleshooting a cluster.)
The book’s nine chapters show its well-focused range:
- Chapter 1: Getting Started with Apache Hadoop
- Chapter 2: HDFS and MapReduce
- Chapter 3: Cloudera’s Distribution Including Apache Hadoop
- Chapter 4: Exploring HDFS Federation and Its High Availability
- Chapter 5: Using Cloudera Manager
- Chapter 6: Implementing Security Using Kerberos
- Chapter 7: Managing an Apache Hadoop Cluster
- Chapter 8: Cluster Monitoring Using Events and Alerts
- Chapter 9: Configuring Backups
You will have to bring some hardware and software experience and skills to the table, of course. Apache Hadoop runs primarily on Linux. “So having good Linux skills such as monitoring, troubleshooting, configuration, and security is a must” for a Hadoop administrator, Menon points out. Another requirement is the ability to work comfortably with the Java Virtual Machine (JVM) and to understand Java exceptions.
Combined with those skills, Menon’s Cloudera Administration Handbook can take you from “the very basics of Hadoop” to taking up “the responsibilities of a Hadoop administrator and…managing huge Hadoop clusters.”
— Si Dunn