BIG DATA: A well-written look at principles & best practices of scalable real-time data systems – #bookreview

 

 

Big Data

Principles and best practices of scalable real-time data systems

Nathan Marz, with James Warren

(Manning – paperback)

Get this book, whether you are new to working with Big Data or now an old hand at dealing with Big Data’s seemingly never-ending (and steadily expanding) complexities.

You may not agree with all that the authors offer or contend in this well-written “theory” text. But Nathan Marz’s Lambda Architecture is well worth serious consideration, especially if you are now trying to come up with more reliable and more efficient approaches to processing and mining Big Data. The writers’ explanations of some of the power, problems, and possibilities of Big Data are among the clearest and best I have read.

“More than 30,000 gigabytes of data are generated every second, and the rate of data creation is only accelerating,” Marz and Warren point out.

Thus, previous “solutions” for working with Big Data are now getting overwhelmed, not only by the sheer volume of information pouring in but by greater system complexities and failures of overworked hardware that now plague many outmoded systems.

The authors have structured their book to show “how to approach building a solution to any Big Data problem. The principles you’ll learn hold true regardless of the tooling in the current landscape, and you can use these principles to rigorously choose what tools are appropriate for your application.” In other words, they write, you will “learn how to fish, not just how to use a particular fishing rod.”

Marz’s Lambda Architecture also is at the heart of Big Data, the book. It is, the two authors explain, “an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to Big Data systems that can be built and run by a small team.”

The Lambda Architecture has three layers: the batch layer, the serving layer, and the speed layer.

Not surprisingly, the book likewise is divided into three parts, each focusing on one of the layers:

  • In Part 1, chapters 4 through 9 deal with various aspects of the batch layer, such as building a batch layer from end to end and implementing an example batch layer.
  • Part 2 has two chapters that zero in on the serving layer. “The serving layer consists of databases that index and serve the results of the batch layer,” the writers explain. “Part 2 is short because databases that don’t require random writes are extraordinarily simple.”
  • In Part 3, chapters 12 through 17 explore and explain the Lambda Architecture’s speed layer, which “compensates for the high latency of the batch layer to enable up-to-date results for queries.”
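As a rough illustration of how the three layers fit together, here is a minimal Python sketch. It is not code from the book: the page-view example, the function names, and the in-memory data structures are all assumptions made for illustration, and a real system would use distributed storage and processing tools for each layer.

```python
from collections import Counter


def build_batch_view(master_dataset):
    """Batch layer (hypothetical): recompute a view from the full dataset."""
    return Counter(event["url"] for event in master_dataset)


def update_speed_view(speed_view, event):
    """Speed layer (hypothetical): fold in one event the batch run has not seen."""
    speed_view[event["url"]] += 1


def query_pageviews(url, batch_view, speed_view):
    """Answer a query by merging the batch result with the recent delta."""
    return batch_view.get(url, 0) + speed_view.get(url, 0)


# Made-up sample events standing in for the master dataset.
master_dataset = [{"url": "/home"}, {"url": "/home"}, {"url": "/about"}]
batch_view = build_batch_view(master_dataset)

# An event that arrived after the last batch run.
speed_view = Counter()
update_speed_view(speed_view, {"url": "/home"})

total_home = query_pageviews("/home", batch_view, speed_view)
```

The point of the sketch is the split itself: the batch view is slow to build but covers everything, while the speed view covers only what arrived since the last batch run.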

Marz and Warren contend that “[t]he benefits of data systems built using the Lambda Architecture go beyond just scaling. Because your system will be able to handle much larger amounts of data, you’ll be able to collect even more data and get more value out of it. Increasing the amount and types of data you store will lead to more opportunities to mine your data, produce analytics, and build new applications.”

This book requires no previous experience with large-scale data analysis, nor with NoSQL tools. However, it helps to be somewhat familiar with traditional databases. Nathan Marz is the creator of Apache Storm and originator of the Lambda Architecture. James Warren is an analytics architect with a background in machine learning and scientific computing.

If you think the Big Data world already is too much with us, just stick around a while. Soon, it may involve almost every aspect of our lives.

Si Dunn

Mule in Action, 2nd Edition – Want to be an integration developer? Here’s a good start – #bookreview

 

Mule in Action, Second Edition

David Dossot, John D’Emic, Victor Romero

(Manning – paperback)

 

An enterprise service bus (ESB) can help you link together many different types of platforms and applications–old and new–and keep them communicating and passing data between each other.

“Mule,” this book’s authors note, “is a lightweight, event-driven enterprise service bus and an integration platform and broker.  As such, it resembles more a rich and diverse toolbox than a shrink-wrapped application.”

Mule in Action, Second Edition, is a comprehensive and generally well-written overview of Mule 3 and how to put its open-source building blocks together to create integration solutions and develop them with Mule. The book provides very good focus on sending, receiving, routing, and transforming data, key aspects of an ESB.

More attention, however, could have been paid to clarity and detail in Chapter 1, the all-important chapter that helps Mule newcomers get started and enthused.

This second edition is a recent update of the 2009 first edition. Unfortunately, the Mule screens have changed a bit since the book’s screen shots were created for the new edition. Therefore, some of the how-to instructions and screen images do not match what the user now sees. This gets particularly confusing while trying to learn how to configure a JMS outbound endpoint for the first time, using Mule Studio’s graphical editor. The instructions seem insufficient, and the mismatch of screens can leave a beginner unsure how to proceed.

The same goes for configuring the message setting in the Logger element. The text instructs: “You’ll set the message attribute to print a String followed by the payload of the message, using the Mule Expression Language.” But no example is given. Fortunately, a reviewer on Amazon has posted a correct procedure. In his view, the message attribute should be: We received a message: #[message.payload]  –without any quote marks around it. (It works.)

Of course, this book is not really aimed at beginners–it’s for developers, architects, and managers (even though there will be Mule “beginners” in those ranks). Fortunately, it soon moves away from relying solely on Mule Studio’s graphical editor. The book’s examples, as the authors note, “mostly focus on the XML configurations of flows.” Thus, there are many XML code examples to work with, plus occasional screen shots of the flows as they appear in Mule Studio. And you can use other IDEs to work with the XML, if you prefer.

Indeed, the authors note, “no functionality in the CE version of Mule is dependent on Mule Studio.”

Overall, this is a very good book, and it definitely covers a lot of ground, from “discovering” Mule to becoming a Mule developer of integration applications, and using certain tools (such as business process management systems) to augment the applications you develop. I just wish a little more how-to clarity had been delivered in Chapter 1.

Si Dunn

Computing with Quantum Cats – Strange and exciting times are ahead – #science #bookreview

Computing with Quantum Cats

From Colossus to Qubits

John Gribbin

(Prometheus Books – hardcover, Kindle)

John Gribbin’s new book, Computing with Quantum Cats, is an entertaining, informative and definitely eye-opening look at quantum computing’s recent progress, as well as its exciting near-future possibilities.

The “conventional” (a.k.a. “classical”) computers currently on our desktops, in our briefcases, and in our pockets and purses keep getting smaller and faster, yet laden with more features, memory and processing power. “But,” cautions John Gribbin, a veteran science writer, “the process cannot go on indefinitely; there are limits to how powerful, fast and cheap a ‘classical’ computer can be.”

Already we are cramming a billion transistors into tiny chips and moving much of our data and programs out to the “cloud,” because we are running out of both physical space and memory space on our shrunken devices.

So what’s next, if the end of Moore’s Law is here?

Gribbin predicts that “within a decade the computer world will be turned upside down”–by quantum computers that “will enable physicists to come to grips with the nature of quantum reality, where communication can occur faster than the speed of light, teleportation is possible, and particles can be in two places at once. The implications are as yet unknowable,” he concedes, “but it is fair to say that the quantum computer represents an advance as far beyond the conventional computer as the conventional computer is beyond the abacus.”

For now, quantum computers are functioning at a level somewhat equivalent to the early classical computers that, nearly 70 years ago, could perform only rudimentary calculations, yet filled large rooms and required 25 kilowatts or more of electrical power to light up hundreds or thousands of vacuum tubes. It may be decades or perhaps just a few years until quantum desktop PCs or quantum smartphones become a reality.

What makes quantum computing such a big deal? 

Classical computers, Gribbin writes, “store and manipulate information consisting of binary digits, or bits. These are like ordinary switches that can be in one of two positions, on or off, up or down. The state of a switch is represented by the numbers 0 and 1, and all the activity of a computer involves changing the settings on those switches in an appropriate way.”

He notes that two “classical” bits can represent any of the four numbers from 0 to 3 (00, 01, 10, and 11). But once you start using quantum bits–qubits (pronounced “cubits”)–the scale of possibilities quickly becomes astronomical.

The “quantum switches can be in both states, on and off, at the same time, like Schrodinger’s ‘dead and alive’ cat. In other words, they can store 0 and 1 simultaneously.” So each switch can be off, on, or both at once, creating three possibilities.

“Looking further into the future,” Gribbin continues, “a quantum computer based on a 30-qubit processor would have the equivalent computing power of a conventional machine running at 10 teraflops (trillions of floating-point operations per second)–ten thousand times faster than conventional desktop computers today….” 
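The arithmetic behind that growth can be sketched in a few lines of Python. This is only a counting exercise, not a simulation of a quantum computer, and it does not attempt to verify the teraflops comparison. The function names are made up for illustration.

```python
from itertools import product


def classical_states(n_bits):
    """List every value an n-bit classical register can hold, one at a time."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]


def basis_state_count(n_qubits):
    """Count the basis states an n-qubit register can hold in superposition."""
    return 2 ** n_qubits


two_bit_values = classical_states(2)
thirty_qubit_states = basis_state_count(30)

print(two_bit_values)
print(thirty_qubit_states)
```

Each added qubit doubles the count, which is why 30 qubits already correspond to more than a billion basis states.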

His new book presents an enlightening, engrossing blend of facts and speculations about quantum computing, as well as short biographical sketches of key people who have helped quantum computing become a reality. These range from Alan Turing and John von Neumann to more recent researchers such as Nobel Prize recipients Tony Leggett and Brian Josephson, to name a few. Their key research efforts also are explored.

The author notes that “the enormous challenge remains of constructing a quantum computer on a scale large enough to beat classical computers at a range of tasks….” He also observes that “many competing approaches are being tried out in an attempt to find the one that works on the scale required.” And he concedes that in a research field now changing very fast, “I’ve no idea what will seem the best bet by the time you read these words, so I shall simply set out a selection of the various [techniques] to give you a flavor of what is going on.”

John Gribbin’s other books include In Search of Schrodinger’s Cat, Erwin Schrodinger and the Quantum Revolution, and In Search of the Multiverse.

The need to break enemy codes in World War II gave us classical computers, Gribbin points out. In a curious twist, it may be the need to create truly unbreakable codes that will help usher in quantum computing as a practical reality.

Si Dunn

Hello World! – Updated book brings new fun to learning Python – #programming #bookreview

Hello World!

Computer Programming for Kids and Other Beginners (2nd Edition)

Warren Sande and Carter Sande

(Manning, paperback)

Many politicians, educators and pundits keep arguing over whether the United States should offer computer programming classes to all students in kindergarten through 12th grade.

Others say all of us, including senior citizens, should do some coding to help us (1) maintain mental sharpness and good computer skills and (2) ward off late-in-life memory problems such as dementia.

These contentious debates are a long way from being settled, of course. Meanwhile, questions also rage over which programming languages we should learn. There are, after all, many dozens now in use.

Experienced software developers often state that Python is a good choice for youngsters ready to tackle their first “real” language, particularly once they have spent some time mastering Scratch, which MIT describes as “a programming language and an online community where children can program and share interactive media such as stories, games, and animation with people from all over the world.”

Manning Publications recently has brought out an updated second edition of its popular Python how-to book, Hello World!, written by Warren Sande and his son Carter Sande.

Some parents want to hand a programming book over to a child and let them learn at their own pace. And that can be done, in many cases, with Hello World! (It is written at a 12-year-old’s reading level, according to Manning.) But other parents want to share the learning experience and be mentors, too, and the Sande book can be used effectively that way, as well. In either case, many children younger than 12 also should be able to learn from it.

Be sure to note the “Other Beginners” in the book’s subtitle. I have taken classes in Python, and I have worked my way through a couple of Python programming books. Hello World! is proving a useful addition to my library, too, because it gives some clear explanations and examples for many different concepts, such as using variable nested loops, importing portions of modules, or providing collision detection in a game, to name just a few.
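To give a flavor of those three concepts, here is a small Python sketch. It is not taken from the book. The function names are invented, the circle-based collision test is just one simple approach, and “variable nested loops” is read here as an inner loop whose length depends on the outer loop’s variable.

```python
from math import hypot  # import only one name from the math module


def triangle_products(size):
    """Nested loops where the inner loop's length varies with the outer one."""
    rows = []
    for i in range(1, size + 1):
        row = []
        for j in range(1, i + 1):  # inner range depends on i
            row.append(i * j)
        rows.append(row)
    return rows


def circles_collide(x1, y1, r1, x2, y2, r2):
    """Two circles touch or overlap when their centers are close enough."""
    return hypot(x2 - x1, y2 - y1) <= r1 + r2


print(triangle_products(3))
print(circles_collide(0, 0, 1, 1.5, 0, 1))
```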

One big question quickly pops up when someone decides to learn to program in Python: Python 2 or Python 3?

Several years ago, the language was updated from version 2 to version 3, but many users of version 2 chose not to upgrade. So recently we have had both Python 2.7.6 and Python 3.3.3 (with Python 3.4 coming soon). The two versions have some similarities, but they also have essential differences. Bottom line: They do not play well together.

In this second edition of Hello World!, the authors have elected to stick with Python 2 in their text and code examples. But they have added notes to help make the code work for students using Python 3. Likewise, they have added an appendix explaining some major differences between Python 2 and Python 3.
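Two of the best-known differences can be shown in a short sketch. This example is written for Python 3 and is not from the book. The comments describe how the same lines would behave, or would have been written, in Python 2.

```python
def average(values):
    # Python 3: "/" is true division, so 3 / 2 gives 1.5.
    # Python 2: "/" on two integers drops the remainder, so 3 / 2 gives 1.
    return sum(values) / len(values)


def whole_groups(item_count, group_size):
    # "//" is floor division in both versions.
    return item_count // group_size


# Python 3: print is a function and needs parentheses.
# Python 2 code would typically read: print "Average:", average([1, 2])
print("Average:", average([1, 2]))
print("Groups:", whole_groups(7, 2))
```

Another common stumbling block is keyboard input: Python 2’s raw_input() was renamed input() in Python 3.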

Other significant changes include using color in illustrations and code listings and, in the chapter on GUI programming, using PyQt, rather than the no-longer-supported PythonCard. And the updated book now spans more than 460 pages, including its index.

With Hello World!, even the most eager student who is a very fast reader can be kept focused and busy for many hours while learning how to program in Python.

Si Dunn

Testing Cloud Services – How to Test SaaS, PaaS and IaaS – #cloud #bookreview

Testing Cloud Services

How to Test SaaS, PaaS & IaaS
Kees Blokland, Jeroen Mengerink and Martin Pol
(Rocky Nook – paperback, Kindle)

Cloud computing now affects almost all of us, at least indirectly. But some of us have to deal directly with one or more “clouds” on a regular basis. We select or implement particular cloud services for our employers or for our own businesses. Or, we have to maintain those services and fix any problems encountered by co-workers or employees.

Testing Cloud Services, written by three experienced test specialists, emphasizes that the time to begin testing SaaS (Software as a Service), PaaS (Platform as a Service), or IaaS (Infrastructure as a Service) is not after you have made your selections. You should begin testing them during the selection and installation processes and keep testing them regularly once they are live.

“Cloud computing not only poses challenges for testing, it also provides interesting new testing options,” the authors note. “For example, cloud computing can be used for test environments or test tools. It can also mean that all test activities and the test organization as a whole are brought to the cloud. This will be called Testing as a Service.”

Their well-written, six-chapter book deals with numerous topics related to using and testing cloud services, including the role of the test manager, identifying the risks of cloud computing and testing those risks, and picking the right test measures for the chosen services.

Chapter 5, which takes up a significant portion of the book, is devoted to both test measures and test management. “Testing SaaS is very different from testing PaaS or IaaS,” the writers state. Much of the lengthy chapter focuses on SaaS, but it also addresses PaaS and IaaS, and the authors describe the following test measures:

  • Testing during selection of cloud services
  • Testing performance
  • Testing security
  • Testing for manageability
  • Testing availability/continuity
  • Testing functionality
  • Testing migrations
  • Testing due to legislation and regulations
  • Testing in production

Particularly if you are a newcomer to choosing, testing, and maintaining cloud services, this book can be an informative and helpful how-to guide.

Si Dunn

The Practice of Network Security Monitoring – You’re compromised, so deal with it. #security #bookreview

The Practice of Network Security Monitoring

Understanding Incident Detection and Response
Richard Bejtlich
(No Starch Press – paperback, Kindle)

Security expert Richard Bejtlich’s focus in his new book is not on “the planning and defense phases of the security cycle.” Instead, he emphasizes how to handle “systems that are already compromised or that are on the verge of being compromised.”

His well-organized, well-written, 341-page book aims to help you “start detecting and responding to digital intrusions using network-centric operations, tools, and techniques.”

Bejtlich has long emphasized a “detection-centered philosophy” built around a straightforward central tenet: “Prevention eventually fails.” No matter how many digital walls and moats you build around your network, someone will find a way to tunnel in, parachute in, or sneak in via an unsuspecting employee’s $9.95 thumb drive.

“It’s becoming smarter,” he writes, “to operate as though your enterprise is always compromised. Incident response is no longer an infrequent, ad-hoc affair. Rather, incident response should be a continuous business process with defined metrics and objectives.”

You may recognize some of Bejtlich’s previous books on network security monitoring (NSM): The Tao of Network Security Monitoring; Extrusion Detection; and Real Digital Forensics.

The Practice of Network Security Monitoring is tailored toward two key audiences: (1) security professionals who have little or no experience with NSM; and (2) “more senior incident handlers, architects, and engineers who need to teach NSM to managers, junior analysts, or others who may be technically less adept.”

Readers, he adds, should understand “the basic use of the Linux and Windows operating systems, TCP/IP networking, and the essentials of network attack and defense.”

The examples in Bejtlich’s book rely on open source and vendor-neutral tools, primarily from Doug Burks’ Security Onion (SO) distribution.

The 13-chapter book is organized into four parts:

  • Part I: Getting Started – Introduces NSM and sensor placement issues.
  • Part II: Security Onion Deployment – Shows how to install and configure SO.
  • Part III: Tools – Examines the “key software shipped with SO and how to use these applications.”
  • Part IV: NSM in Action – Looks at “how to use NSM processes and data to detect and respond to intrusions.”

Following the technical chapters, Bejtlich offers some concluding thoughts on network security management, cloud computing, and establishing an effective workflow for NSM. “NSM isn’t just about tools,” he writes. “NSM is an operation, and that concept implies workflow, metrics, and collaboration. A workflow establishes a series of steps that an analyst follows to perform the detection and response mission. Metrics, like the classification and count of incidents and time elapsed from incident detection to containment, measure the effectiveness of the workflow. Collaboration enables analysts to work smarter and faster.”
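The two metrics named in that passage are simple to compute once incidents are recorded consistently. The following Python sketch is not from the book. The record layout, category names, and timestamps are made-up sample data, included only to show the calculation.

```python
from collections import Counter
from datetime import datetime

# Hypothetical incident records; every field and value here is invented.
incidents = [
    {"category": "malware",
     "detected": datetime(2013, 9, 1, 9, 0),
     "contained": datetime(2013, 9, 1, 13, 0)},
    {"category": "malware",
     "detected": datetime(2013, 9, 2, 10, 0),
     "contained": datetime(2013, 9, 2, 12, 0)},
    {"category": "phishing",
     "detected": datetime(2013, 9, 3, 8, 0),
     "contained": datetime(2013, 9, 3, 14, 0)},
]


def count_by_category(records):
    """Classification and count of incidents."""
    return Counter(record["category"] for record in records)


def mean_hours_to_containment(records):
    """Average time elapsed from detection to containment, in hours."""
    hours = [
        (record["contained"] - record["detected"]).total_seconds() / 3600
        for record in records
    ]
    return sum(hours) / len(hours)


category_counts = count_by_category(incidents)
average_hours = mean_hours_to_containment(incidents)
```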

He also observes: “It is possible to defeat adversaries if we stop them before they accomplish their mission. As it has been since the early 1990s, NSM will continue to be a powerful, cost-effective way to counter intruders.”

Si Dunn

Puppet 3 Beginner’s Guide – Automate configuration management & become a better system admin – #programming #bookreview

Puppet 3 Beginner’s Guide
John Arundel
(Packt Publishing – paperback, Kindle)

If you administer a small network built around just a few servers, you may still be doing at least some of the configuration management by hand. You literally move from machine to machine, manually entering updates, changes, or fixes. And your small network may be running several different brands–and vintages–of hardware and software, which complicates the update and repair process.

However, infrastructure consultant John Arundel warns, once you get “[b]eyond ten or so servers, there simply isn’t a choice. You can’t manage an infrastructure like this by hand. If you’re using a cloud computing architecture, where servers are created and destroyed minute-by-minute in response to changing demand, the artisan approach to server crafting just won’t work.”

In his new book, Puppet 3 Beginner’s Guide, Arundel emphasizes: “Manual configuration management is tedious and repetitive, it’s error-prone, and it doesn’t scale well. Puppet is a tool for automating this process.”
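The underlying idea can be sketched in Python, with a loud caveat: Puppet manifests are written in Puppet’s own declarative language, not Python, and nothing below is Puppet code or its API. The package names, server names, and the converge function are all hypothetical. The sketch only shows the general pattern of declaring a desired state once and applying it repeatedly to many machines.

```python
# Hypothetical desired state: which packages should or should not be present.
desired_state = {"ntp": "installed", "telnet": "absent"}


def converge(server_packages, desired):
    """Bring one server's package set in line with the desired state.

    Returns the list of changes made; an empty list means nothing to do.
    """
    changes = []
    for package, state in sorted(desired.items()):
        present = package in server_packages
        if state == "installed" and not present:
            server_packages.add(package)
            changes.append("install " + package)
        elif state == "absent" and present:
            server_packages.discard(package)
            changes.append("remove " + package)
    return changes


# Two made-up servers that have drifted in different ways.
servers = {"web1": {"telnet"}, "web2": {"ntp"}}

first_run = {name: converge(pkgs, desired_state) for name, pkgs in servers.items()}
second_run = {name: converge(pkgs, desired_state) for name, pkgs in servers.items()}
```

The second run makes no changes because every server already matches the declared state. That repeatability is what makes the approach scale where hand-editing does not.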

Among “UNIX-like systems,” there are at least three major configuration management (CM) packages, including Puppet. The others are Chef and CFEngine, plus a few more competitors. Arundel calls them “all great solutions to the CM problem…it’s not very important which one you choose as long as you choose one.” But he hopes, of course, you will favor Puppet and his well-written how-to guide.

Puppet 3 Beginner’s Guide is structured to help system administrators “start from scratch…and learn how to fully utilize Puppet through simple, practical examples,” he writes.

He places important emphasis on the rapidly closing “divide between ‘devs,’ who wrangle code, and ‘ops,’ who wrangle configurations. Traditionally, the skill sets of the two groups haven’t overlapped much,” he notes. “It was common until recently for system administrators not to write complex programs, and for developers to have little or no experience of building and managing servers.”

Today, system admins are “facing the challenge of scaling systems to enormous size for the web, [and] have had to get smart about programming and automation.” Meanwhile, “[d]evelopers, who now often build applications, services, and businesses by themselves, couldn’t do what they do without knowing how to set up and fix servers,” he says.

Therefore, “[t]he term ‘devops’ has begun to be used to describe the growing overlap between these skill sets…Devops write code, herd servers, build apps, scale systems, analyze outages, and fix bugs. With the advent of CM systems, devs and ops are now all just people who work with code.”

Arundel’s 184-page Puppet 3 Beginner’s Guide offers 10 chapters smoothly structured with headings, short paragraphs, code examples, and other illustrations. He has generated his code examples using the Ubuntu 12.04 LTS “Precise” distribution of Linux. But he explains how to load the software using “Red Hat Linux, CentOS, or another Linux distribution that uses the Yum package system,” as well.

The chapters are:

  • Chapter 1, Introduction to Puppet
  • Chapter 2, First Steps with Puppet
  • Chapter 3, Packages, Files, and Services
  • Chapter 4, Managing Puppet with Git
  • Chapter 5, Managing Users
  • Chapter 6, Tasks and Templates
  • Chapter 7, Definitions and Classes
  • Chapter 8, Expressions and Logic
  • Chapter 9, Reporting and Troubleshooting
  • Chapter 10, Moving on Up

That final chapter covers a range of topics, including how to make Puppet code “more elegant, more readable, and more maintainable.” The author offers “links and suggestions for further reading.” And he describes several projects to help you “improve your skills and your infrastructure at the same time.” Those projects, he says, “provide a series of stepping-stones from your first use of Puppet to a completely automated environment.”

Besides Linux, Puppet will run on several other platforms, including Windows and Macs. But there is almost no help for those in Arundel’s book. Essentially, it’s Linux or bust. For other operating systems, you will need to refer to the Puppet Labs website.

It can take a bit of work to get Puppet installed and properly configured. But once you have Puppet running, the Puppet 3 Beginner’s Guide can help you become both a proficient Puppet user and a more efficient, knowledgeable, and versatile system administrator.

Si Dunn