Packt: Data Unlocked

Just realised there’s something I wanted to mention in my Unix Tutorial digest, but it appears to be a time-limited deal, so it won’t wait until the next edition. Please note I’m not paid for sharing this news and won’t be earning any affiliate commission from the links below.

Seems the month of September is pretty busy at Packt Publishing: every week there are lots of great books and educational videos made available for just $10 each.

If you’re considering a career in Data Science or just looking for a really interesting direction to learn something new – have a look at their data science section and perhaps grab a book or two!

Really cool: there’s also a free book of the day at Packt, I think – just created my account to download one. This isn’t data science specific – so every day there is a different book.

That’s it for today! Enjoy your weekend!

Pro Puppet

Back in 2011, James Turnbull partnered with Jeffrey McCune to produce a marvelous and technically complete sequel to his first book – Pulling Strings with Puppet. They called that sequel Pro Puppet.

I’m moving Unix book reviews from another website and want to keep them here on Unix Tutorial in the Unix Book Reviews section before I start publishing more recent reviews. This review is for the 1st edition of Pro Puppet, but I know there’s been a 2nd edition written in 2013.

As should be obvious from the title, this book is aimed at experienced users of the Puppet configuration management system, most likely seasoned systems administrators who have been managing systems with Puppet for a while but feel there is room for improvement.

The Pro Puppet book does not disappoint: not only does it offer updated introductory material for those of you only discovering Puppet, but it is also a step-by-step guide, complete with full source code examples, to solving the more complex issues facing a serious Puppet deployment – scalability, Puppet modules, stored configurations and MCollective are just some of the topics explained in plenty of detail.

Puppet basics revisited

The first few chapters cover the fundamental features and basic scenarios of deploying the Puppet management system within your environment. You’ll learn about the super-easy way of describing Puppet nodes and using node inheritance in the Puppet server’s config.

Naturally, there are full-text examples of creating your own configurations using Puppet classes and modules. There’s even a quick intro in case you decide to write a function or two – these are Puppet functionality elements running on the server side.

Class inheritance is shown quite expertly – not just the basics of having separate modules for managing different services with Puppet, but the actual class-based approach to stopping and starting services. Essentially, you can have the same class used for installing software (like a DNS or NTP server), and then have the flexibility of using different classes for toggling the enabled/disabled state of the freshly installed service on different nodes.

Cool stuff you can do with Puppet

Apparently, there is now a new provider specifically for auditing files – very similar to the File one, it only reports the compliance in terms of permissions and ownership for a given file. There is enough flexibility to get the audit reports exactly the way you need them.

Another really cool thing I’ve learned is that it’s possible and quite convenient to have classes require specific files to be in place before the class functionality is applied. I’ve been familiar with dependencies before but benefited from extra examples involving custom classes.

I always thought it would be great to use Puppet for deploying the Puppet infrastructure itself – server and nodes. Turns out, this is entirely possible – the book includes an example of a completely self-referential Puppet deployment.

Scaling Puppet environment

There are quite a few challenges you’ll be facing when your Puppet environment grows large enough. The Pro Puppet book gives you advice for most scenarios.

First things first – you have got to use multiple deployment environments, for instance test/dev/prod. From the Puppet server perspective, this means getting familiar with how you describe these environments in the puppet.conf file and also creating separate directories for your modules. The approach given in the book will help you cater both for different environments (multiple nodes belonging to the production or test environment) and for properly managing the stages of Puppet module development.

The really good thing is that you’ll have plenty of examples of how to manage it all with a source control system (git).

When it comes to horizontally scaling the server side of Puppet, you’ll find a lot of instructions for fronting a Puppet instance with the Apache webserver via the mod_rails (Passenger) module. Naturally, some of the most probable scenarios are described and provided with solutions, so if you’re stuck for some immediate help on making your crawling Puppet server run nice and fast, you’ll find some easy to follow steps.

What I enjoyed throughout the book is its attention to detail: it’s easy to see how some chapters address not just an isolated issue but the full-scale solution. In the case of scaling, you’ll certainly appreciate hints on automating the data synchronization between Puppet backends – unless they reside in the same Unix environment, you’ll need some behind-the-scenes tricks to make sure all the backends are in full sync – be it for the Puppet modules/files or the SSL certificates for the Puppet CA element.

Externalizing Puppet configs (storing nodes info in a database)

As soon as your Puppet nodes.pp file grows past the first few hundred hosts, you’ll get the feeling that things could be greatly improved if you managed the nodes list in a database of some sort. Puppet server comes with such an abstraction planned from the very beginning, so it should be easy enough for you to externalize the nodes configuration. You can start off by using an external text file or even a shell script, but the same approach and interface can be used with Ruby or Perl, LDAP or MySQL.

Full text examples make it very easy to get started: you can plug whole scripts into your infrastructure, and even LDAP ldif files are provided for your convenience.
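To give an idea of what such an external nodes script could look like, here is a minimal sketch in Python (not taken from the book). It assumes the usual external node classifier contract: Puppet runs the script with the node name as its only argument and reads a YAML document with `classes` and `parameters` from standard output. The hostnames, class names and the `location` parameter are all made up for illustration.

```python
# Hypothetical node table; a real script could query LDAP or MySQL instead.
NODES = {
    "web01.example.com": {"classes": ["base", "apache"], "location": "london"},
    "db01.example.com": {"classes": ["base", "mysql"], "location": "dublin"},
}

def classify(node_name):
    """Return YAML text describing the node, or None if the node is unknown."""
    node = NODES.get(node_name)
    if node is None:
        return None
    lines = ["---", "classes:"]
    lines += [f"  - {name}" for name in node["classes"]]
    lines += ["parameters:", f"  location: {node['location']}"]
    return "\n".join(lines) + "\n"

# Example run for one (made-up) node name.
print(classify("web01.example.com"))
```

A real script would take the node name from `sys.argv[1]` and exit with a non-zero status for unknown nodes.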

Exporting and storing configs of your Puppet managed node

You can configure Puppet master to use MySQL DB for storing all the configs related to managed nodes. In contrast with the nodes list externalization, this functionality will actually store metadata about your nodes – things like Facter facts – which normally reside locally on each node. Once configured, such a setup may prove to be very useful for syncing configurations between nodes.

A really cool example given in the book is the one for collecting public ssh keys and then distributing them in the form of an updated known_hosts file.
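In Puppet itself this is done with stored configurations rather than a script, but the data flow can be sketched in a few lines of Python. The sketch below is only an illustration under assumptions: the hostnames, IP addresses and (truncated, fake) keys are placeholders, and the record layout is hypothetical.

```python
# Hypothetical records, as they might be collected from each managed node.
COLLECTED_KEYS = [
    {"hostname": "web01.example.com", "ip": "192.0.2.10",
     "key_type": "ssh-rsa", "public_key": "AAAAB3FAKEKEYWEB01"},
    {"hostname": "db01.example.com", "ip": "192.0.2.20",
     "key_type": "ssh-rsa", "public_key": "AAAAB3FAKEKEYDB01"},
]

def known_hosts_lines(records):
    """Build known_hosts entries: 'host,ip keytype key', sorted by hostname."""
    lines = []
    for rec in sorted(records, key=lambda r: r["hostname"]):
        names = f"{rec['hostname']},{rec['ip']}"
        lines.append(f"{names} {rec['key_type']} {rec['public_key']}")
    return lines

for line in known_hosts_lines(COLLECTED_KEYS):
    print(line)
```

Every node then receives the same generated file, so a newly added host becomes known to all the others on their next run.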

Puppet modules using Puppet Forge

If you end up using Puppet for managing your environments, it will be only a matter of time before you get curious enough to attempt developing your own Puppet module. You are in luck: the Pro Puppet book will give you all the info you need to get started. Apart from learning how to use Puppet Forge for downloading new Puppet modules for use in your environment, there are some steps for configuring multiple source control trunks to take care of all the stages of a typical module development lifecycle. And if you think your newly created module will make a good addition to Puppet Forge, there are instructions on how to upload your module.

Extending Puppet and Facter

If you want to get the most out of your Puppet deployment, you’ll probably appreciate the sections of the book talking about Puppet improvements like writing your own functions (remember, they are server side!) or custom Facter facts. There are always many different ways to make your changes or deploy custom Facts, and even if they are not shown in every single detail, there is certainly enough information to show you how things are done and help you get moving in the right direction with your Puppet infrastructure.

Using MCollective with Facter and Puppet

One of the reasons many people are buying the Pro Puppet book is the chapter talking about Marionette Collective – MCollective. It’s a message bus solution for rapid scanning of your Unix servers and for instant command execution. Instead of using SSH or a similar mechanism for connecting to each client, MCollective relies on a messaging system like ActiveMQ or RabbitMQ (both freely available online), so that all the clients listen to a queue and execute commands as soon as something relevant shows up.

The really powerful way to use MCollective is to leverage the custom facts of Facter. Essentially, this means that you abstract away from the common list of nodes and instead use specific facts about each node to compile the list you’re interested in. Instead of generating a list of hosts, you can have MCollective instantly compile a list based on the OS flavor or an environment description fact, and target your query at that list.
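The idea of addressing nodes by facts instead of by name can be illustrated with a small Python sketch. This is not MCollective's API, just a conceptual model: the inventory, fact names and values below are invented for the example.

```python
# Hypothetical inventory: node name -> facts reported by that node.
INVENTORY = {
    "web01": {"operatingsystem": "Debian", "environment": "production"},
    "web02": {"operatingsystem": "CentOS", "environment": "production"},
    "dev01": {"operatingsystem": "Debian", "environment": "test"},
}

def select_nodes(inventory, **wanted_facts):
    """Return the sorted names of nodes whose facts match every given value."""
    return sorted(
        name
        for name, facts in inventory.items()
        if all(facts.get(key) == value for key, value in wanted_facts.items())
    )

print(select_nodes(INVENTORY, operatingsystem="Debian"))
print(select_nodes(INVENTORY, operatingsystem="Debian", environment="production"))
```

The difference in MCollective is that each node evaluates the filter itself when the message arrives, so no central host list has to be kept up to date.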

Summary for the Pro Puppet book

Without a doubt, this is one of the most useful books you can find on Puppet configuration management today. Whether you’re after a high level introduction or enjoy all the possible technical details, you will find the Pro Puppet to be very relevant, highly educational and amazingly thorough about quite a number of Puppet related topics.

Puppet Configuration Management – links

Puppet Open Source – configuration management like you’ve never seen before
Puppet Forge – a growing collection of Puppet configuration modules
Facter – cross-platform library for retrieving operating system facts
MCollective – rapid query and command orchestration tool
Apache – the de-facto webserver for Unix
Phusion Passenger (aka mod_rails) – Apache module for running Ruby and therefore scaling Puppet

Practical Programming: An Introduction to Computer Science Using Python

Yet another book on Python programming. I read it back in 2014 and must say I was really impressed, as it gave me a whole set of skills that I could immediately apply. In this sense, the Pragmatic Programmers series served me well once again – this guide to Python is very practical. Seems there’s a more recent version of it, so I’ll be reading it and updating this page soon!

Check Practical Programming: An Introduction to Computer Science Using Python out!

Getting started with Python

The layout of the book ensures that you are getting foundation topics first.

First, you are shown the very basics of Python: simple math and operating with variables. There is a whole chapter devoted to working with strings, since their importance in any programming language is universally high.

Once you are more comfortable, you are introduced to a modular approach to coding software with Python: importing functionality from standard modules and defining your own modules.

Thanks to the strict syntax requirements, writing Python code is a thing of beauty. Functions and loops are self-explanatory in most cases, and writing code encourages you to organise logic into functions and to automatically document interfaces.

Before moving on to lists and flow controls, there’s a little bit about working with objects and methods. There’s even an introduction to testing and coding style – it’s very helpful to sort these things out while you’re still very new to Python.

Lists are an amazingly flexible feature in Python, and the chapter on lists is one of the most useful in the whole book. The true beauty of lists is that they are universal – for instance, you can open a file and then access its content as a Python list with the same common methods.
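Here is a small sketch of that idea (my own example, not one from the book): the lines of a file are read into a list, and from then on the usual list methods apply. The file content is made up and written to a temporary file so the snippet is self-contained.

```python
import os
import tempfile

# Create a small throwaway file to read back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("alpha\nbeta\ngamma\n")
    path = handle.name

# Read the file content into an ordinary Python list, one item per line.
with open(path) as source:
    lines = [line.rstrip("\n") for line in source]

os.remove(path)

# From here on it is just a list: append, sort and slice as usual.
lines.append("delta")
lines.sort(reverse=True)
first_two = lines[:2]
print(first_two)
```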

About halfway into the book, you are given enough info and shown enough examples to write quite complex yet elegant pieces of software.

More useful Python skills

The second half of the book covers more advanced topics and goes into greater detail.

I especially appreciated the chapters on file processing (lots of good examples) and sets/dictionaries. Both chapters were very useful for a recent automation script I wrote – parsing a mixed-structure configuration file to produce stats based on certain fields.
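A script of that kind might look roughly like the sketch below. It is not my original script: the file layout, field names and values are invented, and the only point is to show dictionaries, sets and a counter working together on semi-structured text.

```python
from collections import Counter

# Made-up sample of a mixed-structure file: comments, sections, key=value rows.
SAMPLE = """\
# hypothetical host inventory
[hosts]
web01 os=linux env=prod
web02 os=linux env=test
db01 os=solaris env=prod

[notes]
free-form text that should be ignored
"""

def field_stats(text, field):
    """Count how often each value of `field` appears in key=value rows."""
    counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("["):
            continue  # skip blanks, comments and section headers
        tokens = line.split()[1:]  # first token is the host name
        pairs = dict(tok.split("=", 1) for tok in tokens if "=" in tok)
        if field in pairs:
            counts[pairs[field]] += 1
    return counts

os_counts = field_stats(SAMPLE, "os")
distinct_envs = set(field_stats(SAMPLE, "env"))
print(os_counts, distinct_envs)
```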

The databases chapter, in my view, could have been longer and more detailed, but it still provided a very good introduction and showed all the necessary elements for you to effortlessly integrate your Python software with a database.
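To show how little code such an integration needs, here is a minimal sketch using the `sqlite3` module from Python's standard library. That choice is my assumption for the example and may differ from what the chapter uses; the table and the rows are invented.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE hosts (name TEXT, cpus INTEGER)")
connection.executemany(
    "INSERT INTO hosts VALUES (?, ?)",
    [("web01", 4), ("web02", 8), ("db01", 16)],
)
connection.commit()

# Parameterised query: the ? placeholder keeps values out of the SQL text.
rows = connection.execute(
    "SELECT name FROM hosts WHERE cpus >= ? ORDER BY name", (8,)
).fetchall()
connection.close()

print(rows)
```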

If you’re interested in object-oriented programming with Python, this book will give you a concise but relevant introduction. If you decide to learn how to add a graphical user interface to your Python software, there’s a chapter on that too. You are also given a great introduction to algorithms and basic approaches to searching and sorting routines.
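As a taste of the searching and sorting material, here is a short sketch of two classic routines, selection sort and binary search. These are my own textbook-style versions, not code copied from the book.

```python
def selection_sort(values):
    """Return a sorted copy by repeatedly moving the smallest item forward."""
    items = list(values)
    for i in range(len(items)):
        smallest = min(range(i, len(items)), key=items.__getitem__)
        items[i], items[smallest] = items[smallest], items[i]
    return items

def binary_search(sorted_items, target):
    """Return the index of target in a sorted list, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

ordered = selection_sort([5, 2, 9, 1])
print(ordered, binary_search(ordered, 5))
```

Binary search only works on sorted input, which is why the two routines are usually taught together.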

In summary, I would say this book is an excellent way to learn basics and get started with Python. Each chapter contains lots of questions and exercises for you to answer and solve. All the concepts are provided with complete and easy to understand examples.

This book should be enough guidance for writing your very first relatively complex software projects with Python. It will also be a great book if at any stage you feel like refreshing your Python skills.

Buy this book on Amazon: Practical Programming – Introduction to Python.

Book Review: Pulling Strings with Puppet

Puppet is an incredibly popular Ruby-based configuration management tool. Having gained its popularity with the open-source edition, it has become popular enough to also be offered as Puppet PE – the Puppet Enterprise edition.

Update February 2019: I’m migrating book reviews from another website of mine, so this isn’t a recent review but still a great book I think you should read if you get a chance.

Pulling Strings with Puppet is now a classic: written 10 years ago, it’s still a really good introduction to configuration management with Puppet. James Turnbull has since written an even better book (Pro Puppet) and I’ll probably find a few even more recent introductions to Puppet, but if you see Pulling Strings with Puppet on your colleague’s desk, I suggest you borrow it.

What is Puppet?

Puppet is a framework for automatic configuration management of your systems. Originally oriented towards Unix systems and servers specifically, it can now be used to manage quite a range of Unix-like systems as well as Windows environments.

Both the open source and the enterprise (paid) editions allow you to greatly optimize and automate the process of deploying and maintaining configurations of your environments via a scriptable core that can be configured to suit all of your needs.

Puppet has 400+ configurations already tested and available online on Puppet Forge – this is basically a collection of downloadable modules for automating all sorts of systems administration tasks.

The Pulling Strings with Puppet book gives a very good introduction into common tasks of configuration management and explains the multiple layers of Puppet-based solution very well: there’s a declarative language used for defining classes and modules, then there’s a transaction layer for creating and deploying updates and, finally, there is a resource abstraction layer – this piece of magic that makes it possible to use the same configuration stanza for deploying the same change to vastly different Unix-like distros.

Deploying a Puppet infrastructure

Puppet servers are called puppet masters. These are the servers which store all configurations along with a list of Puppet clients (sometimes called Puppet nodes). Clients communicate back to Puppet servers at regular intervals using a RESTful API over HTTPS, checking in and downloading configuration updates if necessary.

Self-signed certificates are a requirement for all the clients, and the way these are managed on the Puppet master side ensures that unauthorized access cannot be gained easily. You will find a good enough description of how this works in practice, so it will be very easy to get started if you’re new to Puppet.

The book goes into lots of detail when it comes to explaining the pre-requisites (Ruby, Facter) and the installation process (compiling from sources or deploying from packages). The easiest way to deploy a Puppet client or server is probably to get Ruby installed and then deploy a Puppet Ruby gem.

I really liked the table with descriptions of all the common Puppet management commands – it’s a neat little reference giving you a good idea of what’s possible and where to look for more information once you become comfortable enough.

Using Puppet for managing configurations

It all starts with the [main] configuration namespace, and before you know it the book effortlessly takes you through configuring resources and attributes and deploying classes and modules. Puppet relies heavily on the Facter framework, which is an abstract way of documenting, configuring and presenting useful configuration elements about your environments, like the OS version or the name of your Linux distro. The book gives a very useful description of the approach to using facts and configuration definitions.

A good few pages are used to make sure you will have the full understanding of what variables are possible in Puppet and how variable scoping is probably different from most of the scripting and programming languages you already know.

You will learn about virtual resources and how they need to be realized before the changes are actually applied, and finally will be introduced to various default types available in a Puppet installation (cron/exec/file/filebucket/group/package/yumrepo and quite a few others).

Sample Puppet configuration management environment

Perhaps the most valuable element of the whole book is a complete description of nodes, classes, users and groups needed to deploy and support a typical LAMP environment. In addition to sample configurations for managing nodes and users, you are given full code for modules managing MySQL, Apache and Postfix.

The language used for composing classes and modules in Puppet is pretty straightforward, but the complete examples will help you not only learn the syntax but also pick up some of the best practices when it comes to starting even the simplest of Linux configuration management environments.

For the most technically curious minds there are hints for deploying a custom set of Facter facts, and if even that doesn’t impress enough, you are given instructions for creating your own type for the most flexible resource/configuration management.

Advanced Puppet usage

The last few chapters in the Pulling Strings with Puppet book show you some more advanced challenges you’ll face if you choose to explore Puppet in depth.

In addition to getting plenty of hints for performance optimisation and scaling out of your Puppet infrastructure, you will find information about migrating nodes description information into an external storage (scripts or indeed proper datastores like LDAP).

For scaling, you’ll get basic information about Mongrel (apparently it’s a Ruby friendly webserver you can use instead of Apache) but also full configuration examples for setting up Apache as a proxy and a load balancing solution pointing to multiple Puppet instances.

Even if you are well versed in mod_proxy or mod_ajp, reading the chapter about Puppet scalability with Mongrel and Apache proxy will be time well spent – it doesn’t just show you what steps are needed for the desired configuration but gives you explanations of the typical challenges you’ll be trying to solve.

Summary for the Pulling Strings with Puppet book

I thoroughly enjoyed this book the first time I read it back in 2010 and I can still recommend it after reading it again in 2012. The majority of the topics in the book are still quite relevant and easily applicable even today, which is probably a testament to both the talents of the book’s author and the great planning/roadmapping of the founding geniuses behind the Puppet framework.

Book Review: Learning MySQL

Whether you’re completely new to MySQL databases or someone who knows it well, Learning MySQL will really help you put structure around your knowledge while teaching you a number of nice to have things about MySQL. Covering a range of topics, this book will help you understand more about MySQL installation and performance tuning and provide detailed instructions on automating simple backups and restores, all while keeping you conscious of the need to keep your installation or web application secure.

Installing MySQL on Linux, Windows and OSX

The book has an overwhelming amount of detail describing various ways of installing MySQL on a Unix-like system of your choice. For starters, multiple Linux distros are covered with the exact commands to be used. Whether you’re a Debian fan or a user of a Red Hat-like OS, it will have detailed instructions on how to update your repos and get MySQL installed.

Windows install instructions are also provided, and even if you are a proud owner of a Mac system you’ll find commands to install MySQL quickly and easily.

There is a description of the contents of the MySQL directory and even a description of how file storage differs for the MyISAM and InnoDB storage engines.

A description of various installation scenarios and post-install actions is very useful because it explains that installing from sources is slightly different from deploying MySQL from packages. Those of you who will use MySQL long enough to warrant an upgrade will find instructions on updating the database software to its most up-to-date version.

The same section of the book has a good reference on MySQL config file options – all the parameters are provided with a brief set of instructions on how and when they should be applied.

There is definitely enough information for you to get you started – that is, to have MySQL installed and ready to be used.

Basic and advanced querying using SQL

There is enough database theory given in the book to provide a solid introduction to basic database design principles. Good examples are provided on how NOT to create tables and store data, explaining how each scenario can be improved with just a few tweaks. Entities and relationships are covered in good detail, and complex topics like JOINs are well covered with good examples.

There are many examples for data manipulation – inserting and updating data and creating/updating tables.

I really enjoyed the instructions on using CSV files – for both import and export. It’s a common enough task frequently requested by users, so knowing the best ways to do it is certainly a worthy skill.
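The trickiest part of any CSV import or export is usually the quoting of values that contain the separator. The sketch below is not MySQL-specific and not from the book; it uses Python's standard `csv` module and made-up rows simply to show a round trip where a comma inside a value survives.

```python
import csv
import io

# Made-up rows; note the comma inside the last name.
rows = [("id", "name"), (1, "alice"), (2, "bob, jr.")]

# Export: the writer quotes any value that contains the separator.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
exported = buffer.getvalue()

# Import: the reader undoes the quoting; all values come back as strings.
imported = list(csv.reader(io.StringIO(exported)))

print(exported)
print(imported)
```

When moving such files in and out of a database, the same care is needed: the field separator and quoting rules on both sides have to match.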

Finally, more advanced topics are covered in a separate chapter: this is where data aggregation, advanced joins and nested queries are explained. User variables get mentioned, and the basic idea behind transactions and locking is given.

User management and security in MySQL

I quite liked this section of the book, mainly because it not only explained how privileges work but also hinted at privileges-related security improvements you can easily make. I also learned a few new commands which make life a lot easier because you run them instead of logging into MySQL with a default client and typing SQL queries.

You will learn commands for showing privileges and explore various ways of controlling the access given to a specific user. Password management is explained very well and shown with many examples.

An overview of the default MySQL installation completes this part of the book with a list of security aspects that you should particularly be mindful of.

MySQL backups and recovery

If you are looking for a quick guide into backing up and restoring your MySQL databases quickly, you’re covered: there’s a whole chapter in the book talking about exactly that. Apart from simple use of mysqldump and a few tips on command line parameters, this section provides a good deal of information about binary logs and commands for their management and analysis. Of course, instructions on using binary logs for point-in-time recovery are provided.

I was impressed with the depth of this section in the book, because (similar to the installation chapter) it provided instructions for a complete solution rather than just the element of initiating a backup or a restore. You’ll find instructions on scheduling your backups and restores, and the scenarios explain not only typical situations but trickier ones as well – like checking/repairing corrupted tables or even restoring (actually, re-creating) corrupted GRANT tables (the privileges management ones).

There is no mention of database replication, but that’s probably to keep things simple. Those of you familiar with replication will know that although it solves quite a few requirements for backup and restore, it’s really more a high availability solution than a proper backup. If you get your dataset corrupted due to an error in your software code, replication won’t be of much use as it will simply replicate your changes to all of your MySQL servers.

Tuning MySQL for maximum performance

Another immediately applicable section of the book. Starting with a review of the configurable and tunable parameters of your MySQL options file, you will be shown techniques for analysing and improving the performance of your database solution. Instructions are provided both for improving the performance of your MySQL server and for improving the efficiency of your SQL code.

I quite liked the InnoDB section, which lists the main features and explains when and why you may want to use this database storage engine.

There is also a good introduction to working with slow queries, particularly the part on defining what exactly should be treated as a slow query.

Throughout the book there are mentions of the EXPLAIN command so you will get comfortable enough with assessing the query-level performance of your database.

Using MySQL in software development

Coming from a PHP background of LAMP fame, I quite enjoyed reading the simple and easy to follow instructions on getting started – it certainly added structure to my knowledge. There are mentions of both the MySQL and MySQLi interfaces and even an explanation of their main differences.

Perl is introduced with the DBI framework in mind, and the provided examples cover the complete set of database management operations, including more complex topics like answer sets.

Session management and even implementation of DB-backed user logins are explained along with lots of smaller things which you would likely find useful in your web app development.

Full source code of a web app is given at the end of the book so if your style is to skip to that part and just tinker with the code and LAMP setup until you get it working – this section is for you.

Summary for the Learning MySQL book

Learning MySQL is a great introduction for the highly popular MySQL database server. If you have some knowledge and simply want to add structure to it – you will like the book. If you are quite new to MySQL – you will like it a lot.

If you’re going to buy this book, please use this page:

Book Review: Linux iptables Pocket Reference

I’ve just read a really useful book on iptables: Linux iptables Pocket Reference.

It’s a great reference book which is quite short but packed with more details than you’ll ever want to know.

iptables is a great way to manage all your needs when running a gateway server: proxying (transparent proxying), packet forwarding or one of the network address translation (NAT) schemes – it’s all possible via a straightforward and easy to remember interface.

The Linux iptables Pocket Reference is a really great little book, and although it was written a good few years ago, most of the explanations still apply.

Introduction to iptables

I really liked the introductory part of the book: it explains (and shows with diagrams) what IP tables and chains are and how the kernel processes them based on the iptables configuration.

Especially useful are the workflow illustrations. It is easy enough to understand how iptables works based on the functionality you’re after – for example, with NAT the workflow will be different, although the names of the chains stay the same.

Common hook points in iptables

Common hook points (INPUT, OUTPUT, FORWARD, PREROUTING and POSTROUTING) are explained in a number of tables, so that eventually it becomes obvious how and why these are named and what should be used for your specific scenario.
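As a quick memory aid, the order in which the hook points are visited can be written down as a tiny lookup in Python. This is a simplified sketch of the general packet flow, not a table copied from the book, and the three path names are labels I made up.

```python
# Simplified packet paths and the hook points each one passes through, in order.
PACKET_PATHS = {
    "incoming": ["PREROUTING", "INPUT"],                   # destined for this host
    "forwarded": ["PREROUTING", "FORWARD", "POSTROUTING"],  # routed through this host
    "outgoing": ["OUTPUT", "POSTROUTING"],                 # generated by this host
}

def hook_points(path):
    """Return the hook points traversed by a packet on the given path."""
    return PACKET_PATHS[path]

def paths_using(hook):
    """Return the sorted packet paths that pass through the given hook point."""
    return sorted(name for name, hooks in PACKET_PATHS.items() if hook in hooks)

print(hook_points("forwarded"))
print(paths_using("POSTROUTING"))
```

Reading it this way makes it clearer why, for example, a rule placed only in INPUT never sees traffic that is merely routed through the gateway.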

This book expects you to have a rather good knowledge of IP networking. If you can’t tell the difference between TCP and UDP, or if you’re not familiar with their packet structure, many options will not make much sense until you fill these gaps using some other books or online resources.

Like any pocket reference, the Linux iptables Pocket Reference will give you the right kind of information if you know where to look and also know what you’re doing. If you don’t – just skip the most technical sections until you have a very good reason to revisit.

iptables command line tools

All the necessary command line tools are mentioned and explained. This means you will know what command to use when it’s time for you to review your existing iptables setup and to make some changes to it.

I had a relatively good idea about how iptables works, but thanks to this guide I now have a better understanding – my iptables debugging skills have definitely improved.

In short, I will recommend this book if you have heard about iptables but haven’t really used it much – the introduction part alone is worth the small price of the book.

If you like this review, please buy the book using this page: