Book Review: Text Processing in Python

This is a review from 2012 of the book Text Processing in Python by David Mertz.
It’s written for a specific niche and expects an intermediate level of Python knowledge. Still, Text Processing in Python contains quite a few introductory elements that make reading the book worthwhile even for Python beginners like myself.

Basic Python: techniques, patterns and standard modules

The first 100 pages of the book provide a great reference on Python basics. After presenting a number of common problems and showing how elegantly they can be solved the Python way, the book gives an overview of the standard modules for doing pretty much anything in Python.

As a fairly recent convert to macOS, I was really impressed to learn about so many Mac-specific Python modules. Really cool!

Working with strings in Python

The second part of the book shows you all sorts of string operations in Python. There are lots of examples and a number of exercises at the end of this part.
This is where the book really shines, with a number of practical problems and their solutions. You get both useful pointers to the API and examples to follow.
Just like in other sections of Text Processing in Python, this section has quite a bit about standard modules that help you to work with strings. This is a great way to learn things because flicking through 3-5 pages of function names and short descriptions gives you a very strong foundation of what’s immediately possible.

Master your Regular Expressions!

A whole part of the book talks about regular expressions. I really liked that the introductory section covered three grades of complexity – basic regular expressions, intermediate and advanced ones. This shows a natural increase in complexity, allowing you to decide what level is comfortable for you.
Lots of common tasks are described, each with a fully explained example, so this is again a very useful addition.
There’s a good explanation and comparison of the various modules for working with regular expressions, so if you have some regexp background from another programming language, you’ll probably find something familiar. The modules are split into groups that show how different versions and optimisations work, introduce you to simple pattern matching (fnmatch) and finally let you use the fully-fledged regular expression modules (pre, re).
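To illustrate the step from simple pattern matching to full regular expressions, here is a minimal sketch of my own (not taken from the book) using only fnmatch and re from the current standard library. The file names are made up for the example.

```python
import fnmatch
import re

# Hypothetical file names, used only for illustration
names = ["notes.txt", "report.pdf", "todo.txt", "README"]

# Shell-style wildcards: simple, but limited to *, ? and [seq]
txt_files = fnmatch.filter(names, "*.txt")

# fnmatch.translate() turns a wildcard pattern into a regular expression string
wildcard_re = re.compile(fnmatch.translate("*.txt"))
txt_files_re = [name for name in names if wildcard_re.match(name)]

# A full regular expression can also capture parts of each match
stem_re = re.compile(r"^(?P<stem>\w+)\.(?P<ext>txt|pdf)$")
stems = []
for name in names:
    match = stem_re.match(name)
    if match:
        stems.append(match.group("stem"))

print(txt_files)
print(txt_files_re)
print(stems)
```

The wildcard version is easier to read, while the regular expression version is the one to reach for when you need to pull pieces out of the matched text.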

Parsers and State Machines

Parsers and state machines are a great topic. This book gives a good overview of what’s involved and guides you almost step-by-step through building your own parsers for different kinds of input. Even as an academic exercise, learning about grammars and parsing states will greatly help your understanding of how typical parsing is done.
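To give a flavour of the idea, here is a very small line-based state machine. It is my own sketch rather than an example from the book, and the BEGIN/END markers and sample text are assumptions chosen for illustration.

```python
def extract_blocks(lines):
    """Collect the lines found between BEGIN and END markers."""
    blocks = []
    state = "outside"   # two states: "outside" and "inside" a block
    current = []
    for line in lines:
        text = line.strip()
        if state == "outside":
            if text == "BEGIN":
                # Transition: start collecting a new block
                state = "inside"
                current = []
        else:
            if text == "END":
                # Transition: close the block and go back outside
                blocks.append(current)
                state = "outside"
            else:
                current.append(text)
    return blocks


# Hypothetical input, made up for this example
sample = """header
BEGIN
alpha
beta
END
noise
BEGIN
gamma
END
""".splitlines()

blocks = extract_blocks(sample)
print(blocks)
```

Each line is handled according to the current state, and only the marker lines cause a transition. Real parsers have more states and richer rules, but the overall shape is the same.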

Using Python to access Internet resources

Finally, there’s a section on the internet, which talks about sending email and working with IMAP/POP3 mailboxes, and also shows you different ways to access online documents. As in other parts, there are lots of synopses of internet-related Python modules and lots of examples. The introduction to schemas and XML is also useful if you’re going to continue learning in this direction.

Full-text examples of using Python

Perhaps one of the best sections of the book for me was Appendix A, which gives a great short review of Python. All the most important things that one should know about this programming language are there, covered in great detail over just 50 pages or so.
I also enjoyed reading Appendix B, which gives a primer on text compression. It was very interesting to refresh my knowledge of various compression algorithms and then to see how text compression can be approached using Python.
Overall, I would say this is a good book. It’s rather specialised so not everyone should buy it (certainly, not every sysadmin who’s trying to learn Python), but I have found quite a few useful and readily applicable examples there.
