A recent reddit post about a library making it easier to write scripts has prompted me to spend this Sunday morning taking a stand. I’m currently a senior Python data engineer on a rather large project involving about 4 junior developers. Having spent 3 years on this project, there are a few conclusions I have about reached developing libraries and command-line interfaces and configuration settings and here they are:
Every production-quality Python program has 2 requirements out of the box – logging and configuration (via multiple sources)
With Traitlets, my application instance has both logging and configuration out of the box. When you “write a script” or use the tool that spurred me to write this post you have a few things missing:
- Your application will not scale as you have more configuration settings – all I have to do in Traitlets is add another trait to my application instance and immediately that setting can be configured via the command-line. But not only that, that same setting can be configured via configuration files or OO programming.
- You still do not have a Logging instance at your disposal and must create support for that – every production application must have copious logging so that when things go wrong, you can trace execution and figure out what went wrong and who to blame for it. Every Traitlets application instance has a log instance out of the box.
A well-written application forms HAS-A relationships to the things it manipulates
In retrospect, both the settings and relational database parts of our application did not properly use Inversion of Control/Dependency injection. If they had, then our application would HAVE a relational-database-thing that it manipulates as well as other THINGS could have their state logged via the central logging instance provided by Traitlets.
I realize this section may seem a bit obtuse or confusing. But suffice to say: when you have people saying things about your database code, it is best that you have logging in place showing what your database calls are doing in detail so that you can prove that the problems are not your fault…. our database code was written as a module and thus lacks a logging instance – it is pure SQLAlchemy. It should have been written as a class that had logging available to it. And there should’ve been copious logging in it.
The Object-Functional divide is a problem in Python applications
Adding business concerns to a section of purely functional code poses a problem. Well-written functional code is automatically decomposed and self-documenting. In addition, any failure should be fixable from a stack backtrace. I would argue that as long as all calculations in functional code (ie, non-OO code) are decomposed into functions of less than 10 lines then OO application concerns do not apply.