Against Development Conveniences

In software development, strong typing is good and succinctness is often bad

As I have never been shy about sharing, I have a strong preference for strongly typed programming languages. I’ve taken particular aim at Python, a language I believe is past its prime. I won’t dwell on this, but I will quote my favorite article on the subject:

[T]his is oh-so much fun! Trying to apply a method to an object instance returned by a library? BOOM stacktrace in your face because you forgot to check if the object could be None! Using a function hastily ported from Python2 to Python3 which is now returning strings or bytes? Too bad you'll only find about this at runtime.

In general, scripting languages are compelling for beginners and smaller projects, and for good reason. They’re interpreted. They don’t require you to deal with types very much. Almost everything is presented as mutable. In these ways they abstract away concepts that are less accessible to beginners (like compilation, static arrays, and so on), with the goal of creating an environment that’s easier to jump into. These are specific manifestations of a general appeal to convenience.

I make the case below that, as projects scale, these conveniences provided by the language are deceptive and actually make ongoing development less convenient. Whether or not your language of choice demands it, it’s tremendously important to keep your types in line, document projects proactively, and impose opinionated standards. Otherwise, large projects become piles of competing-philosophies-as-code, and therefore unmaintainable. In other words, our valuations of these conveniences are subject to hyperbolic discounting: the benefit is immediate and the cost is deferred, so we overvalue them. They are not your friend.1

As evidence for my case, below I detail two things people like a lot and explain why they are bad.

Against loose typing

Strong typing is great. If this function is supposed to return a string, it’ll always return a string. If it’s supposed to return an object with an ID and two strings, it’ll do that too. If you violate this standard, you can’t compile the code at all and you have to go fix it.

The downside is that sometimes this is a headache. What if I don’t know the structure of the data an API endpoint will return to me? What if there’s a situation where a function should either return an int or a float, and there’s a really good case for differentiating the two?2 Decent strongly typed languages will have ways of dealing with this, but it’s way more annoying than the loosely typed approach. For the API endpoint example, differentiating between two possible return objects is as simple as “Store the response somewhere; check if this property exists; if it doesn’t, check if this other property exists…”

Languages like Go, by contrast, make this a lot more tedious with interface{} and map[string]interface{} values and the type assertions that go with them, to say nothing of reflection and the like. The extra time spent fiddling with these is an up-front cost imposed on the developer.

But, for this up-front cost, you earn worthwhile guarantees that help avoid runtime errors.
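For contrast, here is a minimal Python sketch of paying that up-front cost deliberately, even in a loosely typed language. The response is decoded once, at the boundary, into one of two explicitly defined types. Anything else fails immediately instead of surfacing later as a missing-attribute error. The shapes `OrderCreated` and `OrderRejected`, their fields, and `parse_order_response` are hypothetical and exist only for illustration.

```python
import json
from dataclasses import dataclass
from typing import Union


# Hypothetical response shapes; the field names are assumptions for illustration.
@dataclass(frozen=True)
class OrderCreated:
    order_id: int


@dataclass(frozen=True)
class OrderRejected:
    reason: str


OrderResponse = Union[OrderCreated, OrderRejected]


def parse_order_response(raw: str) -> OrderResponse:
    """Decode a response body once, at the boundary, into an explicit type."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")

    order_id = data.get("order_id")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(order_id, int) and not isinstance(order_id, bool):
        return OrderCreated(order_id=order_id)

    reason = data.get("reason")
    if isinstance(reason, str):
        return OrderRejected(reason=reason)

    # Neither known shape matched: fail here, not deep inside the program.
    raise ValueError(f"unrecognized response shape with keys {sorted(data)}")
```

Downstream code then branches with `isinstance(result, OrderCreated)`. A static checker such as mypy can narrow `OrderResponse` on that check, but Python only enforces this if such a checker is actually run.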

The convenience offered by loose typing is great when you individually understand the problem you’re solving and its implementation. There is no problem in the ideal case when you’re maintaining your own work: you code according to your own preferences, guided by your own beliefs about what is convenient. You comment as much as you individually need to make sure you remember what function X does.

But it is easy to overestimate how well those comments will translate to the next maintainer, or how clear the intent behind each returned value really is. And since bugs are generally unknown unknowns, you can’t predict where fixes will be needed, and therefore which parts of what you’ve written will have to change after the fact.

Type-hinting function arguments and return values gets us most of the way there, but it doesn’t eliminate the risk that intermediate values inside those functions are confusingly typed. Consider the following:

def handle_new_order(order_id: int, amount: int) -> int:
    sales_order = db.get_order(id=order_id)
    
    # ... some logic

    return amount_successfully_handled

What type is sales_order? What if I need to do something more complicated with it? I sure hope that db.get_order() follows the same type-hinting standards and constructs its return object correctly!3

Being forced to define one’s types and objects and match them up exactly before runtime may feel inconvenient, because it seems needless in a simple case like this one. The point is that those strict definitions will be a huge quality-of-life improvement as the project grows. It’s great to know the exact data you’re working with without having to hunt for it.
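Python will not force this on you, but here is a minimal sketch of opting in. It assumes `db` can be replaced by an object whose `get_order` declares its return type, so that `sales_order` has a known type a static checker such as mypy can verify at every use. `SalesOrder`, its fields, and `OrderStore` are hypothetical stand-ins rather than a real database API, and the handling logic is a placeholder.

```python
from dataclasses import dataclass
from typing import Dict, Optional


# Hypothetical order record; the fields are assumptions for illustration.
@dataclass(frozen=True)
class SalesOrder:
    order_id: int
    amount_cents: int
    customer_email: str


class OrderStore:
    """In-memory stand-in for the article's `db`; not a real library API."""

    def __init__(self) -> None:
        self._orders: Dict[int, SalesOrder] = {}

    def add(self, order: SalesOrder) -> None:
        self._orders[order.order_id] = order

    def get_order(self, order_id: int) -> Optional[SalesOrder]:
        # The signature states up front that a missing order comes back as None.
        return self._orders.get(order_id)


def handle_new_order(db: OrderStore, order_id: int, amount: int) -> int:
    sales_order = db.get_order(order_id)  # declared type: Optional[SalesOrder]
    if sales_order is None:
        return 0
    # From here on, sales_order is known to be a SalesOrder.
    # Placeholder logic: handle at most the order's recorded amount.
    return min(amount, sales_order.amount_cents)
```

Reading `sales_order.amount_cents` before the `None` check would be flagged by the checker before runtime, which is exactly the question the untyped version leaves open.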

Against succinctness

In general, succinctness is valued and considered to improve readability. I believe this intuition is only partly true.

It’s nice when you can write values = [1, 2, 3, 4, 5] at the top of your file and call it a day. Surely that’s much better than var values = [5]int{1, 2, 3, 4, 5} to achieve the same end?

It would be disingenuous to consider these statements identical, though. The complexity of each statement is directly proportional to the amount of detail it provides to the interpreter or compiler:

  • The first statement says “this is a list of values which initially contains 1, 2, 3, 4 and 5.”

  • The second statement says “this is a list of integer values which will always contain exactly five members; initially it contains 1, 2, 3, 4 and 5.”

In other words, the more succinct a statement is, the less information it tends to communicate.

There are at least two types of readability: here I call them idiomatic readability and explicit readability. The first statement is idiomatically readable: it is understood on the surface what the statement is doing (creating a list of numbers) and it can be inferred what this list may be used for—even if it’s possible, it’s obvious you probably shouldn’t put a string in that list. If it’s a constant with a relatively higher scope, perhaps you should think twice before modifying it at all. And so on.

The second statement is explicitly readable. It is context-free: you do not need to know anything about how it is used to understand that you cannot put anything but an integer in it and you cannot increase or decrease its size. If you tried, the code would fail to compile. In this sense the more verbose statement communicates much more information and provides the reader useful guarantees. For instance, it would be pointless to search for places where the array was resized or given a non-integer member; if there were any, the code wouldn’t compile.
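Python cannot express Go’s fixed-size array exactly. A rough analogue, sketched here with that caveat, is a tuple with a fixed-length type annotation. The name `try_to_grow` is a hypothetical helper used only to show the runtime difference.

```python
from typing import Any, Final, Tuple

# Idiomatic: "a list of values". At runtime nothing stops a later
# values.append("six").
values = [1, 2, 3, 4, 5]

# Closer in spirit to Go's [5]int: the annotation says "exactly five ints",
# which a static checker such as mypy enforces before the code runs, and a
# tuple cannot grow or shrink at runtime. (Unlike a Go array, its elements
# cannot be reassigned either.)
FIVE_VALUES: Final[Tuple[int, int, int, int, int]] = (1, 2, 3, 4, 5)


def try_to_grow(container: Any) -> bool:
    """Return True if appending worked, False if the container refused."""
    try:
        container.append(6)
    except AttributeError:
        return False
    return True
```

`try_to_grow(list(values))` returns True, while `try_to_grow(FIVE_VALUES)` returns False. The guarantee is still weaker than Go’s: the annotation is only enforced if a checker is actually part of the build.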

To me, the usefulness of the second statement vastly outweighs the increased verbosity.4 I disagree with the commonly held belief that Python is more readable than Go, for example, because I believe idiomatic readability is an insufficient standard—I want to learn as much as possible per unit time about how a component of a project is built and used, and that means understanding the precise nature of any relevant data without having to hunt for it or try to infer it through context.

And so we reach a similar conclusion to the prior section in a different way.

Discussion

It is on these two points that I find the most disagreement with other software engineers, e.g., “What do you mean that you like verbosity?” or “How can you possibly think Go is more readable than Python?”

I like to avoid anything implicit, and I avoid relying heavily on comments or docstrings to understand code. You will forget. You will leave the company eventually. The next maintainer won’t understand the comments, or they will become outdated. It’s easy to underestimate the size of the problem you can create this way.

Things that feel convenient often won’t be in a year or two. The fundamental logic of a program will be as complex as your use case requires, whichever language you write it in. What differs is whether the language forces you to stay engaged with the precise nature of the data you’re working with (e.g., Rust) or doesn’t (e.g., Python). I prefer the former.

1

Because I took aim at Python, I should add this caveat: there are many smaller or younger projects with good opinionated standards, merges blocked behind good linters, and so on; these seem mostly exempt from my concerns. I’m instead taking aim at projects that embrace the conveniences I discuss here too strongly.

2

I came across this in a past project related to some finance code. It was a function designed to standardize output between two modules: one that stored monetary values in integer cents, and an older one that dealt in floats. If the function received a float, it knew to turn it into an int at a certain degree of precision before the value was used for anything. It would’ve been better to refactor the whole old module to deal with ints, but there were dozens of fiddly execution paths including lots of exit points—again, it was old!—so it was nice to have a sane final conversion at the end.
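A minimal sketch of that kind of normalizing function follows. It assumes, though the footnote does not say so, that ints hold cents and floats hold dollars. It also picks half-up rounding to the nearest cent as the “certain degree of precision.” The name `to_cents` is hypothetical.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Union


def to_cents(value: Union[int, float]) -> int:
    """Normalize a monetary value to integer cents.

    Assumptions for illustration: ints already hold cents (the newer
    module's convention) and floats hold dollars (the older module's).
    """
    if isinstance(value, bool):
        # bool is a subclass of int; reject it rather than treat True as 1 cent.
        raise TypeError("bool is not a monetary value")
    if isinstance(value, int):
        return value
    # Go through str() so 19.99 becomes Decimal("19.99") rather than its
    # binary approximation, then round half-up to a whole cent.
    cents = (Decimal(str(value)) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)
```

Whatever reaches this function, callers downstream only ever see an `int`.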

3

And this is without getting into how many different types of exceptions db.get_order() is capable of throwing, checking whether the object is None, and being careful to put attempts to access properties inside if blocks in cases where the object might not be structured as intended.

4

Of course, both statements here fit on one line; there are myriad examples of worse verbosity than this. I chose this example for its simplicity, but I’d argue that my position holds even when it results in, say, twice as many lines of code, especially in a compiled language, which will execute faster anyway.