Bad Abstractions

Mar 3rd 2022 by Will Gant

We all like abstractions. We use them every day to make sense of the world. Sometimes those abstractions can be useful, like saying "hey, if we have interchangeable parts, our factories are more efficient". However, we have to be careful, as history is littered with bad abstractions and "just so" stories to explain the world that led to horrible things. Many other abstractions just lead to dead ends and mediocrity. In technology, everything sits on a pile of abstractions, whether you are talking about the application server level, or are writing assembler code to work directly with a processor. We either stand on the shoulders of giants or we don't stand at all.

And therein lies the problem. Because we all rather quickly recognize the value of abstraction, we all have a pretty strong bias towards building them. While this instinct can be useful, it really needs to be moderated to some degree by careful thought, as not every abstraction is reasonable, especially once you stack them together. We often hear about the perils of premature optimization and how wasteful and destructive it can be, but people often miss out on how abstraction is a subset of optimization. In this case, abstraction is an optimization of the time and attention it takes to get something done in code.

If you've been around the block a few times, you have probably worked on at least one codebase where someone got a little "happy" with abstracting things. If you've been around the block a few times and don't remember this problem, you may have been the source, btw. Either way, bad abstractions can double or triple the amount of time required for even simple tasks, and can also foster disagreements among staff members (especially if the person who wrote the abstraction is still there and still proud of it). Bad abstractions are like personality defects in that they also accumulate piles of other abstractions to deal with them.

Good abstractions are required in software development. Literally none of this stuff would be possible without abstractions. We should all strive to create more abstractions that actually work. However, it can be tempting to create abstractions for things you don't fully understand, for things that are likely to change, or simply because "it's better practice". You really need to rein in that impulse sometimes. An abstraction is like a character in a story: no matter how much you like it, it can ruin the whole thing if it shouldn't be there. Bad abstractions, when left alone, will often accrete bad code around them, simply as a way to try to fix them. You need to be able to either catch yourself before you write a bad abstraction, or identify it after the fact and have the will to remove it.

Episode Breakdown

Abstracting for "What we're gonna need"

Predicting the future is harder than you think - if you were that good at it, you'd own the tech company rather than work there. Say you are building a tool for pulling content from Twitter. Do you abstract immediately because you think you'll soon do the same for Instagram, TikTok, Facebook, YouTube, etc.?

This would be a premature abstraction. Those sites have different kinds of content, different kinds of rules around the content, and fairly disparate APIs. And you don't even know if you are going to need the others in the first place. If you start out with an abstraction, you make your Twitter integration harder, and you either spend a lot of time in design, or your design has to be reworked when you add one of the others.
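As a rough sketch of what starting concrete might look like, the snippet below handles only Twitter and makes no attempt at a shared "social source" interface. The `Tweet` type, the `parse_tweets` function, and the payload shape (`user` and `text` keys) are hypothetical stand-ins for illustration, not the real Twitter API.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    author: str
    text: str


def parse_tweets(payload: list[dict]) -> list[Tweet]:
    """Convert raw items (hypothetical shape) into Tweet objects.

    Deliberately Twitter-only: no base class, no plugin registry.
    A shared interface can be extracted later, once a second
    integration exists and shows what is actually common.
    """
    return [Tweet(author=item["user"], text=item["text"]) for item in payload]
```

If an Instagram integration shows up later, you will have two working examples to compare before deciding what, if anything, deserves a shared abstraction.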

Abstracting for similar things that have a high chance of later divergence

This one comes down to predicting the future as well. However, this time, you have two or more disparate, but fairly similar, systems that you are trying to combine. Say you are making a library for interacting with digital assets that you've stored, and you currently support file system storage, S3, Azure Blob Storage, and storage in a SQL Server database. You might be tempted to abstract, because the code using these assets doesn't care where they come from.

This would be an incorrect abstraction. The means of retrieving these assets vary, as do the options (and problems) that are available in each case. For instance, you have a different security landscape with S3 than you do with SQL, and both SQL and your local filesystem can run out of space.

While these things may initially look the same, even a trivial implementation is likely to already have some divergence, and is likely to diverge further over time. For instance, with the S3 implementation, you may want the assets to disappear after a certain amount of time, which is easy in S3 but is going to require more code for the other implementations.
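The expiry example can be sketched roughly as follows. Everything here is hypothetical: `AssetStore`, `FakeS3Store`, and `FakeFileStore` are in-memory stand-ins (no real S3 or file system calls), and the `expire_days` parameter is an assumed way the S3-only feature might leak into the shared interface.

```python
from abc import ABC, abstractmethod
from typing import Optional


class AssetStore(ABC):
    """Shared interface that has started to absorb a backend-specific option."""

    @abstractmethod
    def put(self, key: str, data: bytes, expire_days: Optional[int] = None) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class FakeS3Store(AssetStore):
    """Stand-in for an S3-backed store, where expiry is assumed to be cheap."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}
        self.expiry: dict[str, int] = {}

    def put(self, key: str, data: bytes, expire_days: Optional[int] = None) -> None:
        self.objects[key] = data
        if expire_days is not None:
            self.expiry[key] = expire_days  # just recorded; the service would enforce it

    def get(self, key: str) -> bytes:
        return self.objects[key]


class FakeFileStore(AssetStore):
    """Stand-in for a file system store, which has no built-in expiry."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}

    def put(self, key: str, data: bytes, expire_days: Optional[int] = None) -> None:
        if expire_days is not None:
            # The shared signature promises something this backend cannot do.
            raise NotImplementedError("file store has no built-in expiry")
        self.files[key] = data

    def get(self, key: str) -> bytes:
        return self.files[key]
```

Callers that were supposed to not care where assets live now have to know which backend they are talking to before passing `expire_days`, which is the divergence showing through the abstraction.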

Abstracting to deep hierarchies

This problem occurs when you start visualizing your abstraction as being a set of layers of other abstractions. It's most commonly found in OOP-based systems, but you can do it in any paradigm. The real issue happens because as your object hierarchy gets deep, you'll find similar things on different branches and will be tempted to push them down to the least common ancestor in the hierarchy. However, it may not make sense at that level, or in other things that branch from there.

This one is a bit harder to explain by example, at least without extensive exposition, but a common sign of this problem is an excessive number of methods that throw the equivalent of a NotImplementedException because those methods are only expected to be used by "inheriting" types.

This sort of abstraction creates excessive overhead for anyone wanting to use the rest of the abstractions in a group (usually a class) by forcing them to implement things that may not be useful for what they are trying to do, or by forcing excessive reading of documentation to figure out what base methods are actually usable.
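A small, hypothetical illustration of that warning sign is sketched below. The `Asset`, `MediaAsset`, and `ImageAsset` classes and the `unusable_methods` helper are invented for this example; the point is only that the root type has collected methods that most descendants cannot honor.

```python
class Asset:
    """Root type that has absorbed methods from several branches."""

    def load(self) -> bytes:
        raise NotImplementedError("only loadable subtypes implement this")

    def resize(self, width: int, height: int) -> str:
        raise NotImplementedError("only image subtypes implement this")

    def transcode(self, codec: str) -> str:
        raise NotImplementedError("only video subtypes implement this")


class MediaAsset(Asset):
    """Intermediate layer that adds nothing but depth."""


class ImageAsset(MediaAsset):
    def load(self) -> bytes:
        return b"image-bytes"

    def resize(self, width: int, height: int) -> str:
        return f"resized to {width}x{height}"


def unusable_methods(obj: Asset) -> list[str]:
    """List the base methods that the object's class never overrode."""
    names = ["load", "resize", "transcode"]
    return [n for n in names if getattr(type(obj), n) is getattr(Asset, n)]
```

Here `ImageAsset` still advertises `transcode`, and a caller only finds out it is unusable at runtime or by reading the documentation.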

Abstracting to wide hierarchies

Along with deep hierarchies, wide hierarchies are also a problem. In this case (typically, but not always, in OOP situations), you have a single base type that is inherited from by a large number of other types. The base provides some very commonly used functionality. The issue here occurs when one of the objects inheriting from the base needs to inherit from something else as well. In many frameworks, you won't be able to do this, while in others it's discouraged because it introduces additional maintenance issues.

Typically functionality that is used by a large number of classes belongs in a class of its own, rather than in a base class. This simplifies testing concerns, and makes it less likely for a small change to ripple through your codebase. This is especially apparent when the functionality in the base class is really more of a cross-cutting concern, like logging, caching, instrumentation, and the like.

Abstraction for cross-cutting concerns

Cross-cutting concerns should typically not be addressed in a base class. Rather, they should be handled in a separate class that is (usually) made available via dependency injection. Putting them in the base class essentially marries your code to a particular implementation.

The problem that occurs here is that it becomes much more difficult to switch out implementations. Additionally, it likely means that your code is much more tightly coupled to a particular implementation. These problems typically become apparent when a small configuration change in a cross-cutting concern breaks things all over the app. Most parts of your application should be agnostic with regard to things like logging implementation, so when they aren't, it's a sign that something is wrong.
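A minimal sketch of the dependency injection alternative might look like the following. The names `Logger`, `ListLogger`, and `OrderService` are made up for illustration, and the constructor-injection style shown is just one common way to do it, not the only one.

```python
from typing import Protocol


class Logger(Protocol):
    """Anything with a log(message) method satisfies this."""

    def log(self, message: str) -> None: ...


class ListLogger:
    """Simple logger that collects messages in memory, handy for tests."""

    def __init__(self) -> None:
        self.messages: list[str] = []

    def log(self, message: str) -> None:
        self.messages.append(message)


class OrderService:
    """Receives its logger instead of inheriting one from a base class."""

    def __init__(self, logger: Logger) -> None:
        self._logger = logger

    def place_order(self, item: str) -> str:
        self._logger.log(f"placing order for {item}")
        return f"order:{item}"
```

Because `OrderService` only knows about the `log` method, swapping the logging implementation means passing in a different object, with no change to the service or to any base class.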

A common way to try and deal with this lack of flexibility (without actually fixing it with something like dependency injection) is to add more code to the base class that lets inheritors configure it. This is almost always a terrible idea.

Abstraction with heavy configuration

When you find yourself in a hole, the first and most important thing to do is to stop digging. When you've made a bad abstraction, you should prefer getting rid of the abstraction to keeping the abstraction and configuring away the problems. The configuration will only grow.

This sort of complexity often appears because a particular abstraction is no longer addressing a single area of concern - rather, it's addressing two or more, and not everything that uses it needs all of them. This is especially common in situations where people use inheritance for common functionality instead of composition. Over time, this means more configurability and more hooks into execution get pushed into the base class, eventually resulting in a bloated mess. This configuration is also very confusing, as it can be very unclear how much needs to be overridden and how much is built in by default.
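To make that concrete, here is a hedged sketch of a base class that has grown opt-out flags. `BaseExporter`, `CsvExporter`, `DebugExporter`, and the flag names are all hypothetical; the steps are recorded as strings rather than performed, so the example stays self-contained.

```python
class BaseExporter:
    # Each flag exists because some subclass needed to opt out of shared behavior.
    log_calls = True
    validate_rows = True
    use_cache = True

    def export(self, rows: list) -> list[str]:
        """Return the steps that would run, in order."""
        steps = []
        if self.log_calls:
            steps.append("log")
        if self.validate_rows:
            steps.append("validate")
        if self.use_cache:
            steps.append("cache")
        steps.append(f"write {len(rows)} rows")
        return steps


class CsvExporter(BaseExporter):
    use_cache = False


class DebugExporter(BaseExporter):
    # Turns everything off: a hint that it never wanted the base behavior at all.
    log_calls = False
    validate_rows = False
    use_cache = False
```

Reading `DebugExporter` alone tells you nothing about what it does; you have to cross-reference every flag against the base class, and each new subclass tends to add another flag.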

Abstraction for swanky syntax

This one is increasingly common, especially with things like fluent interfaces. In essence, you break out an abstraction so that code is pleasing to the eye and expressive. While this is wonderful when applied properly, this abstraction can easily box you in, making changes more difficult than they would otherwise be.

Common use cases for this sort of abstraction are things like query builders, object factories, configuring classes, and the like. Common problems with this sort of abstraction are that it can be unclear what order methods should be called in or what exactly they do (versus what they are named), and that they make testing with mocks extremely messy.
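The ambiguity is easy to reproduce with a tiny, hypothetical query builder. The `Query` class below is invented for this example and is not modeled on any particular library; it builds a SQL-like string purely for illustration.

```python
class Query:
    """Toy fluent builder; each method returns self so calls can be chained."""

    def __init__(self, table: str) -> None:
        self._table = table
        self._filters: list[str] = []
        self._limit = None

    def where(self, condition: str) -> "Query":
        self._filters.append(condition)  # repeated calls accumulate
        return self

    def limit(self, count: int) -> "Query":
        self._limit = count  # repeated calls silently overwrite
        return self

    def build(self) -> str:
        sql = f"SELECT * FROM {self._table}"
        if self._filters:
            sql += " WHERE " + " AND ".join(self._filters)
        if self._limit is not None:
            sql += f" LIMIT {self._limit}"
        return sql
```

Nothing at the call site tells you that `Query("users").limit(5).where("age > 21")` produces the same result as calling `where` first, or that a second `where` adds a condition while a second `limit` replaces the first one. You only learn that by reading the implementation.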

Such abstractions can also be rather challenging to modify, debug, and test. That doesn't mean that they are necessarily bad (we all have some that we like), but rather that they are always bad when you build them without the planning and attention to detail that they truly require.

Tricks of the Trade

Not everybody understands abstractions. You will work with people, even some developers, who are concrete thinkers and are not able to grasp the concept of abstraction. What's interesting is when you meet people who grasp abstractions in one area, like development, but not in others, like social life or interactions with others. On the other side, you have people who abstract too much and will apply that beyond where it is useful, trying to abstract people and business.