
The hardest truth about software is that its underlying code easily becomes overly complex. A system consists of many components interacting with each other, and it is hard for a single person to keep everything in their head. Sometimes there are just too many paths to follow. Sometimes the code is just not well organized and anything can change the global state.
In short, software professionals know very well that reading code is far from enough to understand the functioning of a software system, or even to find errors in it. Yet, many developers think that, after writing some code, running a couple of manual tests is all it takes to consider it done.
Nothing could be further from the truth. No real-world software system can be fully understood in all its corner cases. Sometimes even the main paths of software usage are broken in non-obvious ways. We have to live knowing that the software we build contains bugs, and it is only a matter of time before they are discovered.
Does this mean that we have to give up on doing our job properly because perfection is not achievable? Not at all! As developers, we cannot realistically aim for error-free code, but we can aim to continuously improve the reliability of our systems over time.
How can we do it if reading the code won’t get us very far? Well, the clear answer is: exercise your code! Most errors are found in running software. The more a software system is used, the more errors are found, and the more of them can be fixed.
However, there are many ways to exercise code that should be taken into consideration during the development cycle.
Different Ways of Running Your Code
There are at least four ways to exercise your code, each with its pros and cons, and working at a different scale:
Developer’s manual tests
Automated tests
QA manual tests
Production
All of them are useful in their own way, and skipping one of them may make us blind to some weaknesses of our software product.

Developer’s Manual Tests
This is you, the developer, running your code to see how it behaves under real usage, and perhaps to check that the latest edits work correctly without breaking anything that already exists.
This kind of testing is usually time consuming because the software starts from scratch and several paths need to be tried out, particularly if the software’s response times are not instantaneous because of network latency or other pieces taking non-negligible time. As a result, it seldom amounts to extensive testing, and it is not very useful for finding non-obvious bugs.
Automated Tests
There are various flavors of automated tests: unit tests, smoke tests, integration tests, end-to-end tests, and more. They all have different purposes, as their granularities span from a single function to the entire system, and the running time increases accordingly. What they have in common is that they can be run while developing, so as to get (relatively) fast feedback on the current status of the system. Continuous integration (CI) servers allow us to run the tests for each commit on a separate machine, so that our own machine is kept free for our tasks. The feedback is generally quite fast if the tests are properly designed, and they let us make sure that functionality known to work has not suddenly broken (a regression). Additionally, when writing automated tests we should think of edge cases that can break the correct functioning of the system.
Having a good number and variety of automated tests is a nice way to routinely exercise our code, but the tests have to be designed for high coverage, edge cases, and execution speed.
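As a minimal sketch of what such a test looks like, assuming pytest and a hypothetical parse_duration helper (both the function and the file name are made up for illustration):

    # test_parse_duration.py — exercises a happy path and two edge cases
    import pytest

    def parse_duration(text: str) -> int:
        """Parse a duration like '90s' or '2m' into seconds."""
        if not text:
            raise ValueError("empty duration")
        value, unit = text[:-1], text[-1]
        multipliers = {"s": 1, "m": 60, "h": 3600}
        if unit not in multipliers:
            raise ValueError(f"unknown unit: {unit!r}")
        return int(value) * multipliers[unit]

    def test_happy_path():
        assert parse_duration("2m") == 120

    def test_edge_cases():
        # Edge cases are where the non-obvious bugs usually hide.
        with pytest.raises(ValueError):
            parse_duration("")
        with pytest.raises(ValueError):
            parse_duration("10x")

Tests like these run in milliseconds, which is what makes it practical for a CI server to execute them on every commit.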
QA Manual Tests
We said that a developer will mostly try the system paths that validate the code they just wrote. External testers, on the other hand, have not written the code themselves and approach the working system with different, fresh eyes. Maybe they are QA professionals, or very early adopters, or people hired for alpha or beta tests. In all cases they will use the system differently from its developers and will find bugs to which the development team is blind.
This type of testing is especially common in the gaming industry and in highly regulated industries, where some errors can be very dangerous for the business or its users.
Production
The approaches above are all ways to exercise our code before the real thing: production. The real value of a system is seen when it is actually deployed, and we had better come prepared for it. Buggy software can kill a company. Yet deployment to production is not the end of the story; it is the beginning. When our software runs in production, it is exercised (hopefully) as never before. Thousands, even millions, of users will use it in ways that ten people on the team could not even imagine.
Strange software usage is also the main topic of Lizard Optimization, a way to come up with new features by analyzing unexpected user behavior.
The famous Murphy’s law can also be read this way: with thousands of users, an event with a 0.1% probability of happening has a good chance of coming up. Across 1,000 independent uses, the chance of seeing it at least once is 1 − 0.999^1000, roughly 63%. Anything that can go wrong will go wrong.
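A quick back-of-the-envelope check of that number in Python (the figures are purely illustrative):

    # Probability that a 0.1%-likely event shows up at least once in 1,000 uses
    p = 0.001
    n = 1_000
    p_at_least_once = 1 - (1 - p) ** n
    print(f"{p_at_least_once:.0%}")  # prints 63%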
Is that a reason to despair? Of course not! Many systems are in production with millions of monthly users, and they are not collapsing under their own weight. The reason is that they embrace software failure. Instead of hoping that nothing will go wrong, they put tooling in place to provide observability of the running software: running in production is one more opportunity to learn about software defects.
Testing in production can really be a superpower when you know how to do it properly. That means having tooling in place to see what errors come up, monitoring system performance, and being ready to do something about it.
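As a minimal sketch of that idea, using only Python’s standard library (a real system would ship these records to a log aggregator or an error tracker; the payment example is made up):

    import logging

    # One line per event, with a timestamp: easy to grep locally,
    # and easy to ship to a log aggregator in production.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
    logger = logging.getLogger("payments")

    def charge(user_id: int, amount_cents: int) -> None:
        try:
            ...  # call the payment provider here
        except Exception:
            # logger.exception records the full traceback, so a production
            # failure becomes something we can learn from, not a mystery.
            logger.exception(
                "charge failed user_id=%s amount_cents=%s", user_id, amount_cents
            )
            raise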
Exercise Building Code
Knowing that our software works is important, but before that, we should know that anyone who needs to build the code can actually do it. Here we can define the build as the process that gives us a working artefact to run the software, be it compiled binaries or a virtual environment including all dependencies. For software written in Python, we can even have a two-stage build: first, extensions written in languages such as C++ or Rust are compiled, and then a virtual environment is created.
When we write our code and build it on our machine, we install everything needed to make the build successful. Most of the dependencies will be in the local environment: think of a Python virtual environment created for the project, or local dependencies installed with cargo for Rust. These are the “obvious” dependencies, as they are written explicitly in a dependency file that can be used during the build process. pyproject.toml and setup.py are two popular files for listing the dependencies needed during a Python build. Other languages have similar files: Cargo.toml for Rust and go.mod for Go are just two more examples.
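For instance, a minimal pyproject.toml declaring explicit dependencies might look like this (the project name and version pins are illustrative):

    [project]
    name = "audio-tools"
    version = "0.1.0"
    requires-python = ">=3.10"
    dependencies = [
        "python-ffmpeg>=2.0",  # explicit: any installer can resolve this
    ]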
However, sometimes dependencies are not explicit. Even worse, you may not think about them at all while listing dependencies because they were already installed in your system when you started working on the project.
Let’s suppose we are writing some software that processes audio files, and we thus have python-ffmpeg as a dependency. We install it and work with it seamlessly. Then, when we distribute it, some of our colleagues tell us that it is not running for them because of some missing dependency.
What can that be?
It turns out that python-ffmpeg does not install ffmpeg; it just assumes it is already on your system. And since ffmpeg is a popular tool for working with audio, you happened to have it installed already.
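A cheap defensive measure, sketched here with Python’s standard library, is to fail fast with a clear message when the implicit dependency is missing:

    import shutil

    # python-ffmpeg drives the ffmpeg binary but does not ship it;
    # checking up front turns a cryptic failure into an actionable message.
    if shutil.which("ffmpeg") is None:
        raise RuntimeError(
            "ffmpeg not found on PATH; install it from https://ffmpeg.org "
            "before running this program"
        )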
Having implicit dependencies is a big issue that can make installing your software a frustrating experience for fellow developers or software users.
What can we do about it? After all, how can we know, among all the software installed on our computer, what is affecting our builds?
Fortunately, the solution is already quite popular and consists of isolating the build environment from the rest of the system, so as to have full control over the installed software.

This is where containerization enters the picture and brings a lot of value! The most popular container engine is Docker. We can add one or more Dockerfiles to our project, each creating a different Docker image that represents our software, ready to run and packed in an environment with all the needed dependencies installed. We can also choose the operating system, regardless of what we use for the actual development.
A Dockerfile includes all the steps needed to create the environment in which to build our software, so it is exactly what we need. Before every release, we create a new image that builds our software, and we release only if the image creation is successful. The image can also be distributed, and can even be the main means of distribution. This way, our users will not need to build our software at all: if they have Docker installed, they can run everything from there. As a bonus, Docker containers have the same isolation advantages as local virtual environments: no dependencies installed at the global level and thus no conflicts.
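Continuing the audio-processing example, a Dockerfile along these lines would make the implicit ffmpeg dependency explicit (the base image, paths, and module name are illustrative):

    # Build and run the audio tool in an isolated, reproducible environment
    FROM python:3.12-slim

    # Install the ffmpeg binary that python-ffmpeg silently relies on
    RUN apt-get update \
        && apt-get install -y --no-install-recommends ffmpeg \
        && rm -rf /var/lib/apt/lists/*

    WORKDIR /app
    COPY pyproject.toml ./
    COPY src/ src/
    RUN pip install --no-cache-dir .

    CMD ["python", "-m", "audio_tools"]

If docker build succeeds on a clean machine, the dependency list is complete; if it fails, we have found a build bug before any user did.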
Docker lets us automate the building stage of the software lifecycle. The more we run the automated build, the more we know that this stage works. It is another aspect of exercising your software.
Being able to create an image is really the baseline for producing installable software. If we cannot install the software we wrote in a reproducible way, how can we expect users approaching it for the first time to have better success than we did?
Conclusions
The main takeaway of this article is that software needs to run. When it runs we find issues and we can fix them. Not running the code can give us a false sense of security.
You want your code to be reliable? Then run it. A lot. The more users it has, the more edge cases will be explored, and this is where reliability is really proven. Engineers should see testing and using the software as parts of a continuum, with different stakes but equal opportunities to find errors.