Craft your logs carefully

Logging is one of the simplest ways to leave a trace of what your code was doing at a certain point of time — even though it’s a rather short-lived, evanescent trace that usually disappears after just a couple days. Logs are usually classified with levels that describe the nature of the event that developers wanted to record:

  • Debug: for highly specialised information about the state of the system, hardware integration, intermediate operations… with the goal to help developers dive into the internals of the code behaviour. Usually these logs are hidden or even completely omitted in the code, due to the large quantities that can quickly be produced, often being redundant with the other log levels.
  • Info: for high-level, contextualised information about the core state and operations, the “business logic”. If relevant, it contains the necessary information for identifying the subject matter of the log (for example, the book identifier in a library management system). While they are very useful for understanding what happened in a concrete case, most informational logs will never be read by a human.
  • Warn: for abnormal state or behaviour that, if prolonged, might have a negative impact on the user, the infrastructure, or the code itself. These logs should be reviewed and carefully considered by developers. I try to minimise the use of this level and instead decide if my log is an error or just informational.
  • Error: for cases that “shouldn’t have happened”. Attention from the developers is required as soon as possible - but logs should never trigger alerts or pages!

Choosing the level is sometimes the first decision that developers make, when wishing to record a log. Or at least, it’s very often the first thing that developers type. In that moment, it’s important to ask oneself what level of visibility does the new log deserve. A good rule of thumb to aim for “lower” levels (i.e. prefer warnings over errors, and info’s over warnings), because “higher” levels come with a cognitive cost. As I mention in my post about reacting to error logs, I believe that developers should strive for a zero error policy, so that error logs are seen as an abnormality that needs fixing. Otherwise, there’s a risk of normalising the errors and consequently devaluing them.

The next and equally important decision is the log message - the words that we actually want to record. I prefer concise but descriptive messages, including all the relevant information for identifying the subject matter. Try to look around for similar logs, it’s usually a good idea to stick to the same format and conventions. For example, if you see error logs like [account-activation] Failed to send activation message for userId {userId} with an exception attached to the log, adding a message like this would feel dissonant: error while sending message for {userId}, exception: {exception.toString()}. Notice the following dissonances:

  • Redundant “error” mention, which takes away precious space from the log message
  • Missing “account activation” tag, which would have allowed to easily filter by the relevant use case
  • Assumption that the reader knows that the identifier is a “user identifier”, which increases the cognitive barrier to benefit from the log
  • Exception added as part of the message body, which clutters the message and make the exception hard to read

Disclaimer: I made up that dissonant log message, luckily I’ve never encountered such an extreme case in the wild!

Logs are sometimes accompanied by optional fields that provide additional context around them. For example, exception stack trace for error logs, response time for access logs, trace identifier for logs on a distributed trace, user identifier for business transactions initiated by users, etc. Having those bits of information available makes a big difference when investigating unexpected behaviours, attempting to understand what happened. However, be mindful if introducing sensitive data or personally identifiable information - incorrect use of data might breach directives like GDPR or other local laws for data protection. On top of that, information exposed in the logs could be used as an attack vector against your systems, if the access is compromised. So it’s a good idea to have a strict access policy and also establish a data retention period in order to avoid keeping unnecessary information. If you have the need to hold log information for auditing purposes, consider an event sourcing architecture or persisting the relevant events into a database, for example.

Given the criticality of some logs (particularly error ones) for debugging and detecting issues building up, I recommend asserting in your tests that logs are actually logged. It will verify the correct behaviour of the logging library and, perhaps more importantly, reveal the criticality of logs that shouldn’t be changed or removed without proper consideration.

In summary, these are my suggestions when it’s time to add a new log:

  • Prefer “lower” log levels
  • Be consistent with the rest of logs when writing your message
  • Write a concise but descriptive messages - avoid redundant information or making assumptions
  • Include optional log fields when appropriate
  • Be mindful about exposing sensitive data through logs
  • If critical for detecting issues, assert that the log is actually logged

And perhaps most importantly: don’t be afraid to challenge your logging conventions and speak up if you think the logs around you could be improved. Having meaningful logs is a blessing for debugging issues and tracing actions!