Which info should be written to a log file

No application is perfect. All applications have bugs, unexpected and strange behaviors. They might be a small thing but there are some users who can’t use the desired functionality due to the bugs. Logging is important to know what happened.

However, we, at least I, often faced the problem that the necessary information is not in the log file. It might be hard to know the necessary information in advance but there is also important and helpful basic information. Let’s take a look at them and introduce them to our application log if they have not been included yet.

Log level

To write information to our log file, it is necessary to categorize the message. I think it is basically the following.

Fatal
Error
Warn
Info
Debug
Trace

It depends on the project whether Fatal/Trace is used or not. By the way, the two are not used in my work project. It is easier to see the log by categorizing the messages.

Let’s check what should be included in each level.

I wrote examples to show which information in which level but it depends on the application. Consider which is the proper level for the information for the application. This is just guidance.

Fatal

The fatal level is generally used when an error crashes the application or the application can no longer work without the fix. Since this level depends on the application, I don’t write any examples here.

Error

Most exceptions except for the Fatal error should be written with the Error level. This error doesn’t cause an application crash but a feature doesn’t work.

We should catch all the exceptions that we already know that occur in some cases. For example, a file not found, network connection error, missing configurations, and so on.

It depends on the application which information to be written with this level.

Warn

Let’s consider about warning level. It is used when an error happens but the process can proceed and the feature works by automatic recovery.

Something is invalid but default value can be used

There are many cases that default values/settings can be used. Let’s have a look at the examples below.

File not found

If the application needs to load a setting file for the user preference but the file doesn’t exist even though the user gives the file path, it can instead load a default setting file. In this case, file not found should be written to the log with the Warning level.

Invalid or unsupported value

A user or client application could give us an invalid value. Our application can deny the value but there are some cases that we want to return the default value instead of an error. For example, our application guesses the current locale and wants to apply it to the timestamp but the locale is not supported. A default format should be used in this case.

In those cases, the used value should be written in our log file.

Version mismatch

If our application communicates with other modules, it could receive unexpected value due to the mismatch of the version. However, the structure could be different but the parse function might not throw an error. In this case, the process works without error but might behave a bit strangely. If the application has the dependency that should be the desired version, write the version mismatch info to the log.

Connection lost but reconnected again

Connection loss occurs depending on the network and the device status. The application might have an automatic connection retry system. If the application is trying to reconnect to another system and it succeeds, the connection lost should be written to the log with Warn level.

If the connection retry failed, it should be written with an Error level.

However, in my opinion, this handling should be done only if the retry period is short enough. If the retry period is 30 seconds, for example, it is too long. In this case, it’s better to write the connection lost with the Error level.

It can be categorized in info level but I added it here because connection loss is not a normal state.

Info

Info level is used to know how far the program proceeds. It shouldn’t contain error-related info. It should not be too detailed. Which information should be here?

Versions

Our own application version

The application version is one of the most important things. If we need to investigate a root cause of a bug, the application version is absolutely needed. The code where we are looking at might be the same for the recent versions but other classes where it interacts with can be different. The root cause is sometimes tangled and hard to find. Make sure that our application writes the version to the log file.

Application version that we interact with

This is also important. If the version is not what we expect, the structure of a response can be different. It might contain extra property, be the same date-time but different format, and so on. If the version is old, the feature we implement might not be supported.

Plugin or dependency version

Some applications might have a plugin system. The additional things can be added at run time or at least after the compilation. In this case, the plugin can cause an issue for some reason. It’s better to have the version in our log.

If the application is dependent on the other module, write the version if possible. It’s not necessary if the dependency is determined at compile-time, e.g. using npm by Node.js.

Mode and user settings

There might be admin mode, user mode, or something else. Those modes can influence behavior. That’s why we introduced them. Without knowing the mode, it might be difficult to find the root cause.

Many applications allow users to set the settings according to their preferences. There might be too many settings. I recommend writing the info to a different file in this case because the settings are not often updated and the information could be removed after a while if the application runs a long time.

Information for the procedure

If an error occurs, we write it to our log file. Doesn’t it write anything if it works correctly? It should write something even if it works as expected. The error log is not always written if an uncaught error occurs or the application crashes. Info level log is helpful in this case because we know how far the program goes forward.

However, logging for every line is too much. It depends on the team or the developer how often a log should be written but it’s better to have a standard.

A feature consists of multiple functions. It means a top-level function calls multiple functions in it. How about logging for each function call or end of the function before returning the result? If we write the log in this way, every developer doesn’t have to consider where to put it.

If there is other necessary information in the function, it should also be written to the log file.

Debug

Normally, debug info is not written to a log while it’s running in production mode. However, it’s hard to find the root cause with the info/warn/error levels. In this case, debug mode can be helpful for further investigation.

We can write anything with Debug level if it is helpful for the development but I have two examples here.

Received and sending data

If the application communicates with other modules, write the received data and send data to the log. The received message might be different from what we expected. The data can make the log unreadable if it’s written every time. Therefore, it should be written at Debug level.

Keep alive information

The application might have a keep-alive system for the communication module. If a monitoring system works in a different application, this is not necessary to write to the log.

This is helpful if the application sends data to another module and waits for the response but the module doesn’t send the response back to our application. Our application might block the main thread and the log is no longer written for the main thread.

Keep alive must work on a different thread, it should write the state of the module to the log. We can at least know the programs are still running if the keep-alive system is working on a different thread. The problem is perhaps data itself. It couldn’t be handled correctly somewhere.