Integration monitoring is often seen as just following up on integration errors, such as connectivity problems, but there is much more to it. Because integration connects different kinds of systems whose data quality varies, maintaining an integration does not stop once it is deployed to production and appears to work.
Data errors occur in many integrations because different systems have different requirements for data, and it is nearly impossible to prepare for every possible difference. Doing so would require knowing every connected system inside out, and even then it would be a huge effort. Some data problems also do not show up during integration execution, and it is not always clear which system a data error originated from. That is why monitoring is essential, but it does not always have to be done entirely by technical experts.
Why extensive logging is important
The most common question I get when maintaining integrations is “Did this message go through the integration?” It is surprising how many integration platforms still focus only on error handling and neglect to monitor the messages that pass through the platform without errors. Even when successful messages are tracked, finding a specific transaction is often very difficult.
Most often the connected system’s users do not know what updates have been made to a record, so all related messages need to be checked. Very often the message has gone through, but there is a problem with its content. Then the whole transaction needs to be examined: what was sent to the integration, what was done to the data inside it, and what was sent to the target system.
Few systems have good validations in their interfaces, which comes back to the problem described in the first paragraph: it is very difficult to validate data in an integration without knowing the other system inside out. If validation is too strict on the receiving end, the sending system might not always be able to produce the required data; its data structure might not require the same information because it is used for a different purpose. Because of all this, successful data transfers need to be monitored as well as errors.
How to log
Most often integrations are monitored by system administrators or other technical people. They know the integrations very well but may not be as familiar with the business side of them, and their time is best spent solving technical issues.
As mentioned earlier, the most common question from the business is whether a certain piece of data has been transferred. If the logs are easy to read and the tool for reading them is easy enough to use, these common questions can be checked by business people directly, without the delay of going through technical staff. In this approach it is important to pay attention to the usability of the monitoring solution and the level of information it shows. Leaving aside the usability of the user interface itself and focusing on the usefulness of the logged information, here are some practices I have found useful.
- Make transactions easy to find
Log an identifier from the message that identifies the data being transferred, and make monitored transactions easily searchable by it. Returning to the common question, the business might need to know whether purchase order X has been updated in the other system. In the monitoring logs you should be able to search by the purchase order number and see all transfers related to that purchase order. That should usually limit the number of log rows to an amount that is easy to go through manually to find whether certain information has been transferred.
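As a minimal sketch of this idea, the following writes each monitored event as one JSON line keyed by a business identifier and searches by that key. The field names (`identifier`, `phase`, `detail`) and the file-based storage are illustrative assumptions; a real platform would use its own log store.

```python
import json
from datetime import datetime, timezone

def log_transfer(log_file, identifier, phase, detail):
    """Append one JSON log line keyed by a business identifier.

    `identifier` is the business key (e.g. a purchase order number)
    that end users will search by.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "identifier": identifier,
        "phase": phase,
        "detail": detail,
    }
    with open(log_file, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

def find_transfers(log_file, identifier):
    """Return every logged event for one identifier, e.g. 'PO-12345'."""
    with open(log_file, encoding="utf-8") as f:
        entries = [json.loads(line) for line in f]
    return [e for e in entries if e["identifier"] == identifier]
```

The point is not the storage format but that the business key is a first-class, indexed field rather than buried somewhere inside free-form log text.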
- Make execution phases easy to see at a glance
Log the different phases of the integration: when it maps, when it filters, when there is a branch in the logic, and which part of the execution is currently running. This information is most useful to technical people, but it also helps anyone investigating what happened in the case of an error. It tells you whether the integration failed while reading the source data or while sending to the target system. If the integration has conditional logic, log which branch was selected based on the condition. That tells you, for example, whether the integration tried to perform an update even though the record does not exist in the target system.
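A sketch of this phase-and-branch logging, using Python's standard `logging` module. The integration name, payload fields and the create/update decision are hypothetical examples, not part of any specific platform:

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("order-sync")  # hypothetical integration name

def sync_order(order, target_ids):
    """Log each phase and the chosen branch of a simple order sync."""
    log.info("phase=read identifier=%s", order["id"])

    log.info("phase=mapping identifier=%s", order["id"])
    payload = {"orderNumber": order["id"], "total": order["total"]}

    # Conditional logic: log which branch was taken, so the trail shows
    # e.g. an update being attempted on a record that does not exist.
    if order["id"] in target_ids:
        log.info("phase=send branch=update identifier=%s", order["id"])
        return ("update", payload)
    else:
        log.info("phase=send branch=create identifier=%s", order["id"])
        return ("create", payload)
```

With a trail like this, a failed run immediately shows whether it stopped at `read`, `mapping` or `send`, and which branch it was in.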
- Archive messages
Archiving is a bit double-edged: it requires storage space, and GDPR means some data cannot be archived at all, or at least the archive needs to be well secured. On the plus side, it is the only way to see exactly what was received from or sent to each system, which lets you verify whether the systems have had the correct data in their use. It also makes it much easier to debug what happened in between. When the transformation involves a lot of logic and phases, it can be very useful to archive the message in the middle of the execution as well.
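A minimal sketch of such archiving, assuming a secured local directory (in practice this would likely be an access-controlled bucket or database). The stage names and file layout are illustrative; the checksum is one way to later verify an archived copy has not been altered:

```python
import hashlib
import os
from datetime import datetime, timezone

ARCHIVE_DIR = "message-archive"  # assumption: a secured storage location

def archive_message(identifier, stage, payload):
    """Store the exact message at a given stage ('received', 'mapped', 'sent').

    Archiving the mid-execution ('mapped') copy makes it possible to see
    what the transformation did. Access to this location must be
    restricted if the payload contains personal data (GDPR).
    """
    os.makedirs(ARCHIVE_DIR, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%f")
    path = os.path.join(ARCHIVE_DIR, f"{identifier}_{stage}_{stamp}.json")
    with open(path, "w", encoding="utf-8") as f:
        f.write(payload)
    # A checksum lets you verify later that the archived copy is untouched.
    return path, hashlib.sha256(payload.encode("utf-8")).hexdigest()
```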
- In case of errors
Error messages should contain an integration ID or some other reference to documentation where more information about the integration can be found. Make the error report as precise as possible, but multilevel. The first thing you see should be a simple message stating where the error happened and what happened; typically this is the higher-level exception message or the message a connected system returns.
This is also a guideline for creating APIs. In error cases they should return a multilevel error message: an easy-to-read overall error and a more detailed error message in a separate field. The first is for a quick check of whether this is a known error or a simple data problem, and it should be readable to a non-technical person. The detailed message is for the more difficult cases and developer-level examination; it can be the stack trace of an exception or some other detailed description of the error.
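The two-level error body described above could be sketched like this. The field names (`error`, `detail`, `integrationId`) and the integration ID value are assumptions for illustration, not a standard:

```python
import json
import traceback

def error_response(summary, exc=None):
    """Build a two-level error body: a plain-language summary for a
    quick check, and a technical detail field for developers."""
    body = {
        "error": summary,                  # readable by non-technical users
        "detail": None,                    # stack trace or technical detail
        "integrationId": "ORDER-SYNC-01",  # hypothetical reference to docs
    }
    if exc is not None:
        body["detail"] = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    return json.dumps(body)
```

A support person reads only `error`; a developer opens `detail` when the summary is not enough.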
- Use information levels
It is useful to assign levels to logged events. Most events are non-essential to everyday maintenance and can be ignored until you actually need to find something among the successful transfers. My usual system consists of four levels: Info, Debug, Warning and Error.
Info level covers anything needed for a basic check of what the integration has done: the start and end of the execution, the phases the integration goes through, and the identifier used to find what has been transferred. Debug is for technical information about the execution and its phases; here you can log, for example, data that affects how the integration makes decisions. This level can usually be disabled in production and is most useful when testing the integration.
I use the Warning level for errors returned by other systems. In these cases the integration has technically run successfully, but there is an issue with the end result. These are the errors most likely to be found in production, and usually this is what a business person will see when a transaction they have been looking for has failed. The Error level is reserved for situations where the integration itself fails to do what it is supposed to do. The most common example is a connection error, but any other kind of exception belongs at this level too.
The difference between Warning and Error in my system is that with a Warning the integration has technically completed, but an outside system returned an error or the data was caught by a validation or some other check. Connection errors could arguably fall into that category, but most integration platforms throw an exception at that point, so they can be handled as Errors as well, since the transaction has not been completed.
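The four-level scheme maps directly onto Python's standard logging levels. In this sketch, `send` stands in for a hypothetical call to the target system returning `(ok, message)`; the payload shape is likewise assumed:

```python
import logging

log = logging.getLogger("order-sync")

def send_to_target(payload, send):
    """Illustrate the four levels: Info for the normal trail, Debug for
    technical detail, Warning for a rejection by the target system,
    Error for the integration itself failing to complete."""
    log.info("sending order %s", payload["orderNumber"])
    log.debug("outgoing payload: %s", payload)  # usually off in production
    try:
        ok, message = send(payload)
    except ConnectionError as exc:
        # The integration failed to do its job -> Error level.
        log.error("connection failed for %s: %s", payload["orderNumber"], exc)
        raise
    if not ok:
        # The run completed, but the target rejected the data -> Warning.
        log.warning("target rejected %s: %s", payload["orderNumber"], message)
    return ok
```

With this split, a production dashboard filtered to Warning shows the data problems business people ask about, while Error is reserved for the failures technical staff must fix.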
Moving towards better monitoring
Traditionally, logs have been stored in places only technical people can access. Connecting to a server and opening and reading text files takes time, and the logs are usually a huge wall of text in which it is hard to find what you are looking for. Writing from experience, it is much more efficient to check problems in a central logging system with a web UI, even for a developer.
Its true value shows when people other than technical experts can also check their issues directly in the monitoring platform. This usually requires integrating the logs into the central logging system and better-defined logging practices that make the logs easily searchable and visually clear. It slightly increases the work needed to implement logging but saves a lot of time in integration support. It is the modern way of monitoring logs and, in my opinion, how it should be done.