The modern world offers multiple ways and platforms to integrate information systems. While everyone is talking about new and sexy design patterns such as API-driven and event-driven architectures, one shouldn't forget that there's still a need to give some love to the old-fashioned legacy file system integration interfaces.
You might wonder how a cloud platform could support migrating your existing on-premises file system integrations into the cloud, or fulfil a future need to do so. Among other solutions on the market, Microsoft Azure has the tools for your file system integrations, and it even gives you plenty of options to choose from. Azure integration solutions do not restrict you solely to the cloud; they also let you take advantage of your existing on-premises systems. By extending from more traditional integration platforms, such as BizTalk Server, towards the public cloud, you can enjoy the many benefits of the cloud environment.
This blog post aims to give you an introduction and some ideas on how to do simple on-premises file system integrations using Azure Logic Apps. While targeted at Logic Apps, the same principles apply to Microsoft Power Automate (formerly known as Microsoft Flow) and Azure Data Factory. Ultimately, the tool you choose depends on how you value the differences and features between these services.
Azure Integration Services
Microsoft provides an iPaaS solution called Azure Integration Services, which consists of four key cloud services that you can build your integration solution with: API Management, Logic Apps, Service Bus and Event Grid. Depending on your chosen architecture and requirements, you may want to utilise all of these services or only a subset. All of them serve different purposes and have different pricing models. As our focus is on Logic Apps alone, we will cover the others only briefly.
Azure API Management is Microsoft's solution for managing APIs across clouds and on-premises, with which you can easily add features such as data protection, security and discoverability to your APIs. Service Bus is Microsoft's PaaS offering for transferring messages between different applications and services. If you have a messaging-oriented architecture, this is a must-have Azure service: it enables things like queues, topics, subscriptions, sessions, transactions and decoupling. If your applications are event-based, then Azure Event Grid is the service to spice up your integration architecture.
Logic Apps is Microsoft's cloud service for creating and automating workflows and business processes. A workflow definition consists of triggers and actions. Triggers are the starting point for running any workflow; simplified, a trigger can be a scheduled event or an event from another service or system. Actions are your tools to control the process flow, interact with data and connect to other systems by consuming a range of ready-made connectors. Actions run in sequence, and outputs from one action are passed on to subsequent actions. A sample of a simple scheduled file-handling Logic App workflow is shown in Picture 1 below.
Picture 1: Sample Logic App
Out of the box, Logic Apps gives you tons of great features introduced by the serverless service model. On the financial side, you get flexible consumption-based pricing; on the technical side, automatic scalability, fault tolerance, availability and no/low-code implementation, just to mention a few.
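To make the trigger/action anatomy concrete, here is a minimal sketch of a scheduled file-handling workflow definition along the lines of Picture 1. The connection name, action name and UNC folder path are illustrative assumptions, not taken from a real deployment:

```json
{
  "definition": {
    "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
    "triggers": {
      "Recurrence": {
        "type": "Recurrence",
        "recurrence": { "frequency": "Hour", "interval": 1 }
      }
    },
    "actions": {
      "List_files_in_folder": {
        "type": "ApiConnection",
        "runAfter": {},
        "inputs": {
          "host": {
            "connection": {
              "name": "@parameters('$connections')['filesystem']['connectionId']"
            }
          },
          "method": "get",
          "path": "/datasets/default/folders/@{encodeURIComponent('\\\\fileserver\\share\\out')}"
        }
      }
    },
    "outputs": {}
  }
}
```

The trigger fires on a schedule, and the file system connector action then lists the files; further actions (a loop over the file list, a copy to System B, a delete) would follow in the `actions` section in the same fashion.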
Firstly, let's introduce a simple scenario to set up basic requirements for our integration solution. We have an on-premises data system (System A) on a server located in a company network. System A writes files to a shared network drive that is accessible only within the given network. We also have another system, System B, where we want to transfer the data. Target System B is not necessarily part of the same network as System A, and its integration interfaces can vary. The data itself, or how it's passed into System B, doesn't matter here. We are only concerned with how to access the files provided by System A using the Azure integration platform, and more specifically Logic Apps. We will be comparing two patterns for doing so.
Picture 2: Simplified scenario
Before we can start designing and implementing our Logic Apps solution, we have to fulfil a few prerequisites. To access resources and services located in your company network, you have to set up the on-premises data gateway. Basically, you will need a physical or virtual server that has a connection to the company network. On this server, you install and configure the on-premises data gateway runtime. This service acts as a bridge between on-premises data sources and your Logic Apps. For more detailed instructions on how to set up the gateway, refer to Microsoft's documentation: Install on-premises data gateway for Azure Logic Apps.
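Once the gateway runtime is installed and registered in Azure, the Logic App reaches the share through an API connection resource bound to that gateway. As a rough, hedged sketch of what that resource looks like in an ARM template (parameter names follow the file system connector; all IDs, paths and account names below are placeholders):

```json
{
  "type": "Microsoft.Web/connections",
  "apiVersion": "2016-06-01",
  "name": "filesystem",
  "location": "westeurope",
  "properties": {
    "displayName": "On-premises file share",
    "api": {
      "id": "/subscriptions/<subscription-id>/providers/Microsoft.Web/locations/westeurope/managedApis/filesystem"
    },
    "parameterValues": {
      "rootfolder": "\\\\fileserver\\share",
      "authType": "windows",
      "username": "DOMAIN\\svc-integration",
      "password": "<secret>",
      "gateway": {
        "id": "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Web/connectionGateways/<gateway-name>"
      }
    }
  }
}
```

All folder paths used by file system triggers and actions are then resolved relative to the connection's root folder.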
When it comes to the triggering event, you basically have two choices: the traditional scheduled recurrence trigger or a filesystem event trigger. Let's see the key differences, pros and cons between these two options.
Workflow options to access files
When your workflow input is either a file or an event based on a file, you basically have two options to trigger the workflow. The first option is traditional scheduled polling, which only starts your workflow without knowing anything about the underlying filesystem integration. In this case, you define the connection later in your workflow definition by using connectors. The second option is triggering the workflow by events generated by the filesystem; in this case, your trigger is already aware of the filesystem connection. Let's review these two options more closely, along with their pros and cons.
Option 1: Recurrence trigger
As stated earlier, the recurrence trigger is your choice for creating scheduled workflows without specifying any context for possible data sources. The trigger gives you the basic selections for frequency with different interval options. For example, you can fine-tune it to run at specific times on certain weekdays.
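As a sketch of such fine-tuning, the trigger below would fire at 06:00 and 18:00 on Mondays, Wednesdays and Fridays (the weekdays, hours and time zone are example values):

```json
"triggers": {
  "Recurrence": {
    "type": "Recurrence",
    "recurrence": {
      "frequency": "Week",
      "interval": 1,
      "timeZone": "FLE Standard Time",
      "schedule": {
        "weekDays": [ "Monday", "Wednesday", "Friday" ],
        "hours": [ 6, 18 ],
        "minutes": [ 0 ]
      }
    }
  }
}
```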
- You will have full control of when the workflow is supposed to execute. You can limit polling to only those moments when you are expecting files to be available for processing. Similarly, recurrence helps if your source or target system has internal scheduled background processes that have to be taken into consideration.
- Batch-like file processing is easy to implement. You simply list all files in the source directory and process them however desired. Note that the file listing action doesn't support any wildcards or filters; all filtering, e.g. excluding folders, has to be done separately. Picture 1 is a simplified example of this kind of scenario.
- The recurrence start time setting gives you the option to deploy the workflow beforehand, and it will become active at the specified time.
- It is easy to limit workflow concurrency with a recurrence trigger. By default, a Logic App will run multiple instances at the same time if previously triggered runs haven't finished. Limiting concurrency helps prevent long-running processes from accessing the same file.
- Recurrence will trigger every time, whether there are files to be processed or not. This means that you may get workflow executions that do practically nothing. If you want to indicate your workflow's final status, you have to implement a bit of logic yourself. For example, you might want to exit with status 'Cancelled' if no files were processed.
- Related to the previous behaviour, you will always have polling events hitting your filesystem. As you may have many integration workflows accessing the filesystem through the same connections and services, this might cause performance issues. You should try to find a compromise between ASAP and resource-cost-effective scheduling. All these polling events and actions are also billable actions, so they affect your total Azure costs as well.
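Two of the adjustments above can be sketched in the workflow definition itself: the `runtimeConfiguration` block limits the trigger to one concurrent run, and a condition terminates the run as 'Cancelled' when the listing came back empty. Action names and the exact shape of the listing's body are assumptions for illustration:

```json
{
  "triggers": {
    "Recurrence": {
      "type": "Recurrence",
      "recurrence": { "frequency": "Minute", "interval": 15 },
      "runtimeConfiguration": {
        "concurrency": { "runs": 1 }
      }
    }
  },
  "actions": {
    "Cancel_if_no_files": {
      "type": "If",
      "expression": "@equals(length(body('List_files_in_folder')), 0)",
      "runAfter": { "List_files_in_folder": [ "Succeeded" ] },
      "actions": {
        "Terminate": {
          "type": "Terminate",
          "runAfter": {},
          "inputs": { "runStatus": "Cancelled" }
        }
      }
    }
  }
}
```

A run terminated as 'Cancelled' is easy to tell apart from a 'Succeeded' run in the run history, which keeps monitoring of the do-nothing polls manageable.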
Option 2: Filesystem event trigger
For more event-oriented workflows, you have two filesystem triggers to choose from. As their names suggest, “When a file is added or modified” triggers on added or modified files, and “When a file is created” on created files. Both triggers can subscribe to multiple file events, and neither includes subfolders. Adding multiple source folders is possible by adding multiple workflow triggers (note that this is not supported by the graphical designer). Only file properties are passed to the workflow, not the file content.
Both triggers have the same configuration options, of which the folder and the polling interval are the only mandatory ones. These triggers are also based on scheduled intervals, much like the recurrence trigger. The difference is that you have more control over when the trigger actually fires and causes a workflow run.
As both filesystem triggers have the same configurations, what are the key differences? Basically, how they handle duplicate and modified files. Both will pick up fresh, completely new files in the same way, but they differ in whether they pick up or skip modifications. For example, if your pattern is to always receive new files and remove them afterwards, both triggers will do the trick. But if you have to trigger on file modifications, only “When a file is added or modified” will give you the desired results.
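Under the hood, these triggers are API connection triggers that poll on a schedule, roughly along these lines (the operation path and the folder value are assumptions sketched from the connector's general shape, not verified against a live definition):

```json
"triggers": {
  "When_a_file_is_created": {
    "type": "ApiConnection",
    "recurrence": { "frequency": "Minute", "interval": 3 },
    "splitOn": "@triggerBody()",
    "inputs": {
      "host": {
        "connection": {
          "name": "@parameters('$connections')['filesystem']['connectionId']"
        }
      },
      "method": "get",
      "path": "/datasets/default/triggers/batch/onnewfile",
      "queries": {
        "folderId": "\\\\fileserver\\share\\out"
      }
    }
  }
}
```

Note the `splitOn` setting: when the poll returns several new files at once, each file starts its own workflow run rather than one run receiving the whole batch.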
- Your workflow execution history won't be bloated by triggered runs that didn't process anything. This keeps your monitoring simple.
- You can add trigger conditions to filter which file properties must match for the process to trigger. This way you can, for example, include or exclude files by name or extension.
- File metadata is available without any extra actions.
- When your workflow fails to process a file, it's up to you to resume the failed process; previously failed files won't trigger again automatically. This is not necessarily a drawback if you have to handle certain failures in a more detailed manner. Let's say that file content validation failed and you have to initiate a manual process to fix the problem. In that case, an automatic resubmit wouldn't fix the root issue and might cause extra alerts and monitoring challenges.
- Batching multiple files into one workflow execution is not automatic. You would have to implement a separate batch pattern with an additional Logic App workflow (batch receiver and batch sender).
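The trigger conditions mentioned above are expressions attached to the trigger definition; the run starts only when every expression evaluates to true. A hedged sketch, assuming the trigger body exposes the filename in a `Name` property, that accepts `.csv` files but skips ones still being written under a temporary prefix:

```json
"conditions": [
  {
    "expression": "@and(endsWith(triggerBody()?['Name'], '.csv'), not(startsWith(triggerBody()?['Name'], 'tmp_')))"
  }
]
```

Because filtered-out polls are not logged as runs at all, this filtering is both cheaper and cleaner than triggering first and branching inside the workflow.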
On-premises file system integrations are easily doable with Azure Logic Apps, and depending on your needs, the patterns are pretty simple and clear. You have many options to choose from, and within those options there are ways to make the desired behavioural adjustments. What drives your choice could be architectural alignments, best practices, business process limitations or requirements, cost-effectiveness, reliability, monitoring, or something else. As stated earlier, this post covered mainly how to trigger workflows and access files; handling file contents would be worth another post. Mixing Logic Apps with other Azure services also opens up possibilities for more advanced and flexible integration patterns.