Synthetic Transactions and Capability Monitoring of your Enterprise Architecture

Back in my days at Lehman Brothers, I was introduced to the concept of “Synthetic Transactions”. That is an automated action that is scheduled to execute periodically to monitor performance and availability of one of more components in your enterprise architecture.

Most architects will use SNMP, and simple pinging of servers, routers, networks, etc, and monitoring things like Disk Space, CPU Usage and Memory Usage. Pretty much anything that can be recorded via HP OpenView / HP BTO (Business Technology Optimization) I believe this is ok for infrastructure monitoring, but for application monitor, which I believe gives you a better view into the health of your Enterprise Architecture, that matters to the real users and clients, Synthetic Transactions are far more superior.

Synthetic Transactions go further than simple network or infrastructure monitoring and it goes further than even simple application performance metrics monitoring with say a tool like ITRS’s Geneos. A Synthetic Transaction is really about testing the capabilities of your systems and applications from the view point of a end user or a calling client system, to ensure that the system is available with the capabilities and performance profile agree upon by the contract set in your requirements.

Synthetic Transactions are not always easy to implement, and great care must be put into planning the inclusion of Synthetic Transactions from the beginning of system design and architecture analysis and should be part of Non-Functional Requirements.

Also in terms of Information Security, and Intrusion Detection, Synthetic Transactions are a way to start implementing the next phase of network defenses. As you all know in today’s work, firewalls are no longer sufficient to keep the hackers out of your systems. More and more hackers have already turned to attacking specific application weaknesses instead of going after the raw network infrastructure as the infrastructure was the first and easiest way for organizations to shore up their security.

While Synthetic Transactions won’t prevent cyber attacks, or increase security by themselves, the detailed level component monitoring and performance metrics collection that Synthetic Transactions provide can potentially help identify applications or components of applications that are under attack or have been compromised due to potential performance or application behavioral issues caused by hackers attacking your applications.

Microsoft has a good outline of what a Synthetic Transaction is, although they related it to their Operations Manager product, the general information is valid regardless if you use a tool or develop your own Synthetic Transaction Agents. Specifically Microsoft states in this article: “Synthetic transactions are actions, run in real time, that are performed on monitored objects. You can use synthetic transactions to measure the performance of a monitored object and to see how Operations Manager reacts when synthetic stress is placed on your monitoring settings. For example, for a Web site, you can create a synthetic transaction that performs the actions of a customer connecting to the site and browsing through its pages. For databases, you can create transactions that connect to the database. You can then schedule these actions to occur at regular intervals to see how the database or Web site reacts and to see whether your monitoring settings, such as alerts and notifications, also react as expected.”

Another good definition however more of just a summary than what Microsoft outlined, is available on Wikipedia in the Operational Intelligence article, specifically the section on System Monitoring where they state: “Capability monitoring usually refers to synthetic transactions where user activity is mimicked by a special software program, and the responses received are checked for correctness.”

Although, Wikipedia does not have a lot of direct information about Synthetic Transactions, I do like their term “Capability Monitoring”, which is exactly what Synthetic Transactions attempts to do, monitor the capabilities of your system at any given moment, to give you, your developers and your operations support staff a dashboard level view into how your system is performing and what components are available and their through the performance measures, what is the health of each of your system’s components and therefore the overall health of your system and applications.

Back at Lehman, and if you look at the Microsoft description, most times a Synthetic Transaction focuses on a single aspect of the System; for example, checking if you are able to open a connection to a database. While this is a valid Synthetic Transaction, it is extremely simple, and may not provide you with enough information to tell if you application is actually available from an end user or client system standpoint.

What I developed as a model for Synthetic Transactions back in 2006, was they ability for my Transaction to interact with multiple-tiers of my architecture, if not all tiers.

The application which I was developing Synthetic Transactions for was a Reference Data system that included a Desktop and Web base Front Ends, a JavaEE (J2EE at the time) based Middleware, a Relational Database, a Workflow Engine, and a Message Publisher, among other various supporting components such as ETL processes, and other batch processing.

The most useful test in this case would be one that touched the Middleware, interacted with the workflow engine, retrieved data from the database and potentially updated test records, and had those test messages published and received by the Synthetic Transaction Agent to verify the full flow of the system.

Creating the Agent:

To create the Agent that would initiate the Transactions, I used a Job schedule such as Autosys or Control-M to schedule the process to kick off every couple of hours to collect metrics (Since the application was a global app used 24 x 7, it was important that the application was not only available but was performant around the clock, and we needed to be alerted if the application was performing out of an acceptable range, and which component was affected).

The Agent itself was a client of the middleware. Since all services such as the Database and the Workflow Engine were wrapped by the middleware, we could have the agent invoke different APIs that would perform a Database Search and record metrics, and call an API that would create a Workflow request, and move it automatically through the workflow steps.

At the end of the workflow, we were able to trigger the messaging publisher to broadcast a message. Since our Data Model allowed for Test records, and we built into our requirements that consumers generally filter out or otherwise ignore Test records in the message flow, we were able to send out test messages in the production environment that would not affect any of our downstream clients.

However, our Agent process could start up a message listener and listen for test records specifically. The Agent then by recording the start time of the workflow transaction to the receive time of the test record message, could calculate the round trip time of data flowing through the system.

Each individual API call from invocation to return can also be timed to test how each different API was performing.

In terms of ETL, since the Data Model again allowed for test records, we were able to create a small file of test records and trigger the ETL process as well to load the test records. The records in the database would be updated, in some cases with just a timestamp update, but it would still be a valid test, and valid metrics can still be collected.

Together this gave us good dashboard view of the system’s availability and performance at a given time. If we wanted to increase the resolution all we had to do was decrease the period between each job start of the Agents.

We recorded the metrics in a database table, and created a simple web page, which production support teams could use to monitor the Synthetic Transactions and their reported metrics.

On a side note: If your APIs and libraries are written in Java, and already record metrics that your developers used for debugging, and Unit Testing, you can expose these directly via JMX, which can be accessed and used directly if your Synthetic Transaction Agent process(es) are also written in Java. Or you can create a separate function or API that returns the internal metrics recorded by your libraries, frameworks and API deployments.

A number of years ago, I developed a Performance Metrics object model and small set of helper functions for Java that I have been using for over a decade and I find that even today they are still the most useful performance metrics I can collect. Perhaps I will write up an article on collecting performance metrics in the applications you develop and share that simple object model and helper functions.

Automated alerts, such as paging the on call support staff could also be accomplished by simply specifying how many seconds or milliseconds a call to an API should take, and if that period is exceeded, the Agent would send out emails and paging alerts.

In the end a lot of organizations have a Global Technology and Architecture Principal that mandates all their applications have some sort of automated system testing.

This can be accomplished by using the Synthetic Transaction paradigm.

It is worth noting that creating an architecture that supports Synthetic Transactions is not simply. You need to ensure that all components, especially your data and information models allow for test records.

A way around the information model requirement is to Rollback all transactions on your database instead of committing them. This would force you to have a flag or special API separate from the normal data flow in your system to ensure data is not permanently written to your database. However, the issue here, is if you implement it this way, you cannot have a true end to end flow in production of test records. Still you will be able to get most of the metrics you need.

Also if you organization only mandates a certain level of automated testing or performance and availability monitoring, than perhaps true end to end data flow through your system is not required.

It is my experience however, that even if my company I work for does not mandate true end to end testing, as a responsible application owner, I prefer to have the capabilities to have true end to end data flow testing available to me, so I can monitor my systems more accurately and give proper answers to stakeholders when users and client systems complain about performance or system availability.

Just Another Stream of Random Bits…
– Robert C. Ilardi
This entry was posted in Architecture. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s