Load and performance analysis is often done only at the moment it is needed: the moment customers are walking away from a sluggish, poorly responding system. Tools for profiling the system to find the bottleneck or that one slow statement are put in place too late. It is better to start performance monitoring and load testing from the beginning. With the cloud tools available, it is even fun.
Structuring the load and performance monitoring and analysis practices of your team is the only answer. The good news is that this is easy now that the cloud is here. So easy that a team member quietly confessed he liked it (I promised not to mention his name in public).
A little bit of up-front thinking is needed. Monitoring and analyzing are different things: the goals differ, and so do the tools you use.
This post explains production monitoring, release validation, and profiling with Azure and Visual Studio Team Services.
Production Monitoring with Azure Application Insights.
When monitoring the production environment, you want to check the availability and throughput of several key scenarios while gathering metrics about the system. You want to check the health of the system and its behavior under real usage.
You want to get alerts when specific thresholds are crossed, so you can take action before the system falls apart.
With Azure this can be done via Azure Application Insights, from where you can monitor the system on different levels.
Azure Availability Tests
The availability of the system can be monitored via pings and via scenarios, which gives you a good idea of the general response time the user experiences. A scenario is a Web Performance Test written or recorded with Visual Studio Enterprise.
Azure Application Insights availability tests with Visual Studio web tests as scenarios give you a good indication of the system behavior. For example, in this configuration the LoadReport web test takes 5 seconds on average, but sometimes it peaks at 20 seconds: a candidate for more investigation. Drilling deeper into the web test steps, you can see the request and response details.
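To make the "average versus peak" observation concrete, here is a minimal Python sketch. It is not an Application Insights API. The durations are made-up values, and the function name and the factor of 3 are assumptions chosen for illustration. It uses the median as the typical duration, so a single peak does not distort the reference value.

```python
from statistics import median

def find_peaks(durations_s, factor=3.0):
    """Return the typical (median) duration and all runs slower than factor * typical."""
    typical = median(durations_s)
    peaks = [d for d in durations_s if d > factor * typical]
    return typical, peaks

# Hypothetical durations in seconds for one web test over several runs.
load_report_runs = [4.0, 5.0, 4.5, 20.0, 5.5, 4.0, 5.0, 4.0]

typical, peaks = find_peaks(load_report_runs)
print(f"typical duration: {typical}s, runs worth investigating: {peaks}")
```

A run that ends up in the peaks list is the kind of candidate you would then drill into via the web test steps.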
Setting a validation rule for timing, or any other validation, in the web test together with a failure alert in Azure keeps you posted when scenarios go outside their timing boundaries or give the wrong response. See this post on setting rules: http://www.benday.com/2013/08/19/validation-extraction-rules-for-visual-studio-2012-web-performance-tests/
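The idea behind such a validation rule can be sketched in a few lines of Python. This is only an illustration of the logic, not the Visual Studio web test API: the StepResult type, the validate_step function, and the thresholds are all hypothetical names and values.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    """Hypothetical record of one executed web test step."""
    name: str
    status_code: int
    duration_s: float
    body: str

def validate_step(result, max_duration_s, expected_text):
    """Return a list of failure messages; an empty list means the step passed."""
    failures = []
    if result.status_code != 200:
        failures.append(f"{result.name}: unexpected status {result.status_code}")
    if result.duration_s > max_duration_s:
        failures.append(
            f"{result.name}: took {result.duration_s}s, limit is {max_duration_s}s"
        )
    if expected_text not in result.body:
        failures.append(f"{result.name}: response does not contain {expected_text!r}")
    return failures

# Example: a step that returns the right content but is too slow.
slow_step = StepResult("LoadReport", 200, 20.0, "<h1>Monthly report</h1>")
for message in validate_step(slow_step, max_duration_s=8.0, expected_text="Monthly report"):
    print(message)
```

Any non-empty result is what would mark the test as failed and, combined with an alert, notify the team.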
Azure Application Insights availability tests with Visual Studio web tests are great for continuous monitoring: how does my system behave for specific scenarios, and how does it behave after a new release? You can also make an ad hoc web test to monitor an experiment or a feature flag implementation before you open it up to the world.
The analytics data an Azure Application Insights availability test produces is very useful. The setup is very easy, because you don’t need to customize the system in any way. Simply run the web test. The analytics data stays outside the system. It can answer questions such as: which call was the slowest?
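Answering "which call was the slowest?" comes down to ranking the steps of a scenario by duration. The following Python sketch shows this on made-up data. The step names, timings, and the function name are assumptions for illustration, and the data shape is not the export format of Application Insights.

```python
from collections import defaultdict
from statistics import mean

def rank_steps_by_duration(runs):
    """Average each step's duration over all runs and sort slowest first."""
    durations = defaultdict(list)
    for run in runs:
        for step_name, duration_s in run.items():
            durations[step_name].append(duration_s)
    averages = {name: mean(values) for name, values in durations.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

# Hypothetical step timings (seconds) from two runs of the same scenario.
runs = [
    {"GET /login": 0.4, "POST /report/load": 4.8, "GET /report/view": 1.1},
    {"GET /login": 0.5, "POST /report/load": 5.6, "GET /report/view": 0.9},
]

ranking = rank_steps_by_duration(runs)
slowest_name, slowest_average = ranking[0]
print(f"slowest call: {slowest_name} ({slowest_average:.2f}s on average)")
```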
In contrast to the availability tests, client-side monitoring gives you real user data: information about what the users are doing on your system and how the system behaves from the client-side perspective.
Client-side Application Insights gives a good indication of how the system is used and how it behaves. You can track whether users are experiencing any errors or slow page loads.
It is real user behavior, while an availability test scenario is recorded user behavior. This difference also makes the analytics different. With client-side Application Insights, it is hard to say what impact a change within the system has on performance; you can only see how users are using it and what they experience. That is still valuable information, but for a different monitoring category.
Moving deeper into insights: capture server-side Application Insights.
When you also want information about the behavior of the system on the server, you need to implement and use the Application Insights SDK in the application.
Once this is done, you get rich information from the server (for on-premises applications you need to install the Application Insights Status Monitor, https://azure.microsoft.com/en-us/documentation/articles/app-insights-monitor-performance-live-website-now/ ).
As with client-side Application Insights data, this is real user data, not a prerecorded scenario as with the availability web tests.
Both client-side and server-side Application Insights give you dependency response information. Availability web tests provide this information via the prerecorded steps.
More info on Application Insights from Azure.
Some notes on Application map, environments, .NET Core and more info on reporting.
It is interesting to look at the application map in the Application Insights portal. It gives a clean overview of the test and monitor points of your application: from left to right, the availability tests, the client and server monitoring, and the dependencies. You can drill into them or configure the points.
When using a multi-environment deployment strategy, DTAP for example, you don’t want to collect the data from all environments in one basket. Especially for client-side and server-side monitoring this can be annoying.
There are two ways to work around this. One is with tags: the Application Insights information is then captured in one place (one graph), but you can differentiate it by tag. When you want to have the Application Insights data in different places, you have to use the InstrumentationKey. The InstrumentationKey is set in ApplicationInsights.config; change it for each environment via config transforms, the deploy process, or an environment variable.
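The environment-variable route can be sketched as follows. This is a language-neutral illustration written in Python rather than the .NET configuration itself. The keys are placeholders, DEPLOY_ENVIRONMENT is a hypothetical variable name, and APPINSIGHTS_INSTRUMENTATIONKEY is assumed here as the name of the override variable.

```python
import os

# Placeholder keys, one Application Insights resource per DTAP environment.
KEYS_BY_ENVIRONMENT = {
    "development": "00000000-0000-0000-0000-000000000001",
    "test": "00000000-0000-0000-0000-000000000002",
    "acceptance": "00000000-0000-0000-0000-000000000003",
    "production": "00000000-0000-0000-0000-000000000004",
}

def resolve_instrumentation_key(environ=os.environ):
    """Pick the instrumentation key for the current environment.

    An explicitly set key wins; otherwise the key is looked up by the
    (hypothetical) DEPLOY_ENVIRONMENT variable, defaulting to development.
    """
    explicit_key = environ.get("APPINSIGHTS_INSTRUMENTATIONKEY")
    if explicit_key:
        return explicit_key
    environment = environ.get("DEPLOY_ENVIRONMENT", "development").lower()
    if environment not in KEYS_BY_ENVIRONMENT:
        raise ValueError(f"no instrumentation key configured for {environment!r}")
    return KEYS_BY_ENVIRONMENT[environment]

print(resolve_instrumentation_key({"DEPLOY_ENVIRONMENT": "Test"}))
```

Failing loudly on an unknown environment is a deliberate choice in this sketch: it prevents test traffic from silently ending up in the production graphs.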
To get it working for .NET Core, follow the steps in this GitHub repo: https://github.com/Microsoft/ApplicationInsights-aspnetcore/wiki/Getting-Started#add-application-insights-instrumentation-code-to-startupcs
More on Reporting.
Visualizing the data for the customer or for the team is key to the value of your monitoring effort. There are several ways to do it. First, when you don’t want to give everybody access to your Azure subscription to look at the graphs, you can use Power BI.
Power BI provides a content pack for Application Insights. Too bad only the client-side application data is used. It would be great if availability data, server-side monitoring, and dependencies were available as well. But for showing what your customers and site users are doing, it is good.
The great thing Power BI adds is the capability to run additional queries on the data. This can make the data even richer and more informative.
Analytics was added to the Azure portal recently. It gives you a query language for rich querying of the analytics data. Here too, only client-side data is available.
Execute Performance Tests during build with VSTS.
Validating the performance of a new release on a test environment is different from monitoring the behavior of a system in production. On a test environment you want to validate the impact of a system change, a code change. For this you must have a baseline and a standard set of validation scenarios to execute.
The availability tests that run on production can also run on a test environment. They give some insight into the impact of a new release (and show the differences with production). How are the key scenarios doing after a new deployment? The challenge that a test environment is behind a firewall can be overcome by adding the list of IP addresses to the firewall rules (see the list at the bottom of this article: https://azure.microsoft.com/en-us/documentation/articles/app-insights-monitor-web-app-availability/ ).
Visual Studio Team Services Cloud based Load Testing.
A structured way of monitoring changes in system behavior after a release is to use Visual Studio Team Services load testing. A Visual Studio load test contains one or more web tests (user scenarios), which are executed against the system for a predefined time and in a predefined sequence.
There are two important capabilities of a Visual Studio load test that make it an ideal way to measure differences between releases. The first is that load test runs can be compared against each other. This can be done in the web interface (see: https://www.visualstudio.com/docs/test/performance-testing/getting-started/performance-reports ), or the results can be imported and analyzed in Excel (see: https://msdn.microsoft.com/en-us/library/dd728091.aspx ). Besides a comparison, Excel also provides a trend report.
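What such a comparison boils down to can be sketched in Python. This is not how the VSTS reports or the Excel add-in work internally. The scenario names, the timings, and the 10% tolerance are assumptions chosen for illustration.

```python
def find_regressions(baseline, candidate, tolerance=0.10):
    """Return scenarios whose average response time grew by more than tolerance.

    Both arguments map scenario name to average response time in seconds.
    The result maps scenario name to the relative change (0.2 means 20% slower).
    """
    regressions = {}
    for scenario, baseline_time in baseline.items():
        candidate_time = candidate.get(scenario)
        if candidate_time is None:
            continue  # scenario was not part of the new run
        change = (candidate_time - baseline_time) / baseline_time
        if change > tolerance:
            regressions[scenario] = change
    return regressions

# Hypothetical averages from the baseline run and from the new release.
baseline_run = {"LoadReport": 5.0, "Login": 1.0}
release_run = {"LoadReport": 6.0, "Login": 1.05}

for scenario, change in find_regressions(baseline_run, release_run).items():
    print(f"{scenario} is {change:.0%} slower than the baseline")
```

The tolerance keeps normal run-to-run noise from being reported as a regression. What value is reasonable depends on how stable the test environment is.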
The other capability of a Visual Studio Team Services load test is that it can be executed as a build step and/or as a release step. See: https://blogs.msdn.microsoft.com/visualstudioalm/p/cltbuildtaskhelp/
Preferably as a release step, because then the load tests are part of a deployment.
Test environments are most often behind a firewall. A whitelisted trusted IP covers this: https://blogs.msdn.microsoft.com/visualstudioalm/2015/03/09/load-testing-applications-behind-firewall-using-trusted-ip/
Root cause analysis of performance bugs on test environments.
When you notice a difference in performance, a slow scenario, or key indicators that are showing maximum values, there is a need for investigation: find the performance bug.
When you have load and performance tests not only on the UI but also on the API and service layers, it is easier to narrow down the search for the slowest method. As mentioned in this article from Ed Glas, Load Testing Visual Studio Online:
Another set of load tests we run are directed tests, aimed at isolating a particular component or service for performance, stress, and scale testing. We have found that targeting load testing at a particular component is a much more effective way to quickly find stress-related bugs in that component, vs. just adding that component into the larger load test that hits everything on the service.
Application Insights data on the test environment.
Although there aren’t real users on a test environment, it is still relevant to keep on collecting Application Insights data on the test environment.
Besides giving information about the behavior of the system under test, the results also give the team the opportunity to look at the behavior of the monitoring itself. This makes it easier to tune the graphs, queries, reports, and the availability web tests used.
Performance profiling for root cause analysis with Visual Studio.
So you have found a performance bug, or a trigger that needs a fix. How do you get to the right method and solve it? There are multiple performance profilers available for profiling different places in the system.
When the monitoring reports point to an issue related to the client, browser
DotTrace and Visual Studio Profiling tools are for analyzing .NET code.
And SQL profiling and query analyzers for the database.
And there are more tools for different resources. With all these tools you can find the right spot to improve the performance of your code.
Don’t wait until you have a production performance issue. At a minimum, a team should use load tests to validate the performance of the system during a release (see bullet 3 in the image below). This saves the team from releasing a slow system.
Actually, when the team doesn’t want to invest in load tests (bullet 3), they could at least monitor the system behavior during test runs (4). Don’t wait until there are production issues. When a performance issue is found, it can be solved (5) before it ends up in production.
Everything that is monitored in production, the availability and usage data (1 and 2), is actually there as a check on whether the team forgot anything in test (3 and 4). When the team did forget something, it will be noticed in production. The Application Insights data will probably give enough information to dive immediately into profiling and solving the performance bug (5).