When developing components in Azure, one of the technologies I've been involved with is Azure Functions. They're great: in addition to all the benefits of a PaaS service, they've really shown how to simplify a number of components and slim down the code base (deleting code is always great!). One of the features I found really useful was deployment slots.
Azure Functions deployment slots allow your function app to run different instances called “slots”. Slots are different environments exposed via a publicly available endpoint. One app instance is always mapped to the production slot, and you can swap instances assigned to a slot on demand. Function apps running under the App Service plan may have multiple slots, while under the Consumption plan only one slot is allowed.
Azure Functions deployment slots | Microsoft Docs
So, let's break that down: what it gives you is the ability to have multiple 'active' instances at any one point, and to swap between them. If you think about use cases for this, it gives you the following:
- You can test any changes in one slot before swapping it into production, for example you could deploy to a staging slot, do some final checks, and then swap it in.
- If something isn’t quite right after you’ve swapped the changes in, no problem, just swap them back out with the staging slot and you’ll be back to the previous version.
- When you deploy to a slot, before you can swap it, the Azure App Service will 'warm up' the slot so that when it does get swapped it's ready to go, and consumers of the service don't experience a 'cold start', particularly if they're the first to hit the endpoint.
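As a rough sketch of that deploy-check-swap cycle using the Azure CLI (all resource and slot names here are made up, not from the original system):

```shell
# Create a staging slot on the function app (names are hypothetical)
az functionapp deployment slot create \
  --name my-func-app --resource-group my-rg --slot staging

# After deploying to the staging slot and running your final checks,
# swap it into production:
az functionapp deployment slot swap \
  --name my-func-app --resource-group my-rg \
  --slot staging --target-slot production

# If something isn't right, running the same swap again puts the
# previous version back into production:
az functionapp deployment slot swap \
  --name my-func-app --resource-group my-rg \
  --slot staging --target-slot production
```

The swap is symmetric, which is what makes the quick rollback in the second bullet above possible.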
All of the above is covered in detail in the link above; what I'd like to cover in this post are a few things we learnt about deployment slots along the way.
Lesson Number 1 – There’s a brief period of downtime when performing swaps
A requirement of a system I was involved in building was that, during a deployment, consumers of the functions shouldn't experience a period where the function app became unavailable. Although the app was tolerant of these failures, it was just something we thought was important. To test the slot swap, I set up a simple console app that rapidly requested our health check endpoint (a very simple HTTP triggered function that just returns `OkObjectResult("Ok")`). So, if swaps were seamless, there should be no point where this function doesn't return a `200 - OK` response, right? Nope. What I observed was that for a few seconds following the swap, the health check function returned `503 - Service Unavailable` before recovering and once again returning `200 - OK`.
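The console app itself was trivial; a minimal sketch of the same idea in Python (the URL is a placeholder, not our real endpoint) could look like:

```python
# Sketch of a health-check poller, similar in spirit to the console app
# described above. The URL is an assumption -- substitute your own
# function app's health endpoint.
import time
import urllib.error
import urllib.request
from collections import Counter


def summarize(statuses):
    """Count how many responses came back with each HTTP status code."""
    return dict(Counter(statuses))


def poll(url, attempts=100, delay=0.1):
    """Hit the endpoint repeatedly, recording each call's status code."""
    statuses = []
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                statuses.append(resp.status)
        except urllib.error.HTTPError as err:
            statuses.append(err.code)  # e.g. 503 observed during a swap
        time.sleep(delay)
    return statuses


if __name__ == "__main__":
    # e.g. print(summarize(poll("https://my-func-app.azurewebsites.net/api/health")))
    print(summarize([200, 200, 503, 200]))
```

Run it while triggering a swap: any key other than 200 in the summary means consumers saw the outage.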
It turns out that this is a well-known problem, and after adding the recommended setting `WEBSITE_ADD_SITENAME_BINDINGS_IN_APPHOST_CONFIG` with a value of '1', the problem went away.
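If you're scripting your configuration, the setting can be applied with the Azure CLI (the app and resource group names here are hypothetical):

```shell
# Add the recommended setting to avoid the brief 503 window during swaps
# (app and resource group names are hypothetical)
az functionapp config appsettings set \
  --name my-func-app --resource-group my-rg \
  --settings WEBSITE_ADD_SITENAME_BINDINGS_IN_APPHOST_CONFIG=1
```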
Lesson Number 2 – AzureWebJobsDashboard is probably redundant and costing money if you’re using Application Insights
Out of the box, you get an app configuration setting `AzureWebJobsDashboard` set to a storage account connection string. So far so good. However, according to the documentation:
For better performance and experience, runtime version 2.x and later versions use APPINSIGHTS_INSTRUMENTATIONKEY and App Insights for monitoring instead of AzureWebJobsDashboard.
And
This setting is only valid for apps that target version 1.x of the Azure Functions runtime.
We were using Application Insights and were on version 3 of the Azure Functions runtime, yet looking at the daily cost breakdown of our new service, it appeared we were being charged for the storage account that `AzureWebJobsDashboard` referred to. I'm not sure why this is still set by default, considering that versions 2 and 3 of the runtime have been around for quite a while now; perhaps it's there in case customers want to use version 1 for their particular use case. Anyway, we deleted it and saved some cash in doing so.
Lesson Number 3 – At first glance, slots give you the impression that they're isolated, but you need to pay attention
When you think of slots, they're sold to you on the basis that you have a production slot and one (or more) staging slots. When you want your code to go into production, you swap from a staging slot into the production slot and that code becomes active in production. This works great for HTTP triggered functions: under the covers, the routing rules are swapped so your old 'staging slot' is effectively receiving the live requests.
However, you need to be more careful with, for example, Service Bus triggered functions. The important thing to remember about slots is that they are all active; you can interact with the functions by going to the URL of the slot. If you make the mistake of not having a slot-specific setting for your Service Bus connection, the code deployed to the staging slots and the production slot will all end up being triggered, as they will all be responding to messages on the bus (effectively competing with each other). This issue has also been documented in this GitHub post. It can get quite confusing, as you may end up with multiple different versions of your code executing (depending on how many slots you have). The solution, as recommended in the above post, is to make use of slot-specific settings so that, for example, only the production slot refers to the production Service Bus instance.
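Assuming the Azure CLI and a hypothetical setting name `ServiceBusConnection`, marking the setting as sticky to a slot looks something like this:

```shell
# Mark ServiceBusConnection as a slot (sticky) setting, so it stays with
# the production slot rather than following the code during a swap
# (all names and the connection string placeholder are hypothetical)
az functionapp config appsettings set \
  --name my-func-app --resource-group my-rg \
  --slot-settings "ServiceBusConnection=<production-service-bus-connection-string>"
```

With the staging slot's `ServiceBusConnection` pointing at a non-production Service Bus, only the code in the production slot ever processes production messages.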
In Summary
Hopefully, the above tips help you on your journey to using deployment slots more effectively. I think they're a really excellent feature, and once you get past some of the niggles they work pretty well.