Terraform for Azure - Deploying Multiple VMs with Multiple Managed Disks
Cover image by Taylor Vick.
Intro
I recently came across an old module that I had developed on Terraform v0.11.7 to deploy Linux (Ubuntu) virtual machines on Azure. Unfortunately it is no longer usable as it requires a fair amount of refactoring. During the porting process I found some immaturity in my Terraform code (a lack of experience back then), and I also wanted to add Azure Managed Disks to the deployment.
Future-proofing the script means refactoring resources such as azurerm_virtual_machine, as per the docs:
Note: The azurerm_virtual_machine resource has been superseded by the azurerm_linux_virtual_machine and azurerm_windows_virtual_machine resources. The existing azurerm_virtual_machine resource will continue to be available throughout the 2.x releases however is in a feature-frozen state to maintain compatibility - new functionality will instead be added to the azurerm_linux_virtual_machine and azurerm_windows_virtual_machine resources.
Prerequisites
In this post we are using the following versions:
terraform {
  required_version = ">= 0.12.25"

  required_providers {
    azurerm = ">= 2.10.0"
  }
}
Deploying Multiple VMs with Multiple Data Disks
Using count was the obvious answer (at first!) because of its easy-to-use nature. I'm passing in a list of VMs for some fine-grained control over what is deployed/destroyed. For example:
terraform.tfvars
resource_group_name   = "my-test-rg"
instances             = ["vm-test-1", "vm-test-2", "vm-test-3"]
nb_disks_per_instance = 2
tags = {
  environment = "test"
}
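For completeness, the variable declarations backing these values aren't shown in the post; a minimal variables.tf matching the terraform.tfvars above might look like this (the types, defaults and descriptions are my assumptions):
variables.tf
variable "resource_group_name" {
  description = "Name of the resource group to deploy into."
  type        = string
}

variable "instances" {
  description = "List of VM names to deploy."
  type        = list(string)
}

variable "nb_disks_per_instance" {
  description = "Number of data disks to attach to each VM."
  type        = number
  default     = 2
}

variable "tags" {
  description = "Tags applied to every resource."
  type        = map(string)
  default     = {}
}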
main.tf
resource "azurerm_linux_virtual_machine" "vm" {
count = length(var.instances)
name = element(var.instances, count.index)
resource_group_name = azurerm_resource_group.rg.name
location = azurerm_resource_group.rg.location
size = "Standard_D2s_v3"
network_interface_ids = [element(azurerm_network_interface.nic.*.id, count.index)]
admin_username = "adminuser"
admin_password = "Password1234!@"
disable_password_authentication = false
os_disk {
name = "osdisk-${element(var.instances, count.index)}-${count.index}"
caching = "ReadWrite"
storage_account_type = "Standard_LRS"
}
source_image_reference {
publisher = "Canonical"
offer = "UbuntuServer"
sku = "18.04-LTS"
version = "latest"
}
tags = var.tags
}
resource "azurerm_managed_disk" "managed_disk" {
count = length(var.instances) * var.nb_disks_per_instance
name = element(var.instances, count.index)
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
storage_account_type = "Standard_LRS"
create_option = "Empty"
disk_size_gb = 10
tags = var.tags
}
resource "azurerm_virtual_machine_data_disk_attachment" "managed_disk_attach" {
count = length(var.instances) * var.nb_disks_per_instance
managed_disk_id = azurerm_managed_disk.managed_disk.*.id[count.index]
virtual_machine_id = azurerm_linux_virtual_machine.vm.*.id[ceil((count.index + 1) * 1.0 / var.nb_disks_per_instance) - 1]
lun = count.index + 10
caching = "ReadWrite"
}
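The only tricky part above is the index arithmetic on virtual_machine_id: it maps data disk N to VM floor(N / nb_disks_per_instance). You can sanity-check the expression in terraform console; with nb_disks_per_instance = 2, disk index 0 lands on the first VM (index 0) and disk index 3 lands on the second VM (index 1):
➜ terraform console
> ceil((0 + 1) * 1.0 / 2) - 1
0
> ceil((3 + 1) * 1.0 / 2) - 1
1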
3 VMs each with 2 data disks! Awesome right? WRONG!
Problems with count
What if I remove vm-test-2 from the list of instances?
Problem 1: Pretty much all of the disks have to be amended. With managed disks, this means forced replacements, which is not healthy at all (the index shift behind this is illustrated below).
Problem 2: Each managed disk attachment on a VM requires a unique LUN (Logical Unit Number). Although the script above does achieve this (6 data disks, with LUNs [10, 11, 12, 13, 14, 15]), any change to a LUN forces the attachment to be recreated. This is inefficient.
Problem 3: It simply doesn't achieve what I want: removing one VM from the list should only destroy that VM and its own disks.
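The root cause of all three problems is that count addresses resources by position, so removing an element shifts the index of everything after it. A quick terraform console sketch of the instances list before and after removing vm-test-2 shows the shift:
➜ terraform console
> element(["vm-test-1", "vm-test-2", "vm-test-3"], 1)
vm-test-2
> element(["vm-test-1", "vm-test-3"], 1)
vm-test-3
Every resource whose attributes are derived from that index (names, disk attachments, LUNs) now refers to a different machine, which is why Terraform wants to touch almost everything.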
A Better Solution - for_each
After endlessly searching for solutions, I mashed up a few and found a way using for_each, for and locals.
for_each and for were introduced in v0.12 of Terraform.
I've abstracted away the evolved resources as it's pretty straightforward to do - all you need to do is remove the references to count, replace them with for_each at the top of each resource, and iterate over var.instances. As var.instances is a list, you need to use toset() to convert its type.
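For reference, a minimal sketch of what the evolved VM resource could look like with for_each (this assumes the NIC resource has also been converted to for_each over var.instances; all other attributes are unchanged from the count version above):
resource "azurerm_linux_virtual_machine" "vm" {
  for_each = toset(var.instances)

  name                            = each.key
  resource_group_name             = azurerm_resource_group.rg.name
  location                        = azurerm_resource_group.rg.location
  size                            = "Standard_D2s_v3"
  # Assumes azurerm_network_interface.nic is also keyed by VM name via for_each
  network_interface_ids           = [azurerm_network_interface.nic[each.key].id]
  admin_username                  = "adminuser"
  admin_password                  = "Password1234!@"
  disable_password_authentication = false

  os_disk {
    name                 = "osdisk-${each.key}"
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "UbuntuServer"
    sku       = "18.04-LTS"
    version   = "latest"
  }

  tags = var.tags
}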
Introducing locals at the top of main.tf as follows:
locals {
  vm_datadiskdisk_count_map = { for k in toset(var.instances) : k => var.nb_disks_per_instance }

  luns = { for k in local.datadisk_lun_map : k.datadisk_name => k.lun }

  datadisk_lun_map = flatten([
    for vm_name, count in local.vm_datadiskdisk_count_map : [
      for i in range(count) : {
        datadisk_name = format("datadisk_%s_disk%02d", vm_name, i)
        lun           = i
      }
    ]
  ])
}
This acts like a temporary store for evaluating expressions before they are used in resources. For example, the expression for luns above loops through datadisk_lun_map and creates a key/value pair for each entry. Check the result using terraform console (you reference locals in resources as local.<name>):
➜ terraform console
> local.luns
{
"datadisk_vm-test-1_disk00" = 0
"datadisk_vm-test-1_disk01" = 1
"datadisk_vm-test-2_disk00" = 0
"datadisk_vm-test-2_disk01" = 1
"datadisk_vm-test-3_disk00" = 0
"datadisk_vm-test-3_disk01" = 1
}
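It can also help to inspect the intermediate list that luns is built from; with the example tfvars above, local.datadisk_lun_map evaluates to a flat list of objects along these lines (output trimmed):
> local.datadisk_lun_map
[
  {
    "datadisk_name" = "datadisk_vm-test-1_disk00"
    "lun" = 0
  },
  {
    "datadisk_name" = "datadisk_vm-test-1_disk01"
    "lun" = 1
  },
  ...
]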
Now I can use these locals in my resource blocks - here are the managed_disk and managed_disk_attach blocks:
resource "azurerm_managed_disk" "managed_disk" {
for_each = toset([for j in local.datadisk_lun_map : j.datadisk_name])
name = each.key
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
storage_account_type = "Standard_LRS"
create_option = "Empty"
disk_size_gb = 10
tags = var.tags
}
resource "azurerm_virtual_machine_data_disk_attachment" "managed_disk_attach" {
for_each = toset([for j in local.datadisk_lun_map : j.datadisk_name])
managed_disk_id = azurerm_managed_disk.managed_disk[each.key].id
virtual_machine_id = azurerm_linux_virtual_machine.vm[element(split("_", each.key), 1)].id
lun = lookup(local.luns, each.key)
caching = "ReadWrite"
}
Results
When removing vm-test-2 from instances, the plan shows that ONLY the resources related to this particular machine are destroyed. No unnecessary updates or forced replacements are carried out.
➜ terraform plan
azurerm_linux_virtual_machine.vm["vm-test-2"] will be destroyed
...
azurerm_managed_disk.managed_disk["datadisk_vm-test-2_disk00"] will be destroyed
...
azurerm_managed_disk.managed_disk["datadisk_vm-test-2_disk01"] will be destroyed
...
azurerm_network_interface.nic["vm-test-2"] will be destroyed
...
azurerm_virtual_machine_data_disk_attachment.managed_disk_attach["datadisk_vm-test-2_disk00"] will be destroyed
...
azurerm_virtual_machine_data_disk_attachment.managed_disk_attach["datadisk_vm-test-2_disk01"] will be destroyed
...
Plan: 0 to add, 0 to change, 6 to destroy.
Conclusion
The solution seems to work well and provides a relatively simple way to manage larger deployments that require multiple data disks per VM. I can't confirm that this is the best-practice way of achieving this goal, so I'd be open to hearing feedback on how to improve it! :)