Enhancing Azure Security with Load Balancer and VMSS

After locking down my Azure network with Bastion, I realized security was only half of the equation. What good is a fortress if your users can’t reach it when demand spikes? That’s when I dove into Azure Load Balancer and Virtual Machine Scale Sets (VMSS), the backbone of high availability on Azure.

Architecture Overview

Before we dive into the code, let’s understand what we’re building:

Key Components:

  • Front End: Public IP exposing port 80 to the web
  • Backend Pool: Auto-managed set of Linux VMs
  • Health Probe: Checks port 80 to reroute traffic when a node fails
  • Scale Set: Adds/removes VMs based on CPU usage

💡 Think of it like a restaurant: the Load Balancer is your host, seating customers (traffic) at available tables (VMs). When it gets busy, you open more tables (scale out). When it’s quiet, you close some (scale in).


Step 1: Define Infrastructure with Terraform

Let’s start by creating the foundation. First, set up your project structure:

azure-resilience/
├── main.tf
├── variables.tf
├── outputs.tf
└── terraform.tfvars
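
The snippets below hard-code names and locations directly in main.tf to keep the walkthrough readable; variables.tf and terraform.tfvars are in the structure so you can parameterize later. A minimal sketch of what they might hold (these variable names are illustrative and aren’t referenced verbatim in the code below):

# variables.tf
variable "location" {
  description = "Azure region for all resources"
  type        = string
  default     = "East US"
}

variable "resource_group_name" {
  description = "Name of the resource group"
  type        = string
  default     = "rg-fintech-resilience"
}

# terraform.tfvars
location            = "East US"
resource_group_name = "rg-fintech-resilience"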

Create the Resource Group and Public IP

# main.tf
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "rg" {
  name     = "rg-fintech-resilience"
  location = "East US"
}

resource "azurerm_public_ip" "lb_public" {
  name                = "pip-loadbalancer"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  allocation_method   = "Static"
  sku                 = "Standard"
}

Create the Load Balancer

resource "azurerm_lb" "app_lb" {
  name                = "fintech-lb"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  sku                 = "Standard"

  frontend_ip_configuration {
    name                 = "PublicIP"
    public_ip_address_id = azurerm_public_ip.lb_public.id
  }
}

resource "azurerm_lb_backend_address_pool" "backend_pool" {
  loadbalancer_id = azurerm_lb.app_lb.id
  name            = "BackendPool"
}

resource "azurerm_lb_probe" "health_probe" {
  loadbalancer_id = azurerm_lb.app_lb.id
  name            = "http-probe"
  protocol        = "Http"
  port            = 80
  request_path    = "/"
}

resource "azurerm_lb_rule" "lb_rule" {
  loadbalancer_id                = azurerm_lb.app_lb.id
  name                           = "http-rule"
  protocol                       = "Tcp"
  frontend_port                  = 80
  backend_port                   = 80
  frontend_ip_configuration_name = "PublicIP"
  backend_address_pool_ids       = [azurerm_lb_backend_address_pool.backend_pool.id]
  probe_id                       = azurerm_lb_probe.health_probe.id
}

Deploy your infrastructure:

terraform init
terraform plan
terraform apply

🎯 What just happened? You created a Load Balancer that will distribute incoming HTTP traffic across multiple VMs. The health probe continuously checks if each VM is responding on port 80.
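
If you want to confirm the frontend before any VMs exist, you can look up what Terraform created with the Azure CLI (a quick check; the resource names match the configuration above):

# Show the Load Balancer's public IP address
az network public-ip show \
  --resource-group rg-fintech-resilience \
  --name pip-loadbalancer \
  --query ipAddress \
  --output tsv

# Inspect the Load Balancer's frontend, backend pool, and rules
az network lb show \
  --resource-group rg-fintech-resilience \
  --name fintech-lb \
  --output table

Browsing to that IP won’t return anything yet — the backend pool stays empty until Step 2.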

Step 2: Create the VM Scale Set

Now let’s add the VMs that will actually serve your application:

# Create a virtual network first
resource "azurerm_virtual_network" "vnet" {
  name                = "vnet-vmss"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
}

resource "azurerm_subnet" "subnet" {
  name                 = "subnet-vmss"
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = ["10.0.1.0/24"]
}

resource "azurerm_linux_virtual_machine_scale_set" "vmss" {
  name                = "finops-vmss"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku                 = "Standard_B1s"
  instances           = 2
  admin_username      = "azureuser"
  
  admin_ssh_key {
    username   = "azureuser"
    public_key = file("~/.ssh/id_rsa.pub")  # Update with your SSH key path
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts-gen2"
    version   = "latest"
  }

  os_disk {
    storage_account_type = "Standard_LRS"
    caching              = "ReadWrite"
  }

  network_interface {
    name    = "vmss-nic"
    primary = true

    ip_configuration {
      name      = "internal"
      primary   = true
      subnet_id = azurerm_subnet.subnet.id

      load_balancer_backend_address_pool_ids = [
        azurerm_lb_backend_address_pool.backend_pool.id
      ]
    }
  }

  # Install a simple web server on each VM
  custom_data = base64encode(<<-EOF
    #!/bin/bash
    apt-get update
    apt-get install -y nginx
    echo "<h1>Hello from $(hostname)</h1>" > /var/www/html/index.html
    systemctl start nginx
    systemctl enable nginx
  EOF
  )
}

Run terraform apply again to create your VM Scale Set.

Note: The custom_data script installs Nginx and creates a simple HTML page showing which VM is serving the request. Also make sure to update the SSH key path to your own, as shown in the previous write-ups.
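
Once the instances report healthy, you can watch the Load Balancer rotate between them by requesting the page repeatedly (a small sketch; replace <LB_PUBLIC_IP> with the address from the Azure CLI check above or the Terraform output added later in this post):

# Each new connection may land on a different VM, so the hostname in the
# response should vary across requests
for i in $(seq 1 10); do
  curl -s http://<LB_PUBLIC_IP>/
  sleep 1
done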

Step 3: Add Autoscaling Rules

Now for the magic part, making your infrastructure respond to demand automatically:

resource "azurerm_monitor_autoscale_setting" "vmss_autoscale" {
  name                = "vmss-autoscale"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  target_resource_id  = azurerm_linux_virtual_machine_scale_set.vmss.id

  profile {
    name = "defaultProfile"

    capacity {
      default = 2
      minimum = 2
      maximum = 5
    }

    # Scale OUT when CPU > 70%
    rule {
      metric_trigger {
        metric_name        = "Percentage CPU"
        metric_resource_id = azurerm_linux_virtual_machine_scale_set.vmss.id
        time_grain         = "PT1M"
        statistic          = "Average"
        time_window        = "PT5M"
        time_aggregation   = "Average"
        operator           = "GreaterThan"
        threshold          = 70
      }

      scale_action {
        direction = "Increase"
        type      = "ChangeCount"
        value     = "1"
        cooldown  = "PT5M"
      }
    }

    # Scale IN when CPU < 30%
    rule {
      metric_trigger {
        metric_name        = "Percentage CPU"
        metric_resource_id = azurerm_linux_virtual_machine_scale_set.vmss.id
        time_grain         = "PT1M"
        statistic          = "Average"
        time_window        = "PT5M"
        time_aggregation   = "Average"
        operator           = "LessThan"
        threshold          = 30
      }

      scale_action {
        direction = "Decrease"
        type      = "ChangeCount"
        value     = "1"
        cooldown  = "PT5M"
      }
    }
  }
}

Understanding the rules:

  • Time Grain: How often to collect metrics (every 1 minute)
  • Time Window: How far back to look when calculating average (5 minutes)
  • Cooldown: Wait time before performing another scale action (prevents flapping)
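
To see the rules fire, average CPU has to stay above 70% for the full five-minute window. One way to generate that load without opening SSH access is az vmss run-command, which runs a script on an instance. A sketch, assuming instance IDs start at 0 and that the stress package can be installed on the fly:

# Generate CPU load on instance 0 for ten minutes
az vmss run-command invoke \
  --resource-group rg-fintech-resilience \
  --name finops-vmss \
  --instance-id 0 \
  --command-id RunShellScript \
  --scripts "sudo apt-get install -y stress && stress --cpu 2 --timeout 600"

# Watch the instance count change as autoscale reacts
az vmss list-instances \
  --resource-group rg-fintech-resilience \
  --name finops-vmss \
  --output table

Scale-in back to two instances follows once CPU drops below 30% and the cooldown expires.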

Add NSG Rules to Your Terraform

Here’s what you need to add to your Terraform configuration:

# Create a Network Security Group
resource "azurerm_network_security_group" "vmss_nsg" {
  name                = "nsg-vmss"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  # Allow HTTP traffic from Internet
  security_rule {
    name                       = "AllowHTTP"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "80"
    source_address_prefix      = "Internet"
    destination_address_prefix = "*"
  }

  # Allow HTTPS traffic (optional, but recommended)
  security_rule {
    name                       = "AllowHTTPS"
    priority                   = 110
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "443"
    source_address_prefix      = "Internet"
    destination_address_prefix = "*"
  }

  # CRITICAL: Allow Azure Load Balancer health probes
  security_rule {
    name                       = "AllowAzureLoadBalancerProbe"
    priority                   = 120
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "*"
    source_port_range          = "*"
    destination_port_range     = "*"
    source_address_prefix      = "AzureLoadBalancer"
    destination_address_prefix = "*"
  }

  # Allow outbound internet access (for VM updates, etc.)
  security_rule {
    name                       = "AllowInternetOutbound"
    priority                   = 100
    direction                  = "Outbound"
    access                     = "Allow"
    protocol                   = "*"
    source_port_range          = "*"
    destination_port_range     = "*"
    source_address_prefix      = "*"
    destination_address_prefix = "Internet"
  }
}

# Associate NSG with the subnet
resource "azurerm_subnet_network_security_group_association" "vmss_subnet_nsg" {
  subnet_id                 = azurerm_subnet.subnet.id
  network_security_group_id = azurerm_network_security_group.vmss_nsg.id
}

Critical Rule: Azure Load Balancer Health Probe

Network Security Groups must allow Load Balancer traffic from the Azure infrastructure IP address 168.63.129.16, which is used by health probes. The rule using source_address_prefix = "AzureLoadBalancer" takes care of this automatically.
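
If you prefer not to open all ports and protocols to the service tag, you can scope the probe rule to what the probe actually targets (an optional hardening sketch, equivalent in effect for this setup since the health probe checks TCP port 80):

# Narrower alternative: only allow health probes to reach port 80 over TCP
security_rule {
  name                       = "AllowAzureLoadBalancerProbe"
  priority                   = 120
  direction                  = "Inbound"
  access                     = "Allow"
  protocol                   = "Tcp"
  source_port_range          = "*"
  destination_port_range     = "80"
  source_address_prefix      = "AzureLoadBalancer"
  destination_address_prefix = "*"
}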

Add these outputs to see your results:

# outputs.tf
output "lb_public_ip" {
  value = azurerm_public_ip.lb_public.ip_address
  description = "Public IP of the Load Balancer"
}

output "vmss_id" {
  value = azurerm_linux_virtual_machine_scale_set.vmss.id
  description = "Resource ID of the VM Scale Set"
}
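
After the next terraform apply, you can read those values straight from the state and hit the endpoint (a quick usage sketch):

# Print the Load Balancer's public IP and test the site
terraform output -raw lb_public_ip
curl http://$(terraform output -raw lb_public_ip)/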

Reflection

Scaling shouldn’t just mean “more servers.” It means continuity: users never noticing the chaos behind the scenes. When I saw the Load Balancer distributing requests seamlessly and VMs spinning up automatically under load, I felt I was one step closer to what real-world Platform Ops looks like: stable, secure, self-healing infrastructure.

Step 4: Integrate with Azure Pipelines

Before handing things over to CI/CD, destroy the manually applied deployment with the “terraform destroy” command and push the code to your Azure Repos Git repository.
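
A quick sketch of that hand-off; the remote URL is a placeholder for your own Azure DevOps organization, project, and repository:

# Tear down the manually applied resources
terraform destroy -auto-approve

# Push the Terraform code to Azure Repos so the pipeline owns it from here on
git init
git add main.tf variables.tf outputs.tf
git commit -m "Add Load Balancer, VMSS and autoscale configuration"
git remote add origin https://dev.azure.com/<org>/<project>/_git/<repo>
git push -u origin main

With the code in the repo, create an Azure Pipeline to automate deployments and infrastructure updates: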

trigger:
  - main

pool:
  name: "Self-Hosted"

variables:
  terraformWorkingDirectory: "$(System.DefaultWorkingDirectory)/azure-resilience"

steps:
  # Install Terraform
  - task: TerraformInstaller@1
    displayName: "Install Terraform"
    inputs:
      terraformVersion: "1.8.5"

  # Terraform Init
  - task: TerraformTaskV4@4
    displayName: "Terraform Init"
    inputs:
      provider: "azurerm"
      command: "init"
      workingDirectory: "$(terraformWorkingDirectory)"
      backendServiceArm: "azure-devsecops-connection"
      backendAzureRmResourceGroupName: "devsecops-rg"
      backendAzureRmStorageAccountName: "devsecopsstate00211"
      backendAzureRmContainerName: "tfstate"
      backendAzureRmKey: "devsecops-env.tfstate"

  # Terraform Validate
  - task: TerraformTaskV4@4
    displayName: "Terraform Validate"
    inputs:
      provider: "azurerm"
      command: "validate"
      workingDirectory: "$(terraformWorkingDirectory)"

  # Terraform Plan
  - task: TerraformTaskV4@4
    displayName: "Terraform Plan"
    inputs:
      provider: "azurerm"
      command: "plan"
      workingDirectory: "$(terraformWorkingDirectory)"
      environmentServiceNameAzureRM: "azure-devsecops-connection"
      commandOptions: "-out=tfplan"

  # Terraform Apply
  - task: TerraformTaskV4@4
    displayName: "Terraform Apply"
    inputs:
      provider: "azurerm"
      command: "apply"
      workingDirectory: "$(terraformWorkingDirectory)"
      environmentServiceNameAzureRM: "azure-devsecops-connection"
      commandOptions: "-auto-approve tfplan"

Key Takeaways:

✅ Load Balancers ensure no single point of failure
✅ Health Probes automatically detect and route around failed VMs
✅ Auto-scaling adjusts capacity based on actual demand
✅ Infrastructure as Code makes everything repeatable and version-controlled
