Azure Network Engineer

Azure Network Engineer Associate (exam: AZ-700)

A very good YouTube playlist for the AZ-700 exam.

Azure Networking Cookbook # may be worth studying if you plan to take the Azure Networking exam

Articles on performance tuning in Azure, on performance testing with NTTTCP, and on routing in Azure virtual networks.


  • Vendors
  • To rename a network interface, the options are to create a new one or to use the Azure CLI. It cannot be done through the portal.
  • Azure reserves some IP addresses within each subnet. The first and last IP addresses of the subnets are reserved for protocol conformance, along with 3 more addresses used for Azure services.
The IP address falls within the reserved IP range of subnet
  • Azure publishes the expected network performance for each instance size.
Standard_D5_v2 - Expected network bandwidth (Mbps) 12000
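
As a rough illustration of the reserved-address rule above, the sketch below uses Python's standard `ipaddress` module to list the five addresses held back in a subnet and count what is left. The function name `azure_subnet_summary` is made up for these notes, and the per-address roles (network, default gateway, two for Azure DNS, broadcast) and the /29 minimum are stated as assumptions taken from the Azure virtual network documentation, not from the text above.

```python
import ipaddress

def azure_subnet_summary(cidr):
    """Return (reserved addresses, usable address count) for an Azure subnet."""
    net = ipaddress.ip_network(cidr, strict=True)
    if net.version != 4:
        raise ValueError("this sketch handles IPv4 only")
    if net.num_addresses < 8:
        # Assumption: /29 is the smallest IPv4 subnet Azure accepts.
        raise ValueError("subnet is smaller than /29")
    first = net.network_address
    reserved = [
        first,                  # network address
        first + 1,              # default gateway (assumed role)
        first + 2,              # Azure DNS mapping (assumed role)
        first + 3,              # Azure DNS mapping (assumed role)
        net.broadcast_address,  # broadcast address
    ]
    return reserved, net.num_addresses - len(reserved)

reserved, usable = azure_subnet_summary("10.0.0.0/24")
print([str(a) for a in reserved])
print(usable)   # 251
```

Trying to assign any of the listed addresses to a NIC is what produces the "falls within the reserved IP range" error quoted above.
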
Azure MTU

Azure and VM MTU. The MTU inside the Azure network is 1,400.

The default MTU for Azure VMs is 1,500 bytes. The Azure Virtual Network stack will attempt to fragment a packet at 1,400 bytes.

Note that the Virtual Network stack isn't inherently inefficient because it fragments packets at 1,400 bytes even though VMs have an MTU of 1,500. A large percentage of network packets are much smaller than 1,400 or 1,500 bytes.

Nowhere is it stated explicitly that the MTU should be lowered.

Virtual Network stack is set up to drop "out of order fragments," that is, fragmented packets that don't arrive in their original fragmented order. These packets are dropped mainly because of a network security vulnerability announced in November 2018 called FragmentSmack.

FragmentSmack is a defect in the way the Linux kernel handled reassembly of fragmented IPv4 and IPv6 packets. A remote attacker could use this flaw to trigger expensive fragment reassembly operations, which could lead to increased CPU and a denial of service on the target system.

You can configure an Azure VM MTU, as you can in any other operating system. But you should consider the fragmentation that occurs in Azure, described above, when you're configuring an MTU.
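
To get a feel for what fragmenting at 1,400 bytes means for a full-size packet, here is a minimal sketch. It assumes plain IPv4 fragmentation rules (20-byte header without options, fragment payloads in multiples of 8 bytes); how the Azure stack does this internally is not described in the quoted text, and `fragment_sizes` is a name invented for this example.

```python
def fragment_sizes(packet_len, path_mtu, ip_header=20):
    """Split one IPv4 packet into fragment sizes (header included) for path_mtu."""
    if packet_len <= path_mtu:
        return [packet_len]
    payload = packet_len - ip_header
    # Every fragment except the last carries a payload that is a multiple of 8 bytes.
    per_fragment = (path_mtu - ip_header) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("path_mtu too small")
    sizes = []
    while payload > 0:
        chunk = min(per_fragment, payload)
        sizes.append(chunk + ip_header)
        payload -= chunk
    return sizes

print(fragment_sizes(1500, 1400))   # [1396, 124]
print(fragment_sizes(1400, 1400))   # [1400]
```

So a 1,500-byte packet becomes two fragments, and if they arrive out of order they are dropped, as described above.
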

We don't encourage customers to increase VM MTUs. Increasing MTU isn't known to improve performance and could have a negative effect on application performance.

Because a larger MTU means a larger MSS, you might wonder whether increasing the MTU can increase TCP performance. Probably not. 
There are pros and cons to packet size beyond just TCP traffic.
The most important factors affecting TCP throughput performance are TCP window size, packet loss, and RTT.
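
A small back-of-the-envelope sketch of why window size and RTT matter more than MTU. It assumes 20-byte IPv4 and TCP headers without options, and uses the usual upper bound of one window per round trip for a single connection; packet loss is ignored. Both function names are made up for this note.

```python
def mss_from_mtu(mtu, ip_header=20, tcp_header=20):
    """TCP maximum segment size for a given MTU (no IP or TCP options)."""
    return mtu - ip_header - tcp_header

def max_tcp_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound for one TCP connection: one window per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms / 1_000_000

print(mss_from_mtu(1500))   # 1460
print(mss_from_mtu(1400))   # 1360
# Same 64 KB window, different round-trip times.
print(max_tcp_throughput_mbps(65535, 1))
print(max_tcp_throughput_mbps(65535, 30))
```

With the same window, going from 1 ms to 30 ms RTT cuts the ceiling by a factor of 30, while the MSS difference between MTU 1,500 and 1,400 is about 7%.
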


Azure GRE

GRE is explicitly not supported, including on individual NVAs (network virtual appliances).
GRE doesn’t work inside Azure, as they use it for their internal transport. 
“Multicast, broadcast, IP-in-IP encapsulated packets, and Generic Routing Encapsulation (GRE) packets are blocked within VNets.”
GRE tunnel
Cisco CSR 1000v on Microsoft Azure - GRE tunnel is unsupported
Cisco CSR 1000v on AWS - GRE tunnel is supported

Choosing a high-performance Linux OS

The best-optimized Linux for Azure in terms of performance is Ubuntu 18.04-LTS out of the box, or a Linux kernel patch (for older systems). For other distributions (CentOS, RHEL) the recommendation is to install their latest version (7.x) plus Linux Integration Services (LIS). There are in fact quite a few extensions, and some of them even have vulnerabilities 😀 One of the extensions is VMAccessForLinux, which can reset the host's SSH configuration or create a new user (if access has been lost), including root.

apt-get -y update
apt-get -y upgrade
apt-get -y dist-upgrade

linux-azure-5.4-cloud-tools-5.4.0-1058 linux-azure-5.4-headers-5.4.0-1058 linux-azure-5.4-tools-5.4.0-1058 linux-cloud-tools-5.4.0-1058-azure linux-headers-5.4.0-1058-azure linux-image-5.4.0-1058-azure
linux-modules-5.4.0-1058-azure linux-modules-extra-5.4.0-1058-azure linux-tools-5.4.0-1058-azure

This uses the VMAccessForLinux extension to reset the credentials of an existing user or create a new user with sudo privileges, and reset the SSH configuration.

For latency, Ubuntu is not singled out; the guidance is simply to use the latest OS version.
- The Ubuntu Azure kernel is the most optimized for network performance on Azure.
- Significant throughput performance can be achieved by upgrading to the Azure Linux kernel. 
- Use the latest version of Windows or Linux.
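
A quick way to check whether a VM already runs the Azure-tuned kernel is to look at the kernel release string. The sketch below assumes the release follows the package naming shown in the upgrade output above (for example `5.4.0-1058-azure`); `is_azure_tuned_kernel` is a hypothetical helper, not part of any Azure tooling.

```python
import platform

def is_azure_tuned_kernel(release):
    """True if a kernel release string looks like Ubuntu's linux-azure flavour."""
    # Assumption: the Azure-tuned Ubuntu kernel carries an "-azure" suffix.
    return release.endswith("-azure")

print(is_azure_tuned_kernel("5.4.0-1058-azure"))   # True
print(is_azure_tuned_kernel("5.4.0-88-generic"))   # False
# Result for the current machine depends on where this runs.
print(is_azure_tuned_kernel(platform.release()))
```
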


Proximity Placement Groups (PPG)

Proximity Placement Groups: this setting ensures that all VMs are located in the same data center. Caveats to account for in the text and technically: deployment is not guaranteed, because the required resources may be unavailable in that data center. For this reason the recommendations are:

    • deploy with ARM templates (several VMs deployed at once)
    • if a deployment with the Azure CLI fails, change the deployment order (start with the resource that failed to initialize in the PPG)
Proximity placement groups offer colocation in the same data center. 
Placing VMs in a single region reduces the physical distance between the instances. Placing them within a single availability zone will also bring them physically closer together. However, as the Azure footprint grows, a single availability zone may span multiple physical data centers, which may result in a network latency impacting your application.
To get VMs as close as possible, achieving the lowest possible latency, you should deploy them within a proximity placement group.
If latency is your first priority, put VMs in a proximity placement group and the entire solution in an availability zone. But, if resiliency is your top priority, spread your instances across multiple availability zones (a single proximity placement group cannot span zones).
A proximity placement group is a logical grouping used to make sure that Azure compute resources are physically located close to each other. Proximity placement groups are useful for workloads where low latency is a requirement.
A proximity placement group is a resource in Azure. You need to create one before using it with other resources. Once created, it could be used with virtual machines, availability sets, or virtual machine scale sets. You specify a proximity placement group when creating compute resources providing the proximity placement group ID.
You can also move an existing resource into a proximity placement group. When moving a resource into a proximity placement group, you should stop (deallocate) the asset first since it will be redeployed potentially into a different data center in the region to satisfy the colocation constraint.
Because proximity placement groups represent an additional deployment constraint, allocation failures can occur. A proximity placement group is a colocation constraint rather than a pinning mechanism. It is pinned to a specific data center with the deployment of the first resource to use it. Once all resources using the proximity placement group have been stopped (deallocated) or deleted, it is no longer pinned.
For the lowest latency, use proximity placement groups together with accelerated networking. 

Azure ARM Templates
Deploy all VM sizes in a single template. In order to avoid landing on hardware that doesn't support all the VM SKUs and sizes you require, include all of the application tiers in a single template so that they will all be deployed at the same time.

Azure CLI
If you are scripting your deployment using PowerShell, CLI or the SDK, you may get an allocation error OverconstrainedAllocationRequest. In this case, you should stop/deallocate all the existing VMs, and change the sequence in the deployment script to begin with the VM SKU/sizes that failed.
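
The reordering advice above can be sketched as a small helper for a deployment script. This is only an illustration: the VM names and the plan structure are hypothetical, the sizes are placeholders, and the actual create and deallocate calls are left out.

```python
def reorder_after_allocation_failure(sequence, failed_sizes):
    """Move VMs whose size failed to allocate to the front, keeping relative order."""
    failed = set(failed_sizes)
    first = [vm for vm in sequence if vm["size"] in failed]
    rest = [vm for vm in sequence if vm["size"] not in failed]
    return first + rest

# Hypothetical deployment plan; names and sizes are placeholders.
plan = [
    {"name": "web-1", "size": "Standard_D2s_v3"},
    {"name": "app-1", "size": "Standard_D5_v2"},
    {"name": "db-1", "size": "Standard_E16s_v3"},
]
retry = reorder_after_allocation_failure(plan, ["Standard_E16s_v3"])
print([vm["name"] for vm in retry])   # ['db-1', 'web-1', 'app-1']
```

Deploying the hardest-to-place size first lets it pin the proximity placement group to a data center that actually has that hardware.
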
When using CLI, colocation status can be obtained using az ppg show by including the optional parameter `--include-colocation-status`.
- Aligned: Resource is within the same latency envelope of the proximity placement group.
- Unknown: at least one of the VM resources is deallocated. Once starting them back successfully, the status should go back to Aligned.
- Not aligned: at least one VM resource is not aligned with the proximity placement group. The specific resources which are not aligned will also be called out separately in the membership section.
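
For scripting around these statuses, a lookup like the one below may be enough. It only maps the three status names listed above to a next step; how the status string is pulled out of the `az ppg show` output is deliberately left out, since the output layout is not shown in these notes. `colocation_action` is a made-up name.

```python
def colocation_action(status):
    """Map a colocation status from the list above to a suggested next step."""
    normalized = status.strip().lower().replace(" ", "")
    actions = {
        "aligned": "nothing to do",
        "unknown": "start the deallocated VMs, then check again",
        "notaligned": "look up the misaligned resources in the membership section",
    }
    if normalized not in actions:
        raise ValueError(f"unexpected colocation status: {status!r}")
    return actions[normalized]

print(colocation_action("Aligned"))       # nothing to do
print(colocation_action("Not aligned"))
```
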
Accelerated Networking

Accelerated Networking: this setting bypasses the vSwitch/hypervisor (via SR-IOV) for the Mellanox network cards used with supported OSes and instance sizes in the Azure cloud. The vSwitch/hypervisor (host) is removed from the traffic path, so the VM is connected to the network card directly. Policies are then enforced on the network card rather than on the vSwitch (as in the standard case).

SR-IOV (Single Root Input/Output Virtualization) is a device virtualization technology that gives virtual machines direct access to part of a device's hardware capabilities.

Real-world test results: without accelerated networking, throughput is 10 times lower (600 Mbps instead of 6 Gbps).

On supported VMs (OS, instance size) it is enabled by default (in practice, not always). If it is enabled on an unsupported OS, the VM may fail to boot; the fix is to disable accelerated networking.

Enable accelerated networking?
I have validated that my operating system is part of the supported operating systems. If connectivity to your VM is disrupted due to incompatible OS, please disable accelerated networking here and connection will resume.

Caveats to account for in the text and technically: the application must bind to the synthetic NIC, not to the passed-through bypass NIC (Mellanox Virtual Function, VF). Otherwise Azure does not guarantee that the application receives all packets (it will receive only the packets delivered over the VF NIC).

How the Mellanox MT27710 ConnectX-4 Virtual Function (mlx5_core) interface looks in the system: it is detected as a 50G slave interface.

root@serv1:~# ifconfig enP17060s1
enP17060s1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 1500
ether 00:0d:3a:99:db:a3 txqueuelen 1000 (Ethernet)
RX packets 486 bytes 70661 (70.6 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1831 bytes 464557 (464.5 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
root@serv1:~# ethtool enP17060s1
Settings for enP17060s1:
Supported ports: [ ]
Supported link modes: Not reported
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: 50000Mb/s
Duplex: Unknown! (255)
Port: Other
Transceiver: internal
Auto-negotiation: off
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000004 (4)
Link detected: yes
root@serv1:~# ethtool -i enP17060s1
driver: mlx5_core
version: 5.0-0
firmware-version: 14.25.8368 (MSF0010110035)
bus-info: f9a4:00:02.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
root@serv1:~# lshw -class network 
description: Ethernet interface
product: MT27710 Family [ConnectX-4 Lx Virtual Function]
vendor: Mellanox Technologies
physical id: 2
bus info: pci@42a4:00:02.0
logical name: enP17060s1
version: 80
serial: 00:0d:3a:99:db:a3
width: 64 bits
clock: 33MHz
capabilities: pciexpress msix bus_master cap_list ethernet physical autonegotiation
configuration: autonegotiation=off broadcast=yes driver=mlx5_core driverversion=5.0-0 firmware=14.25.8368 (MSF0010110035) latency=0 link=yes multicast=yes slave=yes
resources: iomemory:f0-ef irq:0 memory:fe0000000-fe00fffff
description: Ethernet interface
physical id: 1
logical name: eth0
serial: 00:0d:3a:99:db:a3
capabilities: ethernet physical
configuration: autonegotiation=off broadcast=yes driver=hv_netvsc duplex=full firmware=N/A ip= link=yes multicast=yes
Applications binding to the synthetic NIC is a mandatory requirement for all applications taking advantage of Accelerated Networking.
If the application runs directly over the VF NIC, it doesn't receive all packets that are destined to the VM, since some packets show up over the synthetic interface. If you run an application over the synthetic NIC, it guarantees that the application receives all packets that are destined to it.
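
To decide which interface an application should bind to, one option is to classify interfaces by driver, as in the transcript above (`hv_netvsc` for the synthetic NIC, `mlx5_core` for the VF). The sketch below parses `ethtool -i` style text; the function names are invented here, and the driver-to-role mapping covers only the drivers seen in these notes, so treat anything else as unknown.

```python
def parse_ethtool_info(text):
    """Parse `ethtool -i` output into a dict of key -> value."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

def nic_role(driver):
    """Classify an interface by driver name, based on the transcript above."""
    if driver == "hv_netvsc":
        return "synthetic"          # bind applications here
    if driver.startswith("mlx"):
        return "virtual function"   # SR-IOV VF, do not bind directly
    return "unknown"

sample = """driver: mlx5_core
version: 5.0-0
firmware-version: 14.25.8368 (MSF0010110035)
bus-info: f9a4:00:02.0"""
info = parse_ethtool_info(sample)
print(info["driver"], "->", nic_role(info["driver"]))   # mlx5_core -> virtual function
print(nic_role("hv_netvsc"))                            # synthetic
```
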
Accelerated networking moves much of the Azure software-defined networking stack off the CPUs and into FPGA-based SmartNICs. This change enables end-user applications to reclaim compute cycles, which puts less load on the VM, decreasing jitter and inconsistency in latency. In other words, performance can be more deterministic.
Accelerated networking improves performance by allowing the guest VM to bypass the host and establish a datapath directly with a host’s SmartNIC.
Important, please note:
- If your VM was created individually, without an availability set, you only need to stop/deallocate the individual VM to enable Accelerated Networking. 
- If your VM was created with an availability set, all VMs contained in the availability set will need to be stopped/deallocated before enabling Accelerated Networking on any of the NICs.
If you have chosen a supported operating system and VM size, this option will automatically populate to "On."
Accelerated networking enables single root I/O virtualization (SR-IOV) to a VM, greatly improving its networking performance. This high-performance path bypasses the host from the datapath, reducing latency, jitter, and CPU utilization, for use with the most demanding network workloads on supported VM types.
Without accelerated networking, all networking traffic in and out of the VM must traverse the host and the virtual switch. The virtual switch provides all policy enforcement, such as network security groups, access control lists, isolation, and other network virtualized services to network traffic.
With accelerated networking, network traffic arrives at the virtual machine's network interface (NIC), and is then forwarded to the VM. All network policies that the virtual switch applies are now offloaded and applied in hardware. Applying policy in hardware enables the NIC to forward network traffic directly to the VM, bypassing the host and the virtual switch, while maintaining all the policy it applied in the host.
The benefits of accelerated networking only apply to the VM that it is enabled on. For the best results, it is ideal to enable this feature on at least two VMs connected to the same Azure virtual network (VNet). When communicating across VNets or connecting on-premises, this feature has minimal impact to overall latency.
If you are using a custom image, and your image supports Accelerated Networking, please make sure to have the required drivers to work with Mellanox ConnectX-3 and ConnectX-4 Lx NICs on Azure.
A supported VM size without accelerated networking enabled can only have the feature enabled when it is stopped and deallocated.
- Lower Latency / Higher packets per second (pps): Removing the virtual switch from the datapath removes the time packets spend in the host for policy processing and increases the number of packets that can be processed inside the VM.
- Reduced jitter: Virtual switch processing depends on the amount of policy that needs to be applied and the workload of the CPU that is doing the processing. Offloading the policy enforcement to the hardware removes that variability by delivering packets directly to the VM, removing the host to VM communication and all software interrupts and context switches.
- Decreased CPU utilization: Bypassing the virtual switch in the host leads to less CPU utilization for processing network traffic.


Active Connections MAX

250k for network virtual appliance

500k for endpoints

Today, the Azure networking stack supports 1M total flows (500k inbound and 500k outbound) for a VM. Total active connections that can be handled by a VM in different scenarios are as follows.
- VMs that belong to a VNet can handle 500k active connections for all VM sizes with 500k active flows in each direction.
- VMs with network virtual appliances (NVAs) such as gateway, proxy, firewall can handle 250k active connections with 500k active flows in each direction due to the forwarding and additional new flow creation on new connection setup to the next hop.

Once this limit is hit, additional connections are dropped. Connection establishment and termination rates can also affect network performance as connection establishment and termination shares CPU with packet processing routines. We recommend that you benchmark workloads against expected traffic patterns and scale out workloads appropriately to match your performance needs.
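
The two limits above follow from the same flow budget. A minimal sketch, assuming an endpoint connection consumes one flow per direction and a connection forwarded by an NVA consumes two (one towards each side); that per-connection accounting is my reading of the quoted text, and the function names are made up.

```python
FLOWS_PER_DIRECTION = 500_000  # from the limits quoted above

def max_active_connections(is_nva):
    """Active connections a VM can hold before new ones are dropped."""
    # Assumption: 1 flow per direction for an endpoint, 2 for a forwarding NVA.
    flows_per_connection = 2 if is_nva else 1
    return FLOWS_PER_DIRECTION // flows_per_connection

def headroom(active, is_nva):
    """Fraction of the connection budget still free (negative means over the limit)."""
    limit = max_active_connections(is_nva)
    return (limit - active) / limit

print(max_active_connections(is_nva=False))   # 500000
print(max_active_connections(is_nva=True))    # 250000
print(headroom(200_000, is_nva=True))
```

Tracking headroom like this against measured connection counts is one way to decide when to scale out before connections start being dropped.
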
