Failed to promote replica devices on NetApp Cluster Mode | Settlersoman

Recently, as part of a standard procedure, one of my customers wanted to verify that SRM works correctly. Unfortunately, they ran into a problem while failing back to the original DC, with the following error:

Error - Failed to promote replica devices. Failed to promote replica device '//netapp184/vmware_55/lun_repl'. SRA command 'failover' failed for device '//netapp184/vmware_55/lun_repl'. Failed to map the SAN device on an igroup of ostype vmware Ensure that the device is not mapped to an already existing igroup.

The environment ran NetApp Cluster Mode with SRM 5.1. According to the following NetApp KB:

The solution is to ensure that the device is not mapped to an already existing igroup.

It sounds simple, but according to the customer, everything had worked well until they added a standalone ESXi host (running some SRM-protected VMs) to the DRS/HA cluster.

According to the VMware documentation:

SRM registers virtual machines across the available ESX hosts in a round-robin order, to distribute the potential load as evenly as possible. SRM always uses DRS placement to balance the load intelligently across hosts before it powers on recovered virtual machines on the recovery site, even if DRS is disabled on the cluster. If DRS is enabled and in fully automatic mode, DRS might move other virtual machines to further balance the load across the cluster while SRM is powering on the recovered virtual machines. DRS continues to balance all virtual machines across the cluster after SRM has powered on the recovered virtual machines.

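This means that once the ESXi host was added to the cluster, SRM could register and start a recovered VM on any host in the cluster, even with DRS disabled. Every host in the cluster therefore needs consistent LUN mappings and igroups.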
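To make the consequence concrete, here is a simplified Python model of round-robin registration. It is only a sketch: it ignores DRS placement entirely, and the host and VM names are hypothetical. It shows that once a third host joins the cluster, some recovered VMs will be registered on it, so that host must be able to see the replicated LUNs.

```python
from itertools import cycle


def round_robin_placement(vms, hosts):
    """Assign each VM to a host in simple round-robin order (simplified model, no DRS)."""
    if not hosts:
        raise ValueError("at least one host is required")
    host_cycle = cycle(hosts)
    # Each VM takes the next host in the rotation, wrapping around at the end.
    return {vm: next(host_cycle) for vm in vms}


vms = ["vm01", "vm02", "vm03", "vm04"]

# Before the standalone host joined the cluster: only two hosts receive VMs.
before = round_robin_placement(vms, ["esx01", "esx02"])

# After the (hypothetical) host esx03 joined: it now receives VMs too.
after = round_robin_placement(vms, ["esx01", "esx02", "esx03"])
print(after)
# {'vm01': 'esx01', 'vm02': 'esx02', 'vm03': 'esx03', 'vm04': 'esx01'}
```

In the real product, DRS placement decides the final host, but the point stands: any host in the cluster is a candidate.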

The problem was solved by checking and correcting the LUN mappings and igroups for all ESXi hosts in the cluster.
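The check itself was done by hand, but the kind of review involved can be sketched in Python. This is a hypothetical audit, not a NetApp tool: it assumes you have already exported each host's initiators (IQNs or WWPNs) and each igroup's ostype and members into plain dictionaries. It flags initiators that belong to no igroup, initiators that belong to several igroups, and igroups whose ostype is not `vmware`. The sample data models one plausible situation, a former standalone host that still has its own igroup after also being added to the cluster igroup.

```python
from collections import defaultdict


def audit_igroups(hosts, igroups):
    """
    hosts:   {host_name: set of initiator IDs}
    igroups: {igroup_name: {"ostype": str, "initiators": set of initiator IDs}}
    Returns a list of findings worth reviewing (hypothetical data model).
    """
    findings = []
    # Map every initiator to all igroups that contain it.
    membership = defaultdict(list)
    for name, group in igroups.items():
        for initiator in group["initiators"]:
            membership[initiator].append(name)

    for host, initiators in sorted(hosts.items()):
        for initiator in sorted(initiators):
            groups = sorted(membership.get(initiator, []))
            if not groups:
                findings.append(f"{host}: initiator {initiator} is not in any igroup")
            elif len(groups) > 1:
                findings.append(
                    f"{host}: initiator {initiator} is in several igroups: {', '.join(groups)}"
                )
            for g in groups:
                if igroups[g]["ostype"] != "vmware":
                    findings.append(f"{host}: igroup {g} has ostype {igroups[g]['ostype']}")
    return findings


# Hypothetical inventory: esx03 was the standalone host.
hosts = {
    "esx01": {"iqn.1998-01.com.vmware:esx01"},
    "esx02": {"iqn.1998-01.com.vmware:esx02"},
    "esx03": {"iqn.1998-01.com.vmware:esx03"},
}
igroups = {
    "srm_cluster": {
        "ostype": "vmware",
        "initiators": {
            "iqn.1998-01.com.vmware:esx01",
            "iqn.1998-01.com.vmware:esx02",
            "iqn.1998-01.com.vmware:esx03",
        },
    },
    "esx03_standalone": {"ostype": "vmware", "initiators": {"iqn.1998-01.com.vmware:esx03"}},
}

findings = audit_igroups(hosts, igroups)
for line in findings:
    print(line)
# esx03: initiator iqn.1998-01.com.vmware:esx03 is in several igroups: esx03_standalone, srm_cluster
```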

Note:

VMware SRM and NetApp SRA support the use of the FC protocol at one site and the iSCSI protocol at the other site. They do not support a mix of FC-attached and iSCSI-attached datastores on the same ESXi host or on different hosts in the same cluster. Another customer of mine had a similar problem with SRM because they had a mix of FC- and iSCSI-attached datastores.
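A quick way to spot that situation is to group datastores by cluster and look for more than one protocol. The Python sketch below assumes a hypothetical inventory list of `(cluster, host, datastore, protocol)` tuples that you would build yourself. Checking at the cluster level covers both cases from the note: a mix on one host and a mix across hosts in the same cluster.

```python
from collections import defaultdict


def clusters_with_mixed_protocols(datastores):
    """
    datastores: iterable of (cluster, host, datastore, protocol) tuples,
                where protocol is "FC" or "iSCSI" (hypothetical inventory format).
    Returns {cluster: set_of_protocols} for every cluster using more than one protocol.
    """
    protocols = defaultdict(set)
    for cluster, _host, _datastore, protocol in datastores:
        protocols[cluster].add(protocol)
    return {cluster: protos for cluster, protos in protocols.items() if len(protos) > 1}


# Hypothetical inventory: DR-Cluster mixes FC and iSCSI across two hosts.
inventory = [
    ("Prod-Cluster", "esx01", "ds_fc_01", "FC"),
    ("Prod-Cluster", "esx02", "ds_fc_02", "FC"),
    ("DR-Cluster", "esx11", "ds_fc_11", "FC"),
    ("DR-Cluster", "esx12", "ds_iscsi_12", "iSCSI"),
]

print(clusters_with_mixed_protocols(inventory))
```

Using FC at one site and iSCSI at the other is still fine; only a mix within one cluster is reported.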