Oracle HAIP

Our RAC clusters are set up with Oracle HAIP (High Availability IP) to provide redundant private interfaces for ASM and the interconnect. In this post I share some issues I ran into and the experiments I did.

For this post I use a RAC cluster on KVM on my home computer (Fedora). The two guest servers are installed with CentOS 7 and Oracle/Grid 19. Initially I have only one interface for the interconnect and ASM communication. I then add a second interface so that HAIP has two interfaces to work with.

The first issue I encountered was a database instance eviction. After a long trip with Oracle support, I realized it was caused by a (poorly) documented setting that I had overlooked. So, very important: an interface used for the private interconnect must not use strict reverse path filtering. Set rp_filter to 2 (loose mode) for it, or to 0 to disable the filter entirely.

Assuming eth1 and eth2 are the private interfaces, put this in /etc/sysctl.conf

net.ipv4.conf.eth1.rp_filter = 2
net.ipv4.conf.eth2.rp_filter = 2

then execute

sysctl -p

To verify the setting

[root@ora01 orachk]# cat /proc/sys/net/ipv4/conf/eth1/rp_filter
2
[root@ora01 orachk]# cat /proc/sys/net/ipv4/conf/eth2/rp_filter
2
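One thing to keep in mind when verifying: for source validation the kernel applies the higher of net.ipv4.conf.all.rp_filter and the per-interface value, so the per-interface file alone does not tell the whole story. Below is a minimal sketch of a check that takes this into account. The function name effective_rp_filter and its optional base-directory argument are hypothetical, introduced here only for illustration.

```shell
#!/bin/sh
# Print the rp_filter value the kernel effectively applies to an interface:
# the maximum of conf/all/rp_filter and conf/<iface>/rp_filter.
# The second argument (base directory) is an illustrative hook for testing;
# it defaults to the real /proc location.
effective_rp_filter() (
    iface=$1
    base=${2:-/proc/sys/net/ipv4/conf}
    all=$(cat "$base/all/rp_filter") || exit 1
    own=$(cat "$base/$iface/rp_filter") || exit 1
    if [ "$all" -gt "$own" ]; then
        echo "$all"
    else
        echo "$own"
    fi
)

# Example: warn when a private interface ends up in strict mode (1).
#   for i in eth1 eth2; do
#       [ "$(effective_rp_filter "$i")" = 1 ] && echo "$i: strict rp_filter"
#   done
```

With all set to 1 and eth1 set to 2 the result is 2 (loose), which is fine; with all set to 1 and eth2 set to 0 the result is 1 (strict), which is the situation to avoid.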

The second crucial point is the naming consistency of the interfaces on both systems. My understanding is that with systemd this should not require any kernel parameters (net.ifnames, biosdevname) or udev rules: just ensure that the HWADDR field is set in the interface configuration file, and the device will always get the name given in that file (field DEVICE).
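To check on each node that a configuration file really matches the device carrying its name, the HWADDR field can be compared with the MAC address the kernel reports. This is only a sketch: check_hwaddr is a hypothetical helper name, and the two directory arguments exist purely so the function can be tried out on copies of the files; they default to the usual CentOS 7 locations.

```shell
#!/bin/sh
# Compare the HWADDR in an ifcfg file with the MAC address the kernel reports
# for the device of the same name.
# Arguments: device [config dir] [sysfs net dir]
check_hwaddr() (
    dev=$1
    cfgdir=${2:-/etc/sysconfig/network-scripts}
    sysdir=${3:-/sys/class/net}
    # Extract the HWADDR value, drop optional double quotes, lower-case it.
    want=$(sed -n 's/^HWADDR=//p' "$cfgdir/ifcfg-$dev" | tr -d '"' | tr 'A-F' 'a-f')
    have=$(tr 'A-F' 'a-f' < "$sysdir/$dev/address")
    if [ -n "$want" ] && [ "$want" = "$have" ]; then
        echo "$dev: OK ($have)"
    else
        echo "$dev: MISMATCH (config '$want', device '$have')"
        exit 1
    fi
)

# Usage on a node:
#   check_hwaddr eth1 && check_hwaddr eth2
```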

Here is a link to the set-up of my home RAC cluster: Set-up Oracle RAC on libvirt

Basic commands

oifcfg iflist -n -p

The -p flag makes Oracle guess the type of each interface, and it is only a guess. In that respect the command is little more than a wrapper around the Linux ip command and not very useful. The getif command is better, because it shows what is actually configured in clusterware.

oifcfg getif
oifcfg getif -if eth0/192.168.122.0
eth0  192.168.122.0  global  public

oifcfg getif -if eth1/10.0.0.0
eth1  10.0.0.0  global  cluster_interconnect,asm
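When scripting around this, the private interfaces can be picked out of the getif output by looking at the type column. A small sketch, assuming the four-column layout shown above; private_interfaces is a hypothetical helper name.

```shell
#!/bin/sh
# Read `oifcfg getif` output on stdin and print "interface/subnet" for every
# entry whose type list (4th column) contains cluster_interconnect.
private_interfaces() {
    awk '$4 ~ /(^|,)cluster_interconnect(,|$)/ { print $1 "/" $2 }'
}

# Usage on a cluster node:
#   oifcfg getif | private_interfaces
```

The interface/subnet form it prints is the same form that oifcfg getif -if accepts.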

ip addr shows that the HAIP is assigned to eth1 (look at the alias called eth1:1)

ip addr show eth1

shows

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:e0:5c:78 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.10/24 brd 10.0.0.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet 169.254.30.155/19 brd 169.254.31.255 scope global eth1:1
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fee0:5c78/64 scope link 
       valid_lft forever preferred_lft forever

On node 2, ip addr shows this IP

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:2f:93:2b brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.11/24 brd 10.0.0.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet 169.254.7.4/19 brd 169.254.31.255 scope global eth1:1
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe2f:932b/64 scope link 
       valid_lft forever preferred_lft forever

There is a clusterware resource:

crsctl status resource -t -init
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       ora01                    STABLE

Adding an interface

Now let’s add a second interface for the interconnect. This means adding the interface at OS level and then in clusterware. Adding the interface in clusterware is done via oifcfg setif; the change only takes effect after a complete shutdown and restart of crs.

First I need to add a virtual network in libvirt. I do this using the graphical interface virt-manager.

In virt-manager, go to Edit -> Connection Details -> Virtual Network. I will add a second network, called rac_private_2.

(Screenshot: virt-manager, adding the second private network)

Then I add a new interface to each of the guests, linked to this network. Copy the MAC address assigned to the new interface; it will be needed to configure the interface at OS level.

(Screenshot: virt-manager, adding a network interface to a guest)

This requires a reboot of the guests.

After the reboot I can configure the newly added interface: add the file /etc/sysconfig/network-scripts/ifcfg-eth2. The HWADDR field must correspond to what is shown in virt-manager.

BOOTPROTO=none
DEFROUTE=no
DEVICE=eth2
GATEWAY=10.0.10.1
IPADDR=10.0.10.10
NETMASK=255.255.255.0
ONBOOT=yes
HWADDR=52:54:00:36:bf:80
TYPE=Ethernet
USERCTL=no
NM_CONTROLLED=no

On node 2 I use the IPADDR 10.0.10.11. Don’t forget to adapt the HWADDR field.
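Since only IPADDR and HWADDR differ between the nodes, the file can also be generated from a small template. A minimal sketch: make_ifcfg is a hypothetical helper, the field values mirror the eth2 example above, and GATEWAY is left out of the sketch, so add it back if your set-up needs it.

```shell
#!/bin/sh
# Print an ifcfg file for a private interface.
# Arguments: device, IPv4 address, MAC address.
make_ifcfg() {
    cat <<EOF
BOOTPROTO=none
DEFROUTE=no
DEVICE=$1
IPADDR=$2
NETMASK=255.255.255.0
ONBOOT=yes
HWADDR=$3
TYPE=Ethernet
USERCTL=no
NM_CONTROLLED=no
EOF
}

# Node 2, with the MAC address shown by virt-manager for that guest:
#   make_ifcfg eth2 10.0.10.11 52:54:00:ac:32:c5 \
#       > /etc/sysconfig/network-scripts/ifcfg-eth2
```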

Start the interface

ip link set eth2 up

While crs is running we can set a new interface

oifcfg setif -global eth2/10.0.10.0:cluster_interconnect,asm

In order to add or remove a private interface, a complete stop/start of crs on both nodes is required, i.e. a rolling restart is not enough.

crsctl stop cluster -all

then on each node

crsctl stop crs
crsctl start crs

Check the trace file ohasd_orarootagent_root for messages related to HAIP

2020-05-13 15:58:29.131 : USRTHRD:4093638400: [     INFO] {0:5:3} Thread:[NetHAMain] InitializeHaIps[ 1]  infList 'inf eth2, ip 10.0.10.10, sub 10.0.10.0'
2020-05-13 15:58:29.131 : USRTHRD:4093638400: [     INFO] {0:5:3} Thread:[NetHAMain] HAIP: found in HaipList 'inf eth2, ip 10.0.10.10, sub 10.0.10.0'
2020-05-13 15:58:29.131 : USRTHRD:4093638400: [     INFO] {0:5:3} Thread:[NetHAMain] InitializeHaIps[ 0]  infList 'inf eth1, ip 10.0.0.10, sub 10.0.0.0'
2020-05-13 15:58:29.131 : USRTHRD:4093638400: [     INFO] {0:5:3} Thread:[NetHAMain] HAIP: found in HaipList 'inf eth1, ip 10.0.0.10, sub 10.0.0.0'
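The trace is noisy, so it helps to reduce it to the interfaces HAIP was initialised with. A sketch, assuming the InitializeHaIps line format shown above; haip_init_entries is a hypothetical helper name, and the trace file is passed on stdin so no path is assumed.

```shell
#!/bin/sh
# Read the orarootagent trace on stdin and print "interface ip subnet" for
# each InitializeHaIps entry. Other lines are ignored.
haip_init_entries() {
    sed -n "s/.*InitializeHaIps\[ *[0-9]*\] *infList 'inf \([^,]*\), ip \([^,]*\), sub \([^']*\)'.*/\1 \2 \3/p"
}

# Usage (replace the placeholder with the location of the trace file):
#   haip_init_entries < /path/to/ohasd_orarootagent_root.trc | sort -u
```

After a successful restart with two private interfaces, both eth1 and eth2 should show up in the result.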

Check the resource

crsctl status resource ora.cluster_interconnect.haip -init
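After a full restart it can take a while before the resource is back, so a script may want to wait for it. The sketch below assumes that crsctl, when called without -t, prints the resource as NAME=, TYPE=, TARGET= and STATE= lines; resource_online is a hypothetical helper name.

```shell
#!/bin/sh
# Read `crsctl status resource <name> -init` output on stdin and succeed only
# if the resource reports STATE=ONLINE.
resource_online() {
    grep -q '^STATE=ONLINE'
}

# Usage: poll every 10 seconds, give up after 30 attempts.
#   i=0
#   until crsctl status resource ora.cluster_interconnect.haip -init | resource_online; do
#       i=$((i + 1))
#       [ "$i" -ge 30 ] && { echo "HAIP resource still not online"; break; }
#       sleep 10
#   done
```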

Check the IPs: each private interface will have a HAIP assigned to it, in the form 169.254.x.x

ip addr

On ora01 (node 1)

...

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:e0:5c:78 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.10/24 brd 10.0.0.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet 169.254.6.103/20 brd 169.254.15.255 scope global eth1:1
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:36:bf:80 brd ff:ff:ff:ff:ff:ff
    inet 10.0.10.10/24 brd 10.0.10.255 scope global eth2
       valid_lft forever preferred_lft forever
    inet 169.254.24.177/20 brd 169.254.31.255 scope global eth2:1
       valid_lft forever preferred_lft forever

On ora02 (node 2)

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:2f:93:2b brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.11/24 brd 10.0.0.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet 169.254.15.177/20 brd 169.254.15.255 scope global eth1:1
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:ac:32:c5 brd ff:ff:ff:ff:ff:ff
    inet 10.0.10.11/24 brd 10.0.10.255 scope global eth2
       valid_lft forever preferred_lft forever
    inet 169.254.16.246/20 brd 169.254.31.255 scope global eth2:1
       valid_lft forever preferred_lft forever
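To see at a glance which interface currently carries which HAIP, the ip addr output can be reduced to its 169.254.x.x lines. A small sketch, with haip_addresses as a hypothetical helper name; it relies on the interface label being the last field of each inet line, as in the output above.

```shell
#!/bin/sh
# Read `ip addr` output on stdin and print "<label> <address/prefix>" for
# each link-local (169.254.x.x) IPv4 address, i.e. the HAIPs.
haip_addresses() {
    awk '$1 == "inet" && $2 ~ /^169\.254\./ { print $NF, $2 }'
}

# Usage on a node:
#   ip addr | haip_addresses
```

For the node 1 output above this gives one line for eth1:1 and one for eth2:1.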

In the alert file of the database, we see this info on instance 1

Cluster Communication is configured to use IPs from: GPnP
IP: 169.254.6.103        Subnet: 169.254.0.0
IP: 169.254.24.177       Subnet: 169.254.16.0
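The subnets in the alert file follow directly from the /20 prefix that ip addr shows for the HAIPs: the subnet is the address with all bits beyond the prefix set to zero. A worked sketch of that arithmetic in plain shell; network_of is a hypothetical helper name.

```shell
#!/bin/sh
# Print the network address of an IPv4 address given as a.b.c.d/prefix.
network_of() (
    addr=${1%/*}
    prefix=${1#*/}
    # Split the dotted quad into four positional parameters.
    IFS=.
    set -- $addr
    ip=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    # Build the netmask from the prefix length and apply it.
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    net=$(( ip & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
)

network_of 169.254.6.103/20     # prints 169.254.0.0
network_of 169.254.24.177/20    # prints 169.254.16.0
```

For 169.254.24.177/20 the third octet is 24 (binary 00011000); the /20 mask keeps only its upper four bits, which leaves 16, hence the subnet 169.254.16.0.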

Check the HAIPs in the database. For instance 1 the IPs will correspond to what was written to the alert file.

select * from gv$cluster_interconnects;

Simulate interface loss

On node 1

ifdown eth2

In the crs alert file

2020-05-13 16:27:00.871 [GIPCD(21256)]CRS-42216: No interfaces are configured on the local node for interface definition eth2(:.*)?:10.0.10.0: available interface definitions are [eth0(:.*)?:192.168.122.0][eth0:2(:.*)?:192.168.122.0][eth0:3(:.*)?:192.168.122.0][eth0:4(:.*)?:192.168.122.0][eth1:1(:.*)?:169.254.0.0][eth1:2(:.*)?:169.254.16.0][eth0(:.*)?:[fe80:0:0:0:0:0:0:0]][eth1(:.*)?:10.0.0.0].

in ohasd_orarootagent_root

2020-05-13 16:26:58.878 : USRTHRD:4093638400: [     INFO] {0:5:3} HAIP:  Moving ip '169.254.24.177' from inf 'eth2' to inf 'eth1'

Bring the interface back up

ifup eth2

The HAIP will be assigned back to eth2.
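To follow such failovers over time, the "Moving ip" messages can be extracted from the trace. A sketch, assuming the message format of the line shown above; haip_moves is a hypothetical helper name and the trace is read from stdin.

```shell
#!/bin/sh
# Read the orarootagent trace on stdin and print, for each HAIP move,
# "<date> <time> <ip> <from-interface> -> <to-interface>".
haip_moves() {
    sed -n "s/^\([0-9-]* [0-9:.]*\) .*HAIP: *Moving ip '\([^']*\)' from inf '\([^']*\)' to inf '\([^']*\)'.*/\1 \2 \3 -> \4/p"
}

# Usage (replace the placeholder with the location of the trace file):
#   haip_moves < /path/to/ohasd_orarootagent_root.trc
```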

Remove an interface

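Check the current definition

[root@ora01 orachk]# oifcfg getif -global -if  eth2
eth2  10.0.10.0  global  cluster_interconnect,asm

Remove the definition from clusterware

oifcfg delif -global eth2/10.0.10.0

Then stop the cluster on all nodes

crsctl stop cluster -all

This stops all resources except gipcd, gpnpd and mdnsd. To stop the complete stack, execute stop crs on each node.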

crsctl stop crs

On each node

crsctl start crs

Check that the interface is removed

oifcfg getif -global -if eth2
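In a script the same check can be turned into an exit status by scanning the complete getif listing for the interface name. A minimal sketch; interface_defined is a hypothetical helper name.

```shell
#!/bin/sh
# Read `oifcfg getif` output on stdin; succeed if the interface named in the
# first argument is still defined, fail otherwise.
interface_defined() {
    awk -v ifn="$1" '$1 == ifn { found = 1 } END { exit !found }'
}

# Usage on a cluster node:
#   if oifcfg getif | interface_defined eth2; then
#       echo "eth2 is still defined in clusterware"
#   fi
```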
Written on March 14, 2020