Oracle HAIP
Our RAC clusters are set-up with Oracle HAIP (High Availability IP) in order to have redundant private interfaces for ASM and the interconnect. In this post I share some issues and experiments I had.
For this post I user a RAC on KVM on my home computer (Fedora). The two guest servers are installed with Centos 7 and Oracle/Grid 19. Initially I have only one interface for the interconnect and ASM communication. I added a second interface to build a HAIP (High availability IP).
The first issue I encountered was a database instance evicton. After a long trip with oracle support, I realized it is about a (poorly) documented setting that I had overlooked. So - very important - if you use an interface for private interconnect you must disable reverse path filtering for it
Assuming eth1 and eth2 for the private interfaces, put this in /etc/sysctl.conf
net.ipv4.conf.eth1.rp_filter = 2
net.ipv4.conf.eth2.rp_filter = 2
then execute
sysctl -p
To verify the setting
[root@ora01 orachk]# cat /proc/sys/net/ipv4/conf/eth1/rp_filter
2
[root@ora01 orachk]# cat /proc/sys/net/ipv4/conf/eth2/rp_filter
2
The second point that is crucial is the naming consistency of the interfaces on both system. My understanding is that with systemd this should not require any parameters (net.ifnames, biosdevname) or udev rules, just ensure that the HWADDR field is set in the interface configuration file and systemd will always use the name of the device in the config file (field DEVICE).
Here is a link to the set-up of my home rac cluster: Set-up Oracle RAC on libvirt
basic commands
oifcfg iflist -n -p
The -p flag implies that Oracle will make an assumption on the type of interface, it is only an assumption. So this is just a wrapper of the linux ip command, not very useful. The getif command is better.
oifcfg getif
oifcfg getif -if eth0/192.168.122.0
eth0 192.168.122.0 global public
oifcfg getif -if eth1/10.0.0.0
eth1 10.0.0.0 global cluster_interconnect,asm
ip addr shows that the HAIP is assigned to eth1 (look at the alias called eth1:1)
ip addr show eth1
shows
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:e0:5c:78 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.10/24 brd 10.0.0.255 scope global eth1
valid_lft forever preferred_lft forever
inet 169.254.30.155/19 brd 169.254.31.255 scope global eth1:1
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fee0:5c78/64 scope link
valid_lft forever preferred_lft forever
On node 2, ip addr shows this IP
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:2f:93:2b brd ff:ff:ff:ff:ff:ff
inet 10.0.0.11/24 brd 10.0.0.255 scope global eth1
valid_lft forever preferred_lft forever
inet 169.254.7.4/19 brd 169.254.31.255 scope global eth1:1
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe2f:932b/64 scope link
valid_lft forever preferred_lft forever
There is a clusterware resource:
crsctl status resource -t -init
ora.cluster_interconnect.haip
1 ONLINE ONLINE ora01 STABLE
Adding an interface
Now let’s add a second interface for the interconnect. This implies adding the interface at OS level and then in clusterware. Adding the interface in clusterware is done via oifcfg setif, it requires a complete shutdown of crs.
First I need to add a virtual network in libvirt, I do this using the graphical interface virt-manager
In virtual manager, go to Edit -> Connection Details -> Virtual Network. I will add a second network, called rac_private_2
Then I add a new interface to each of the guest, linked to this network. Copy the MAC ADDR assigned to the new interface, it will be needed to configure the interface at OS level.
This requires a reboot of the guests
After reboot I can configure the newly added interface; add the file /etc/sysconfig/network-scripts/ifcfg-eth2. The HWADDR field must correspond to what is shown in virtmanager.
BOOTPROTO=none
DEFROUTE=no
DEVICE=eth2
GATEWAY=10.0.10.1
IPADDR=10.0.10.10
NETMASK=255.255.255.0
ONBOOT=yes
HWADDR=52:54:00:36:bf:80
TYPE=Ethernet
USERCTL=no
NM_CONTROLLED=no
on node 2 I use the IPADDR 10.0.10.11. Don’t forget to adapt the HWADDR field
Start the interface
ip link set eth2 up
While crs is running we can set a new interface
oifcfg setif -global eth2/10.0.10.0:cluster_interconnect,asm
In order to add or remove a private interface a complete stop/start of crs on both nodes is required, i.e. a rolling restart is not enough.
crsctl stop clusterware -all
then on each node
crsctl stop crs
crsctl start crs
check the file ohasd_orarootagent_root for messages related to HAIP
2020-05-13 15:58:29.131 : USRTHRD:4093638400: [ INFO] {0:5:3} Thread:[NetHAMain] InitializeHaIps[ 1] infList 'inf eth2, ip 10.0.10.10, sub 10.0.10.0'
2020-05-13 15:58:29.131 : USRTHRD:4093638400: [ INFO] {0:5:3} Thread:[NetHAMain] HAIP: found in HaipList 'inf eth2, ip 10.0.10.10, sub 10.0.10.0'
2020-05-13 15:58:29.131 : USRTHRD:4093638400: [ INFO] {0:5:3} Thread:[NetHAMain] InitializeHaIps[ 0] infList 'inf eth1, ip 10.0.0.10, sub 10.0.0.0'
2020-05-13 15:58:29.131 : USRTHRD:4093638400: [ INFO] {0:5:3} Thread:[NetHAMain] HAIP: found in HaipList 'inf eth1, ip 10.0.0.10, sub 10.0.0.0'
check the resource
crsctl status resource ora.cluster_interconnect.haip -init
check the IP’s, each private interface will have an HAIP assigned to it, in the form 169.254.x.x
ip addr
On ora01 (node 1)
...
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:e0:5c:78 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.10/24 brd 10.0.0.255 scope global eth1
valid_lft forever preferred_lft forever
inet 169.254.6.103/20 brd 169.254.15.255 scope global eth1:1
valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:36:bf:80 brd ff:ff:ff:ff:ff:ff
inet 10.0.10.10/24 brd 10.0.10.255 scope global eth2
valid_lft forever preferred_lft forever
inet 169.254.24.177/20 brd 169.254.31.255 scope global eth2:1
valid_lft forever preferred_lft forever
on ora02 (node1)
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:2f:93:2b brd ff:ff:ff:ff:ff:ff
inet 10.0.0.11/24 brd 10.0.0.255 scope global eth1
valid_lft forever preferred_lft forever
inet 169.254.15.177/20 brd 169.254.15.255 scope global eth1:1
valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:ac:32:c5 brd ff:ff:ff:ff:ff:ff
inet 10.0.10.11/24 brd 10.0.10.255 scope global eth2
valid_lft forever preferred_lft forever
inet 169.254.16.246/20 brd 169.254.31.255 scope global eth2:1
valid_lft forever preferred_lft forever
In the alert file of the database, we see this info on instance 1
Cluster Communication is configured to use IPs from: GPnP
IP: 169.254.6.103 Subnet: 169.254.0.0
IP: 169.254.24.177 Subnet: 169.254.16.0
check HAIP in the database. For instance 1 the IP’s will correspond to what was dumped in the alert file.
select * from gv$cluster_interconnects;
Simulate interface loss
on node 1
if down eth2
in crs alert file
2020-05-13 16:27:00.871 [GIPCD(21256)]CRS-42216: No interfaces are configured on the local node for interface definition eth2(:.*)?:10.0.10.0: available interface definitions are [eth0(:.*)?:192.168.122.0][eth0:2(:.*)?:192.168.122.0][eth0:3(:.*)?:192.168.122.0][eth0:4(:.*)?:192.168.122.0][eth1:1(:.*)?:169.254.0.0][eth1:2(:.*)?:169.254.16.0][eth0(:.*)?:[fe80:0:0:0:0:0:0:0]][eth1(:.*)?:10.0.0.0].
in ohasd_orarootagent_root
2020-05-13 16:26:58.878 : USRTHRD:4093638400: [ INFO] {0:5:3} HAIP: Moving ip '169.254.24.177' from inf 'eth2' to inf 'eth1'
ifup eth2
the HAIP will be assigned back to eth2
Remove an interface
[root@ora01 orachk]# oifcfg getif -global -if eth2
eth2 10.0.10.0 global cluster_interconnect,asm
crsctl stop clusterware -all
this stops all resources, except gipcd, gpnd, mdnsd. To stop the complete stack, execute stop crs on each node.
crsctl stop crs
On each node
crsctl start crs
Check that the interface is removed
oifcfg getif -global -if eth2