IPMP failures on bge Interface

We've been testing IPMP on Solaris SPARC hosts that also have the Apani IPSec Agent installed. It works fine on older hosts that have 'qfe' and 'le' interfaces, but our V210s and T1000s with 'bge' interfaces have a problem. If we configure an IPMP group to use, say, bge0 and bge1 (with bge0 as the primary interface), it works fine. Disconnecting bge0 causes a failover to bge1, which is also fine. Disconnecting bge1, however, produces the following errors:

-

Nov 2 10:32:29 cs22 in.mpathd[146]: NIC failure detected on bge1 of group test

Nov 2 10:32:29 cs22 in.mpathd[146]: Successfully failed over from NIC bge1 to NIC bge0

Nov 2 10:32:37 cs2 in.mpathd[146]: All Interfaces in group test have failed

-

All interfaces fail, even though bge0 is still connected and was active before disconnecting bge1. The system recovers once bge0 is reconnected. The two interfaces are physically connected to the same switch, and the hostname.bgeX files are:

-- hostname.bge0

cs22 netmask + broadcast + group test up \

addif cs21 deprecated -failover netmask + broadcast + up

-- hostname.bge1

sp12 netmask + broadcast + group test up \

addif sp16 deprecated -failover netmask + broadcast + up
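For readers less familiar with the hostname.bgeX syntax, here is a minimal Python sketch that splits such a file into its addresses and marks which one is the IPMP test address (the one carrying both `deprecated` and `-failover`). The function name `parse_hostname_file` and the dictionary keys are hypothetical, chosen only for illustration; the sample text is the hostname.bge0 file quoted above.

```python
# Sample: the hostname.bge0 contents quoted in the post above.
BGE0 = r"""cs22 netmask + broadcast + group test up \
addif cs21 deprecated -failover netmask + broadcast + up"""


def parse_hostname_file(text):
    """Return one dict per address configured in a hostname.<if> file.

    Hypothetical helper: it only looks at the keywords used in this thread
    (group, addif, deprecated, -failover) and ignores everything else.
    """
    # Join backslash-continued lines into a single token stream.
    joined = " ".join(line.strip().rstrip("\\") for line in text.splitlines())

    # Start a new segment at every "addif" keyword.
    segments = [[]]
    for token in joined.split():
        if token == "addif":
            segments.append([])
        else:
            segments[-1].append(token)

    entries = []
    for tokens in segments:
        if not tokens:
            continue
        group = tokens[tokens.index("group") + 1] if "group" in tokens else None
        entries.append({
            "address": tokens[0],
            "group": group,  # only set where the keyword literally appears
            "test_address": "deprecated" in tokens and "-failover" in tokens,
        })
    return entries


if __name__ == "__main__":
    for entry in parse_hostname_file(BGE0):
        print(entry)
```

The group name is reported only for the segment in which the `group` keyword appears; the sketch does not try to model how the logical interface relates to the group.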

Any help would be appreciated, thanks in advance.

[1308 byte] By [CS@apani] at [2007-11-26 11:13:23]
# 1

Could you post:

+ showrev

+ netstat -nr

+ /etc/hosts file

+ ifconfig -a (when bge0 and bge1 are connected)

+ ifconfig -a (after removing bge1)

+ ifconfig -a (after inserting bge1)

+ ifconfig -a (after removing bge0)

+ ifconfig -a (after inserting bge0)

+ /var/adm/messages file

id8102257 at 2007-7-7 3:27:58 > top of Java-index,Solaris Operating System,Solaris 10 Features...
# 2

Thanks for replying. Here's the requested information:

-> showrev

Hostname: cstoc77022

Hostid: 842a9b82

Release: 5.10

Kernel architecture: sun4v

Application architecture: sparc

Hardware provider: Sun_Microsystems

Domain: nis.nl.com

Kernel version: SunOS 5.10 Generic_118833-03

-> netstat -rn

Routing Table: IPv4
  Destination           Gateway           Flags  Ref   Use   Interface
-------------------- -------------------- ----- ----- ------ ---------
63.192.85.64         63.192.77.9          UG        1      0 bge0
63.192.78.0          63.192.77.9          UG        1      0 bge0
63.192.77.0          63.192.77.22         U         1    162 bge0
63.192.77.0          63.192.77.12         U         1     12 bge1
63.192.77.0          63.192.77.12         U         1      0 bge0:1
63.192.77.0          63.192.77.12         U         1      0 bge1:1
63.192.76.0          63.192.77.9          UG        1      0 bge0
10.3.0.0             63.192.77.92         UG        1      0 bge0
172.20.0.0           63.192.77.4          UG        1      0 bge0
172.16.0.0           63.192.77.9          UG        1      0 bge0
10.0.0.0             63.192.77.9          UG        1      0 bge0
224.0.0.0            63.192.77.22         U         1      0 bge0
127.0.0.1            127.0.0.1            UH        7    328 lo0

-> more /etc/hosts

#
# Internet host table
#
127.0.0.1       localhost
63.192.77.22    cstoc77022      loghost
63.192.77.1     mls1

- BOTH CONNECTED: bge0, bge1

-> ifconfig -a

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1

inet 127.0.0.1 netmask ff000000

bge0: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 2

inet 63.192.77.22 netmask ffffff00 broadcast 63.192.77.255

groupname test

ether 0:14:4f:2a:9b:82

bge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2

inet 63.192.77.21 netmask ffffff00 broadcast 63.192.77.255

bge1: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 3

inet 63.192.77.12 netmask ffffff00 broadcast 63.192.77.255

groupname test

ether 0:14:4f:2a:9b:83

bge1:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3

inet 63.192.77.16 netmask ffffff00 broadcast 63.192.77.255

- REMOVING bge1

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1

inet 127.0.0.1 netmask ff000000

bge0: flags=1011000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FAILED,FIXEDMTU> mtu 1442 index 2

inet 63.192.77.22 netmask ffffff00 broadcast 63.192.77.255

groupname test

ether 0:14:4f:2a:9b:82

bge0:1: flags=19040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED> mtu 1500 index 2

inet 63.192.77.21 netmask ffffff00 broadcast 63.192.77.255

bge0:2: flags=1011000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FAILED,FIXEDMTU> mtu 1442 index 2

inet 63.192.77.12 netmask ffffff00 broadcast 63.192.77.255

bge1: flags=1019000802<BROADCAST,MULTICAST,IPv4,NOFAILOVER,FAILED,FIXEDMTU> mtu 0 index 3

inet 0.0.0.0 netmask 0

groupname test

ether 0:14:4f:2a:9b:83

bge1:1: flags=19040803<UP,BROADCAST,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED> mtu 1500 index 3

inet 63.192.77.16 netmask ffffff00 broadcast 63.192.77.255

Nov 2 13:00:22 cstoc77022 bge: NOTICE: bge1: link down

Nov 2 13:00:22 cstoc77022 in.mpathd[146]: The link has gone down on bge1

Nov 2 13:00:22 cstoc77022 in.mpathd[146]: NIC failure detected on bge1 of group test

Nov 2 13:00:22 cstoc77022 in.mpathd[146]: Successfully failed over from NIC bge1 to NIC bge0

Nov 2 13:00:30 cstoc77022 in.mpathd[146]: All Interfaces in group test have failed

- INSERTING bge1

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1

inet 127.0.0.1 netmask ff000000

bge0: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 2

inet 63.192.77.22 netmask ffffff00 broadcast 63.192.77.255

groupname test

ether 0:14:4f:2a:9b:82

bge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2

inet 63.192.77.21 netmask ffffff00 broadcast 63.192.77.255

bge1: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 3

inet 63.192.77.12 netmask ffffff00 broadcast 63.192.77.255

groupname test

ether 0:14:4f:2a:9b:83

bge1:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3

inet 63.192.77.16 netmask ffffff00 broadcast 63.192.77.255

Nov 2 13:01:59 cstoc77022 bge: NOTICE: bge1: link up 100Mbps Full-Duplex

Nov 2 13:01:59 cstoc77022 in.mpathd[146]: The link has come up on bge1

Nov 2 13:02:14 cstoc77022 in.mpathd[146]: NIC repair detected on bge1 of group test

Nov 2 13:02:14 cstoc77022 in.mpathd[146]: Successfully failed back to NIC bge1

Nov 2 13:02:14 cstoc77022 in.mpathd[146]: At least 1 interface (bge1) of group test has repaired

Nov 2 13:02:14 cstoc77022 in.mpathd[146]: NIC repair detected on bge0 of group test

Nov 2 13:02:14 cstoc77022 in.mpathd[146]: Successfully failed back to NIC bge0

- REMOVING bge0

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1

inet 127.0.0.1 netmask ff000000

bge0: flags=1019000802<BROADCAST,MULTICAST,IPv4,NOFAILOVER,FAILED,FIXEDMTU> mtu 0 index 2

inet 0.0.0.0 netmask 0

groupname test

ether 0:14:4f:2a:9b:82

bge0:1: flags=19040803<UP,BROADCAST,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED> mtu 1500 index 2

inet 63.192.77.21 netmask ffffff00 broadcast 63.192.77.255

bge1: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 3

inet 63.192.77.12 netmask ffffff00 broadcast 63.192.77.255

groupname test

ether 0:14:4f:2a:9b:83

bge1:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3

inet 63.192.77.16 netmask ffffff00 broadcast 63.192.77.255

bge1:2: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 3

inet 63.192.77.22 netmask ffffff00 broadcast 63.192.77.255

Nov 2 13:03:20 cstoc77022 in.mpathd[146]: The link has gone down on bge0

Nov 2 13:03:20 cstoc77022 in.mpathd[146]: NIC failure detected on bge0 of group test

Nov 2 13:03:20 cstoc77022 in.mpathd[146]: Successfully failed over from NIC bge0 to NIC bge1

- INSERTING bge0

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1

inet 127.0.0.1 netmask ff000000

bge0: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 2

inet 63.192.77.22 netmask ffffff00 broadcast 63.192.77.255

groupname test

ether 0:14:4f:2a:9b:82

bge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2

inet 63.192.77.21 netmask ffffff00 broadcast 63.192.77.255

bge1: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 3

inet 63.192.77.12 netmask ffffff00 broadcast 63.192.77.255

groupname test

ether 0:14:4f:2a:9b:83

bge1:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3

inet 63.192.77.16 netmask ffffff00 broadcast 63.192.77.255

Nov 2 13:04:20 cstoc77022 bge: NOTICE: bge0: link up 100Mbps Full-Duplex

Nov 2 13:04:20 cstoc77022 in.mpathd[146]: The link has come up on bge0

Nov 2 13:04:34 cstoc77022 in.mpathd[146]: NIC repair detected on bge0 of group test

Nov 2 13:04:34 cstoc77022 ip: WARNING: IP: Proxy ARP problem? Hardware address '00:14:4f:2a:9b:82' thinks it is 063.192.077.022

Nov 2 13:04:34 cstoc77022 in.mpathd[146]: Successfully failed back to NIC bge0

/var/adm/messages

Nov 2 12:55:54 cstoc77022 nfs: [ID 664466 kern.notice] NFS getattr failed for server mls1: error 7 (RPC: Authentication error)

Nov 2 12:57:23 cstoc77022 last message repeated 5 times

Nov 2 13:00:22 cstoc77022 bge: [ID 801593 kern.notice] NOTICE: bge1: link down

Nov 2 13:00:22 cstoc77022 in.mpathd[146]: [ID 215189 daemon.error] The link has gone down on bge1

Nov 2 13:00:22 cstoc77022 in.mpathd[146]: [ID 594170 daemon.error] NIC failure detected on bge1 of group test

Nov 2 13:00:22 cstoc77022 in.mpathd[146]: [ID 832587 daemon.error] Successfully failed over from NIC bge1 to NIC bge0

Nov 2 13:00:30 cstoc77022 in.mpathd[146]: [ID 168056 daemon.error] All Interfaces in group test have failed

Nov 2 13:01:59 cstoc77022 bge: [ID 801593 kern.notice] NOTICE: bge1: link up 100Mbps Full-Duplex

Nov 2 13:01:59 cstoc77022 in.mpathd[146]: [ID 820239 daemon.error] The link has come up on bge1

Nov 2 13:02:14 cstoc77022 in.mpathd[146]: [ID 299542 daemon.error] NIC repair detected on bge1 of group test

Nov 2 13:02:14 cstoc77022 in.mpathd[146]: [ID 620804 daemon.error] Successfully failed back to NIC bge1

Nov 2 13:02:14 cstoc77022 in.mpathd[146]: [ID 237757 daemon.error] At least 1 interface (bge1) of group test has repaired

Nov 2 13:02:14 cstoc77022 in.mpathd[146]: [ID 299542 daemon.error] NIC repair detected on bge0 of group test

Nov 2 13:02:14 cstoc77022 in.mpathd[146]: [ID 620804 daemon.error] Successfully failed back to NIC bge0

Nov 2 13:02:55 cstoc77022 nfs: [ID 664466 kern.notice] NFS getattr failed for server mls1: error 7 (RPC: Authentication error)

Nov 2 13:02:55 cstoc77022 last message repeated 1 time

Nov 2 13:03:20 cstoc77022 bge: [ID 801593 kern.notice] NOTICE: bge0: link down

Nov 2 13:03:20 cstoc77022 in.mpathd[146]: [ID 215189 daemon.error] The link has gone down on bge0

Nov 2 13:03:20 cstoc77022 in.mpathd[146]: [ID 594170 daemon.error] NIC failure detected on bge0 of group test

Nov 2 13:03:20 cstoc77022 in.mpathd[146]: [ID 832587 daemon.error] Successfully failed over from NIC bge0 to NIC bge1

Nov 2 13:04:20 cstoc77022 bge: [ID 801593 kern.notice] NOTICE: bge0: link up 100Mbps Full-Duplex

Nov 2 13:04:20 cstoc77022 in.mpathd[146]: [ID 820239 daemon.error] The link has come up on bge0

Nov 2 13:04:34 cstoc77022 in.mpathd[146]: [ID 299542 daemon.error] NIC repair detected on bge0 of group test

Nov 2 13:04:34 cstoc77022 ip: [ID 388441 kern.warning] WARNING: IP: Proxy ARP problem? Hardware address '00:14:4f:2a:9b:82' thinks it is 063.192.077.022

Nov 2 13:04:34 cstoc77022 in.mpathd[146]: [ID 620804 daemon.error] Successfully failed back to NIC bge0

Nov 2 13:04:44 cstoc77022 in.routed[158]: [ID 559541 daemon.warning] 10.0.0.0 --> 63.192.77.9 disappeared from kernel

Nov 2 13:04:44 cstoc77022 in.routed[158]: [ID 559541 daemon.warning] 63.192.78.0/24 --> 63.192.77.9 disappeared from kernel

Nov 2 13:04:44 cstoc77022 in.routed[158]: [ID 559541 daemon.warning] 63.192.85.64/27 --> 63.192.77.9 disappeared from kernel

Nov 2 13:04:44 cstoc77022 in.routed[158]: [ID 559541 daemon.warning] 172.20.0.0 --> 63.192.77.4 disappeared from kernel

Nov 2 13:04:44 cstoc77022 in.routed[158]: [ID 559541 daemon.warning] 10.3.0.0/16 --> 63.192.77.92 disappeared from kernel

Nov 2 13:04:44 cstoc77022 in.routed[158]: [ID 559541 daemon.warning] 172.16.0.0 --> 63.192.77.9 disappeared from kernel

Nov 2 13:04:44 cstoc77022 in.routed[158]: [ID 559541 daemon.warning] 63.192.76.0/24 --> 63.192.77.9 disappeared from kernel

Nov 2 13:05:31 cstoc77022 nfs: [ID 664466 kern.notice] NFS getattr failed for server mls1: error 7 (RPC: Authentication error)
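The `flags=` values in the ifconfig output above are bit masks, and the interesting bit here is FAILED. The following is a minimal Python sketch of a decoder. The bit values are inferred from the hex/name pairs printed in this thread, not copied from the Solaris headers, so treat them as assumptions; `decode_flags` and `is_failed` are hypothetical helper names.

```python
# Bit values inferred from the flags=<hex><NAMES> pairs in the ifconfig
# output in this thread. They are assumptions, not taken from Solaris headers.
FLAG_BITS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x8: "LOOPBACK",
    0x40: "RUNNING",
    0x800: "MULTICAST",
    0x40000: "DEPRECATED",
    0x1000000: "IPv4",
    0x8000000: "NOFAILOVER",
    0x10000000: "FAILED",
    0x1000000000: "FIXEDMTU",
    0x2000000000: "VIRTUAL",
}


def decode_flags(hex_value):
    """Turn the hex string after 'flags=' into a list of flag names."""
    value = int(hex_value, 16)
    # Walk the known bits from lowest to highest and keep the ones that are set.
    return [name for bit, name in sorted(FLAG_BITS.items()) if value & bit]


def is_failed(hex_value):
    """True if the FAILED bit is set in the given flags value."""
    return "FAILED" in decode_flags(hex_value)


if __name__ == "__main__":
    # bge0 with both cables connected, then after bge1 was unplugged.
    for flags in ("1001000843", "1011000843"):
        print(flags, decode_flags(flags), "FAILED" if is_failed(flags) else "ok")
```

Comparing the two bge0 values this way shows that the only difference after unplugging bge1 is the FAILED bit on bge0, even though its link was still up.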

CS@apani at 2007-7-7 3:27:58
# 3

Could you post:

+ routeadm

+ arp -an

+ ps -aef

+ /etc/defaultrouter file

Check that your box is not acting as a router.

Your box does not have a default router. Is that configuration right?

The documentation states:

<< Routers that are connected to the IP link are automatically selected as targets for probing. If no routers exist on the link, in.mpathd sends probes to neighbor hosts on the link. A multicast packet that is sent to the all host multicast address.

...snip...

If in.mpathd cannot find routers or hosts that responded to the ICMP echo packets, in.mpathd cannot detect probe-based failures.>>

Is the host allowed to send ICMP echo packets to the routers on the other networks?

id8102257 at 2007-7-7 3:27:58
# 4

Hello again,

While gathering data for the previous reply, I also noticed that the default route had not been set. We usually do specify one, so I added it to the configuration. However, the host had already found the correct router, 63.192.77.9, and specifying it did not change the symptoms. Here's the other requested info:

-> netstat -rn

Routing Table: IPv4
  Destination           Gateway           Flags  Ref   Use   Interface
-------------------- -------------------- ----- ----- ------ ---------
63.192.77.0          63.192.77.12         U         1      5 bge1
63.192.77.0          63.192.77.22         U         1      1 bge0
63.192.77.0          63.192.77.22         U         1      0 bge0:1
63.192.77.0          63.192.77.12         U         1      0 bge1:1
224.0.0.0            63.192.77.22         U         1      0 bge0
default              63.192.77.9          UG        1      0
127.0.0.1            127.0.0.1            UH        7     93 lo0

-> routeadm

              Configuration   Current              Current
                     Option   Configuration        System State
---------------------------------------------------------------
            IPv4 forwarding   disabled             disabled
               IPv4 routing   default (disabled)   disabled
            IPv6 forwarding   disabled             disabled
               IPv6 routing   disabled             disabled

        IPv4 routing daemon   "/usr/sbin/in.routed"
   IPv4 routing daemon args   ""
   IPv4 routing daemon stop   "kill -TERM `cat /var/tmp/in.routed.pid`"
        IPv6 routing daemon   "/usr/lib/inet/in.ripngd"
   IPv6 routing daemon args   "-s"
   IPv6 routing daemon stop   "kill -TERM `cat /var/tmp/in.ripngd.pid`"

-> arp -an

Net to Media Table: IPv4
Device   IP Address       Mask             Flags  Phys Addr
------ ---------------- ---------------- ------ -----------------
bge1   63.192.77.1      255.255.255.255         00:03:ba:c0:77:75
bge0   63.192.77.9      255.255.255.255         00:16:46:f1:b5:c2
bge1   63.192.77.9      255.255.255.255         00:16:46:f1:b5:c2
bge1   63.192.77.186    255.255.255.255         00:c0:4f:60:6a:ab
bge0   63.192.77.186    255.255.255.255         00:c0:4f:60:6a:ab
bge1   63.192.77.191    255.255.255.255         00:0c:f1:bf:1d:01
bge0   63.192.77.191    255.255.255.255         00:0c:f1:bf:1d:01
bge1   63.192.77.169    255.255.255.255         00:0c:f1:bf:1c:92
bge0   63.192.77.169    255.255.255.255         00:0c:f1:bf:1c:92
bge1   63.192.77.175    255.255.255.255         00:c0:4f:60:68:64
bge0   63.192.77.175    255.255.255.255         00:c0:4f:60:68:64
bge1   63.192.77.144    255.255.255.255         00:c0:4f:60:68:94
bge0   63.192.77.144    255.255.255.255         00:c0:4f:60:68:94
bge1   63.192.77.150    255.255.255.255         00:c0:4f:60:6a:70
bge0   63.192.77.150    255.255.255.255         00:c0:4f:60:6a:70
bge0   63.192.77.130    255.255.255.255         00:0c:f1:bf:1d:1f
bge1   63.192.77.130    255.255.255.255         00:0c:f1:bf:1d:1f
bge1   63.192.77.128    255.255.255.255         00:0c:f1:bf:1c:65
bge0   63.192.77.128    255.255.255.255         00:0c:f1:bf:1c:65
bge1   63.192.77.242    255.255.255.255         00:0d:56:0b:eb:2a
bge0   63.192.77.242    255.255.255.255         00:0d:56:0b:eb:2a
bge1   63.192.77.243    255.255.255.255         00:0f:1f:91:c1:9b
bge0   63.192.77.243    255.255.255.255         00:0f:1f:91:c1:9b
bge1   63.192.77.240    255.255.255.255         00:13:72:17:cb:13
bge0   63.192.77.240    255.255.255.255         00:13:72:17:cb:13
bge1   63.192.77.247    255.255.255.255         00:c0:4f:60:6a:e6
bge0   63.192.77.247    255.255.255.255         00:c0:4f:60:6a:e6
bge1   63.192.77.224    255.255.255.255         00:09:6b:2e:61:dd
bge0   63.192.77.224    255.255.255.255         00:09:6b:2e:61:dd
bge1   63.192.77.225    255.255.255.255         00:11:11:c4:9c:eb
bge0   63.192.77.225    255.255.255.255         00:11:11:c4:9c:eb
bge1   63.192.77.236    255.255.255.255         00:03:ba:eb:17:6d
bge0   63.192.77.236    255.255.255.255         00:03:ba:eb:17:6d
bge1   63.192.77.210    255.255.255.255         00:11:11:b1:2b:6e
bge0   63.192.77.210    255.255.255.255         00:11:11:b1:2b:6e
bge1   63.192.77.222    255.255.255.255         00:30:6e:08:ed:3a
bge0   63.192.77.222    255.255.255.255         00:30:6e:08:ed:3a
bge1   63.192.77.193    255.255.255.255         00:13:72:23:32:aa
bge0   63.192.77.193    255.255.255.255         00:13:72:23:32:aa
bge1   63.192.77.207    255.255.255.255         00:0c:f1:b6:26:aa
bge0   63.192.77.207    255.255.255.255         00:0c:f1:b6:26:aa
bge1   63.192.77.204    255.255.255.255         00:c0:4f:60:68:5b
bge0   63.192.77.204    255.255.255.255         00:c0:4f:60:68:5b
bge1   63.192.77.48     255.255.255.255         00:0a:95:99:e4:40
bge0   63.192.77.48     255.255.255.255         00:0a:95:99:e4:40
bge0   63.192.77.49     255.255.255.255         00:03:93:90:52:f6
bge1   63.192.77.61     255.255.255.255         00:c0:4f:60:6a:75
bge0   63.192.77.61     255.255.255.255         00:c0:4f:60:6a:75
bge1   63.192.77.35     255.255.255.255         00:30:6e:49:41:50
bge0   63.192.77.35     255.255.255.255         00:30:6e:49:41:50
bge1   63.192.77.36     255.255.255.255         00:16:35:3e:7d:0a
bge0   63.192.77.36     255.255.255.255         00:16:35:3e:7d:0a
bge0   63.192.77.42     255.255.255.255         00:11:11:c4:9d:05
bge1   63.192.77.42     255.255.255.255         00:11:11:c4:9d:05
bge1   63.192.77.40     255.255.255.255         00:0c:f1:bf:1f:8d
bge0   63.192.77.40     255.255.255.255         00:0c:f1:bf:1f:8d
bge1   63.192.77.41     255.255.255.255         00:0c:f1:bf:1d:10
bge0   63.192.77.41     255.255.255.255         00:0c:f1:bf:1d:10
bge0   63.192.77.19     255.255.255.255         08:00:20:f0:ea:e4
bge1   63.192.77.19     255.255.255.255         08:00:20:f0:ea:e4
bge1   63.192.77.16     255.255.255.255  SP     00:14:4f:2a:9b:83
bge0   63.192.77.22     255.255.255.255  SP     00:14:4f:2a:9b:82
bge0   63.192.77.23     255.255.255.255         00:09:6b:3e:2b:82
bge1   63.192.77.23     255.255.255.255         00:09:6b:3e:2b:82
bge0   63.192.77.21     255.255.255.255  SP     00:14:4f:2a:9b:82
bge1   63.192.77.29     255.255.255.255         00:09:6b:2e:46:51
bge0   63.192.77.29     255.255.255.255         00:09:6b:2e:46:51
bge0   63.192.77.1      255.255.255.255         00:03:ba:c0:77:75
bge1   63.192.77.12     255.255.255.255  SP     00:14:4f:2a:9b:83
bge0   63.192.77.115    255.255.255.255         00:0c:f1:bf:1c:e6
bge1   63.192.77.115    255.255.255.255         00:0c:f1:bf:1c:e6
bge1   63.192.77.122    255.255.255.255         00:10:83:f9:34:d4
bge0   63.192.77.122    255.255.255.255         00:10:83:f9:34:d4
bge1   63.192.77.125    255.255.255.255         00:0f:1f:91:bf:7d
bge0   63.192.77.125    255.255.255.255         00:0f:1f:91:bf:7d
bge1   63.192.77.99     255.255.255.255         00:0c:f1:bf:1a:52
bge0   63.192.77.99     255.255.255.255         00:0c:f1:bf:1a:52
bge1   63.192.77.100    255.255.255.255         00:0c:f1:b6:26:b4
bge0   63.192.77.100    255.255.255.255         00:0c:f1:b6:26:b4
bge1   63.192.77.101    255.255.255.255         00:0c:f1:bf:1c:fe
bge0   63.192.77.101    255.255.255.255         00:0c:f1:bf:1c:fe
bge1   63.192.77.107    255.255.255.255         00:0d:56:14:48:4d
bge0   63.192.77.107    255.255.255.255         00:0d:56:14:48:4d
bge1   63.192.77.110    255.255.255.255         00:c0:4f:60:6a:44
bge0   63.192.77.110    255.255.255.255         00:c0:4f:60:6a:44
bge1   63.192.77.108    255.255.255.255         00:14:bf:31:ec:e2
bge0   63.192.77.108    255.255.255.255         00:14:bf:31:ec:e2
bge0   63.192.77.80     255.255.255.255         00:16:cb:a6:5e:3d
bge1   63.192.77.80     255.255.255.255         00:16:cb:a6:5e:3d
bge1   63.192.77.92     255.255.255.255         00:40:63:d3:8c:46
bge0   63.192.77.92     255.255.255.255         00:40:63:d3:8c:46
bge1   63.192.77.68     255.255.255.255         00:0c:f1:b6:27:10
bge0   63.192.77.68     255.255.255.255         00:0c:f1:b6:27:10
bge1   63.192.77.69     255.255.255.255         00:13:72:17:ca:4a
bge0   63.192.77.69     255.255.255.255         00:13:72:17:ca:4a
bge1   63.192.77.73     255.255.255.255         00:03:93:d1:db:cc
bge0   63.192.77.73     255.255.255.255         00:03:93:d1:db:cc
bge1   63.192.77.77     255.255.255.255         00:30:65:a8:22:bc
bge0   63.192.77.77     255.255.255.255         00:30:65:a8:22:bc
bge1   224.0.0.0        240.0.0.0        SM     01:00:5e:00:00:00
bge0   224.0.0.0        240.0.0.0        SM     01:00:5e:00:00:00

-> ps -aef

     UID   PID  PPID  C    STIME TTY      TIME CMD
    root     0     0  0 15:11:12 ?        0:11 sched
    root     1     0  0 15:11:13 ?        0:00 /sbin/init
    root     2     0  0 15:11:13 ?        0:00 pageout
    root     3     0  0 15:11:13 ?        0:00 fsflush
  daemon   196     1  0 15:11:37 ?        0:00 /usr/sbin/rpcbind
    root     7     1  0 15:11:15 ?        0:10 /lib/svc/bin/svc.startd
    root     9     1  0 15:11:16 ?        0:16 /lib/svc/bin/svc.configd
    root   256     1  0 15:11:40 ?        0:00 /usr/sbin/cron
    root   335     1  0 15:11:49 ?        0:00 /usr/sbin/syslogd
    root   113     1  0 15:11:33 ?        0:00 /usr/sbin/nscd -S passwd,yes
    root   726   691  0 15:16:16 pts/1    0:00 ps -aef
  daemon   201     1  0 15:11:37 ?        0:00 /usr/lib/nfs/statd
    root   200     1  0 15:11:37 ?        0:00 /usr/sbin/keyserv
    root   192     1  0 15:11:36 ?        0:01 /opt/apani/uagent/nlagent
  daemon    86     1  0 15:11:26 ?        0:00 /usr/lib/crypto/kcfd
    root   152     1  0 15:11:35 ?        0:00 /usr/lib/inet/in.mpathd -a
    root   212     7  0 15:11:38 ?        0:00 /usr/lib/saf/sac -t 300
    root    89     1  0 15:11:26 ?        0:00 /usr/lib/picl/picld
  daemon   247     1  0 15:11:40 ?        0:00 /usr/lib/nfs/nfs4cbd
    root   102     1  0 15:11:28 ?        0:00 /usr/lib/power/powerd
    root    98     1  0 15:11:27 ?        0:00 /usr/lib/sysevent/syseventd
    root   215     1  0 15:11:38 ?        0:00 /usr/sbin/nis_cachemgr
  daemon   214     1  0 15:11:38 ?        0:00 /usr/lib/nfs/lockd
    root   213     1  0 15:11:38 ?        0:00 /usr/lib/utmpd
    root   217     7  0 15:11:38 console  0:00 -sh
    root   223   192  0 15:11:39 ?        0:00 inm -p9165
    root   222   212  0 15:11:39 ?        0:00 /usr/lib/saf/ttymon
  daemon   255     1  0 15:11:40 ?        0:00 /usr/lib/nfs/nfsmapid
    root   399   397  0 15:11:52 ?        0:00 /usr/sadm/lib/smc/bin/smcboot
    root   252     1  0 15:11:40 ?        0:04 /usr/lib/inet/inetd start
    root   398   397  0 15:11:52 ?        0:00 /usr/sadm/lib/smc/bin/smcboot
    root   317     1  0 15:11:48 ?        0:00 /usr/lib/autofs/automountd
    root   359     1  0 15:11:50 ?        0:00 /usr/lib/sendmail -bd -q15m
    root   448   447  0 15:11:53 ?        0:00 /usr/lib/locale/ja/wnn/jserver_m
    root   351     1  0 15:11:50 ?        0:02 /usr/lib/fm/fmd/fmd
    root   674   252  0 15:12:14 ?        0:00 /usr/sbin/in.telnetd
    root   347     1  0 15:11:50 ?        0:00 /usr/lib/ssh/sshd
   smmsp   360     1  0 15:11:50 ?        0:00 /usr/lib/sendmail -Ac -q15m
    root   461     1  0 15:11:53 ?        0:00 /usr/lib/locale/ja/atokserver/atokmngdaemon
    root   397     1  0 15:11:52 ?        0:00 /usr/sadm/lib/smc/bin/smcboot
    root   468   459  0 15:11:53 ?        0:00 htt_server -port 9010 -syslog -message_locale C
    root   441     1  0 15:11:53 ?        0:00 /usr/lib/locale/ja/wnn/dpkeyserv
    root   447     1  0 15:11:53 ?        0:00 /usr/lib/locale/ja/wnn/jserver
    root   459     1  0 15:11:53 ?        0:00 /usr/lib/im/htt -port 9010 -syslog -message_locale C
    root   512     1  0 15:11:55 ?        0:00 /usr/lib/snmp/snmpdx -y -c /etc/snmp/conf
    root   520     1  0 15:11:56 ?        0:00 /usr/lib/dmi/dmispd
    root   528     1  0 15:11:56 ?        0:00 /usr/sbin/vold
    root   521     1  0 15:11:56 ?        0:00 /usr/lib/dmi/snmpXdmid -s cstoc77022
    root   511     1  0 15:11:55 ?        0:00 /usr/dt/bin/dtlogin -daemon
    root   691   677  0 15:12:18 pts/1    0:00 bash
    root   677   674  0 15:12:14 pts/1    0:00 -sh
    root   585     1  0 15:11:57 ?        0:00 /usr/sfw/sbin/snmpd

CS@apani at 2007-7-7 3:27:58
# 5
Be sure your changes are permanent, reboot the box and try your tests again. IPMP finds targets at boot time. If it does not work, post all the information requested in replies 1 and 3.
id8102257 at 2007-7-7 3:27:58
# 6

OK. Actually, I had made the changes permanent and rebooted before the previous reply, but I had not rechecked all the ifconfig settings. Here they are again, this time with a configured default router:

-> netstat -rn

Routing Table: IPv4
  Destination           Gateway           Flags  Ref   Use   Interface
-------------------- -------------------- ----- ----- ------ ---------
63.192.77.0          63.192.77.22         U         1     21 bge0
63.192.77.0          63.192.77.12         U         1      1 bge1
63.192.77.0          63.192.77.22         U         1      0 bge0:1
63.192.77.0          63.192.77.22         U         1      0 bge1:1
224.0.0.0            63.192.77.22         U         1      0 bge0
default              63.192.77.9          UG        1      1
127.0.0.1            127.0.0.1            UH        7     99 lo0

- BOTH CONNECTED bge0, bge1

-> ifconfig -a

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1

inet 127.0.0.1 netmask ff000000

bge0: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 2

inet 63.192.77.22 netmask ffffff00 broadcast 63.192.77.255

groupname test

ether 0:14:4f:2a:9b:82

bge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2

inet 63.192.77.21 netmask ffffff00 broadcast 63.192.77.255

bge1: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 3

inet 63.192.77.12 netmask ffffff00 broadcast 63.192.77.255

groupname test

ether 0:14:4f:2a:9b:83

bge1:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3

inet 63.192.77.16 netmask ffffff00 broadcast 63.192.77.255

- REMOVED bge1

-> ifconfig -a

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1

inet 127.0.0.1 netmask ff000000

bge0: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 2

inet 63.192.77.22 netmask ffffff00 broadcast 63.192.77.255

groupname test

ether 0:14:4f:2a:9b:82

bge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2

inet 63.192.77.21 netmask ffffff00 broadcast 63.192.77.255

bge0:2: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 2

inet 63.192.77.12 netmask ffffff00 broadcast 63.192.77.255

bge1: flags=1019000802<BROADCAST,MULTICAST,IPv4,NOFAILOVER,FAILED,FIXEDMTU> mtu 0 index 3

inet 0.0.0.0 netmask 0

groupname test

ether 0:14:4f:2a:9b:83

bge1:1: flags=19040803<UP,BROADCAST,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED> mtu 1500 index 3

inet 63.192.77.16 netmask ffffff00 broadcast 63.192.77.255

Nov 2 16:47:59 cstoc77022 bge: NOTICE: bge1: link down

Nov 2 16:47:59 cstoc77022 in.mpathd[153]: The link has gone down on bge1

Nov 2 16:47:59 cstoc77022 in.mpathd[153]: NIC failure detected on bge1 of group test

Nov 2 16:47:59 cstoc77022 in.mpathd[153]: Successfully failed over from NIC bge1 to NIC bge0

Nov 2 16:48:07 cstoc77022 in.mpathd[153]: All Interfaces in group test have failed

- RESTORED bge1

-> ifconfig -a

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1

inet 127.0.0.1 netmask ff000000

bge0: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 2

inet 63.192.77.22 netmask ffffff00 broadcast 63.192.77.255

groupname test

ether 0:14:4f:2a:9b:82

bge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2

inet 63.192.77.21 netmask ffffff00 broadcast 63.192.77.255

bge1: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 3

inet 63.192.77.12 netmask ffffff00 broadcast 63.192.77.255

groupname test

ether 0:14:4f:2a:9b:83

bge1:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3

inet 63.192.77.16 netmask ffffff00 broadcast 63.192.77.255

Nov 2 16:48:51 cstoc77022 bge: NOTICE: bge1: link up 100Mbps Full-Duplex

Nov 2 16:48:51 cstoc77022 in.mpathd[153]: The link has come up on bge1

Nov 2 16:49:06 cstoc77022 in.mpathd[153]: NIC repair detected on bge0 of group test

Nov 2 16:49:06 cstoc77022 in.mpathd[153]: Successfully failed back to NIC bge0

Nov 2 16:49:06 cstoc77022 in.mpathd[153]: At least 1 interface (bge0) of group test has repaired

Nov 2 16:49:06 cstoc77022 in.mpathd[153]: NIC repair detected on bge1 of group test

Nov 2 16:49:06 cstoc77022 in.mpathd[153]: Successfully failed back to NIC bge1

- REMOVED bge0

-> ifconfig -a

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1

inet 127.0.0.1 netmask ff000000

bge0: flags=1019000802<BROADCAST,MULTICAST,IPv4,NOFAILOVER,FAILED,FIXEDMTU> mtu 0 index 2

inet 0.0.0.0 netmask 0

groupname test

ether 0:14:4f:2a:9b:82

bge0:1: flags=19040803<UP,BROADCAST,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED> mtu 1500 index 2

inet 63.192.77.21 netmask ffffff00 broadcast 63.192.77.255

bge1: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 3

inet 63.192.77.12 netmask ffffff00 broadcast 63.192.77.255

groupname test

ether 0:14:4f:2a:9b:83

bge1:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3

inet 63.192.77.16 netmask ffffff00 broadcast 63.192.77.255

bge1:2: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 3

inet 63.192.77.22 netmask ffffff00 broadcast 63.192.77.255

Nov 2 16:50:02 cstoc77022 bge: NOTICE: bge0: link down

Nov 2 16:50:02 cstoc77022 in.mpathd[153]: The link has gone down on bge0

Nov 2 16:50:02 cstoc77022 in.mpathd[153]: NIC failure detected on bge0 of group test

Nov 2 16:50:02 cstoc77022 in.mpathd[153]: Successfully failed over from NIC bge0 to NIC bge1

- RESTORED bge0

-> ifconfig -a

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1

inet 127.0.0.1 netmask ff000000

bge0: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 2

inet 63.192.77.22 netmask ffffff00 broadcast 63.192.77.255

groupname test

ether 0:14:4f:2a:9b:82

bge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2

inet 63.192.77.21 netmask ffffff00 broadcast 63.192.77.255

bge1: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1442 index 3

inet 63.192.77.12 netmask ffffff00 broadcast 63.192.77.255

groupname test

ether 0:14:4f:2a:9b:83

bge1:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3

inet 63.192.77.16 netmask ffffff00 broadcast 63.192.77.255

Nov 2 16:51:12 cstoc77022 bge: NOTICE: bge0: link up 100Mbps Full-Duplex

Nov 2 16:51:12 cstoc77022 in.mpathd[153]: The link has come up on bge0

Nov 2 16:51:12 cstoc77022 ip: WARNING: IP: Hardware address '00:14:4f:2a:9b:82' trying to be our address 063.192.077.021!

Nov 2 16:51:26 cstoc77022 in.mpathd[153]: NIC repair detected on bge0 of group test

Nov 2 16:51:26 cstoc77022 in.mpathd[153]: Successfully failed back to NIC bge0

Nov 2 16:51:34 cstoc77022 ip: WARNING: IP: Hardware address '00:14:4f:2a:9b:82' trying to be our address 063.192.077.022!
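In both test runs the group-wide failure is logged 8 seconds after the bge1 failover (13:00:22 to 13:00:30, and 16:47:59 to 16:48:07). The following Python sketch pulls that interval out of the in.mpathd messages. It is illustrative only: the helper names are hypothetical, the year is an assumption (syslog lines do not carry one), and the regular expression is written only for the message formats pasted in this thread.

```python
import re
from datetime import datetime

# Matches the in.mpathd lines as pasted here, with or without the
# "[ID nnnnnn daemon.error]" tag that /var/adm/messages adds.
MPATHD_LINE = re.compile(
    r"^(\w{3})\s+(\d+)\s+(\d\d:\d\d:\d\d)\s+\S+\s+in\.mpathd\[\d+\]:\s+"
    r"(?:\[ID [^\]]*\]\s+)?(.*)$"
)

# Sample: the messages logged when bge1 was unplugged in the second run.
SAMPLE = [
    "Nov 2 16:47:59 cstoc77022 bge: NOTICE: bge1: link down",
    "Nov 2 16:47:59 cstoc77022 in.mpathd[153]: The link has gone down on bge1",
    "Nov 2 16:47:59 cstoc77022 in.mpathd[153]: NIC failure detected on bge1 of group test",
    "Nov 2 16:47:59 cstoc77022 in.mpathd[153]: Successfully failed over from NIC bge1 to NIC bge0",
    "Nov 2 16:48:07 cstoc77022 in.mpathd[153]: All Interfaces in group test have failed",
]


def mpathd_events(lines, year=2007):
    """Return (timestamp, message) pairs for in.mpathd lines only.

    The year is an assumption, since syslog timestamps omit it.
    """
    events = []
    for line in lines:
        match = MPATHD_LINE.match(line.strip())
        if match:
            month, day, clock, message = match.groups()
            stamp = datetime.strptime(
                f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
            )
            events.append((stamp, message))
    return events


def seconds_until_group_failure(lines):
    """Seconds from the last 'NIC failure detected' to 'All Interfaces ... failed'."""
    start = None
    for stamp, message in mpathd_events(lines):
        if message.startswith("NIC failure detected"):
            start = stamp
        elif message.startswith("All Interfaces") and start is not None:
            return (stamp - start).total_seconds()
    return None  # no group-wide failure was logged


if __name__ == "__main__":
    print(seconds_until_group_failure(SAMPLE))
```

Run against the bge0 removal messages, the same function returns `None`, because no group-wide failure is logged in that direction.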

CS@apani at 2007-7-7 3:27:58
# 7

1. Test your default router:

ping 63.192.77.9

2. Test other Sun boxes (63.192.77.1, 63.192.77.236 and 63.192.77.19):

ping 63.192.77.1 ; ping 63.192.77.236 ; ping 63.192.77.19

3. If it works, add static routes, and also put them in a boot script:

route add -host 63.192.77.1 63.192.77.1 -static

route add -host 63.192.77.236 63.192.77.236 -static

route add -host 63.192.77.19 63.192.77.19 -static

4. Try your tests again.

5. If it does not work, install Recommended patches and bge patch (122027-08).

By the way, does your software use ARP to publish MAC-to-IP address mappings?
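If the list of probe targets changes, the host routes from step 3 can be generated rather than typed by hand. This is only a sketch: `static_host_route_commands` is a hypothetical helper, and it does nothing more than reproduce the `route add -host ... -static` form shown in step 3 after checking that each target parses as an IP address.

```python
import ipaddress


def static_host_route_commands(targets):
    """Build one 'route add -host' line per target, in the form used in step 3.

    Raises ValueError if a target is not a valid IP address.
    """
    commands = []
    for target in targets:
        ipaddress.ip_address(target)  # validation only; raises on bad input
        commands.append("route add -host {0} {0} -static".format(target))
    return commands


if __name__ == "__main__":
    # The three neighbor hosts suggested in step 2.
    for line in static_host_route_commands(
        ["63.192.77.1", "63.192.77.236", "63.192.77.19"]
    ):
        print(line)
```

The printed lines can be pasted into whatever boot script is used on the host; the sketch itself does not run `route`.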

id8102257 at 2007-7-7 3:27:58
# 8

I don't see how the explicit routes will change the results, but it's worth a try. Our software doesn't do anything with ARPs. The only thing we do is reduce the MTU size to make room for all the ESP headers. We've only had problems with the 'bge' interface, which is the mystery to us. Thanks for your help so far!

CS@apani at 2007-7-7 3:27:58
# 9

The static routes didn't help, so I installed the bge patch. It made my host unbootable, and since I'm using a Try&Buy T1000 there's no optical drive or external SCSI port, leaving a net install as my only option. I'm currently creating a JumpStart server, so hopefully my host will be back up for more testing later today.

CS@apani at 2007-7-7 3:27:58
# 10
My host is back online, with a newer version of Solaris 10. It already includes a version of the bge patch, so I reran the IPMP tests. It now works normally for me. We'll also test the corresponding patch for Solaris 8. Thanks!
CS@apani at 2007-7-7 3:27:58