Resource tunables
An IPMP failure triggers the SharedAddress resource to fail over, but the failover takes about 4 minutes. Which tunable can be adjusted to fix this?
Probably something along these lines:
scrgadm -c -j cluster -y tunable="something"
or via the scsetup menu.
I'm trawling through the r_properties man page, so if I find something I'll post the solution back here. Otherwise, if someone knows, please be my guest.
The defaults are as follows:
Resource_dependencies <NULL>
Resource_dependencies_weak <NULL>
Resource_dependencies_restart <NULL>
PRENET_START_TIMEOUT 300
MONITOR_CHECK_TIMEOUT 300
MONITOR_STOP_TIMEOUT 300
MONITOR_START_TIMEOUT 300
BOOT_TIMEOUT 300
FINI_TIMEOUT 300
INIT_TIMEOUT 300
UPDATE_TIMEOUT 300
VALIDATE_TIMEOUT 300
STOP_TIMEOUT 300
START_TIMEOUT 500
Failover_mode SOFT
[885 byte] By [diggles] at [2007-11-26 10:48:43]

# 6
1. STATUS PRIOR TO FAULT
###################################################
Resource: clusternet nodetwo Online Online - SharedAddress online.
Resource: httpd nodetwo Online Online
Resource: squid nodetwo Online Online
Resource: named nodetwo Online Online
2. IMMEDIATELY AFTER UNPLUGGING PUBLIC ETHERNETS
###################################################
Saturday October 21 16:01:48 EST 2006
Resource: clusternet nodetwo Online but not monitored Degraded - IPMP Failure.
3. AFTER TIMEOUT PERIOD
###################################################
Saturday October 21 16:05:18 EST 2006
Resource: clusternet nodeone Online Online - SharedAddress online.
Resource: httpd nodeone Online Online
Resource: squid nodeone Online Online
Resource: named nodeone Online Online
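For reference, the gap between the two timestamps above can be worked out with a few lines of Python. This is only a sketch: the helper name elapsed_seconds is my own, and "EST" is matched as literal text rather than parsed as a time zone.

```python
from datetime import datetime

# Format of the timestamps in the status output; "EST" is treated as literal text.
FMT = "%A %B %d %H:%M:%S EST %Y"


def elapsed_seconds(start, end, fmt=FMT):
    """Return the number of seconds between two timestamp strings."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Timestamps copied from steps 2 and 3 above.
fault = "Saturday October 21 16:01:48 EST 2006"
recovered = "Saturday October 21 16:05:18 EST 2006"

print(elapsed_seconds(fault, recovered))  # 210.0, i.e. 3 min 30 s
```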
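Looks like httpd is the culprit, but I don't yet know why. Its gds_svc_stop method ran from 16:01:47 to 16:05:07, about 200 seconds (66% of the 300-second timeout):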
nodetwo /var/adm# fgrep "Oct 21 16" messages|fgrep httpd
Oct 21 16:01:46 nodetwo Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <gds_monitor_stop> for resource <httpd>, resource group <cluster-services>, timeout <300> seconds
Oct 21 16:01:46 nodetwo Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <gds_monitor_stop> for resource <httpd>, resource group <cluster-services>, timeout <300> seconds
Oct 21 16:01:47 nodetwo SC[SUNW.gds:5,cluster-services,httpd,gds_monitor_stop]: [ID 227820 daemon.info] Attempting to stop the data service running under process monitor facility.
Oct 21 16:01:47 nodetwo SC[SUNW.gds:5,cluster-services,httpd,gds_monitor_stop]: [ID 675776 daemon.info] Stopped the fault monitor.
Oct 21 16:01:47 nodetwo Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <gds_monitor_stop> completed successfully for resource <httpd>, resource group <cluster-services>, time used: 0% of timeout <300 seconds>
Oct 21 16:01:47 nodetwo Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <gds_monitor_stop> completed successfully for resource <httpd>, resource group <cluster-services>, time used: 0% of timeout <300 seconds>
Oct 21 16:01:47 nodetwo Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <gds_svc_stop> for resource <httpd>, resource group <cluster-services>, timeout <300> seconds
Oct 21 16:01:47 nodetwo Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <gds_svc_stop> for resource <httpd>, resource group <cluster-services>, timeout <300> seconds
Oct 21 16:01:47 nodetwo SC[SUNW.gds:5,cluster-services,httpd,gds_svc_stop]: [ID 721263 daemon.info] Extension property <stop_signal> has a value of <15>
Oct 21 16:05:07 nodetwo SC[SUNW.gds:5,cluster-services,httpd,gds_svc_stop]: [ID 227820 daemon.info] Attempting to stop the data service running under process monitor facility.
Oct 21 16:05:07 nodetwo SC[SUNW.gds:5,cluster-services,httpd,gds_svc_stop]: [ID 401400 daemon.info] Successfully stopped the application
Oct 21 16:05:07 nodetwo Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <gds_svc_stop> completed successfully for resource <httpd>, resource group <cluster-services>, time used: 66% of timeout <300 seconds>
Oct 21 16:05:07 nodetwo Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <gds_svc_stop> completed successfully for resource <httpd>, resource group <cluster-services>, time used: 66% of timeout <300 seconds>
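To see which method eats the time without reading the log by eye, the "launching method" and "completed successfully" messages from rgmd can be paired up. A rough sketch, assuming the message wording shown in the log above; the function name method_durations and the year argument are my own (syslog lines carry no year, so 2006 is taken from the status output).

```python
import re
from datetime import datetime

# Patterns for the two rgmd notices seen in /var/adm/messages above.
LAUNCH_RE = re.compile(r"launching method <([^>]+)> for resource <([^>]+)>")
DONE_RE = re.compile(r"method <([^>]+)> completed successfully for resource <([^>]+)>")


def method_durations(lines, year=2006):
    """Return {(resource, method): seconds} for each completed RGM method."""
    started, durations = {}, {}
    for line in lines:
        launch = LAUNCH_RE.search(line)
        done = DONE_RE.search(line)
        if not (launch or done):
            continue
        # The first 15 characters of a syslog line are the timestamp, e.g. "Oct 21 16:01:47".
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if launch:
            key = (launch.group(2), launch.group(1))
            started.setdefault(key, stamp)  # keep the first of duplicated lines
        elif (key := (done.group(2), done.group(1))) in started:
            durations[key] = (stamp - started[key]).total_seconds()
    return durations


# Four lines copied from the log above.
SAMPLE = """\
Oct 21 16:01:46 nodetwo Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <gds_monitor_stop> for resource <httpd>, resource group <cluster-services>, timeout <300> seconds
Oct 21 16:01:47 nodetwo Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <gds_monitor_stop> completed successfully for resource <httpd>, resource group <cluster-services>, time used: 0% of timeout <300 seconds>
Oct 21 16:01:47 nodetwo Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <gds_svc_stop> for resource <httpd>, resource group <cluster-services>, timeout <300> seconds
Oct 21 16:05:07 nodetwo Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <gds_svc_stop> completed successfully for resource <httpd>, resource group <cluster-services>, time used: 66% of timeout <300 seconds>
""".splitlines()

# Print the slowest methods first.
for (resource, method), secs in sorted(method_durations(SAMPLE).items(), key=lambda kv: -kv[1]):
    print(f"{resource:8s} {method:18s} {secs:6.0f} s")
```

On the quoted lines this reports 200 seconds for gds_svc_stop and 1 second for gds_monitor_stop. Running it over the whole messages file, without the httpd filter, would show whether any other resource's stop method is also slow.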