close() over NFS hangs for 15 seconds on NAS 5320
Summary: We are experiencing long pauses waiting for files to close on NAS 5320.
Hardware:
NFS Server: 5320 NAS S/N ST5320200609190020 Version 4.20 M0 (Build 78)
NFS Clients: SunOS 5.10 Generic_118833-17 sun4u sparc SUNW,Sun-Fire-V240
SunOS mnemo 5.6 Generic_105181-20 sun4u sparc SUNW,Ultra-4
(Various other Solaris & Linux clients)
Symptom:
User copies a tiny file using "cp". The "cp" program hangs for 15 seconds before
normal exit and successful completion. Running "truss" on the "cp" command
reveals that the copy happens nearly instantaneously, and the program goes into
a sleep state waiting for the close() of the output file.
The problem is reproducible regardless of whether the filesystem is NFS hard
mounted or automounted. The problem seems to occur most often if the output
file being written previously existed under the same name, was recently deleted,
and is being recreated by the copy operation.
The problem is reproducible from every NFS client I've tried, both Solaris and Linux.
The problem is only exhibited for filesystems served by the NAS 5320.
Here's a typical truss output exhibiting the behavior. The 15-second hang occurs
after truss prints the line that says "close(4) (sleeping...)":
> touch ../../xx ; rm ../../xx
> truss cp -p trav_op.c ../../xx
execve("/bin/cp", 0xEFFFF738, 0xEFFFF74C) argc = 4
open("/dev/zero", O_RDONLY) = 3
mmap(0x00000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xEF7B0000
open("/usr/openwin/lib/libc.so.1", O_RDONLY)Err#2 ENOENT
open("/usr/lib/libc.so.1", O_RDONLY)= 4
fstat(4, 0xEFFFF2D4)= 0
mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF7A0000
mmap(0x00000000, 704512, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF680000
munmap(0xEF714000, 57344)= 0
mmap(0xEF722000, 28504, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 598016) = 0xEF722000
mmap(0xEF72A000, 2664, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xEF72A000
close(4)= 0
open("/usr/openwin/lib/libdl.so.1", O_RDONLY)Err#2 ENOENT
open("/usr/lib/libdl.so.1", O_RDONLY)= 4
fstat(4, 0xEFFFF2D4)= 0
mmap(0xEF7A0000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 0) = 0xEF7A0000
close(4)= 0
open("/usr/platform/SUNW,Ultra-4/lib/libc_psr.so.1", O_RDONLY) = 4
fstat(4, 0xEFFFF0B4)= 0
mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF790000
mmap(0x00000000, 16384, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF780000
close(4)= 0
close(3)= 0
munmap(0xEF790000, 8192)= 0
stat64("trav_op.c", 0x00024200)= 0
acl("trav_op.c", GETACLCNT, 0, 0x00000000)= 4
stat64("../../xx", 0x00024168)Err#2 ENOENT
open64("trav_op.c", O_RDONLY)= 3
creat64("../../xx", 0644)= 4
stat64("../../xx", 0x00024168)= 0
chmod("../../xx", 0100644) = 0
mmap64(0x00000000, 38425, PROT_READ, MAP_SHARED, 3, 0) = 0xEF770000
memcntl(0xEF770000, 38425, MC_ADVISE, 0x0002, 0, 0) = 0
write(4, " / * - - - - - - - - - -".., 38425)= 38425
munmap(0xEF770000, 38425)= 0
close(3)= 0
close(4)(sleeping...) *** HANGS APPROX 15 SEC ***
close(4)= 0
chown("../../xx", 2048, 525)= 0
chmod("../../xx", 0100644) = 0
utime("../../xx", 0xEFFFEB08)= 0
llseek(0, 0, SEEK_CUR) = 23915
_exit(0)

