close() over NFS hangs for 15 seconds on NAS 5320

Summary: We are experiencing long pauses waiting for files to close on NAS 5320.

Hardware:

NFS Server: 5320 NAS S/N ST5320200609190020 Version 4.20 M0 (Build 78)

NFS Clients: SunOS 5.10 Generic_118833-17 sun4u sparc SUNW,Sun-Fire-V240

SunOS mnemo 5.6 Generic_105181-20 sun4u sparc SUNW,Ultra-4

(Various other Solaris & Linux clients)

Symptom:

User copies a tiny file using "cp". The "cp" program hangs for 15 seconds before

normal exit and successful completion. Running "truss" on the "cp" command

reveals that the copy happens nearly instantaneously, and the program goes into

a sleep state waiting for the close() of the output file.

The problem is reproducible regardless of whether the filesystem is NFS hard

mounted or automounted. The problem seems to occur most often if the output

file being written previously existed under the same name, was recently deleted,

and is being recreated by the copy operation.

The problem is reproducible from every NFS client I've tried, both Solaris and Linux.

The problem is only exhibited for filesystems served by the NAS 5320.

Here's a typical truss output exhibiting the behavior. The 15-second hang occurs

after truss prints the line that says "close(4) (sleeping...)":

> touch ../../xx ; rm ../../xx

> truss cp -p trav_op.c ../../xx

execve("/bin/cp", 0xEFFFF738, 0xEFFFF74C) argc = 4

open("/dev/zero", O_RDONLY) = 3

mmap(0x00000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xEF7B0000

open("/usr/openwin/lib/libc.so.1", O_RDONLY)Err#2 ENOENT

open("/usr/lib/libc.so.1", O_RDONLY)= 4

fstat(4, 0xEFFFF2D4)= 0

mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF7A0000

mmap(0x00000000, 704512, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF680000

munmap(0xEF714000, 57344)= 0

mmap(0xEF722000, 28504, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 598016) = 0xEF722000

mmap(0xEF72A000, 2664, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xEF72A000

close(4)= 0

open("/usr/openwin/lib/libdl.so.1", O_RDONLY)Err#2 ENOENT

open("/usr/lib/libdl.so.1", O_RDONLY)= 4

fstat(4, 0xEFFFF2D4)= 0

mmap(0xEF7A0000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 0) = 0xEF7A0000

close(4)= 0

open("/usr/platform/SUNW,Ultra-4/lib/libc_psr.so.1", O_RDONLY) = 4

fstat(4, 0xEFFFF0B4)= 0

mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF790000

mmap(0x00000000, 16384, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF780000

close(4)= 0

close(3)= 0

munmap(0xEF790000, 8192)= 0

stat64("trav_op.c", 0x00024200)= 0

acl("trav_op.c", GETACLCNT, 0, 0x00000000)= 4

stat64("../../xx", 0x00024168)Err#2 ENOENT

open64("trav_op.c", O_RDONLY)= 3

creat64("../../xx", 0644)= 4

stat64("../../xx", 0x00024168)= 0

chmod("../../xx", 0100644) = 0

mmap64(0x00000000, 38425, PROT_READ, MAP_SHARED, 3, 0) = 0xEF770000

memcntl(0xEF770000, 38425, MC_ADVISE, 0x0002, 0, 0) = 0

write(4, " / * - - - - - - - - - -".., 38425)= 38425

munmap(0xEF770000, 38425)= 0

close(3)= 0

close(4)(sleeping...) *** HANGS APPROX 15 SEC ***

close(4)= 0

chown("../../xx", 2048, 525)= 0

chmod("../../xx", 0100644) = 0

utime("../../xx", 0xEFFFEB08)= 0

llseek(0, 0, SEEK_CUR) = 23915

_exit(0)

[3548 byte] By [kewamplera] at [2007-11-26 15:39:50]
# 1
Hello,are you aware that this isn't official Sun Support but a user-to-user forum ?Michael- When will the deficencies/bugs in the Forums software be fixed ?
MAALATFTa at 2007-7-8 21:58:16 > top of Java-index,Storage Forums,Storage General Discussion...
# 2
We opened a service request with Sun.They were able to correlate our problem with the following known problem:"6452655 Postmark performance severely degraded with release 4.12 build 20"They advised us to apply update 119352-04, and this solved the problem for us.
kewamplera at 2007-7-8 21:58:16 > top of Java-index,Storage Forums,Storage General Discussion...
# 3
Thank you for posting the update !I didn't want you "to talk away" just point out that's unlikely that you get help within time. If and when you get an answer depends on the spare time (and knowledge) of the volunteers, versus getting a service case assigned.Michael
MAALATFTa at 2007-7-8 21:58:16 > top of Java-index,Storage Forums,Storage General Discussion...