Poor SCSI write performance
Hello,
I've got a Sun Fire V240 with two 36 GB SCSI disks in a software RAID 1.
The write performance is very poor (16 Mbit/s).
I've patched the box with PatchPro; the original install was from Solaris 8 2/02.
Any suggestions?
Thanks in advance.
<div class="pre"><pre>bash-2.05$ ./Bonnie -s 512
File './Bonnie.5220', size: 536870912
Writing with putc()...done
Rewriting...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
          512  2036  3.8  2015  0.9  1905  1.4 45215 99.3 322729 98.5 1079.1  8.4

bash-2.05# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>  v240.0
          /pci@1c,600000/scsi@2/sd@0,0
       1. c0t1d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>  v240.1
          /pci@1c,600000/scsi@2/sd@1,0

bash-2.05# uname -a
SunOS v240 5.8 Generic_117350-27 sun4u sparc

bash-2.05# /usr/platform/SUNW,Sun-Fire-V240/sbin/prtdiag
System Configuration: Sun Microsystems sun4u Sun Fire V240
System clock frequency: 160 MHz
Memory size: 2048 Megabytes

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 0     0     0      1280     1.0   16     2.4
 0     1     1      1280     1.0   16     2.4

========================= IO Cards =========================

bash-2.05# df -hT
Filesystem       Type   Size  Used Avail Use% Mounted on
/dev/md/dsk/d0   ufs     29G   13G   15G  46% /
swap             tmpfs  3.8G   32k  3.7G   1% /var/run
swap             tmpfs  3.8G  8.0k  3.7G   1% /tmp

bash-2.05# metastat
d0: Mirror
    Submirror 0: d10
      State: Okay
    Submirror 1: d20
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 61440363 blocks

d10: Submirror of d0
    State: Okay
    Size: 61440363 blocks
    Stripe 0:
        Device      Start Block  Dbase  State  Hot Spare
        c0t1d0s0              0  No     Okay

d20: Submirror of d0
    State: Okay
    Size: 61440363 blocks
    Stripe 0:
        Device      Start Block  Dbase  State  Hot Spare
        c0t0d0s0              0  No     Okay
</pre></div>
[2098 byte] By [vinz] at [2007-11-25 23:00:07]

# 1
Can you post some vmstat output?
# 2
That's while it writes (starting at line 4):
<div class="pre"><pre>bash-2.05# vmstat 2 20
 procs memory page disk faults cpu
 r b w swap free re mf pi po fr de sr m0 m1 m2 s0 in sy cs us sy id
 0 0 0 3929272 1937032 7 27 38 0 0 0 0 0 0 0 0 410 194 228 0 1 99
 0 0 0 3988272 1933624 3 5 0 0 0 0 0 0 0 0 0 12619 112 0 0 100
 0 0 0 3988272 1933624 0 0 0 0 0 0 0 0 0 0 0 13118 120 0 0 100
 0 0 0 3988184 1933104 8 91 0 0 0 0 0 0 0 0 0 696 220 260 0 1 99
 0 1 0 3988104 1930016 0 0 0 0 0 0 0 0 0 0 0 459 307 159 2 0 98
 0 0 0 3988104 1925736 0 0 0 0 0 0 0 0 0 0 0 356 226 129 1 1 98
 0 0 0 3988104 1921880 0 0 0 0 0 0 0 0 0 0 0 338 283 130 1 1 97
 0 0 0 3988096 1917592 0 0 0 0 0 0 0 0 0 0 0 350 280 131 1 1 97
 0 0 0 3988088 1913736 0 0 0 0 0 0 0 0 0 0 0 342 285 129 1 1 98
 0 0 0 3988088 1909456 0 0 0 0 0 0 0 0 0 0 0 336 279 120 1 1 98
 0 0 0 3988088 1905176 0 0 0 0 0 0 0 0 0 0 0 334 227 118 1 0 98
 0 0 0 3988080 1901320 0 0 0 0 0 0 0 0 0 0 0 344 286 128 2 1 97
 0 0 0 3988080 1897464 0 0 0 0 0 0 0 0 0 0 0 339 280 122 1 1 98
 0 0 0 3988080 1893184 0 0 0 0 0 0 0 0 0 0 0 346 283 129 2 0 98
 0 0 0 3988080 1888904 0 0 0 0 0 0 0 0 0 0 0 344 240 128 1 1 98
 0 0 0 3988072 1885048 0 0 0 0 0 0 0 0 0 0 0 347 265 123 2 0 98
 0 0 0 3988072 1881192 0 0 0 0 0 0 0 0 0 0 0 348 283 124 2 1 97
 0 0 0 3988072 1876912 0 0 0 0 0 0 0 0 0 0 0 335 280 122 2 0 98
 0 0 0 3988072 1872632 0 0 0 0 0 0 0 0 0 0 0 531 533 205 2 1 97
</pre></div>
vinz at 2007-7-5 17:49:22 >

# 3
What is the Bonnie script you are running? As you can see from the output, the system is idle. Can you post the output of vmstat 5 10 as well?
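To put a number on "idle", one can average the last vmstat column (id, percent CPU idle). The following is only a minimal Python sketch: the helper name average_idle is hypothetical, and the sample rows are copied from the vmstat 2 20 output posted above. It skips the first row because vmstat's first sample reports averages since boot.

```python
# Hypothetical helper: average the CPU idle column (the last field, "id")
# over vmstat sample rows. The first row is skipped because vmstat's first
# sample reports averages since boot rather than the current interval.
def average_idle(rows):
    samples = [int(line.split()[-1]) for line in rows[1:] if line.strip()]
    return sum(samples) / len(samples)


# A few rows copied from the vmstat 2 20 output earlier in the thread.
rows = [
    "0 0 0 3929272 1937032 7 27 38 0 0 0 0 0 0 0 0 410 194 228 0 1 99",
    "0 1 0 3988104 1930016 0 0 0 0 0 0 0 0 0 0 0 459 307 159 2 0 98",
    "0 0 0 3988104 1925736 0 0 0 0 0 0 0 0 0 0 0 356 226 129 1 1 98",
    "0 0 0 3988104 1921880 0 0 0 0 0 0 0 0 0 0 0 338 283 130 1 1 97",
]

print(round(average_idle(rows), 1))
```

A CPU that stays around 97 to 98 percent idle during a write test suggests the bottleneck is not the processor.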
# 4
Bonnie is a <quote>Performance Test of Filesystem I/O using standard C library calls.</quote>: <a href="http://www.textuality.com/bonnie/" target="_blank">http://www.textuality.com/bonnie/</a>
It writes a file of the size given on the command line to disk and measures the time needed to write it and read it back.
<div class="pre"><pre>bash-2.05# vmstat 5 10
 procs memory page disk faults cpu
 r b w swap free re mf pi po fr de sr m0 m1 m2 s0 in sy cs us sy id
 0 0 0 3967848 1930984 3 9 13 1 1 0 0 0 0 0 0 23897 155 0 1 99
 0 0 0 3987848 1926544 3 17 0 0 0 0 0 0 0 0 0 239 200 144 1 1 98
 0 0 0 3987784 1917392 0 0 0 0 0 0 0 0 0 0 0 331 265 123 1 1 98
 0 0 0 3987768 1907224 0 0 0 0 0 0 0 0 0 0 0 339 266 126 1 1 98
 0 0 0 3987768 1897000 0 0 0 0 0 0 0 0 0 0 0 347 267 131 1 1 97
 0 0 0 3987768 1886896 0 0 0 0 0 0 0 0 0 0 0 331 265 123 1 1 98
 0 0 0 3987760 1876776 0 0 0 0 0 0 0 0 0 0 0 343 246 127 2 1 98
 0 0 0 3987760 1866680 0 0 0 0 0 0 0 0 0 0 0 343 286 123 1 1 97
 0 0 0 3987752 1856400 0 0 0 0 0 0 0 0 0 0 0 341 246 118 2 0 98
 0 0 0 3987744 1846280 0 0 0 0 0 0 0 0 0 0 0 368 267 120 2 1 97
</pre></div>
vinz at 2007-7-5 17:49:22 >

# 5
Any external enclosures attached to the system?
# 6
Yeah, an old SCSI-2 CD-ROM, but it isn't on the same channel.
vinz at 2007-7-5 17:49:22 >

# 7
Ok, something I noticed in the Bonnie documentation:
<i>-s size-in-Mb
The number of megabytes to test with. If you do not use this, Bonnie will test with a 100Mb file. In this discussion, Megabyte means 1048576 bytes. If you have a computer that does not allow 64-bit files, the maximum value you can use is 2047.
<b>It is important to use a file size that is several times the size of the available memory (RAM) - otherwise, the operating system will cache large parts of the file, and Bonnie will end up doing very little I/O. At least four times the size of the available memory is desirable.</b></i>
In your benchmark you used 512 MB, which is only a quarter of the installed RAM (2048 MB). Try the test again with 8000 MB and see if there is a change.
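The sizing rule from the Bonnie documentation can be written down as a small calculation. This is a minimal Python sketch, not part of Bonnie itself: the function name bonnie_size_mb and its parameters are made up for illustration, while the factor of four and the 2047 MB limit come from the documentation quoted above.

```python
# Sketch of the sizing rule quoted from the Bonnie documentation:
# use a test file at least four times the size of RAM.
NO_LARGEFILE_LIMIT_MB = 2047  # documented maximum without 64-bit file support


def bonnie_size_mb(ram_mb, factor=4, large_files=True):
    """Return a suggested value for Bonnie's -s option, in megabytes."""
    size = ram_mb * factor
    if not large_files:
        # Without 64-bit file offsets the test file cannot exceed 2047 MB.
        size = min(size, NO_LARGEFILE_LIMIT_MB)
    return size


print(bonnie_size_mb(2048))                     # 8192
print(bonnie_size_mb(2048, large_files=False))  # 2047
```

For the 2048 MB of memory reported by prtdiag, this gives 8192 MB, so the suggested 8000 MB run is close to the recommended minimum.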
# 8
that does not change anything at all...
<div class="pre"><pre>bash-2.05# vmstat 5 10
 procs memory page disk faults cpu
 r b w swap free re mf pi po fr de sr m0 m1 m2 s0 in sy cs us sy id
 0 0 0 3974904 1920104 2 8 9 1 1 0 0 0 0 0 0 222 102 143 0 1 99
 0 0 0 3986688 1699400 1 2 0 0 0 0 0 0 0 0 0 357 269 133 1 1 97
 0 1 0 3986680 1699392 0 0 0 0 0 0 0 0 0 0 0 423 266 148 2 1 97
 0 0 0 3986672 1699384 0 0 0 0 0 0 0 0 0 0 0 338 246 122 1 1 98
 0 0 0 3986664 1699376 0 0 0 0 0 0 0 0 0 0 0 341 266 123 2 1 98
 0 0 0 3986656 1699368 0 0 0 0 0 0 0 0 0 0 0 332 270 119 1 1 97
 0 0 0 3986640 1699352 0 0 0 0 0 0 0 0 0 0 0 345 267 121 2 1 97
 0 0 0 3986632 1699344 0 0 0 0 0 0 0 0 0 0 0 336 265 125 2 1 97
 0 0 0 3986624 1699336 0 0 0 0 0 0 0 0 0 0 0 338 266 121 1 1 98
 0 0 0 3986608 1699320 0 0 0 0 0 0 0 0 0 0 0 349 274 121 2 1 97
</pre></div>
vinz at 2007-7-5 17:49:22 >

# 9
There can be issues with third-party SCSI devices and V240 SCSI disks. Spectrum InfoDoc 74480 covers the issue, but I think it mainly covers storage arrays. Check /var/adm/messages for SCSI timeout entries. Can you post the result of Bonnie -s 8000? I will run some tests with Bonnie on some systems over the weekend and see what I can turn up. I think your issue is related to this benchmark tool. Perhaps TB has something to contribute to this as well.
# 10
The box is all genuine Sun hardware, except for the external CD-ROM.
I'll run the full 8 GB bench...
BTW, the flags I used to compile it:
gcc -O Bonnie.c -o Bonnie -mcpu=ultrasparc -D_FILE_OFFSET_BITS=64
vinz at 2007-7-5 17:49:22 >

# 11
The Sun 36 GB internal FC-AL drive (for comparison) delivers about 3.5 MB/sec with 15 threads of 8 KB random writes, and about 35 MB/sec with 15 threads of 512 KB random writes. That's an order of magnitude difference based on I/O size and load level.
I am not familiar with Bonnie, but based on your throughput of ~2 MB/sec, you appear to have small I/O and low load levels. Even a single thread of 512 KB random writes delivers ~20 MB/sec raw on the reference drive. The moral of the story is that even a Ferrari goes slow if you don't put your foot on the gas ;-)
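To see why I/O size matters so much, it helps to turn the quoted throughput figures into operations per second. This is only a rough Python sketch; the function names are hypothetical and it assumes 1 MB = 1024 KB and that Bonnie's K/sec means 1024 bytes per second.

```python
KB_PER_MB = 1024  # assumption: binary megabytes


def ios_per_second(mb_per_sec, io_size_kb):
    """I/O operations per second implied by a throughput and an I/O size."""
    return mb_per_sec * KB_PER_MB / io_size_kb


def kb_per_sec_to_mbit(kb_per_sec):
    """Convert Bonnie's K/sec (assumed 1024-byte K) to megabits per second."""
    return kb_per_sec * 1024 * 8 / 1_000_000


print(ios_per_second(3.5, 8))               # 448.0 operations/sec at 8 KB
print(ios_per_second(35, 512))              # 70.0 operations/sec at 512 KB
print(round(kb_per_sec_to_mbit(2036), 1))   # 16.7 Mbit/s from the Bonnie run
```

The drive moves ten times more data with far fewer operations when each operation is large, and the 2036 K/sec block-write figure from the Bonnie run lines up with the roughly 16 Mbit/s reported in the first post.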
What does iostat -xn 5 show for your test?
If you want to maximize write bandwidth, build the filesystem with a large I/O size and contiguous block allocation. For UFS that would be with maxcontig, i.e. newfs -C 64 for a 512 KB I/O size. You also have to adjust the kernel maxphys; the default is still only 128 KB. And for LVM you also need to adjust md_maxphys. In order to avoid fragmentation, set both maxphys and md_maxphys to 1048576 (1 MB) if you are shooting for 512 KB I/O. Also for UFS, tunefs -e can be used to allow larger contiguous block allocation, and -a will allow maxcontig to be set without rebuilding the file system; however, in that case, for an existing file, you will need to copy the file to reallocate contiguous blocks.
With the above configuration, if you write the file sequentially, you will get at least some large I/O, and depending on the mood fsflush is in (your autoup setting, and how much memory is available), you will also get higher concurrency levels, all of which will increase your sustainable write performance. However, there will still be some 8 KB random reads and writes due to inode access, so you cannot quite realize the pure 512 KB throughput rates.
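The relationship between maxcontig, the target I/O size, and maxphys can be checked with a little arithmetic. This is a minimal Python sketch under one stated assumption: an 8 KB UFS logical block size, which is what makes 64 blocks equal 512 KB. The function names are hypothetical and nothing here touches a real filesystem.

```python
UFS_BLOCK_KB = 8  # assumption: 8 KB UFS logical block size


def maxcontig_for(io_size_kb, block_kb=UFS_BLOCK_KB):
    """Number of contiguous blocks needed for one I/O of io_size_kb."""
    if io_size_kb % block_kb:
        raise ValueError("I/O size must be a whole number of blocks")
    return io_size_kb // block_kb


def maxphys_allows(io_size_kb, maxphys_bytes):
    """True if a transfer of io_size_kb fits in one request of maxphys bytes."""
    return maxphys_bytes >= io_size_kb * 1024


print(maxcontig_for(512))                 # 64
print(maxphys_allows(512, 128 * 1024))    # False: the 128 KB default is too small
print(maxphys_allows(512, 1048576))       # True: 1 MB leaves headroom
```

With the default 128 KB maxphys a 512 KB cluster would be split into smaller transfers, which is why both limits are raised to 1 MB in the advice above.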
HTH,
# 12
In addition to Dave's piece, I can't see that Bonnie is a good way to test system throughput (I didn't have time to check it out, sorry). Most system administrators and database administrators tune the file system based on hours to days of statistical analysis of a production environment. Bonnie just creates a file and bases its throughput calculation on the creation of that one file. Take Oracle, for example: a busy database may access the disks many times to retrieve database blocks over 24 hours, and the number of times it accesses the disks will most definitely change as the database hits its peak load and then drops off again. To get an accurate picture of possible performance bottlenecks, the administrator would gather statistics using various tools (vmstat, iostat and sar), covering peak and off-peak times. Sun have a tool as well, the Sun StorEdge Workload Analysis Tool, but it is not available to the public or Spectrum customers; you can, however, call Sun services or your iForce partner and arrange to have performance tests carried out. There are other interesting tools as well: Dave has a link in his signature to an analytical tool that is available to download for demo use.
# 13
Right, first off, I have had this issue before, and the problem is that the OS on the server is the wrong release. You have stated that the server was installed with Solaris 8 2/02. This is not even close to the minimum requirement for this server, which is Solaris 8 12/02.
It is not as easy as applying patches to get the same level of support in Solaris 8 2/02 as is provided in Solaris 8 12/02. The reason is that a collection of packages is fundamentally missing from the older release, and you are never going to get them unless you install the correct supported version on the server.
# 14
It's the OS.
I booted the latest Solaris 10 and I am getting amazing speeds now :-)
Thanks a lot everyone for the good hints and advice :-)
vinz at 2007-7-5 17:49:22 >

# 15
Very observant of you, Stuart. I didn't notice the OS release myself; I would have assumed that 2/02 would not even boot on an unsupported system!