3960 SAN I/O problem.
Hi all,
We have a 3960 with 2 arrays, 4 bricks with 73G drives.
It is connected to a Sun 890 for Oracle database with 2GB HBAs.
Currently, each brick is configured raid0+1 (mirror/stripe) with disk #9 as hotspare.
Disk1,2,3,4 = LUN1 ~136G
Disk 5,6,7,8 = LUN2 ~136G
Is each LUN a separate I/O channel?
Currently, we have each 136G as one single partition.
LUN1 /db01
LUN2 /db02
Would we get more performance by creating 2 partitions per LUN?
LUN1 /db01a 68G and /db01b 68G
LUN2 /db02a 68G and /db02b 68G
With 1 LUN per 136G, we seem to hit a throughput bottleneck in mkfile tests.
These are the mkfile tests:
time /usr/sbin/mkfile 512m TESTFILE
real 0m24.84s  user 0m0.01s  sys 0m2.99s
Is this slow for the T3?
Thank you for your help.
JohnHo
[862 byte] By [auspexian] at [2007-11-25 23:03:09]

# 1
JohnHo,
Personally, I would configure each brick as a RAID 5 LUN (RAID 5 across disks 1-8, with disk 9 as hot spare) and then use host software to mirror across the RAID 5 LUNs. This will not only increase the amount of available storage but can also increase reliability and performance.
I would also say it is worthwhile checking and upgrading the firmware on this array, as the T3 bricks have a large number of firmware releases. If you are not happy to do this yourself, open a Sun Support case and they will arrange for an engineer to come out and upgrade the firmware on your 3960.
With regard to the IO channels, how many connections do you have going to your server? It is normally the case that the IO channels are shared with all of the arrays.
So looking at your config I would say you probably have the following, but please correct me if I am wrong.
1. You have 4 T3 bricks
2. Each T3 brick has cables connecting their loop cards
3. You have 2 bricks with controllers, each with 1 connection to it (providing redundancy over the controllers)
4. You have 2 bricks with no controllers that are just providing extra disks.
In the config above each LUN should be accessible through both IO controllers.
# 2
You are correct in #1-4. Thank you.
On each brick, if I do RAID 5 across disks 1-8 (#9 as hot spare) and then mirror it to the 2nd brick, does that mean I will only have 1 LUN? Or can I still partition it into 4 LUNs?
Which is the separate I/O channel, the controller or each individual LUN?
If it is the controller, then I only have 2 channels via the 2 controllers.
If it is the LUN, then each brick gives me 2 LUNs.
Our problem is not with space, but with I/O.
In iostat, I see our current LUNs showing as very busy, along with waits on Oracle reads/writes.
Thank you.
John
# 3
Q. On each brick, if I do RAID 5 across disks 1-8 (#9 as hot spare) and then mirror it to the 2nd brick, does that mean I will only have 1 LUN? Or can I still partition it into 4 LUNs?
A. You would in fact have 4 RAID 5 LUNs coming from the 4 T3 arrays. Through OS disk management (either VxVM volumes or DiskSuite soft partitions) you then create as many mirrors as you want over these 4 LUNs. If it is planned correctly, each mirror spans two LUNs (i.e. one side of the mirror on each LUN).
Q. Which is the separate I/O channel, the controller or each individual LUN?
A. It is my understanding that the LUNs would be accessible through either channel and in fact you can use something like DMP to help manage this. The 3960 would then effectively manage itself.
Q. Our problem is not with space, but with I/O. In iostat, I see lots of busy on our current LUNs and with waits from Oracle read/write.
A. It is possible that you have different block sizes on the T3 than those being used by the Oracle database. I would check with Oracle and then Sun and they will be able to assist you in changing this. Either way I do believe that the Raid 5 option would show you a slight improvement.
Hope this helps a little.
# 4
Thank you very much.
We are getting the Sun SSE out this weekend to upgrade the firmware on all arrays, disks, etc.
We also plan to convert one partition from mirror/stripe to RAID 5 to see whether there is any improvement.
Thanks for the suggestion on block sizes.
Next Q, can a local disk on SF890 be faster than the 2GB FC SAN?
We store binaries locally on a local partition and the test came out to:
real 0m11.93s  user 0m0.01s  sys 0m3.23s
real 0m10.31s  user 0m0.01s  sys 0m2.30s
While the SAN partition tests are
real 0m22.30s  user 0m0.02s  sys 0m4.23s
real 0m21.56s  user 0m0.04s  sys 0m2.31s
These were run during off hours, with no other activity, using:
time /usr/sbin/mkfile 512m TESTFILE
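For easier comparison, the timings above can be turned into rough transfer rates. This is only a minimal sketch: it assumes the whole 512 MB was written within the "real" time reported, and the labels and function name are made up for illustration.

```python
# Convert the mkfile "real" timings quoted above into approximate MB/s.
FILE_MB = 512  # size passed to mkfile (512m)


def throughput_mb_s(real_seconds, size_mb=FILE_MB):
    """Average write rate in MB/s for a single mkfile run."""
    return size_mb / real_seconds


# "real" seconds copied from the test results in this post
timings = {
    "local disk, run 1": 11.93,
    "local disk, run 2": 10.31,
    "SAN partition, run 1": 22.30,
    "SAN partition, run 2": 21.56,
}

for label, seconds in timings.items():
    print(f"{label}: {throughput_mb_s(seconds):.1f} MB/s")
```

On these numbers the local disk comes out at roughly twice the rate of the SAN partition for this particular single-stream test.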
Thank you all for sharing your knowledge.
JohnHo
# 5
That result is quite possibly correct. With tuning changes to the array (i.e. block sizes, config, etc.) the SAN time might come down a little. Remember, though, that you will notice some difference due to propagation delays between the server and the array.
Personally, I would keep the OS and program binaries on the server's internal disks (making sure they are mirrored) and the data on the arrays. But that is very much a choice you have to make, and if you have time to experiment it might be better to try both options.
Either way, getting the SSE out is a good step. He will be able to fix a lot of the firmware and might even be able to provide a little more detail on possible config changes for your specific installation, but don't hold me to that.
Keep us informed about how you get on with your testing.
# 6
Additionally, as I mentioned in your other thread, testing storage and filesystem performance with the mkfile command doesn't give a realistic measure of filesystem performance. The parameters that would be changed to tune a filesystem depend on the characteristics of the application that will be using it. To test the performance of this storage, you will need your Oracle DBA to run a statistical analysis to generate an I/O profile for the application as it would behave in full operation; from there, the systems administrator makes changes to the kernel based on this profile. From a hardware perspective, I think you are also hitting a bottleneck between the 2Gb/s HBAs on your 890 and the 1Gb/s fibre channel network (SANbox 16 and T3 controllers).
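To put a rough number on the link-speed mismatch described above, here is a minimal sketch. It assumes the usual nominal Fibre Channel figure of about 100 MB/s of payload per 1 Gb/s of link speed, and the component names in the dictionary are just labels for the path described in this thread; real-world rates will be lower.

```python
# Rough end-to-end ceiling for a Fibre Channel path: the slowest link wins.
# ASSUMPTION: ~100 MB/s of payload per 1 Gb/s of nominal FC link speed.
NOMINAL_MB_S_PER_GBIT = 100


def path_ceiling_mb_s(link_speeds_gbit):
    """Upper bound in MB/s for a path, limited by its slowest link."""
    return min(link_speeds_gbit) * NOMINAL_MB_S_PER_GBIT


# Link speeds (Gb/s) along the path mentioned in the thread
path = {"890 HBA": 2, "SANbox 16": 1, "T3 controller": 1}

ceiling = path_ceiling_mb_s(path.values())
print(f"Nominal ceiling per path: about {ceiling} MB/s")
```

A 2Gb/s HBA therefore gains nothing on a single path while the switch and controller run at 1Gb/s.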
EDIT: I made a spelling change (charicteristics, what was I thinking - I need a dictionary!).
# 7
I concur that mkfile is not your best choice for a load generator. It gives you no control over I/O size or number of threads, and also, I understand it allocates blocks from the end of the file forward, which must make sense for swap file usage but is kind of goofy for normal purposes.
I also concur that performance is context sensitive. You cannot take a measurement of, say, 11 threads of 8 KB random write and use it to tell how fast 1 thread of 1 MB sequential read is going to be. That's an extreme example, but you get the idea.
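To illustrate what "control over I/O size and number of threads" looks like, here is a minimal sketch of a random-write load generator. It is not a real benchmarking tool: the function name and parameters are made up for this example, it uses os.pwrite (Unix only), and the writes go through the OS page cache, so the number it prints is only loosely related to what the array can do.

```python
import os
import random
import tempfile
import threading
import time


def run_random_writes(path, io_size, threads, ios_per_thread, file_size):
    """Issue aligned random writes of io_size bytes from several threads.

    Returns the aggregate rate in MB/s (10**6 bytes per second).
    """
    buf = os.urandom(io_size)
    slots = file_size // io_size  # number of aligned offsets in the file
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)

        def worker(seed):
            rng = random.Random(seed)
            for _ in range(ios_per_thread):
                os.pwrite(fd, buf, rng.randrange(slots) * io_size)

        pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
        start = time.perf_counter()
        for t in pool:
            t.start()
        for t in pool:
            t.join()
        os.fsync(fd)  # include the flush to stable storage in the timing
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.close(fd)
    return threads * ios_per_thread * io_size / elapsed / 1e6


# Small demo run: 4 threads of 8 KB random writes into a 16 MB scratch file
with tempfile.TemporaryDirectory() as scratch:
    rate = run_random_writes(
        os.path.join(scratch, "TESTFILE"),
        io_size=8 * 1024,
        threads=4,
        ios_per_thread=64,
        file_size=16 * 1024 * 1024,
    )
    print(f"4 threads x 8 KB random write: {rate:.1f} MB/s")
```

Changing io_size and threads gives a different measurement each time, which is the point: one result does not predict another.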
Using RAID-1+0 on top of RAID-5 is a very high availability approach; it is double protection against failure. For most purposes, we only design for no single point of failure. If your workload is read intensive, it is also a very high performance approach. However, if you have an update-intensive application, especially with small I/O, it is not your best choice for performance. It can be fine if your write workload is not too sustained, but if you have sustained small writes most of the time and want maximum write performance, RAID-1+0 alone is a better choice. You can still use host-based RAID-0 to combine LUNs and make smaller partitions if you like. If you have large sequential writes, RAID-5 does pretty well when you get full-stripe writes, but you need to make sure your I/O is large enough to fill a stripe (the number of data drives times the RAID stripe unit).
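The full-stripe arithmetic can be sketched as below. The 8-drive RAID-5 layout matches the disks 1-8 proposal earlier in the thread; the 64 KB stripe unit is purely an assumed value for illustration, so substitute whatever your array is actually configured with. The check also ignores alignment, which matters in practice.

```python
def full_stripe_bytes(total_drives, stripe_unit_bytes, parity_drives=1):
    """Full-stripe size: number of data drives times the stripe unit."""
    data_drives = total_drives - parity_drives
    return data_drives * stripe_unit_bytes


def fills_whole_stripes(io_bytes, total_drives, stripe_unit_bytes):
    """True if an I/O of io_bytes covers one or more complete stripes.

    Assumes the I/O starts on a stripe boundary.
    """
    stripe = full_stripe_bytes(total_drives, stripe_unit_bytes)
    return io_bytes >= stripe and io_bytes % stripe == 0


STRIPE_UNIT = 64 * 1024  # ASSUMED stripe unit, not taken from this thread

stripe = full_stripe_bytes(total_drives=8, stripe_unit_bytes=STRIPE_UNIT)
print(f"Full stripe: {stripe // 1024} KB")  # 7 data drives x 64 KB

for io_kb in (8, 448, 1024):
    ok = fills_whole_stripes(io_kb * 1024, 8, STRIPE_UNIT)
    print(f"{io_kb} KB write fills whole stripes: {ok}")
```

With these assumed values an 8 KB database write is far short of a full stripe, which is why small sustained writes are the weak spot for RAID-5.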
The goal is always to spread the load as evenly as possible over the available LUNs. If the database schema is already doing a good job of load distribution over the datafiles, then you are better off not partitioning the LUNs, as it will only increase contention. On the other hand, if the load is not evenly distributed over the datafiles, combining the LUNs and partitioning them will serve you well.
There are tools available to take all the mystery out of this if you want to spend the time to design the configuration for a specific SLA given the workload composition and the capability of the chosen array configuration.
HTH,
Dave
# 8
Thank you all for your help.
I agree that our database has grown beyond the 1Gb/s limitation of the T3.
We are planning to replace the T3 with either an EMC CX700 or a 69xx next year. Corporate mandates EMC, but we should get a good trade-in for a 69xx.
My next question: will Veritas File System help? I may be able to get Veritas File System for the three 890s that run Oracle.
Do I need to load Veritas File System on the SAN or just on the hosts? For each host, do I just load the software, copy the Oracle datafiles to another partition, and make the new filesystem on top?
Thank you very much all.
John
# 9
I think you'll find decent configuration information about implementing Veritas at http://docs.sun.com/. Personally, I don't have much experience with VxFS; we use QFS on our testbed database.
I can't say that any one product is better than the other; there are a number of factors to consider. One difference is that UFS uses block-based allocation while VxFS uses extent-based allocation. In standard trim, UFS would spend more time seeking, and the filesystem can become fragmented over time, which can greatly slow filesystem performance. VxFS and QFS can be much better performing filesystems than UFS (from Sol 6, 7, 8 and 9), but we will see what changes come about with the UFS project in OpenSolaris:
http://www.opensolaris.org/os/community/ufs/