Out of space?
[This was originally posted in the JRE forums, but I think that was a mistake. Sorry about the double posting, but I think this forum would be more appropriate.]
I just added a Solaris 10 box (Thin Client) to our Sun Ray Server cluster, and people logged onto it have started having problems running Java programs. We use JGrasp as our editor/compiler, and we're getting the error "solaris-run exists but cannot be executed: Out of space. Please check permisions or space restrictions". I checked permissions and they were fine, which leaves memory usage as the likely cause. Running Java programs also sometimes fails with "ForkAndExec" errors, which again points to memory, and errors such as the JVM being unable to allocate memory for the object heap come up occasionally as well. More disconcerting, the server sometimes randomly boots all the users back to the login screen, so they lose all their work and have to log back in.
Our maxnproc limit is set to 30000, a number that works on the other servers, and the per-user limit is 1000. We usually have at most 12 people on the server at one time. The server has 6 GB of RAM and an equal amount of swap, and neither seems to be under pressure when these things happen. What's going on?
By Zyphon at 2007-11-26 9:19:22

# 1
Zyphon, maybe it is a kernel parameter such as shared memory. Use sysdef to verify them. Maybe it helps.
JoshD at 2007-7-6 23:49:22

# 2
1) How does your swap space look?
2) Try running vmstat 2 200 while executing the application.
3) Have you set a limit in /etc/project? You can check it with prctl.
4) You could also download the DTrace Toolkit from http://users.tpg.com.au/adsln4yb/dtrace.html, then use dtruss to see what goes on in your application.
Good hunting
Henry
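Suggestion 2 is easiest to do reliably if the monitor and the program start together. Below is a minimal Python sketch, assuming Python 3 is available on the server: it starts vmstat 2 200 in the background with its output going to a log file, runs the program under test, then stops vmstat and returns the program's exit status together with the recorded samples. The helper name run_with_monitor and the default log file name are made up for illustration.

```python
import subprocess


def run_with_monitor(app_cmd, monitor_cmd=("vmstat", "2", "200"), log_path="vmstat.log"):
    """Run app_cmd while monitor_cmd writes to log_path; return (exit code, log text).

    Hypothetical helper: the default monitor is the `vmstat 2 200` suggested above.
    """
    with open(log_path, "w") as log:
        # Start the monitor first so it is already sampling when the program starts.
        monitor = subprocess.Popen(list(monitor_cmd), stdout=log, stderr=subprocess.STDOUT)
        try:
            app_rc = subprocess.call(list(app_cmd))
        finally:
            # Stop sampling once the program exits; vmstat 2 200 would otherwise run ~400 s.
            monitor.terminate()
            monitor.wait()
    with open(log_path) as log:
        return app_rc, log.read()
```

For example, run_with_monitor(["java", "MyProgram"]) (MyProgram being a placeholder) leaves the samples in vmstat.log, so a failing run can be compared with one that works.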
# 3
Here are our current memory usage settings... I don't think there's anything wrong with them, but I'll let you guys decide:
127451136  maximum memory allowed in buffer cache (bufhwm)
    30000  maximum number of processes (v.v_proc)
       99  maximum global priority in sys class (MAXCLSYSPRI)
     1000  maximum processes per user id (v.v_maxup)
       30  auto update time limit in seconds (NAUTOUP)
       25  page stealing low water mark (GPGSLO)
        1  fsflush run rate (FSFLUSHR)
       25  minimum resident memory for avoiding deadlock (MINARMEM)
       25  minimum swapable memory for avoiding deadlock (MINASMEM)
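As a sanity check on the process limits above: even if every one of the roughly 12 concurrent users hit the per-user cap (v.v_maxup), the total would stay well below the system-wide v.v_proc. A trivial Python sketch of that arithmetic; the function name is made up for illustration.

```python
def process_headroom(users, per_user_max, system_max):
    """Worst case assumes every logged-in user reaches the per-user cap."""
    worst_case = users * per_user_max
    return worst_case, system_max - worst_case


# Values from the thread: about 12 users, v.v_maxup = 1000, v.v_proc = 30000.
worst, spare = process_headroom(users=12, per_user_max=1000, system_max=30000)
print(f"worst case: {worst} processes, {spare} below v.v_proc")
# prints: worst case: 12000 processes, 18000 below v.v_proc
```

So running out of process slots looks unlikely, which fits the errors pointing at memory instead.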
# 4
Thanks for the replies. Swap looked fine when I checked it while we were getting the error, and /etc/project is the same as on our working servers. I haven't gotten the error so far today (knock on wood), but if it starts up again I might try the DTrace Toolkit (though I don't think it's an error in the application, because it also occurs outside of it). I forgot to mention that the problem usually goes away after a few minutes and doesn't write anything to any of the logs I checked (/var/adm, syslog), which points to a resource allocation problem, but I can't find any cause for it. I guess I'll wait for the next crash and see what I can find out.
# 5
Just got the problem again, here is vmstat while the problem was happening:
kthr      memory               page              disk          faults        cpu
r b w    swap    free  re   mf pi po fr de sr s6 -- -- --   in    sy    cs us sy id
0 0 0 3547112 4369456  20   96  5  0  0  0  0  0  0  0  0  423  2768  2190  1  1 98
0 0 0  121568 2415096   0   21  0  0  0  0  0  0  0  0  0  883 15288  9507 19  4 77
0 0 0  121568 2415096   0    0  0  0  0  0  0  0  0  0  0  839 12378  8706 19  3 78
That should be fine, right (unless I'm misreading it)? What other memory tests can I run?
# 6
Actually, I think I may have misread it: here is what it prints out when it does work:
kthr      memory               page              disk          faults        cpu
r b w    swap    free  re   mf pi po fr de sr s6 -- -- --   in    sy    cs us sy id
0 0 0 3546544 4369128  20   96  5  0  0  0  0  0  0  0  0  423  2770  2191  1  1 98
0 0 0  891400 2894992 149 1998  4  0  0  0  0  0  0  0  0 1275 13487  9713 20  8 72
0 0 0  904784 2888080  67 2531  4  0  0  0  0  0  0  0  0 1494 14794 11370 19  8 73
Notice that re and mf (page reclaims and minor faults) are no longer 0. I'm not sure what this means, since I don't really understand what the "page" columns (re, mf, pi, etc.) represent, or what pages are (if someone could explain, that'd be great).
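To compare the failing and working samples column by column, here is a minimal Python sketch that splits one Solaris vmstat data line into named fields. It assumes the 22-column layout shown in the headers above (three kthr, two memory, seven page, four disk, three faults and three cpu columns); the dictionary key names are made up for illustration. Per the Solaris vmstat man page, re counts page reclaims, mf minor faults and sr pages scanned by the page scanner, while swap and free are in kilobytes. The first line of vmstat output is a summary since boot, which is why it barely differs between the two runs, so the sketch compares the second line of each.

```python
# Hypothetical key names taken from the Solaris vmstat header above. The four
# disk columns (s6 and three "--" slots) get placeholder names, and the second
# "sy" (system CPU time) is renamed so it does not clash with "sy" (system calls).
COLUMNS = [
    "r", "b", "w", "swap", "free",
    "re", "mf", "pi", "po", "fr", "de", "sr",
    "s6", "disk2", "disk3", "disk4",
    "in", "sy", "cs", "us", "cpu_sy", "id",
]


def parse_vmstat_row(line):
    """Turn one Solaris vmstat data line into a dict keyed by column name."""
    values = [int(field) for field in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
    return dict(zip(COLUMNS, values))


# Second rows of the failing (post #5) and working (post #6) captures.
failing = parse_vmstat_row("0 0 0 121568 2415096 0 21 0 0 0 0 0 0 0 0 0 883 15288 9507 19 4 77")
working = parse_vmstat_row("0 0 0 891400 2894992 149 1998 4 0 0 0 0 0 0 0 0 1275 13487 9713 20 8 72")

for label, row in (("failing", failing), ("working", working)):
    # re = page reclaims, mf = minor faults, sr = pages scanned
    print(f"{label}: swap={row['swap']} KB re={row['re']} mf={row['mf']} sr={row['sr']}")
```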
# 7
I only dabble in this, so: pages are chunks of address space that can be paged in and out of physical memory. As I remember it, you shouldn't have a problem unless sr (the scan rate) goes up. When sr goes up, it means the system is scanning physical memory for pages that haven't been touched within a set time, so that it can free them.
As for the DTT, dtruss is a script that shows both the system calls and their return values, so hopefully it will give you a hint as to which call results in the barf.
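Following the rule of thumb above (watch sr), here is a small Python sketch that scans captured vmstat output and reports the samples with a non-zero scan rate. It assumes the same 22-column Solaris layout as the output earlier in the thread, so sr is the twelfth field; the function name is hypothetical. By default it skips the first data row, because that row is the since-boot summary rather than a live sample.

```python
SR_FIELD = 11  # 0-based index of "sr" in: r b w swap free re mf pi po fr de sr ...


def scanning_samples(vmstat_lines, skip_first=True):
    """Return (sample number, sr) for every data row with a non-zero scan rate."""
    # Data rows start with a number; header rows start with "kthr" or "r".
    rows = [line.split() for line in vmstat_lines if line.split() and line.split()[0].isdigit()]
    if skip_first:
        rows = rows[1:]  # the first row is the since-boot summary
    hits = []
    for number, fields in enumerate(rows, start=1):
        sr = int(fields[SR_FIELD])
        if sr > 0:
            hits.append((number, sr))
    return hits
```

Fed the lines of a saved capture (for example open("vmstat.log").read().splitlines()), it returns an empty list for both captures posted in this thread, since sr stayed at 0 in every sample.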
# 8
The book "Sun Performance and Tuning" by Adrian Cockcroft is a great resource for understanding all of those numbers and helping reach a resolution on these symptoms.
# 9
> The book "Sun Performance and Tuning" by Adrian
> Cockcroft is a great resource for understanding all
> of those numbers and helping reach a resolution on
> these symptoms.
Well, it's dropped down to only "pretty good" in my estimation. Solaris has had many changes since its publication. The virtual memory system in particular has changed dramatically.
I would recommend the new performance and debugging book that's just been released. Amazon has just shipped my copy today, so I haven't read it yet, but it will be considerably more up to date.
http://www.sun.com/books/catalog/solaris_perf_tools.xml
Of course the Cockcroft book's principles are still completely valid. It's just that you have to pay more attention to verifying that you're interpreting the data from Solaris correctly.
--
Darren
# 10
> Just got the problem again, here is vmstat while the problem was happening:
> kthr      memory               page              disk          faults        cpu
> r b w    swap    free  re   mf pi po fr de sr s6 -- -- --   in    sy    cs us sy id
> 0 0 0 3547112 4369456  20   96  5  0  0  0  0  0  0  0  0  423  2768  2190  1  1 98
> 0 0 0  121568 2415096   0   21  0  0  0  0  0  0  0  0  0  883 15288  9507 19  4 77
> 0 0 0  121568 2415096   0    0  0  0  0  0  0  0  0  0  0  839 12378  8706 19  3 78
> That should be fine, right? (unless I'm misreading it) What other memtests can I run?
That's actually pretty odd. You show a good amount of free RAM, but available swap is down to only about 120 megabytes (121568 KB). That seems quite unusual to me, although the other numbers don't suggest any memory pressure. Odd.
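Because Solaris vmstat reports the swap and free columns in kilobytes, a quick conversion of the samples from post #5 shows the size of the drop. A minimal Python sketch; the helper name is made up for illustration.

```python
def kb_to_mb(kb):
    """Solaris vmstat reports swap and free in kilobytes."""
    return kb / 1024


# (label, swap KB, free KB) from the vmstat capture in post #5.
samples = [
    ("row 1 (since boot)", 3547112, 4369456),
    ("row 2", 121568, 2415096),
    ("row 3", 121568, 2415096),
]
for label, swap_kb, free_kb in samples:
    print(f"{label:>18}: swap {kb_to_mb(swap_kb):7.1f} MB, free RAM {kb_to_mb(free_kb):7.1f} MB")
```

That is roughly 3.4 GB of available swap in the since-boot summary versus under 120 MB in the live samples, while free RAM stays above 2 GB.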
How about:
swap -l
swap -s
echo "::memstat" | mdb -k
--
Darren
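Of the commands above, swap -s is the easiest to track over time. On Solaris it prints a single summary line of the form "total: <N>k bytes allocated + <N>k reserved = <N>k used, <N>k available", where "reserved" is swap claimed by memory mappings but not yet allocated. Below is a minimal Python sketch that pulls out those four numbers, assuming that exact format; the function name is hypothetical and the sample line uses invented numbers, not output from this server.

```python
import re

# Assumed format of Solaris `swap -s` output (all values in kilobytes).
SWAP_S = re.compile(
    r"total:\s*(\d+)k bytes allocated \+ (\d+)k reserved = (\d+)k used, (\d+)k available"
)


def parse_swap_s(text):
    """Return the allocated/reserved/used/available figures (KB) from `swap -s` output."""
    match = SWAP_S.search(text)
    if match is None:
        raise ValueError("unrecognised swap -s output")
    allocated, reserved, used, available = (int(g) for g in match.groups())
    return {
        "allocated_kb": allocated,
        "reserved_kb": reserved,
        "used_kb": used,
        "available_kb": available,
    }


# Invented example line, for illustration only.
example = "total: 1049600k bytes allocated + 2375872k reserved = 3425472k used, 121568k available"
info = parse_swap_s(example)
print(f"reserved: {info['reserved_kb'] / 1024:.1f} MB, available: {info['available_kb'] / 1024:.1f} MB")
```

Logging this every few seconds while the errors occur would show whether available swap is being eaten by reservations rather than by pages actually in use.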