No message was displayed and nothing was logged in the log files. All the files are there and the permissions are right. When I checked why nothing was being logged, I found that syslogd was hung. I couldn't kill syslogd, so I called the Sun support centre and was told that if syslogd stops working, no user can log in. Because syslogd couldn't be killed with -9, the only solution is to restart the system. Is there a way to kill and restart syslogd without restarting the whole system?
Thanks.
> So I called the Sun support centre and was told that if syslogd stops working, no user can log in.
o_O
What? I've -never- heard such a thing before! Did they give any reasonable explanation for this? Because for the life of me I cannot imagine how syslogd would stop someone from logging in! AFAIK it's even impossible... Weird!
> Because syslogd couldn't be killed with -9, the only solution is to restart the system.
> Is there a way to kill and restart syslogd without restarting the whole system?
It's not even responding to a kill -9 from root? I'd say there's something borked in your system.
Sun support said that when you su to a user or log in remotely, syslogd logs system messages to the log files. The login service waits for a response from syslogd and then processes the user login. That's why, when syslogd is hung, su or remote login hangs too. All other services look OK and the application (Luminis portal) is running fine. I can log in as root from the console with no problem and can run any commands. What should I do before/after restarting the system to prevent this problem from happening again?
Thanks.
@Hasan: apparently the problem also occurs when logging in locally... So I'm not sure whether the interfaces would affect anything...
@bx5551: I dunno.... I find the whole thing fishy. I for the life of me cannot imagine "login" to be dependent upon "syslogd". That would mean nobody could log in if the syslog daemon was down...
Well... Just to make sure I tested this scenario...
1. Log in to a Solaris 9 box.
2. Become root.
3. Run /etc/init.d/syslog stop.
4. Verify that syslogd is down.
5. Log in to the same box from another window.
6. Try to use su in the new window.
Both steps 5 and 6 work like a charm as I suspected. This confirms the fact that all calls to syslog are (and should be) asynchronous... Meaning that the calling process shouldn't give a fig whether the message gets logged or not.
So Sun's engineer is full of it...
EDIT:
Did some more investigating with a colleague of mine. I'm not sure if the Solaris implementations of "su" and "login" follow this rule, but others do:
* Messages are logged with the LOG_AUTH facility and the LOG_CONS option.
* If the syslog daemon is down, LOG_CONS makes syslog() attempt to write the message to /dev/console.
* Logging to LOG_AUTH by itself does not block.
* The LOG_CONS fallback to the console apparently does.
* So the only situation in which your su and login can be blocked by syslogd being borked, is if your console is also unavailable for writing. Which seems quite unlikely.
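For anyone who wants to see the two flags side by side, here is a minimal sketch using Python's standard `syslog` module instead of C. The ident string `su-demo`, the helper names and the message text are made up for illustration, and this is not the actual Solaris su source. LOG_AUTH is passed as the facility and LOG_CONS as an option, which asks the library to fall back to /dev/console when the message cannot be handed to syslogd.

```python
import syslog


def auth_priority(level=syslog.LOG_NOTICE):
    """Combine the LOG_AUTH facility with a severity level into one priority value."""
    return syslog.LOG_AUTH | level


def log_su_attempt(user, target="root"):
    """Log an su-style message (hypothetical helper, not the real su code)."""
    # LOG_CONS is an option: fall back to /dev/console if syslogd cannot be reached.
    # LOG_AUTH is the facility the message is filed under.
    syslog.openlog("su-demo", syslog.LOG_CONS, syslog.LOG_AUTH)
    try:
        syslog.syslog(syslog.LOG_NOTICE, "'su %s' attempted by %s" % (target, user))
    finally:
        syslog.closelog()
```

Calling `log_su_attempt("someuser")` with syslogd stopped is one way to check whether the message turns up on the console, which is the fallback path described in the bullets above.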
Let me go test this a bit further :)
EDIT 2:
I repeated the test I described above, but this time I kept a close eye on the console window as well. I noticed the following:
* User logins were not shown on the console.
* Usage of su was shown on the console.
* Usage of sudo was not shown on the console.
-> "login" does -not- use LOG_CONS and is thus completely independent of syslogd and /dev/console. The same goes for "sudo".
-> "su" does indeed use LOG_CONS, so that -could- hang, if the console was unavailable for writing...
Conclusion:
Sun's statement is incorrect.
To continue with the problem at hand:
* It would be handy if we could see log messages.
* To do this, we need to restart syslogd.
* To do this, we need to find out why it's hanging :p
I agree that rebooting the box isn't very useful right now. It may fix the problem, but we won't know what was wrong :) Less interesting, you see?
I'm worried by the fact that a "kill -9" by root is unable to kill the syslogd process. Quite odd indeed.
I'd be interested in seeing the output of "pfiles $PID", where $PID is the PID of the syslog daemon. We could see what the process is hanging on... possibly...
Point is that the hanging syslog daemon could be a symptom of the same bug that's causing your su/login problems.
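A rough sketch of collecting that pfiles output in one go, assuming a Python 3.7+ interpreter is available on the box (not a given on Solaris 9) and that `pgrep` and `pfiles` are on the PATH. The function names are hypothetical; on the command line the same thing is just pfiles run against the PID that pgrep reports.

```python
import shutil
import subprocess


def pfiles_command(pid):
    """Build the argument list for 'pfiles <pid>'."""
    return ["pfiles", str(int(pid))]


def find_pids(name="syslogd"):
    """Return PIDs of processes named exactly `name`, using pgrep if it exists."""
    if shutil.which("pgrep") is None:
        return []
    result = subprocess.run(["pgrep", "-x", name], capture_output=True, text=True)
    return [int(tok) for tok in result.stdout.split()]


def show_open_files(name="syslogd"):
    """Print pfiles output for each matching process; do nothing if pfiles is absent."""
    if shutil.which("pfiles") is None:
        print("pfiles not found (it is a Solaris tool)")
        return
    for pid in find_pids(name):
        subprocess.run(pfiles_command(pid), check=False)
```

Run `show_open_files()` as root; the interesting part of the output would be whichever descriptor points at the console or a remote log host.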
Message was edited by:
Cailin_Coilleach
> I'm worried by the fact that a "kill -9" by root is
> unable to kill the syslogd process. Quite odd
> indeed.
Signals can only be delivered to processes in user space. Any process executing a system call must wait until the call returns before the signal can be dealt with.
It is very likely that syslogd is executing blocking I/O (like write) that has not returned. Perhaps the console for this machine is directed to a serial port, and the serial port has a line asserted that is suspending data traffic. I've seen that before.
If this is happening, 'ttya-ignore-cd' should be set to 'true'. This setting will not take effect until the next reboot though. In the meantime, something would need to connect to the serial port to drive the carrier detect line high.
There are other possibilities, but I've run into that one several times.
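To illustrate the blocking-write scenario, here is a small sketch in which an ordinary pipe with no reader stands in for a serial console whose flow has stalled. This is only an analogy under that assumption, not how syslogd is implemented. The descriptor is switched to non-blocking mode so the demo can report the point at which a normal, blocking write() would have gone to sleep in the kernel.

```python
import os


def bytes_until_write_would_block(chunk_size=4096):
    """Write into a pipe nobody reads from; return how many bytes fit before it fills."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # a blocking fd would simply hang here instead
    written = 0
    try:
        while True:
            try:
                written += os.write(write_fd, b"x" * chunk_size)
            except BlockingIOError:
                # The buffer is full. With a blocking descriptor, write() would
                # now sleep in the kernel until the reader drained some data.
                return written
    finally:
        os.close(read_fd)
        os.close(write_fd)


print(bytes_until_write_would_block(), "bytes accepted before the write would block")
```

The number printed depends on the platform's pipe buffer size. The point is only that once the far end stops accepting data, the writer has nowhere to go until something drains it.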
> I'd be interested in seeing the output of "pfiles
> $PID", where $PID is the PID of the syslog daemon. We
> could see what the process is hanging on...
> possibly...
>
> Point is that the hanging syslog daemon could be a
> symptom of the same bug that's causing your su/login
> problems.
More likely the hung syslog is solely responsible for the hung su.
--
Darren
>> Point is that the hanging syslog daemon could be a
>> symptom of the same bug that's causing your su/login
>> problems.
>
> More likely the hung syslog is solely responsible for the hung su.
Aye, that could very well be true since we've already proven that su does indeed depend on a valid write to the console.
Doesn't explain why login won't work though since my tests show that it does not rely on LOG_CONS. Unless maybe the TS's version of "login" is different from mine...
Hi there
I was reading your issue, and have recently come across an issue with syslogd rendering a system unresponsive. This has happened on V880s, V480s and V280s.
The strange thing is that when we open a terminal to its console via a Perle 9000 Terminal Server, this seems to resolve the issue. Unfortunately, this has resulted in us having a PC running with over 20 terminals open. This only happens when we migrate servers to a new data centre, and the only things that change are:
1. Network Switchport
2. Power
3. Terminal Server
4. Cabling to Terminal Server (through structured cabling as opposed to local cabling)
Stuv
> The strange thing is that when we open a terminal to
> its console via a Perle 9000 Terminal Server, this
> seems to resolve the issue.
Not strange. Consistent with several configurations.
Have you verified that 'ttya-ignore-cd' is set to true? That usually eliminates the need for an external console for me.
--
Darren
Although this may not be your problem, we experienced a similar issue. We were logging to a remote site, and su would hang if the remote logging site was not connected or could not be reached. (This fits with the suggestion about replumbing the interface.)
You can see the problem occur if you run syslogd in the foreground with debugging.
It seems to be partially DNS related and in our situation, we did the following to reproduce it:
1. Use an unreachable remote host in the syslog.conf file for auth.
2. Restart syslog in debug mode.
3. From a different terminal, log in and try to su.
4. You should see that it is trying to contact the remote server, but a wait state occurs.
5. After a while, it suspends the retries and you can log in as normal.
6. When the suspension times out, it causes a hang when trying to su.
One way to fix this is to add a record to the hosts file that points to the remote server.
An alternative would be to run a local DNS caching server and prime it with the remote host's address through cron on a regular basis.
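Before restarting syslogd, it may help to check that every remote host named in syslog.conf actually resolves. The sketch below is one possible way to do that and rests on some assumptions: it treats any `@hostname` token on a non-comment line as a remote log host, it does not run the file through m4 the way Solaris syslogd does, and the sample entries and function names are hypothetical.

```python
import re
import socket

# Matches "@hostname" tokens, the syslog.conf notation for forwarding to a remote host.
REMOTE_HOST = re.compile(r"@([A-Za-z0-9][A-Za-z0-9._-]*)")


def remote_log_hosts(conf_text):
    """Return the remote hosts named in syslog.conf text, skipping comment lines."""
    hosts = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        for host in REMOTE_HOST.findall(stripped):
            if host not in hosts:
                hosts.append(host)
    return hosts


def unresolvable(hosts):
    """Return the subset of hosts whose names cannot be resolved right now."""
    failed = []
    for host in hosts:
        try:
            socket.gethostbyname(host)
        except OSError:
            failed.append(host)
    return failed


SAMPLE_CONF = """\
# hypothetical example entries
*.err;kern.notice;auth.notice\t/dev/sysmsg
auth.notice\t@logserver.example.com
mail.debug\t@loghost
"""

print(remote_log_hosts(SAMPLE_CONF))  # ['logserver.example.com', 'loghost']
```

On a real box you would feed it the contents of /etc/syslog.conf and pass the result to `unresolvable()`. Note that the lookup itself can stall when DNS is unreachable, which is the same failure being discussed, so a hosts file entry remains the more robust fix.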
For us, this occurred because syslog was restarted right when the link was down.
I was working with another sysadmin when we tested this, so some of my memory of the steps we took may be a bit hazy. :) If someone else can reproduce this and give better info, I'm all ears.
Mike B.