4 Replies Latest reply on Jun 28, 2012 7:44 PM by b.eckenfels

AS 7 domain robustness - process controller doesn't autorestart app server node after a fail

maksymg Jun 1, 2012 6:17 PM

Hi,

We are evaluating whether JBoss AS 7 is ready for production use. We found that the JBoss process controller detects the failure of a Java application server node process (we killed it manually) but does not attempt to restart it automatically. Since an app node process might fail for some reason, for example by running out of memory, it is critical that it be restarted automatically. The only way we found is to restart it manually via the CLI/console. Is there a way to make the process controller do this automatically? Is it a bug?

We use the latest release, 7.1.2.Final.

12:48:43,326 INFO [org.jboss.as.process.Server:demo-jbnode1.status] (ProcessController-threads - 3) JBAS012017: Starting process 'Server:demo-jbnode1'

14:50:46,666 INFO [org.jboss.as.process.Server:demo-jbnode1.status] (reaper for Server:demo-jbnode1) JBAS012010: Process 'Server:demo-jbnode1' finished with an exit status of 1 (after we killed the corresponding java process)

14:53:09,495 INFO [org.jboss.as.process.Server:demo-jbnode1.status] (ProcessController-threads - 4) JBAS012017: Starting process 'Server:demo-jbnode1' (only happens after manual restart)

With our current JBoss 4 setup, the wrapper takes on that responsibility. We use JBossNative to run it as a Windows service.

Thanks,

Maksym

1. Re: AS 7 domain robustness - process controller doesn't autorestart app server node after a fail

emuckenhuber Jun 2, 2012 5:25 AM (in response to maksymg)

Yes, that is the intended behavior. The process-controller will only restart the host-controller if that process exits unexpectedly. This is important so that potentially remote hosts stay manageable. We think that starting a server is an administrative task, though, and shouldn't be done silently. So a crashed server should rather be detected by a monitoring solution, which usually provides a more sophisticated set of tools to handle such an event properly.
2. Re: AS 7 domain robustness - process controller doesn't autorestart app server node after a fail

maksymg Jun 4, 2012 12:04 PM (in response to emuckenhuber)

Emanuel,

Can you clarify whether a Java process terminating because it ran out of memory qualifies as an unexpected exit and, as a result, would be restarted automatically?

Thanks,

Maksym
3. Re: AS 7 domain robustness - process controller doesn't autorestart app server node after a fail

wdfink Jun 5, 2012 3:41 AM (in response to maksymg)

Some thoughts from my experience.
If the JVM process is dead, it is mostly due to an accident (yes, I've seen faulty scripts and admins kill processes) or a JVM bug.
That is a reason to force a restart immediately.

An OOM or, for example, slow response times might be caused by a bug (memory leak) or by a temporary overload of your (cluster) system.
Restarting the JBoss instance in that situation might bring down your whole system. Assume you have 3 nodes handling your throughput and one of them gets into such an OOM situation.
Your automation will shut down or kill the instance (a clean shutdown may not be possible if the GC is running wild).
The load will be redistributed to the other two nodes, which will then be overloaded as well; the system goes down completely, and it will be hard for it to come back to work automatically.

So the decision whether and how to restart in such a situation is very difficult and might require complex automation or even an administrator.
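
To put rough numbers on the 3-node scenario, here is a minimal Java sketch. It assumes the load is spread evenly across nodes and is redistributed evenly when a node drops out; the class name, method name, and the utilization figures are made up for illustration and are not measurements from any real system.

```java
/** Illustrative only: per-node load when some nodes of an evenly loaded cluster drop out. */
final class FailoverLoad {

    /**
     * Returns the utilization of each surviving node, assuming the total load
     * stays the same and is spread evenly over the remaining nodes.
     *
     * @param nodes             number of nodes before the failure
     * @param failed            number of nodes that dropped out
     * @param utilizationBefore per-node utilization before the failure (1.0 = fully loaded)
     */
    static double utilizationAfterFailure(int nodes, int failed, double utilizationBefore) {
        if (nodes <= 0 || failed < 0 || failed >= nodes) {
            throw new IllegalArgumentException("need 0 <= failed < nodes");
        }
        return utilizationBefore * nodes / (nodes - failed);
    }

    public static void main(String[] args) {
        // Hypothetical figures: 3 nodes, one is killed.
        System.out.println(utilizationAfterFailure(3, 1, 0.60)); // survivors stay below 1.0
        System.out.println(utilizationAfterFailure(3, 1, 0.70)); // survivors go above 1.0
    }
}
```

With 3 nodes, the survivors carry 1.5 times their previous load, so under these assumptions anything above roughly 67% utilization before the failure pushes the remaining two nodes past their capacity.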
4. Re: AS 7 domain robustness - process controller doesn't autorestart app server node after a fail

b.eckenfels Jun 28, 2012 7:44 PM (in response to wdfink)

Yes, the admin has to decide what to do. But if the admin decides that a dead server is a signal that it needs a restart, the PC (or HC) does need an "auto-restart" setting, with parameters for the sleep time before restart, the maximum restart count, and the time after which the count is reset. For example: "Wait 10s and restart; do this at most 3 times in 1h".
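
A policy like that could be sketched roughly as follows. This is not an existing AS 7 setting or API: `RestartPolicy` and all of its members are hypothetical names, and the sketch interprets "time the count is reset" as a sliding window, which is only one possible reading. An external watchdog would call `tryAcquireRestart` when it sees the server process exit, sleep for `delay()`, and then trigger the restart.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical restart budget: "wait D, then restart, at most N times per window W". */
final class RestartPolicy {
    private final Duration delay;      // sleep time before each restart
    private final int maxRestarts;     // maximum restarts inside one window
    private final Duration window;     // restarts older than this no longer count
    private final Deque<Long> restartTimesMillis = new ArrayDeque<>();

    RestartPolicy(Duration delay, int maxRestarts, Duration window) {
        if (maxRestarts < 0) {
            throw new IllegalArgumentException("maxRestarts must be >= 0");
        }
        this.delay = delay;
        this.maxRestarts = maxRestarts;
        this.window = window;
    }

    /** How long the caller should wait before restarting the process. */
    Duration delay() {
        return delay;
    }

    /**
     * Records a restart at the given time and returns true if the budget allows it;
     * returns false (and records nothing) if the maximum count is already used up.
     */
    synchronized boolean tryAcquireRestart(long nowMillis) {
        long cutoff = nowMillis - window.toMillis();
        // Forget restarts that have fallen out of the window.
        while (!restartTimesMillis.isEmpty() && restartTimesMillis.peekFirst() <= cutoff) {
            restartTimesMillis.pollFirst();
        }
        if (restartTimesMillis.size() >= maxRestarts) {
            return false; // give up and leave it to the administrator
        }
        restartTimesMillis.addLast(nowMillis);
        return true;
    }
}
```

The example from the post would then be `new RestartPolicy(Duration.ofSeconds(10), 3, Duration.ofHours(1))`. Once the budget is exhausted the policy refuses further restarts, so a server stuck in a crash loop ends up back with the administrator or the monitoring system.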