We are evaluating whether JBoss 7 is ready for production use, and found that the JBoss process controller detects the failure of a Java application server node process (we killed it manually) but does not attempt to restart it automatically. Since an app node process might fail for any number of reasons, for example running out of memory, it is critical that it be restarted automatically. The only way we found is to do it manually via the CLI/console. Is there a way to force the process controller to do this automatically? Is this a bug?
We use the latest release, 7.1.2.Final.
12:48:43,326 INFO [org.jboss.as.process.Server:demo-jbnode1.status] (ProcessController-threads - 3) JBAS012017: Starting process 'Server:demo-jbnode1'
14:50:46,666 INFO [org.jboss.as.process.Server:demo-jbnode1.status] (reaper for Server:demo-jbnode1) JBAS012010: Process 'Server:demo-jbnode1' finished with an exit status of 1 (after we kill the corresponding Java process)
14:53:09,495 INFO [org.jboss.as.process.Server:demo-jbnode1.status] (ProcessController-threads - 4) JBAS012017: Starting process 'Server:demo-jbnode1' (this only appears after a manual restart)
With our current JBoss 4, the wrapper takes on that responsibility. We use JBossNative to run it as a Windows service.
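For reference, the manual restart we do from the CLI looks roughly like this (the host name "master" is only an example; the server name is taken from the log above):

    jboss-cli.sh --connect --controller=localhost:9999 \
        --command="/host=master/server-config=demo-jbnode1:start"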
Yes, that is the intended behavior. The process controller will only restart the host controller if that process exits unexpectedly; this is important so that potentially remote hosts stay manageable. We think that starting a server is an administrative task, though, and shouldn't happen silently. A crashed server should instead be detected by a monitoring solution, which usually provides a more sophisticated set of tools for handling such an event properly.
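If you go that external route, here is a minimal watchdog sketch along those lines, assuming the management interface is on localhost:9999, the host is named "master", and the server is "demo-jbnode1" (all three are assumptions, adjust them to your setup):

    #!/bin/sh
    # Poll the server-config status via the CLI and start the server
    # again if it is no longer running. Meant to be run from cron.
    # Host name, server name and port are assumptions, not facts
    # about the asker's setup.
    CLI="jboss-cli.sh --connect --controller=localhost:9999"
    SERVER="/host=master/server-config=demo-jbnode1"

    status=$($CLI --command="$SERVER:read-attribute(name=status)")
    case "$status" in
        *STOPPED*|*FAILED*)
            $CLI --command="$SERVER:start"
            ;;
    esac

A real monitoring solution would add alerting and restart limits on top of something like this.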
Some of my experience:
If the JVM dies, it is mostly either an accident (yes, I've seen faulty scripts and admins kill processes) or a JVM bug. That is a reason to force a restart immediately.
An OOM, or e.g. a slow response, might be caused by a bug (a memory leak) or by temporary overload of your (cluster) system.
What might happen is that restarting the JBoss instance blows up your whole system. Suppose you have 3 nodes handling your throughput and one of them gets into such an OOM situation. Your automation will shut down or kill that instance (a clean shutdown may not even be possible if the GC is running wild). The load will be redistributed to the other two nodes, which then become overloaded as well; now the whole system is down, and it will be hard for it to come back up automatically.
So the decision whether and how to restart in such a situation is very difficult; it might require complex automation, or even an administrator.
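One mitigation, assuming a HotSpot JVM: make sure an OOM actually kills the process instead of leaving it limping along, so whatever watches the server sees an unambiguous failure. The standard HotSpot options for that (in domain mode they would go into the <jvm> definition in host.xml):

    # Dump the heap for later analysis, then kill the JVM hard on OOM.
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:OnOutOfMemoryError="kill -9 %p"

(%p is expanded by the JVM to its own pid.)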
Yes, the admin has to decide what to do. But if he decides that a dead server should simply be restarted, the PC (or HC) would need an "auto-restart" setting, with parameters for the sleep time before a restart and a maximum restart count (plus the period after which the count is reset): "Wait 10s and restart; do this at most 3 times in 1h".
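Until such a setting exists, that exact policy can be approximated with an external script; a sketch, reusing the assumed names from above ("master", "demo-jbnode1", management on localhost:9999):

    #!/bin/sh
    # "Wait 10s and restart, do this maximum 3 times in 1h",
    # implemented as an external watchdog loop.
    CLI="jboss-cli.sh --connect --controller=localhost:9999"
    SERVER="/host=master/server-config=demo-jbnode1"
    MAX_RESTARTS=3
    WINDOW=3600     # seconds until the restart counter resets
    DELAY=10        # seconds to wait before each restart

    restarts=0
    window_start=$(date +%s)
    while true; do
        now=$(date +%s)
        # Reset the counter once the one-hour window has elapsed.
        if [ $((now - window_start)) -ge $WINDOW ]; then
            restarts=0
            window_start=$now
        fi
        status=$($CLI --command="$SERVER:read-attribute(name=status)")
        case "$status" in
            *STOPPED*|*FAILED*)
                if [ "$restarts" -lt "$MAX_RESTARTS" ]; then
                    sleep $DELAY
                    $CLI --command="$SERVER:start"
                    restarts=$((restarts + 1))
                fi
                ;;
        esac
        sleep 30
    done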