2 Replies Latest reply on Dec 13, 2011 7:57 AM by anujbhatia

    Shutting down JBoss while an invoke is executing

    anujbhatia

      Hi,

       

      If I try to shutdown JBoss while a BPEL process instance is executing an INVOKE activity, the server does not shutdown completely untill the response is recieved from the external web service (or the request timesout). Is this by design?

       

      I think this would be a good feature to have to make sure that the process is not left in an inconsistent state. However, there seems to be a problem and this is not working as I expected. Once the response (or timeout) for the INVOKE is recieved, the server that is partially shutdown tries to process the response but is not able to do so, I've pasted the log messages below.  Also, if this is running in a cluster the other cluster node takes up the INVOKE_CHECK job from the node that is now partially shutdown. But the other cluster node that is still running is not able to process the response (or timeout) and continue the process execution.

       

      My environment is JBoss 5.1.0 GA, JBossWS CXF 3.4.0, RiftSaw 2.3.1-SNAPSHOT and JDK 1.6. (I'm using a snapshot build at revision 1429 that includes the fix for RIFTSAW-404)

       

      Log messages on Shutdown:

       

      13:35:12,373 INFO  [org.jboss.bootstrap.microcontainer.ServerImpl] Runtime shutdown hook called, forceHalt: true

      13:35:12,373 INFO  [org.apache.coyote.http11.Http11Protocol] Pausing Coyote HTTP/1.1 on http-0.0.0.0-8080

      13:35:12,373 INFO  [org.apache.coyote.http11.Http11Protocol] Stopping Coyote HTTP/1.1 on http-0.0.0.0-8080

      13:35:12,389 INFO  [org.apache.coyote.ajp.AjpProtocol] Pausing Coyote AJP/1.3 on ajp-0.0.0.0-8009

      13:35:12,389 INFO  [org.apache.coyote.ajp.AjpProtocol] Stopping Coyote AJP/1.3 on ajp-0.0.0.0-8009

      13:35:12,389 INFO  [org.jboss.soa.bpel.deployer.as5.AS5BPELDeployer] Undeploy: Quickstart_bpel_simple_invoke-1.jar relativePath=deploy.xml

      13:35:12,389 INFO  [org.jboss.soa.bpel.deployer.as5.AS5BPELDeployer] Not undeploying BPEL: Quickstart_bpel_simple_invoke-1.jar relativePath=deploy.xml

      13:35:12,389 INFO  [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/Quickstart_bpel_simple_invoke

      13:35:12,404 INFO  [org.jboss.wsf.framework.management.DefaultEndpointRegistry] remove: jboss.ws:context=Quickstart_bpel_simple_invoke,endpoint=HelloWorldWS

      13:35:12,404 INFO  [org.springframework.context.support.GenericApplicationContext] Closing org.springframework.context.support.GenericApplicationContext@af5c74: startup date [Mon

      Dec 12 13:34:50 IST 2011]; parent: org.jboss.wsf.stack.cxf.client.configuration.JBossWSBusApplicationContext@38a6d2

      13:35:12,404 INFO  [org.jboss.wsf.stack.cxf.client.configuration.JBossWSBusApplicationContext] Closing org.jboss.wsf.stack.cxf.client.configuration.JBossWSBusApplicationContext@3

      8a6d2: startup date [Mon Dec 12 13:34:50 IST 2011]; root of context hierarchy

      13:35:12,435 INFO  [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/web-console

      13:35:12,435 INFO  [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/invoker

      13:35:12,498 INFO  [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Stopping partition DefaultPartition

      13:35:12,498 INFO  [org.jboss.cache.RPCManagerImpl] Disconnecting and closing the Channel

      13:35:12,639 INFO  [org.jboss.cache.RPCManagerImpl] Stopping the RpcDispatcher

      13:35:12,764 INFO  [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Partition DefaultPartition stopped.

      13:35:12,764 INFO  [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Partition DefaultPartition destroyed.

      13:35:12,764 INFO  [org.jboss.soa.bpel.runtime.engine.service.BPELEngineService] Stopping JBoss BPEL Engine

       

      Now this JBoss instance does not exit but waits for the INVOKE reponse to return. In the mean time *on the other cluster node*:

       

      2011-12-12 13:35:14,392 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] (AsynchViewChangeHandler Thread) I am (10.40.41.241:1099) received membershipChanged event:

      <snip>

      2011-12-12 13:35:14,392 INFO  [org.jboss.soa.bpel.clustering.ODEJobClusterListener] (AsynchViewChangeHandler Thread) The available nodes now are [10.40.41.241:1099]

       

      Once the response is recieved *on the orignal JBoss node that is partially shutdown*:

       

      13:36:08,951 DEBUG [org.apache.cxf.transport.http.HTTPConduit] Response Code: 200 Conduit: {http://simple_invokehelloworld}HelloWorldPort.http-conduit

      <snip>

      13:36:09,045 DEBUG [org.apache.ode.bpel.engine.PartnerRoleMessageExchangeImpl] create work event for mex=hqejbhcnphr6tp10hgvakx

      <snip>

      13:36:09,061 DEBUG [org.apache.ode.sql] prepareStatement: insert into BPEL_MESSAGE (DATA, HEADER, MESSAGE_EXCHANGE_ID, TYPE) values (?, ?, ?, ?)

      13:36:09,061 DEBUG [org.apache.ode.sql] executeUpdate, bound (2,null) (3,hqejbhcnphr6tp10hgvakx) (4,{http://simple_invokehelloworld}HelloWorld_sayHelloResponse)

      13:36:09,076 DEBUG [org.apache.ode.sql] close()

      13:36:09,076 DEBUG [org.apache.ode.sql] close()

      13:36:09,076 DEBUG [org.apache.ode.sql] getConnection (tx=2)

      13:36:09,076 DEBUG [org.apache.ode.sql] prepareStatement: update BPEL_MESSAGE_EXCHANGE set CALLEE=?, CHANNEL=?, CORRELATION_ID=?, CORRELATION_KEYS=?, CORRELATION_STATUS=?, CORR_ID=?, CREATE_TIME=?, DIRECTION=?, EPR=?, FAULT=?, FAULT_EXPLANATION=?, OPERATION=?, PARTNER_LINK_ID=?, PARTNER_LINK_MODEL_ID=?, PATTERN=?, PIPED_ID=?, PORT_TYPE=?, PROCESS_ID=?, PROCESS_INSTANCE_ID=?, PROPAGATE_TRANS=?, REQUEST_MESSAGE_ID=?, RESPONSE_MESSAGE_ID=?, STATUS=?, SUBSCRIBER_COUNT=? where MESSAGE_EXCHANGE_ID=?

      13:36:09,076 DEBUG [org.apache.ode.sql] executeBatch, bound (1,null) (2,23) (3,null) (4,null) (5,null) (6,null) (7,2011-12-12 13:35:08.0) (8,P) (10,null) (11,null) (12,sayHello) (13,368) (14,18) (15,REQUEST_RESPONSE) (16,null) (17,{http://simple_invoke/helloworld}HelloWorld) (18,129) (19,113) (20,false) (21,643) (22,644) (23,RESPONSE) (24,0) (25,hqejbhcnphr6tp10hgvakx)

      13:36:09,076 DEBUG [org.apache.ode.sql] close()

      13:36:09,076 DEBUG [org.apache.ode.sql] close()

      13:36:09,076 DEBUG [org.apache.ode.sql] getConnection (tx=2)

      13:36:09,076 DEBUG [org.apache.ode.sql] prepareStatement: insert into ODE_JOB (channel, correlationKeySet, correlatorId, detailsExt, inMem, instanceId, mexId, nodeid, processId,retryCount, scheduled, ts, transacted, type, jobid) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)

      13:36:09,076 DEBUG [org.apache.ode.sql] executeBatch, bound (1,23) (2,@2) (3,null) (4,null) (5,false) (6,113) (7,hqejbhcnphr6tp10hgvakx) (8,10.40.42.88:1099) (9,null) (10,0) (11,true) (12,1323677169061) (13,true) (14,INVOKE_RESPONSE) (15,hqejbhcnphr6tp10hgval1)

      13:36:09,076 DEBUG [org.apache.ode.sql] close()

      13:36:09,076 DEBUG [org.apache.ode.sql] close()

      13:36:09,076 DEBUG [org.apache.ode.bpel.engine.BpelServerImpl] BPEL SERVER STOPPING

      13:36:09,092 DEBUG [org.apache.ode.sql] getConnection (tx=4)

      13:36:09,092 DEBUG [org.apache.ode.sql] Set isolation level to 2

      13:36:09,092 INFO  [org.apache.ode.bpel.engine.BpelServerImpl] BPEL Server Stopped.

      13:36:09,092 DEBUG [org.apache.ode.sql] prepareStatement: delete from ODE_JOB where jobid=? and nodeid=?

      13:36:09,092 INFO  [STDOUT] BPAFLogAdapter shutdown

      13:36:09,092 DEBUG [org.apache.ode.sql] executeUpdate, bound (1,hqejbhcnphr6tp10hgval1) (2,10.40.42.88:1099)

      13:36:09,092 DEBUG [org.apache.ode.sql] close()

      13:36:09,092 DEBUG [org.apache.ode.sql] close()

      13:36:09,092 ERROR [org.apache.ode.scheduler.simple.SimpleScheduler] Error while processing a persisted job: [JobId: hqejbhcnphr6tp10hgval1,nodeId: 10.40.42.88:1099,scheduled: true,transacted: true,ts: 1323677169061,channel: 23,instaceId : 113,type: INVOKE_RESPONSE,retrycount: 0]

      java.lang.NullPointerException

              at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:315)

              at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:251)

              at org.apache.ode.scheduler.simple.SimpleScheduler$RunJobCallable.processInTransactionContext(SimpleScheduler.java:566)

              at org.apache.ode.scheduler.simple.SimpleScheduler$RunJobCallable.call(SimpleScheduler.java:546)

              at org.apache.ode.scheduler.simple.SimpleScheduler$RunJobCallable.call(SimpleScheduler.java:533)

              at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

              at java.util.concurrent.FutureTask.run(FutureTask.java:138)

              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

              at java.lang.Thread.run(Thread.java:619)

      13:36:09,092 ERROR [STDERR] org.apache.ode.bpel.iapi.ContextException: Cannot locate the transaction manager; the server might be shutting down.

      13:36:09,092 ERROR [STDERR]     at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:256)

      13:36:09,092 ERROR [STDERR]     at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:251)

      13:36:09,092 ERROR [STDERR]     at org.apache.ode.scheduler.simple.SimpleScheduler$RunJobCallable.processInTransactionContext(SimpleScheduler.java:611)

      13:36:09,092 ERROR [STDERR]     at org.apache.ode.scheduler.simple.SimpleScheduler$RunJobCallable.call(SimpleScheduler.java:546)

      13:36:09,092 ERROR [STDERR]     at org.apache.ode.scheduler.simple.SimpleScheduler$RunJobCallable.call(SimpleScheduler.java:533)

      13:36:09,092 ERROR [STDERR]     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

      13:36:09,092 ERROR [STDERR]     at java.util.concurrent.FutureTask.run(FutureTask.java:138)

      13:36:09,092 ERROR [STDERR]     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

      13:36:09,092 ERROR [STDERR]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

      13:36:09,092 ERROR [STDERR]     at java.lang.Thread.run(Thread.java:619)

      13:36:09,514 INFO  [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/

      13:36:09,514 INFO  [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/gwt-console-server

      13:36:09,514 INFO  [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/bpel-console

      13:36:09,514 INFO  [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/jbossws

      13:36:09,514 INFO  [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/jmx-console

      13:36:09,514 INFO  [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/Quickstart_bpel_simple_invokeWS

      13:36:09,530 INFO  [org.apache.catalina.core.StandardService] Stopping service jboss.web

      13:36:09,592 INFO  [org.jboss.resource.connectionmanager.ConnectionFactoryBindingService] Unbound ConnectionManager 'jboss.jca:service=DataSourceBinding,name=BPELDB' from JNDI na

      me 'java:BPELDB'

      13:36:09,623 INFO  [org.jboss.jmx.adaptor.snmp.agent.SnmpAgentService] SNMP agent stopped

      13:36:09,733 INFO  [org.jboss.resource.connectionmanager.ConnectionFactoryBindingService] Unbound ConnectionManager 'jboss.jca:service=DataSourceBinding,name=DefaultDS' from JNDI

      name 'java:DefaultDS'

      13:36:09,873 INFO  [com.arjuna.ats.jbossatx.jta.TransactionManagerService] Stopping transaction recovery manager

      13:36:09,873 INFO  [com.arjuna.ats.jbossatx.jta.TransactionManagerService] Destroying TransactionManagerService

       

      And *on ther other cluster node* I see the INVOKE_CHECK for the process is processed, but the INVOKE is not retries nor does process  continue execution from the next step.

       

      2011-12-12 13:37:08,573 DEBUG [org.apache.ode.bpel.engine.BpelEngineImpl] (ODEServer-2) handleJobDetails: InvokeCheck event for mexid hqejbhcnphr6tp10hgvakx

      2011-12-12 13:37:08,573 DEBUG [org.apache.ode.sql] (ODEServer-2) getConnection (tx=2)

      2011-12-12 13:37:08,588 DEBUG [org.apache.ode.sql] (ODEServer-2) prepareStatement: select messageexc0_.MESSAGE_EXCHANGE_ID as MESSAGE1_6_1_, messageexc0_.CALLEE as CALLEE6_1_, messageexc0_.CHANNEL as CHANNEL6_1_, messageexc0_.CORRELATION_ID as CORRELAT4_6_1_, messageexc0_.CORRELATION_KEYS as CORRELAT5_6_1_, messageexc0_.CORRELATION_STATUS as CORRELAT6_6_1_, messageexc0_.CORR_ID as CORR20_6_1_, messageexc0_.CREATE_TIME as CREATE7_6_1_, messageexc0_.DIRECTION as DIRECTION6_1_, messageexc0_.EPR as EPR6_1_, messageexc0_.FAULT as FAULT6_1_, messageexc0_.FAULT_EXPLANATION as FAULT11_6_1_, messageexc0_.OPERATION as OPERATION6_1_, messageexc0_.PARTNER_LINK_ID as PARTNER21_6_1_, messageexc0_.PARTNER_LINK_MODEL_ID as PARTNER13_6_1_, messageexc0_.PATTERN as PATTERN6_1_, messageexc0_.PIPED_ID as PIPED15_6_1_, messageexc0_.PORT_TYPE as PORT16_6_1_, messageexc0_.PROCESS_ID as PROCESS22_6_1_, messageexc0_.PROCESS_INSTANCE_ID as PROCESS23_6_1_, messageexc0_.PROPAGATE_TRANS as PROPAGATE17_6_1_, messageexc0_.REQUEST_MESSAGE_ID as REQUEST24_6_1_, messageexc0_.RESPONSE_MESSAGE_ID as RESPONSE25_6_1_, messageexc0_.STATUS as STATUS6_1_, messageexc0_.SUBSCRIBER_COUNT as SUBSCRIBER19_6_1_, props1_.MEX_ID as MEX2_3_, props1_.ID as ID3_, props1_.ID as ID14_0_, props1_.MEX_ID as MEX2_14_0_, props1_.PROP_KEY as PROP3_14_0_, props1_.PROP_VALUE as PROP4_14_0_ from BPEL_MESSAGE_EXCHANGE messageexc0_ left outer join BPEL_MEX_PROP props1_ on messageexc0_.MESSAGE_EXCHANGE_ID=props1_.MEX_ID where messageexc0_.MESSAGE_EXCHANGE_ID=?

      2011-12-12 13:37:08,588 DEBUG [org.apache.ode.sql] (ODEServer-2) executeQuery, bound (1,hqejbhcnphr6tp10hgvakx)

      2011-12-12 13:37:08,588 DEBUG [org.apache.ode.sql] (ODEServer-2) close()

      2011-12-12 13:37:08,588 DEBUG [org.apache.ode.sql] (ODEServer-2) close()

      2011-12-12 13:37:08,681 DEBUG [org.apache.ode.sql] (ODEServer-2) getConnection (tx=2)

      2011-12-12 13:37:08,681 DEBUG [org.apache.ode.sql] (ODEServer-2) prepareStatement: select processdao0_.ID as ID9_0_, processdao0_.GUID as GUID9_0_, processdao0_.PROCESS_ID as PROCESS3_9_0_, processdao0_.PROCESS_TYPE as PROCESS4_9_0_, processdao0_.VERSION as VERSION9_0_ from BPEL_PROCESS processdao0_ where processdao0_.ID=?

      2011-12-12 13:37:08,681 DEBUG [org.apache.ode.sql] (ODEServer-2) executeQuery, bound (1,129)

      2011-12-12 13:37:08,681 DEBUG [org.apache.ode.sql] (ODEServer-2) close()

      2011-12-12 13:37:08,681 DEBUG [org.apache.ode.sql] (ODEServer-2) close()

       

      On the other hand if I kill the JBoss process from the windows task manager, instead of attempting a clean shutdown, the other cluster node is able to resume the process normally. The other cluster node takes over the scheduled INVOKE_CHECK activity. Once the INVOKE_CHECK is processed the INVOKE is reported to have failed, it is retried and the process instance continues executing on the surviving cluster node.

       

      Thanks

      Anuj