RiftSaw clustering design

1. Overview

 

riftsaw-cluster.jpg

 

2. Configuration

 

1) Configure the service address in the BPEL WSDL file to use the load-balancer URL.

 

3. Tasks

1) Deployment mechanism

 

    1.1: Use the JBoss 'farm' folder for deployment. Database modifications should be made only once, by the node on which the deployment took place, while the deployment unit JAR is distributed to all nodes, which then all operate on the new process definition from that moment on. Deployment should work correctly both for a brand-new process and for a new version of an existing process (see http://community.jboss.org/wiki/RiftSawDeploymentMechanism).

    1.2: Manually copy the artifact from one node to another. This approach has issues. (Not supported)

 

2) Cron scheduler.

   The CronScheduler service mainly serves two features: RuntimeDataCleanup (http://ode.apache.org/instance-data-cleanup.html) and SystemSchedulesConfig (I am not sure whether we use this feature; I found that it is called from DeploymentPoller.java). The CronScheduler uses java.util.Timer to register each TimerTask, which means that if a node crashes, its TimerTask is lost as well.

     For RuntimeDataCleanup: because we use the JPA-based DAO implementation, which does not implement FilteredInstanceDeletable, this cron job actually does nothing. The RuntimeDataCleanup feature is still available, however: once a process finishes (either completeOK or completeWithFault), we check whether the user specified a DataCleanUp configuration in deploy.xml and, if so, clean up the data accordingly.
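     Because a java.util.Timer keeps its tasks only in memory, one possible way to survive a node crash is to register the task on every node and let it do work only on the current master. The following is a minimal sketch of that idea, not the existing CronScheduler code; the class MasterOnlyTask and the isMaster check are hypothetical names introduced here for illustration.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical wrapper: runs the job only while this node reports itself as master. */
class MasterOnlyTask extends TimerTask {
    private final BooleanSupplier isMaster; // assumption: supplied by the heartbeat module
    private final Runnable job;             // e.g. a cleanup job
    private final AtomicInteger runs = new AtomicInteger();

    MasterOnlyTask(BooleanSupplier isMaster, Runnable job) {
        this.isMaster = isMaster;
        this.job = job;
    }

    @Override
    public void run() {
        // Non-master nodes keep the task registered but skip the work,
        // so a newly elected master picks it up on the next tick.
        if (!isMaster.getAsBoolean()) {
            return;
        }
        job.run();
        runs.incrementAndGet();
    }

    /** Number of times the job actually ran on this node. */
    int runCount() {
        return runs.get();
    }

    /** Registers the task on a daemon Timer with a fixed period. */
    static Timer schedule(MasterOnlyTask task, long periodMillis) {
        Timer timer = new Timer("cron-scheduler-sketch", true);
        timer.scheduleAtFixedRate(task, periodMillis, periodMillis);
        return timer;
    }
}
```

     With this shape, no task state needs to be transferred when the master changes: every node already holds the TimerTask, and only the master check decides who runs it.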

 

3) Heart beat module for detecting nodes.

    It should keep a record of how many active nodes there are and where they are, and be able to identify the master node, which will be used for the CronScheduler service. This information will also be needed by bpel-console.

    1.1 In the JBoss AS 'all' configuration, the HAPartition service tells us when a node is added or dropped; we can leverage this to implement the API above.
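    The node record described above could be sketched roughly as follows. This is a self-contained illustration that does not call the HAPartition API; ClusterNodeRegistry and its methods are hypothetical names, and it assumes a membership callback hands over the full list of current members. It also assumes, purely for illustration, that the longest-lived member acts as master.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical registry of active nodes, fed by cluster membership changes. */
class ClusterNodeRegistry {
    // Insertion-ordered: the first entry is the longest-lived known member.
    private final Map<String, Long> firstSeenAt = new LinkedHashMap<>();

    /** Called with the full current membership whenever a node joins or leaves. */
    synchronized void membershipChanged(List<String> allMembers, long nowMillis) {
        // Drop nodes that are no longer part of the cluster.
        firstSeenAt.keySet().retainAll(allMembers);
        // Record the first time each new node was seen.
        for (String node : allMembers) {
            firstSeenAt.putIfAbsent(node, nowMillis);
        }
    }

    /** Active nodes, oldest first. */
    synchronized List<String> activeNodes() {
        return new ArrayList<>(firstSeenAt.keySet());
    }

    /** Assumption: the longest-lived member is treated as master. */
    synchronized Optional<String> masterNode() {
        return firstSeenAt.keySet().stream().findFirst();
    }

    /** When the node was first seen, if it is currently active. */
    synchronized Optional<Long> firstSeen(String nodeId) {
        return Optional.ofNullable(firstSeenAt.get(nodeId));
    }
}
```

    The first-seen timestamp is kept so that the console could show when each node joined, in addition to the node count.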

 

4) bpel-console cluster feature.

   Should we have a tab that displays how many active nodes are in the clustered environment?

 

4. TestCases (Requirements)

 

1) We will use the simple_correlation example as our test. First, we run 'ant sendhello' against node1. Then we run 'ant goodbye' with the same correlation value against node2. We expect this to succeed and the process to end properly.

 

2) We will set up runtime instance cleanup in the process configuration in the clustered environment, and need to make sure the data gets cleaned up properly (one and only one node runs this task).

 

3) In the bpel-console, users can see how many nodes are active (and when they were started). The console should provide a 'Refresh' button that users can click to show the latest state of the cluster.

 

 

5. Limitation

I believe we support clustering only for HTTP-based invocation, because we use a load balancer as the front end. It won't work well with the JCA adapter.


6. Questions/Answers

1. Comments from Kurt:

 

" Leaves the question what to do if a node goes down hard and cannot unregister the EPRs. Note that

- when the node comes back up it will register the same EPR (and overwrite the stale one)

- in ESB land the service invoker is able to fail over to another EPR (and there is a setting for removing a dead EPR), we may be able to do the same thing.

- I'm not sure tying EPR undeployment to jboss-clustering makes sense. "

 

 

In the Simple-Scheduler module, we have a thread task called 'CheckStaleNode'; I think we can use it to remove the EPR where appropriate.

The information this task needs is the 'nodeId'; I am not sure whether we can find the EPR from this nodeId.
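As a rough illustration of the idea, the sketch below removes the EPRs of nodes whose heartbeat has gone stale. It is not the existing 'CheckStaleNode' task: StaleNodeEprCleaner and all of its methods are hypothetical, and it simply assumes that EPRs are indexed by nodeId, which is exactly the open question above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical cleaner: drops the EPRs of nodes that stopped sending heartbeats. */
class StaleNodeEprCleaner {
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    // Assumption: EPRs can be looked up by the nodeId that registered them.
    private final Map<String, Set<String>> eprsByNode = new HashMap<>();
    private final long staleAfterMillis;

    StaleNodeEprCleaner(long staleAfterMillis) {
        this.staleAfterMillis = staleAfterMillis;
    }

    synchronized void heartbeat(String nodeId, long nowMillis) {
        lastHeartbeat.put(nodeId, nowMillis);
    }

    synchronized void registerEpr(String nodeId, String epr) {
        eprsByNode.computeIfAbsent(nodeId, k -> new HashSet<>()).add(epr);
    }

    /** Removes every node whose last heartbeat is too old; returns the EPRs removed. */
    synchronized List<String> removeStale(long nowMillis) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastHeartbeat.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > staleAfterMillis) {
                Set<String> eprs = eprsByNode.remove(entry.getKey());
                if (eprs != null) {
                    removed.addAll(eprs);
                }
                it.remove(); // forget the stale node itself
            }
        }
        return removed;
    }

    /** EPRs currently registered for a node (empty if none). */
    synchronized Set<String> eprsOf(String nodeId) {
        return new HashSet<>(eprsByNode.getOrDefault(nodeId, new HashSet<>()));
    }
}
```

If the nodeId turns out not to be enough to locate the EPRs, the same loop would still work as long as each node records its nodeId alongside every EPR at registration time.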