After scouring the forums and docs for jBPM 5 over the last week, I have drafted the attached 'reference architecture' for a scalable jBPM 5 deployment. I would love to get feedback (aka have it ripped apart).
Here are the key pieces:
1. A shared KnowledgeBase which acts as a repository of processes across the whole deployment (master-slave replicated, with writes going to the master and reads served from the slaves)
2. A fixed number of KnowledgeSessions spread across the databases in a DB farm (DB1, DB2)
3. A table contains the mapping between these sessions, the database that they are stored in and the node that is assigned to these sessions. Management of the data in this table is an admin config operation (either automatic or manual).
4. A number of worker nodes which load the KnowledgeSessions that belong to them (as per the table above)
5. Another table that holds the mapping of process instances to sessions. This table is updated by the worker nodes when a new process instance is created.
6. A router or client proxy that routes process instance management requests as per the following rules:
- For process instance creation requests, round robin to any node
- For requests pertaining to a specific process instance, consult the tables in (3) and (5) and route to the appropriate node
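The two routing rules above could be sketched roughly as follows. This is only a minimal in-memory illustration: `ProcessRouter`, `routeCreate` and `routeExisting` are hypothetical names, not part of jBPM, the two maps stand in for the tables in (3) and (5), and the database column of table (3) is left out for brevity.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical router sketch; none of these names come from jBPM.
class ProcessRouter {
    private final List<String> nodes;                   // worker node ids
    private final Map<Integer, String> sessionToNode;   // stands in for table (3): session id -> node
    private final Map<Long, Integer> processToSession;  // stands in for table (5): process instance id -> session id
    private final AtomicInteger next = new AtomicInteger();

    ProcessRouter(List<String> nodes,
                  Map<Integer, String> sessionToNode,
                  Map<Long, Integer> processToSession) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = nodes;
        this.sessionToNode = sessionToNode;
        this.processToSession = processToSession;
    }

    // Creation requests: round robin over all worker nodes.
    String routeCreate() {
        int i = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(i);
    }

    // Requests for an existing instance: process instance -> session -> node.
    String routeExisting(long processInstanceId) {
        Integer sessionId = processToSession.get(processInstanceId);
        if (sessionId == null) {
            throw new IllegalArgumentException("unknown process instance " + processInstanceId);
        }
        String node = sessionToNode.get(sessionId);
        if (node == null) {
            throw new IllegalStateException("session " + sessionId + " has no node assigned");
        }
        return node;
    }
}
```

In a real deployment the two maps would be backed by the shared tables rather than held in memory, so that every router instance sees the same assignments.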
Now how to scale out:
- If the bottleneck is the CPU/memory of the worker nodes then add more nodes and rebalance the session distribution in table (3)
- If the bottleneck is the DB capacity or disk then add a database, migrate processes/session data to the new database and update table (3)
Of course the resharding above is crude and manual and could probably be more sophisticated, but I would love to hear thoughts/comments on the basic idea.
Also, practically speaking, is it possible at all to migrate a session and its processes to a different DB?
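To make the "add more nodes and rebalance" step concrete, here is a deliberately crude sketch of recomputing the session-to-node column of table (3). `SessionRebalancer` and `rebalance` are hypothetical names, and the plain round-robin assignment is an assumption for illustration; it may move more sessions than strictly necessary, which a smarter strategy would avoid.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; not part of jBPM.
class SessionRebalancer {
    // Returns a fresh session id -> node assignment that spreads the sessions
    // evenly over the given nodes, round robin in ascending session id order.
    static Map<Integer, String> rebalance(List<Integer> sessionIds, List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        List<Integer> sorted = new ArrayList<>(sessionIds);
        Collections.sort(sorted);
        Map<Integer, String> assignment = new TreeMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            assignment.put(sorted.get(i), nodes.get(i % nodes.size()));
        }
        return assignment;
    }
}
```

The result would then be written back to table (3), after which each worker node loads only the sessions now assigned to it.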
Attachment: jbpm_scale_out.png (178.3 K)
This looks like a decent session management strategy for large deployments (where you need to be able to manage a lot more than just a few thousand active process instances).
It is indeed possible to migrate sessions and process instances to a different DB. Note that a lot more is actually possible that your strategy is not using. That does not necessarily mean there is an issue with it, of course; I am just mentioning it because it could give you more options to scale.
You can move a session from one node to another. So if node1 instantiated a session but is busy at some later point, you can restore the same session on node2 and continue executing the process instance there. This gives you more flexibility than your rule of always routing requests for a specific process instance to the node recorded in the lookup tables. It is probably a trade-off between CPU and DB, though, as moving to another node probably means the session needs to be restored from the DB, while it might still be in cache on the original node. We do however recommend not running the same session in parallel on multiple nodes, as that would usually lead to conflicts and decreased performance. So binding a session to a node isn't necessarily a bad strategy; it just means you have some flexibility to reallocate if necessary.
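As a rough illustration of "one session, one node at a time, but reassignable", the ownership bookkeeping could look like the sketch below. `SessionOwnership`, `claim` and `transfer` are hypothetical names, and the in-memory map is an assumption standing in for the shared session-to-node table. The actual reload of the session on the new node is not shown; in jBPM 5 that would, I believe, go through `JPAKnowledgeService.loadStatefulKnowledgeSession` with the session id.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical ownership registry; not part of jBPM.
class SessionOwnership {
    private final Map<Integer, String> owner = new ConcurrentHashMap<>();

    // Claims a session for a node. Returns false if a different node already owns it,
    // so the same session is never active on two nodes at once.
    boolean claim(int sessionId, String node) {
        String previous = owner.putIfAbsent(sessionId, node);
        return previous == null || previous.equals(node);
    }

    // Hands a session over to another node. Succeeds only if 'from' is the
    // current owner; the check and the update happen atomically.
    boolean transfer(int sessionId, String from, String to) {
        return owner.replace(sessionId, from, to);
    }

    // Returns the current owner, or null if the session is unclaimed.
    String ownerOf(int sessionId) {
        return owner.get(sessionId);
    }
}
```

A node would call `transfer` first and only restore the session from the DB once the call has returned true, while the previous owner disposes of its in-memory copy.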
You can even move a process instance to another session. Note that you should only consider this if your process instance is not linked to its session state (otherwise you should always keep them together). Session state is used, for example, when you are using timers or business rules anywhere in your process.
We're working on improved session management for cases like this (instantiating sessions remotely on nodes based on a specific strategy and distributing requests). If you believe this is something we could collaborate on, let me know!