Problems in CDL-M1 sample app
bernd.koecke Dec 12, 2008 6:37 AM
Hello,
First, it's great to see an M1 release :).
I played around a bit with the purchasing sample app and ran into a blocked choreography. I think two things caused it. My environment is:
- JBossAS 4.2.3 standalone and on a two node cluster
- JBossMessaging 1.4.0.SP3
- JBossESB 4.4
- CDL-M1
- MySQL 5.0
There seems to be a race condition in the join of the ParallelAction. Normally I see two lines in the server.log which together decrement the pathCount to "0". But sometimes both lines only count down to "1". The events/messages are consumed anyway, and the result is a dead but still active session in the database: the process doesn't continue, and the client gets a delivery exception once the timeout is reached.
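One plausible explanation for two log lines that both show "1" is a lost update: both branches read the pathCount before either one writes it back. The following is only a minimal sketch of that pattern, not the CDL implementation. The class and method names are hypothetical, the interleaving is simulated in a single thread so the result is deterministic, and the in-memory AtomicInteger merely stands in for whatever atomic or locked update the real (database-backed) join would need.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; these names do not come from CDL or JBossESB.
class JoinCounterSketch {

    // Stand-in for the join state of a parallel activity with two paths.
    static int pathCount;

    // Unsafe decrement: the value was read earlier, so it may already be stale.
    static int staleDecrement(int valueReadEarlier) {
        pathCount = valueReadEarlier - 1;
        return pathCount;
    }

    // Simulates the bad interleaving: both branches read 2 before either writes.
    static int[] lostUpdate() {
        pathCount = 2;
        int readByBranchA = pathCount;
        int readByBranchB = pathCount;
        int seenByA = staleDecrement(readByBranchA);
        int seenByB = staleDecrement(readByBranchB);
        return new int[] {seenByA, seenByB};
    }

    // Read-modify-write as one atomic step: the second branch reaches 0.
    static int[] atomicUpdate() {
        AtomicInteger count = new AtomicInteger(2);
        int seenByA = count.decrementAndGet();
        int seenByB = count.decrementAndGet();
        return new int[] {seenByA, seenByB};
    }

    public static void main(String[] args) {
        System.out.println("lost update:   " + Arrays.toString(lostUpdate()));
        System.out.println("atomic update: " + Arrays.toString(atomicUpdate()));
    }
}
```

In the lost-update case neither branch ever observes "0", so the join never fires even though both messages were consumed, which matches the symptom in the log.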
After this, no further calls to the process succeed. The client always sets the id to 'id="5"', and every call creates a new session in the database. The creditAgency is called, but when the result is returned the purchaseApp searches for an active session with 'id="5"'. The first match is the old dead one, so that one is selected. The process then checks whether the session contains the right category/name combination. The dead session has the wrong combination, so the process stops with an exception. You have to delete all data of the dead session from the database before client calls succeed again.
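The selection problem can be illustrated with a small in-memory model. This is a rough sketch under assumptions: the Session class, its fields, and the category/name values are hypothetical and do not reflect the actual CDL schema or lookup code. It only contrasts "first active session with this id" against a lookup that also requires the expected category/name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the session lookup; not the CDL persistence code.
class SessionLookupSketch {

    static final class Session {
        final long rowId;
        final String id;
        final boolean active;
        final String category;
        final String name;

        Session(long rowId, String id, boolean active, String category, String name) {
            this.rowId = rowId;
            this.id = id;
            this.active = active;
            this.category = category;
            this.name = name;
        }
    }

    // Lookup as described in the post: the first active session with the id wins.
    static Session firstActiveById(List<Session> sessions, String id) {
        for (Session s : sessions) {
            if (s.active && s.id.equals(id)) {
                return s;
            }
        }
        return null;
    }

    // Stricter lookup: the session must also hold the expected category/name.
    static Session activeByIdAndState(List<Session> sessions, String id,
                                      String category, String name) {
        for (Session s : sessions) {
            if (s.active && s.id.equals(id)
                    && s.category.equals(category) && s.name.equals(name)) {
                return s;
            }
        }
        return null;
    }

    // Row 1 is the dead session left behind by the failed join; row 2 is the new call.
    static List<Session> sampleSessions() {
        List<Session> sessions = new ArrayList<>();
        sessions.add(new Session(1, "5", true, "oldCategory", "oldName"));
        sessions.add(new Session(2, "5", true, "expectedCategory", "expectedName"));
        return sessions;
    }

    public static void main(String[] args) {
        List<Session> sessions = sampleSessions();
        System.out.println(firstActiveById(sessions, "5").rowId);
        System.out.println(activeByIdAndState(sessions, "5",
                "expectedCategory", "expectedName").rowId);
    }
}
```

With the dead row still marked active, the id-only lookup returns row 1 and the later category/name check fails, while the stricter lookup returns row 2 at the cost of comparing more columns per candidate.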
Searching for a perfect match instead could cause a kind of table scan, so that might not be a good idea. As I remember, the id string is supposed to be a unique identifier, but in the sample app it is always the same string. So in production the problem should not arise very often. But it can happen: consider a process which handles customer data and uses the customerNo as id. The process fails, and later it is called again for the same customer with the same customerNo as id.
The race condition may be caused by my MySQL datasource configuration, but I also got dead sessions from other failures in my processes.
In summary, the blocked process is caused by a dead active session with a non-unique id, in combination with the algorithm that selects sessions.
Has anybody seen similar problems?
Regards,
Bernd