I created a test case to simulate the issue we have in our production environment.
I have a three-node cluster in distribution mode with ten threads. Each thread runs a loop doing the following steps:
1) pick a random number between 0 and 3
2) lock the number in one of the three nodes
3) increment it and write the new value back
4) sometimes, within the same transaction, the thread also tries to lock the key in another cache and change its value there.
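For readers without the attachment, the loop above can be sketched roughly as follows. This is not the attached test and it does not use the Infinispan API: the two "caches" are plain in-memory stand-ins guarded by `ReentrantLock`, and the class name `LockLoopSketch`, the `FakeCache` helper, the four keys 0 to 3, and the 25% rate for step 4 are all assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the test loop; plain JDK only, no Infinispan calls.
public class LockLoopSketch {
    static final int KEYS = 4; // assumption: "between 0 and 3" means keys 0, 1, 2, 3

    // Stand-in for one cache: a counter per key plus a lock per key.
    static final class FakeCache {
        final int[] values = new int[KEYS];
        final ReentrantLock[] locks = new ReentrantLock[KEYS];
        FakeCache() {
            for (int i = 0; i < KEYS; i++) locks[i] = new ReentrantLock();
        }
    }

    private final FakeCache first = new FakeCache();
    private final FakeCache second = new FakeCache();
    private final AtomicInteger committed = new AtomicInteger();
    private final AtomicInteger timedOut = new AtomicInteger();

    private void oneTransaction(long timeoutMs) throws InterruptedException {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        int key = rnd.nextInt(KEYS);               // step 1: pick a key
        boolean alsoSecond = rnd.nextInt(4) == 0;  // step 4 happens "sometimes" (assumed 25%)
        // Step 2: lock the key, giving up after the timeout.
        if (!first.locks[key].tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            timedOut.incrementAndGet();
            return;
        }
        try {
            if (alsoSecond) {
                // Step 4: same key, other cache, same "transaction".
                if (!second.locks[key].tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                    timedOut.incrementAndGet();
                    return; // nothing has been written yet, so nothing to undo
                }
                try {
                    second.values[key]++;
                } finally {
                    second.locks[key].unlock();
                }
            }
            first.values[key]++;                   // step 3: increment and write back
            committed.incrementAndGet();
        } finally {
            first.locks[key].unlock();             // always release, even on timeout
        }
    }

    public void run(int threads, int iterations, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    for (int i = 0; i < iterations; i++) oneTransaction(timeoutMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public int committedCount() { return committed.get(); }
    public int timedOutCount() { return timedOut.get(); }

    public int sumOfFirstCache() {
        int sum = 0;
        for (int v : first.values) sum += v;
        return sum;
    }
}
```

Because every transaction in this sketch takes the first cache's lock before the second's, and always for the same key, the lock order is consistent and a timeout here can only come from contention, which matches the expectation stated in the question.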
There are two problems I have with this test. First, the deadlock detector doesn't seem to work. The exception I get is always a timeout exception complaining that the lock cannot be acquired. In fact, each transaction only ever locks the same key, so there shouldn't be any issue with locking different resources in different orders. In versions before 4.2, the deadlock detector sometimes worked and sometimes didn't. Now it never works.
Secondly, it seems that during this process a lock gets held by some transaction and is then never released. In the error log I can see that during the test normally only one or two keys fail to acquire the lock. In the shutdown method I try to lock all the resources again. At that point only one thread is accessing the resources, but I still see the lock failure.
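The shutdown check described above amounts to probing every key from a single thread and recording which ones still cannot be locked. A minimal sketch of that idea, again using plain `ReentrantLock` objects rather than Infinispan locks (the class `ShutdownLockProbe` and both method names are hypothetical, and the "leak" is simulated by a thread that locks a key and exits without unlocking):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper; illustrates the single-threaded probe, not Infinispan internals.
public class ShutdownLockProbe {

    // Tries each lock in turn and returns the keys that could not be acquired.
    public static List<Integer> findStuckKeys(ReentrantLock[] locks, long timeoutMs) {
        List<Integer> stuck = new ArrayList<>();
        for (int key = 0; key < locks.length; key++) {
            try {
                if (locks[key].tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                    locks[key].unlock(); // free again, the key was not stuck
                } else {
                    stuck.add(key);      // still held although no worker is running
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return stuck;            // stop probing if the caller was interrupted
            }
        }
        return stuck;
    }

    // Builds a lock array in which one key was locked by a thread that has already finished.
    public static ReentrantLock[] locksWithOneLeak(int keys, int leakedKey) {
        ReentrantLock[] locks = new ReentrantLock[keys];
        for (int i = 0; i < keys; i++) locks[i] = new ReentrantLock();
        Thread leaker = new Thread(() -> locks[leakedKey].lock()); // never unlocks
        leaker.start();
        try {
            leaker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return locks;
    }
}
```

With only one thread probing, any key reported here cannot be explained by contention, which is why a failure at shutdown points to a lock that was never released.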
I hope somebody can help. The test case is attached; it is a bit complicated. I'm using 4.2.0.ALPHA5 for this test.
Thanks in advance.
Thanks for the excellent unit test.
Deadlock detection only runs within a single cache. I've created ISPN-767 to address this and also updated the documentation. I'm not sure this enhancement will be part of 4.2, though. The workaround would be to put everything in a single cache.
Looking into the locking issue right now.