1 2 Previous Next 25 Replies Latest reply: Mar 26, 2009 12:37 PM by nikita tovstoles RSS

Lock striping broken for Second Level Cache use case

Brian Stansberry Master

The unit test dukehoops has posted on http://www.jboss.org/index.html?module=bb&op=viewtopic&t=151524&start=20 is showing the fundamental problem with using lock striping. Eventually you are going to get failures due to deadlocks as different txs acquire different stripes. But with some of the stuff 2LC does, particularly if query caching is enabled and a shared cache is used, the probability is quite high.

Assume you have a tx (all Fqns below are for example only):

writes to /ENTITIES/Contacts/1 locks a stripe
writes to /ENTITIES/Contacts/2 locks a stripe

beforeCompletion() by Hibernate and JBC

afterCompletion() by Hibernate tries to update the timestamps cache:
non-tx write to /TIMESTAMPS/Contacts

For sure within a relatively small # of txs, some Fqn /ENTITIES/Contacts/x is going to map to the stripe needed by /TIMESTAMPS/Contacts. Therafter, no tx involving that Fqn is every going to succeed.

The test dukehoops submitted runs 200 tx's involving about 205 fqns (plus structure). It fails with the default 500 concurrencyLevel 100% of the time.

If I increase concurrencyLevel to 5000, it fails with about 7500 txs, maybe less.

There needs to be some sort of lock-per-fqn scheme here.

  • 1. Re: Lock striping broken for Second Level Cache use case
    Manik Surtani Master

    This is always going to be a problem when you have shared locks, and increasing concurrency level will only mitigate the problem, not resolve it altogether.

    There is, unfortunately, a fundamental problem with a lock-per-fqn scheme, in that the lock needs to exist somewhere, even on a non-existent Fqn (for creates, or removes on nodes that don't exist).

    We had this problem in PL since locks (per node) were stored in the node, and it involved all sorts of hacks to get it to work, severely degrading performance and still exposing a few race conditions. Not pleasant stuff.

    Let me think about alternate approaches here, and what can be done.

  • 2. Re: Lock striping broken for Second Level Cache use case
    Manik Surtani Master

    One approach that may work is to use a ConcurrentMap as a lock container rather than an array of fixed size. Instead of mapping locks to a modularised hashcode of a given Fqn as we do now, we could use the Fqn itself as a key to a lock, and use concurrent methods on the map to locate the lock, creating new ones if absent, etc. It will perform worse (both in terms of memory and lock locating, since unnecessary locks may be created and thrown away) but it would prevent the deadlock issue mentioned above.

    Also, it would be simple enough to implement as an alternate LockContainer impl and plug in. Let me give this a bit more thought.

  • 4. Re: Lock striping broken for Second Level Cache use case
    Manik Surtani Master

    Ok, the solution I proposed above is broken. The map would grow infinitely and be a memory leak since there is no way to remove the lock from the concurrent map, unless there is some further synchronization, which will make things even less performant. More to think about.

  • 5. Re: Lock striping broken for Second Level Cache use case
    Manik Surtani Master

    Ok, this can be done, but it won't be altogether straightforward:

     getLock(Fqn) {
     1. create new lock.
     2. putIfAbsent on the concurrent map that holds locks, swapping to the actual lock in the map if needed
     3. acquire lock
     4. after acquisition, check again that the lock is the current lock in the concurrent map, if not, release lock and retry this method.
     }
    
     releaseLock(Fqn) {
     1. get lock from map
     2. if not null, and if owner, attempt to release lock
     3. remove lock from collection
     }
    


    While this will work, this will cause waiters to spin on getLock().4 until it has got the same lock it expects. In releaseLock().3, this will cause threads that release locks to clean up, but will cause additional spinning on getLock().4. Basically, in getLock(), a lock is only valid if you created it in getLock().1.

  • 6. Re: Lock striping broken for Second Level Cache use case
    Manik Surtani Master

    Actually, getLock().4 should read:

    4. after acquisition, test that lock acquired is same as lock created in 1, otherwise unluck, discard lock and try again.

  • 7. Re: Lock striping broken for Second Level Cache use case
    Manik Surtani Master

    Ok, I have checked in something that may work, have a look at the JIRA for details.

    This does not impact existing code in any way since it is controlled by a config switch which defaults to using lock striping. If anyone is interested in building trunk and giving this a try beyond my unit tests, feedback would be appreciated.

  • 8. Re: Lock striping broken for Second Level Cache use case
    Brian Stansberry Master

    I tried a locally built snapshot of trunk with dukehoops' test and the problem goes away.

    Can you use a putIfAbsent instead of a get to try to preserve the original lock? Idea being that if threads are waiting on the original lock you try to keep using it rather than having them queue up on a new lock.

    ### Eclipse Workspace Patch 1.0
    #P jbosscache-core
    Index: src/main/java/org/jboss/cache/util/concurrent/locks/PerElementLockContainer.java
    ===================================================================
    --- src/main/java/org/jboss/cache/util/concurrent/locks/PerElementLockContainer.java (revision 7915)
    +++ src/main/java/org/jboss/cache/util/concurrent/locks/PerElementLockContainer.java (working copy)
    @@ -56,7 +56,8 @@
     Lock l = getLock(object);
     l.lock();
     // now check that the lock is still valid...
    - if (l != locks.get(object))
    + Lock official = locks.putIfAbsent(object, l);
    + if (official != null && l != official)
     {
     // we acquired the wrong lock!
     l.unlock();
    


    (The above would be applied to the overloaded acquireLock(E, long. TimeUnit) as well.)

    There is still a possibility that after the lock owner removes the lock from the map in releaseLock() another thread could sneak in and create a new lock. But that's not incorrect, just not fair.

  • 9. Re: Lock striping broken for Second Level Cache use case
    Jason Greene Master

     

    "manik.surtani@jboss.com" wrote:
    Actually, getLock().4 should read:

    4. after acquisition, test that lock acquired is same as lock created in 1, otherwise unluck, discard lock and try again.


    Why is this step needed? If everything is using putIfAbsent you shouldn't have a problem.

  • 10. Re: Lock striping broken for Second Level Cache use case
    Jason Greene Master

     

    "jason.greene@jboss.com" wrote:
    "manik.surtani@jboss.com" wrote:
    Actually, getLock().4 should read:

    4. after acquisition, test that lock acquired is same as lock created in 1, otherwise unluck, discard lock and try again.


    Why is this step needed? If everything is using putIfAbsent you shouldn't have a problem.


    Ah I see this is to handle the case of a blocked/queued thread acquiring a lock. Hmm this kind of defeats the performance benefit of aqs lock queueing

  • 11. Re: Lock striping broken for Second Level Cache use case
    Jason Greene Master

     

    "bstansberry@jboss.com" wrote:

    Can you use a putIfAbsent instead of a get to try to preserve the original lock? Idea being that if threads are waiting on the original lock you try to keep using it rather than having them queue up on a new lock.


    Yes this is better, although still sucks when you have contended access on a Lock. I think it might make more sense to keep locks around and evict them later, either when the key is removed/evicted, or some other special lock eviction algorithm that uses timestamps.

  • 12. Re: Lock striping broken for Second Level Cache use case
    Brian Stansberry Master

    PerElementLockContainer.locks is using the default CHM c'tor which means 16 segments. That's low since this will be a very write-heavy map.

  • 13. Re: Lock striping broken for Second Level Cache use case
    Jason Greene Master

     

    "jason.greene@jboss.com" wrote:
    "bstansberry@jboss.com" wrote:

    Can you use a putIfAbsent instead of a get to try to preserve the original lock? Idea being that if threads are waiting on the original lock you try to keep using it rather than having them queue up on a new lock.


    Yes this is better, although still sucks when you have contended access on a Lock. I think it might make more sense to keep locks around and evict them later, either when the key is removed/evicted, or some other special lock eviction algorithm that uses timestamps.


    IMO A lazy LRU lock eviction alg that is executed on lock aquire would work very well for this.

  • 14. Re: Lock striping broken for Second Level Cache use case
    Manik Surtani Master

     

    "jason.greene@jboss.com" wrote:

    IMO A lazy LRU lock eviction alg that is executed on lock aquire would work very well for this.


    Yeah, but another thread? :/

    I suppose any existing eviction thread could be reused, if one exists, to mitigate this...


1 2 Previous Next