7 Replies Latest reply on Sep 4, 2015 1:32 PM by eric.wittmann

    Help with ISPN + apiman

    eric.wittmann

      Hi there!

       

      I'm seeking some help for using ISPN with:

       

        http://www.apiman.io/latest/

       

      We're using ISPN caches in a few places in the default configuration of apiman, specifically for:

       

      * Shared state for rate limiting

      * Storage of configuration data

      * Simple TTL cache

       

      We've seen some problems with a couple of these (difficult to reproduce) and I was hoping for some insight.

       

      As a very simple example, here is the ISPN implementation of our Rate Limiting interface:

       

      https://github.com/apiman/apiman/blob/master/gateway/engine/ispn/src/main/java/io/apiman/gateway/engine/ispn/InfinispanRateLimiterComponent.java

       

      The code basically stores a bucket object into the ISPN cache and then uses it as a resettable counter.

       

      What we're seeing *sometimes* is that the retrieval of the bucket returns a stale instance (which seemingly violates the Map contract?).

       

      Here is some sample output when running this code *sequentially* but quickly:

       

      https://gist.githubusercontent.com/jcechace/0dd444c664c4033bd1fb/raw/35770de283fedfb5818ffadc7d47046a4795ef15/rateLimitLog.log

       

      This is running in WF 8.2.0.Final on a single node (not clustered).

       

      Here is the cache configuration in standalone.xml:

       

      https://github.com/apiman/apiman/blob/master/distro/wildfly8/src/main/resources/overlay/standalone/configuration/standalone-apiman.xml#L244-L266

       

      Thanks!

        • 1. Re: Help with ISPN + apiman
          sannegrinovero

          Hi,

          I'll let the Infinispan core team answer your main question, as I don't know why it would return a stale value, but when looking at the InfinispanRateLimiterComponent I noticed a design issue: the javadoc seems to imply that this is meant to work in a multi-node cluster configuration too, but you're doing local-only synchronization.

          Also, you seem to be storing a mutable object in the Cache: keep in mind that the instance returned by Cache#get will be the same instance across multiple threads potentially reading from the cache, and another thread might need to transmit a copy of the instance "as is" to another node before your current thread gets to the explicit Cache#put operation. You can't rely on your synchronization to protect you from such a race condition, as there are other management threads internal to Infinispan which work on the entries, for example moving entries to different stores or nodes to balance resource usage.

           

          You probably want to rethink this to use Cache#putIfAbsent initially rather than using the mutex, and refactor to use immutable entries rather than mutating operations such as bucket.resetIfNecessary.
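A minimal sketch of what that refactoring could look like, not the actual apiman code. It is written against java.util.concurrent.ConcurrentMap (which Infinispan's Cache implements), so it runs with a plain ConcurrentHashMap. The names ImmutableBucket, RateLimiterSketch and accept, and the fixed-window reset rule, are assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical immutable bucket: every change produces a new instance.
final class ImmutableBucket {
    final long count;       // requests seen in the current period
    final long windowStart; // start of the current period, in milliseconds

    ImmutableBucket(long count, long windowStart) {
        this.count = count;
        this.windowStart = windowStart;
    }

    // Returns the bucket to store after one more request at time 'now'.
    ImmutableBucket next(long now, long periodMillis) {
        if (now - windowStart >= periodMillis) {
            return new ImmutableBucket(1, now); // period elapsed: start fresh
        }
        return new ImmutableBucket(count + 1, windowStart);
    }

    // replace(key, old, new) compares values with equals(), so compare by value.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ImmutableBucket)) {
            return false;
        }
        ImmutableBucket b = (ImmutableBucket) o;
        return count == b.count && windowStart == b.windowStart;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(count) * 31 + Long.hashCode(windowStart);
    }
}

final class RateLimiterSketch {
    // Returns true if the request fits within 'limit' for the current period.
    static boolean accept(ConcurrentMap<String, ImmutableBucket> cache, String key,
                          long limit, long periodMillis, long now) {
        while (true) {
            ImmutableBucket current = cache.get(key);
            if (current == null) {
                // First request for this key: only one caller wins the insert.
                if (cache.putIfAbsent(key, new ImmutableBucket(1, now)) == null) {
                    return limit >= 1;
                }
                continue; // lost the race, re-read the entry
            }
            ImmutableBucket updated = current.next(now, periodMillis);
            if (updated.count > limit) {
                return false; // over the limit: leave the stored bucket untouched
            }
            if (cache.replace(key, current, updated)) {
                return true;
            }
            // Another writer changed the entry in between: retry.
        }
    }
}
```

Because the stored value is never mutated, a thread that reads or copies the entry always sees a consistent bucket, and a lost update shows up as a failed replace rather than a silently stale counter.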

          • 2. Re: Help with ISPN + apiman
            eric.wittmann

            Thanks for the feedback!

            • 3. Re: Help with ISPN + apiman
              william.burns

              Eric,

               

              I am not sure what is going on here.  Unfortunately there isn't a whole lot to tell from the trace.  Is it possible you could run this printing out the arguments to the accept method as the first line of the method?  I have never heard of ISPN cache returning an older value like this (except with concurrent operations), especially from a LOCAL cache which is substantially simpler than a distributed one.

               

              Also, if you are able to enable tracing for ISPN, that should help shed some light as well.

              • 4. Re: Help with ISPN + apiman
                eric.wittmann

                Of course - I can work on building a simple reproducer, although I'm not sure how much luck I'll have (this problem doesn't always happen and is hard to reproduce). 

                 

                At the same time, can you point me at any ISPN documentation/sample that might exist which implements a simple cluster-wide shared counter?

                • 5. Re: Help with ISPN + apiman
                  sannegrinovero

                  Implementing a Counter is actually quite tricky on top of Infinispan's core model, and requires thinking carefully about which requirements your counter has. There are plans to expose such a feature in some "easy way" on the Infinispan public API, but it's generally unclear what people mean exactly by "counter", and which trade-offs are not acceptable.

                   

                  As an example, here I implemented reference counting for a shared resource lock, storing an Integer and incrementing/decrementing it as needed with atomic operations:

                  - https://github.com/infinispan/infinispan/blob/master/lucene/lucene-directory/src/main/java/org/infinispan/lucene/readlocks/DistributedSegmentReadLocker.java#L137

                   

                  As far as I understand, you want to implement a rate limiter? That would probably need a different approach; for example, you probably want a statistical estimate and not a fully accurate value?

                  To have an always-synchronized fully accurate value you would be generating quite some network traffic, so a better approach would be to have each node reserve some increment blocks depending on load.
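The block-reservation idea can be sketched roughly as follows. This is only an illustration under stated assumptions: an AtomicLong stands in for whatever cluster-wide counter is actually used, and BlockReservingCounter, tryAcquire and the fixed block size are hypothetical names and choices, not part of Infinispan or JGroups.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-node allocator: draws permits from a shared counter in blocks,
// so most requests are answered locally without touching the shared state.
final class BlockReservingCounter {
    private final AtomicLong shared;  // stand-in for the cluster-wide counter
    private final long limit;         // total permits available in the period
    private final long blockSize;     // permits reserved per shared operation
    private long localRemaining = 0;  // permits this node may still hand out

    BlockReservingCounter(AtomicLong shared, long limit, long blockSize) {
        this.shared = shared;
        this.limit = limit;
        this.blockSize = blockSize;
    }

    synchronized boolean tryAcquire() {
        if (localRemaining == 0) {
            // One operation on the shared counter reserves a whole block.
            long end = shared.addAndGet(blockSize);
            long start = end - blockSize;
            if (start >= limit) {
                return false; // nothing left cluster-wide
            }
            // The last block may be cut short by the limit.
            localRemaining = Math.min(end, limit) - start;
        }
        localRemaining--;
        return true;
    }
}
```

The trade-off is accuracy: permits reserved by a node that then goes idle are not available to other nodes, so the limit is enforced as an approximation. A load-dependent block size, as suggested above, would narrow that gap at the cost of more shared operations.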

                   

                  To implement such a model you probably want to go at the lower level, using JGroups's COUNTER protocol:

                  - Chapter 4. Building Blocks (JGroups manual)

                   

                  You can either get a reference to the JGroups channel being used by Infinispan, or you can open a dedicated one, or you can multiplex a channel using the FORK protocol. If you're running this on WildFly 10, the appserver is ready to manage FORK channels so that should be the simplest approach, as you can then add a dedicated COUNTER for your application, isolated from other apps using the same network transport.

                  • 6. Re: Help with ISPN + apiman
                    william.burns

                    Eric Wittmann wrote:

                     

                    Of course - I can work on building a simple reproducer, although I'm not sure how much luck I'll have (this problem doesn't always happen and is hard to reproduce). 

                    Sounds great, hopefully you can get lucky.

                    Eric Wittmann wrote:

                     

                    At the same time, can you point me at any ISPN documentation/sample that might exist which implements a simple cluster-wide shared counter?

                    Unfortunately there really isn't anything that I am aware of.

                     

                    There are 2 main ways to do this.

                     

                    1. Pessimistic transactional cache which does a read lock - Pessimisstic.java · GitHub
                    2. Non-transactional cache using concurrent write operations - Optimistic.java · GitHub

                     

                    I tried to keep both examples very bare bones, but it should get the point across.

                     

                    The non-transactional approach will be faster if there is low contention for the given key or if you are invoking it on the primary owner for that key (since it only goes remote after it acquires the lock locally).
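For readers without access to the linked files, the non-transactional variant might look something like the sketch below. It is not the linked Optimistic.java; it is a guess at the general shape, written against ConcurrentMap (which Infinispan's Cache implements) so that it runs with a plain ConcurrentHashMap. OptimisticCounterSketch and add are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class OptimisticCounterSketch {
    // Atomically adds 'delta' to the Integer stored under 'key' and returns the new value.
    static int add(ConcurrentMap<String, Integer> cache, String key, int delta) {
        while (true) {
            Integer current = cache.get(key);
            if (current == null) {
                // No value yet: putIfAbsent succeeds for exactly one caller.
                if (cache.putIfAbsent(key, delta) == null) {
                    return delta;
                }
            } else {
                int updated = current + delta;
                // Conditional write: only succeeds if nobody changed the value since the read.
                if (cache.replace(key, current, updated)) {
                    return updated;
                }
            }
            // Another writer got there first: re-read and try again.
        }
    }
}
```

Under high contention the retry loop spins more often, which is consistent with the remark that this variant is the faster one mainly when contention on the key is low.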

                    • 7. Re: Help with ISPN + apiman
                      eric.wittmann

                      Great!  Thanks to both of you for the responses.  I'll be circling back around on this issue probably in a few weeks, at which point we'll have some time to hopefully try out these approaches.

                       

                      Thanks very much for the help - I should have solicited it months ago when the implementation was first done.