11 Replies. Latest reply: Jul 5, 2012 9:09 PM by Galder Zamarreño

data loss with a shared DIST cache store

Torben Jaeger Newbie

Hi,

 

I am using AS 7.1.1 with a cache configured like https://gist.github.com/2416139

JBoss is running with a domain HA profile (2 nodes).

 

Passivation is set to false, so I have a write-through configuration: every entry is written to the database as soon as it is put into the cache.

 

The scenario:

 

I have a key distributed on these 2 nodes. The primary owner is server2. The key is updated on server1 though.

 

server1 is telling me:

 

22:47:30,380 TRACE [org.infinispan.interceptors.DistributionInterceptor] (http--127.0.0.1-8080-2) Not doing a remote get for key 100002075182001 since entry is mapped to current node (jboss1/mapper-cluster), or is in L1.  Owners are [jboss2/mapper-cluster, jboss1/mapper-cluster]

22:47:30,389 TRACE [org.infinispan.interceptors.locking.ClusteringDependentLogic] (http--127.0.0.1-8080-2) My address is jboss1/mapper-cluster. Am I main owner? - false

22:47:30,392 TRACE [org.infinispan.interceptors.DistCacheStoreInterceptor] (http--127.0.0.1-8080-2) Skipping cache store since the cache loader is shared and the caller is not the first owner of the key

 

server2 reports:

 

22:47:30,486 TRACE [org.infinispan.interceptors.CacheStoreInterceptor] (OOB-20,null) Skipping cache store since the cache loader is shared and we are not the originator.

 

So neither node writes anything to the cache store. If a server restarts (or crashes), the data is lost.

 

If the primary owner is equal to the originator of the update everything works as expected and the record is persisted.
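
The two skip decisions in the TRACE output can be reproduced with a small stand-alone model. This is only a sketch of the behaviour seen in the logs, not Infinispan source code: the class and method names are hypothetical, and it assumes the originator is one of the key's owners, as in the scenario above.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the skip decisions logged by server1 and server2.
// Hypothetical names; this is NOT the Infinispan implementation.
class SharedStoreSkipModel {

    // Decision on the node where the update was invoked (server1's log line):
    // write to the shared store only if this node is the first owner of the key.
    static boolean originatorWrites(String self, List<String> owners) {
        return owners.get(0).equals(self);
    }

    // Decision on a node that receives the replicated update (server2's log line):
    // with a shared store, skip unless this node is the originator.
    static boolean remoteWrites(String self, String originator) {
        return self.equals(originator);
    }

    // Counts how many owners end up writing the entry to the shared store.
    static int storeWrites(String originator, List<String> owners) {
        int writes = 0;
        for (String node : owners) {
            boolean writesHere = node.equals(originator)
                    ? originatorWrites(node, owners)
                    : remoteWrites(node, originator);
            if (writesHere) {
                writes++;
            }
        }
        return writes;
    }

    public static void main(String[] args) {
        // Owner list as logged: jboss2 is the first (primary) owner.
        List<String> owners = Arrays.asList("jboss2", "jboss1");
        System.out.println("update on jboss1: " + storeWrites("jboss1", owners) + " write(s)");
        System.out.println("update on jboss2: " + storeWrites("jboss2", owners) + " write(s)");
    }
}
```

In this model an update on jboss1 yields 0 writes and an update on jboss2 yields 1, which matches what I observe: each node assumes the other one will persist the entry.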

 

Is this the desired behavior?

 

Thx,

 

Torben

  • 1. Re: data loss with a shared DIST cache store
    Galder Zamarreño Master

    Hmmm, that smells like a bug. Could you please create an issue in http://issues.jboss.org/browse/ISPN and attach the configuration file and any other info you have there?

     

    Thanks!

  • 2. Re: data loss with a shared DIST cache store
    Galder Zamarreño Master

    The only way I can imagine this happening is if server2 is not configured with distribution, otherwise CacheStoreInterceptor.skip(InvocationContext, VisitableCommand) would not be called.

     

    If this is not the case, attach full test case. You might wanna try plugging Infinispan 5.1.4.FINAL into AS 7.1.1 to see if that solves your issue too.

  • 3. Re: data loss with a shared DIST cache store
    Torben Jaeger Newbie

    Galder Zamarreño wrote:

     

    If this is not the case, attach full test case. You might wanna try plugging Infinispan 5.1.4.FINAL into AS 7.1.1 to see if that solves your issue too.

     

    Galder,

     

    I have created https://github.com/jicken/ispn-shared-dist-cache

     

    It's an Arquillian cluster test. The repo is quite big as it contains two instances of JBoss AS 7.1.1.Final. The test uses a PostgreSQL database, so you need to fix the connection settings on both nodes (ispn.xml) before executing the test.

     

    Once that is done, just run 'bash run.sh'.

     

    Please have a look at the GitHub README, where I have listed the errors I came across when running this test. I did not manage to advance to step 3 of the scenario, because both keys were either distributed on just one node or on both nodes, but each on the wrong one.

     

    Desired distribution would be: key1 on jboss1, key2 on jboss2 to see step 3 failing

     

    If ispn.xml is configured with shared=false, the test cases should pass.

     

    If you have any problems, just let me know.

  • 4. Re: data loss with a shared DIST cache store
    Juan Ignacio Barisich Newbie

    I have the same problem with Infinispan 5.1.5.CR1 in embedded mode. I have a cluster with 6 nodes (each one on a dedicated host) with this configuration:

     

    <default>
        <jmxStatistics enabled="true" />
        <clustering mode="distribution">
            <hash numOwners="3"/>
            <sync />
        </clustering>
        <locking useLockStriping="false" />
        <deadlockDetection enabled="true" spinDuration="500" />
        <transaction
            syncCommitPhase="true" syncRollbackPhase="true" useEagerLocking="false"
            useSynchronization="false" eagerLockSingleNode="false">
            <recovery enabled="true" />
        </transaction>
        <loaders passivation="false" shared="true" preload="true">
            <loader class="org.infinispan.loaders.jdbc.mixed.JdbcMixedCacheStore"
                fetchPersistentState="false" ignoreModifications="false"
                purgeOnStartup="false">
                <properties>
                    <property name="tableNamePrefixForStrings" value="ISPN_MIXED_STR_TABLE" />
                    <property name="tableNamePrefixForBinary" value="ISPN_MIXED_BINARY_TABLE" />
                    <property name="idColumnNameForStrings" value="ID_COLUMN" />
                    <property name="idColumnNameForBinary" value="ID_COLUMN" />
                    <property name="dataColumnNameForStrings" value="DATA_COLUMN" />
                    <property name="dataColumnNameForBinary" value="DATA_COLUMN" />
                    <property name="timestampColumnNameForStrings" value="TIMESTAMP_COLUMN" />
                    <property name="timestampColumnNameForBinary" value="TIMESTAMP_COLUMN" />
                    <property name="timestampColumnTypeForStrings" value="BIGINT" />
                    <property name="timestampColumnTypeForBinary" value="BIGINT" />
                    <property name="connectionFactoryClass"
                        value="org.infinispan.loaders.jdbc.connectionfactory.ManagedConnectionFactory" />
                    <property name="datasourceJndiLocation" value="java:DB2XADS" />
                    <property name="databaseType" value="DB2"/>
                    <property name="idColumnTypeForStrings" value="VARCHAR(255)" />
                    <property name="idColumnTypeForBinary" value="VARCHAR(255)" />
                    <property name="dataColumnTypeForStrings" value="BLOB" />
                    <property name="dataColumnTypeForBinary" value="BLOB" />
                    <property name="dropTableOnExitForStrings" value="false" />
                    <property name="dropTableOnExitForBinary" value="false" />
                    <property name="createTableOnStartForStrings" value="false" />
                    <property name="createTableOnStartForBinary" value="false" />
                </properties>
            </loader>
        </loaders>
        <expiration wakeUpInterval="-1"/>
    </default>

    At startup, all nodes log something like this:

    2012-05-30 10:09:22,507 DEBUG [org.infinispan.interceptors.InterceptorChain:org.infinispan.interceptors.InterceptorChain.printChainInfo(InterceptorChain.java:76)] Interceptor chain is:

        >> org.infinispan.interceptors.InvocationContextInterceptor

        >> org.infinispan.interceptors.CacheMgmtInterceptor

        >> org.infinispan.interceptors.StateTransferLockInterceptor

        >> org.infinispan.interceptors.TxInterceptor

        >> org.infinispan.interceptors.NotificationInterceptor

        >> org.infinispan.interceptors.locking.OptimisticLockingInterceptor

        >> org.infinispan.interceptors.EntryWrappingInterceptor

        >> org.infinispan.interceptors.ClusteredCacheLoaderInterceptor

        >> org.infinispan.interceptors.DistCacheStoreInterceptor

        >> org.infinispan.interceptors.DeadlockDetectingInterceptor

        >> org.infinispan.interceptors.DistributionInterceptor

        >> org.infinispan.interceptors.CallInterceptor

     

    After putting an entry on the first node, nodes 1, 2 and 5 change their "numberOfEntries" attribute to 1. But no node increases the "CacheLoaderStores" JMX attribute, and therefore no entry is persisted in the database. Looking at the logs, I see the following:

    Node 1 logs:

    2012-05-30 10:18:41,053 TRACE [org.infinispan.interceptors.DistCacheStoreInterceptor:org.infinispan.interceptors.DistCacheStoreInterceptor.skipKey(DistCacheStoreInterceptor.java:185)] Skipping cache store since the cache loader is shared and the caller is not the first owner of the key

    Node 2 logs:

    2012-05-30 10:18:41,570 TRACE [org.infinispan.interceptors.DistCacheStoreInterceptor:org.infinispan.interceptors.DistCacheStoreInterceptor.skipKey(DistCacheStoreInterceptor.java:185)] Skipping cache store since the cache loader is shared and the caller is not the first owner of the key

    Node 5 logs:

    2012-05-30 10:18:41,845 TRACE [org.infinispan.interceptors.CacheStoreInterceptor:org.infinispan.interceptors.CacheStoreInterceptor.skip(CacheStoreInterceptor.java:121)] Skipping cache store since the cache loader is shared and we are not the originator.

     

    With shared="false" the entry is persisted, but I don't think that setting is appropriate for a store that all nodes share, right?
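
    A rough way to see why shared="false" feels wrong here: the sketch below counts database writes for a single put. It is a hypothetical model, not Infinispan code. It assumes that with shared="true" only the first owner is meant to write, and that with shared="false" every owner writes to what it takes to be its own private store, even though all of them point at the same database.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of how many times one put() reaches the single database.
class SharedFlagWriteCount {

    // Stand-in for the one database table behind all nodes.
    static final class CountingStore {
        final Map<String, String> rows = new HashMap<>();
        int writes = 0;

        void write(String key, String value) {
            rows.put(key, value);
            writes++;
        }
    }

    // Simulates one put(): owners.get(0) is treated as the first owner.
    static CountingStore put(boolean shared, List<String> owners, String key, String value) {
        CountingStore db = new CountingStore();
        for (int i = 0; i < owners.size(); i++) {
            boolean firstOwner = (i == 0);
            // Assumption: shared=true -> only the first owner writes;
            //             shared=false -> every owner writes.
            if (!shared || firstOwner) {
                db.write(key, value);
            }
        }
        return db;
    }

    public static void main(String[] args) {
        List<String> owners = Arrays.asList("node1", "node2", "node5"); // numOwners=3
        System.out.println("shared=true:  " + put(true, owners, "k", "v").writes + " write(s)");
        System.out.println("shared=false: " + put(false, owners, "k", "v").writes + " write(s)");
    }
}
```

    Under these assumptions the same row is written three times with shared="false" and numOwners="3", so it works around the lost write at the cost of redundant database traffic.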

    Regards

  • 5. Re: data loss with a shared DIST cache store
    Galder Zamarreño Master

    Irrespective of what Juan Ignacio says, Torben, is your test right? You have 2 nodes, and 2 as the number of owners too. I can't see how the ERROR flavours in the README make sense, because both nodes are owners in this case. However, if you switch to 1 owner, I've seen a test I've built fail in a similar way. I'm investigating.

     

    @Juan Ignacio, I've got a test failing. Let's see if a fix can be found and you can try it in your env.

  • 6. Re: data loss with a shared DIST cache store
    Torben Jaeger Newbie

    The intention of owners == 2 was to prevent database access in case of a server crash. Let's say we have 10 nodes, owners would still be 2 (or 3) for redundancy, but not 10.
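
    To illustrate what I mean, here is a simplified sketch of picking owners for a key. It is not Infinispan's consistent hash: the ring is just a list of node names and the position comes from String.hashCode(), both of which are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified owner selection; NOT Infinispan's actual consistent hash.
class OwnerRingSketch {

    // Returns up to numOwners distinct nodes; the first element is the first owner.
    static List<String> owners(String key, List<String> nodes, int numOwners) {
        int start = Math.floorMod(key.hashCode(), nodes.size());
        int count = Math.min(numOwners, nodes.size());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            result.add(nodes.get((start + i) % nodes.size()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> two = Arrays.asList("jboss1", "jboss2");
        List<String> ten = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            ten.add("jboss" + i);
        }
        // With 2 nodes and 2 owners every node owns every key; only the order differs.
        System.out.println(owners("100002075182001", two, 2));
        // With 10 nodes the key still lives on just 2 of them.
        System.out.println(owners("100002075182001", ten, 2));
    }
}
```

    Even when every node is an owner, as in my 2-node test, only one of them is the first owner of a given key, and that distinction is what the store interceptor's log message is about.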

     

    Your 2nd question:

    The grep in run.sh is wrong, as it doesn't match the following statements from org.infinispan.interceptors.DistCacheStoreInterceptor:

     

    [torben@jit] ~/dev/oss/ispn-shared-dist-cache$ grep owner node?/jboss-as-7.1.1.Final/standalone/log/server.log

    node1/jboss-as-7.1.1.Final/standalone/log/server.log:18:25:38,899 TRACE [org.infinispan.interceptors.DistCacheStoreInterceptor] (http--127.0.0.1-9080-1) Skipping cache store since the cache loader is shared and the caller is not the first owner of the key

    node1/jboss-as-7.1.1.Final/standalone/log/server.log:18:25:38,903 TRACE [org.infinispan.interceptors.DistCacheStoreInterceptor] (http--127.0.0.1-9080-1) Skipping cache store since the cache loader is shared and the caller is not the first owner of the key

    node1/jboss-as-7.1.1.Final/standalone/log/server.log:18:25:41,418 TRACE [org.infinispan.interceptors.DistCacheStoreInterceptor] (OOB-17,null) Skipping cache store since the cache loader is shared and the caller is not the first owner of the key

    node1/jboss-as-7.1.1.Final/standalone/log/server.log:18:25:41,419 TRACE [org.infinispan.interceptors.DistCacheStoreInterceptor] (OOB-17,null) Skipping cache store since the cache loader is shared and the caller is not the first owner of the key

    [torben@jit] ~/dev/oss/ispn-shared-dist-cache$

     

    I've fixed that in run.sh.

     

    But here you can see: " ... and the caller is not the first owner of the key"

     

    It's not only about being owner, but being the first owner.

  • 7. Re: data loss with a shared DIST cache store
    Galder Zamarreño Master

    Torben, I've replicated this on a smaller scale, see https://issues.jboss.org/browse/ISPN-2089

  • 8. Re: data loss with a shared DIST cache store
    Galder Zamarreño Master

    Guys, I've a patch that solves the issue in https://github.com/galderz/infinispan/tree/t_dist_shared_5 - in case you wanna give it a go

  • 10. Re: data loss with a shared DIST cache store
    Fernando Wasylyszyn Newbie

    Galder: https://issues.jboss.org/browse/ISPN-2089 lists 5.1.x as one of the fix versions. Is there a plan to release a 5.1.6 version that includes this patch? If there is, do you know when it is planned to be released?

    Kind regards.

    Fernando.

  • 11. Re: data loss with a shared DIST cache store
    Galder Zamarreño Master

    I'm pretty sure we won't be doing any further 5.1.x releases. Either upgrade to 5.2.x, or buy a JBoss Data Grid (uses 5.1.x) support contract.