14 Replies Latest reply: Apr 6, 2010 3:39 AM by Mihir Patel

Error when clustering

John Ament Master

Hmm, looks like I'm having cluster problems now. I created a config based on the sample config file. In both cases, it is reading my jgroups-tcp.xml file. I have two apps running, both local, on different GlassFish instances. When I start the first node up, it looks good. When I bring the second online, I get the following exceptions when it tries to load the replicated cache.

 

Caught while requesting or applying state
org.infinispan.statetransfer.StateTransferException: Provider cannot provide state!
    at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:315)
    at org.infinispan.remoting.InboundInvocationHandlerImpl.applyState(InboundInvocationHandlerImpl.java:73)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.setState(JGroupsTransport.java:564)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:657)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:717)
    at org.jgroups.JChannel.up(JChannel.java:1413)
    at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:829)
    at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:489)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:529)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:468)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:230)
    at org.jgroups.protocols.FRAG2.up(FRAG2.java:188)
    at org.jgroups.protocols.FC.up(FC.java:475)
    at org.jgroups.protocols.pbcast.GMS.up(GMS.java:890)
    at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:236)
    at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:596)
    at org.jgroups.protocols.UNICAST.up(UNICAST.java:275)
    at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:705)
    at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132)
    at org.jgroups.protocols.FD.up(FD.java:259)
    at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:269)
    at org.jgroups.stack.Protocol.up(Protocol.java:340)
    at org.jgroups.protocols.Discovery.up(Discovery.java:277)
    at org.jgroups.protocols.PING.up(PING.java:67)
    at org.jgroups.protocols.MPING.up(MPING.java:173)
    at org.jgroups.protocols.TP.passMessageUp(TP.java:953)
    at org.jgroups.protocols.TP.access$100(TP.java:53)
    at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1457)
    at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1439)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)

 

Caught while requesting or applying state
org.infinispan.statetransfer.StateTransferException: java.io.StreamCorruptedException: Unexpected byte found when reading an object: 0
    at org.infinispan.statetransfer.StateTransferManagerImpl.assertDelimited(StateTransferManagerImpl.java:382)
    at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:309)
    at org.infinispan.remoting.InboundInvocationHandlerImpl.applyState(InboundInvocationHandlerImpl.java:73)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.setState(JGroupsTransport.java:564)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:657)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:717)
    at org.jgroups.JChannel.up(JChannel.java:1413)
    at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:829)
    at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:489)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:529)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:468)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:230)
    at org.jgroups.protocols.FRAG2.up(FRAG2.java:188)
    at org.jgroups.protocols.FC.up(FC.java:475)
    at org.jgroups.protocols.pbcast.GMS.up(GMS.java:890)
    at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:236)
    at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:596)
    at org.jgroups.protocols.UNICAST.up(UNICAST.java:275)
    at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:705)
    at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132)
    at org.jgroups.protocols.FD.up(FD.java:259)
    at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:269)
    at org.jgroups.stack.Protocol.up(Protocol.java:340)
    at org.jgroups.protocols.Discovery.up(Discovery.java:277)
    at org.jgroups.protocols.PING.up(PING.java:67)
    at org.jgroups.protocols.MPING.up(MPING.java:173)
    at org.jgroups.protocols.TP.passMessageUp(TP.java:953)
    at org.jgroups.protocols.TP.access$100(TP.java:53)
    at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1457)
    at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1439)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
Caused by: java.io.StreamCorruptedException: Unexpected byte found when reading an object: 0
    at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:708)
    at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:207)
    at org.jboss.marshalling.AbstractUnmarshaller.readObject(AbstractUnmarshaller.java:85)
    at org.infinispan.marshall.jboss.JBossMarshaller.objectFromObjectStream(JBossMarshaller.java:207)
    at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:171)
    at org.infinispan.statetransfer.StateTransferManagerImpl.assertDelimited(StateTransferManagerImpl.java:380)
    ... 32 more

  • 1. Re: Error when clustering
    Amin Abbaspour Newbie

    Seems like a serialization issue (marshalling uses River, judging by the stack trace).

     

    Are you sure you have identical objects and the same Infinispan version on both ends?

     

    see this: http://community.jboss.org/wiki/InfinispanTechnicalFAQs#Marshalling__Unmarshalling

    and this: http://docs.jboss.org/infinispan/4.0/apidocs/config.html#ce_global_serialization

  • 2. Re: Error when clustering
    John Ament Master

    In my case, both ends are running locally. I have both pointing to the same config file, on different GlassFish instances. Each has its own jgroups-tcp.xml, where the only difference is the binding port. I was able to try out the JGroups demo package and it worked fine. I can see that when the second member comes online, it attempts to fetch the cache from the first. The odd thing I am noticing is that the first member isn't loading the cache from the cache store.

     

    The config file I'm using is the all.xml that shipped with infinispan.  I tweaked persistentCache for testing purposes, which looks like this:

     

       <namedCache name="persistentCache">
          <loaders shared="true" preload="true">
             <loader
                class="org.infinispan.loaders.file.FileCacheStore"
                fetchPersistentState="true" ignoreModifications="false"
                purgeOnStartup="false">
                <!-- See the documentation for more configuration examples and flags. -->
                <properties>
                   <property name="location" value="/devel/cache"/>
                </properties>
             </loader>
          </loaders>
    
          <deadlockDetection enabled="true" spinDuration="1000"/>
    
       </namedCache>

     

     

    It would be odd for this to be a serialization issue, as the cache is simply a String-to-String cache.

     

    It will probably help to explain what my goal is.

     

    I want to be able to have a replicated cache that has file system storage.  The production environment will have 2 nodes.  Test also has 2 nodes.  Before I apply the config to test, I want to run it on my local machine.  Without replication, everything works well.  All of the named caches persist correctly on the file system.  What I want to do is once a second node comes online, have it pull the data from the cache locally.  The behavior of our cache is that it is a write once and read many system.  Day 1 of using the app the users will populate it, then going forward it will just be read a bunch of times (the cache will store data from spreadsheets and remote databases in it).

     

    The way I figured it, I should point both instances running on my local machine to different paths on the file system to ensure that both locations get written to. That didn't work. Then I assumed that both should point to the same location in the file system. That doesn't work either. When I modify them to remove all knowledge of clusters, it works fine, as long as they both point to the same file system.

     

    Am I correct in understanding that global and default apply to all caches?

  • 3. Re: Error when clustering
    Galder Zamarreño Master

    With regard to:

    Caused by: java.io.StreamCorruptedException: Unexpected byte found when reading an object: 0
        at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:708)
        at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:207)

     

    What Infinispan version are you using? Can you try using latest CR4?

  • 4. Re: Error when clustering
    John Ament Master
    I am using CR4.
  • 5. Re: Error when clustering
    Galder Zamarreño Master

    Hmmmm, could you mimic what you're doing in a unit test? I mean, from what I understand, your unit test should include:

    - start first cache with your config, and do the operations you've been doing in the 1st instance.

    - start second cache with your config and it should fail with that StateTransferException

     

    Note that if you mark shared in your cache loader, the cache assumes that both caches are directed to the same file path. If you want them to point to different paths, shared should be set to false. See the configuration documentation in http://infinispan.sourceforge.net/4.0/apidocs/config.html#ce_loaders_loader

  • 6. Re: Error when clustering
    Mihir Patel Novice

    Hello,

     

    Any update on this issue? I am getting the same exception.

     

    Caused by: java.io.StreamCorruptedException: Unexpected byte found when reading an object: 0
        at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:708)
        at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:207)
        at org.jboss.marshalling.river.RiverUnmarshaller.readFields(RiverUnmarshaller.java:1637)
        at org.jboss.marshalling.river.RiverUnmarshaller.doInitSerializable(RiverUnmarshaller.java:1553)
        at org.jboss.marshalling.river.RiverUnmarshaller.doReadNewObject(RiverUnmarshaller.java:1202)
        at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:270)
        at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:207)
        at org.jboss.marshalling.river.RiverUnmarshaller.readFields(RiverUnmarshaller.java:1637)
        at org.jboss.marshalling.river.RiverUnmarshaller.doInitSerializable(RiverUnmarshaller.java:1553)
        at org.jboss.marshalling.river.RiverUnmarshaller.doInitSerializable(RiverUnmarshaller.java:1517)
        at org.jboss.marshalling.river.RiverUnmarshaller.doReadNewObject(RiverUnmarshaller.java:1202)
        at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:270)
        at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:207)
        at org.jboss.marshalling.AbstractUnmarshaller.readObject(AbstractUnmarshaller.java:85)
        at org.infinispan.marshall.exts.ReplicableCommandExternalizer.readObject(ReplicableCommandExternalizer.java:64)
        at org.infinispan.marshall.jboss.ConstantObjectTable$ExternalizerAdapter.readObject(ConstantObjectTable.java:264)
        at org.infinispan.marshall.jboss.ConstantObjectTable.readObject(ConstantObjectTable.java:251)
        at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:357)
        at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:207)
        at org.jboss.marshalling.AbstractUnmarshaller.readObject(AbstractUnmarshaller.java:85)
        at org.infinispan.marshall.exts.ReplicableCommandExternalizer.readObject(ReplicableCommandExternalizer.java:64)
        at org.infinispan.marshall.jboss.ConstantObjectTable$ExternalizerAdapter.readObject(ConstantObjectTable.java:264)
        at org.infinispan.marshall.jboss.ConstantObjectTable.readObject(ConstantObjectTable.java:251)
        at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:357)
        at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:207)
        at org.jboss.marshalling.AbstractUnmarshaller.readObject(AbstractUnmarshaller.java:85)
        at org.infinispan.marshall.jboss.JBossMarshaller.objectFromObjectStream(JBossMarshaller.java:207)
        at org.infinispan.marshall.VersionAwareMarshaller.objectFromByteBuffer(VersionAwareMarshaller.java:109)
        at org.infinispan.remoting.transport.jgroups.MarshallerAdapter.objectFromByteBuffer(MarshallerAdapter.java:26)
        at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:148)
        ... 27 more

    Caused by: an exception which occurred:
        in field authentication
        in object of type org.jasig.cas.ticket.TicketGrantingTicketImpl
        in field ticketGrantingTicket
        in object of type org.jasig.cas.ticket.ServiceTicketImpl

     

     

    I am using Ja-sig's CAS version 3.3.5 and trying to add Infinispan 4.0.0.FINAL support to it, and the following test case is failing:

     

    @Test
        public void testPutTicket() {
            Principal principal = new SimpleWebApplicationServiceImpl("someurl");
            Authentication authentication = new ImmutableAuthentication(principal);
            TicketGrantingTicket tgt = new TicketGrantingTicketImpl("testId", authentication, new TimeoutExpirationPolicy(3600000));
            this.registry1.addTicket(tgt);
            ServiceTicket st = tgt.grantServiceTicket("ST-ID", new SimpleWebApplicationServiceImpl("CASifiedAppUrl"), new MultiTimeUseOrTimeoutExpirationPolicy(1, 300000), Boolean.TRUE);
            this.registry1.addTicket(st);
        }

    Surprisingly, the this.registry1.addTicket(tgt) operation succeeds, whereas this.registry1.addTicket(st) fails while deserializing the TicketGrantingTicket inside the ServiceTicket.

     

    I debugged the doWrite, doWriteFields, doInitSerializable, doRead, doReadFields and doWriteSerializableObject methods of RiverMarshaller and RiverUnmarshaller and noticed one difference between the marshaller and the unmarshaller. I suspect that the code in doInitSerializable and doWriteSerializableObject is causing the issue: the marshaller serializes the parent class's members first and the child class's members second, whereas the unmarshaller skips the parent and tries to deserialize the fields of the child class. Concretely, the marshaller writes the fields of AbstractTicket first and then the fields of the child class TicketGrantingTicketImpl, whereas the unmarshaller tries to deserialize the fields of TicketGrantingTicketImpl without first deserializing the fields of AbstractTicket.

     

    It seems to have something to do with the following lines of code:

     

    RiverMarshaller.java:

    protected void doWriteSerializableObject(final SerializableClass info, final Object obj, final Class<?> objClass) throws IOException {

            final Class<?> superclass = objClass.getSuperclass();
            if (Serializable.class.isAssignableFrom(superclass)) {
                doWriteSerializableObject(registry.lookup(superclass), obj, superclass);
            }

            :

            :

    }

     

    RiverUnmarshaller.java:

    private void doInitSerializable(final Object obj, final SerializableClassDescriptor descriptor) throws IOException, ClassNotFoundException {
            final Class<?> type = descriptor.getType();
            final SerializableClass info = registry.lookup(type);
            final ClassDescriptor superDescriptor = descriptor.getSuperClassDescriptor();
            if (superDescriptor instanceof SerializableClassDescriptor) {
                final SerializableClassDescriptor serializableSuperDescriptor = (SerializableClassDescriptor) superDescriptor;
                doInitSerializable(obj, serializableSuperDescriptor);
            }

            :

            :

    }

     

    AbstractTicket's ClassDescriptor is not an instance of SerializableClassDescriptor; hence, deserialization of AbstractTicket's fields is skipped (?) and only TicketGrantingTicketImpl's fields are deserialized, which results in a corrupted input stream.
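
    The write order described above can be illustrated without JBoss Marshalling at all. The following is a minimal sketch, not the library's code: BaseTicket and GrantingTicket are hypothetical stand-ins for AbstractTicket and TicketGrantingTicketImpl, and writeOrder is an illustrative helper that only mirrors the recursion shape of doWriteSerializableObject.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class HierarchyOrder {
    // Hypothetical stand-ins for AbstractTicket and TicketGrantingTicketImpl.
    public static class BaseTicket implements Serializable {
        private static final long serialVersionUID = 1L;
        String id;
    }

    public static class GrantingTicket extends BaseTicket {
        private static final long serialVersionUID = 1L;
        String authentication;
    }

    // Lists the serializable classes of a hierarchy, superclass first.
    public static List<Class<?>> writeOrder(Class<?> type) {
        List<Class<?>> order = new ArrayList<>();
        collect(type, order);
        return order;
    }

    // Recurse into the superclass before recording the class itself,
    // so the parent's fields would be written before the child's.
    private static void collect(Class<?> type, List<Class<?>> order) {
        Class<?> superclass = type.getSuperclass();
        if (superclass != null && Serializable.class.isAssignableFrom(superclass)) {
            collect(superclass, order);
        }
        order.add(type);
    }

    public static void main(String[] args) {
        for (Class<?> c : writeOrder(GrantingTicket.class)) {
            System.out.println(c.getSimpleName());
        }
    }
}
```

    Running main prints BaseTicket followed by GrantingTicket. A reader that skips the first entry starts consuming the parent's bytes as if they were the child's fields, which matches the corruption seen here.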

     

    FWIW, we are currently using JBossCache 3.2.1.GA and we have no such issue with it.

    Let me know if you have any questions or if I can be of any help in resolving the issue more quickly (assuming that what I am saying above is the real issue).

     

    Thanks,

    Mihir

  • 7. Re: Error when clustering
    Mihir Patel Novice

    Here is the TestCase which fails.

     

    ----------------------------------------

    InfinispanCacheTests.java

    ----------------------------------------

    import junit.framework.TestCase;

     

    import org.infinispan.Cache;
    import org.infinispan.manager.CacheManager;
    import org.infinispan.manager.DefaultCacheManager;
    import org.junit.Test;
    import org.springframework.core.io.ClassPathResource;
    import org.springframework.core.io.Resource;

     

    public class InfinispanCacheTests extends TestCase {
        CacheManager cacheManager1;
        CacheManager cacheManager2;
        Cache<String, Object> cache1;
        Cache<String, Object> cache2;

     

        @Override
        protected void setUp() throws Exception {
            Resource resource = new ClassPathResource("infinispanTicketCache.xml");
            cacheManager1 = new DefaultCacheManager(resource.getInputStream());
            cache1 = cacheManager1.getCache();
           
            cacheManager2 = new DefaultCacheManager(resource.getInputStream());
            cache2 = cacheManager2.getCache();
        }
       
        @Test
        public void testPut() {
            Child1 child1Obj = new Child1(1234, "1234");
            Child2 child2Obj = new Child2(2345, "2345", child1Obj);

            this.cache1.put(child1Obj.getId(), child1Obj); // Works: the Child1 field inherited from Parent is null
            this.cache1.put(child2Obj.getId(), child2Obj); // Fails: the inherited Parent field holds a Child1, itself a subclass of Parent
        }
       
    }

     

    ----------------------------------------

    Parent.java

    ----------------------------------------

    import java.io.Serializable;

    public class Parent implements Serializable {
        private String id;
        private Child1 child1Obj;
       
        public Parent(String id, Child1 child1Obj) {
            this.id = id;
            this.child1Obj = child1Obj;
        }
       
        public String getId() {
            return id;
        }
        public Child1 getChild1Obj() {
            return child1Obj;
        }
    }

     

    ----------------------------------------

    Child1.java

    ----------------------------------------

    public class Child1 extends Parent {
        private int someInt;
       
        public Child1(int someInt, String parentStr) {
            super(parentStr, null);
            this.someInt = someInt;
        }
       
    }

     

    ----------------------------------------

    Child2.java

    ----------------------------------------

    public class Child2 extends Parent {
        private int someInt;
       
        public Child2(int someInt, String parentStr, Child1 child1Obj) {
            super(parentStr, child1Obj);
            this.someInt = someInt;
        }
    }
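
    As a baseline, the same class structure can be pushed through plain java.io serialization, which involves neither Infinispan nor JBoss Marshalling. A minimal sketch, assuming nested copies of the classes above (the serialVersionUID fields and the roundTrip helper are additions for the sake of the example):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class JdkRoundTrip {
    public static class Parent implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String id;
        private final Child1 child1Obj;

        public Parent(String id, Child1 child1Obj) {
            this.id = id;
            this.child1Obj = child1Obj;
        }

        public String getId() { return id; }
        public Child1 getChild1Obj() { return child1Obj; }
    }

    public static class Child1 extends Parent {
        private static final long serialVersionUID = 1L;
        private final int someInt;

        public Child1(int someInt, String parentStr) {
            super(parentStr, null);
            this.someInt = someInt;
        }
    }

    public static class Child2 extends Parent {
        private static final long serialVersionUID = 1L;
        private final int someInt;

        public Child2(int someInt, String parentStr, Child1 child1Obj) {
            super(parentStr, child1Obj);
            this.someInt = someInt;
        }
    }

    // Serializes the value to a byte array and reads it back with the JDK streams.
    public static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Child2 original = new Child2(2345, "2345", new Child1(1234, "1234"));
        Child2 copy = (Child2) roundTrip(original);
        System.out.println(copy.getId() + " -> " + copy.getChild1Obj().getId());
    }
}
```

    If this round trip succeeds while the cache put fails, the class structure itself is legal and the problem lies in the marshalling layer.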

     

    Note: See attached cache config. sync replTimeout is set high just so that I have more time to debug.

     

    I found that RiverMarshaller.writeKnownClass writes a relative offset with ID_REPEAT_CLASS_NEAR so that the class can be fetched from the classCache instead of being serialized again. On the unmarshaller side, ID_REPEAT_CLASS_NEAR tries to read the class information from the classCache, but the problem is that the Parent class is still being "understood" at that point, so the cached reference to it is an "IncompleteClassDescriptor". That reference gets used as the superDescriptor for Child1 (which is also part of Parent), and this is what causes the issue: the following if condition returns false and the stream gets corrupted.

     

    RiverUnmarshaller.java

    private void doInitSerializable(final Object obj, final SerializableClassDescriptor descriptor) throws IOException, ClassNotFoundException {
            final Class<?> type = descriptor.getType();
            final SerializableClass info = registry.lookup(type);
            final ClassDescriptor superDescriptor = descriptor.getSuperClassDescriptor();
            if (superDescriptor instanceof SerializableClassDescriptor) { //<== superDescriptor is IncompleteClassDescriptor as it was read from classCache while it was still being interpreted from input stream
                final SerializableClassDescriptor serializableSuperDescriptor = (SerializableClassDescriptor) superDescriptor;
                doInitSerializable(obj, serializableSuperDescriptor);
            }

            :

            :

    }
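
    The situation described above can be modelled with a toy class cache. This is only a sketch of my understanding, not the actual JBoss Marshalling code: Descriptor, Incomplete, Complete and levelsRead are hypothetical names made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class IncompleteDescriptorDemo {
    public interface Descriptor {
        String name();
    }

    // Placeholder that sits in the cache while a class is still being read.
    public static final class Incomplete implements Descriptor {
        private final String name;
        public Incomplete(String name) { this.name = name; }
        public String name() { return name; }
    }

    // Fully read descriptor; superDescriptor is null at the top of the hierarchy.
    public static final class Complete implements Descriptor {
        private final String name;
        private final Descriptor superDescriptor;

        public Complete(String name, Descriptor superDescriptor) {
            this.name = name;
            this.superDescriptor = superDescriptor;
        }

        public String name() { return name; }
        public Descriptor superDescriptor() { return superDescriptor; }
    }

    // Number of hierarchy levels whose fields would be read, using the same
    // kind of instanceof test as doInitSerializable above.
    public static int levelsRead(Complete descriptor) {
        int levels = 1;
        Descriptor parent = descriptor.superDescriptor();
        if (parent instanceof Complete) {
            levels += levelsRead((Complete) parent);
        }
        return levels;
    }

    public static void main(String[] args) {
        List<Descriptor> classCache = new ArrayList<>();
        // Reading Parent begins: only a placeholder is cached so far.
        classCache.add(new Incomplete("Parent"));
        // Child1 is met while Parent is still being read and takes the placeholder.
        Complete staleChild1 = new Complete("Child1", classCache.get(0));
        // Reading Parent finishes, but staleChild1 keeps the old reference.
        classCache.set(0, new Complete("Parent", null));
        Complete freshChild1 = new Complete("Child1", classCache.get(0));

        System.out.println("stale: " + levelsRead(staleChild1));
        System.out.println("fresh: " + levelsRead(freshChild1));
    }
}
```

    In this model the stale descriptor reads one level (Child1 only) while the fresh one reads two (Parent, then Child1), which would explain why the parent's fields are skipped.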

     

    Can anyone confirm my understanding?

     

    Thanks,

    Mihir

  • 8. Re: Error when clustering
    Galder Zamarreño Master

    Mihir, thanks for attaching the unit test. I'm currently looking into it.

  • 9. Re: Error when clustering
    Galder Zamarreño Master

    Mihir, this appears to be a bug, see https://jira.jboss.org/jira/browse/JBMAR-106. Your suspicions look correct as well. We'll address the issue ASAP.

  • 10. Re: Error when clustering
    Mihir Patel Novice

    Thanks, Galder. Any idea as to when this issue might get fixed? I'm asking so that I can plan the move to Infinispan accordingly. Not sure how common it is to have such a class structure!

  • 11. Re: Error when clustering
    Manik Surtani Master

    You should create an ISPN JIRA for this, link it to the JBMAR one, and vote for it. It will certainly get into 4.1.0; whether we release a 4.0.1 with this fix is something we need to think about, based on how quickly JBMAR gets patched. What are your timescales?

  • 12. Re: Error when clustering
    Mihir Patel Novice

    Hi Manik,

     

    Our current deployment is working fine with JBossCache 3.2.1.GA, but later in our implementation I realized that Infinispan is the better choice for the future, and I was really excited to try out distributed replication in Infinispan, as it promises to scale as the number of nodes in the cluster increases. We are using jasig's CAS with JBossCache, but I see the limitation of having the same copy of the data on all servers: exactly as you mentioned in one of your articles about distributed replication in Infinispan, we would not get more RAM to store more data as we add nodes. I would like to get rid of that limitation by trying out Infinispan's distributed replication with maybe 2 or 4 copies.
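
    The memory argument can be put into numbers. This is a rough sketch under idealised assumptions (entries spread perfectly evenly, no per-entry overhead); the class and method names are made up for illustration and are not Infinispan API.

```java
public class ClusterCapacity {
    // Full replication: every node stores every entry, so adding nodes adds no capacity.
    public static long replicatedCapacity(int nodes, long bytesPerNode) {
        requirePositive(nodes, "nodes");
        return bytesPerNode;
    }

    // Distribution: each entry lives on `copies` nodes, so the cluster holds
    // roughly nodes * bytesPerNode / copies bytes of distinct data.
    public static long distributedCapacity(int nodes, long bytesPerNode, int copies) {
        requirePositive(nodes, "nodes");
        requirePositive(copies, "copies");
        int effectiveCopies = Math.min(copies, nodes); // cannot keep more copies than nodes
        return nodes * bytesPerNode / effectiveCopies;
    }

    private static void requirePositive(int value, String name) {
        if (value < 1) {
            throw new IllegalArgumentException(name + " must be at least 1");
        }
    }

    public static void main(String[] args) {
        long fourGiB = 4L * 1024 * 1024 * 1024;
        for (int nodes : new int[] {2, 4, 8}) {
            System.out.println(nodes + " nodes: replicated="
                    + replicatedCapacity(nodes, fourGiB)
                    + " distributed(2 copies)="
                    + distributedCapacity(nodes, fourGiB, 2));
        }
    }
}
```

    With 2 copies, going from 2 to 8 nodes quadruples the distinct data the cluster can hold in this model, while the replicated figure stays at one node's worth.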

     

    Once I have tested Infinispan, the plan is to add Infinispan support to jasig's CAS. We don't have any hard deadlines there either, but it would be good to have this issue fixed in a month or two.

     

    I created https://jira.jboss.org/jira/browse/ISPN-389 (left the fix version blank) as you suggested but cannot vote as it says "You cannot vote for an issue you have reported".

     

    Thanks,

    Mihir

  • 13. Re: Error when clustering
    Galder Zamarreño Master

    Mihir, I made a note in the JIRA with instructions on how to get the fix. Basically, the JBMAR issue is closed and fixed, so simply build the 1.2.x branch of JBMAR and you'll have the fix.

  • 14. Re: Error when clustering
    Mihir Patel Novice

    Great. I built the jar from the 1.2 branch and the issue seems to be fixed now.

    Thanks for all your help.