Distributed Cache - Join Aborts
daver32 Jul 1, 2011 4:04 PMWe are running Infinispan 4.2.1.FINAL within JBoss AS 5.1.0.GA on two systems. Our application accepts JMS messages and places them in the Infinispan cache. Listeners to the cache on each system awaken and try to lock the just added node, process it and delete it. The plan is to be handling many JMS messages per second.
The trouble we've had is that upon the arrival of the first JMS message, Infinispan's attempt to Join the two instances fails, logging:
2011-07-01 12:54:22,876 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (WorkManager(2)-3) Starting JGroups Channel
2011-07-01 12:54:22,907 INFO [org.jgroups.JChannel] (WorkManager(2)-3) JGroups version: 2.12.0.CR5
2011-07-01 12:54:22,907 DEBUG [org.jgroups.conf.ClassConfigurator] (WorkManager(2)-3) Using jg-magic-map.xml as magic number file and jg-protocol-ids.xml for protocol IDs
2011-07-01 12:54:23,704 DEBUG [org.jgroups.protocols.FD] (Timer-1,xxx.xxx.xxx.61:7900) sending are-you-alive msg to xxx.xxx.xxx.111:7900 (own address=xxx.xxx.xxx.61:7900)
2011-07-01 12:54:23,766 DEBUG [org.jgroups.protocols.FD] (Timer-6,xxx.xxx.xxx.61:1474) sending are-you-alive msg to xxx.xxx.xxx.111:1660 (own address=xxx.xxx.xxx.61:1474)
2011-07-01 12:54:23,766 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) changed role to org.jgroups.protocols.pbcast.ClientGmsImpl
2011-07-01 12:54:23,797 DEBUG [org.jgroups.stack.Configurator] (WorkManager(2)-3) set property TCP.diagnostics_addr to default value /224.0.75.75
2011-07-01 12:54:23,813 DEBUG [org.jgroups.protocols.FRAG2] (WorkManager(2)-3) received CONFIG event: {bind_addr=/xxx.xxx.xxx.61}
2011-07-01 12:54:23,813 DEBUG [org.jgroups.protocols.MPING] (WorkManager(2)-3) bind_addr=/xxx.xxx.xxx.61 mcast_addr=/229.11.12.12, mcast_port=45699
2011-07-01 12:54:23,829 DEBUG [org.jgroups.protocols.FRAG2] (WorkManager(2)-3) received CONFIG event: {flush_supported=true}
2011-07-01 12:54:23,844 WARN [org.jgroups.protocols.UDP] (Incoming-19,xxx.xxx.xxx.61:1474) packet from xxx.xxx.xxx.111:45699 has different version (0.8.1) from ours (2.6.10). Packet is discarded
2011-07-01 12:54:23,844 WARN [org.jgroups.protocols.UDP] (Incoming-16,xxx.xxx.xxx.61:1474) packet from xxx.xxx.xxx.61:45699 has different version (0.8.1) from ours (2.6.10). Packet is discarded
2011-07-01 12:54:24,172 DEBUG [org.jgroups.protocols.FD] (Timer-10,xxx.xxx.xxx.61:1474) sending are-you-alive msg to xxx.xxx.xxx.111:1660 (own address=xxx.xxx.xxx.61:1474)
2011-07-01 12:54:24,219 DEBUG [org.jgroups.protocols.FD] (Timer-6,xxx.xxx.xxx.61:1474) sending are-you-alive msg to xxx.xxx.xxx.111:1660 (own address=xxx.xxx.xxx.61:1474)
2011-07-01 12:54:25,360 WARN [org.jgroups.protocols.UDP] (Incoming-18,xxx.xxx.xxx.61:1474) packet from xxx.xxx.xxx.61:45699 has different version (0.8.1) from ours (2.6.10). Packet is discarded
2011-07-01 12:54:26,860 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) initial_mbrs are [own_addr=SYSB-38243, view id=null, is_server=false, is_coord=false, logical_name=SYSB-38243, physical_addrs=xxx.xxx.xxx.111:1234]
2011-07-01 12:54:26,860 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) election results: {}
2011-07-01 12:54:27,016 WARN [org.jgroups.protocols.UDP] (Incoming-1,xxx.xxx.xxx.61:1474) packet from xxx.xxx.xxx.111:45699 has different version (0.8.1) from ours (2.6.10). Packet is discarded
2011-07-01 12:54:27,360 WARN [org.jgroups.protocols.UDP] (Incoming-20,xxx.xxx.xxx.61:1474) packet from xxx.xxx.xxx.61:45699 has different version (0.8.1) from ours (2.6.10). Packet is discarded
2011-07-01 12:54:27,360 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) initial_mbrs are [own_addr=SYSB-38243, view id=[SYSB-38243|0], is_server=true, is_coord=true, logical_name=SYSB-38243, physical_addrs=xxx.xxx.xxx.111:1234]
2011-07-01 12:54:27,360 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) election results: {SYSB-38243=1}
2011-07-01 12:54:27,360 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) sending handleJoin(SYSA-18253) to SYSB-38243
2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.pbcast.NAKACK] (WorkManager(2)-3)
[setDigest()]
existing digest: []
new digest: SYSB-38243: [0 : 2 (2)], SYSA-18253: [0 : 0]
resulting digest: SYSB-38243: [0 : 2 (2)], SYSA-18253: [0 : 0 (0)]
2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) [SYSA-18253]: JoinRsp=[SYSB-38243|1] [SYSB-38243, SYSA-18253] [size=2]
2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) new_view=[SYSB-38243|1] [SYSB-38243, SYSA-18253]
2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) SYSA-18253: view is [SYSB-38243|1] [SYSB-38243, SYSA-18253]
2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.FD_SOCK] (WorkManager(2)-3) VIEW_CHANGE received: [SYSB-38243, SYSA-18253]
2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.pbcast.STABLE] (WorkManager(2)-3) [ergonomics] setting max_bytes to 800KB (2 members)
2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,GSIN_DJR_WAS_HERE,SYSA-18253) ping_dest is SYSB-38243, pingable_mbrs=[SYSB-38243, SYSA-18253]
2011-07-01 12:54:27,438 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (WorkManager(2)-3) Received new cluster view: [SYSB-38243|1] [SYSB-38243, SYSA-18253]
2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.pbcast.FLUSH] (WorkManager(2)-3) SYSA-18253: installing view [SYSB-38243|1] [SYSB-38243, SYSA-18253]
2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) SYSA-18253:
2011-07-01 12:54:27,532 DEBUG [org.jgroups.protocols.pbcast.FLUSH] (Incoming-2,GSIN_DJR_WAS_HERE,SYSA-18253) SYSA-18253: received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
2011-07-01 12:54:27,532 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (WorkManager(2)-3) Cache local address is SYSA-18253, physical addresses are [xxx.xxx.xxx.61:1234]
2011-07-01 12:54:27,532 INFO [org.infinispan.factories.GlobalComponentRegistry] (WorkManager(2)-3) Infinispan version: Infinispan 'Ursus' 4.2.1.FINAL
2011-07-01 12:54:27,563 ERROR [org.infinispan.distribution.JoinTask] (Rehasher-SYSA-18253) Caught exception! Aborting join.
org.infinispan.CacheException: java.lang.RuntimeException: Failure to marshal argument(s)
at org.infinispan.util.Util.rewrapAsCacheException(Util.java:468)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:118)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:407)
at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:102)
at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:126)
at org.infinispan.distribution.JoinTask.retrieveOldConsistentHash(JoinTask.java:183)
at org.infinispan.distribution.JoinTask.getPermissionToJoin(JoinTask.java:66)
at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:86)
at org.infinispan.distribution.RehashTask.call(RehashTask.java:75)
at org.infinispan.distribution.RehashTask.call(RehashTask.java:41)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: Failure to marshal argument(s)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.marshallCall(CommandAwareRpcDispatcher.java:210)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.call(CommandAwareRpcDispatcher.java:248)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:116)
... 13 more
Caused by: java.lang.NullPointerException
at org.infinispan.marshall.jboss.GenericJBossMarshaller$1.initialValue(GenericJBossMarshaller.java:75)
at org.infinispan.marshall.jboss.GenericJBossMarshaller$1.initialValue(GenericJBossMarshaller.java:71)
at java.lang.ThreadLocal.setInitialValue(Unknown Source)
at java.lang.ThreadLocal.get(Unknown Source)
at org.infinispan.marshall.jboss.GenericJBossMarshaller.startObjectOutput(GenericJBossMarshaller.java:130)
at org.infinispan.marshall.VersionAwareMarshaller.startObjectOutput(VersionAwareMarshaller.java:126)
at org.infinispan.marshall.VersionAwareMarshaller.objectToBuffer(VersionAwareMarshaller.java:91)
at org.infinispan.marshall.AbstractMarshaller.objectToBuffer(AbstractMarshaller.java:31)
at org.infinispan.remoting.transport.jgroups.MarshallerAdapter.objectToBuffer(MarshallerAdapter.java:22)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.marshallCall(CommandAwareRpcDispatcher.java:208)
... 15 more
One type of error that jumps out relates to the Incoming-1,xxx.xxx.xxx.61:1474) packet from xxx.xxx.xxx.111:45699 has different version (0.8.1) from ours (2.6.10). Packet is discarded
We've not had this the entire time, but we've been trying many different configuation parameters. Can anyone tell us what we need to do to get rid of this?
Clearly, the big question is why is the Join failing? Here is our configuration information:
GlobalConfiguration gc = GlobalConfiguration.getClusteredDefault();
gc.setClusterName( "DJR_WAS_HERE" );
Configuration c = new Configuration();
c.setCacheMode( Configuration.CacheMode.DIST_SYNC);
Properties p = new Properties();
p.setProperty("configurationFile", "jgroups-tcp.xml");
gc.setTransportProperties(p);
c.setTransactionManagerLookupClass(
"org.infinispan.transaction.lookup.GenericTransactionManagerLookup" );
EmbeddedCacheManager cm = new DefaultCacheManager( gc, c );
We start JBoss with:
%JBOSS_HOME%\bin\run.bat -c all -b xxx.xxx.xxx.61 -Djgroups.udp.mcast_addr=229.11.12.12 -Djgroups.udp.mcast_port=45699 -Djgroups.tcp.address=xxx.xxx.xxx.61 -Djgroups.tcp.port=1234 -g PROJ_Cluster_1_Partition -Djboss.service.binding.set=ports-01 -Djboss.messaging.ServerPeerID=67
We have also verified that multicasting is working via the testers:
C:\jboss-5.1.0.GA>java -cp server\all\lib\jgroups.jar org.jgroups.tests.McastReceiverTest -mcast_addr 229.11.12.12 -port 45699 C:\jboss-5.1.0.GA>java -cp server\all\lib\jgroups.jar org.jgroups.tests.McastSenderTest -mcast_addr 229.11.12.12 -port 45699
The receivers saw the input from the sender on both nodes in the cluster very quickly and without issue.