-
1. Re: 5.1.x questions regarding error handling and behaviour
galder.zamarreno Nov 7, 2011 10:01 AM (in response to tfromm)
Re 1) Maybe NodeB dropped? We'd have to see some logs with TRACE on org.infinispan package to verify.
Re 2a) First of all, why are you casting to CacheImpl? You shouldn't need to do that. AdvancedCache, which you retrieve via cache.getAdvancedCache() has the lock() API you're after.
Maybe you can try running with a single local node and a JDBC cache store, and try to get some thread dumps after modification and again, some TRACE logs... Please also post the config you're using.
Re 2b) Those deadlock exceptions might be due to modification of the same entry from different nodes. This problem is going away in Infinispan 5.1 because we'll only acquire locks on a single node in the cluster: http://community.jboss.org/wiki/SingleNodeLockingModel
I dunno what those NotSupportedExceptions are about. Post the stacktrace or log file.
-
2. Re: 5.1.x questions regarding error handling and behaviour
tfromm Nov 8, 2011 1:51 AM (in response to galder.zamarreno)
Sorry for not attaching this information :-)
The configuration for all nodes is attached. I use 5.1.0 Beta3
To 1)
The trace for problem 1 is attached as 1.log. The lines starting with "View changed" are where I write the events I get at the listener to System.out.
As you can see, these events appear very early when I start the 2nd node. No more events are fired.
To 2) I changed it to getAdvancedCache().lock(..)
To 2a) I've attached the E01.java and the E01.xml as configuration. In the following configuration, the E01 hangs.
It will work if you remove the transport-element and/or use a local loader e.g. h2 mem or file.
Setting <clustering mode="local"/> for the LOCALCS cache also changes nothing.
Note: I tested it today with MySQL and there it works. It seems to be an Oracle-related issue.
To 2b) Yes, this node should be modified from different nodes. Then I'll wait for BETA 4 and hope for the best :-) At first I thought the eagerLockSingleNode option was what was meant by that.
-
1.log.zip 3.2 KB
-
infinispan.xml 10.7 KB
-
E01.java.zip 505 bytes
-
E01.xml 4.0 KB
-
E01.log.zip 1.5 KB
-
-
3. Re: 5.1.x questions regarding error handling and behaviour
galder.zamarreno Nov 8, 2011 3:39 AM (in response to tfromm)
Re 1) What other events are you expecting? A new view is installed with obelix-39463 and then this node leaves (I dunno why, you'd need TRACE logging on org.jgroups to find that out exactly). No other views are installed.
Re 2a) As said earlier, to figure out what's wrong with Oracle, we need a log with TRACE on org.infinispan *and* thread dumps (i.e. kill -3 <pid>) when the system hangs.
Re 2b) The alternative at the moment, till single lock owner is in, is to retry operations/transactions.
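A minimal sketch of what such a retry could look like, in plain Java with no Infinispan dependency. The class name RetryingExecutor, the method callWithRetry, and the linear backoff are all hypothetical choices for illustration. The sketch retries on any exception; in a real application you would probably retry only on whatever exception signals the deadlock, and put the whole transaction (begin, cache operations, commit or rollback) inside the Callable.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of Infinispan: retries a unit of work a bounded number of times. */
final class RetryingExecutor {

    private RetryingExecutor() {}

    /**
     * Runs the work and retries it when it throws, up to maxAttempts in total.
     * Sleeps backoffMillis * attemptNumber between attempts (simple linear backoff).
     * Rethrows the last failure if every attempt fails.
     */
    static <T> T callWithRetry(Callable<T> work, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.call();
            } catch (InterruptedException ie) {
                // Do not retry when the thread was interrupted; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw ie;
            } catch (Exception e) {
                // Assumption: every other exception is treated as retryable in this sketch.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last;
    }
}
```

Usage would be along the lines of RetryingExecutor.callWithRetry(() -> doTransactionalUpdate(), 5, 100L), where doTransactionalUpdate is a placeholder for your own code.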
-
4. Re: 5.1.x questions regarding error handling and behaviour
tfromm Nov 8, 2011 5:11 AM (in response to galder.zamarreno)
1) Traces for infinispan + jgroups and the example source are attached as E03*.
In the trace I see that the new node joins and leaves the cluster a short time later. I get events for these changes.
At the time of the event with viewId=2 the cluster size is 1. A few seconds later the cluster size is 2, but I haven't received an event for that.
The startup of the 1st node finishes at
2011-11-08 11:01:32,585 [DEBUG] org.infinispan.CacheImpl - Started cache DISTCS on obelix-11061
2a) Attached the Threaddump. E01.dmp.zip
2b) Ok.
-
E01.dmp.zip 3.1 KB
-
E03.xml 1.4 KB
-
E03.java.zip 812 bytes
-
E03.log.zip 7.9 KB
-
-
5. Re: 5.1.x questions regarding error handling and behaviour
galder.zamarreno Nov 9, 2011 5:31 AM (in response to tfromm)
Re 1) The reason they split is because FD_SOCK cannot open a socket between the two nodes:
2011-11-08 11:02:02,191 [DEBUG] org.jgroups.protocols.FD_SOCK - could not create socket to obelix-2176
So, it thinks that the other node is down. You can either disable the firewall for these tests, or tie the socket to a particular port that's open on both nodes.
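To check whether a given TCP port on the other node is actually reachable through the firewall, a small probe like the following may help. It is a plain-Java sketch, independent of JGroups. The class name PortProbe is hypothetical, and the host and port you pass in are whatever you have configured for the failure-detection socket.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical diagnostic helper: checks whether a TCP connection to host:port can be opened. */
final class PortProbe {

    private PortProbe() {}

    /** Returns true if a TCP connection could be established within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, filtered by a firewall, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Usage: java PortProbe <host> <port>
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
    }
}
```

Run it from each node against the other one while the target node is up. If it reports false in either direction, the failure detector will most likely hit the same problem.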
So, what happens afterwards is that they merge:
2011-11-08 11:02:09,272 [DEBUG] org.jgroups.protocols.pbcast.GMS - obelix-11061: view is MergeView::[obelix-2176|3] [obelix-2176, obelix-11061], subgroups=[[obelix-11061|1] [obelix-2176], [obelix-11061|2] [obelix-11061]]
To deal with merges, you need to handle @Merged too, so you could do:
@Merged
@ViewChanged
public void handleViewChange(final ViewChangedEvent e) {
   ....
}
Re 2a) The thread dump seems to indicate that Infinispan is reading an entry from a binary stream connecting to the database. How big are the objects you're storing? Could you get several thread dumps? I.e. every 30 seconds or so.
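If sending kill -3 by hand every 30 seconds is inconvenient, periodic dumps can also be taken from inside the JVM. The following is a sketch using the standard java.lang.management API. The class name ThreadDumper and the method names are hypothetical. Note that it only dumps the JVM it runs in, so it would have to be started from the test program itself before the hang occurs; for an external process, kill -3 or jstack remains the way to go.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Date;

/** Hypothetical helper: writes a dump of all threads of the current JVM at a fixed interval. */
final class ThreadDumper {

    private ThreadDumper() {}

    /** Builds a textual dump with the name, state and full stack trace of every live thread. */
    static String dumpAllThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip monitor/synchronizer details, which not every JVM supports.
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Starts a daemon thread that prints a dump to System.err every intervalMillis. */
    static Thread startPeriodicDumps(final long intervalMillis) {
        Thread dumper = new Thread(() -> {
            try {
                while (true) {
                    System.err.println("=== thread dump at " + new Date() + " ===");
                    System.err.println(dumpAllThreads());
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                // Interrupting the dumper thread stops it.
                Thread.currentThread().interrupt();
            }
        }, "periodic-thread-dumper");
        dumper.setDaemon(true);
        dumper.start();
        return dumper;
    }
}
```

Calling ThreadDumper.startPeriodicDumps(30000L) at the start of the test would give one dump every 30 seconds. Comparing consecutive dumps shows whether the reading thread is stuck in the same frame or making slow progress.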
-
6. Re: 5.1.x questions regarding error handling and behaviour
tfromm Nov 9, 2011 9:48 AM (in response to galder.zamarreno)