2 Replies Latest reply on Oct 3, 2007 3:30 PM by brian.stansberry

    NestedBeanUnitTestCase failure with buddy replication enabled

    brian.stansberry

      Carlo,

      Re the Branch_4_2 failure.

      This is a can of worms.

      Worm 1)

      The test bean tracks when @PrePassivate and @PostActivate are called by incrementing counters; the test checks for the expected values as replication/passivation occur.

      Problem: if a @Remote bean is accessed on node0 and then passivated, the counter is incremented in @PrePassivate. But the passivation does not trigger replication, so the updated counter does not get replicated. If the call to check the passivation count then gets executed on node1, it sees the stale counter and we get an assertion failure. This is the fundamental cause of the intermittent failures in Hudson.

      Solution: make sure the calls all happen on the same node.
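
A minimal plain-Java simulation of the problem, for illustration only. It does not use the EJB container or JBC; `PassivationCounterSketch`, `Node`, and `BeanState` are hypothetical names, and it assumes the non-default case where the callback runs at passivation time rather than before replication.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the test scenario: plain Java, no EJB container involved.
final class PassivationCounterSketch {

    /** Bean state; the counter is non-transient, so it is part of what would replicate. */
    static final class BeanState {
        int prePassivateCount;

        BeanState copy() {
            BeanState c = new BeanState();
            c.prePassivateCount = prePassivateCount;
            return c;
        }
    }

    /** One cluster node holding its own copy of each bean's state. */
    static final class Node {
        final Map<String, BeanState> beans = new HashMap<>();

        // Replication pushes a snapshot of the current state to the peer.
        void replicateTo(Node peer, String id) {
            peer.beans.put(id, beans.get(id).copy());
        }

        // Passivation runs the callback locally and does NOT replicate afterwards.
        void passivate(String id) {
            beans.get(id).prePassivateCount++;
        }

        int getPrePassivate(String id) {
            return beans.get(id).prePassivateCount;
        }
    }

    /** Returns {count seen on node0, count seen on node1} after one passivation on node0. */
    static int[] run() {
        Node node0 = new Node();
        Node node1 = new Node();
        node0.beans.put("bean", new BeanState());
        node0.replicateTo(node1, "bean"); // end of the last request
        node0.passivate("bean");          // counter changes on node0 only
        return new int[] { node0.getPrePassivate("bean"), node1.getPrePassivate("bean") };
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println("node0=" + counts[0] + " node1=" + counts[1]);
    }
}
```

A check routed to node0 sees the incremented counter; the same check routed to node1 sees the value from the last replication, which is why pinning all calls to one node makes the test deterministic.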

      Question: since an @PrePassivate call could affect non-transient bean state, should we replicate the bean state after the @PrePassivate but before the actual passivation? Note this is only an issue if the bean is configured to disable the @PrePassivate call before replication; in the default case we don't call @PrePassivate before passivation since we already called it on the last request before replication. This is more a question for the new impl in trunk; don't think it's doable in Branch_4_2. I vote -0; seems like a real edge case.

      Worm 2)

      With buddy replication enabled, when data is replicated, JBC stores it on the remote nodes in a separate "backup" area of the cache tree. The JBC passivation code does not work properly in this area:

      http://jira.jboss.com/jira/browse/JBCACHE-1190
      http://jira.jboss.com/jira/browse/JBCACHE-1192

      Not fixable for 4.2.2 unless the webservice change screws everything up so badly that the release gets delayed a long time. *Must* be fixed for EAP 4.2.0.CP02.

      Impact on 4.2.2:

      a) Buddy replication isn't the default and isn't even mentioned in the cache config; people using it will have to go out of their way to set it up.
      b) If they do use buddy replication, JBCACHE-1190 means the backup copies of SFSBs will never get passivated from the cache to disk.
      c) By default, SFSBs have a remove timeout of 0; container never removes them. This combined with b) == memory leak.
      d) The workaround is for me to change the std cache config so it will evict things from the "backup" region based on a global set of rules. JBCACHE-1192 means that evicting beans could itself cause problems, so my change would configure eviction to occur only if a node is idle for a long time (say 30 mins). If a bean is inactive longer than that, and the client makes another request, AND that request fails over, then they'll get NoSuchEJBException.
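
To make the trade-off in d) concrete, here is a rough plain-Java sketch. It is not JBC eviction configuration and not the actual container code; `BackupRegionEvictionSketch`, `BackupRegion`, and `BeanGoneException` (a local stand-in for NoSuchEJBException) are hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical model of idle-based eviction from the "backup" region.
final class BackupRegionEvictionSketch {

    // The "say 30 mins" idle limit mentioned above.
    static final long MAX_IDLE_MILLIS = 30L * 60L * 1000L;

    /** Local stand-in for javax.ejb.NoSuchEJBException so the sketch needs no EJB jar. */
    static final class BeanGoneException extends RuntimeException {
        BeanGoneException(String id) {
            super("no such bean: " + id);
        }
    }

    /** Backup copies on the buddy node: bean id -> time of last replication. */
    static final class BackupRegion {
        private final Map<String, Long> lastTouched = new HashMap<>();

        void replicated(String id, long now) {
            lastTouched.put(id, now);
        }

        // Drop every backup copy that has been idle longer than the limit.
        void evictIdle(long now) {
            Iterator<Map.Entry<String, Long>> it = lastTouched.entrySet().iterator();
            while (it.hasNext()) {
                if (now - it.next().getValue() > MAX_IDLE_MILLIS) {
                    it.remove();
                }
            }
        }

        // A failed-over request can only be served if the backup copy still exists.
        void failover(String id) {
            if (!lastTouched.containsKey(id)) {
                throw new BeanGoneException(id);
            }
        }
    }

    /** True if a bean idle for idleMillis can still be found after eviction runs. */
    static boolean survivesFailover(long idleMillis) {
        BackupRegion region = new BackupRegion();
        region.replicated("bean", 0L);
        region.evictIdle(idleMillis);
        try {
            region.failover("bean");
            return true;
        } catch (BeanGoneException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(survivesFailover(10L * 60L * 1000L)); // idle 10 mins
        System.out.println(survivesFailover(31L * 60L * 1000L)); // idle 31 mins
    }
}
```

In this model only the combination of a long idle period and a failover produces the exception; a bean that stays on its original node is unaffected by eviction of its backup copy.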