Issue with JMS message loss on JBoss shutdown
rwest_bsg Apr 1, 2010 2:38 PM
I've been working on an approach to improve our JMS infrastructure. At present, we still use the default JMS broker internal to JBoss for our main production application, in spite of having 10 servers running for different purposes and clients. None are clustered, so each server individually processes all of the messages that server publishes. We also have other products that are starting to need a messaging infrastructure, and we anticipate needing our various products to communicate with each other through it.
As a result, I've been putting together an approach to implement an external messaging broker that our Java and non-Java systems can share, ensuring HA for the level of reliability we need for our applications. I've been evaluating both HornetQ and ActiveMQ for these purposes, experimenting with configurations, failover, reliability, ease of monitoring, etc. However, I've come across a rather difficult problem with both, which I suspect may have as much or more to do with JBoss than either of the broker implementations (hence my post here).
In both cases, for one of my tests I configured a single standalone external broker with a standard JCA resource adapter in JBoss. I deploy (among other things) a simple MDB that receives a text message, grabs a couple of properties along with the text, and simply logs the message. The bean uses AUTO_ACKNOWLEDGE and container-managed transactions, with a persistent queue. I have a simple web page that allows me to publish an arbitrary number of messages at a time. To add some simple variability in processing speed, every even-numbered message takes an extra second to process (via a Thread.sleep() call).
When I start up the broker and JBoss and publish messages, everything runs fine. However, if I shut down JBoss after all messages have been published but before all messages have been processed, during the shutdown process I will receive some number of errors from the broker's JCA code indicating an error trying to deliver a message to the MDB subsystem because the EJB container is shutting down/shut down. The number of errors can vary slightly due to timing issues, and obviously is slightly different between ActiveMQ and HornetQ, but the implication is the same in both cases: every message that results in one of these errors is lost. They are not sent back to the broker to be tried again, but are treated as if the message was properly acknowledged.
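To make the failure mode concrete, here is a minimal sketch in plain Java. It is a toy model, not code from JBoss, HornetQ, or ActiveMQ: the names Broker, Container, and run are hypothetical, and the assumption being illustrated is that a message acknowledged before the container accepts it is dropped when delivery fails, whereas a message acknowledged only after successful delivery goes back to the queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: Broker and Container are hypothetical stand-ins,
// not classes from JBoss, HornetQ, or ActiveMQ.
final class ShutdownAckModel {

    /** Stand-in for a persistent queue on the external broker. */
    static final class Broker {
        private final Deque<String> queue = new ArrayDeque<>();
        void publish(String msg) { queue.addLast(msg); }
        String take() { return queue.pollFirst(); }          // hand a message to the adapter
        void redeliver(String msg) { queue.addFirst(msg); }  // message returns after a rollback
        int pending() { return queue.size(); }
    }

    /** Stand-in for the EJB container hosting the MDB. */
    static final class Container {
        private boolean running = true;
        final List<String> processed = new ArrayList<>();
        void shutDown() { running = false; }
        void deliver(String msg) {
            if (!running) {
                throw new IllegalStateException("container is shut down");
            }
            processed.add(msg);
        }
    }

    /**
     * Publishes `total` messages, shuts the container down once `shutdownAfter`
     * messages have been processed, and returns how many messages are neither
     * processed nor still on the broker (that is, lost).
     */
    static int run(boolean ackBeforeDelivery, int total, int shutdownAfter) {
        Broker broker = new Broker();
        for (int i = 1; i <= total; i++) {
            broker.publish("msg-" + i);
        }
        Container container = new Container();
        String msg;
        while ((msg = broker.take()) != null) {
            if (container.processed.size() == shutdownAfter) {
                container.shutDown();
            }
            try {
                container.deliver(msg);
            } catch (IllegalStateException e) {
                if (!ackBeforeDelivery) {
                    broker.redeliver(msg); // not yet acknowledged: goes back to the queue
                    break;                 // stop consuming; the container is gone
                }
                // Already acknowledged: the broker considers it done, so it is dropped.
            }
        }
        return total - container.processed.size() - broker.pending();
    }

    public static void main(String[] args) {
        System.out.println("lost, ack before delivery: " + run(true, 10, 4));
        System.out.println("lost, ack after delivery:  " + run(false, 10, 4));
    }
}
```

The behaviour I am seeing matches the first mode of this model. What I would expect from a transacted MDB on a persistent queue is the second mode, where a failed delivery leaves the message on the broker.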
If only one broker had an issue, I would suspect an error in that broker's JCA implementation. However, since both brokers fail in nearly the same way, it seemed to make more sense to start out here. The only practical difference seems to be that ActiveMQ either grabs more messages at once or is faster at grabbing more, so in most cases it loses all of the remaining messages, as opposed to only 1-4 lost with HornetQ. I can cross-post to the brokers' forums if people feel that is necessary.
I've attached two tar files, one for HornetQ and one for ActiveMQ. Both contain the broker configuration used, everything that was deployed to the JBoss deploy directory in terms of the JCA functionality, the JBoss server log from a representative test run, and Eclipse projects for my testing ear with full source code. The only material bit of JBoss config not contained in the tar files is that I added "-Dorg.hornetq.logger-delegate-factory-class-name=org.hornetq.integration.logging.Log4jLogDelegateFactory" to JAVA_OPTS in run.sh, along with DEBUG logging for org.hornetq and WARN logging for org.hornetq.utils.UTF8Util in the jboss-log4j.xml config file (UTF8Util is extremely spammy in terms of logging, and doesn't seem to be relevant for this test). Without that configuration, the log entries showing that messages failed to be delivered will not appear (probably because they're going to a hidden or unconfigured JUL logger).
I'm hoping there's something simple wrong with my configuration, but I've spent two days trying various things and haven't been able to track anything down as of yet. I've tried clustered and non-clustered brokers, I've tried tweaking various ActivationConfig properties and JCA properties, setting values that are supposed to be the default and tweaking prefetch sizes and the like, but nothing has had any impact on this issue. I could probably be content with some duplicate delivery, but missing messages are not an option for the kinds of messages we're passing around. If I've missed any potentially relevant configuration files, please let me know.
Version info:
JBoss 5.1.0 GA (we use a local build, and have updated JBoss Cache to 3.2.1.GA, JBossTS to 4.6.1.GA_CP03 and JGroups to 2.6.13.GA to pick up bug fixes for issues we have encountered in production)
ActiveMQ 5.3.0 GA (tried 5.3.1 GA, but there is a mutex bug that was preventing MDB processing from picking up after a broker failover)
HornetQ 2.0.0 GA
Java 1.6.0_17
I've been running my tests on Mac OS 10.6.2. I can run them on a Linux distro if someone thinks that might be relevant. We use CentOS in production, so that's the eventual target platform.
- hornetq-error.tar 975.0 KB
- activemq-error.tar 7.6 MB