1 2 3 4 Previous Next 57 Replies Latest reply on Jul 22, 2010 2:55 AM by azserve.luca Go to original post
      • 15. Re: IllegalStateException when loading journal on startup
        bjchip

        I am going to chime in here as I have had this happen to me several times.  Perhaps the additional context will help.

         

        Java 1.6

        JBoss 4.2.3.GA

        Hornetq 2.0.0.GA inside the JBoss.  The datastore is local, not shared.

         

        The two instances of JBoss are clustered. 

         

        Initially this error appeared while the Hornetq was configured as a cluster but the multicast addresses were blocked without our knowing of it.

         

        Renamed the clusters and wiped the data stores.  It was difficult to correct.   The proximate cause was faulty communications between two instances.

         

        The most recent occurrence is with the Hornetq configured  live-backup.    For convenience JBoss1 HQ(live)  and JBoss 2 HQ(backup).

         

        The JBoss apps are load-balanced and JBoss 2 is using the HQ on JBoss 1.  HAJNDI.

         

        The network hiccupped.  I am still trying to discover just what the network logs say.  JBoss 2 HQ logged several warnings about network connectivity.  There were other problems with JBoss 1 and the logs there are not readable.  The server was however, running.

         

         

        2010-05-27 09:12:17,729 WARN  [org.hornetq.core.remoting.impl.RemotingConnectionImpl] Connection failure has been detected: Did not receive data from server for org.hornetq.integration.transports.netty.NettyConnection@1fc0b1db[local= /10.65.90.230:43743, remote=/10.65.90.229:5445] [code=3]

        .

        .

        2010-05-27 09:12:17,775 WARN  [org.hornetq.core.remoting.impl.RemotingConnectionImpl] Connection failure has been detected: Did not receive data from server for org.hornetq.integration.transports.netty.NettyConnection@4fd2e906[local= /10.65.90.230:43744, remote=/10.65.90.229:5445] [code=3]

        2010-05-27 09:12:17,778 WARN  [org.hornetq.core.remoting.impl.RemotingConnectionImpl] Connection failure has been detected: Did not receive data from server for org.hornetq.integration.transports.netty.NettyConnection@2a5aac05[local= /10.65.90.230:35233, remote=/10.65.90.229:5445] [code=3]

        2010-05-27 09:12:17,818 WARN  [org.hornetq.core.remoting.impl.RemotingConnectionImpl] Connection failure has been detected: Did not receive data from server for org.hornetq.integration.transports.netty.NettyConnection@23dcb126[local= /10.65.90.230:35228, remote=/10.65.90.229:5445] [code=3]

         

        This was followed by:

        .

        .

         

        2010-05-27 09:12:17,910 DEBUG [com.hyro.hibernate.genericdao.impl.GenericDaoHibernateImpl] Looking up the Hibernate session bo
        und to the current thread via the key: org.hibernate.impl.SessionFactoryImpl@4c48d0c9
        2010-05-27 09:12:17,916 ERROR [STDERR] May 27, 2010 9:12:17 AM org.hornetq.core.logging.impl.JULLogDelegate info
        INFO: Activating backup server

        2010-05-27 09:12:17,916 INFO  [org.hornetq.core.persistence.impl.journal.JournalStorageManager] Using AIO Journal

        .

        .(pay no attention to the fragmentation, I cannot get files from these servers directly without shutting down other servers)

        .

        010-05-27 09:12:20,031 ERROR [org.hornetq.core.server.impl.HornetQPacketHandler] Failed to reattach session

         

        java.lang.IllegalStateException: Cannot find message 6542867308

                at org.hornetq.core.persistence.impl.journal.JournalStorageManager.loadMessageJournal(JournalStorageManager.java:874)
                 at org.hornetq.core.server.impl.HornetQServerImpl.loadJournals(HornetQServerImpl.java:1246)
                 at org.hornetq.core.server.impl.HornetQServerImpl.initialisePart2(HornetQServerImpl.java:1114)
                 at org.hornetq.core.server.impl.HornetQServerImpl.checkActivate(HornetQServerImpl.java:964)
                 at org.hornetq.core.server.impl.HornetQServerImpl.reattachSession(HornetQServerImpl.java:551)
                 at org.hornetq.core.server.impl.HornetQPacketHandler.handleReattachSession(HornetQPacketHandler.java:169)
                 at org.hornetq.core.server.impl.HornetQPacketHandler.handlePacket(HornetQPacketHandler.java:82)
                 at org.hornetq.core.remoting.impl.ChannelImpl.handlePacket(ChannelImpl.java:466)
                 at org.hornetq.core.remoting.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:423)
                 at org.hornetq.core.remoting.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:389)
                 at org.hornetq.core.remoting.server.impl.RemotingServiceImpl$DelegatingBufferHandler.bufferReceived(RemotingServiceImpl.java:432)
                 at org.hornetq.integration.transports.netty.HornetQChannelHandler.messageReceived(HornetQChannelHandler.java:67)
                 at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:98)
                 at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
                 at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:796)
                 at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:358)
                 at org.hornetq.integration.transports.netty.HornetQFrameDecoder2.decode(HornetQFrameDecoder2.java:171)
                 at org.hornetq.integration.transports.netty.HornetQFrameDecoder2.messageReceived(HornetQFrameDecoder2.java:136)
                 at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
                 at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
                 at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
                 at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:345)
                at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:332)
                 at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:323)
                 at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:275)
                 at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:196)
                 at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
                 at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
                 at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:179)
                 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                 at java.lang.Thread.run(Unknown Source)

         

        2010-05-27 09:12:20,086 ERROR [org.hornetq.core.client.impl.ClientSessionImpl] Failed to handle failover
        HornetQException[errorCode=0 message=null]
                 at org.hornetq.core.remoting.impl.ChannelImpl.sendBlocking(ChannelImpl.java:280)
                 at org.hornetq.core.client.impl.ClientSessionImpl.handleFailover(ClientSessionImpl.java:863)
                 at org.hornetq.core.client.impl.FailoverManagerImpl.reconnectSessions(FailoverManagerImpl.java:785)
                 at org.hornetq.core.client.impl.FailoverManagerImpl.failoverOrReconnect(FailoverManagerImpl.java:682)
                 at org.hornetq.core.client.impl.FailoverManagerImpl.handleConnectionFailure(FailoverManagerImpl.java:548)
                 at org.hornetq.core.client.impl.FailoverManagerImpl.access$600(FailoverManagerImpl.java:69)
                 at org.hornetq.core.client.impl.FailoverManagerImpl$DelegatingFailureListener.connectionFailed(FailoverManagerImpl.java:1111)
                 at org.hornetq.core.remoting.impl.RemotingConnectionImpl.callFailureListeners(RemotingConnectionImpl.java:445)

                at org.hornetq.core.remoting.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:250)

                at org.hornetq.core.client.impl.FailoverManagerImpl$PingRunnable$1.run(FailoverManagerImpl.java:1169)

                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

                at java.lang.Thread.run(Unknown Source)

         

         

        My theory at present, is that this message that is broken was originating on JBoss 2 and had the misfortune of encountering the network glitch.  (Again faulty communications between running instances of HQ)

         

        This resulted in it not being completely written anywhere but the attempt being embedded in the datastore. 

         

        That's just a swag.

         

        To date the only way I have been able to recover is by wiping the datastore.  I believe that the configuration used is itself a principle cause of this fault/vulnerability, but the inability to recover from it without very drastic measures does cause a lot of pain.

         

        I hope this helps suggest where to look. 

         

        I am going to look to configuration issues.  JBoss should be using a LOCAL copy of HQ as its primary resource somehow.   Clustered or else.   HA backups being separate from the cluster def. 

         

        respectfully

        BJ

        • 16. Re: IllegalStateException when loading journal on startup
          clebert.suconic

          This seems a different issue.

           

          I'm not sure what caused it. But the communication between live and backup needs to be reliable up to the point of the failure. You can't have hick ups between live and backup.

          • 17. Re: IllegalStateException when loading journal on startup
            bjchip

            It may be different but the ability to fix it in a more controlled/limited way would be really useful.

             

            I would think that the ways that this error can be triggered are numerous but the mechanism of the problem is the same one in all cases.  It thinks it should have a message that is not present.    There's a common thread ...  I think. .

             

             

            Question worth acking:

             

            What config would you recommend when the servers are more reliable than the net connection? 

             

             

             

            I am stuck at present because I am on a different project and can't even install an HAJNDI in my standalone HQ.   Not sure how to put more than one HQ into a JBoss server either.    It is 3 AM and I am still at work.  Ppphhhht!

             

            I'll come back tomorrow evening.

             

            respectfully

            BJ

            • 18. Re: IllegalStateException when loading journal on startup
              dfisher

              A brief explanation of our setup is probably in order:

              • JBoss instance with a topic and a core netty bridge, this is where message are enqueued
              • Another JBoss instance with several topics, one of which receives messages from the bridge. The listener on the bridged topic then enqueues messages on the other topics based on message content.

               

              What kind of load do you have?

              We're pushing ~10 messages per sec.

               

              what is the average size of your messages?

              We gzip an XML payload, they average around 10K.

               

              What partition type are you using where the journal is located?

              ext3 and we are using libaio

              • 19. Re: IllegalStateException when loading journal on startup
                dfisher

                Thanks for chiming in BJ.

                I've been unsuccessful at reproducing this in a simplified configuration and it may be that these issues are somehow related to network blips.

                The places we've seen this problem use a core netty bridge so I'll expand my configuration and see what happens.

                 

                Did you happen to save any instances of your corrupted data directory?

                If so you could test the latest code and see if your journal is recoverable.

                • 20. Re: IllegalStateException when loading journal on startup
                  timfox

                  Most likely cause of this is you have both live and backup activated and pointing at the same data directory at the same time

                  • 21. Re: IllegalStateException when loading journal on startup
                    dfisher

                    We are not using live-backup pairs.

                     

                    Wouldn't it be odd to configure both live and backup on the same host?

                    Just curious.

                    • 22. Re: IllegalStateException when loading journal on startup
                      timfox

                      They don't need to be on the same host.

                       

                      The journal could be located on a shared file system accessible by different nodes

                      • 23. Re: IllegalStateException when loading journal on startup
                        clebert.suconic

                        @dfisher:

                         

                        BTW: the journal directory must be on a real disk. No NFS please!

                         

                        There was a question a few posts ago about your load and what file system are you using:

                         

                        "What kind of load do you have?

                         

                        what is the average size of your messages?

                         

                        What partition type are you using where the journal is located?"

                        • 24. Re: IllegalStateException when loading journal on startup
                          clebert.suconic

                          Never mind.. I missed your answer.

                          • 25. Re: IllegalStateException when loading journal on startup
                            dfisher

                            BTW: the journal directory must be on a real disk. No NFS please!

                            Ok, we're using a local ext3 disk partition.

                            • 26. Re: IllegalStateException when loading journal on startup
                              dfisher

                              We saw this exception again, this time with TRACE logging.

                              Test run that enqueues 100K messages, JBoss shutdown with ~22K messages in the journal.

                               

                              HornetQ reports the following at shutdown:

                              2010-05-27 11:24:57,206 ERROR [org.hornetq.ra.inflow.HornetQMessageHandler] (Thread-20 (group:HornetQ-client-global-threads-799840866)) Failed to deliver message
                              java.lang.reflect.UndeclaredThrowableException
                                      at $Proxy275.onMessage(Unknown Source)
                                      at org.hornetq.ra.inflow.HornetQMessageHandler.onMessage(HornetQMessageHandler.java:259)
                                      at org.hornetq.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:809)
                                      at org.hornetq.core.client.impl.ClientConsumerImpl.access$100(ClientConsumerImpl.java:46)
                                      at org.hornetq.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:927)
                                      at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:96)

                              ...

                              Caused by: org.jboss.aop.DispatcherConnectException: EJB container is not completely started, or is stopped.
                                      at org.jboss.ejb3.BlockContainerShutdownInterceptor.invoke(BlockContainerShutdownInterceptor.java:62)

                              We see this exception ~30 times and as far as I know it's typical of any shutdown during message delivery.

                               

                              Mixed in with the previous exception is the following, which I've never noticed before:

                              2010-05-27 11:24:57,170 DEBUG [org.hornetq.ra.inflow.HornetQMessageHandler] (JBoss Shutdown Hook) Error releasing endpoint jboss.j2ee:ear=test-msg.ear,jar=test-messaging.jar,name=MessageBridgeBean,service=EJB3
                              java.lang.IllegalStateException: This message endpoint + MessageBridgeBean is already in use by another thread Thread[Thread-12 (group:HornetQ-client-global-threads-799840866),5,HornetQ-client-global-threads-799840866]
                                      at org.jboss.ejb3.mdb.inflow.MessageInflowLocalProxy.invoke(MessageInflowLocalProxy.java:117)
                                      at $Proxy275.release(Unknown Source)
                                      at org.hornetq.ra.inflow.HornetQMessageHandler.teardown(HornetQMessageHandler.java:199)
                                      at org.hornetq.ra.inflow.HornetQActivation.teardown(HornetQActivation.java:314)
                                      at org.hornetq.ra.inflow.HornetQActivation.stop(HornetQActivation.java:264)
                                      at org.hornetq.ra.HornetQResourceAdapter.endpointDeactivation(HornetQResourceAdapter.java:160)

                               

                              Then we see ~22K of this message:

                              2010-05-27 11:24:57,338 TRACE [org.hornetq.core.server.impl.ScheduledDeliveryHandlerImpl] (Thread-10 (group:HornetQ-remoting-threads46292925-656110261)) Scheduling delivery for Reference[2785815]:RELIABLE to occur at 1274973902338

                              Followed by normal shutdown logs:

                              2010-05-27 11:25:16,654 TRACE [org.hornetq.core.journal.impl.JournalImpl] (JBoss Shutdown Hook) Stopping the journal 2010-05-27 11:25:16,778 FINE  [org.hornetq.core.server.impl.HornetQServerImpl]
                              (JBoss Shutdown Hook) Waiting for java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@721173ee
                              2010-05-27 11:25:16,782 FINE  [org.hornetq.core.server.impl.HornetQServerImpl] (JBoss Shutdown Hook) Waiting for java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@3fba3a29
                              2010-05-27 11:25:16,782 INFO  [org.hornetq.core.server.impl.HornetQServerImpl] (JBoss Shutdown Hook) HornetQ Server version 2.1.0.BETA3 (Hungry Hornet, 117) stopped

                               

                              Server appears to shutdown cleanly, then on startup the hornetq logs don't show anything unusual, but the jboss logs show:

                              2010-05-27 11:27:33,398 ERROR [org.jboss.kernel.plugins.dependency.AbstractKernelController] (main) Error installing to Start: name=JMSServerManager state=Create
                              java.lang.IllegalStateException: Cannot find message 1926725
                                      at org.hornetq.core.persistence.impl.journal.JournalStorageManager.loadM
                              essageJournal(JournalStorageManager.java:881)
                                      at org.hornetq.core.server.impl.HornetQServerImpl.loadJournals(HornetQSe
                              rverImpl.java:1206)
                                      at org.hornetq.core.server.impl.HornetQServerImpl.initialisePart2(Hornet
                              QServerImpl.java:1056)
                                      at org.hornetq.core.server.impl.HornetQServerImpl.start(HornetQServerImp
                              l.java:310)
                                      at org.hornetq.jms.server.impl.JMSServerManagerImpl.start(JMSServerManag
                              erImpl.java:233)

                               

                              @Clebert: let me know if you have any questions about specific trace logging.

                              I have a bunch, but I didn't notice anything before shutdown that looked unusual.

                              • 27. Re: IllegalStateException when loading journal on startup
                                clebert.suconic

                                Can you share the journal data with me for this?

                                • 28. Re: IllegalStateException when loading journal on startup
                                  timfox

                                  According to the logs, scheduled messages are being used.

                                   

                                  Can you provide a test program that replicates the case?

                                  • 29. Re: IllegalStateException when loading journal on startup
                                    bjchip

                                    Not sure how that would happen.   We don't share a data directory. the disks are local.  

                                     

                                    I am  trying to get better information about what happened on "JBoss 1" as it was doing other untoward things before it dropped us in the soup and some permission, but as there is private data in the messages on the corrupted data it is not likely.   It has never been seen (if I understand this properly) in any non-clustered and non-live-backup configuration.

                                     

                                    The half-connected configuration did a fair job of getting into this state.   I think dfisher is correct in surmising that the issue is related to network connectivity in general.   Which begs the question of how best to cope with such a situation.

                                     

                                    I have been pushing the team to either isolated standalone instances with the clustering solely at the LB,  or to having a full cluster with redundancy and a backup behind each clustered live node.   There are reasons having to do with message sequencing, that they are reluctant to do the first.

                                     

                                    ... or running the backup nodes on machines not otherwise involved in the JBoss cluster. 

                                     

                                    My very first order of business once I get out of my other project crisis mode, is to try to put HAJINDI in a standalone HQ.   I'm going to put that into the JIRA as a request sometime in the next day or two.   Being able to tightly integrate with JBoss but without being embedded in and at the mercy of the rest of the applications (or having them at the mercy of my need to restart HQ)... that would be desirable even when we get to JBoss 6. 

                                     

                                    respectfully

                                    BJ