27 Replies. Latest reply on Feb 6, 2007 1:27 PM by clebert.suconic
      • 15. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
        timfox

         

        "clebert.suconic@jboss.com" wrote:


        That's what FailoverTest::testFailureRightBeforeSend and FailoverTest::testFailureRightAfterSend are about.



        Ummmm... but you said they didn't fail.

        If they didn't fail then they're not working correctly.


        • 16. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
          timfox

          Also, those tests run in a clustered environment, which is not necessary for this.

          • 17. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
            timfox

            What you need is this:

            In a *non clustered environment*:

            Send a single message, get the server to crash after the send but before the response is written.

            Try to send the message again

            You should get a PK violation.

            If you don't then you've written the test wrong.

            Then, create a JIRA task, and fix it as per my comments at the beginning of the thread.

            I would recommend a flag in the pm config which ignores duplicate sends and is on in a clustered config, but off otherwise.

            I am open to other ideas.
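
            A minimal Java sketch of the scenario described above, using an in-memory stand-in rather than JBoss Messaging itself. `FakeServer`, `ResponseLost` and `PrimaryKeyViolation` are hypothetical names invented for illustration; the set of stored ids plays the role of the table's primary key, and the proposed config flag is not modelled.

```java
import java.util.HashSet;
import java.util.Set;

public class CrashAfterSendSketch {

    /** Stands in for the database rejecting a second row with the same key. */
    static class PrimaryKeyViolation extends RuntimeException {
        PrimaryKeyViolation(long messageId) {
            super("Duplicate entry '" + messageId + "'");
        }
    }

    /** Stands in for the connection dying before the response reaches the client. */
    static class ResponseLost extends RuntimeException {
    }

    /** Hypothetical single-node server that persists each message id exactly once. */
    static class FakeServer {
        private final Set<Long> storedMessageIds = new HashSet<>();
        boolean crashBeforeResponse;

        void send(long messageId) {
            if (!storedMessageIds.add(messageId)) {
                throw new PrimaryKeyViolation(messageId);
            }
            if (crashBeforeResponse) {
                crashBeforeResponse = false; // crash only once
                throw new ResponseLost();
            }
        }
    }

    /** Runs the scenario and reports whether the retry hit the key violation. */
    static boolean retryHitsPkViolation() {
        FakeServer server = new FakeServer();
        server.crashBeforeResponse = true;
        try {
            server.send(4352L);
            return false; // the first send was supposed to lose its response
        } catch (ResponseLost e) {
            // the message is stored, but the client does not know that
        }
        try {
            server.send(4352L); // retry with the same message id
            return false;
        } catch (PrimaryKeyViolation expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("retry hit PK violation: " + retryHitsPkViolation());
    }
}
```

            A real test would need to kill the server between the insert and the response; the sketch only shows the expected shape of the outcome.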

            • 18. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
              clebert.suconic

              I found out why testFailureRightAfterSend was not failing.

              I was expecting the message to be delivered only to the failedOver consumer, but it is also being delivered to a pre-existing consumer on the new node.

              So I have changed the testcase and I can now verify a duplication. It's not throwing a PrimaryKey exception because the server is creating an extra reference (one for the failedOver queue, and one for the original queue on the new server).

              • 19. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                timfox

                If you're not getting a PK violation, either the message is non persistent, or it's not the same message you're sending. Is the message id the same?

                • 20. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                  clebert.suconic

                  The message is persistent...

                  It's creating a new reference to the same message.


                  I commented out the receives from the testcase and printed the queries:

                  mysql> select * from jms_message;
                  +-----------+----------+------------+---------------+----------+-------------+--------------------+--------------+------+---------+---------------+---------------------+-----------------------------------+---------+---------------+
                  | MESSAGEID | RELIABLE | EXPIRATION | TIMESTAMP | PRIORITY | COREHEADERS | PAYLOAD | CHANNELCOUNT | TYPE | JMSTYPE | CORRELATIONID | CORRELATIONID_BYTES | DESTINATION | REPLYTO | JMSPROPERTIES |
                  +-----------+----------+------------+---------------+----------+-------------+--------------------+--------------+------+---------+---------------+---------------------+-----------------------------------+---------+---------------+
                  before-poison | 2 | 5 | NULL | NULL | NULL | QJBossQueue[testDistributedQueue] | NULL | NULL |
                  +-----------+----------+------------+---------------+----------+-------------+--------------------+--------------+------+---------+---------------+---------------------+-----------------------------------+---------+---------------+
                  1 row in set (0.00 sec)
                  
                  mysql> select * from jms_message_reference;
                  +-----------+-----------+---------------+-------+-------------------+----------+---------------+----------+--------+----------------+
                  | CHANNELID | MESSAGEID | TRANSACTIONID | STATE | ORD               | PAGE_ORD | DELIVERYCOUNT | RELIABLE | LOADED | SCHED_DELIVERY |
                  +-----------+-----------+---------------+-------+-------------------+----------+---------------+----------+--------+----------------+
                  | 0         | 4352      | NULL          | C     | 38361815582113792 | NULL     | 0             | Y        | NULL   | 0              |
                  | 10        | 4352      | NULL          | C     | 38361815340220416 | NULL     | 0             | Y        | NULL   | 0              |
                  +-----------+-----------+---------------+-------+-------------------+----------+---------------+----------+--------+----------------+
                  2 rows in set (0.00 sec)
                  
                  mysql> select * from jms_postoffice;
                  +-----------------+---------+----------------------+----------------------------+----------+------------+----------------+----------------+
                  | POSTOFFICE_NAME | NODE_ID | QUEUE_NAME           | COND                       | SELECTOR | CHANNEL_ID | IS_FAILED_OVER | FAILED_NODE_ID |
                  +-----------------+---------+----------------------+----------------------------+----------+------------+----------------+----------------+
                  | Clustered JMS   | 0       | testDistributedQueue | queue.testDistributedQueue | NULL     | 0          | N              | NULL           |
                  | Clustered JMS   | 0       | testDistributedQueue | queue.testDistributedQueue | NULL     | 10         | Y              | 1              |
                  +-----------------+---------+----------------------+----------------------------+----------+------------+----------------+----------------+
                  2 rows in set (0.00 sec)
                  


                  • 21. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                    clebert.suconic

                    The query output is kind of hard to read here on the forum, but all that matters is that you have one message and two references (one for queueId=0 and another one for queueId=10).

                    10 is the failedOver queue, and 0 is the original queue on the node.
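
                    For a testcase, the same check can be done in code instead of by reading query output. The Java sketch below is only an illustration: `Ref` and `duplicatedAcrossChannels` are hypothetical names, and the rows are the two (CHANNELID, MESSAGEID) pairs from the jms_message_reference output above, typed in by hand rather than read from a database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReferenceRows {

    /** One row of jms_message_reference, reduced to its two key columns. */
    static final class Ref {
        final long channelId;
        final long messageId;

        Ref(long channelId, long messageId) {
            this.channelId = channelId;
            this.messageId = messageId;
        }
    }

    /**
     * Groups channel ids by message id and keeps only the messages that are
     * referenced from more than one channel.
     */
    static Map<Long, List<Long>> duplicatedAcrossChannels(List<Ref> rows) {
        Map<Long, List<Long>> byMessage = new LinkedHashMap<>();
        for (Ref r : rows) {
            byMessage.computeIfAbsent(r.messageId, k -> new ArrayList<>()).add(r.channelId);
        }
        byMessage.values().removeIf(channels -> channels.size() < 2);
        return byMessage;
    }

    public static void main(String[] args) {
        List<Ref> rows = new ArrayList<>();
        rows.add(new Ref(0, 4352));  // original queue on the node
        rows.add(new Ref(10, 4352)); // failedOver queue
        System.out.println(duplicatedAcrossChannels(rows)); // prints {4352=[0, 10]}
    }
}
```

                    Because the two rows differ in channel id, they never collide with each other, which is consistent with no PrimaryKey exception being thrown here.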

                    • 22. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                      timfox

                      Ah right.

                      This is a clustered setup where you have multiple partial queues. I was thinking you were retrying on the same partial queue.

                      • 23. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                        clebert.suconic

                         

                        "timfox" wrote:
                        Ah right.
                        I was thinking you were retrying on the same partial queue.


                        That's what I was expecting.. only the failedOver queue being used on the new server. But then I discovered the message being delivered somewhere I was not expecting.



                        • 24. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                          timfox

                          One idea:

                          When retrying a send (or a send transaction) after failover, you could add a flag in the request (checkForDuplicates = true).

                          On the server side, when it processes the request, if this flag is true, then before inserting the messages, you can check in the database to see if the messages already exist (pm.referenceExists()) and if so, you silently ignore the request.

                          This isn't perfect, and doesn't cope with the case where the message is acked before the duplicate arrives, but to deal with that we need duplicate message detection via a cache or something.
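
                          A rough Java sketch of that idea, under stated assumptions: only the names `checkForDuplicates` and `referenceExists` come from this thread. The `PersistenceManager` interface shown here, its method signatures, `InMemoryPm` and `handleSend` are hypothetical and do not reflect the real JBoss Messaging classes.

```java
import java.util.HashSet;
import java.util.Set;

public class CheckForDuplicatesSketch {

    /** Hypothetical slice of a persistence manager; signatures are assumptions. */
    interface PersistenceManager {
        boolean referenceExists(long channelId, long messageId);
        void addReference(long channelId, long messageId);
        void removeReference(long channelId, long messageId);
    }

    /** In-memory stand-in for the reference table. */
    static class InMemoryPm implements PersistenceManager {
        private final Set<String> refs = new HashSet<>();

        private static String key(long channelId, long messageId) {
            return channelId + "-" + messageId;
        }

        public boolean referenceExists(long channelId, long messageId) {
            return refs.contains(key(channelId, messageId));
        }

        public void addReference(long channelId, long messageId) {
            if (!refs.add(key(channelId, messageId))) {
                throw new IllegalStateException("Duplicate entry '" + key(channelId, messageId) + "'");
            }
        }

        public void removeReference(long channelId, long messageId) {
            refs.remove(key(channelId, messageId));
        }
    }

    /** Returns true if the reference was stored, false if the retry was silently ignored. */
    static boolean handleSend(PersistenceManager pm, long channelId, long messageId,
                              boolean checkForDuplicates) {
        if (checkForDuplicates && pm.referenceExists(channelId, messageId)) {
            return false; // already persisted by the first attempt
        }
        pm.addReference(channelId, messageId);
        return true;
    }

    public static void main(String[] args) {
        InMemoryPm pm = new InMemoryPm();
        System.out.println(handleSend(pm, 10, 4353, false)); // first attempt is stored
        System.out.println(handleSend(pm, 10, 4353, true));  // retry is ignored

        // The gap mentioned above: once the message is acked (reference removed),
        // a late retry is no longer recognised as a duplicate.
        pm.removeReference(10, 4353);
        System.out.println(handleSend(pm, 10, 4353, true));  // stored again
    }
}
```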

                          • 25. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                            clebert.suconic

                            that works (besides the other synchronized case where ACKed already happened).

                            I'm doing these changes:

                            - Adding a method on the postOffice, as I didn't want to deal with the PM in the ServerConnectionEndpoint, nor to expose the PM on the PostOffice:

                            boolean messageExists(long messageId) throws Exception;


                            - On FailoverValveInterceptor, I'm adding a header to the JBossMessage when a failure occurs.

                            - I will also set a flag on TransactionRequest (protected boolean retry) but I will need to change the wireFormat a little bit to cope with that.

                            • 26. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                              timfox

                               

                              "clebert.suconic@jboss.com" wrote:
                              that works (besides the other synchronized case where ACKed already happened).

                              I'm doing these changes:

                              - Adding a method on the postOffice, as I didn't want to deal with the PM in the ServerConnectionEndpoint, nor to expose the PM on the PostOffice:


                              It's ok to talk to the pm directly. I don't see the value of adding methods to the postoffice if they then just talk directly to the pm.


                              - On FailoverValveInterceptor, I'm adding a header to the JBossMessage when a failure occurs.

                              - I will also set a flag on TransactionRequest (protected boolean retry) but I will need to change the wireFormat a little bit to cope with that.


                              I would prefer send() and sendTransaction() to be dealt with consistently.

                              I.e. add a new parameter in both cases rather than a header.

                              Yes, this would involve changes (add a boolean) to the wireformat and change of the signature of send() and sendTransaction().
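
                              To illustrate what "add a boolean to the wireformat" could look like, here is a small Java sketch. The byte layout (a long message id followed by one boolean) and the class name `SendRequestSketch` are assumptions made up for this example; the actual JBoss Messaging wire format is not shown in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SendRequestSketch {
    final long messageId;
    final boolean checkForDuplicates; // the new parameter, set only on a retry

    SendRequestSketch(long messageId, boolean checkForDuplicates) {
        this.messageId = messageId;
        this.checkForDuplicates = checkForDuplicates;
    }

    /** Serializes the request: 8 bytes of message id, then 1 byte for the flag. */
    byte[] write() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(messageId);
            out.writeBoolean(checkForDuplicates);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the fields back in the same order they were written. */
    static SendRequestSketch read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long id = in.readLong();
            boolean flag = in.readBoolean();
            return new SendRequestSketch(id, flag);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = new SendRequestSketch(4353L, true).write();
        SendRequestSketch decoded = read(wire);
        System.out.println(wire.length + " bytes, id=" + decoded.messageId
                + ", checkForDuplicates=" + decoded.checkForDuplicates);
    }
}
```

                              Carrying the flag as an explicit field in both request types keeps send() and sendTransaction() symmetric, which is the consistency asked for above.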


                              • 27. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                                clebert.suconic

                                I'm about to commit the fix for http://jira.jboss.org/jira/browse/JBMESSAGING-809

                                But I realized something..

                                If I disable the retry detection (that is, the query that checks for the duplicate), I get a PrimaryKey duplication exception, but the JDBCPersistenceManager always manages to perform the update after a retry (see the log below).

                                I don't have any other thread receiving messages (the receive happens only after the send has completed), so I don't know what would delete the reference and allow the retry to work.

                                11:23:01,267 WARN @WorkerThread#1[127.0.0.1:45094] [JDBCPersistenceManager] SQLException caught - assuming deadlock detected, try:1
                                java.sql.BatchUpdateException: Duplicate entry '10-4353' for key 1
                                 at com.mysql.jdbc.ServerPreparedStatement.executeBatch(ServerPreparedStatement.java:648)
                                 at org.jboss.resource.adapter.jdbc.WrappedStatement.executeBatch(WrappedStatement.java:517)
                                 at org.jboss.messaging.core.plugin.JDBCPersistenceManager.updateWithRetry(JDBCPersistenceManager.java:3454)
                                 at org.jboss.messaging.core.plugin.JDBCPersistenceManager.updateWithRetryBatch(JDBCPersistenceManager.java:3337)
                                 at org.jboss.messaging.core.plugin.JDBCPersistenceManager.handleBeforeCommit1PC(JDBCPersistenceManager.java:2021)
                                 at org.jboss.messaging.core.plugin.JDBCPersistenceManager$TransactionCallback.beforeCommit(JDBCPersistenceManager.java:3709)
                                 at org.jboss.messaging.core.tx.Transaction.commit(Transaction.java:201)
                                 at org.jboss.jms.server.endpoint.ServerConnectionEndpoint.sendTransaction(ServerConnectionEndpoint.java:436)
                                 at org.jboss.jms.server.endpoint.advised.ConnectionAdvised.org$jboss$jms$server$endpoint$advised$ConnectionAdvised$sendTransaction$aop(ConnectionAdvised.java:99)
                                 at org.jboss.jms.server.endpoint.advised.ConnectionAdvised$sendTransaction_N3268650789275322226.invokeNext(ConnectionAdvised$sendTransaction_N3268650789275322226.java)
                                 at org.jboss.jms.server.container.ServerLogInterceptor.invoke(ServerLogInterceptor.java:105)
                                 at org.jboss.jms.server.endpoint.advised.ConnectionAdvised$sendTransaction_N3268650789275322226.invokeNext(ConnectionAdvised$sendTransaction_N3268650789275322226.java)
                                 at org.jboss.jms.server.endpoint.advised.ConnectionAdvised.sendTransaction(ConnectionAdvised.java)
                                 at org.jboss.jms.wireformat.ConnectionSendTransactionRequest.serverInvoke(ConnectionSendTransactionRequest.java:81)
                                 at org.jboss.jms.server.remoting.JMSServerInvocationHandler.invoke(JMSServerInvocationHandler.java:125)
                                 at org.jboss.remoting.ServerInvoker.invoke(ServerInvoker.java:715)
                                 at org.jboss.remoting.transport.socket.ServerThread.processInvocation(ServerThread.java:553)
                                 at org.jboss.remoting.transport.socket.ServerThread.dorun(ServerThread.java:378)
                                 at org.jboss.remoting.transport.socket.ServerThread.run(ServerThread.java:158)
                                11:23:01,268 WARN @WorkerThread#1[127.0.0.1:45094] [JDBCPersistenceManager] Trying again after a pause
                                11:23:01,765 WARN @WorkerThread#1[127.0.0.1:45094] [JDBCPersistenceManager] Update worked after retry
                                11:23:01,766 TRACE @WorkerThread#1[127.0.0.1:45094] [JDBCPersistenceManager] Batch update INSERT INTO JMS_MESSAGE_REFERENCE (CHANNELID, MESSAGEID, TRANSACTIONID, STATE, ORD, PAGE_ORD, DELIVERYCOUNT, RELIABLE, SCHED_DELIVERY) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?), inserted total of 0 rows
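
                                The log message "SQLException caught - assuming deadlock detected" suggests the retry wrapper treats every SQLException alike. The Java sketch below shows one way a retry loop could tell a key violation apart from a transient failure, using the standard SQLState class "23" (integrity constraint violation). It is not the JDBCPersistenceManager code: `updateWithRetry` here, the `Update` interface and `isConstraintViolation` are hypothetical, and the pause between attempts is left out.

```java
import java.sql.SQLException;

public class RetryClassificationSketch {

    /** A unit of work that may fail with a SQLException (hypothetical interface). */
    interface Update {
        int run() throws SQLException;
    }

    /** SQLState values starting with "23" denote integrity-constraint violations. */
    static boolean isConstraintViolation(SQLException e) {
        String state = e.getSQLState();
        return state != null && state.startsWith("23");
    }

    /**
     * Retries the update up to maxAttempts times, but rethrows immediately
     * when the failure is a constraint violation, since retrying cannot fix it.
     */
    static int updateWithRetry(Update update, int maxAttempts) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return update.run();
            } catch (SQLException e) {
                if (isConstraintViolation(e)) {
                    throw e;
                }
                last = e; // possibly transient, e.g. a deadlock; try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        try {
            updateWithRetry(() -> {
                throw new SQLException("Duplicate entry '10-4353' for key 1", "23000");
            }, 3);
        } catch (SQLException e) {
            System.out.println("not retried: " + e.getMessage());
        }
    }
}
```

                                This does not explain why the retried batch reports 0 rows inserted; it only keeps a duplicate from being hidden behind a retry.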
                                

