-
1. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
timfox Feb 2, 2007 5:57 PM (in response to clebert.suconic)Yes.
We should not barf if an ack is received and the message can't be found.
Actually this is valid even without failover.
In general, any invocation can fail while the response is being written to the caller, after the actual deed has already been done on the server. This applies to sends as well as acks.
In the case of sends this means the call to invoke() throws an exception but the message has actually reached the queue, in which case you don't know whether to retry or not since you don't want duplicate messages in the queue. This is where duplicate message detection becomes useful.
In the case of an ack it is always safe to retry the ack, *as long* as the server silently ignores the ack if the message can't be found. If we're not doing that already we should. -
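To illustrate the point about acks, here is a minimal sketch of an acknowledge operation that is safe to retry. It uses an in-memory map as a stand-in for the server-side store; AckStore, add() and acknowledge() are hypothetical names made up for this example and are not JBoss Messaging classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the server-side message store.
// The names here are illustrative, not JBoss Messaging APIs.
class AckStore {
    private final Map<Long, String> unacked = new HashMap<>();

    void add(long messageId, String payload) {
        unacked.put(messageId, payload);
    }

    // Returns true if this call removed the message, false if the message
    // was already gone. It never throws, so a retried ack is harmless.
    boolean acknowledge(long messageId) {
        return unacked.remove(messageId) != null;
    }

    int size() {
        return unacked.size();
    }
}

class AckDemo {
    public static void main(String[] args) {
        AckStore store = new AckStore();
        store.add(1L, "hello");
        System.out.println(store.acknowledge(1L)); // first ack removes the message
        System.out.println(store.acknowledge(1L)); // retried ack is silently ignored
    }
}
```

The boolean result is only there so a caller could log the "not found" case if it wants to; the important property is that the second call does not fail.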
2. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
timfox Feb 2, 2007 6:02 PM (in response to clebert.suconic)In other words, ideally all our operations should be idempotent. This is easy for acks, but not so easy for sends.
-
3. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
clebert.suconic Feb 2, 2007 6:58 PM (in response to clebert.suconic)As part of this discussion I'm also adding testFailureRightBeforeSend and testFailureRightAfterSend
-
4. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
clebert.suconic Feb 2, 2007 8:43 PM (in response to clebert.suconic)I have just committed a fix for this.
Please Tim and Ovidiu.. if you could take a look...
Especially on JDBCPersistenceManager... MultiThreadFailoverTest was failing on the changed line. If you think this change is not OK, I can investigate why this was happening.
I have used SVN comment as "http://jira.jboss.com/jira/browse/JBMESSAGING-808 - fix". You could locate the changes with this. -
5. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
timfox Feb 3, 2007 7:44 AM (in response to clebert.suconic)Looks good.
But when you make changes don't just comment things out, get rid of them if they are not needed.
if (rows != 1)
{
   // http://jira.jboss.com/jira/browse/JBMESSAGING-808
   log.warn("Failed to remove row for: " + ref);
   return;
   //throw new IllegalStateException("Failed to remove row for: " + ref);
}
Otherwise we end up with the code scattered with rubbish.
If we want to retrieve the previous version, it's in version control.
I'm not sure if the log.warn is really necessary either.
log.warn would imply there is probably something wrong. Is this the case? -
6. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
clebert.suconic Feb 3, 2007 1:37 PM (in response to clebert.suconic)"Tim Fox" wrote:
But when you make changes don't just comment things out, get rid of them if they are not needed.
Fair enough... I will remove these comments later (if you haven't done so yet).
"Tim Fox" wrote:
I'm not sure if the log.warn is really necessary either.
log.warn would imply there is probably something wrong. Is this the case?
It's not a problem.
I just wanted to log ACKs that are not found, in case we get lots of them (if something is wrong with our code at some point, for example).
And I kept it as log.warn because I thought it could point to something wrong with the config in a production system (say, for example, if someone deleted information from the database in the middle of an operation).
But we could use log.debug if you think it's better. -
7. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
clebert.suconic Feb 3, 2007 1:47 PM (in response to clebert.suconic)As part of the investigation of this issue, and per what you said about producers, I have found this other one:
http://jira.jboss.com/jira/browse/JBMESSAGING-809
It has to do with sending messages now. It's a rare event, but it can happen if you crash the server under high load (more than a 30% probability on MultiThreadFailover if numberOfThreadConsumers > numberOfThreadProducers; both thread counts are properties you can change on MultiThreadFailover). -
8. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
timfox Feb 5, 2007 7:43 AM (in response to clebert.suconic)Well... there's not much you can do about this other than implement duplicate message detection (there's already a task for this).
What you could do though, is isolate the exact case in a test.
This should be easy: use the PoisonInterceptor to crash the server as a call to send() is returning. Send a persistent message, so we know the message is in storage after the send even though the client receives an exception. Then try to send the message again, and you'll get an exception (probably a PK violation).
This would be solved properly by duplicate message detection.
Also, for a partial solution, we could introduce a flag "ignore PK violations" which ignores the send if it's already in the database.
That solution would only be partial: you could still get duplicate messages, since the original one might be acked before the second copy is sent. But at least it means that failover will work OK. -
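The difference between the partial "ignore PK violations" idea and proper duplicate detection can be sketched as follows. This is only an illustration under assumptions: DedupQueue, its rememberAckedIds flag, and the in-memory map standing in for the message table are all hypothetical, and a real implementation would need to bound how many IDs it remembers.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not JBoss Messaging code.
class DedupQueue {
    // Plays the role of the message table; the key acts as the primary key.
    private final Map<String, String> rows = new LinkedHashMap<>();
    // IDs of every message ever accepted, kept even after the ack.
    private final Set<String> seenIds = new HashSet<>();
    private final boolean rememberAckedIds;

    DedupQueue(boolean rememberAckedIds) {
        this.rememberAckedIds = rememberAckedIds;
    }

    // Returns true if the message was stored, false if it was ignored as a duplicate.
    boolean send(String id, String payload) {
        if (rows.containsKey(id)) {
            return false; // "ignore PK violations": the row is still there
        }
        if (rememberAckedIds && seenIds.contains(id)) {
            return false; // duplicate detection: the row is gone but the ID is remembered
        }
        rows.put(id, payload);
        seenIds.add(id);
        return true;
    }

    void ack(String id) {
        rows.remove(id);
    }

    int size() {
        return rows.size();
    }
}

class DedupDemo {
    public static void main(String[] args) {
        DedupQueue partial = new DedupQueue(false);
        partial.send("m1", "payload");
        partial.ack("m1");
        // The original was acked before the retry arrived, so the copy gets through.
        System.out.println("partial accepts late retry: " + partial.send("m1", "payload"));

        DedupQueue full = new DedupQueue(true);
        full.send("m1", "payload");
        full.ack("m1");
        System.out.println("full accepts late retry: " + full.send("m1", "payload"));
    }
}
```

With rememberAckedIds set to false, the sketch only protects against a retry that arrives while the original row still exists, which is the partial behaviour described above.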
9. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
clebert.suconic Feb 5, 2007 9:56 AM (in response to clebert.suconic)I forgot to say that I had already created the testcase.. (Using the PoisonInterceptor)
MultiThreadFailoverTest::testFailureOnSendReceiveSynchronized
It crashes the server when you have two threads, each one in a receive/send method. It always replicates the problem. -
10. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
timfox Feb 5, 2007 10:00 AM (in response to clebert.suconic)You could replicate this even more simply with just a single send, no receive necessary
-
11. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
clebert.suconic Feb 5, 2007 10:07 AM (in response to clebert.suconic)I created MultiThreadFailoverTest::testFailureOnSendReceiveSynchronized on Friday because FailoverTest::testFailureRightBeforeSend and FailoverTest::testFailureRightAfterSend were not failing.
-
12. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
timfox Feb 5, 2007 10:10 AM (in response to clebert.suconic)I'm not sure what those tests do, but you just need to do this:
  1. send message
  2. crash the server in the poison interceptor after the send has been handled but before the response is written
  3. client will get an exception
  4. try and send the message again - should give a PK violation.
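The four steps above can be sketched in a self-contained way, without a real server. This is only a simulation under assumptions: FakeServer and poisonNextSend() are hypothetical stand-ins for the server and for what the PoisonInterceptor would do, and the IllegalStateException stands in for whatever the database reports on a PK violation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of the scenario, not the real test or interceptor.
class FakeServer {
    private final Map<String, String> table = new HashMap<>();
    private boolean crashAfterNextSend;

    // Stand-in for arming the poison: fail after the next send is handled.
    void poisonNextSend() {
        crashAfterNextSend = true;
    }

    void send(String id, String payload) {
        if (table.containsKey(id)) {
            throw new IllegalStateException("PK violation for message " + id);
        }
        table.put(id, payload); // the deed is done on the server
        if (crashAfterNextSend) {
            crashAfterNextSend = false;
            // The response never reaches the client.
            throw new RuntimeException("connection lost before response was written");
        }
    }

    boolean contains(String id) {
        return table.containsKey(id);
    }
}

class SendRetryDemo {
    // Runs the four steps and returns a description of what the retry saw.
    static String run() {
        FakeServer server = new FakeServer();
        server.poisonNextSend();
        try {
            server.send("m1", "payload"); // steps 1 and 2
            return "no failure";
        } catch (RuntimeException firstFailure) {
            // step 3: the client sees an exception although the message is stored
        }
        try {
            server.send("m1", "payload"); // step 4: blind retry
            return "retry accepted";
        } catch (IllegalStateException pkViolation) {
            return pkViolation.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

No failover and no second node appear anywhere in the sketch; a single send against a single server is enough to show the problem.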
-
13. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
timfox Feb 5, 2007 10:12 AM (in response to clebert.suconic)Also, no failover is necessary.
The test can be done in a non-clustered environment -
14. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
clebert.suconic Feb 5, 2007 10:25 AM (in response to clebert.suconic)"timfox" wrote:
I'm not sure what those tests do, but you just need to do this:
  1. send message
  2. crash the server in the poison interceptor after the send has been handled but before the response is written
  3. client will get an exception
  4. try and send the message again - should give a PK violation.
That's what FailoverTest::testFailureRightBeforeSend and FailoverTest::testFailureRightAfterSend are about.