2 Replies Latest reply: Aug 13, 2009 11:17 AM by Tim Fox RSS

Ping issues under stress

Jeff Mesnil Master

I managed to get ping issues and clients stuck doing a soak test on our messaging Labs.

Setup:
1 server (messaging-01) with 1 JMS topic
10 producers + 17 non-durable subscribers (splitted on 7 machines)

server config:

 Netty:
 <param key="jbm.remoting.netty.tcpnodelay" value="false" type="Boolean"/>
 <param key="jbm.remoting.netty.tcpsendbuffersize" value="1048576" type="Integer"/>
 <param key="jbm.remoting.netty.tcpreceivebuffersize" value="1048576" type="Integer"/>

 <security-enabled>false</security-enabled>
 <persistence-enabled>false</persistence-enabled>



JMS config:

<connection-factory name="ConnectionFactory">
<connector-ref connector-name="netty-connector"/>



<retry-interval>1000</retry-interval>
<retry-interval-multiplier>1.0</retry-interval-multiplier>
<reconnect-attempts>-1</reconnect-attempts>
<failover-on-server-shutdown>true</failover-on-server-shutdown>
<call-timeout>5000</call-timeout>
<block-on-non-persistent-send>false</block-on-non-persistent-send>
<block-on-persistent-send>true</block-on-persistent-send>
<pre-acknowledge>true</pre-acknowledge>
</connection-factory>







soak.properties:
duration-in-minutes=120
num-warmup-messages=100
message-size=1024
durable=false
transacted=false
batch-size=1000
drain-queue=false
destination-lookup=soakTopic
connection-factory-lookup=/ConnectionFactory
throttle-rate=-1
dups-ok-acknowledge=false
disable-message-id=true
disable-message-timestamp=true


The clients (both producers and consumers) are connected to reconnect to the server
when they are notified by a JMSExceptionListener. Their connection factories will indefinitely try
to reconnect to the server.

1. I start the server. No activity:

 [java] **** Server Dump ****
 [java] date: Thu Aug 13 08:40:56 EDT 2009
 [java] heap memory: used=60.09 MB, max=1.92 GB
 [java] non-heap memory: used=60.09 MB, max=1.92 GB
 [java] # of thread: 16
 [java] # of conns: 0
 [java] JMS topics:
 [java] soakTopic: 0 subscription (0 non-durable) receiving 0 message
 [java] ********************


2. I then start 10 producers and 17 non-durable subscribers
At the beginning, all producers are producing and all subscribers are subscribing

3. But after a few seconds, the subscribers starts to warn about missing ping servers and failures to failover

 [java] Aug 13, 2009 8:51:30 AM org.jboss.messaging.core.logging.Logger warn
 [java] WARNING: Connection failure has been detected: Did not receive ping from server for org.jboss.messaging.integration.transports.netty.NettyConnection@128ae45a[local= /172.16.8.12:41708, remote=/172.16.8.10:5445] [code=3]
 [java] Aug 13, 2009 8:51:30 AM org.jboss.messaging.core.logging.Logger warn
 [java] WARNING: Connection failure has been detected: Did not receive ping from server for org.jboss.messaging.integration.transports.netty.NettyConnection@128ae45a[local= /172.16.8.12:41706, remote=/172.16.8.10:5445] [code=3]
 [java] Aug 13, 2009 8:51:36 AM org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$3 onMessage
 [java] INFO: received 10000 messages in 11.60s (total: 117s)
 [java] Aug 13, 2009 8:51:53 AM org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$3 onMessage
 [java] INFO: received 10000 messages in 16.80s (total: 133s)
 [java] Aug 13, 2009 8:52:09 AM org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$3 onMessage
 [java] INFO: received 10000 messages in 16.07s (total: 150s)
 [java] Aug 13, 2009 8:52:15 AM org.jboss.messaging.core.logging.Logger warn
 [java] WARNING: Connection failure has been detected: Did not receive ping from server for org.jboss.messaging.integration.transports.netty.NettyConnection@5f7779e3[local= /172.16.8.12:41735, remote=/172.16.8.10:5445] [code=3]
 [java] Aug 13, 2009 8:52:15 AM org.jboss.messaging.core.logging.Logger warn
 [java] WARNING: Connection failure has been detected: Did not receive ping from server for org.jboss.messaging.integration.transports.netty.NettyConnection@1ce3570c[local= /172.16.8.12:41737, remote=/172.16.8.10:5445] [code=3]
 [java] Aug 13, 2009 8:52:20 AM org.jboss.messaging.core.logging.Logger error
 [java] SEVERE: Failed to handle failover
 [java] MessagingException[errorCode=3 message=Timed out waiting for response when sending packet 32]
 [java] at org.jboss.messaging.core.remoting.impl.ChannelImpl.sendBlocking(ChannelImpl.java:312)
 [java] at org.jboss.messaging.core.client.impl.ClientSessionImpl.handleFailover(ClientSessionImpl.java:751)
 [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.reattachSessions(ConnectionManagerImpl.java:794)
 [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.failoverOrReconnect(ConnectionManagerImpl.java:647)
 [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.handleConnectionFailure(ConnectionManagerImpl.java:518)
 [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.access$600(ConnectionManagerImpl.java:81)
 [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl$DelegatingFailureListener.connectionFailed(ConnectionManagerImpl.java:1218)
 [java] at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.callFailureListeners(RemotingConnectionImpl.java:393)
 [java] at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:244)
 [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl$FailedConnectionAction$1.run(ConnectionManagerImpl.java:1174)
 [java] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 [java] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 [java] at java.lang.Thread.run(Thread.java:619)
 [java] javax.jms.JMSException: Connection is destroyed
 [java] at org.jboss.messaging.core.remoting.impl.ChannelImpl.sendBlocking(ChannelImpl.java:240)
 [java] at org.jboss.messaging.core.client.impl.ClientConsumerImpl.doCleanUp(ClientConsumerImpl.java:745)
 [java] at org.jboss.messaging.core.client.impl.ClientConsumerImpl.close(ClientConsumerImpl.java:315)
 [java] at org.jboss.messaging.jms.client.JBossMessageConsumer.close(JBossMessageConsumer.java:153)
 [java] at org.jboss.messaging.jms.client.JBossSession.close(JBossSession.java:275)
 [java] at org.jboss.messaging.jms.client.JBossConnection.close(JBossConnection.java:259)
 [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver.disconnect(SoakReconnectableReceiver.java:186)
 [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver.access$000(SoakReconnectableReceiver.java:41)
 [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$2.onException(SoakReconnectableReceiver.java:99)
 [java] at org.jboss.messaging.jms.client.JBossConnection$JMSFailureListener$1.run(JBossConnection.java:550)
 [java] at java.lang.Thread.run(Thread.java:619)
 [java] Caused by: MessagingException[errorCode=2 message=Connection is destroyed]
 [java] ... 11 more
 [java] javax.jms.JMSException: Timed out waiting for response when sending packet 30
 [java] at org.jboss.messaging.core.remoting.impl.ChannelImpl.sendBlocking(ChannelImpl.java:312)
 [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.createSession(ConnectionManagerImpl.java:326)
 [java] at org.jboss.messaging.core.client.impl.ClientSessionFactoryImpl.createSessionInternal(ClientSessionFactoryImpl.java:977)
 [java] at org.jboss.messaging.core.client.impl.ClientSessionFactoryImpl.createSession(ClientSessionFactoryImpl.java:721)
 [java] at org.jboss.messaging.jms.client.JBossConnection.createSessionInternal(JBossConnection.java:463)
 [java] at org.jboss.messaging.jms.client.JBossConnection.createSession(JBossConnection.java:160)
 [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver.connect(SoakReconnectableReceiver.java:213)
 [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver.access$100(SoakReconnectableReceiver.java:41)
 [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$2.onException(SoakReconnectableReceiver.java:100)
 [java] at org.jboss.messaging.jms.client.JBossConnection$JMSFailureListener$1.run(JBossConnection.java:550)
 [java] at java.lang.Thread.run(Thread.java:619)
 [java] Caused by: MessagingException[errorCode=3 message=Timed out waiting for response when sending packet 30]
 [java] ... 11 more


4. on the server, the number of remoting connection (and the memory) explodes:

 [java] **** Server Dump ****
 [java] date: Thu Aug 13 08:41:57 EDT 2009
 [java] heap memory: used=1.46 GB, max=1.88 GB
 [java] non-heap memory: used=1.46 GB, max=1.88 GB
 [java] # of thread: 62
 [java] # of conns: 112
 [java] JMS topics:
 [java] soakTopic: 17 subscription (17 non-durable) receiving 8337499 message
 [java] ********************


5. The exception continues on the subscribers
6. Things are messy on the server-side with more subscriptions than there are clients:
 [java] **** Server Dump ****
 [java] date: Thu Aug 13 08:43:57 EDT 2009
 [java] heap memory: used=1.70 GB, max=1.88 GB
 [java] non-heap memory: used=1.70 GB, max=1.88 GB
 [java] # of thread: 45
 [java] # of conns: 260
 [java] JMS topics:
 [java] soakTopic: 23 subscription (23 non-durable) receiving 10414915 message
 [java] ********************


6. the server start to clean up remoting connections and related resources
 [java] [Thread-0 (group:JBM-scheduled-threads-2030193044)] 08:44:25,717 WARNING [org.jboss.messaging.core.remoting.server.impl.RemotingServiceImpl] Did not receive initial ping from /172.16.8.15:38164, connection will be closed
 [java] [Thread-0 (group:JBM-scheduled-threads-2030193044)] 08:44:25,717 WARNING [org.jboss.messaging.core.remoting.server.impl.RemotingServiceImpl] Did not receive initial ping from /172.16.8.12:49942, connection will be closed


7. I also have some netty exception on the server:
 [java] [New I/O server worker #1-2] 08:44:31,281 WARNING [org.jboss.netty.channel.socket.nio.NioWorker] Unexpected exception in the selector loop.
 [java] org.jboss.netty.channel.ChannelException: Failed to register a socket to the selector.
 [java] at org.jboss.netty.channel.socket.nio.NioWorker$RegisterTask.run(NioWorker.java:754)
 [java] at org.jboss.netty.channel.socket.nio.NioWorker.processRegisterTaskQueue(NioWorker.java:260)
 [java] at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:199)
 [java] at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:53)
 [java] at org.jboss.messaging.integration.transports.netty.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:184)
 [java] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 [java] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 [java] at java.lang.Thread.run(Thread.java:619)
 [java] Caused by: java.nio.channels.ClosedChannelException
 [java] at java.nio.channels.spi.AbstractSelectableChannel.configureBlocking(AbstractSelectableChannel.java:252)
 [java] at org.jboss.netty.channel.socket.nio.NioWorker$RegisterTask.run(NioWorker.java:739)
 [java] ... 7 more


8. Failed connections now are unable to clean up related resource on the server
 [java] [Thread-3 (group:JBM-scheduled-threads-2030193044)] 08:47:55,595 SEVERE [org.jboss.messaging.core.management.impl.ManagementServiceImpl] Failed to call listener
 [java] java.lang.IllegalStateException: Cannot find queue info for queue b685ebae-d3b7-4a4d-81f1-862366829b1f8b8e8cc7-8806-11de-9b94-0015178e6d7c
 [java] at org.jboss.messaging.core.postoffice.impl.PostOfficeImpl.onNotification(PostOfficeImpl.java:387)
 [java] at org.jboss.messaging.core.management.impl.ManagementServiceImpl.sendNotification(ManagementServiceImpl.java:647)
 [java] at org.jboss.messaging.core.server.impl.ServerConsumerImpl.close(ServerConsumerImpl.java:272)
 [java] at org.jboss.messaging.core.server.impl.ServerSessionImpl.close(ServerSessionImpl.java:339)
 [java] at org.jboss.messaging.core.server.impl.ServerSessionImpl.doHandleClose(ServerSessionImpl.java:2217)
 [java] at org.jboss.messaging.core.server.impl.ServerSessionImpl.handleClose(ServerSessionImpl.java:882)
 [java] at org.jboss.messaging.core.server.impl.ServerSessionImpl.connectionFailed(ServerSessionImpl.java:1177)
 [java] at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.callFailureListeners(RemotingConnectionImpl.java:393)
 [java] at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:244)
 [java] at org.jboss.messaging.core.remoting.server.impl.RemotingServiceImpl$FailedConnectionAction.run(RemotingServiceImpl.java:480)
 [java] at org.jboss.messaging.core.remoting.impl.Pinger.run(Pinger.java:116)
 [java] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 [java] at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 [java] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 [java] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 [java] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
 [java] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
 [java] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 [java] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 [java] at java.lang.Thread.run(Thread.java:619)


9. During that time, the producers started to see missing pings from the server:

10. When things settle down, all clients are stuck
 [java] **** Server Dump ****
 [java] date: Thu Aug 13 09:22:10 EDT 2009
 [java] heap memory: used=1.70 GB, max=1.87 GB
 [java] non-heap memory: used=1.70 GB, max=1.87 GB
 [java] # of thread: 30
 [java] # of conns: 16
 [java] JMS topics:
 [java] soakTopic: 2 subscription (2 non-durable) receiving 0 message
 [java] ********************


I did the same run with <block-non-persistent-send> set to true and I did not get any error.
To clear things up, no servers were killed/restarted during these runs.