Hotrod resource leak - too many open files
mackerman Aug 12, 2013 5:38 PM
We are running into a problem with a custom cache loader where we have a resource leak that looks like it is due to Hotrod. We are running Infinispan 5.3.0.Final on CentOS 6.3. After running for a few days we end up with a few thousand open files (around 2,222 at the time this log snapshot was taken). Attached is a log of all file/socket opens and closes (obtained via http://file-leak-detector.kohsuke.org/). Basically we see a few thousand of these traces:
Opened socket channel by thread:Timer-0 on Sat Aug 10 00:08:25 UTC 2013
at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:87)
at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport.<init>(TcpTransport.java:84)
at org.infinispan.client.hotrod.impl.transport.tcp.TransportObjectFactory.makeObject(TransportObjectFactory.java:57)
at org.infinispan.client.hotrod.impl.transport.tcp.TransportObjectFactory.makeObject(TransportObjectFactory.java:38)
at org.apache.commons.pool.impl.GenericKeyedObjectPool.addObject(GenericKeyedObjectPool.java:1729)
at org.apache.commons.pool.impl.GenericKeyedObjectPool.ensureMinIdle(GenericKeyedObjectPool.java:2095)
at org.apache.commons.pool.impl.GenericKeyedObjectPool.ensureMinIdle(GenericKeyedObjectPool.java:2060)
at org.apache.commons.pool.impl.GenericKeyedObjectPool.access$1600(GenericKeyedObjectPool.java:204)
at org.apache.commons.pool.impl.GenericKeyedObjectPool$Evictor.run(GenericKeyedObjectPool.java:2360)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
without corresponding closes.
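For reference, this is roughly how our custom cache loader is meant to hold on to the Hotrod client (a simplified, hypothetical sketch - the class name and wiring are illustrative, not our actual code). I'm including it because each RemoteCacheManager runs its own commons-pool evictor timer, so a manager that gets created and never stopped would keep opening sockets exactly like the trace above:

import java.util.Properties;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;

// Hypothetical holder class: one RemoteCacheManager created up front and
// stopped on shutdown. Creating a manager per operation without calling
// stop() would leave its commons-pool evictor timer (the Timer-0 thread in
// the trace above) running and opening new TcpTransport sockets.
public class HotRodClientHolder {

    private final RemoteCacheManager cacheManager;

    public HotRodClientHolder(String serverList) {
        Properties props = new Properties();
        props.put("infinispan.client.hotrod.server_list", serverList);
        this.cacheManager = new RemoteCacheManager(props);
    }

    public RemoteCache<String, byte[]> getCache(String name) {
        return cacheManager.getCache(name);
    }

    // Must be called when the loader is stopped, otherwise the connection
    // pool and its evictor thread are leaked.
    public void shutdown() {
        cacheManager.stop();
    }
}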
If you look at the log file you will see that other sockets are also being opened and closed; I believe these can be ignored. They show up as "Opened socket to null", relate to database connections and REST messaging, and mostly have matching "Closed socket to cache.dev.norcal.pgilab.net" entries.
At one point the server crashed due to a "too many open files" exception, and given the current resource leak it looks like that will eventually happen again. Our ulimit is 2^15 (32768).
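As a stopgap we're considering turning off the pool's idle-connection maintenance so the evictor stops pre-creating sockets, something along these lines in hotrod-client.properties (the unprefixed keys are the commons-pool GenericKeyedObjectPool property names that I believe the 5.3 client passes through to the pool; the server address and port here are just examples, not necessarily our setup):

# hotrod-client.properties (sketch)
infinispan.client.hotrod.server_list = cache.dev.norcal.pgilab.net:11222
# don't pre-create idle connections per server, so ensureMinIdle has nothing to do
minIdle = 0
# disable the background Evictor timer entirely (-1 = never run)
timeBetweenEvictionRunsMillis = -1
maxActive = 10
maxIdle = 10

Please correct me if these keys or defaults are off, or if disabling the evictor has other side effects.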
Any insight would be appreciated.
thanks, Mitchell
-
10.81.224.163-open-files.log.zip 390.9 KB