Thread pool for new s2s connections may get exhausted when remote servers are unresponsive
Description
After establishing a new connection to a remote server (secured connection or server dialback), Wildfire sends a stream header and expects one back from the remote server. If the connection was lost without the JVM noticing, or the remote server never responds for some other reason, Wildfire waits forever, possibly blocking other threads that are trying to contact the same server.
A future enhancement will modify OutgoingSessionPromise to use just one thread per queued domain, so that an unresponsive remote domain consumes only one thread, making smarter use of the thread pool.
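To illustrate the kind of unbounded wait described above, here is a minimal, self-contained sketch of a blocking socket read against a peer that never answers, and of how a read timeout bounds it. This is not Wildfire code, and the ticket does not say that a read timeout is the chosen fix. The class and method names (ReadTimeoutSketch, readTimesOut) are made up for the example; only the standard java.net calls are real.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutSketch {

    /**
     * Connects to a local peer that never writes anything and reports whether
     * the read was abandoned because the timeout expired.
     * Hypothetical helper, for illustration only.
     */
    static boolean readTimesOut(int timeoutMillis) {
        // Stand-in for an unresponsive remote server: it listens but never
        // accepts or writes. The OS completes the TCP handshake from the
        // listen backlog, so the client connect still succeeds.
        try (ServerSocket silentPeer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = new Socket(silentPeer.getInetAddress(), silentPeer.getLocalPort())) {
            // Without this call the default timeout is 0, meaning read() may block forever.
            socket.setSoTimeout(timeoutMillis);
            try {
                socket.getInputStream().read();
                return false; // the peer answered or closed the connection
            } catch (SocketTimeoutException e) {
                return true; // the read gave up after timeoutMillis
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

With the default timeout of 0, the same read would stay in SocketInputStream.socketRead0 indefinitely, which is the state of the first thread in the dump.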
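The one-thread-per-queued-domain idea could be sketched roughly as follows. This is an assumption-laden illustration, not the actual or planned OutgoingSessionPromise code: PerDomainDispatcher, submit, drain and demo are hypothetical names, packets are plain strings, and the "sender" callback stands in for whatever creates the session and delivers the packet.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

class PerDomainDispatcher {
    private final ExecutorService pool;
    private final BiConsumer<String, String> sender; // (domain, packet), hypothetical
    private final Map<String, Queue<String>> pending = new HashMap<>();

    PerDomainDispatcher(ExecutorService pool, BiConsumer<String, String> sender) {
        this.pool = pool;
        this.sender = sender;
    }

    /** Queues a packet; starts a worker only if the domain has none yet. */
    void submit(String domain, String packet) {
        boolean startWorker;
        synchronized (pending) {
            Queue<String> queue = pending.get(domain);
            startWorker = (queue == null);
            if (startWorker) {
                queue = new ArrayDeque<>();
                pending.put(domain, queue);
            }
            queue.add(packet);
        }
        if (startWorker) {
            pool.execute(() -> drain(domain));
        }
    }

    /** Single worker per domain: sends queued packets until the queue is empty. */
    private void drain(String domain) {
        while (true) {
            String packet;
            synchronized (pending) {
                packet = pending.get(domain).poll();
                if (packet == null) {
                    pending.remove(domain); // next submit starts a fresh worker
                    return;
                }
            }
            try {
                sender.accept(domain, packet); // may block; done outside the lock
            } catch (RuntimeException e) {
                // A real server would log or bounce the packet; the sketch just moves on.
            }
        }
    }

    /** Two pool threads, one stuck domain: the other domain must still be served. */
    static boolean demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch fastDone = new CountDownLatch(1);
        Set<String> slowThreads = ConcurrentHashMap.newKeySet();
        PerDomainDispatcher dispatcher = new PerDomainDispatcher(pool, (domain, packet) -> {
            if (domain.equals("slow.example")) {
                slowThreads.add(Thread.currentThread().getName());
                try {
                    release.await(); // simulates a remote server that does not respond
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            } else {
                fastDone.countDown();
            }
        });
        for (String packet : new String[] {"a", "b", "c"}) {
            dispatcher.submit("slow.example", packet);
        }
        dispatcher.submit("fast.example", "x");
        try {
            boolean fastDelivered = fastDone.await(5, TimeUnit.SECONDS);
            release.countDown();
            pool.shutdown();
            boolean finished = pool.awaitTermination(5, TimeUnit.SECONDS);
            return fastDelivered && finished && slowThreads.size() == 1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("fast domain served while slow domain was stuck: " + demo());
    }
}
```

In the demo, three packets for the stuck domain occupy one pool thread instead of three, so the second thread stays free for the other domain. With one task per packet, as in the thread dump, both threads would end up waiting on the same remote server.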
Example of thread dumps due to this issue:
"pool-3-thread-5" prio=1 tid=0x08a730b0 nid=0x3ed4 runnable [0x698ff000..0x698ff570]
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
- locked <0x73a8d950> (a java.io.InputStreamReader)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2971)
at org.xmlpull.mxp1.MXParser.more(MXParser.java:3025)
at org.xmlpull.mxp1.MXParser.parseProlog(MXParser.java:1410)
at org.jivesoftware.wildfire.net.MXParser.nextImpl(MXParser.java:331)
at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
at org.jivesoftware.wildfire.server.OutgoingServerSession.createOutgoingSession(OutgoingServerSession.java:284)
at org.jivesoftware.wildfire.server.OutgoingServerSession.authenticateDomain(OutgoingServerSession.java:140)
- locked <0x7081c510> (a java.lang.String)
at org.jivesoftware.wildfire.server.OutgoingSessionPromise.createSessionAndSendPacket(OutgoingSessionPromise.java:126)
at org.jivesoftware.wildfire.server.OutgoingSessionPromise.access$300(OutgoingSessionPromise.java:37)
at org.jivesoftware.wildfire.server.OutgoingSessionPromise$1$1.run(OutgoingSessionPromise.java:91)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
"pool-3-thread-4" prio=1 tid=0x08a74c38 nid=0x3ed3 waiting for monitor entry [0x69dff000..0x69dff5f0]
at org.jivesoftware.wildfire.server.OutgoingServerSession.authenticateDomain(OutgoingServerSession.java:138)
- waiting to lock <0x7081c510> (a java.lang.String)
at org.jivesoftware.wildfire.server.OutgoingSessionPromise.createSessionAndSendPacket(OutgoingSessionPromise.java:126)
at org.jivesoftware.wildfire.server.OutgoingSessionPromise.access$300(OutgoingSessionPromise.java:37)
at org.jivesoftware.wildfire.server.OutgoingSessionPromise$1$1.run(OutgoingSessionPromise.java:91)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)