[JM-343] Improve connection pool recovery logic Created: 22/Jul/05  Updated: 26/Mar/08  Resolved: 13/Jan/08

Status: Closed
Project: Openfire (ARCHIVED)
Components: Core, Database
Affects versions: 2.2.0 Beta 2
Fix versions: 3.4.5

Type: Improvement Priority: Major
Reporter: MattM Assignee: Daniel Henninger
Resolution: Fixed Votes: 20
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified


 Description   

Improve the connection pool recovery logic if possible. More info at:

http://www.jivesoftware.org/forums/thread.jspa?threadID=15165



 Comments   
Comment by Gaston Dombiak [ 22/Jul/05 ]

Matt,

Last week I was thinking about this same issue and discussed an idea with Bill. Basically, the idea was to create a DbCommand class that will encapsulate the logic of getting a connection from the pool, executing the query and returning the connection to the pool. When trying to execute the query a try/catch block would be used to detect SocketExceptions so if a socket exception was caught then a new connection from the pool will be obtained and the previous one is discarded. We may use a times_to_retry_limit if we want or just retry until a good connection is found thus cleaning up dead connections in the pool.

I would use an optimistic approach instead of a pesimistic one. Where optimistic approach = assume that the connection is good, try to execute the query and if the socket is dead then try to recover. A perimistic approach would be to assume that the connection is dead so a check-query should be performed before obtaining or using a connection from the pool. Note: The optimistic approach has an important overhead.

The DbCommand class would have 2 methods that subclasses would need to implement. #configureStatement and #processResultSet. A nice side effect of this class is that we won't need to remember to use a tr/catch/finally block to get and return connections to the pool and remember to close the result set and statement.

Comment by Gaston Dombiak [ 22/Jul/05 ]

Err: Note: The optimistic approach has an important overhead. --> Note: The pesimistic approach has an important overhead.

Comment by MattM [ 23/Jul/05 ]

Ack, I'd really like to stay away from the DbCommand approach. As soon as you start wrapping JDBC with another API the complexity can grow very quickly. In fact, we had a DbCommand approach in Jive Messenger originally and it was a bear to use from an API perspective. Another way we could approach the DbCommand logic is to implement some error handling in the ConnectionWrapper. I would propose:

  • Detect errors in ConnectionWrapper so that the connection doesn't get put back into the pool when "closed".
  • Have a health check monitor process that explicitly tries to execute a command in the database for each connection on some periodic basis.

I think the combination of those two things would handle the majority of cases for determining bad connections.

Comment by Elias Baixas [ 03/Mar/07 ]

Hello, I've done this quickhack to workaround this issue, you can find it in
http://www.igniterealtime.org/forum/thread.jspa?threadID=25011&tstart=0
it uses the "pessimistic" approach and times_to_retry is set to 10.

best regards,

Elias

Comment by Daniel Henninger [ 09/Jan/08 ]

FYI, actual reference in last comment is now at: http://www.igniterealtime.org/community/thread/25011

No clue where/what the original is though. =(

Comment by Daniel Henninger [ 09/Jan/08 ]

Aha http://www.igniterealtime.org/community/thread/15165 for original link posted

Comment by Daniel Henninger [ 12/Jan/08 ]

Matt provides the following ideas of possible 3rd party replacements for our home brew setup:

http://commons.apache.org/dbcp/
http://sourceforge.net/projects/c3p0
http://proxool.sourceforge.net/

So far I'm quite fond of the last one.

Comment by Daniel Henninger [ 13/Jan/08 ]

We have migrated to proxool in lieu of our home brew setup. Looks to be very mature, working like a charm so far, I'm very impressed.

Comment by MattM [ 14/Jan/08 ]

Can we show some simple additional stats on the database page? Check out the following:

http://proxool.sourceforge.net/admin.html

It would be super sweet to show the current info as well as a graph over time. Maybe the graphing would be in the future though.

Comment by MattM [ 14/Jan/08 ]

One more thing: it would be awesome to test the following:

Start MySQL
Start Openfire
Verify everything working
Stop Mysql
Verify things not working due to no db
Start Mysql
See if Openfire can recover due to improved connection pool logic

Comment by Daniel Henninger [ 14/Jan/08 ]

Hi Matt! I actually already am including the web servlet! Though now that you mention it, I might need to make a config tweak to trigger the graphing to actually occur. Good test suggestion!

Comment by MattM [ 14/Jan/08 ]

Rather than include the web servlet, it seems like it would be much better to incorporate that same data into the existing database page.

Comment by Daniel Henninger [ 15/Jan/08 ]

Hrm. Good point. I had planned on doing that "in the future" but might as well do it now instead of using the servlet. =)

BTW I ran your test and wow. it didn't miss a beat! (I took down MySQL, almost immediately did it detect problems, things broke, etc, and then I started MySQL back up and it immediately was happy again) I also did a rude test -> i kill -9'd mysql out from under openfire and it didn't detect the mysql shutdown immediately in terms of .. immediately show that mysql died, but the first database request caught it. (and I imagine the every 30 second check would have seen it) Brought up MySQL, all is well again immediately.

I'm quite pleased with it's behavior!

I'll look into improving the servlet.

BTW, looks like the ClearSpace folk aren't particularly interested for now, sounds like they solved their problems with their stuff. Bruce did tell me that JNDI has some sort of it's own connection pooling, but I'm not sure it has the powerful connection testing that this has. (proxool is testing the connection before and after it's used to see if it's still a valid connection, it seems to be very robust in that sense ... the caveat of course is that' it's doing extra "SELECT 1" commands, but I explicitly picked something super quick like that so it shouldn't be a noticable performance hit.

Speaking of which, at some point I'd like to investigate where we could use some indexes and such to improve our database interactions altogether. Guus found at least one instance where we were missing a possible important index that sped things up quite a bit.

Comment by James Mc Millan [ 26/Mar/08 ]

Hi Guys

Just to let you know, we upgraded to 3.5.0rc1 from 3.3.2 and we had major issues with the Proxool connection pooling, and we had to give it an unreasonably high number of connections in order for it to work (most of the time). It seems to hold onto connections for longer than it should and not release them.

The good news is switching to c3p0 (an easy implementation of ConnectionProvider) solved the problem for us. So, if anyone else has problems, switching to c3p0 is simple enough. At the moment though, we don't have statistics on the database page, which throws an exception as it is hardcoded to look for Proxool.

Thanks
James

Comment by Daniel Henninger [ 26/Mar/08 ]

There's an improvement coming to the current pool implemention in RC2.

Generated at Thu Apr 25 14:48:50 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100251-rev:2d0d695520e7095763476433152508933e579798.