I'm building a HA environment using JBoss AS 5.1 and PostgreSQL 9.0.
I have 2 instances of PostgreSQL 9.0 replicating as asynchronous master-slave using streaming replication (the highly expected feature of 9.x) and log shipping:
- data_node1 (master, rw)
- data_node2 (slave, ro)
I also have 2 instances of JBoss that I'd like to connect to those two and if I understand correctly, I can use HA (XA) DataSource for that purpose (http://community.jboss.org/wiki/JBossJCADatabaseFailover)
The problem I'm facing is related to failover process on the database level. In order to promote a PostgreSQL node to master, I need to create a trigger file that will initialise the failover process on that node.
Now, my questions are:
1) Is it ok to handle that on the HA DataSource level? I know I can provide a custom URLSelectorStrategy implementaton, but I'm not sure if invoking a shell script from there is the right thing to do in a distributed environment (say, "connection-failed.sh <node_number>").
2) Is this the right approach in general ? This is my first HA setup and I'm still a bit confused about all the aspects of the problem.
I've considered using:
- pg-pool II; introduces a single point of failure
- tungsten; lack of examples / not sure if XA will be supported
- heartbeat; not sure if a smooth failover is possible (without loosing connections)
Any input on this would be greatly appreciated.
JBoss traditionally deals strictly in middle-ware, and so does not offer any true HA services for the database layer. To answer your specific questions:
1) The <ha-tx-local-datasource> & <ha-xa-datasource> perform very rudamentary failover - they only offer failover over new connections. Also, there's no guarantee that the servers in your cluster will all point to the same url at the same time. Additionally, you probably don't want to try to initiate master promotion automatically, since you don't want this to be initiated by multiple servers.
2) When chosing an HA data storage solution, you're going to have to consider the trade-offs between availability (i.e. down time) vs data integrity (i.e. data loss) vs performance. Make sure you familiarize yourself with your options:
Of the options you listed, pgpool-II is probably the best option, since, though it still maintains a single point-of-failure, its point-of-failure does not also house your data. In general, statement-based replication middleware offers a higher level of availability than master-standby replication, and without the potential for data loss. I've not used the Tungsten suite of products, so I can't offer any recommendations either way.
Another option you may want to consider is HA-JDBC (http://ha-jdbc.sourceforge.net). Like PGPool-II, HA-JDBC is statement-based replication middleware, and therefore offers the potential for complete fault tolerance. HA-JDBC differs from pgpool-II, in that instead of requiring a single point-of-failure external process, the logic reside completely within the driver.