portal clustering problem when using optimistic...| JBoss.org Content Archive (Read Only)

45. Re: portal clustering problem when using optimistic locking

julien1 Aug 5, 2008 12:17 PM (in response to prabhat.jha)

About portlet replication:

By default, JBoss Portal does not do anything: portlet sessions are wrappers around an HttpSession, but around a *dispatched* session, quite often obtained in a cross-context manner.

We have an internal replication mechanism that leverages the portal session (i.e. the one we are sure is replicated): portlet session state is stored in the portal session to get the replication feature. So we don't do much magic, as we use the standard Servlet API and nothing else, and everything we do seems valid to me.

So, bottom line: we can do portal replication, and optionally we provide replication for portlets that configure it. We don't use anything that is not Servlet API based, and nothing exotic.
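
As a rough illustration of the idea described above (keeping portlet state inside the one session that is known to be replicated), here is a minimal sketch. It is not JBoss Portal's actual code: the class name, the `portlet.state.` key prefix, and the use of a plain `HashMap` as a stand-in for the replicated portal `HttpSession` are all assumptions made for the example, and replication is only simulated with a serialization round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; none of these names come from JBoss Portal.
class PortletStateSketch {

    // Assumed key prefix for per-portlet state inside the portal session.
    static final String PREFIX = "portlet.state.";

    // Copies one portlet's attributes into the portal session under a single key.
    static void save(Map<String, Serializable> portalSession, String portletId,
                     HashMap<String, Serializable> portletAttributes) {
        portalSession.put(PREFIX + portletId,
                new HashMap<String, Serializable>(portletAttributes));
    }

    // Reads a portlet's attributes back out of the portal session.
    @SuppressWarnings("unchecked")
    static HashMap<String, Serializable> load(Map<String, Serializable> portalSession,
                                              String portletId) {
        Serializable stored = portalSession.get(PREFIX + portletId);
        if (stored == null) {
            return new HashMap<String, Serializable>();
        }
        return (HashMap<String, Serializable>) stored;
    }

    // Simulates replication to another node with a serialization round trip.
    @SuppressWarnings("unchecked")
    static HashMap<String, Serializable> replicate(HashMap<String, Serializable> session) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(session);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (HashMap<String, Serializable>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("state is not replicable", e);
        }
    }

    public static void main(String[] args) {
        HashMap<String, Serializable> portalSession = new HashMap<String, Serializable>();
        HashMap<String, Serializable> cart = new HashMap<String, Serializable>();
        cart.put("items", 3);
        save(portalSession, "cartPortlet", cart);

        // The copy on the "other node" still carries the portlet's state.
        HashMap<String, Serializable> onOtherNode = replicate(portalSession);
        System.out.println(load(onOtherNode, "cartPortlet").get("items"));
    }
}
```

The point of the sketch is only that whatever is stored this way has to be serializable, since it travels with the portal session.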

46. Re: portal clustering problem when using optimistic locking

prabhat.jha Aug 5, 2008 12:46 PM (in response to prabhat.jha)

I think I did a good job of confusing Brian and it's my bad. I assumed that since portal has a cache definition, it would use it for all clustering configuration.

So, if I am not mistaken now, a tree cache configuration change in jboss-web-cluster.sar would affect portal and portlet session replication, while the changes I have been making in jboss-portal-ha.sar would only affect clustering related to Hibernate. Correct?

47. Re: portal clustering problem when using optimistic locking

brian.stansberry Aug 5, 2008 1:07 PM (in response to prabhat.jha)

I'm not sure I completely understood Julien's comment, but yes, Prabhat, what you said sounds correct, since any portlet session replication feature is leveraging the standard HttpSession replication. Julien, please correct me if I'm wrong. :-)

48. Re: portal clustering problem when using optimistic locking

prabhat.jha Aug 6, 2008 2:55 PM (in response to prabhat.jha)

I ran tests with buddy replication on in jboss-web-cluster.sar. The jboss-service.xml used here is exactly the same as that of EAP, which is repeatable_read, repl_async, and buddy replication on.

The configuration of Hibernate/JBoss Cache clustering remains unchanged: pessimistic locking, read committed, and invalidation sync.

Results are similar to the previous results up to 3 nodes. With 4 nodes I get a better number: previously it was 3800 users, now it's 4400. But with 5 nodes, it's back to what I got previously.

What I would like to mention, though, is that the cluster name, mcast_addr, and port parameters are different in jboss-service.xml in web-cluster.sar and portal.sar. Do they need to use the same cluster config?

49. Re: portal clustering problem when using optimistic locking

prabhat.jha Aug 6, 2008 2:59 PM (in response to prabhat.jha)

One interesting observation is that in the 5-node scenario with 2500 users, which means 500 users/node, the response time is the same as that of the {1,2,3,4}-node clusters (~100 ms). But when I increase the users from 2500 to 2800, the response time goes to ~6 seconds. I stop tests when it goes above 6 seconds. In the {1,2,3,4}-node configurations, the response time increases gradually.

50. Re: portal clustering problem when using optimistic locking

galder.zamarreno Aug 6, 2008 3:23 PM (in response to prabhat.jha)

Prabhat, can you upload thread dumps somewhere taken from each of the 5 nodes when response time goes up to ~6 seconds?

51. Re: portal clustering problem when using optimistic locking

brian.stansberry Aug 6, 2008 5:44 PM (in response to prabhat.jha)

"prabhat.jha@jboss.com" wrote:
What I would like to mention, though, is that the cluster name, mcast_addr, and port parameters are different in jboss-service.xml in web-cluster.sar and portal.sar. Do they need to use the same cluster config?

No, they have to be different. The different caches use different JGroups channels, and those settings keep those channels' traffic separate.

52. Re: portal clustering problem when using optimistic locking

prabhat.jha Aug 6, 2008 5:59 PM (in response to prabhat.jha)

Brian: That's what I thought too.

Galder: I will try to get thread dumps. There is a minor complication: the AS instances are spawned and managed by Smartfrog, which is itself a Java process. I need to see how I can capture the dumps to a file.
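
For reference, one way to get a thread dump into a file is to have the JVM write its own. The sketch below uses the standard `java.lang.management.ThreadMXBean` API and assumes Java 8 or later; the class name and the default file name `thread-dump.txt` are made up for the example. It only captures the JVM it runs in, so it would have to run inside the AS instance itself (or be adapted to a remote JMX connection). For an external process, the usual alternatives are `jstack <pid>` or `kill -3 <pid>`, where the latter prints the dump to the target JVM's standard output.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; dumps the threads of the JVM it is running in.
class ThreadDumpToFile {

    // Renders every live thread's name, state, and stack trace as text.
    static String render() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Ask for lock details only where the JVM supports them.
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported());
        StringBuilder text = new StringBuilder();
        for (ThreadInfo info : infos) {
            text.append('"').append(info.getThreadName()).append("\" state=")
                .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                text.append("    at ").append(frame).append('\n');
            }
            text.append('\n');
        }
        return text.toString();
    }

    // Writes the rendered dump to the given file, replacing any existing content.
    static void dump(Path target) {
        try {
            Files.write(target, render().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path target = Paths.get(args.length > 0 ? args[0] : "thread-dump.txt");
        dump(target);
        System.out.println("Wrote " + target.toAbsolutePath());
    }
}
```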

53. Re: portal clustering problem when using optimistic locking

brian.stansberry Aug 7, 2008 3:41 PM (in response to prabhat.jha)

There is a binary (which goes in server/all/lib) and a config file patch attached to JBCLUSTER-206. If you could see whether that has an impact, it would be great. I'd rather see the thread dumps than results with this new jar.

54. Re: portal clustering problem when using optimistic locking

prabhat.jha Aug 7, 2008 3:49 PM (in response to prabhat.jha)

As requested by Galder, I have attached thread dumps from all 5 servers at https://jira.jboss.org/jira/browse/JBPORTAL-1879. The file is aptly named 5-server-dumps.zip.

Not to mislead, but in each dump I see IncomingPacketHandler, which Brian had pointed out earlier, holding on to a lock.

55. Re: portal clustering problem when using optimistic locking

prabhat.jha Aug 7, 2008 3:51 PM (in response to prabhat.jha)

"bstansberry@jboss.com" wrote:
There is a binary (which goes in server/all/lib) and a config file patch attached to JBCLUSTER-206. If you could see whether that has an impact, it would be great. I'd rather see the thread dumps than results with this new jar.

I will give it a shot with this new jar. How about I give you both: results and thread dumps?

Should I stick with the current cache configuration which is PL+RC+Inv_Synch for hibernate and BR in jboss-web-cluster.sar?

56. Re: portal clustering problem when using optimistic locking

prabhat.jha Aug 7, 2008 4:16 PM (in response to prabhat.jha)

The reason I ask is that the problem started when using optimistic locking.

57. Re: portal clustering problem when using optimistic locking

brian.stansberry Aug 7, 2008 4:30 PM (in response to prabhat.jha)

Yes, same config.

58. Re: portal clustering problem when using optimistic locking

brian.stansberry Aug 7, 2008 5:15 PM (in response to prabhat.jha)

"prabhat.jha@jboss.com" wrote:
As requested by Galder, I have attached thread dumps from all 5 servers at https://jira.jboss.org/jira/browse/JBPORTAL-1879. The file is aptly named 5-server-dumps.zip.

Not to mislead, but in each dump I see IncomingPacketHandler, which Brian had pointed out earlier, holding on to a lock.

The IncomingPacketHandler is simply waiting on a queue for an Object.notify() call to wake it up to handle a message. In none of the thread dumps is it doing anything, which itself is informative -- there are no intra-cluster messages being handled by any of these servers in any of these thread dumps. That in and of itself is interesting. Makes me question whether whatever is going on here has anything to do with clustering.
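
To illustrate what "waiting on a queue for a notify" looks like from the JVM's point of view, here is a small self-contained sketch. It is not JGroups code: the class, the thread name `DemoPacketHandler`, and the queue are invented for the example. A thread parked in `Object.wait()` reports the state `WAITING` and has released the monitor while it waits, even though dumps often list that monitor next to the thread, which can make an idle handler look as if it were holding a lock.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical demo; mimics an idle message handler, not the real JGroups class.
class IdleHandlerDemo {

    private static final Queue<String> queue = new ArrayDeque<String>();

    // Starts a handler thread that blocks in wait() until a message is queued.
    static Thread startHandler() {
        Thread handler = new Thread(new Runnable() {
            public void run() {
                synchronized (queue) {
                    while (queue.isEmpty()) {
                        try {
                            queue.wait(); // releases the monitor while idle
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                    queue.poll(); // "handle" exactly one message, then exit
                }
            }
        }, "DemoPacketHandler");
        handler.setDaemon(true);
        handler.start();
        return handler;
    }

    // Queues a message and wakes the handler up.
    static void deliver(String message) {
        synchronized (queue) {
            queue.add(message);
            queue.notify();
        }
    }

    // Polls until the thread reaches the wanted state or the timeout expires.
    static boolean awaitState(Thread t, Thread.State wanted, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (t.getState() == wanted) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return t.getState() == wanted;
    }

    public static void main(String[] args) {
        Thread handler = startHandler();
        System.out.println("idle in WAITING: "
                + awaitState(handler, Thread.State.WAITING, 2000));
        deliver("message");
        System.out.println("finished after notify: "
                + awaitState(handler, Thread.State.TERMINATED, 2000));
    }
}
```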

Scanning through some of these I'm not seeing any unusual blocking of AJP connector threads in either JBC code or JGroups.

If you run these tests with a set of non-clustered portal servers, what kind of #s do you see?

59. Re: portal clustering problem when using optimistic locking

prabhat.jha Aug 7, 2008 5:50 PM (in response to prabhat.jha)

"bstansberry@jboss.com" wrote:
In none of the thread dumps is it doing anything, which itself is informative -- there are no intra-cluster messages being handled by any of these servers in any of these thread dumps. That in and of itself is interesting. Makes me question whether whatever is going on here has anything to do with clustering.

The only reason I started modifying the clustering configuration is that beyond 3 nodes the portal cluster was not able to handle more requests. Up to 3 nodes, I do not see any problem with scalability, but with 4 and 5 I do.

The other potential bottleneck is the database, but given that a 3-node cluster can handle around 4K users while a 5-node cluster starts crawling at 2.5K users, I am not suspecting the database at this stage. I hope I am not overlooking something here.
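
Putting the figures quoted in this thread side by side as load per node makes the drop easier to see. The sketch below is plain arithmetic on those numbers; the 3-node figure is the approximate "around 4K" mentioned above, and the class and method names are made up for the example.

```java
// Hypothetical helper; the inputs are the user counts quoted in this thread.
class PerNodeLoad {

    // Average number of concurrent users each node has to serve.
    static double usersPerNode(int totalUsers, int nodes) {
        if (nodes <= 0) {
            throw new IllegalArgumentException("nodes must be positive");
        }
        return (double) totalUsers / nodes;
    }

    public static void main(String[] args) {
        // {nodes, total users}: ~4K on 3 nodes, 4400 on 4 nodes,
        // 2500 (still fine) and 2800 (~6 s responses) on 5 nodes.
        int[][] runs = { {3, 4000}, {4, 4400}, {5, 2500}, {5, 2800} };
        for (int[] run : runs) {
            System.out.printf("%d nodes, %d users -> %.0f users/node%n",
                    run[0], run[1], usersPerNode(run[1], run[0]));
        }
    }
}
```

That works out to roughly 1333 users/node on 3 nodes and 1100 on 4, against 500 to 560 on 5 nodes.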

Brian, what else do you suspect based on what you saw in thread dumps?

"bstansberry@jboss.com" wrote:
If you run these tests with a set of non-clustered portal servers, what kind of #s do you see?

I have not run tests with more than 2 servers in a non-clustered environment. With 2 nodes, it could handle twice the load of 1 node, which is also the case with a 2-node cluster.

I would any day prefer that there is nothing wrong with the JBoss Cache and Hibernate integrations and that it's something in Portal itself. ;-)