15 Replies. Latest reply on Feb 19, 2008 12:23 PM by brian.stansberry

jndi lookup during restart

ben.wang Aug 14, 2007 2:24 AM

This seems to be the best forum to discuss the naming issue. Here is the original forum thread that led to this:
http://www.jboss.com/index.html?module=bb&op=viewtopic&t=114200
and there are a couple of user forum threads that reported this issue as well, e.g.,
http://www.jboss.com/index.html?module=bb&op=viewtopic&t=96087
and
http://www.jboss.com/index.html?module=bb&op=viewtopic&t=65304

Basically, in NamingContext, we use a WeakReferenceMap to cache the JNDI server stub information on the client side. So during a server restart, unless the reference has been gc-ed (e.g., if -Dsun.rmi.dgc.client.gcInterval is at its default of 60 seconds and the server restart takes longer than that), another client request (e.g., ic.lookup(name)) will result in an exception: java.rmi.NoSuchObjectException: no such object in table

Here is a simple code snippet to reproduce it:

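 // Assumes home (TestSLSBHome), gd (the bean's remote interface) and isTrue
 // are declared by the enclosing test class.
 // Needs javax.naming.Context and javax.naming.InitialContext.
 while (isTrue)
 {
    try
    {
       // A new InitialContext still reuses the server stub cached by NamingContext
       Context naming = new InitialContext();
       home = (TestSLSBHome) naming.lookup("TestSLSB");

       gd = home.create();
       boolean succeed = gd.isSuccessful();
       System.out.println("The call is successful: " + succeed);

       try
       {
          // Manual pause here: restart the server, then hit enter
          System.out.println("Please hit enter to continue");
          System.in.read();
       }
       catch (Exception ignored)
       {
       }
    }
    catch (Exception e)
    {
       // After a restart, the lookup above ends up here
       System.out.println("**** Exception happened ....");
       e.printStackTrace();
    }
 }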

So this is obviously a flaw since we can't depend on the dgc to recycle the server stub.

The fix I propose is to catch NoSuchObjectException specifically in the lookup() code. When we encounter such an exception, we flush the server from the cache and do a fresh lookup. It should be a simple fix with little impact on the code base.
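To make the proposal concrete, here is a minimal simulation of the flush-and-retry idea. It is only a sketch: FakeServer, NamingStub, Client and the URL string are hypothetical stand-ins invented for illustration, not the real org.jnp classes, and the real cache is a WeakReferenceMap rather than a HashMap. The only real API used is java.rmi.NoSuchObjectException.

```java
import java.rmi.NoSuchObjectException;
import java.rmi.RemoteException;
import java.util.HashMap;
import java.util.Map;

/** Simulation only: none of these types are the real org.jnp classes. */
class StaleStubLookupSketch {

    /** Hypothetical stand-in for the remote naming stub. */
    interface NamingStub {
        Object lookup(String name) throws RemoteException;
    }

    /** Hypothetical stand-in for the server; a restart invalidates older stubs. */
    static class FakeServer {
        private int generation = 0;

        void restart() {
            generation++;
        }

        NamingStub newStub() {
            final int bornIn = generation;
            return name -> {
                if (bornIn != generation) {
                    // Same exception a real stale RMI stub produces
                    throw new NoSuchObjectException("no such object in table");
                }
                return name + "@gen" + generation;
            };
        }
    }

    /** Client with a stub cache keyed by provider URL (an assumption of this sketch). */
    static class Client {
        private final Map<String, NamingStub> cache = new HashMap<>();
        private final FakeServer server;
        private final String url;

        Client(FakeServer server, String url) {
            this.server = server;
            this.url = url;
        }

        private NamingStub stub() {
            return cache.computeIfAbsent(url, k -> server.newStub());
        }

        Object lookup(String name) throws RemoteException {
            try {
                return stub().lookup(name);
            } catch (NoSuchObjectException stale) {
                // The cached stub refers to an object the server no longer exports:
                // flush it and retry once with a freshly obtained stub.
                cache.remove(url);
                return stub().lookup(name);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        FakeServer server = new FakeServer();
        Client client = new Client(server, "jnp://localhost:1099");
        System.out.println(client.lookup("TestSLSB"));
        server.restart();
        System.out.println(client.lookup("TestSLSB"));
    }
}
```

The retry happens only once, so a server that is still down surfaces the second failure to the caller instead of looping.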

Please note that this fix won't cover the case when the home stub is cached by the application (that is, no jndi lookup every time). It'd still generate the error but in this case, we can use the RetryInterceptor to catch the error and retry.

This would already be the case for the clustering call (i.e., SingleRetryInterceptor).

Any suggestion?

1. Re: jndi lookup during restart

brian.stansberry Aug 14, 2007 5:01 PM (in response to ben.wang)
Checked in the unit test I discussed on http://www.jboss.com/index.html?module=bb&op=viewtopic&t=114200 . It's org.jboss.test.naming.test.NamingRestartUnitTestCase.

(I was able to track down the thing I was concerned about with deploying a 2nd NamingService and made sure the test didn't cause a problem.)

The problem with HA-JNDI is as you described, Ben -- when HA-JNDI is stopped, the server-side Remote is unexported. The client-side NamingContext has a cached reference to the stub for that Remote; invoking on that fails.

I was able to determine why the test doesn't fail with regular JNDI:

1) For regular JNDI, the server-side Remote object is an instance of NamingServer. That instance is cached in a static field NamingContext.localServer. Thus the server-side object actually survives a restart of the NamingService.

2) NamingService doesn't actually unexport that NamingServer as part of its stop() processing. From org.jnp.server.Main.stop():

if( isStubExported == true ) UnicastRemoteObject.unexportObject(theServer.getNamingInstance(), false);

Field "isStubExported" is never set to "true", so the unexportObject call never happens.

The effect is that a remote NamingContext still has a valid stub after a restart of the NamingService. If the test actually restarted the server rather than just bouncing a NamingService, it wouldn't work. This is what happens with Ben's manual test above. When I modified the test NamingService so it no longer used the static NamingServer in 1) above, the test failed.

Bottom line with all this is we understand the failure mode.
2. Re: jndi lookup during restart

brian.stansberry Aug 14, 2007 5:10 PM (in response to ben.wang)

As for the proposed fix, flushing the cache after a failure seems the correct thing to do, although as Adrian said on the other thread it would require significant testing.

Note that this wouldn't just be for lookup() -- same logic would need to be in all the naming calls.
3. Re: jndi lookup during restart

ben.wang Aug 15, 2007 5:09 AM (in response to ben.wang)

1. OK, that explains why you weren't seeing any error on the non-HA JNDI restart.

2. Yes, since Naming is used everywhere, more extensive testing would be needed. But since this fix only adds an exception catch clause, normal lookups should work just as before.

3. As for the other calls, I was proposing to change only the lookup call since it already has retry logic (because of server overload, etc.). I don't see retry logic in the other naming calls (e.g., bind/unbind); in those calls, we'd throw a CommunicationException directly.

What do people think? I am tempted to fix only lookup, which should cover the majority of use cases, to minimize the code impact.

-Ben
4. Re: jndi lookup during restart

brian.stansberry Aug 19, 2007 10:50 PM (in response to ben.wang)

JIRA for this is http://jira.jboss.com/jira/browse/JBAS-4615 . This is basically implemented and tested in branch JBoss_4_0_5_GA_JBAS-4574; needs to be ported to the regular branches.

Just a note for the record: the fix we're implementing is based on catching java.rmi.NoSuchObjectException. If the naming service uses Remoting or even the pooled or http invokers, that exception won't be thrown. However, usually if those invokers are used the client-side proxy will still be valid after a restart (unless the server address or port has changed.) Fixing the edge case where the address or port has changed would involve catching and retrying after a large variety of exceptions, which is too big a behavior change.
5. Re: jndi lookup during restart

brian.stansberry Aug 20, 2007 3:54 PM (in response to ben.wang)

I went ahead and fixed this for all the Context operations, not just lookup(). If we're going to fix this we might as well really fix it. It was simple enough to encapsulate the error handling in a method and then apply it via a simple try/catch around each remote call. The class javadoc for NoSuchObjectException states that:

If a NoSuchObjectException occurs attempting to invoke a method on a remote object, the call may be retransmitted and still preserve RMI's "at most once" call semantics.

Based on this, it's OK to apply this to non-read methods like bind().
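For readers who want to picture the "encapsulate the error handling in a method" approach, here is a rough sketch. It is not the actual NamingContext change: FakeServer, NamingStub, RemoteCall, RetryingContext and withRetry are hypothetical names made up for this illustration, and the server is simulated in-process. It shows one helper applied around lookup() as well as the non-read operations bind() and unbind().

```java
import java.rmi.NoSuchObjectException;
import java.rmi.RemoteException;
import java.util.HashMap;
import java.util.Map;

/** Illustration only: hypothetical names, not the real NamingContext code. */
class RetryAllOperationsSketch {

    interface NamingStub {
        void bind(String name, Object value) throws RemoteException;
        Object lookup(String name) throws RemoteException;
        void unbind(String name) throws RemoteException;
    }

    /** One remote call, expressed against whichever stub is current. */
    interface RemoteCall<T> {
        T invoke(NamingStub stub) throws RemoteException;
    }

    /** Simulated server: bindings survive a restart, old stubs do not. */
    static class FakeServer {
        private final Map<String, Object> bindings = new HashMap<>();
        private int generation = 0;

        void restart() {
            generation++;
        }

        NamingStub newStub() {
            final int bornIn = generation;
            return new NamingStub() {
                private void check() throws NoSuchObjectException {
                    if (bornIn != generation) {
                        throw new NoSuchObjectException("no such object in table");
                    }
                }
                public void bind(String name, Object value) throws RemoteException {
                    check();
                    bindings.put(name, value);
                }
                public Object lookup(String name) throws RemoteException {
                    check();
                    return bindings.get(name);
                }
                public void unbind(String name) throws RemoteException {
                    check();
                    bindings.remove(name);
                }
            };
        }
    }

    static class RetryingContext {
        private final FakeServer server;
        private NamingStub cached;
        int retries = 0;

        RetryingContext(FakeServer server) {
            this.server = server;
        }

        /** Shared error handling: drop the stale stub and repeat the call once. */
        private <T> T withRetry(RemoteCall<T> call) throws RemoteException {
            if (cached == null) {
                cached = server.newStub();
            }
            try {
                return call.invoke(cached);
            } catch (NoSuchObjectException stale) {
                // The failed call never reached a live object, so repeating it is safe
                retries++;
                cached = server.newStub();
                return call.invoke(cached);
            }
        }

        void bind(String name, Object value) throws RemoteException {
            withRetry(stub -> { stub.bind(name, value); return null; });
        }

        Object lookup(String name) throws RemoteException {
            return withRetry(stub -> stub.lookup(name));
        }

        void unbind(String name) throws RemoteException {
            withRetry(stub -> { stub.unbind(name); return null; });
        }
    }

    public static void main(String[] args) throws Exception {
        FakeServer server = new FakeServer();
        RetryingContext ctx = new RetryingContext(server);
        ctx.bind("a", "1");
        server.restart();
        ctx.bind("b", "2"); // stale stub is replaced transparently
        System.out.println(ctx.lookup("b") + " after " + ctx.retries + " retry");
    }
}
```

Keeping the retry in one helper means every Context operation gets identical handling, and the second attempt's failure (for example, the server still being down) propagates unchanged.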

6. Re: jndi lookup during restart

brian.stansberry Aug 22, 2007 12:25 PM (in response to ben.wang)

"Jimmy Wilson" wrote:

Just so there's no confusion, I built the branch Ben and you created, and used that AS (I didn't try to copy fixed JARs etc). I also copied the client/jbossall-client.jar to the client (it is the only JAR the client uses besides log4j.jar).

So, I performed this test through the two scenarios listed below:

* Start 2 servers
* Start client that uses same proxy over and over
* Make a few requests
* Kill both servers
* Restart the second server
* Make more requests

Case #1: java.naming.provider.url specified as list

This now works as expected.

Case #2: java.naming.provider.url not specified (use discovery)

I could not always get this to work seamlessly.

Log info Jimmy provided:

2007-08-21 23:36:24,956 TRACE [org.jboss.proxy.ejb.RetryInterceptor] Begin reestablishInvokerProxy
2007-08-21 23:36:24,956 TRACE [org.jboss.proxy.ejb.RetryInterceptor] Using retry properties from NamingContextFactory
2007-08-21 23:36:25,057 TRACE [org.jboss.proxy.ejb.RetryInterceptor] Looking for invoker: Hello-RemoteInvoker
2007-08-21 23:36:25,060 TRACE [org.jboss.ha.framework.interfaces.HARMIClient] Invoking on target=HARMIServerImpl_Stub[UnicastRef2 [liveRef: [endpoint:[lo2:1101](remote),objID:[253c198f:1148bd97a0d:-8000, 2]]]]
2007-08-21 23:36:25,064 TRACE [org.jboss.ha.framework.interfaces.HARMIClient] Invoke failed, target=HARMIServerImpl_Stub[UnicastRef2 [liveRef: [endpoint:[lo2:1101](remote),objID:[253c198f:1148bd97a0d:-8000, 2]]]]
java.rmi.NoSuchObjectException: no such object in table
 at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:247)
 at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:223)
 at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:126)
 at org.jboss.ha.framework.server.HARMIServerImpl_Stub.invoke(Unknown Source)
 at org.jboss.ha.framework.interfaces.HARMIClient.invokeRemote(HARMIClient.java:172)
 at org.jboss.ha.framework.interfaces.HARMIClient.invoke(HARMIClient.java:267)
 at $Proxy0.lookup(Unknown Source)
 at org.jnp.interfaces.NamingContext.lookup(NamingContext.java:664)
 at org.jnp.interfaces.NamingContext.lookup(NamingContext.java:624)
 at javax.naming.InitialContext.lookup(InitialContext.java:351)
 at org.jboss.proxy.ejb.RetryInterceptor.reestablishInvokerProxy(RetryInterceptor.java:247)
 at org.jboss.proxy.ejb.RetryInterceptor.invoke(RetryInterceptor.java:185)
 at org.jboss.proxy.TransactionInterceptor.invoke(TransactionInterceptor.java:61)
 at org.jboss.proxy.SecurityInterceptor.invoke(SecurityInterceptor.java:70)
 at org.jboss.proxy.ejb.StatelessSessionInterceptor.invoke(StatelessSessionInterceptor.java:112)
 at org.jboss.proxy.ClientContainer.invoke(ClientContainer.java:100)
 at $Proxy2.sayHello(Unknown Source)
 at example.StdInClient.main(Unknown Source)
2007-08-21 23:36:25,066 TRACE [org.jboss.proxy.ejb.RetryInterceptor] Retry attempt 1: Failed to lookup proxy
javax.naming.CommunicationException [Root exception is java.rmi.RemoteException: Service unavailable.]
 at org.jnp.interfaces.NamingContext.lookup(NamingContext.java:777)
 at org.jnp.interfaces.NamingContext.lookup(NamingContext.java:624)
 at javax.naming.InitialContext.lookup(InitialContext.java:351)
 at org.jboss.proxy.ejb.RetryInterceptor.reestablishInvokerProxy(RetryInterceptor.java:247)
 at org.jboss.proxy.ejb.RetryInterceptor.invoke(RetryInterceptor.java:185)
 at org.jboss.proxy.TransactionInterceptor.invoke(TransactionInterceptor.java:61)
 at org.jboss.proxy.SecurityInterceptor.invoke(SecurityInterceptor.java:70)
 at org.jboss.proxy.ejb.StatelessSessionInterceptor.invoke(StatelessSessionInterceptor.java:112)
 at org.jboss.proxy.ClientContainer.invoke(ClientContainer.java:100)
 at $Proxy2.sayHello(Unknown Source)
 at example.StdInClient.main(Unknown Source)
Caused by: java.rmi.RemoteException: Service unavailable.
 at org.jboss.ha.framework.interfaces.HARMIClient.invokeRemote(HARMIClient.java:213)
 at org.jboss.ha.framework.interfaces.HARMIClient.invoke(HARMIClient.java:267)
 at $Proxy0.lookup(Unknown Source)
 at org.jnp.interfaces.NamingContext.lookup(NamingContext.java:672)
 ... 10 more

The final line of the stack trace shows the failure occurred in the retry after flushing the cache. I'm looking into this, but I'm pretty sure this issue revolves around the fact that NamingContext.removeServer(Hashtable serverEnv) only removes an entry if java.naming.provider.url is specified.
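The suspected problem can be pictured with a small sketch. This is not the real NamingContext code and not necessarily how the issue was resolved: the cache layout, the method names and the "sketch.discovered.server" property are all hypothetical, chosen only to show why a removal keyed solely on java.naming.provider.url leaves a discovered server's stale stub in the cache.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

/** Illustration only: hypothetical cache layout and method names. */
class RemoveServerSketch {

    static final String PROVIDER_URL = "java.naming.provider.url";

    /** Hypothetical property that remembers which server discovery found. */
    static final String DISCOVERED_KEY = "sketch.discovered.server";

    /** Assumed cache of server stubs, keyed by server address. */
    static final Map<String, Object> cache = new HashMap<>();

    /** Suspected behavior: a no-op when no provider URL is configured. */
    static void removeServerUrlOnly(Hashtable<String, Object> env) {
        Object url = env.get(PROVIDER_URL);
        if (url != null) {
            cache.remove(url.toString());
        }
    }

    /** One possible alternative: fall back to the key recorded at discovery time. */
    static void removeServerWithFallback(Hashtable<String, Object> env) {
        Object key = env.get(PROVIDER_URL);
        if (key == null) {
            key = env.get(DISCOVERED_KEY);
        }
        if (key != null) {
            cache.remove(key.toString());
        }
    }

    public static void main(String[] args) {
        // Discovery case: the environment has no java.naming.provider.url
        Hashtable<String, Object> env = new Hashtable<>();
        env.put(DISCOVERED_KEY, "lo2:1101");
        cache.put("lo2:1101", "stale stub");

        removeServerUrlOnly(env);
        System.out.println("stale entry still cached: " + cache.containsKey("lo2:1101"));

        removeServerWithFallback(env);
        System.out.println("stale entry still cached: " + cache.containsKey("lo2:1101"));
    }
}
```

With the URL-only variant the retry reuses the same dead stub, which would match the "Service unavailable" failure on the second attempt in the log above.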

7. Re: jndi lookup during restart

brian.stansberry Aug 22, 2007 3:33 PM (in response to ben.wang)

JIRA for above is http://jira.jboss.com/jira/browse/JBAS-4622
8. Re: jndi lookup during restart

brian.stansberry Aug 23, 2007 3:16 PM (in response to ben.wang)

This should be fixed in the JBoss_4_0_5_GA_JBAS-4574 branch and in trunk. I'll close the JIRA once I get a thumbs up from Jimmy Wilson on his test case.
9. Re: jndi lookup during restart

jcreynol Feb 13, 2008 1:08 PM (in response to ben.wang)

I'm seeing this problem in JBoss EAP 4.2CR1. Did this fix get propagated to that code base? We have a Swing client using JBoss Remoting with the rmi transport and unified invoker. We have code to catch potential server problems and then re-authenticate the user, but subsequent lookups, once the server comes back online, fail with:

Caused by: java.rmi.NoSuchObjectException: no such object in table
at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:247)
at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:223)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:126)
at org.jboss.remoting.transport.rmi.RMIServerInvoker_Stub.transport(Unknown Source)
at org.jboss.remoting.transport.rmi.RMIClientInvoker.transport(RMIClientInvoker.java:238)
... 14 more
10. Re: jndi lookup during restart

brian.stansberry Feb 16, 2008 1:25 PM (in response to ben.wang)

EAP 4.2CR1? Do you mean EAP 4.2.0.GA_CP_01 (the first cumulative patch release after 4.2.0.GA)?
11. Re: jndi lookup during restart

jcreynol Feb 16, 2008 4:58 PM (in response to ben.wang)

The ZIP was provided to us by a JBoss Consultant back in October. It says jboss-EAP-4.2.CR1.zip. When I open the MANIFEST.MF for jboss.jar, it does say this is Spec Version "4.2.0.GA_CP01".

The readme.html says "JBoss AS 4.2.0.GA Release Notes" and "Detailed Release Notes
Includes versions: JBossAS-4.2.0.CR1, JBossAS-4.2.0.CR2, JBossAS-4.2.0.GA".

So what I meant was what I wrote, EAP 4.2CR1. :)

However, as I try to get a handle on all the various numbering techniques, the MANIFEST.MF is probably the most meaningful for those of you that are digging into the code and need an actual branch/tag to reference. As far as I can tell, 4.2.0.GA_CP01 *is* 4.2.0.CR1, but apparently 4.2.0.GA_CP01 is the more specific way to reference the build we're using...

So, any ideas if the error discussed in this thread is indeed also occurring in this build, within the Remoting code?

I've posted the topic under Remoting:
http://www.jboss.com/index.html?module=bb&op=viewtopic&t=129993
And opened an issue, here:
http://jira.jboss.com/jira/browse/JBREM-906

Thanks, much!!

John
12. Re: jndi lookup during restart

brian.stansberry Feb 16, 2008 5:32 PM (in response to ben.wang)

OK, sounds like you probably have 4.2.0.GA_CP01; don't know why the zip was named "CR1". CR stands for "Candidate for Release", i.e. a release shortly before the GA; that's not something one should use after the GA comes out. CP stands for "Cumulative Patch"; that's a bug-fix release made after GA and provided to subscription customers.

CP releases don't include fixes for every issue found after a GA comes out; we find many customers prefer that changes in CPs are minimal and that fixes are limited to those requested by customers or critical issues like security patches. Based on that, JBAS-4622 hasn't been ported to the EAP 4.2 CP branch yet. If you're a customer and want to see that fixed in the next EAP 4.2 CP release, I suggest you raise an issue on the Customer Support Portal.

All that said, you're actually interested in JBREM-906 anyway. :-) I don't know enough about the remoting code to know if the same basic problem is there, but from looking at the stack trace, it seems pretty likely. Basically, any RMI-based transport is vulnerable to the problem of a client holding onto an RMI stub that no longer matches the server. Do you need to use an RMI connector?
13. Re: jndi lookup during restart

jcreynol Feb 19, 2008 11:35 AM (in response to ben.wang)

>> Basically, any RMI-based transport is vulnerable to the problem of a
>> client holding onto an RMI stub that no longer matches the server.
>> Do you need to use an RMI connector?

The short answer to this is no -- in fact, in production, per the suggestions of a JBoss consultant that we'd see better performance with Socket transport, we are using Socket transport.

The reason we're investigating a switch to RMI is that the RMI transport "properly" handles aborted transactions: it immediately throws an exception back to the client saying that the transaction was rolled back. Socket transport does not detect that the transaction was rolled back, so the client doesn't get notified until the thread completes (assuming it does). Even then, after being made to wait for the long-running thread to complete, the client only receives an InvalidStateException because the wrapping transaction has timed out.

That said, Socket Transport handles a server bounce quite elegantly. :)

So the long answer is that we're trying to find a transport that meets all our needs, which are pretty simple: notify the client immediately when a transaction times out (some type of exception is fine, TransactionRolledBack or whatever) AND handle a server restart.

Any advice on what the best approach is to get one of these transports fixed to meet our needs is very much appreciated.

Thanks, much!

John
14. Re: jndi lookup during restart

jcreynol Feb 19, 2008 11:36 AM (in response to ben.wang)

Forgot to mention that there is an ongoing thread regarding the Socket issue: http://www.jboss.com/index.html?module=bb&op=viewtopic&t=125962&start=10
