21 Replies. Latest reply on May 17, 2010 2:44 AM by vitaliylu
      • 15. Re: Problem in retrieving WSDL from remote endpoint
        dward

        So, option 1? Essentially:

        1. Don't use ICU4J.
        2. By default, assume UTF-8 as we do now.
        3. If the user specifies a different wsdl-encoding, then and only then do we read it in as such, and output our new WSDL ('cause we have to transform it, remember) using the user's specified encoding (for the Content-Type response header)?

        Also, what do we do about all the places where "UTF-8" is hard-coded as the argument passed to readStreamString?
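One possible answer, sketched under assumptions: make the charset a parameter of the stream-reading helper instead of a hard-coded "UTF-8" string, and keep UTF-8 only as the default. The `StreamUtil` class and this `readStreamString(InputStream, Charset)` signature are hypothetical, not the existing JBoss ESB method, and `StandardCharsets` requires Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public final class StreamUtil {
    private StreamUtil() {}

    // Reads the whole stream, then decodes the bytes with the given charset.
    public static String readStreamString(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        // Decoding once, after all bytes are buffered, means a multi-byte
        // character split across two reads is still decoded correctly.
        return new String(buffer.toByteArray(), charset);
    }

    // Overload that keeps today's behaviour: assume UTF-8.
    public static String readStreamString(InputStream in) throws IOException {
        return readStreamString(in, StandardCharsets.UTF_8);
    }
}
```

Callers that currently pass "UTF-8" would then pass whatever charset was resolved for that particular WSDL.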

        • 16. Re: Problem in retrieving WSDL from remote endpoint
          mageshbk

          > "all WSDL must adhere to Basic Profile so we are rejecting this jira."

           

          Yes, we should, unless we want to isolate ourselves and make interoperability (WS-I) meaningless!

           

          > don't know about this, or don't want to adhere to it, so we should just handle the possibility of non-UTF-8 or UTF-16 encoded documents just in case?

           

          Java's success came from its adherence to specifications. We could always accept a non-conforming document, or something that isn't even XML, as a WSDL and parse it, but what purpose would that serve? Would it work alongside other WS clients/providers?

          • 17. Re: Problem in retrieving WSDL from remote endpoint
            kconner

            No, we shouldn't reject this issue.

             

            There are two aspects to this:

            - us being able to consume an external WSDL which we do not control

            - us being able to produce a WSDL for external use

             

            This issue concerns the first; the BP matters (to us) for the second.

             

            We need to be able to read external WSDLs in different character sets, irrespective of the BP, as these are not under our control and may not be under the control of whoever is using the ESB.

             

            As far as working out the charset is concerned, I do not believe we can rely on that library (as appealing as it may be). It cannot guarantee the right result, only a 'best guess'. In essence we would end up in the same position we are in now: after all, we are already 'guessing' UTF-8; we would just have an algorithm providing the guess.

             

            In my opinion we should determine the character set by trusting whatever is specified on the input; if the input arrives over something like HTTP, then we should be able to trust the Content-Type. If this cannot be trusted, or if it does not exist (such as for a local WSDL), then we should assume UTF-8 *unless* the developer overrides the encoding.
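A minimal sketch of the "trust the Content-Type" step, assuming the header has the usual form `text/xml; charset=windows-1251`. The `ContentTypeCharset.charsetOf` name is hypothetical; it falls back to UTF-8 when no usable charset parameter is present and leaves any developer override to the caller.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical utility, for illustration only.
public final class ContentTypeCharset {
    private ContentTypeCharset() {}

    // Returns the charset named by the Content-Type "charset" parameter,
    // or UTF-8 if the header is missing, has no such parameter, or names
    // a charset this JVM cannot use.
    public static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                int eq = p.indexOf('=');
                if (eq > 0 && p.substring(0, eq).trim().toLowerCase(Locale.ROOT).equals("charset")) {
                    String name = p.substring(eq + 1).trim();
                    // Allow a quoted value such as charset="UTF-8".
                    if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1);
                    }
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Covers both IllegalCharsetNameException and
                        // UnsupportedCharsetException: fall through to UTF-8.
                        break;
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```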

             

            Once we have it as a string, we are free to output it in UTF-8.
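The decode-then-re-encode idea can be illustrated with plain JDK calls. This is a sketch rather than the ESB code; the `Transcode` class is hypothetical, and `Cp1251` is used because it is the encoding from this thread (the JDK accepts it as an alias of `windows-1251`).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical class, for illustration only.
public final class Transcode {
    private Transcode() {}

    // Decode bytes from their source charset into a String (UTF-16 internally),
    // then encode that String as UTF-8.
    public static byte[] toUtf8(byte[] source, Charset sourceCharset) {
        String text = new String(source, sourceCharset);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Charset cp1251 = Charset.forName("Cp1251"); // canonical name: windows-1251
        byte[] original = "\u041f\u0440\u0438\u0432\u0435\u0442".getBytes(cp1251); // six Cyrillic letters
        byte[] utf8 = toUtf8(original, cp1251);
        // prints: 6 bytes in windows-1251 -> 12 bytes in UTF-8
        System.out.println(original.length + " bytes in " + cp1251.name()
                + " -> " + utf8.length + " bytes in UTF-8");
    }
}
```

One caveat: if the WSDL carries an XML declaration such as `encoding="windows-1251"`, that declaration would also have to be rewritten or removed before serving the UTF-8 bytes, otherwise a parser will decode them with the wrong charset. The sketch does not do that.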

             

            Kev

            • 18. Re: Problem in retrieving WSDL from remote endpoint
              kconner

              Of course, another source of character set information could be the WSDL itself, so even local files etc. may carry enough information. An override should still be possible, however.
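Reading the encoding from the WSDL's own XML declaration might look like the following. It is an illustration built on assumptions (the `XmlDeclSniffer` name is hypothetical): it handles only ASCII-compatible encodings, tolerates a UTF-8 byte-order mark, and does not attempt UTF-16 detection. Any developer override would still take precedence over its result.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class, for illustration only.
public final class XmlDeclSniffer {
    private XmlDeclSniffer() {}

    // Optional UTF-8 BOM (as it appears when decoded as ISO-8859-1), then
    // <?xml ... encoding="name"> or encoding='name'. The name pattern follows
    // the EncName production of the XML 1.0 spec.
    private static final Pattern ENCODING = Pattern.compile(
        "^(?:\u00EF\u00BB\u00BF)?<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the encoding named in the XML declaration, or null if there is none.
    public static String declaredEncoding(byte[] head) {
        // ISO-8859-1 maps each byte to exactly one char, so the ASCII
        // declaration text survives whatever the real encoding is.
        int len = Math.min(head.length, 256);
        String prefix = new String(head, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(prefix);
        return m.find() ? m.group(1) : null;
    }
}
```

When this returns null, falling back to UTF-8 matches the XML default for documents with neither an encoding declaration nor a byte-order mark.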

               

              Kev

              • 19. Re: Problem in retrieving WSDL from remote endpoint
                dward

                JBESB-3279 is closed.

                 

                Kev and I continued this discussion over IM, and decided on the following, which I implemented in the fix:

                1. Even though the WS-I Basic Profile has rules about acceptable WSDL character encodings, we will do our best to handle non-compliant input where we can, so that our code outputs compliant WSDL.
                2. WSDL can be read in using internal://, classpath://, file://, http:// and https:// protocols.  In the case of internal://, which is provided by JBossWS, it is already UTF-8.  In the case of http:// or https://, we will try to look for the charset specified by the HTTP response header "Content-Type".  If that is specified, we will try to convert from that specified encoding to UTF-8.
                3. A new SOAPProxy action property has been added: wsdlCharset (the camel case property name was chosen so it is consistent with the wsdlTransform property).  If the developer specifies the wsdlCharset property, the WSDL will be read in using that encoding, and we will try to convert from that specified encoding to UTF-8.  The presence of that property takes priority over the Content-Type header in the case of http:// or https://.
                4. The contract JSP (when using the JBR/HTTP gateway) and the HttpGatewayServlet now both always output UTF-8.
                5. The Programmer's Guide has been updated to include a description of the new wsdlCharset property.
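The read-side priority in items 2 and 3 can be summarized as a small resolver. This is a sketch, not the committed fix: the `WsdlCharsetResolver` name and signature are hypothetical, the Content-Type charset is assumed to have been extracted already, and treating file:// and classpath:// as UTF-8 when no wsdlCharset is set is an assumption based on the existing UTF-8 default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical class, for illustration only.
public final class WsdlCharsetResolver {
    private WsdlCharsetResolver() {}

    // Picks the charset used to READ a WSDL:
    //   1. an explicit wsdlCharset action property, if set, wins;
    //   2. for http:// or https://, the Content-Type charset, if usable;
    //   3. otherwise UTF-8 (internal:// WSDL from JBossWS is already UTF-8;
    //      file:// and classpath:// are assumed to default the same way).
    public static Charset resolve(String wsdlUrl, String wsdlCharset, String headerCharset) {
        if (wsdlCharset != null && wsdlCharset.trim().length() > 0) {
            // An unsupported developer-supplied name throws, surfacing the misconfiguration.
            return Charset.forName(wsdlCharset.trim());
        }
        String url = wsdlUrl.toLowerCase(Locale.ROOT);
        boolean overHttp = url.startsWith("http://") || url.startsWith("https://");
        if (overHttp && usable(headerCharset)) {
            return Charset.forName(headerCharset);
        }
        return StandardCharsets.UTF_8;
    }

    // True if the name is a legal charset name supported by this JVM.
    private static boolean usable(String name) {
        if (name == null) {
            return false;
        }
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) { // e.g. IllegalCharsetNameException
            return false;
        }
    }
}
```

Throwing on a bad wsdlCharset value, rather than silently falling back, is a judgment call: the developer asked for that encoding explicitly, so a typo is probably better reported than ignored.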

                 

                Please refer to the jira item for a test case as well as a screenshot of some Russian characters in Unicode.

                 

                Vitaliy, with this fix, all you should have to do is add this property to your SOAPProxy action:

                 

                <property name="wsdlCharset" value="Cp1251"/>
                

                 

                I have tested the fix on both AS4+Java5 and AS5+Java6.  I also ran a clean integration build.

                • 20. Re: Problem in retrieving WSDL from remote endpoint
                  dward

                  Oh, and I did not have to use the ICU4J library at all, so no new jar dependencies were added.

                  • 21. Re: Problem in retrieving WSDL from remote endpoint
                    vitaliylu

                    Thanks to all!!!
