4 Replies. Latest reply: May 16, 2012 8:44 AM by tulasidhar reddy

Encoding problem... Not able to store German characters

Rajani Kanth Anupoju Newbie


Hi,

I am using JBoss as the application server, Oracle as the database, and the Struts framework. When I save the form using the "GET" method, the data is stored correctly in the DB, but when I use the "POST" method, the German characters are corrupted. While debugging I found that the ActionForm itself already contains the corrupted characters.

I have set URIEncoding="UTF-8" in the server.xml file, but the problem persists.

I don't think this is a bug in JBoss; it is probably just a configuration issue. Please let me know which configuration changes I need to make to store the data in the correct format.

The same application works fine on WAS. We are migrating it from WAS to JBoss.

Is this a known issue in JBoss, or am I doing something wrong? Please suggest how to resolve it.

Thanks in advance for your valuable suggestions.

Rajani Kanth

  • 1. Re: Encoding problem... Not able to store German characters
    tsangcn Newbie

    This is not a bug in JBoss. It is related to the way character encoding is handled in Servlets and JSP. Your application works in WAS because WAS has implemented its own way of handling characters in HTTP requests.

    Problem 1: When a form is submitted, the browser encodes the characters in whatever character encoding is selected in the browser at that moment. By default this is the encoding in which the page is displayed, but the user can easily change it.

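    Problem 2: When the servlet container receives the request and no character encoding has been set for it, it decodes the request parameters as ISO-8859-1, the Servlet default. (For example, the browser encodes in UTF-8 but the container decodes in ISO-8859-1.) As a result, your servlet or JSP receives garbage for every non-ASCII character. This includes German umlauts: they do exist in ISO-8859-1, but in UTF-8 each one is sent as two bytes, which ISO-8859-1 turns into two separate characters.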
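To make Problem 2 concrete, here is a small standalone sketch in plain Java (no servlet container involved; the class and method names are made up for illustration). It encodes a German word as UTF-8, as the browser would, and then decodes those bytes as ISO-8859-1, as the container does when no request encoding is set:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Simulates a browser that encodes in UTF-8 and a container that decodes
    // the same bytes as ISO-8859-1 (its default when no request encoding is set).
    static String browserToContainer(String input) {
        // Browser side: the form value goes over the wire as UTF-8 bytes.
        byte[] sent = input.getBytes(StandardCharsets.UTF_8);
        // Container side: the bytes are interpreted as ISO-8859-1.
        return new String(sent, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "M\u00fcller"; // u-umlaut escaped to keep the source ASCII
        String received = browserToContainer(original);
        // The single umlaut becomes the two characters U+00C3 and U+00BC,
        // so the received string is one character longer.
        System.out.println(received.length() - original.length()); // prints 1
    }
}
```

The umlaut in "Müller" arrives as the two characters "Ã¼", which is the typical pattern of UTF-8 data read as ISO-8859-1.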

    To solve problem 1, add the accept-charset attribute to the HTML form tag:

    <form action="abc.do" method="post" accept-charset="UTF-8">

    or, if you are using the Struts html:form tag, use its acceptCharset attribute:

    <html:form action="abc" acceptCharset="UTF-8">

    The accept-charset attribute tells the browser to always encode the form data in the specified encoding, regardless of the encoding the user has selected.
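Why the browser's choice of charset matters can be approximated in plain Java with java.net.URLEncoder, which percent-encodes a value much as a browser does for application/x-www-form-urlencoded form data. This is only an approximation of browser behaviour, and the class and method names are illustrative. The same value produces different bytes on the wire depending on the charset, so the server can only decode it correctly if both sides agree on it:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FormEncodingDemo {

    // Percent-encodes a form value with the given charset, roughly as a browser
    // does for application/x-www-form-urlencoded form data.
    static String encodeFormValue(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String name = "M\u00fcller"; // u-umlaut escaped to keep the source ASCII
        // Form charset UTF-8: the umlaut is sent as two bytes.
        System.out.println(encodeFormValue(name, "UTF-8"));      // M%C3%BCller
        // Form charset ISO-8859-1: the umlaut is sent as one byte.
        System.out.println(encodeFormValue(name, "ISO-8859-1")); // M%FCller
    }
}
```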

    For problem 2: even when the form has accept-charset, request.getParameter() still returns the value decoded as ISO-8859-1. To get the correct characters, re-decode the value after calling request.getParameter() (this needs import java.nio.charset.StandardCharsets):

    String value = request.getParameter("mytext");
    if (value != null) {
        // The container read the UTF-8 bytes as ISO-8859-1: recover the raw bytes, then decode them as UTF-8.
        value = new String(value.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }


    Although this decodes the characters correctly, it breaks down when you upload a file whose filename is not in ISO-8859-1, because the filename cannot be decoded correctly with the method above.
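The manual re-decoding has a second pitfall: it must be applied exactly once, and only when the container really did decode the bytes as ISO-8859-1. The plain-Java sketch below wraps the re-decoding line in a hypothetical helper, redecode, and shows both cases. It repairs a value that was mis-decoded, but it damages a value that was already decoded correctly:

```java
import java.nio.charset.StandardCharsets;

public class RedecodeDemo {

    // Hypothetical helper: reinterpret a string that was decoded as ISO-8859-1 as UTF-8.
    static String redecode(String value) {
        if (value == null) {
            return null; // getParameter() returns null for missing parameters
        }
        return new String(value.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String correct = "M\u00fcller"; // u-umlaut escaped to keep the source ASCII
        // What the container hands over when it decodes UTF-8 bytes as ISO-8859-1.
        String garbled = new String(correct.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);

        // Case 1: the value was mis-decoded, so re-decoding repairs it.
        System.out.println(redecode(garbled).equals(correct)); // true

        // Case 2: the value was already correct. Its umlaut becomes the single
        // byte 0xFC, which is not valid UTF-8, so the result is corrupted.
        System.out.println(redecode(correct).equals(correct)); // false
    }
}
```

This is one reason to fix the encoding in a single place, before any parameter is read, rather than patching individual values.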

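    So the best approach for now is a filter that sets the request encoding before anything reads the parameters, so that the container decodes them correctly in the first place. Note that filters are only available from Servlet 2.3 onward.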

    Add the following to web.xml:

    <!-- Filters run in the order of their filter-mapping elements, so map this filter first:
         it must set the encoding before anything reads the request parameters -->
    <filter>
        <display-name>set character encoding</display-name>
        <filter-name>setCharacterEncodingFilter</filter-name>
        <filter-class>com.example.web.filter.SetCharacterEncodingFilter</filter-class>
        <init-param>
            <param-name>encoding</param-name>
            <param-value>UTF-8</param-value>
        </init-param>
    </filter>

    <filter-mapping>
        <filter-name>setCharacterEncodingFilter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>


    Then create the filter as follows (com.example.web.filter is a placeholder package name; use your own):

    package com.example.web.filter;

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;

    /**
     * Sets the request character encoding to the value of the "encoding"
     * init parameter. This must happen before the request parameters are
     * read, otherwise the container has already decoded them.
     */
    public class SetCharacterEncodingFilter implements Filter {

        private String encoding;

        public void init(FilterConfig filterConfig) throws ServletException {
            this.encoding = filterConfig.getInitParameter("encoding");
        }

        public void doFilter(ServletRequest request,
                             ServletResponse response,
                             FilterChain chain) throws IOException, ServletException {
            // Only set the encoding if one was configured in web.xml.
            if (this.encoding != null) {
                request.setCharacterEncoding(this.encoding);
            }
            chain.doFilter(request, response);
        }

        public void destroy() {
            this.encoding = null;
        }
    }
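Why does calling setCharacterEncoding() before the first getParameter() help? Roughly, the container keeps the raw POST body and decodes it when the parameters are first read, using the request's character encoding, or ISO-8859-1 if none is set. The plain-Java sketch below models that behaviour with java.net.URLDecoder. It is a simplification for illustration, not how JBoss or Tomcat actually implement parameter parsing, and parseForm is a hypothetical name:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormParsingModel {

    // Simplified model of decoding an application/x-www-form-urlencoded body.
    // A null encoding means "not set" and falls back to ISO-8859-1, like the
    // servlet default. Keeps only the last value for a repeated name.
    static Map<String, String> parseForm(String body, String encoding) {
        String charset = (encoding != null) ? encoding : "ISO-8859-1";
        Map<String, String> params = new LinkedHashMap<>();
        try {
            for (String pair : body.split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String key = (eq >= 0) ? pair.substring(0, eq) : pair;
                String value = (eq >= 0) ? pair.substring(eq + 1) : "";
                params.put(URLDecoder.decode(key, charset), URLDecoder.decode(value, charset));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
        return params;
    }

    public static void main(String[] args) {
        // Body sent by a browser whose form uses UTF-8 (name = M, u-umlaut, ller).
        String body = "name=M%C3%BCller";
        String expected = "M\u00fcller";
        // No encoding set before parsing: the umlaut turns into two wrong characters.
        System.out.println(expected.equals(parseForm(body, null).get("name")));    // false
        // Encoding set to UTF-8 first, as the filter does: decoded correctly.
        System.out.println(expected.equals(parseForm(body, "UTF-8").get("name"))); // true
    }
}
```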


    References:
    http://java.sun.com/products/servlet/Filters.html
    http://java.sun.com/j2ee/1.4/docs/tutorial/doc/Servlets8.html#wp64572

    Thanks
    C. N.

  • 2. Re: Encoding problem... Not able to store German characters
    Rajani Kanth Anupoju Newbie

    Thanks for your suggestions/comments.

    I have already implemented this and it is working fine. I just wanted to know the performance impact of the filter.

    One more question:
    When I debug the code, the request object does not have any encoding set: request.getCharacterEncoding() returns null. In WAS, however, the request object has its encoding set to "UTF-8". So I suspect there is a property that sets the request object's encoding. My reasoning is that WAS also uses Apache internally as its web server, so what is the equivalent property that sets the request object's encoding to "UTF-8"?

    If WAS provides such an option, why doesn't JBoss?

    Your suggestions are always appreciated.

  • 3. Re: Encoding problem... Not able to store German characters
    Cesar Romero Newbie

    I'm having the same problem with my application (I use Latin characters, ISO-8859-15). The odd thing is that my Windows installation has no problem. Both machines run the same version of JBoss, but my test server runs Red Hat AS. On Linux, JBoss seems to re-encode the UTF-8 string sent by the client into UTF-8 a second time. Is there a way to prevent this without using the filter? I installed Tomcat 5.5 on that server, and that installation seems to have no problem at all.