Title: Clobbered UTF-8 in Wiki Body
Date: 13-Oct-2005 20:14:13 EEST
Version: 2.2.33
Submitter: 202.92.164.105
Bug criticality: MediumBug
Browser version: Firefox 1.5 beta 2, IE 6.0
Bug status: ClosedBug
PageProvider used:
Servlet Container: Resin 3.0.14
Operating System: Gentoo Linux 2.6.10
URL: http://207.210.65.37/JSPWiki/Wiki.jsp?page=UTF8Test
Java version: 1.4.2_09

The UTF-8 characters in the wiki body are getting clobbered. The characters entered on the test URL were originally Chinese and Japanese. They show up as ?? in both the preview and the saved file.

Changing Edit.jsp line 36 from

String text    = wiki.safeGetParameter( request, EditorAreaTag.AREA_NAME );
to
String text    = request.getParameter( EditorAreaTag.AREA_NAME );
works around the saved-file problem.

--

This may be caused by the following code in WikiEngine.java:

    public String safeGetParameter( ServletRequest request, String name )
    {
        try
        {
            String res = request.getParameter( name );
            if( res != null ) 
            {
                res = new String(res.getBytes("ISO-8859-1"),
                                 getContentEncoding() );
            }
            return res;
        }
        catch( UnsupportedEncodingException e )
        {
            log.fatal( "Unsupported encoding", e );
            return "";
        }
    }

As far as I can tell, if the string returned by getParameter() already contains characters that cannot be represented in ISO-8859-1, the getBytes("ISO-8859-1") call will replace each of those characters with a "?" mark.

I guess this is not what was intended? It's certainly breaking uploading pages containing UTF-8 characters for me.

-- Chris Wilson
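
The replacement behaviour described above can be reproduced outside a servlet container. The following is a minimal sketch, not JSPWiki code: the class name ClobberDemo and the method redecode() are made up for illustration, UTF-8 is assumed as the wiki's content encoding, and java.nio.charset.StandardCharsets (Java 7+) stands in for the string-named charsets used in WikiEngine.java.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; mimics the re-decoding step of safeGetParameter().
class ClobberDemo {
    // Encode to ISO-8859-1, then decode those bytes as UTF-8 (assumed content encoding).
    static String redecode(String alreadyDecoded) {
        byte[] latin1 = alreadyDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three Japanese characters, as a container that had already decoded
        // the request as UTF-8 would hand them over.
        String submitted = "\u65E5\u672C\u8A9E";
        // None of them exists in ISO-8859-1, so each becomes the byte for '?'.
        System.out.println(redecode(submitted)); // prints ???
    }
}
```

Plain ASCII text survives this step unchanged, which is why the problem only shows up with characters outside Latin-1.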

Well, as you can see from here (åäö), UTF-8 works nicely. I suspect that this is really Resin's fault, as we're running Tomcat. The reason why we're doing that complicated thing is that HTTP parameters are always parsed by the servlet container in Latin-1, even if they're really UTF-8. Therefore we need to take that complicated route to transform them back to UTF-8.

It seems that Resin does not have this "feature", and is happily assuming that submitted parameter fields are UTF-8, and therefore would not need this. However, if we remove that call to safeGetParameter(), then we'll end up with Tomcat and most other web servers failing... Do you know if there is any setting in Resin that allows you to go back to the default behaviour? I believe Tomcat behaviour is mandated by the servlet spec...

-- JanneJalkanen
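
For comparison, here is a small sketch of the case the reply above describes, where the container has decoded UTF-8 bytes as Latin-1 and the "complicated route" restores them. The class and method names (RoundTripDemo, decodeAsLatin1, repair) are invented for this illustration, and the container's behaviour is only simulated with plain String conversions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; simulates a container that parses parameters as Latin-1.
class RoundTripDemo {
    // What the application receives if UTF-8 request bytes are decoded as Latin-1.
    static String decodeAsLatin1(byte[] wire) {
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // The repair step: recover the original bytes, then decode them as UTF-8.
    static String repair(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "\u00E5\u00E4\u00F6";                 // the three-letter sample from the reply
        byte[] wire = typed.getBytes(StandardCharsets.UTF_8); // two bytes per letter
        String misdecoded = decodeAsLatin1(wire);
        System.out.println(misdecoded.length());              // prints 6
        System.out.println(repair(misdecoded).equals(typed)); // prints true
    }
}
```

The round trip is lossless only because ISO-8859-1 maps every byte value to a character; once the container has decoded the request with a different charset, the same step destroys the text instead of repairing it.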

I did some research into the issue, and this is what I found. The source of the problem is that the HTTP request does not have a field to indicate the encoding of POST content. Tomcat and Resin take different approaches to handling the encoding of the form. Tomcat unconditionally uses the ISO-8859-1 encoding unless overridden, whereas Resin uses these settings. I could not find any citation stating that servlets must use ISO-8859-1 encoding.

Which approach is correct? In my opinion, Tomcat is predictably wrong, while Resin makes a guess based on limited information.

We can eliminate the uncertainty with the ServletRequest.setCharacterEncoding() method introduced in J2EE 1.4. If we invoke this method, we can avoid the encoding gymnastics currently in WikiEngine.safeGetParameter(). The downside is that this will force all installations to use JDK 1.4 or later. Will this cause any issues for users who are on JDK 1.3?
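
To illustrate what the declared encoding changes, here is a rough sketch that decodes the same percent-encoded form value with two different charsets. It uses java.net.URLDecoder as a stand-in for the container's parameter parsing, which is an assumption made for illustration and not how Tomcat or Resin is actually implemented; the class name FormDecodeDemo is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; URLDecoder stands in for the container's form parsing.
class FormDecodeDemo {
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // Three Japanese characters as a browser would submit them from a UTF-8 page.
        String raw = "%E6%97%A5%E6%9C%AC%E8%AA%9E";

        String asUtf8 = decode(raw, "UTF-8");
        String asLatin1 = decode(raw, "ISO-8859-1");

        System.out.println(asUtf8.equals("\u65E5\u672C\u8A9E")); // prints true
        System.out.println(asLatin1.length());                   // prints 9
    }
}
```

With the right charset declared up front the value arrives intact and needs no further conversion. In a servlet, setCharacterEncoding() has to be called before the first parameter is read, which is why it would go at the top of the JSP.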

I modified my Edit.jsp and Preview.jsp as follows and tested that it works with Resin. I have set up an alternate site at http://207.210.65.37/JSPWikiUTF8/Wiki.jsp?page=UTF8Test. I do not have access to a Tomcat server, but I think it should work just as well. Can someone give it a try?

*** /usr/tmp/JSPWiki/Edit.jsp   2005-10-12 13:15:41.000000000 -0400
--- Edit.jsp    2005-10-30 10:36:49.000000000 -0500
***************
*** 25,38 ****


  <%
      String action  = request.getParameter("action");
      String ok      = request.getParameter("ok");
      String preview = request.getParameter("preview");
      String cancel  = request.getParameter("cancel");
      String append  = request.getParameter("append");
      String edit    = request.getParameter("edit");
!     String author  = wiki.safeGetParameter( request, "author" );
!     String text    = wiki.safeGetParameter( request, EditorAreaTag.AREA_NAME );

      //
      //  Create context and continue
--- 25,39 ----


  <%
+     request.setCharacterEncoding(wiki.getContentEncoding());
      String action  = request.getParameter("action");
      String ok      = request.getParameter("ok");
      String preview = request.getParameter("preview");
      String cancel  = request.getParameter("cancel");
      String append  = request.getParameter("append");
      String edit    = request.getParameter("edit");
!     String author  = request.getParameter("author" );
!     String text    = request.getParameter(EditorAreaTag.AREA_NAME );

      //
      //  Create context and continue
***************
*** 46,52 ****
      //
      //  WYSIWYG editor sends us its greetings
      //
!     String htmlText = wiki.safeGetParameter( request, "htmlPageText" );
      if( htmlText != null && cancel == null )
      {
          text = new HtmlStringToWikiTranslator().translate(htmlText,wikiContext);
--- 47,53 ----
      //
      //  WYSIWYG editor sends us its greetings
      //
!     String htmlText = request.getParameter( "htmlPageText" );
      if( htmlText != null && cancel == null )
      {
          text = new HtmlStringToWikiTranslator().translate(htmlText,wikiContext);

*** /usr/tmp/JSPWiki/Preview.jsp        2005-10-12 13:15:41.000000000 -0400
--- Preview.jsp 2005-10-30 10:49:54.000000000 -0500
***************
*** 13,18 ****
--- 13,19 ----
      WikiEngine wiki;
  %>
  <%
+     request.setCharacterEncoding(wiki.getContentEncoding());
      WikiContext wikiContext = wiki.createContext( request, WikiContext.PREVIEW );
      String pagereq = wikiContext.getPage().getName();

***************
*** 25,31 ****
      response.setContentType("text/html; charset="+wiki.getContentEncoding() );

      pageContext.setAttribute( "usertext",
!                               wiki.safeGetParameter( request, "text" ),
                                PageContext.REQUEST_SCOPE );

      long lastchange = 0;
--- 26,32 ----
      response.setContentType("text/html; charset="+wiki.getContentEncoding() );

      pageContext.setAttribute( "usertext",
!                               request.getParameter("text" ),
                                PageContext.REQUEST_SCOPE );

      long lastchange = 0;

-- msb0b

JDK 1.3 is not a problem; JSPWiki 2.2 requires JDK 1.4 anyway. Thanks, I'll look into it.

-- JanneJalkanen

(later) Yup. Seems to work. It was old code lying around; the fix is coming in the next-ish version. I'll keep this open until it's released.

-- JanneJalkanen

Fixed in 2.3.37.

--

I have removed WikiEngine.safeGetParameter() and WikiEngine.safeGetQueryString() and replaced references to these functions with request.setCharacterEncoding(), request.getParameter() and request.getQueryString() on my local copy. I have been running this for about a week, and I have noticed no ill effects so far.

-- msb0b

It would probably be easier to just recode WikiEngine.safeGetParameter() to just return request.getParameter() and recompile ;-)

-- JanneJalkanen

OK, I found this page a bit late. But if it's true that HTTP is bound to Latin-1, safeGetParameter is OK. You do not need to use setCharacterEncoding, and in my experience I have never understood what setCharacterEncoding is for; every attempt to make sense of it failed. safeGetParameter should be used in DefaultURLConstructor as well. See UTF-8 issues

-- rsc

« This page (revision-41) was last changed on 26-Sep-2007 23:02 by JanneJalkanen