|Title|Missing extended characters in pagename and when searching
|Date|15-Dec-2005 09:14:14 EET
|Version|2.2.33
|Submitter|62.20.12.36
|pageprovider|JDBCPageProvider
|criticality|MediumBug
|container|Tomcat 5.0.28
|os|Mac, wXP, w2003
|browser_version|IE and Firefox
|url|www.javalia.se/jspwiki/Wiki.jsp (see lågkaloridiet)
|x|Submit report
|java_version|java 1.4.2 (mac)

JSPWiki strips non-ASCII letters from page names and search terms.

So the page name "aåäöa" shows up as "aa".

When looking at some URLs, I see the letters in question URLEncoded (%C3%B6 instead of ö).

Best regards
Roland

----

Well, they *should* be URLEncoded.  They work quite nicely on this site, and on other sites as well.  Perhaps the JDBCPageProvider is broken?

-- JanneJalkanen
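
For reference, here is a minimal Java sketch of what "URLEncoded" means for the page name from the report, assuming the wiki is configured for UTF-8. The class name PageNameEncoding and its two methods are made up for illustration and are not part of JSPWiki; only java.net.URLEncoder and java.net.URLDecoder are standard library calls.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, not part of JSPWiki: percent-encodes a page name as UTF-8.
final class PageNameEncoding {

    // Encode a page name for use in a URL, using UTF-8 bytes for non-ASCII letters.
    static String encode(String pageName) {
        try {
            return URLEncoder.encode(pageName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    // Decode a percent-encoded page name, interpreting the bytes as UTF-8.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        // "aåäöa" written with Unicode escapes so the source file encoding does not matter.
        String pageName = "a\u00e5\u00e4\u00f6a";
        String encoded = encode(pageName);
        System.out.println(encoded);                          // a%C3%A5%C3%A4%C3%B6a
        System.out.println(decode(encoded).equals(pageName)); // true
    }
}
```

Each of å, ä and ö becomes two percent-escaped bytes in UTF-8, so seeing %C3%B6 in a link where the page title shows ö is the expected behavior.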

Using JSPWiki 2.2.28 with JDBCPageProvider and UTF-8 (the recommended setting for JSPWiki),
non-ASCII letters work fine in our installation. They display
normally when referencing a page and in the page header, and are stored unencoded in the database.
However, as Janne says, the URL must be encoded. IE6 lets you enter the path unencoded by hand
and encodes it behind the scenes, which is convenient, but JSPWiki should not do this.

So there is definitely no problem with the JDBC driver itself, but we use SQL Server with ntext/nvarchar
fields; a database that is not Unicode-enabled could be expected to cause the problems you describe.
Can you manually add a page with such names to the database?

-- Gregor

I tried manually typing the URLEncoded characters into the URL, and the Edit page strips them, so I don't think it ever touched the database. I'm going to try putting a UTF-8 filter in front of the wiki.

-- Roland (the submitter of this bug)


Hmm...  Have you tried setting useBodyEncodingForURI=true in your server.xml config file?  Like this:

{{{
    <Connector port="8009" className="org.apache.coyote.tomcat4.CoyoteConnector"
               enableLookups="false" redirectPort="8443" protocol="AJP/1.3"
               connectionTimeout="120000" acceptCount="200" maxThreads="250"
               minSpareThreads="25" maxSpareThreads="75" useBodyEncodingForURI="true" />
}}}

The above is for the AJP connector, but if your main web server is Tomcat, you will want to do the same for the HTTP connector as well.

-- JanneJalkanen
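
A minimal Java sketch of why the decoding charset matters, assuming the container decodes the request URI as ISO-8859-1 unless told otherwise (as far as I know, that was the Tomcat default at the time). The class UriDecodingMismatch is hypothetical and only illustrates the effect; it does not reproduce Tomcat's or JSPWiki's actual code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical illustration: the same percent-encoded path decoded with two charsets.
final class UriDecodingMismatch {

    // Decode percent-escapes, interpreting the resulting bytes in the given charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // The page name "aåäöa" as a browser would send it when the wiki uses UTF-8.
        String path = "a%C3%A5%C3%A4%C3%B6a";

        // Wrong charset: every UTF-8 byte turns into its own Latin-1 character.
        String asLatin1 = decode(path, "ISO-8859-1");
        // Matching charset: the three two-byte sequences become å, ä and ö again.
        String asUtf8 = decode(path, "UTF-8");

        System.out.println(asLatin1.length()); // 8
        System.out.println(asUtf8.length());   // 5
    }
}
```

With the wrong charset the application never sees å, ä or ö at all, only pairs of unrelated characters, so any later cleanup of the page name works on garbage. Decoding the URI with the same encoding the wiki uses avoids that.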

useBodyEncodingForURI="true" solved the problem. I can't find documentation for it on the Tomcat homepage, since the Connector page doesn't exist for my version of Tomcat. So, to ask a question in the wrong forum: should I expect this setting to change the behavior of my other webapps?

-- Roland

It is a new "feature" in Tomcat 5.0.12 and above.  I have no idea why they changed this, and especially why they changed the default.  I don't think it should impact your other applications.

-- JanneJalkanen
 

I've discovered a similar problem with the attachments. They are linked correctly and have åäö in the database, but I get a 404 error claiming that the page (without åäö) cannot be found. Is there anything more I need to consider to get extended character sets to work?

-- Roland