[jdom-interest] Parsing extended characters: jdom1.8/tomcat 4.0.6/axis1.0/windows98

Helen.Watchorn at transwareplc.com
Mon Mar 3 02:11:45 PST 2003

Hi -

I have written a web service that lets the user enter keywords to search
through XML metadata files and, if a match is found, returns a learning
object. The keywords may contain extended characters, e.g. é or í. The XML
metadata files don't declare their encoding, so by default they are treated
as UTF-8. I run the web service from Forte v.4.
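A quick way to confirm the default-encoding behaviour is to parse a document with no encoding declaration from raw bytes. The sketch below uses the JDK's built-in DOM parser (`javax.xml.parsers`) as a stand-in for JDOM's `SAXBuilder`, and the `<title>` element is a hypothetical metadata field; it only shows that undeclared input is decoded as UTF-8, so é survives if the file really was saved as UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DefaultEncodingDemo {
    /** Parses raw bytes (no encoding declaration) and returns the root element's text. */
    static String rootText(byte[] xmlBytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Passing an InputStream (not a Reader) lets the parser choose the encoding;
            // with no declaration and no byte-order mark it assumes UTF-8.
            Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Parser errors, e.g. bytes that are not valid UTF-8, end up here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical metadata fragment saved as UTF-8 without an encoding declaration.
        byte[] utf8 = "<title>caf\u00e9</title>".getBytes(StandardCharsets.UTF_8);
        System.out.println(rootText(utf8).equals("caf\u00e9")); // true
    }
}
```

If a file had in fact been saved in a Windows code page without a declaration, the same call would typically fail with a malformed-UTF-8 error rather than return é, so a clean parse of a file containing é is some evidence the files really are UTF-8.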

I URL-encode all keywords as UTF-8 and pass them to my servlet. In the
servlet, request.getCharacterEncoding() returns null, so I added the line
request.setEncoding("UTF-8") at the start of my processRequest() method.
But then the servlet wouldn't compile, either in Forte or at the command
line. [Q.1 Why is this?] I found a workaround: convert the incoming String
to a byte array and then rebuild a String from it with UTF-8 encoding,
which gets around the compile problem.
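On Q.1: the Servlet 2.3 `ServletRequest` method is named `setCharacterEncoding(String)`. There is no `setEncoding`, which would explain the compile error, and `setCharacterEncoding` only exists from Servlet 2.3 onward and must be called before any parameter is read. The byte-array workaround works if the container decoded the UTF-8 percent-escapes as ISO-8859-1. The sketch below simulates that round trip with `java.net.URLEncoder`/`URLDecoder` only. It assumes the ISO-8859-1 fallback, which the post does not confirm, and `repair` is a hypothetical helper name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class KeywordRoundTrip {
    /** Percent-encodes a keyword as UTF-8, as the client does before calling the servlet. */
    static String encodeUtf8(String keyword) {
        try {
            return URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    /** Decodes a query value as a container would if it assumed ISO-8859-1 (assumption). */
    static String decodeAsLatin1(String encoded) {
        try {
            return URLDecoder.decode(encoded, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical helper: the byte-array workaround, re-reading mis-decoded text as UTF-8. */
    static String repair(String misDecoded) {
        // ISO-8859-1 maps each char back to exactly one original byte.
        byte[] rawBytes = misDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String keyword = "caf\u00e9";
        String onTheWire = encodeUtf8(keyword);     // two escapes for the accented letter
        String garbled = decodeAsLatin1(onTheWire); // two Latin-1 chars instead of one
        System.out.println(onTheWire);              // prints caf%C3%A9
        System.out.println(repair(garbled).equals(keyword)); // true
    }
}
```

In the servlet itself, the usual alternative to the byte-array workaround is to call `request.setCharacterEncoding("UTF-8")` before the first `getParameter` call; whether Tomcat 4.0.6 applies that setting to GET query strings as well as POST bodies is worth confirming in its documentation.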

[Q.2] My real problem is that the UTF-8 encoded keyword never matches any
text in the XML metadata file when the keyword contains an extended
character, even though the XML file definitely contains the phrase in
question. The character doesn't display correctly in the DOS command box,
but from my reading that's because of the console's code page (cp850), so
I'm not unduly concerned about the display issue.
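Because the cp850 console can't be trusted to show é, one way to see what the servlet actually received is to print code points instead of characters. The sketch below is a hypothetical debugging helper (`toCodePoints` is not part of any library). A correctly decoded é shows as U+00E9, while a keyword garbled in transit shows as two characters, U+00C3 U+00A9, which would never match the XML text. It also checks the decomposed form (e followed by U+0301, a combining acute accent), which looks identical on screen but compares unequal; whether either case applies here is only a guess.

```java
import java.text.Normalizer;

public class CodePointDump {
    /** Hypothetical helper: renders each code point as U+XXXX, independent of the console code page. */
    static String toCodePoints(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (out.length() > 0) out.append(' ');
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String expected = "\u00e9";      // e-acute as one precomposed character
        String garbled = "\u00c3\u00a9"; // UTF-8 bytes of e-acute mis-decoded as ISO-8859-1
        String decomposed = "e\u0301";   // plain e plus a combining acute accent

        System.out.println(toCodePoints(expected));      // U+00E9
        System.out.println(toCodePoints(garbled));       // U+00C3 U+00A9
        System.out.println(expected.equals(decomposed)); // false
        // NFC normalization composes e + U+0301 into U+00E9 before comparing.
        String composed = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(expected.equals(composed));   // true
    }
}
```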

I don't know where else to change the encoding setup so that extended
characters are matched: is it Tomcat? JDOM? Axis? Windows 98?

Any ideas or insights greatly appreciated.
