[jdom-interest] What does the encoding really mean?

Jason Hunter jhunter at acm.org
Tue Nov 27 16:16:22 PST 2001


This is the problem with XML putting the encoding information inside the
document text itself.  If you change the encoding of the string
representation, you should change the encoding in the XML declaration to
match.  It's not pretty, but that's how XML was designed.  :-)
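
As a rough illustration of keeping the two in sync, here is a minimal
sketch that uses only the JAXP classes bundled with the JDK rather than
JDOM's own outputter.  The class name ReencodeXml and the helper reencode
are hypothetical names made up for this example.  It relies on the
serializer honoring OutputKeys.ENCODING for both the bytes it writes and
the declaration it emits.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class ReencodeXml {

    // Hypothetical helper: parse XML bytes, then serialize in targetEncoding.
    static byte[] reencode(byte[] xml, String targetEncoding) {
        try {
            // The parser works out the source encoding from the declaration (or BOM).
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));

            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Requests the byte encoding of the output and the matching
            // encoding value in the XML declaration in one place.
            t.setOutputProperty(OutputKeys.ENCODING, targetEncoding);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("re-encoding failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><msg>caf\u00e9</msg>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = reencode(utf8, "UTF-16");
        // Decode with the same charset the declaration now names.
        System.out.println(new String(utf16, StandardCharsets.UTF_16));
    }
}
```

The point of the sketch is that the target encoding is named once and the
serializer applies it to both the bytes and the declaration, so the two
cannot drift apart.  If you build the string yourself and convert it to
bytes separately, you have to update the declaration by hand.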

-jh-

Fred Clewis wrote:
> 
> I'm using JDOM beta7 and Xerces 2 beta3.  I have a question about the
> encoding attribute in the XML declaration and when it should be altered.
> 
> Suppose you have a UTF-8 XML file (containing multibyte characters), parse
> it to build a document, and then output it to a Unicode string in Java,
> perhaps to send somewhere with MQSeries.  In the MQSeries transport it is
> described as CCSID 1200, Unicode, and it is stored as two-byte Unicode.
> The XML data still says encoding="UTF-8".  Well, at that moment in memory,
> that is untrue.  Is that OK?  Does the original encoding from the file,
> "UTF-8", need to be preserved like this for some subsequent purpose?  Does
> it need to be changed to "UCS-2"?
> 
> thanks for any ideas,
> 


