[jdom-interest] Full Unicode support considered necessary

Elliotte Rusty Harold elharo at metalab.unc.edu
Wed May 23 05:54:48 PDT 2001


There's been some question as to how important full Unicode 
compatibility is in practice. In particular, does anybody actually 
need the new characters defined in Unicode 3.1 and beyond? I've done a 
little research on that, and I think the answer is clearly yes. 
Here's what you lose by not supporting characters past 65,535:

1. Mathematical symbols (my personal interest). Needed for MathML. 
This set is going to get even bigger in Unicode 3.2.

2. Musical notation as used in sheet music; e.g. quarter notes and 
eighth notes and G-clefs and so forth (my wife's personal interest). 
Needed for MusicML and MusicXML.

3. Old Italic scripts used for Etruscan and other languages of the 
Italian peninsula. Unlike Latin, these really are dead languages. 
Nonetheless they're of significant interest to an active scholarly 
community.

4. Deseret: a phonemic alphabet devised to write the English 
language. It was originally developed in the 1850s at the University 
of Deseret, now the University of Utah. It was promoted by The Church 
of Jesus Christ of Latter-day Saints, also known as the "Mormon" or 
LDS Church, under Church President Brigham Young (1801-1877).

5. About 40,000 new Han ideographs. Personally I'm not qualified to 
judge how important they are.

6. Egyptian hieroglyphics will probably be added in Unicode 3.2, at 
least the basic set used in elementary schools around the world.
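
To make the 65,535 limit concrete, here is a minimal sketch of what 
happens to one of the characters above when it has to fit into 16-bit 
Java chars. The class and method names (SurrogateDemo, toSurrogates) 
are made up for illustration; the arithmetic is the standard UTF-16 
surrogate pair encoding, and the example character is U+1D11E, the 
musical G clef.

```java
class SurrogateDemo {
    // Splits a code point above 0xFFFF into the two 16-bit surrogate
    // code units that UTF-16 (and so a Java String) uses to store it.
    // Hypothetical helper, written out by hand for illustration.
    static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int gClef = 0x1D11E; // MUSICAL SYMBOL G CLEF
        String s = new String(toSurrogates(gClef));
        // One character to the reader, two chars to java.lang.String.
        System.out.println(s.length());                       // prints 2
        System.out.println(Integer.toHexString(s.charAt(0))); // prints d834
        System.out.println(Integer.toHexString(s.charAt(1))); // prints dd1e
    }
}
```

Anything that indexes, counts, or truncates by char will happily 
split such a pair in half, which is the practical cost of treating 
char as if it were a character.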

I've half been waiting to see what Java 1.4 was going to do, but the 
latest word seems to be that Sun is going to punt. They are going to 
pretend the problem doesn't exist, at least until 1.5. Frankly that's 
too long. I think I'm going to start work outside JDOM on a 
UnicodeString class or some such that could be used to provide real 
Unicode support, and then I'm going to start reinventing the rest of 
Java's text handling and XML parsing on top of that.
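
As a rough idea of the direction, and nothing more than that, such a 
class might store one int per character instead of one char. 
Everything below is a hypothetical sketch: the method names 
(length, codePointAt) and the choice of an int array are assumptions 
for illustration, not a design anyone has committed to.

```java
// Hypothetical sketch of a string that counts real Unicode characters.
final class UnicodeString {
    private final int[] codePoints; // one int per Unicode character

    // Builds the string from ordinary UTF-16 Java text, joining each
    // surrogate pair into a single code point.
    UnicodeString(String utf16) {
        int[] buffer = new int[utf16.length()];
        int count = 0;
        for (int i = 0; i < utf16.length(); i++) {
            char c = utf16.charAt(i);
            if (c >= 0xD800 && c <= 0xDBFF && i + 1 < utf16.length()) {
                char next = utf16.charAt(i + 1);
                if (next >= 0xDC00 && next <= 0xDFFF) {
                    buffer[count++] =
                        0x10000 + ((c - 0xD800) << 10) + (next - 0xDC00);
                    i++; // skip the low surrogate just consumed
                    continue;
                }
            }
            buffer[count++] = c; // BMP character, or an unpaired surrogate kept as-is
        }
        codePoints = new int[count];
        System.arraycopy(buffer, 0, codePoints, 0, count);
    }

    // Number of characters, not number of 16-bit code units.
    int length() {
        return codePoints.length;
    }

    int codePointAt(int index) {
        return codePoints[index];
    }

    // Converts back to UTF-16 so the text can be handed to existing APIs.
    public String toString() {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < codePoints.length; i++) {
            int cp = codePoints[i];
            if (cp >= 0x10000) {
                int offset = cp - 0x10000;
                sb.append((char) (0xD800 + (offset >> 10)));
                sb.append((char) (0xDC00 + (offset & 0x3FF)));
            } else {
                sb.append((char) cp);
            }
        }
        return sb.toString();
    }
}
```

With this, new UnicodeString("A\uD834\uDD1E").length() is 2 rather 
than 3, and codePointAt(1) returns 0x1D11E. The price is four bytes 
per character, which is one of the trade-offs a real design would 
have to weigh.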

Like I said, this is not a JDOM project, and won't be ready for JDOM 
anytime soon. But I would like to see that JDOM doesn't lock itself 
into String at a very low level. I'd like JDOM to hide the 
implementation details enough so that it's plausible that at some 
point in the future, we could use real Unicode support when it 
becomes available, either from Sun, from me, or from somebody else.
-- 

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|                  The XML Bible (IDG Books, 1999)                   |
|              http://metalab.unc.edu/xml/books/bible/               |
|   http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |
+----------------------------------+---------------------------------+


