[jdom-interest] XHTML issues
Rachel Greenham
rachel at linuxgrrls.org
Fri Jul 25 15:21:39 PDT 2003
Jason Hunter wrote:
>> Yes, including the characters directly and outputting with UTF-8 does
>> work, even just on -b9 (as long as you created your OutputStreamWriter
>> using the right encoding), no need for latest-CVS. I simply have a
>> *preference* for defining them as entities, either named or numerical,
>> and keeping the XHTML source 7-bit clean. I know HTTP is guaranteed
>> 8-bit safe, and browsers should cope, but I also want it to be readily
>> viewable in any text editor, specifically nedit in my case, which
>> doesn't have UTF-8 awareness.
>
>
> Then you really shouldn't output with UTF-8. :-)
>
> The ability to add characters for special escaping is what we added
> after beta 9. In UTF-8 no chars need to be escaped since it represents
> all of Unicode 2.0. Our default escape strategy is only to escape what
> can't be represented, but you can override that.
Yeah, that sounds good. I've resolved my problem for now, as there
weren't actually all that many special characters in what I'm
processing, and I've got a mapping to turn them all into named
entities instead.
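
A minimal sketch of what such a mapping could look like, assuming it is
applied to plain strings. The class name EntityEscaper and the handful
of entities listed are hypothetical and chosen only for illustration;
this is not JDOM API, and it does not touch markup characters such as
& or <, which are assumed to be handled elsewhere.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of JDOM): keeps output 7-bit clean by
// rewriting non-ASCII characters as entity references.
final class EntityEscaper {
    // Illustrative subset of the XHTML 1.0 named entities, keyed by code point.
    private static final Map<Integer, String> NAMED = new HashMap<>();
    static {
        NAMED.put(0x00E9, "eacute"); // e with acute accent
        NAMED.put(0x2018, "lsquo");  // left single quote
        NAMED.put(0x2019, "rsquo");  // right single quote / apostrophe
        NAMED.put(0x201C, "ldquo");  // left double quote
        NAMED.put(0x201D, "rdquo");  // right double quote
    }

    private EntityEscaper() {}

    // Returns text with every non-ASCII code point replaced by a named
    // entity when one is mapped, or a decimal character reference otherwise.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            String name = NAMED.get(cp);
            if (name != null) {
                out.append('&').append(name).append(';');
            } else if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
        }
        return out.toString();
    }
}
```

Falling back to a numeric reference means an unmapped character still
comes out 7-bit clean, just less readable than a named entity.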
But I'm thinking I may end up reverting to letting it write them out in
UTF-8 after all. I had a thought that non-HTML4-aware browsers (Netscape
<=4, etc.) may be happier with that.
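
For the UTF-8 route, the important part is that the Writer is created
with an explicit encoding rather than the platform default. A small
sketch using only the standard library follows; the class name
Utf8WriterDemo is made up for illustration, and in real use the Writer
would be handed to the outputter instead of being written to directly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows an OutputStreamWriter pinned to UTF-8.
final class Utf8WriterDemo {
    private Utf8WriterDemo() {}

    // Encodes text through a Writer that is explicitly configured for
    // UTF-8, so the result does not depend on the platform default charset.
    static byte[] encode(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

If the declared encoding of the document and the encoding of the Writer
disagree, non-ASCII characters are the first thing to come out garbled,
which is why the charset is spelled out here.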
The other issue I had by the way was that EntityRefs are being printed
out with surrounding newlines if newlines is true on the XMLOutputter.
This has the effect that they get surrounded by a visible space when
displayed in a browser. When you're using entities for quote marks,
apostrophes, and accented letters (most of the time in fact) this is
obviously not wanted. However, setting the XMLOutputter to not generate
newlines makes the source very unpleasant and difficult to read by
hand (e.g. a 30,000-word story crushed to 66 logical lines, most of
those in a <pre> block).
My current solution is to extend XMLOutputter with my own
XHTMLOutputter, override printElement() and when a <p> element is
encountered, turn off newlines, call the superclass method, and turn
newlines on again. This is sufficient for the relatively simple
documents I'm working on at the moment, but it might be nice if one
could control this in a normal XMLOutputter, e.g. with a setting so
that entities don't get surrounded by newlines. That, or some kind of
intelligence so it doesn't happen for elements either *when* they're
inlined in text.
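
The toggle-around-<p> idea can be sketched without JDOM at all. The
classes below (Node, SimpleOutputter, XhtmlOutputter) are hypothetical
stand-ins written for this example; they only mimic the behaviour
described above and do not reproduce the real XMLOutputter signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a document node; NOT a JDOM class.
final class Node {
    final String name;                            // element name, null for text/entity
    final String text;                            // literal text or "&name;", null for elements
    final List<Node> children = new ArrayList<>();

    private Node(String name, String text) {
        this.name = name;
        this.text = text;
    }

    static Node text(String text) {
        return new Node(null, text);
    }

    static Node element(String name, Node... kids) {
        Node n = new Node(name, null);
        n.children.addAll(Arrays.asList(kids));
        return n;
    }
}

// Hypothetical base outputter: with newlines on, every child starts a new line.
class SimpleOutputter {
    protected boolean newlines = true;

    String output(Node root) {
        StringBuilder out = new StringBuilder();
        printElement(root, out);
        return out.toString();
    }

    protected void printElement(Node el, StringBuilder out) {
        out.append('<').append(el.name).append('>');
        for (Node child : el.children) {
            if (newlines) out.append('\n');
            if (child.name == null) out.append(child.text);
            else printElement(child, out);
        }
        if (newlines) out.append('\n');
        out.append("</").append(el.name).append('>');
    }
}

// Subclass that switches newlines off for the duration of each <p> element.
class XhtmlOutputter extends SimpleOutputter {
    @Override
    protected void printElement(Node el, StringBuilder out) {
        if (!"p".equals(el.name)) {
            super.printElement(el, out);
            return;
        }
        boolean saved = newlines;
        newlines = false;
        try {
            super.printElement(el, out);
        } finally {
            newlines = saved; // restore even if printing fails
        }
    }
}
```

Restoring the flag in a finally block keeps the outputter usable if
printing a paragraph throws, and elements nested inside a <p> inherit
the switched-off state because the recursion goes through the same
overridden method.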
It might be worth producing a special standalone XHTMLOutputter that
formats things nicely for web-development purposes, i.e. so the XHTML
source is workable by hand but also renders properly in web browsers
(presuming they actually compose the XHTML itself correctly of
course).
--
Rachel