[jdom-interest] Parsing files starting with UTF-8 Byte Order Mark
Alastair Rodgers
alastair.rodgers at phocis.com
Tue Jul 1 02:12:41 PDT 2003
I forgot to mention: you check for 0xFEFF because that is the Unicode character (U+FEFF) that the three bytes of the UTF-8 byte order mark (EF BB BF) decode to.
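As a quick illustration of that point, here is a minimal sketch showing that decoding the bytes EF BB BF as UTF-8 yields the single char U+FEFF. The class and method names (BomDecodeDemo, firstCharOf) are made up for this example, and it assumes a Java version that has java.nio.charset.StandardCharsets (Java 7 or later).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of JDOM or the original post.
class BomDecodeDemo {
    // Decode the bytes as UTF-8 and return the first decoded char as an int.
    static int firstCharOf(byte[] utf8Bytes) {
        String decoded = new String(utf8Bytes, StandardCharsets.UTF_8);
        return decoded.charAt(0);
    }

    public static void main(String[] args) {
        // EF BB BF followed by a tiny XML document.
        byte[] withBom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>' };
        // Java's UTF-8 decoder keeps the BOM, so the first char is U+FEFF.
        System.out.println(Integer.toHexString(firstCharOf(withBom))); // prints "feff"
    }
}
```

So the comparison against 0xFEFF only makes sense on decoded chars (a Reader or a String), not on the raw bytes of an InputStream.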
> -----Original Message-----
> From: Alastair Rodgers
> Sent: 01 July 2003 10:10
> To: 'Peter Eriksson'; jdom-interest at jdom.org
> Subject: RE: [jdom-interest] Parsing files starting with
> UTF-8 Byte Order Mark
>
>
> Hi Peter,
>
> The UTF-8 byte order mark is supposedly optional, but
> unfortunately there is a known bug in Sun JVMs which means
> they do not ignore it; so if it's present, you'll see it in
> your input stream (Sun JVM bug #4508058,
> http://developer.java.sun.com/developer/bugParade/bugs/4508058.html).
>
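> The typical workaround is to do the check yourself when
> reading the decoded characters, for example:
>
> // Must be a Reader: InputStream.read() returns single bytes (0-255),
> // so it can never return 0xFEFF.
> Reader in = ... // e.g. an InputStreamReader using UTF-8
> StringBuffer buf = new StringBuffer();
> int first = in.read();
> if ((first != -1) && (first != 0xFEFF))
>     buf.append((char) first);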
>
> ... Read the rest of the stream ...
>
> I haven't needed to use this with JDOM, but I expect you
> could get round the problem by using a
> java.io.PushbackReader. This wraps another Reader and allows
> you to read the first char, and if it is anything other than
> 0xFEFF, "push it back" into the Reader before passing the
> PushbackReader to SAXBuilder().build(). There may be more
> elegant ways round the problem too.
>
> Al.
>
>
> > -----Original Message-----
> > From: jdom-interest-admin at jdom.org
> > [mailto:jdom-interest-admin at jdom.org] On Behalf Of Peter Eriksson
> > Sent: 01 July 2003 06:46
> > To: jdom-interest at jdom.org
> > Subject: [jdom-interest] Parsing files starting with UTF-8
> > Byte Order Mark
> >
> >
> > Hello Everybody,
> >
> > I have a problem with parsing some XML files generated from
> > .Net. It seems that the file starts with the Byte Order Mark
> > for UTF-8 (EF BB BF). If I try to load the file using jdom-b8
> > I get an exception. Is there some way that I can load files
> > with or without this Byte Order Mark transparently, i.e.
> > without an exception being thrown.
> >
> > Anybody have a solution to the problem?
> >
> > /Peter
> >
> >
> >
> >
> >
>
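The PushbackReader idea from the quoted reply could look roughly like the sketch below. It is untested against JDOM itself: the class and method names (BomSkippingReader, skipBom, readAll) are invented for the example, the input is assumed to be UTF-8, and the note about passing the result to SAXBuilder.build(Reader) is an assumption about how you would wire it in rather than something verified here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of JDOM.
class BomSkippingReader {
    // Wrap the stream in a UTF-8 reader and drop a leading U+FEFF if present.
    static Reader skipBom(InputStream in) throws IOException {
        PushbackReader reader =
                new PushbackReader(new InputStreamReader(in, StandardCharsets.UTF_8), 1);
        int first = reader.read();
        if (first != -1 && first != 0xFEFF) {
            reader.unread(first); // not a BOM: push the char back
        }
        return reader; // assumption: could be handed to SAXBuilder.build(Reader)
    }

    // Small helper for the demo: drain a Reader into a String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder buf = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            buf.append((char) c);
        }
        return buf.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'r', '/', '>' };
        byte[] withoutBom = { '<', 'r', '/', '>' };
        System.out.println(readAll(skipBom(new ByteArrayInputStream(withBom))));    // prints "<r/>"
        System.out.println(readAll(skipBom(new ByteArrayInputStream(withoutBom)))); // prints "<r/>"
    }
}
```

Files with and without the byte order mark then go through the same code path, which is what the original question asked for.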