[jdom-interest] StAX support

Michael Kay mike at saxonica.com
Mon Nov 14 01:07:45 PST 2011


On 14/11/2011 02:03, Rolf wrote:
> Hi All.
>
> I have been trying to put StAX support into JDOM for a little while 
> now, and I have just pushed the code to GitHub; it contains 
> the majority of the anticipated API on the JDOM side for handling 
> StAX parsing/processing of XML.
To what purpose? If you're building a tree, there are no usability 
benefits in using a pull parser rather than a push parser. If there are 
any performance benefits, then they are (a) very small, and (b) 
accidents of the implementation rather than anything architectural. 
Architecturally, there are disadvantages because it is harder to insert 
other functionality (filters, validators etc) into the parsing pipeline.
>
> I have been using as references the StAX specification, the JDOM 'way' 
> of doing things, and the rest of the web.
>
> Some observations I have:
> 1. StAX is currently (slightly) the fastest way to parse XML on my 
> computer.
Which parser? Woodstox is fast, but it's also fast in push mode.
> 2. The StAX specification is perhaps the very worst specification I 
> have ever seen for functionality currently in the Java language/API. I 
> hope that other concepts in the JCP process have better results.
Agree 100%. There have been a lot of interoperability issues with StAX 
parsers as a result. Exception handling is a disaster area.
> 3. XML Validation with StAX is 'hard'.
Because pull pipelines are more difficult to construct than push pipelines.
> 4. DOCTYPE handling in StAX is unpredictable.
I'm not sure the "in StAX" is needed in that sentence...
> 5. After having been around for almost as long as JDOM, the StAX 
> concept is still 'dynamic' and changing.
Actually I see it as pretty dormant. It's an idea that really hasn't 
taken on significantly. I've been supporting StAX in Saxon for years and 
I see very little evidence that anyone uses it. The only parser that 
reached a decent level of maturity and stability was Woodstox, and that 
now seems to be stable with little further development.
>
> Essentially, I have had a long hard look at it, and I think there were 
> a number of oversights in the process.... it's a good concept that has 
> had a poor implementation.
>
> On the other hand, I have put a fair amount of thought into it, and 
> gone a long way towards making it work well in JDOM (within the limitations 
> of StAX), and there may be some use in it.
>
> My thinking is that I will leave the code in there for the moment, but 
> it is incomplete, and I really need to work on something else in the 
> meantime.
>
> It is still 50/50 as to whether it should be in there, or be 
> stripped out again.
I'd vote against, on balance. It's feature creep: added complexity with 
very little benefit. (And if someone really needs to get the output of a 
StAX parser into a JDOM tree, they can always use a Saxon identity 
transformer with a StAXSource and a JDOMResult.)
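Here is a minimal sketch of that route, assuming only the JDK. It uses the default TransformerFactory rather than Saxon; newTransformer() with no stylesheet returns an identity transformer, and the JAXP 1.4 StAXSource wraps an XMLStreamReader. To keep the example runnable without extra jars, a DOMResult stands in for the tree result. The class and method names are hypothetical, and passing JDOM's JDOMResult in the same position is an assumption that is not exercised here.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Document;

class StaxToTreeDemo {

    // Copies whatever a StAX reader delivers into a tree via an identity transformer.
    static Document buildTree(String xml) throws Exception {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        // No stylesheet: newTransformer() gives an identity (copy) transformer.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // DOMResult stands in for the tree builder. Assumption: with JDOM on the
        // classpath, a JDOMResult could be passed here instead (not tested).
        DOMResult result = new DOMResult();
        try {
            identity.transform(new StAXSource(reader), result);
        } finally {
            reader.close();
        }
        // With no node supplied, the transformer creates a new Document.
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildTree("<order id=\"42\"><item>pen</item></order>");
        System.out.println(doc.getDocumentElement().getAttribute("id"));
    }
}
```

The point of the design is that no StAX-specific builder is needed: any transformer that accepts a StAXSource can feed any tree-building Result.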
>
> What I would really like is to get in touch with a StAX 'expert' and 
> run some of my concerns past them.
Tatu Saloranta of Woodstox fame is your man.
>
Regards,

Michael Kay
Saxonica

