[jdom-interest] Performance tests

Andy Clark andyc at apache.org
Tue May 8 20:47:14 PDT 2001


Dennis Sosnoski wrote:
> Best results are from Xerces DOM with deferred node expansion (Xerces deferred).
> This is more than made up for if you actually use most of the document (as
> opposed to reading it in and looking at only a small portion), though.

Yes, the time saved during parsing goes back into "fluffing"
(expanding) the DOM nodes later. But this happens only on the first
traversal of the tree; subsequent traversals are just as fast as with
the non-deferred DOM.
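
A minimal sketch of how to compare the two modes from Java, assuming a modern JAXP where `DocumentBuilderFactory.setFeature` exists (it arrived in JAXP 1.3, after this thread) and a Xerces-based parser that honours the Xerces feature URI `http://apache.org/xml/features/dom/defer-node-expansion`. The class and method names are hypothetical. Timing `countNodes` twice on each document should show the first-traversal "fluffing" cost on the deferred tree only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Xerces or JDOM.
class DeferredDomDemo {
    // Xerces feature controlling deferred node expansion.
    static final String DEFER_FEATURE =
        "http://apache.org/xml/features/dom/defer-node-expansion";

    // Parses xml, requesting deferred (lazy) or eager node creation.
    // If the JAXP implementation rejects the feature, its default is kept.
    static Document parse(String xml, boolean deferred) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            try {
                factory.setFeature(DEFER_FEATURE, deferred);
            } catch (ParserConfigurationException e) {
                // Feature not supported by this parser; fall back to its default.
            }
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Full depth-first traversal. On a deferred DOM, the first call is what
    // forces ("fluffs") the node objects into existence; later calls reuse them.
    static int countNodes(Node n) {
        int count = 1;
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += countNodes(c);
        }
        return count;
    }
}
```

Both modes must produce the same tree; only when the node objects get created differs.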

> Best results again for Xerces deferred, but going through the document expands
> the nodes to largest size of all.

This is, sadly, the typical time vs. space battle in computer
science. In the deferred case I tried to do some smart things to
release the references to the internal data tables as the tree is
fluffed, but the deferred DOM still uses up memory that needs to be
garbage-collected.
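
One rough way to watch the space side of this trade-off is to sample the heap after parsing and again after a full traversal. This is a sketch under loose assumptions: `System.gc()` is only a hint, so the numbers are noisy, and the result depends on which parser JAXP picks and whether it defers expansion. `HeapProbe` and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical measurement helper; figures are approximate by nature.
class HeapProbe {
    // Builds a synthetic document with n <item> elements (test data only).
    static String makeXml(int n) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < n; i++) {
            sb.append("<item id=\"").append(i).append("\">v").append(i).append("</item>");
        }
        return sb.append("</root>").toString();
    }

    // Heap currently in use, after requesting (not forcing) a collection.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    static int walk(Node n) {
        int count = 1;
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += walk(c);
        }
        return count;
    }

    // Returns {heap after parse, heap after first full traversal}, in bytes.
    static long[] measure(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            long afterParse = usedHeap();
            walk(doc.getDocumentElement());
            long afterWalk = usedHeap();
            // Touch doc again so it stays reachable through the second sample.
            if (doc.getDocumentElement() == null) throw new IllegalStateException();
            return new long[] {afterParse, afterWalk};
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

On a deferred DOM the second figure would be expected to grow as nodes are fluffed; on an eager DOM the two should be close, within GC noise.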

> Best by far is Xerces base, followed by Electric XML and Crimson DOM, then

As I said, the deferred DOM is just as fast as the Xerces base DOM
after the first traversal. The nice thing about the deferred DOM (if
you can stand the extra memory it uses) is that tree construction
plus traversal is still faster than with the Xerces base DOM. That is
because the deferred DOM connects the fluffed-up nodes together
under the covers, whereas the base DOM uses standard DOM methods
like appendChild().
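
For comparison, this is roughly what building a tree "through the standard DOM methods" looks like from the public W3C DOM API via JAXP. It is a minimal sketch with a hypothetical class name. It does not show Xerces's internal deferred linking, which is not reachable through the public API. The point is that every `appendChild()` call goes through the checks the DOM spec requires (hierarchy and owner-document validation, detaching the child from any previous parent).

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class, not taken from Xerces.
class AppendChildBuild {
    // Builds <root><item>0</item><item>1</item>...</root> one node at a time.
    static Document build(int n) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            for (int i = 0; i < n; i++) {
                Element item = doc.createElement("item");
                item.appendChild(doc.createTextNode(Integer.toString(i)));
                // Each call is a full, validated DOM mutation.
                root.appendChild(item);
            }
            return doc;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```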

> dom4j. JDOM is nearly as bad as Xerces deferred at this (with Xerces expanding
> all the nodes as it goes). Xerces base is 10-20 times the performance of
> JDOM/Xerces deferred in this test.

It might be useful to always do two traversals, because then you
can see which implementations use lazy evaluation tricks that eat
up memory and time on the first traversal but whose cost may
disappear on subsequent traversals. Also, some implementations do
not reuse objects and will keep creating new ones (wasting more
memory) on every subsequent traversal.
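
A sketch of the two-traversal measurement suggested above, assuming the default JAXP DOM parser. All names are hypothetical. `timePasses` parses once and times several full walks of the same tree. `stableIdentity` is a cheap check for the object-reuse point: it verifies that each child reports the very same parent object we navigated from, which an implementation allocating fresh wrapper objects per call would fail. Note that JIT warm-up also speeds up later passes, so a drop from pass 1 to pass 2 is not proof of lazy expansion by itself. Comparing against a run with deferral disabled helps.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical benchmark helper; not from any of the libraries discussed.
class TwoPassTiming {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static int walk(Node n) {
        int count = 1;
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += walk(c);
        }
        return count;
    }

    // Parses once, then times `passes` full traversals of the same tree.
    // Returns elapsed nanoseconds for each pass.
    static long[] timePasses(String xml, int passes) {
        Document doc = parse(xml);
        long[] nanos = new long[passes];
        int firstCount = -1;
        for (int p = 0; p < passes; p++) {
            long start = System.nanoTime();
            int count = walk(doc);
            nanos[p] = System.nanoTime() - start;
            // Every pass must see the same tree, or the comparison is meaningless.
            if (firstCount >= 0 && count != firstCount) {
                throw new IllegalStateException("tree changed between passes");
            }
            firstCount = count;
        }
        return nanos;
    }

    // True if every child reports the same parent object we reached it from.
    static boolean stableIdentity(Node n) {
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getParentNode() != n || !stableIdentity(c)) {
                return false;
            }
        }
        return true;
    }
}
```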

> implementations you're much better off just outputting text and parsing it back
> in, this will be about twice as fast and the data will be half the size.

This is what I always tell people. By doing this, they also avoid
serialization incompatibilities when the DOM implementation changes
(for example, for performance improvements).
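
A minimal sketch of the "output text and parse it back" approach with standard JAXP: an identity `Transformer` writes the DOM as XML text, and a `DocumentBuilder` rebuilds it on the other side. This is one common way to do it, not necessarily what the posters used, and `TextRoundTrip` is a hypothetical name. Because the wire format is plain XML, the sending and receiving sides can run different DOM implementations or versions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper; uses only standard JAXP classes.
class TextRoundTrip {
    // Writes the DOM out as XML text: an implementation-neutral format.
    static String toXml(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Rebuilds a DOM from text with whatever JAXP parser is configured here.
    static Document fromXml(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```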

-- 
Andy Clark * IBM, TRL - Japan * andyc at apache.org