[jdom-interest] Re: Manipulating a very large XML file

Tatu Saloranta cowtowncoder at yahoo.com
Tue Mar 15 19:11:47 PST 2005


--- Michael Kay <mike at saxonica.com> wrote:
> > Exactly.  That was the hardest thing for me to get around.
> > But reducing the number of bytes per node in the representation
> > of the tree structure from around 80 down to more like 8
> > is the key to the whole thing.
> 
> Saxon's TinyTree uses 19 bytes per node - it's very
> hard to get below that.

Yes, especially since recent JVMs have something like
8 bytes of overhead per object (I think?)

Nonetheless, although I agree with Jason's sentiment
regarding huge XML documents (DBs do make good sense),
I also think it might not be a bad idea to consider an
alternate JDOM implementation that would try to be
more memory-efficient, even at some slight performance
expense (a more compact memory representation may help
with cache locality etc., which may offset some
additional lookups). It's still nice not to use too
much memory when dealing with full in-memory tree
representations.
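
To make the compact-representation idea a bit more concrete, here is a
minimal sketch of a tree stored in parallel int arrays instead of one
object per node. The class name CompactTree and all of its methods are
made up for illustration; this is not Saxon's TinyTree layout and not
an existing JDOM API. It costs five 4-byte ints per node slot (about 20
bytes), ignoring array headers and spare capacity.

```java
import java.util.Arrays;

// Hypothetical array-backed tree: each node is just an index into
// parallel int arrays, so there is no per-node object (and therefore
// no per-object header). Names are assumed to be pre-mapped to ints.
final class CompactTree {
    private static final int NONE = -1;

    private int[] nameCode = new int[16];    // index into a shared name table
    private int[] parent = new int[16];      // parent node index, NONE for the root
    private int[] firstChild = new int[16];  // first child index, NONE if leaf
    private int[] lastChild = new int[16];   // last child index, for O(1) append
    private int[] nextSibling = new int[16]; // next sibling index, NONE if last
    private int size = 0;

    // Appends a new node as the last child of parentIndex (NONE for a root)
    // and returns the index of the new node.
    int addNode(int parentIndex, int name) {
        if (size == nameCode.length) {
            grow();
        }
        int id = size++;
        nameCode[id] = name;
        parent[id] = parentIndex;
        firstChild[id] = NONE;
        lastChild[id] = NONE;
        nextSibling[id] = NONE;
        if (parentIndex != NONE) {
            int last = lastChild[parentIndex];
            if (last == NONE) {
                firstChild[parentIndex] = id;
            } else {
                nextSibling[last] = id;
            }
            lastChild[parentIndex] = id;
        }
        return id;
    }

    // Doubles the capacity of every array; existing node indexes stay valid.
    private void grow() {
        int capacity = nameCode.length * 2;
        nameCode = Arrays.copyOf(nameCode, capacity);
        parent = Arrays.copyOf(parent, capacity);
        firstChild = Arrays.copyOf(firstChild, capacity);
        lastChild = Arrays.copyOf(lastChild, capacity);
        nextSibling = Arrays.copyOf(nextSibling, capacity);
    }

    // Walks the sibling chain; this is the "additional lookup" cost.
    int childCount(int node) {
        int count = 0;
        for (int c = firstChild[node]; c != NONE; c = nextSibling[c]) {
            count++;
        }
        return count;
    }

    int firstChild(int node) { return firstChild[node]; }
    int nextSibling(int node) { return nextSibling[node]; }
    int parent(int node) { return parent[node]; }
    int nameCode(int node) { return nameCode[node]; }
    int size() { return size; }
}
```

The trade-off is the one mentioned above: navigation goes through
array lookups rather than object references, but the nodes sit close
together in memory.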

One final idea regarding implementation: for some
special cases, it may actually be possible to even
share text node Strings... for some elements, values
are likely to come from a limited vocabulary (much
like attribute values). Perhaps a heuristic could be
enabled that would try to do 'interning' of short
Strings? (Most XML parsers already intern element and
attribute names and prefixes, and share namespace
URIs.)
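
A rough sketch of what such a heuristic might look like, assuming a
simple length cutoff and a per-document pool. ShortTextInterner and
its maxLength parameter are hypothetical names, not part of JDOM or of
any parser; a builder would have to call intern() itself when creating
text nodes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: shares one String instance among equal short
// text values, and leaves longer values alone since those are
// unlikely to repeat.
final class ShortTextInterner {
    private final Map<String, String> pool = new HashMap<>();
    private final int maxLength;

    ShortTextInterner(int maxLength) {
        this.maxLength = maxLength;
    }

    // Returns the pooled instance for short text, or the argument
    // itself when the text is longer than the cutoff.
    String intern(String text) {
        if (text.length() > maxLength) {
            return text;
        }
        String existing = pool.putIfAbsent(text, text);
        return existing != null ? existing : text;
    }

    int poolSize() {
        return pool.size();
    }
}
```

A private pool like this (rather than String.intern()) goes away with
the document, but it only pays off when the values really do repeat;
for unique short values it just adds a map entry per string.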

-+ Tatu +-



		

