[jdom-interest] RE: Keeping an XML file in memory

Thu Aug 2 02:16:43 PDT 2007

> Thanks Michael.  Could you point me to some documentation 
> that explains how to build a TinyTree from a file, and how to 
> create a Source object out of it (and which Source 
> implementation to use)?  I am not really familiar with 
> Saxon's own API, as I've always used it through JAXP interfaces.
> 
> Also, I am somewhat reluctant to introduce a direct 
> dependency on Saxon in my code (although I suppose since I am 
> using some XSLT 2.0 features, I already have it, whether I 
> want it or not. :) 

I'll keep it brief since we're now on the wrong list for the topic.

The Saxon Javadoc is at
http://www.saxonica.com/documentation/javadoc/index.html

The simplest way to build a TinyTree is using

Configuration config = new Configuration();
DocumentInfo doc = config.buildDocument(src);

where src can be a StreamSource, a SAXSource, a DOMSource, etc.

The result is a DocumentInfo (a document node in the Saxon tree model) which
itself implements the JAXP Source interface, so you can use it for example
as the first argument to transformer.transform().

If you're running a complex workload that involves sharing source documents
between multiple concurrent transformations and so on, then I think it's
going to be difficult to do without some use of native Saxon interfaces -
particularly the Configuration object. It can probably be done in theory,
but everything is one step removed. For example, the Saxon implementation of
JAXP DocumentBuilder can be used to construct a TinyTree with a DOM wrapper
around it, and if you supply this as a transformation source then Saxon will
unpeel the wrapper. If you want to keep your code portable without losing
performance then I think it's better to implement your own abstraction
layer.
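
As a rough illustration of what such an abstraction layer might look like, here is a minimal sketch. The names ParsedDocument, DocumentLoader and DomDocumentLoader are hypothetical and not part of Saxon or JAXP. The implementation shown uses only standard JAXP and holds a DOM, so that it runs on a bare JDK. A Saxon-backed implementation would presumably call config.buildDocument(src) and hand back the resulting DocumentInfo, which is itself a Source.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Node;

/** Hypothetical abstraction: hides how the parsed document is held in memory. */
interface ParsedDocument {
    /** Returns a Source over the tree that was already built. */
    Source asSource();
}

/** Hypothetical loader interface; one implementation per XSLT processor. */
interface DocumentLoader {
    ParsedDocument load(Source input) throws Exception;
}

/** Portable implementation using only JAXP: builds a DOM once and wraps it. */
final class DomDocumentLoader implements DocumentLoader {
    @Override
    public ParsedDocument load(Source input) throws Exception {
        DOMResult tree = new DOMResult();
        // An identity transformation copies the input into a DOM tree.
        TransformerFactory.newInstance().newTransformer().transform(input, tree);
        Node root = tree.getNode();
        // Each call wraps the same tree; nothing is re-parsed here.
        return () -> new DOMSource(root);
    }
}

final class AbstractionDemo {
    // XSLT 1.0 stylesheet so that the JDK's built-in processor can run it.
    static final String COUNT_ITEMS_XSLT =
        "<xsl:stylesheet version='1.0' "
        + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='count(//item)'/>"
        + "</xsl:template></xsl:stylesheet>";

    static Templates compile() throws Exception {
        return TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(COUNT_ITEMS_XSLT)));
    }

    static String countItems(Templates templates, ParsedDocument doc) throws Exception {
        StringWriter out = new StringWriter();
        templates.newTransformer().transform(doc.asSource(), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        DocumentLoader loader = new DomDocumentLoader();
        ParsedDocument doc = loader.load(
            new StreamSource(new StringReader("<list><item/><item/><item/></list>")));
        Templates templates = compile();
        // The same in-memory document serves two transformations, one after the other.
        System.out.println(countItems(templates, doc));
        System.out.println(countItems(templates, doc));
    }
}
```

Calling code depends only on the two interfaces, so the Saxon-specific class can be swapped in or out without touching the rest of the application. This sketch runs the transformations one after the other; sharing a tree between concurrent transformations is a separate question that depends on the tree implementation.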

> Could you provide some more insight as to 
> what advantages a TinyTree provides over a simple byte array 
> in my situation?  Is it significantly less expensive to 
> create a Source object from a TinyTree than from a byte array?

If you use a byte array then the document is going to be parsed from
scratch, and a new tree built, for each transformation. That means finding
new memory each time to store the resulting TinyTree. It's quite common that
the parsing and tree-building takes as long as the transformation proper
(sometimes longer). Parsing from a byte array in memory is a bit faster than
parsing from a file on disk, but not much, and a lot slower than not parsing
at all. Creating a Source object from a TinyTree costs nothing, because the
TinyTree already is a Source object.
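
The difference between the two approaches can be sketched with plain JAXP. This is only an illustration under stated assumptions: a DOM stands in for the TinyTree so that the example runs without Saxon, and the class and method names (ParseOnceDemo, fromBytes, fromTree) are made up for the example. How much the second approach saves depends on the processor, since some processors may copy a DOM into their own internal model before transforming it.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

final class ParseOnceDemo {
    // XSLT 1.0 stylesheet so that the JDK's built-in processor can run it.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' "
        + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='count(//item)'/>"
        + "</xsl:template></xsl:stylesheet>";

    static Templates compile() throws Exception {
        return TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(XSLT)));
    }

    static Document parse(byte[] xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // XSLT processing expects a namespace-aware tree
        return dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
    }

    static String run(Templates templates, Source source) throws Exception {
        StringWriter out = new StringWriter();
        templates.newTransformer().transform(source, new StreamResult(out));
        return out.toString();
    }

    /** Byte-array approach: the XML is parsed again on every call. */
    static String fromBytes(Templates templates, byte[] xml) throws Exception {
        return run(templates, new StreamSource(new ByteArrayInputStream(xml)));
    }

    /** Tree approach: the caller parsed once; only a thin wrapper is created here. */
    static String fromTree(Templates templates, Document tree) throws Exception {
        return run(templates, new DOMSource(tree));
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<list><item/><item/><item/></list>"
            .getBytes(StandardCharsets.UTF_8);
        Templates templates = compile();

        // Three transformations, three parses of the same bytes.
        for (int i = 0; i < 3; i++) {
            System.out.println(fromBytes(templates, xml));
        }

        // One parse, then three transformations over the same tree.
        Document tree = parse(xml);
        for (int i = 0; i < 3; i++) {
            System.out.println(fromTree(templates, tree));
        }
    }
}
```

Both loops produce the same output; the difference is where the parsing work happens. With Saxon, the tree built once would be the DocumentInfo returned by config.buildDocument(src) rather than a DOM.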

Michael Kay
http://www.saxonica.com/