[jdom-interest] Partial Tree building/instantiation --- XPath Filter

Scott Smith ssmith at summitlogic.com
Mon Apr 2 14:59:43 PDT 2001


I was thinking of this too.  Using XPath and XSLT, as Steven mentioned, isn't
enough by itself (IMHO) because evaluating XPath requires an in-memory
representation of the XML.  It's a chicken-and-egg thing.
 
I think the solution would be a subclass of the current SAXBuilder that
takes an XPath pattern as a constructor parameter.  This new SAXBuilder
would create multiple JDOM objects, one for each node returned by the XPath
pattern.
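
A rough sketch of that idea, under some loud assumptions: it is a plain SAX
handler rather than a real SAXBuilder subclass, it builds JDK org.w3c.dom
elements instead of JDOM ones so that it runs without extra jars, and the
"pattern" is only a literal path string such as /transferBatch/phoneCall,
not real XPath. The names SubtreeHandler and parse are made up for
illustration.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: builds one small tree per element whose path equals
// targetPath and hands it to a callback, instead of building the whole document.
class SubtreeHandler extends DefaultHandler {
    private final String targetPath;            // literal path, NOT real XPath
    private final Consumer<Element> callback;   // receives each finished subtree
    private final Document doc;                 // used only as a node factory
    private final StringBuilder path = new StringBuilder();
    private Node current;                       // null while outside a match

    SubtreeHandler(String targetPath, Consumer<Element> callback) throws Exception {
        this.targetPath = targetPath;
        this.callback = callback;
        this.doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        path.append('/').append(qName);
        // Skip everything that is neither inside a match nor the start of one.
        if (current == null && !path.toString().equals(targetPath)) {
            return;
        }
        Element e = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (current != null) {
            current.appendChild(e);             // nested element inside a match
        }
        current = e;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (current != null) {
            Node parent = current.getParentNode();
            if (parent == null) {
                // The matched element itself just closed: the subtree is complete.
                callback.accept((Element) current);
            }
            current = parent;
        }
        path.setLength(path.length() - qName.length() - 1);
    }

    // Convenience entry point for the sketch; parses from a String.
    static void parse(String xml, String targetPath, Consumer<Element> cb) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), new SubtreeHandler(targetPath, cb));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<transferBatch>"
            + "<phoneCall><details>first</details></phoneCall>"
            + "<phoneCall><details>second</details></phoneCall>"
            + "</transferBatch>";
        // Each subtree is handed over once and not kept by the handler.
        parse(xml, "/transferBatch/phoneCall",
              e -> System.out.println(e.getTagName() + ": " + e.getTextContent()));
    }
}
```

The handler keeps a reference only to the subtree currently being built, so
the callback is free to process each record and let it go before the next
one starts.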
 
Anyone processing LARGE XML files would find this very useful, I would
think.
 
Scott

-----Original Message-----
From: Jakob Jenkov [mailto:jakob at jenkov.com]
Sent: Monday, April 02, 2001 4:34 PM
To: jdom-interest at jdom.org
Subject: [jdom-interest] Partial Tree building/instantiation --- XPathFilter


Hi There.
 
I'm currently working on a long, long :-) project in which we parse through
some quite long files. We have tried converting these files to XML for
easier/standard parsing, but each file then comes to about 16-30+ MB.
I don't even dare think about how much memory such a JDOM tree
would take! And the plans for lazy evaluation won't help, since we
visit every node in the tree and so instantiate all the objects anyway.
Parsing the files solely with SAX is not developer-friendly enough. What I
have in mind is some kind of XPath filter that lets me build JDOM
trees from subtrees of the data and dispose of each tree when I no
longer need it. Let me give an example:
 
We parse phone call records in files that sometimes can contain thousands
and thousands of records. In XML format these files and records would look
something like this:
 
<transferBatch>
    <phoneCall>
        <details>bla.bla.bla., sub records etc.</details>
    </phoneCall>

    <phoneCall>
        <details>bla.bla.bla., sub records etc.</details>
    </phoneCall>
    <phoneCall>
        <details>bla.bla.bla., sub records etc.</details>
    </phoneCall>
    ...
    ...
    ...
</transferBatch>
 
 
 
Each <phoneCall> record with all its sub records can be quite large, and
there can be thousands of these <phoneCall> records. I'd like some way to
get a JDOM tree for each <phoneCall> record one at a time, and to be able to
dispose of that <phoneCall> JDOM tree before moving on to the next. How would
I do that?
 
My suggestion would be to insert an XPathFilter that only builds JDOM trees
from the records that match the given XPath. In the example above, an XPath
of    /transferBatch/phoneCall   would do the job.
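
To make the matching part concrete, here is a minimal sketch of the decision
such a filter would have to make for every start tag: does the list of
currently open elements match the pattern? It assumes the pattern is written
as an absolute path like /transferBatch/phoneCall and supports only child
steps and a * wildcard, which is a tiny subset of XPath. SimplePathFilter is
a made-up name, not an existing JDOM class.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical matcher for a tiny subset of XPath: absolute paths made of
// child steps, where a step is either an element name or "*".
class SimplePathFilter {
    private final String[] steps;

    SimplePathFilter(String pattern) {
        if (!pattern.startsWith("/") || pattern.length() < 2) {
            throw new IllegalArgumentException("expected an absolute path like /a/b");
        }
        this.steps = pattern.substring(1).split("/");
    }

    // openElements lists the names of the currently open elements,
    // outermost first, as a SAX handler could track them.
    boolean matches(List<String> openElements) {
        if (openElements.size() != steps.length) {
            return false;
        }
        for (int i = 0; i < steps.length; i++) {
            if (!steps[i].equals("*") && !steps[i].equals(openElements.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimplePathFilter filter = new SimplePathFilter("/transferBatch/phoneCall");
        System.out.println(filter.matches(Arrays.asList("transferBatch", "phoneCall")));
        System.out.println(filter.matches(Arrays.asList("transferBatch", "phoneCall", "details")));
    }
}
```

Only when matches returns true would the filter start building a tree; every
other element would pass by without any objects being created for it.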
 
Do my complaints/ideas sound completely out-of-this-world? I think there
are many out there who have the same problem: parsing one subtree at a
time, without regard to the others.
 
 
Regards,
Jakob Jenkov
jakob at jenkov.com
 
 
 
 
 
