[jdom-interest] Partial Tree building/instantiation --- XPath Filter

Scott Smith ssmith at summitlogic.com
Mon Apr 2 15:03:08 PDT 2001


As an amendment to my last email:
 
It would also be useful to have the new SAXPatternBuilder (or whatever it's
called) return JDOM objects via a callback rather than in a List. The reason
is that LARGE XML files are involved: if all objects were returned at once in
a List, they would all be in memory together, which would defeat the purpose.
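
To illustrate the callback idea, here is a minimal sketch. It is not JDOM
API: the class name SubtreeCallbackBuilder and its parse() method are
hypothetical, and it uses the JDK's org.w3c.dom.Element as a stand-in for a
JDOM Element so that it runs without extra libraries. It assumes records are
selected by element name only.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: builds one small tree per matching element and hands
// it to a callback instead of collecting all trees in a List.
class SubtreeCallbackBuilder extends DefaultHandler {
    private final String recordName;          // element name that starts a record
    private final Consumer<Element> callback; // receives each finished record
    private final Document doc;               // factory for nodes only
    private Node current;                     // null while outside a record

    SubtreeCallbackBuilder(String recordName, Consumer<Element> callback) throws Exception {
        this.recordName = recordName;
        this.callback = callback;
        this.doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // Ignore everything until a record element opens.
        if (current == null && !qName.equals(recordName)) return;
        Element e = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (current != null) current.appendChild(e);
        current = e;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (current == null) return;
        Node parent = current.getParentNode();
        if (parent == null) {
            // The record root just closed: deliver it, then drop our reference.
            callback.accept((Element) current);
        }
        current = parent;
    }

    static void parse(String xml, String recordName, Consumer<Element> callback) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new SubtreeCallbackBuilder(recordName, callback));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<transferBatch>"
                + "<phoneCall><details>call 1</details></phoneCall>"
                + "<phoneCall><details>call 2</details></phoneCall>"
                + "</transferBatch>";
        // Each record is processed and can then be garbage collected.
        parse(xml, "phoneCall", record -> System.out.println(record.getTextContent()));
    }
}
```

The builder keeps a reference to at most one record at a time, so memory use
depends on the size of the largest record rather than on the whole file,
provided the callback does not hold on to the records itself.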
 
Scott

-----Original Message-----
From: Jakob Jenkov [mailto:jakob at jenkov.com]
Sent: Monday, April 02, 2001 4:34 PM
To: jdom-interest at jdom.org
Subject: [jdom-interest] Partial Tree building/instantiation --- XPathFilter


Hi There.
 
I'm currently working on a long, long :-) project in which we parse through
some quite long files. We have tried converting these files to XML for
easier/standard parsing, but each file would then be about 16-30+ MB in
size. I don't even dare think about how much memory such a JDOM tree
would take! And the plans for lazy evaluation won't help, since we are
visiting every node in the tree, thus instantiating all objects anyway.
Parsing the files solely using SAX is not developer-friendly enough. What I
have in mind is some kind of XPath filter, allowing you to build JDOM
trees from subtrees of the data, and to dispose of each tree when you no
longer need it. Let me give an example:
 
We parse phone call records in files that sometimes can contain thousands
and thousands of records. In XML format these files and records would look
something like this:
 
<transferBatch>
    <phoneCall>
        <details>bla.bla.bla., sub records etc.</details>
    </phoneCall>

    <phoneCall>
        <details>bla.bla.bla., sub records etc.</details>
    </phoneCall>
    <phoneCall>
        <details>bla.bla.bla., sub records etc.</details>
    </phoneCall>
    ...
    ...
    ...
</transferBatch>
 
 
 
Each <phoneCall> record with all its sub records can be quite large, and
there can be thousands of these <phoneCall> records. I'd like some way to
get a JDOM tree for each <phoneCall> record one at a time, and to be able to
dispose of that <phoneCall> JDOM tree before moving on to the next. How would
I do that?
 
My suggestion would be to insert an XPathFilter that only builds JDOM trees
from the records that match the given XPath. In the example above, an XPath
of    /transferBatch/phoneCall   would do the job.
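
As a rough sketch of how such a filter could sit in front of a builder, the
class below extends the standard SAX XMLFilterImpl and forwards only the
events inside elements that match a path. Everything here is an assumption
for illustration: SimplePathFilter is a hypothetical name, it is not part of
JDOM, and it understands only a plain slash-separated path of element names
such as /transferBatch/phoneCall, not full XPath.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch: passes on SAX events only for subtrees whose root
// matches a simple absolute path of element names.
class SimplePathFilter extends XMLFilterImpl {
    private final List<String> steps;                    // e.g. [transferBatch, phoneCall]
    private final List<String> open = new ArrayList<>(); // names of currently open elements
    private boolean inside;                              // true while within a matching subtree

    SimplePathFilter(XMLReader parent, String path) {
        super(parent);
        this.steps = Arrays.asList(path.substring(1).split("/"));
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        open.add(qName);
        if (!inside && open.equals(steps)) inside = true;
        if (inside) super.startElement(uri, local, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (inside) super.characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (inside) super.endElement(uri, local, qName);
        // The matched element itself is closing when the depths are equal.
        if (inside && open.size() == steps.size()) inside = false;
        open.remove(open.size() - 1);
    }

    // Demo helper: returns the forwarded events written back out as text.
    static String filterToString(String xml, String path) throws Exception {
        StringBuilder out = new StringBuilder();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        SimplePathFilter filter = new SimplePathFilter(reader, path);
        filter.setContentHandler(new DefaultHandler() {
            @Override public void startElement(String u, String l, String q, Attributes a) {
                out.append('<').append(q).append('>');
            }
            @Override public void endElement(String u, String l, String q) {
                out.append("</").append(q).append('>');
            }
            @Override public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<transferBatch><header>ignored</header>"
                + "<phoneCall><details>bla</details></phoneCall></transferBatch>";
        System.out.println(filterToString(xml, "/transferBatch/phoneCall"));
    }
}
```

A tree builder placed behind the filter would see only the <phoneCall>
subtrees, and would need to start a new tree at each forwarded top-level
element rather than expect a single document root.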
 
Do my complaints/ideas sound completely out-of-this-world? I think there
are many out there who will have the same problem: parsing one sub tree at a
time, without regard to the others.
 
 
Regards,
Jakob Jenkov
jakob at jenkov.com
 
 
 
 
 
