[jdom-interest] Partial Tree building/instantiation --- XPath Filter

Adam Simantel Adam.Simantel at merant.com
Mon Apr 2 15:37:01 PDT 2001


I have run into this problem as well.  I have experimented with one
approach that works and would like to know what others think of it.

By the way, I looked at XSLT for a bit, and it appeared to me
that the XSLT processor wanted to load the whole document
into memory first.  Saxon has a Preview mode to get around this,
but I didn't research that very far, since I would have had to
do two passes over everything before it was in JDOM where I wanted
it.


My solution is as follows:

Add a setHandler() method to a copy of org.jdom.input.SAXBuilder
that takes an element name and a handler object.

    // Maps an element name to the handler called when that element ends.
    private HashMap handledElements = new HashMap();
    public void setHandler(String elementName, ElementHandler handler) {
        handledElements.put(elementName, handler);
    }

Modify SAXHandler.endElement() (still inside SAXBuilder) so that
it calls the handler if this is a registered element.  Upon return
from the handler, the element content is removed from the tree.
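The post does not include the modified endElement() or the ElementHandler type, so here is a self-contained sketch of the same build, call, discard idea using only the JDK. To avoid depending on JDOM, it swaps in the W3C DOM: a SAX DefaultHandler builds DOM elements, calls the handler registered for an element's name when that element ends, and then detaches the element from its parent. ElementHandler (as defined here), HandlingBuilder, and build() are hypothetical names, not JDOM API.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical callback type: the post uses ElementHandler but does not show it.
interface ElementHandler {
    void handle(Element element);
}

// Builds a DOM tree from SAX events (W3C DOM stands in for JDOM here), calls the
// handler registered for an element's name when it ends, then detaches it.
class HandlingBuilder extends DefaultHandler {
    private final Map<String, ElementHandler> handledElements = new HashMap<>();
    private final Deque<Element> stack = new ArrayDeque<>(); // open elements, innermost on top
    private Document doc;

    void setHandler(String elementName, ElementHandler handler) {
        handledElements.put(elementName, handler);
    }

    Document build(String xml) {
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), this);
            return doc;
        } catch (Exception ex) {
            throw new IllegalStateException("parse failed", ex);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Element e = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (stack.isEmpty()) {
            doc.appendChild(e);          // root element
        } else {
            stack.peek().appendChild(e); // child of the innermost open element
        }
        stack.push(e);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!stack.isEmpty()) {
            stack.peek().appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        Element e = stack.pop();
        ElementHandler handler = handledElements.get(qName);
        if (handler != null) {
            handler.handle(e);                // the handler sees the complete subtree
            e.getParentNode().removeChild(e); // then it is discarded, as in the post
        }
    }
}
```

With a handler registered for "phoneCall", each record is handed over and then dropped, so roughly one <phoneCall> subtree is in memory at a time, plus its still-open ancestors. In this sketch, whitespace text between records still accumulates under the root.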

The end usage ends up looking something like this:

	...
	// reader: an instance of the modified SAXBuilder copy
	MyElementHandler myhandler = new MyElementHandler();

	reader.setHandler("transferBatch", myhandler);
	reader.setHandler("phoneCall", myhandler);
	try {
		reader.build(srcXmlFilename);
	}
	...

You can have as many handlers as you want, and anything you handle
is discarded after its handler is called.

The XPathFilter proposal would be very nice, since
the above solution only registers explicit element names.

I wanted to extend SAXBuilder, but I could not see how to do this. 
I see from other posts that XMLFilter may help.  I will look at that
and I welcome more advice in this area.

I can't see how to avoid needing access to the JDOM Element
stack (named "stack") inside the SAXHandler in SAXBuilder.  That's
how I pass the JDOM Element to the handler.  Perhaps a small change
to SAXBuilder would expose a safe interface to that stack and then
a subclass of SAXBuilder with an XMLFilter might be a clean solution?
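One way to keep path information out of the builder's internals is to track it in an XMLFilter placed between the parser and the builder. The sketch below, assuming SAX2's XMLFilterImpl as shipped with the JDK, records the names of the open elements and exposes them as a read-only list while forwarding every event unchanged. PathTrackingFilter, currentPath(), and parseXml() are hypothetical names. Note that it tracks element names only; handing the built JDOM Element itself to a handler would still need cooperation from the builder.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: tracks the names of the open elements and passes every
// event through unchanged to whatever ContentHandler is set downstream.
class PathTrackingFilter extends XMLFilterImpl {
    private final List<String> path = new ArrayList<>();

    // Read-only view of the open element names, root first.
    List<String> currentPath() {
        return Collections.unmodifiableList(path);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        path.add(qName);                                 // push before forwarding
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName);         // forward while still "open"
        path.remove(path.size() - 1);                    // then pop
    }

    // Convenience: parse an XML string through this filter using the JDK's parser.
    void parseXml(String xml) {
        try {
            if (getParent() == null) {
                setParent(SAXParserFactory.newInstance().newSAXParser().getXMLReader());
            }
            parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("parse failed", ex);
        }
    }
}
```

Because endElement() forwards the event before popping the name, a downstream handler still sees the closing element at the top of currentPath().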


Regards,

Adam Simantel
Adam.Simantel at merant.com



-----Original Message-----
From: Steven Gould [mailto:steven.gould at cgiusa.com]
Sent: Monday, April 02, 2001 2:14 PM
To: jdom-interest at jdom.org
Subject: Re: [jdom-interest] Partial Tree building/instantiation --- XPathFilter


Jakob,

Could you use XSLT to break the file up into smaller, more manageable
documents? Then use JDOM to manipulate/process each of these smaller
documents.

Steve

---

Jakob Jenkov wrote:

> Hi there. I'm currently working on a long, long :-) project in which
> we parse through some quite long files. We have tried converting these
> files to XML for easier/standard parsing, but each file will then be
> about 16-30+ MB. I don't even dare think about how much memory such a
> JDOM tree would take! And the plans for lazy evaluation won't help,
> since we are visiting every node in the tree, thus instantiating all
> objects anyway. Parsing the files solely with SAX is not
> developer-friendly enough. What I have in mind is some kind of XPath
> filter, allowing you to build JDOM trees from subtrees of the data,
> and to dispose of these trees when I no longer need them.
>
> Let me give an example: we parse phone call records in files that
> sometimes contain thousands and thousands of records. In XML format
> these files and records would look something like this:
>
>   <transferBatch>
>     <phoneCall>
>       <details>bla.bla.bla., sub records etc.</details>
>     </phoneCall>
>     <phoneCall>
>       <details>bla.bla.bla., sub records etc.</details>
>     </phoneCall>
>     <phoneCall>
>       <details>bla.bla.bla., sub records etc.</details>
>     </phoneCall>
>     ...
>   </transferBatch>
>
> Each <phoneCall> record with all its sub-records can be quite large,
> and there can be thousands of these <phoneCall> records. I'd like some
> way to get a JDOM tree for each <phoneCall> record one at a time, and
> to be able to dispose of that <phoneCall> JDOM tree before moving on
> to the next. How will I do that? My suggestion would be to insert an
> XPathFilter that only builds JDOM trees from the records that match
> the given XPath. In the example above, an XPath of
> /transferBatch/phoneCall would have done the job. Do my
> complaints/ideas sound completely out-of-this-world? I think there are
> many out there who will have the same problem: parsing one subtree at
> a time, without regard to the others.
>
> Regards,
> Jakob Jenkov
> jakob at jenkov.com
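A general XPathFilter would need a full XPath engine, but the phone-call example only needs plain child steps. A minimal sketch of that subset, assuming the path is written as an absolute location path such as /transferBatch/phoneCall and that the caller supplies the names of the currently open elements, root first (as a SAX handler can track with a stack). SimplePathMatcher is a hypothetical helper, not JDOM API; it supports no predicates, wildcards, namespaces, or other axes.

```java
import java.util.List;

// Hypothetical matcher for simple absolute paths like "/transferBatch/phoneCall".
// Only plain child steps are supported: no predicates, wildcards, or other axes.
final class SimplePathMatcher {
    private final String[] steps;

    SimplePathMatcher(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("expected an absolute path: " + path);
        }
        steps = path.substring(1).split("/");
    }

    // True when the open-element names (root first) are exactly the path's steps.
    boolean matches(List<String> openElements) {
        if (openElements.size() != steps.length) {
            return false;
        }
        for (int i = 0; i < steps.length; i++) {
            if (!steps[i].equals(openElements.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

A SAX handler would call matches() in startElement() after pushing the new element's name, start building a tree when it returns true, and hand off the finished subtree when that element ends.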

