[jdom-interest] XPath

Tue Jun 20 12:18:43 PDT 2000

I'm working on an XPath library for JDOM, which could be included for 1.1.
It's not ready for beta yet, but I'll give a description of the design, to
ask for evaluation, feedback and help if anyone is interested.  If there is
interest and Brett and Jason agree to it, I'll submit the code to jdom
contrib in a couple of weeks.

There are two new packages, org.jdom.xpath and org.jdom.xpath.parser.  The
parser package contains four interfaces modeled after the SAX API (thanks to
_Java and XML_, chapter 3; and to the author of SAX); after trying other
designs, this one worked best, allowing the greatest independence between
the parsing and the use of the parsed data.  Two interfaces are for parsers,
XPathParser and XPathPredicateParser, and two for handlers, XPathHandler and
XPathPredicateHandler.  Predicate expressions seemed sufficiently complex to
justify a separate parser.  Each parser has methods to set and get its own
handler, and to set the other parser, because an xpath can have nested
predicate expressions, and predicates can have nested xpaths.
The interfaces allow the parsers to vary independently (if there is any
reason to create different implementations).  The two handler interfaces
allow them to be plugged into the parsers, much as you would do with SAX.
The parser interfaces are not very interesting, so I will only give more
detail on the handler interfaces below.
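
As a rough illustration of the wiring described above, here is a minimal
sketch of two parsers that each hold their own handler and a reference to
the other parser.  The parser interfaces are not shown in this post, so
every method name here (setXPathHandler, setPredicateParser, and so on) is
a guess, and the Sketch* classes and ParserWiring are hypothetical.  The
handler interfaces are empty placeholders.

```java
// Empty placeholders for the two handler interfaces.
interface XPathHandler { }
interface XPathPredicateHandler { }

// Hypothetical parser interfaces; the method names are assumptions.
interface XPathParser {
  void setXPathHandler(XPathHandler handler);
  XPathHandler getXPathHandler();
  void setPredicateParser(XPathPredicateParser parser);
}

interface XPathPredicateParser {
  void setPredicateHandler(XPathPredicateHandler handler);
  XPathPredicateHandler getPredicateHandler();
  void setXPathParser(XPathParser parser);
}

// Bare-bones implementations that only store what they are given.
class SketchXPathParser implements XPathParser {
  private XPathHandler handler;
  private XPathPredicateParser predicateParser;

  public void setXPathHandler(XPathHandler handler) { this.handler = handler; }
  public XPathHandler getXPathHandler() { return handler; }
  public void setPredicateParser(XPathPredicateParser parser) { this.predicateParser = parser; }
  // Getter added only so the wiring can be inspected in this sketch.
  public XPathPredicateParser getPredicateParser() { return predicateParser; }
}

class SketchPredicateParser implements XPathPredicateParser {
  private XPathPredicateHandler handler;
  private XPathParser xpathParser;

  public void setPredicateHandler(XPathPredicateHandler handler) { this.handler = handler; }
  public XPathPredicateHandler getPredicateHandler() { return handler; }
  public void setXPathParser(XPathParser parser) { this.xpathParser = parser; }
  // Getter added only so the wiring can be inspected in this sketch.
  public XPathParser getXPathParser() { return xpathParser; }
}

class ParserWiring {
  // Plugs a handler into each parser, then points each parser at the other
  // so nested predicates and nested xpaths can be handed back and forth.
  static SketchXPathParser wire(XPathHandler h, XPathPredicateHandler ph) {
    SketchXPathParser xpathParser = new SketchXPathParser();
    SketchPredicateParser predicateParser = new SketchPredicateParser();
    xpathParser.setXPathHandler(h);
    predicateParser.setPredicateHandler(ph);
    xpathParser.setPredicateParser(predicateParser);
    predicateParser.setXPathParser(xpathParser);
    return xpathParser;
  }
}
```

Because each side is only known through its interface, either parser could
be swapped for another implementation without touching the other.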

In the org.jdom.xpath package, the first class is JDOMLocator, which
implements the handlers.  It takes as input an Element and an xpath String,
and returns a List of matches.  The list may contain Elements, Attributes,
PIs, Strings, and Comments (and maybe Namespaces; I'm not sure yet).
Example usage:
    JDOMLocator locator = new JDOMLocator();
    locator.setContextElement(element);
    List matches = locator.match(
        "child::Chapter/child::node() | Section[2]/attribute::*");
The resulting list would contain the mixed content of every child of the
context element named "Chapter", and every attribute of the second child
named "Section".

The JDOMLocator class has just enough to prove the concept and to make these
examples work.  The following are example xpaths which work with JDOMLocator
so far (and have unit tests):
    child::text()
    child::processing-instruction('myapp')
    /*/*/child::Section
    Chapter/Section | Diagram
    Chapter/Section[1]/attribute::Focus

A method such as match(String xpath):List could be added to Element for JDOM
1.1.  Does anyone have other requirements for how to use XPath, or know of
any special requirements an XSLT library would place on it?

Interesting websites on XPath ...
http://www.w3.org/TR/xpath
http://www.zvon.org/HTMLonly/XPathTutorial/General/examples.html

There is still quite a bit of work to do, but because of the organization,
it would not be difficult to split it among a couple more developers.  The
XPathHandler interface is pretty much finished, and is fully utilized by
implementation classes on both sides.  An impl of the XPathParser handles
cases more complex than the examples above.  It does not yet handle
predicates that contain nested xpaths or predicates, but that won't be too
difficult to fix.  The parser package also has a DefaultHandler and a couple
of utility classes.  Other than XPathParseException extending JDOMException,
this package has no dependency on JDOM.

The XPathPredicateHandler interface needs a lot of work, as do the
predicate parser and handler implementations.  This would be the obvious
first area for anyone who wants to take something on.  This handler works
similarly to the XPathHandler below.  It also has constants defined to
identify operators.

package org.jdom.xpath.parser;
/**
 * Receives events as an XPathParser is parsing, when each token
 * starts and ends and a few methods describing the steps.
 *
 * @author Michael Hinchey
 */
public interface XPathHandler {

  /**
   * Called before any parsing begins.
   */
  void startParsingXPath() throws XPathParseException;

  /**
   * Called after any parsing is finished.
   * Should this be called after an error? (Hinchey)
   */
  void endParsingXPath() throws XPathParseException;

  /**
   * Since an xpath can be composed of multiple paths separated
   * by '|' characters, this is called before each one.
   */
  void startPath() throws XPathParseException;

  /**
   * Called after parsing each of the one or more paths.
   */
  void endPath() throws XPathParseException;

  /**
   * The path which just started to be parsed is absolute.
   * This means the context node should be changed to the root node.
   * Called after startPath() and before any startStep().
   */
  void absolute() throws XPathParseException;

  /**
   * Each path may have one or more steps.
   */
  void startStep() throws XPathParseException;

  /**
   * The nametest for a step.
   * Called after startStep() and before endStep().
   * A step may only have one nametest or a nodetype.
   * @param axis The axis for a step (double colons not included).
   * @param prefix The namespace prefix if any, or null.
   * @param localName The nodetest without a namespace prefix.
   */
  void nametest(String axis, String prefix, String localName)
      throws XPathParseException;

  /**
   * A nodetype for a step.
   * Called after startStep() and before endStep().
   * A step may only have one nametest or a nodetype.
   */
  void nodetype(String axis, String nodetype, String literal)
      throws XPathParseException;

  /**
   * Is called at the beginning of every predicate expression.
   */
  XPathPredicateHandler startXPathPredicate();

  /**
   * Is called at the end of every predicate expression.
   */
  void endXPathPredicate(XPathPredicateHandler handler);

  /**
   * The end of a step, before the next startStep() if any.
   */
  void endStep() throws XPathParseException;
}
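
To make the call order in the Javadoc above concrete, here is a small
sketch of a handler that records each event, driven by hand with the
sequence a parser would plausibly send for "/child::Chapter | child::Diagram".
RecordingHandler and EventOrderDemo are hypothetical names, not part of the
library.  XPathParseException and XPathPredicateHandler are stand-ins so
the sketch compiles on its own (the real exception extends JDOMException).
The exact sequence a real parser emits is an assumption based only on the
comments in the interface.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins so the sketch is self-contained.
class XPathParseException extends Exception {
  XPathParseException(String message) { super(message); }
}

interface XPathPredicateHandler { }

// Method signatures copied from the XPathHandler listing.
interface XPathHandler {
  void startParsingXPath() throws XPathParseException;
  void endParsingXPath() throws XPathParseException;
  void startPath() throws XPathParseException;
  void endPath() throws XPathParseException;
  void absolute() throws XPathParseException;
  void startStep() throws XPathParseException;
  void nametest(String axis, String prefix, String localName)
      throws XPathParseException;
  void nodetype(String axis, String nodetype, String literal)
      throws XPathParseException;
  XPathPredicateHandler startXPathPredicate();
  void endXPathPredicate(XPathPredicateHandler handler);
  void endStep() throws XPathParseException;
}

// Hypothetical handler that records every event as a line of text.
class RecordingHandler implements XPathHandler {
  final List<String> events = new ArrayList<String>();

  public void startParsingXPath() { events.add("startParsingXPath"); }
  public void endParsingXPath() { events.add("endParsingXPath"); }
  public void startPath() { events.add("startPath"); }
  public void endPath() { events.add("endPath"); }
  public void absolute() { events.add("absolute"); }
  public void startStep() { events.add("startStep"); }
  public void nametest(String axis, String prefix, String localName) {
    String name = (prefix == null) ? localName : prefix + ":" + localName;
    events.add("nametest " + axis + "::" + name);
  }
  public void nodetype(String axis, String nodetype, String literal) {
    events.add("nodetype " + axis + "::" + nodetype);
  }
  public XPathPredicateHandler startXPathPredicate() {
    events.add("startXPathPredicate");
    return null; // no predicate handling in this sketch
  }
  public void endXPathPredicate(XPathPredicateHandler handler) {
    events.add("endXPathPredicate");
  }
  public void endStep() { events.add("endStep"); }
}

class EventOrderDemo {
  // Hand-fires the events for "/child::Chapter | child::Diagram":
  // two paths, the first absolute, each with a single step.
  static void drive(XPathHandler h) throws XPathParseException {
    h.startParsingXPath();
    h.startPath();
    h.absolute(); // after startPath(), before any startStep()
    h.startStep();
    h.nametest("child", null, "Chapter"); // axis without the double colons
    h.endStep();
    h.endPath();
    h.startPath();
    h.startStep();
    h.nametest("child", null, "Diagram");
    h.endStep();
    h.endPath();
    h.endParsingXPath();
  }

  static List<String> fire() {
    RecordingHandler handler = new RecordingHandler();
    try {
      drive(handler);
    } catch (XPathParseException e) {
      throw new IllegalStateException(e);
    }
    return handler.events;
  }
}
```

A handler like this is also a cheap way to unit test a parser
implementation: feed it an xpath and compare the recorded list against the
expected sequence.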

-Mike Hinchey

(Please note, I may not respond for a few days as I'll be out of town.)