[jdom-interest] ID and IDREF

Wed Sep 6 10:57:10 PDT 2000

"Galluzzo, Eric" wrote:
[...]
> One problem I can foresee is that maintaining some sort of ID -> element map
> while the user is constructing a document will be really quite painful, and
> perhaps even impossible.
> [...]
> would have to update the ID -> element map on the last statement, which
> could be slow if root actually had lots more elements in it.  Likewise,
> moving children about between documents and so on would become much trickier
> for JDOM to implement.
> 
> So I would say the only feasible way to implement such a method would be to
> do it "on the fly" -- in other words, to search through all attributes of
> all elements until one is found with type "ID" and with the correct value.
> But that's bound to be really, really slow.  So perhaps it's not even
> feasible at all....

You've identified various issues. First of all, in the absence of a DTD
(or other schema), it is impossible to know which attributes are declared
as type 'ID', and therefore there are *no* IDs in the document.
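
For comparison, the on-the-fly search from the quoted message amounts to a depth-first walk that checks each attribute's declared type. The classes below (`Elem`, `Attr`, `IdScan`) are hypothetical stand-ins, not JDOM's API. They show two things. Without declared types (an attribute typed `CDATA` here), the walk finds nothing. With them, every lookup still has to visit every attribute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical minimal tree model (NOT JDOM's API): each attribute records
// the type declared for it in the DTD, or "CDATA" when nothing is declared.
final class Attr {
    final String name;
    final String value;
    final String declaredType; // e.g. "ID", "IDREF", "CDATA"

    Attr(String name, String value, String declaredType) {
        this.name = name;
        this.value = value;
        this.declaredType = declaredType;
    }
}

final class Elem {
    final String name;
    final List<Attr> attrs = new ArrayList<>();
    final List<Elem> children = new ArrayList<>();

    Elem(String name) { this.name = name; }

    Elem attr(String n, String v, String type) { attrs.add(new Attr(n, v, type)); return this; }
    Elem child(Elem c) { children.add(c); return this; }
}

final class IdScan {
    // Depth-first search over every attribute of every element:
    // O(total attributes) per lookup, the cost the quoted message worries about.
    static Optional<Elem> findById(Elem root, String id) {
        for (Attr a : root.attrs) {
            // Only attributes declared as type ID count; without a DTD there are none.
            if ("ID".equals(a.declaredType) && a.value.equals(id)) {
                return Optional.of(root);
            }
        }
        for (Elem c : root.children) {
            Optional<Elem> hit = findById(c, id);
            if (hit.isPresent()) return hit;
        }
        return Optional.empty();
    }
}
```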

Assuming we have a schema available, and that IDs have been declared and
used in the document, it's a simple matter of keeping track of which 
attributes on which elements are of type ID, and then storing references
to elements containing instances of IDs in a hashtable. This is also how
you could ascertain if all IDs are unique. Prior to storing a new ID
value, the lookup *should* fail. Handle that failure (catching the
exception, if your lookup throws one) and at that point add the new ID
to the hashtable. A lookup that succeeds instead means the ID is a
duplicate. Then, whenever you need to dereference an ID (e.g. an IDREF
value), a simple hash lookup provides the element reference. There may be
other ways of doing this, but this is at least how I've done it in my
parser. Performance is not really a problem, although on *enormous*
documents the hashtable might be a memory hog. In my experience this
hasn't been a burden.
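
The hashtable approach might look something like this in Java. `IdIndex` and its methods are hypothetical names. The element type is left generic rather than tied to any particular JDOM class. `Map.putIfAbsent` folds the "lookup should fail" check and the insert into a single call, so only a genuine duplicate raises an exception.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of an ID -> element index (hypothetical names; E is
// whatever element type the tree uses).
final class IdIndex<E> {
    private final Map<String, E> byId = new HashMap<>();

    // Record an ID as it is encountered during parsing or construction.
    // The key is expected to be absent; if it is present, the ID is not
    // unique, which is a validity error in XML 1.0.
    void register(String id, E element) {
        E previous = byId.putIfAbsent(id, element);
        if (previous != null) {
            throw new IllegalStateException("duplicate ID: " + id);
        }
    }

    // Dereference an IDREF value: one hash lookup, or null if the ID is unknown.
    E lookup(String idref) {
        return byId.get(idref);
    }

    // Forget an ID, e.g. when its element is detached from the document.
    void unregister(String id) {
        byId.remove(id);
    }

    int size() { return byId.size(); }
}
```

The `unregister` hook matters for the construction-time case raised in the quoted message: detaching or moving an element has to keep the index in step, which is where most of the bookkeeping cost lives.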

Moving children between documents relies on the IDs in the source not
overlapping the IDs in the destination. An overlap should throw an
exception that either causes the paste to fail or automatically
reassigns new IDs, probably at user preference. IDs are only valid within
the scope of their document of origin, so this is by nature a problem
that should be solved in the context of the application environment.
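
Here is a sketch of the overlap check for a move, with the fail-or-reassign choice passed in as a parameter. `IdMerge`, `ConflictPolicy`, and the `-2`, `-3`, ... renaming scheme are assumptions for illustration. A real implementation would also have to rewrite any IDREF values inside the moved fragment that point at renamed IDs.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical policy switch: fail the paste, or reassign colliding IDs.
enum ConflictPolicy { FAIL, REASSIGN }

final class IdMerge {
    // Given the IDs used in the fragment being moved and the IDs already in
    // the destination, return a rename map (old ID -> new ID) for collisions.
    // Under FAIL, the first collision aborts before anything is changed.
    static Map<String, String> planMove(Set<String> sourceIds,
                                        Set<String> destIds,
                                        ConflictPolicy policy) {
        Map<String, String> renames = new LinkedHashMap<>();
        // Every name already in use on either side, so new IDs collide with neither.
        Set<String> taken = new HashSet<>(destIds);
        taken.addAll(sourceIds);
        for (String id : sourceIds) {
            if (!destIds.contains(id)) continue; // no collision, keep as-is
            if (policy == ConflictPolicy.FAIL) {
                throw new IllegalArgumentException("ID already used in destination: " + id);
            }
            // Naive fresh-name scheme (an assumption): append -2, -3, ... until unused.
            int n = 2;
            String candidate = id + "-" + n;
            while (taken.contains(candidate)) {
                n++;
                candidate = id + "-" + n;
            }
            taken.add(candidate);
            renames.put(id, candidate);
        }
        return renames;
    }
}
```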

Keeping the API lightweight is important. If JDOM is being used for
validating parsers, then it must already support these features. It
almost sounds like one could have three components to an API:

  1. Core well-formed parsing (basically, SAX)
  2. XML 1.0 DTD parsing (SAX2 includes this)
  3. Database feature parsing

I'm not suggesting this, just noting that it's easy to overburden a 
design by pushing too many requirements its way.

Murray

...........................................................................
Murray Altheim, SGML/XML Grease Monkey     <mailto:altheim@eng.sun.com>
XML Technology Center
Sun Microsystems, 1601 Willow Rd., MS UMPK17-102, Menlo Park, CA 94025