[jdom-interest] getChild() vs. getChildElements()

Joerg joerg at freiheit.com
Fri Aug 11 10:55:12 PDT 2000


Brett McLaughlin wrote:

> Joerg wrote:
> >
> > Hi everybody,
> >
> > I am new to this list and was just digging through the list archives
> > trying to get a picture of the development of the JDOM API. I was also
> > following the unfortunate discussion about the naming of the getChild()
> > or getChildElement() methods. When I stumbled over JDOM, the first thing
> > I was happy to see was the simple structure and naming of the API.
>
> I'm sure if you read the XML spec at all, you also noticed the erroneous
> nature of many of the names.
>
> <small-rant>
> I can't believe we are still arguing this facet of this thread. I don't
> mind that people are still arguing over the name of the method. I just
> can't believe that people won't read the specs and admit that an Element
> is <i>not</i> the only type of child an Element can have! Argggh!!
> People, at least read the specifications, and argue (if you must) that
> we are simplifying the spec, and admit it's not a conformant name. That,
> at least, is honest!
> </small-rant>
>
> OK, now I'm back ;-)
>
> >
> > Once we made our own DOM implementation for a special purpose, added a
> > few 'special Java' functions (for example getChilds() :-) ) and
> > developed a DOM wrapper from this, because the standard DOM methods are
> > more or less unusable. This wrapper simplifies the access to the
> > document for a (very common) subset of XML-Documents. For example we
> > trimmed away all dangling whitespace and normalized multiple text nodes
> > already in the DocumentHandler of the SAX parser. For our purposes it is
>
> Trimming dangling whitespace is another non-XML-conformant issue. For
> your purposes that's fine, but certainly you aren't advocating that in
> JDOM, right? We really, really went through this.

Sorry for mentioning this topic again, but this also touches the basic topic I tried to
address.

>
>
> Folks, I understand that you want JDOM really simple for /your/
> use-case. However, JDOM isn't for /your/ use-case, it is for simple
> manipulation of XML in Java. However, that doesn't mean "simple
> manipulation of the most common features of XML" because everyone sees
> common differently. It doesn't mean "an API that uses almost-XML to help
> you out," either. We say we are Java + XML. Therefore, we must be
> XML-conformant. To abandon that will mean we won't ever be standardized,
> we won't ever be widespread, which more or less means that folks like
> Jason and I won't be able to keep working on it (as opposed to things
> that are more realistic), and it will die, or become a glitch in the
> system ;-) So understand that we /must/ at least attempt to support the
> basic tenets of XML. Not DOM, not SAX, but XML itself. We aren't talking

Why not "simple manipulation of the most common features of XML"? This need not
necessarily mean that you have to exclude all the other features. What you are
referring to here is what our special needs were for our own API. I am not arguing
against following the standards. Standards are very important, and one can see every
day what happens when, for lack of standards, companies set up their own "standards".

You are fully right that everybody has a different idea of what is common. But your 80%
/ 20% rule of thumb implies that there are more commonly used features and less
commonly used features. What is missing, in my opinion, is the bigger picture behind
all this. It is not enough to say only that you will follow the standards but be a little
bit simpler than DOM. Don't misunderstand me here. I appreciate this approach very
much. The only thing is, if it is not more clearly defined to what extent the API
should be simpler, this will be a permanent source of conflicts and
misunderstandings. What I tried to illustrate in my email is of course what I think
is common (but I clearly stated that).

> even about functionality loss, here, but a method name! I can't believe
> people are honestly arguing that a longer method name /complicates/ the
> API! That's false - it clarifies the API (nobody can argue that), and it
> may make the API longer to type, but by nature, a no-functionality
> change cannot /complicate/ an API, if the name assists in
> /clarification/, even when none is needed. That's just logical, folks
> ;-)

The point is not that the name is longer; it is that the name mentions the Element. But
before you go through the roof again, let me complete this point below, where it becomes
clearer what I want to say.

>
>
> >
> > e.g. of no interest to call getContent() for a non-leaf Element. Our
> > method even returns 'null', when called on a non-leaf Element.
> > Furthermore our getChilds() method returns only Element nodes (as an
> > Enumeration, a List is better but we had nothing else then) to avoid all
> > the nerve-racking 'instanceof' or getNodeType() checks.
> >
> > What I mean is, we 'declared' a special case in the usage of XML to our
> > general case. An XML Document is in that case a tree of Elements, which
>
> And that's great. If you want that, you are welcome to subclass JDOM and
> call it "myJDOM" or something. But your special case is not my special
> case. And so on - we can't impose our special cases on other users when
> not all those users want that special case.

Please be fair :-). I'm only talking about our own API and the motivation behind it as
an example of what the "user outside" needs. Maybe I overemphasized this a little,
but I felt an urge to get this discussion going again, a bit more in this
direction. The discussion was very heated in the beginning but came to nothing in
the end. There have been wild ideas for a compromise (getElement(), etc.) but no one
addressed the basic discrepancy between the two opinions. So it was time for someone to
put forward another, maybe slightly provocative, thesis.
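
A minimal sketch of the element-only child access described above, written against the standard W3C DOM API that ships with the JDK. It is not the wrapper discussed in this thread (its code is not shown here) and not JDOM; the class name ElementChildren and both helper methods are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: hides the getNodeType() check behind one method.
final class ElementChildren {

    // Returns only the Element children of parent, skipping text,
    // comments and every other node type.
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Parses an XML string with the JDK's default parser and returns its root.
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot("<root>\n  <a/>text<!-- note --><b/>\n</root>");
        // The raw DOM view mixes whitespace, text and comments with the elements.
        System.out.println("all child nodes: " + root.getChildNodes().getLength());
        for (Element e : childElements(root)) {
            System.out.println("element child: " + e.getTagName());
        }
    }
}
```

The caller never sees an 'instanceof' or getNodeType() check; whether such a method should be called getChildren() or getChildElements() is exactly the naming question under debate.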

>
>
> > can have values. This view of an XML document is very common for
> > problems in which you have to manipulate deep data structures, which
> > form the vast majority of problems outside the area of traditional
> > document management. In our case it is explicitly defined, that you work
>
> I don't agree. I wrote a complete set of data binding classes that, in
> total, calls getChildren() one time. Period. Over 5,000 lines of code,
> more or less, and one call to getChild(). It's just not a valid argument
> that XML has a "vast majority" - the whole point of XML is to allow any
> use, which is going to result in lots of crazy uses, even minimally. Add
> up all the minimal weird cases, and you approach the number of "normal
> cases", so vast majority is lost. I just want to continually emphasize
> that we can simplify accessing XML - that's great. But we can't cut
> things out because we presume that our use-cases are normal and others
> aren't.
>

It's a different story whether you are developing a more general tool or dealing
with specific documents in which you have to navigate. But on the other hand you are
right that the numerous 'crazy' application areas will also add up. A nice
perspective. But again, I'm not arguing for leaving something out.

> >
> > only with Elements when you navigate on the tree. So it is not necessary
> > there to say 'getChildElement()' or 'getChildElements()', this would be
> > a steady repetition of a meaningless statement, because we don't deal
> > with other Nodes than Elements. So we named our methods getChild() and
> > getChilds(), which says everything necessary, namely that you want to
> > have one or all childs (not the parent and not the siblings) of a given
> > Element, which are, as always, also Elements.
>
> In your case, that is a valid desire - of course, you are completely
> ignoring entities, which I would maintain are in the "normal" use-case,
> as you put it. XMLC, which wants to move to JDOM, uses them. Am I going to
> block that effort? No way ;-)
>
> >
> > In JDOM, as I understood, the situation is even much more clear than in
> > our DOM wrapper. Here, not everything is based on a Node, so the only
> > potential tree-building class is Element. And text is mainly a value of
>
> Wrong. Parsed entities:
>
> <root>
>   &parsed;
>   <empty />
> </root>
>
> may expand to:
>
> <root>
>   <copyright>
>     This was developed in <b>2000</b>.
>   </copyright>
>   <empty />
> </root>

<short-interruption>
    According to your UML-diagram, your Entity class cannot have any
    associated Elements. So I was not sure how you handle parsed entities.
</short-interruption>

>
>
> Depending on when the entity is parsed and expanded, there is no clear
> definition of what the child is. Is it the entity? Is it a set of
> Elements that the parsed entity converts to? Is it just the empty
> element? Is it all? or None? In this case, getChildren() is completely
> ambiguous, as is both getChild(parsed) and getChild(copyright). And
> based on when the document is accessed, there may be no way of telling
> that the parsed entity (the copyright and enclosed character data) is
> even an entity at all! So getChildren() is returning potentially
> misleading data. This is clearly a case where getChildElement/s() does
> the right thing, and clearly gives you only the empty element, which is
> both correct and easy to understand. There is no potential for foul-ups
> with that approach.

Here we come to the point. I may not be as bulletproof in the XML spec as you are,
but I definitely didn't miss that point. As you say here, one can have two different
views of an XML document: one with expanded entities (and with normalized text nodes
;-) ) and one view "behind the scenes" where you can see how it is all structured, with
all entities etc.

If it is the case that getChild() really doesn't return Elements of parsed entities (I
was a bit misled in interpreting your UML diagram), then a possible simplification of
access is lost for people who are (temporarily, due to external requirements :-)) not
interested in the internal structure of the document. (I will stop calling this case
the majority of applications, because that is too polemic and we then always argue
about the number of applications in different areas. And nobody knows that number now,
that's right.)

Therefore, in my opinion (to complicate everything completely), one may go a step
further. It should be possible to provide methods that allow navigating the XML
document on a higher level of abstraction (namely, with all entities expanded), without
caring about entities and all this stuff. This would greatly simplify access for many
programmers.

And on the other hand you have the "low-level" access to the physical structures of the
document (I suppose with getMixedContent() ) when you need to manipulate them.
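
The two views described above can be sketched with the JDK's own DOM parser, which offers a switch for entity expansion. This is only an illustration under stated assumptions: it uses javax.xml.parsers, not JDOM, and makes no claim about how JDOM handles entities. The class name EntityViews and the method rootChildren are hypothetical, and the entity declaration is an assumed internal DTD subset added so that the example from this thread can be parsed on its own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class: same document, logical view vs. physical view.
final class EntityViews {

    // The example from the thread, with &parsed; declared in an internal subset.
    static final String XML =
          "<!DOCTYPE root [\n"
        + "  <!ENTITY parsed \"<copyright>This was developed in <b>2000</b>.</copyright>\">\n"
        + "]>\n"
        + "<root>\n  &parsed;\n  <empty />\n</root>";

    // Lists the non-whitespace children of the root as "nodeType:nodeName".
    static List<String> rootChildren(boolean expandEntities) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // true: the tree contains the expansion; false: the parser is asked
            // to keep entity references as separate nodes.
            factory.setExpandEntityReferences(expandEntities);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
            List<String> out = new ArrayList<>();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                boolean blankText = n.getNodeType() == Node.TEXT_NODE
                        && n.getNodeValue().trim().isEmpty();
                if (!blankText) {
                    out.add(n.getNodeType() + ":" + n.getNodeName());
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("expanded view: " + rootChildren(true));
        System.out.println("physical view: " + rootChildren(false));
    }
}
```

In the expanded view the copyright element is an ordinary child of root next to the empty element. In the physical view the parser is expected to report an entity reference node in its place, though the exact result may depend on the parser implementation in use.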

>
>
> > an Element (All people, who want to use XML in a different manner, can
> > use getMixedContent() or DOM directly). This is a basic goal of JDOM, so
>
> or DOM directly? What? Are you really saying that if you want more than
> child elements, you should use DOM? That's not our goal at all. You
> presume too much about what we want to accomplish here.
>
> >
> > I don't understand this discussion at all, why the nice simplicity of
> > JDOM should be sacrificed to a foggy 'standards compliance'. JDOM is JDOM,
>
> Uggh... it's not more complex to call getChildElements() than
> getChild(). I can't believe this is in contention - it may be less
> convenient, or more annoying, for some, but it is a fallacy to say it
> complicates the API. And foggy 'standards compliance', as you put it, is
> what is getting us into the JDK if things go well. So don't knock it ;-)

If you make that "foggy standards compliance in the naming", it expresses better
what I want to say here. But I hope I resolved the basic misunderstanding earlier.

> Lack of standards compliance will get you nowhere. Even deviations from
> standards, like XMLC instead of JSP, are based on a lower level of
> standards (XML and servlets), or Cocoon (XML and XSLT). Standards are
> the reason that Java and XML run on multiple platforms. You take that

I quite agree.

> ability away, and you cut all uses of JDOM for anything but
> inter-application use, as the XML resulting can't be sent to other
> fully-XML-conformant handler APIs, many times. You also eliminate the
> ability to accept XML that is not the "simpler subset" form you can deal
> with. It just doesn't make sense.
>
> > it has its specialized purpose and its clear basic definitions (you
> > should have no other nodes than Elements, folks). This argument of the
>
> What? You're a little nuts here - no Attribute? Wow... I simply don't
> think JDOM is right for you... it may work for you, but we will not
> simplify to that degree.
>
> > confused JDOM beginners which only know the DOM specification is a
> > little bit far-fetched. When someone starts reading about JDOM it is
>
> You haven't read the XML specification either, though, it sounds like.
> And the XML spec is not that tough! It's clear where JDOM is deficient,
> and we are going to try to resolve those areas.
>
> > very soon absolutely clear what is its intention and what is the
> > difference to standard DOM. And if not then it is necessary to emphasize
> > this point, but not to change the API because of this.
> >
> > At first I was a little bit confused when I saw the class
> > diagram because I was missing, what, the Node class of course. But then
> > I thought: "hey, are these people really thinking about what you really
> > want to DO with an XML document and not only about what is in a certain
> > way formally consistent?". For example, I do not yet understand why
> > someone had the idea to tie a DOM node to a specific document. The
> > 0.1% of cases where it would be nice to get the containing Document
> > directly really don't justify all that increase of entropy
> > when all the computers perform all these unnecessary cloneNode()
> > operations while they are merging or transforming XML documents. O.K.,
> > I could imagine that Elements with the same name can have a different
> > purpose in different DTDs or that the meaning of an Element is somehow
> > bound to the document it is part of. But I'm digressing a little
> > from the subject.
> >
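
A minimal sketch of the owner-document restriction mentioned in the quoted paragraph above, using the standard W3C DOM API from the JDK. The class name OwnerDocumentDemo and its methods are hypothetical. It shows that a node created by one Document cannot simply be appended to another and that a copy has to be made first, here with importNode().

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class for moving nodes between two DOM documents.
final class OwnerDocumentDemo {

    // Creates an empty document with a single root element.
    static Document newDocument(String rootName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            doc.appendChild(doc.createElement(rootName));
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("could not create document", e);
        }
    }

    // Tries to append the node as it is; returns false if the DOM rejects it
    // because it belongs to a different document.
    static boolean tryDirectAppend(Element target, Node foreign) {
        try {
            target.appendChild(foreign);
            return true;
        } catch (DOMException e) {
            if (e.code == DOMException.WRONG_DOCUMENT_ERR) {
                return false;
            }
            throw e;
        }
    }

    // Deep-copies the node into the target's document and appends the copy.
    // The original stays where it was in its own document.
    static Node importAndAppend(Element target, Node foreign) {
        Node copy = target.getOwnerDocument().importNode(foreign, true);
        return target.appendChild(copy);
    }

    public static void main(String[] args) {
        Document a = newDocument("a");
        Document b = newDocument("b");
        Element item = b.createElement("item");
        b.getDocumentElement().appendChild(item);

        System.out.println("direct append accepted: "
                + tryDirectAppend(a.getDocumentElement(), item));
        Node copy = importAndAppend(a.getDocumentElement(), item);
        System.out.println("copy owned by a: " + (copy.getOwnerDocument() == a));
    }
}
```

The extra copy per merged subtree is the cost the quoted paragraph objects to.
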
> > In my (of course personal) opinion JDOM should be specialized in its
> > intention and consistent in optimizing for that special purpose.
>
> Which decreases our usability, which decreases our user base, which
> decreases the effort put into it...
>
> > This does not imply that it is not standards compliant, because there
> > are all these special functions with which you can do everything else
> > the standard requires (for me it would be perfect even without that but,
> > who knows, one day I will need them, too). Everything else will bring
> > JDOM in the direction of DOM and this is definitely the wrong direction.
>
> There is no possibility that we are ever moving towards DOM, and even a
> cursory inspection of JDOM shows that. Ask Arnaud LeHors on the DOM
> working group how close JDOM is to DOM - he'll laugh you out of the room
> ;-)
>
> > Therefore the often mentioned 'default' in the
> > naming should reflect what is used most often in the area JDOM is
> > explicitly addressing (manipulating XML data structures with Java). And
> > this naming should be short, significant and free of repetitions (like
> > it is more or less now). Everything else is wasting resources (time,
> > nerves and entropy) like DOM is doing.
>
> Time? Nerves? Entropy? I'm amazed at the amount of energy you must be
> spending on those extra 8 or so characters! ;-)

>
>
> >
> > All I can do now is wait, hoping that JDOM continues on the promising
> > way it started, so that we can silently bury our own DOM wrapper (in
> > honour, of course), because its major drawback is (apart from the owner
> > document reference, which I cannot get rid of) that it does not automatically
> > follow the evolution of the standards. Otherwise I have to implement
> > proper namespace support soon (ahrrrg... some fearless hero has to do
> > such things, but I'm through with implementing DOM features for some
> > time).
>
> But namespaces are not needed, and can just be ignored, right? I mean,
> that's what you are implying - if you don't need it, just trash it. I
> don't get your reasoning here, it is contrary to the entire rest of your
> email!

Hey, when you read again what I wrote here you will see that I don't mean that I don't
need namespaces and therefore left them out. It is just that we started our
implementation at a time when namespaces were not yet fully specified. But we need them,
and therefore I have the 'sword of Damocles' hanging over me to implement them if we stick
to our wrapper. So I hope I can use e.g. JDOM or something else (I think I'm very much
in favour of JDOM :-) ) which properly implements the standards ;-) and also supports
namespaces, but /also/ provides simpler access to an XML document. Again, I'm not
arguing to exclude anything.

It is just that if you really want to provide a simpler API for XML access, it should
/also/ include functions that are not so low-level that you always have to wrestle with
all the (of course necessary) physical structures of an XML document.

>
>
> >
> > But, joking apart, I have the impression that JDOM can become a cool
> > standard, because of its clear and useful design approach and the
>
> I'm really confused now ;-_ We shouldn't follow standards, but we should
> be a standard - how does that work ;-)
>
> -Brett
>
> > competence and influence, which is assembled here.
> >
> > If you are here then you probably read all the stuff before, so thanks
> > for your time.
> >
> > Joerg
> >

Joerg


--
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<businesscard>
  <bookmarks>
    <bookmark> http://www.google.com </bookmark>
    <bookmark> http://www.gnu.org </bookmark>
    <bookmark> http://java.apache.org </bookmark>
    <bookmark> http://www.w3.org </bookmark>
    <bookmark> http://www-db.stanford.edu/lore </bookmark>
  </bookmarks>
  <sender>
    <firstname> Jörg </firstname>
    <name> Kirchhof </name>
    <title> Dipl.-Ing. </title>
    <email> joerg at freiheit.com </email>
    <phone> +49-40-890584-14 </phone>
  </sender>
  <company>
    <name> freiheit.com Technologieberatung GmbH </name>
    <address> Theodorstraße 42-90 </address>
    <zip> 22761 </zip>
    <city> Hamburg </city>
    <country> Germany </country>
    <phone> +49-40-890584-0 </phone>
    <fax> +49-40-890584-20 </fax>
    <www> http://www.freiheit.com </www>
  </company>
</businesscard>




