From jdom at tuis.net Tue Sep 4 03:41:55 2012
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 04 Sep 2012 06:41:55 -0400
Subject: [jdom-interest] Preparing for JDOM 2.0.3
Message-ID: <5045DAF3.7010108@tuis.net>

Hi All.

A few issues have been identified in JDOM over the past few weeks. When the first issue was resolved (unable to serialize subclasses of Element outside the org.jdom2 package) I promised to release 2.0.3 over this past weekend, but a second (low priority) issue was identified (lack of support for a specific JAXP factory).

Additionally, 'Canadian Wilf' showed interest in improving the performance of the Verifier code. As a result, I have been working with Wilf to get the Verifier code 'fast'.

Taken together, it all means that I have 'slipped' this release date...

Right now the performance changes have been completed successfully, with the Verifier now running in about one third of the time it used to. This speeds up parsing considerably. There is a wiki page documenting the process here:
https://github.com/hunterhacker/jdom/wiki/Verifier-Performance

I have just built the 'hotfix' package containing all fixes since JDOM 2.0.2 and posted it to github here:
https://github.com/hunterhacker/jdom/downloads

I intend to release the full 2.0.3 package on this coming weekend (slipping the 2.0.3 release date by 1 week).

Thanks & Happy Coding

Rolf

From noel at peralex.com Thu Sep 6 06:40:35 2012
From: noel at peralex.com (Noel Grandin)
Date: Thu, 06 Sep 2012 15:40:35 +0200
Subject: [jdom-interest] Preparing for JDOM 2.0.3
In-Reply-To: <5045DAF3.7010108@tuis.net>
References: <5045DAF3.7010108@tuis.net>
Message-ID: <5048A7D3.7070701@peralex.com>

Very nice work!

On 2012-09-04 12:41, Rolf Lear wrote:
> Hi All.
>
> A few issues have been identified in JDOM over the past few weeks.
> When the first issue was resolved (unable to serialize subclasses of > Element outside org.jdom2 package) I promised to release 2.0.3 over > this past weekend, but a second (low priority) issue was identified > (lack of support for specific JAXP factory). > > Additionally, 'Canadian Wilf' showed interest in improving the > performance of the Verifier code. As a result, I have been working > with Wilf to get the Verifier code 'fast'. > > Taken together, it all means that I have 'slipped' this release date... > > Right now the performance changes have been completed successfully, > with the Verifier now running in about one third of the time it used > to. This speeds up parsing considerably. THere is a wiki page > documenting the process here: > https://github.com/hunterhacker/jdom/wiki/Verifier-Performance > > I have just built the 'hotfix' package containing all fixes since JDOM > 2.0.2 and posted it to github here: > https://github.com/hunterhacker/jdom/downloads > > I intend to release the full 2.0.3 package on this coming weekend > (slipping the 2.0.3 release date by 1 week). > > Thanks & Happy Coding > > Rolf > _______________________________________________ > To control your jdom-interest membership: > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com > > Disclaimer: http://www.peralex.com/disclaimer.html From jdom at tuis.net Fri Sep 7 11:48:01 2012 From: jdom at tuis.net (Rolf Lear) Date: Fri, 07 Sep 2012 14:48:01 -0400 Subject: [jdom-interest] Fwd: XML 1.1 -- Please stab me with a dull knife and trample my dead body In-Reply-To: References: <5049F11E.8050004@saxonica.com> Message-ID: <82068eb04129387f7b45c8e8893644ab@tuis.net> Hi Wilf. You are getting your wires crossed..... In your mail you referenced parsed and external entities. 
These have nothing to do with PCDATA (parsed character data - regular XML text), and CDATA (unparsed character data - <![CDATA[ ... ]]>).

Michael was answering your question based on the 'entities', whereas you want the details on the 'PCDATA' and the 'CDATA'.

So, forget about the 'entity' references, and focus on the valid character data for XML.

The only difference between CDATA (character blocks between <![CDATA[ and ]]> ) and PCDATA (element 'text'), is that the XML Parser will look for '<' and '&' characters in PCDATA, but not in CDATA.

With the correct escaping, all CDATA content can be expressed as PCDATA content.

This does not help you though, because not all Java 'char' characters are valid Unicode characters, and thus not all chars are valid as either CDATA or PCDATA.

In XML 1.0 this distinction was clear.

In XML 1.1 I am not certain how to interpret the difference between 'Chars' and 'RestrictedChars': http://www.w3.org/TR/xml11/#charsets

JDOM takes a 1.0 perspective on Characters... which may be a problem, but it is not going to solve your issues even if it supports 1.1 chars.

Rolf

On Fri, 7 Sep 2012 08:45:33 -0700, Canadian Wilf wrote:
> Then what is the proper mode:
>
> Element e = new Element("foo")
>
> Should I do this:
>
> e.setText(string_of_sanitized_data_with_illegal_characters_escaped);
>
> or
>
> e.setText(any_text);
>
>
> Wilf
>
>
> On Fri, Sep 7, 2012 at 6:05 AM, Michael Kay wrote:
>
>> No, that's all wrong. The contents of an unparsed entity are always an
>> external resource, they are never part of a text or attribute node.
>> Parsed entities do become part of the content, but they must always
>> use the XML character set.
>>
>> Michael Kay
>> Saxonica
>>
>> On 07/09/2012 13:10, Canadian Wilf wrote:
>>
>> According to the xml 1.1 spec:
>>
>> 4 Physical Structures ...
>>> [Definition: An *unparsed entity* is a resource whose contents may or
>>> may not be text, and if text, may
>>> be other than XML.
Each unparsed entity has an associated >>> notation, >>> identified by name. Beyond a requirement that an XML processor make the >>> identifiers for the entity and notation available to the application, >>> XML >>> places no constraints on the contents of unparsed entities.] >> >> >> >> AND >> >> Entities may be either parsed or unparsed. [Definition: The contents of >>> a *parsed entity* are referred to as its replacement >>> text; >>> this text is considered an >>> integral part of the document.] >> >> [Definition: An *unparsed entity* is a resource whose contents may or may >>> not be text , and if text, may be >>> other than XML. Each unparsed entity has an associated >>> notation, >>> identified by name. Beyond a requirement that an XML processor make the >>> identifiers for the entity and notation available to the application, >>> XML >>> places no constraints on the contents of unparsed entities.] >>> Parsed entities are invoked by name using entity references; unparsed >>> entities by name, given in the value of *ENTITY* or *ENTITIES* >>> attributes. >> >> >> >> In the current JDOM version, Element method setText(string) and also >> addContent(CDATA) refuses text that contains illegal characters. It is >> treating the data provided as 'parsed' when it should by the spec be >> treating it as free content. >> >> I understand: >> >> 1) The xml 1.1 spec defines a parsed entity as its 'replacement text'. >> >> 2) Replacement text' would refer to the actual textual makeup of a >> serialized Element, not the data an Element holds in a Text content >> element >> >> >> Then, if the above is true, the current implementation is actually wrong >> to verify data. >> >> I propose that JDOM stop verifying data set as Element text and CDATA >> and leave it to the xerces (or whatever) to make sure the document is >> proper 1.1. >> >> Am I understanding everything correctly? >> >> Thoughts? 
>> >> ---------- Forwarded message ---------- >> From: Canadian Wilf >> Date: Thu, Sep 6, 2012 at 9:52 PM >> Subject: XML 1.1 -- Please stab me with a dull knife and trample my dead >> body >> To: jdom-interest at jdom.org >> >> >> Hi All, >> >> I just learned that in order to safely use JDOM2, I will need to >> sanitize my Element .setText(string) so that the parsed data does not >> contain verboten characters under the XML 1.1 spec. >> >> I have an ascii processor and it needs to be able to use xml as a >> document format. Unfortunately, not all ascii is allowed in an Element >> text. >> >> Stab me with a dull knife and trample my dead body. But ..... please >> please please don't make me sanitize all my data before putting it into >> XML >> Elements. >> >> 1) It makes my programming task much more cumbersome because I must >> ensure not to feed any of the new verboten and doomed ascii/UTF-8 >> characters to store as xml text. >> >> 2) No one uses xml 1.1, do they? >> >> 3) It slows down the parsing (a very small amount) with all the element >> text checking. >> >> Now that JDOM2 is xml 1.1 compatible, is there any turning back. Can >> this be undone? >> >> Does everyone understand that their software will bust if data provided >> as text is not adhering to the new standard? >> >> What about you? How do you deal with it when using the libraries? 
>> >> Wilf >> >> >> >> _______________________________________________ >> To control your jdom-interest >> membership:http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com >> >> >> >> _______________________________________________ >> To control your jdom-interest membership: >> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com >> From curoli at gmail.com Fri Sep 7 13:22:29 2012 From: curoli at gmail.com (Oliver Ruebenacker) Date: Fri, 7 Sep 2012 16:22:29 -0400 Subject: [jdom-interest] Fwd: XML 1.1 -- Please stab me with a dull knife and trample my dead body In-Reply-To: References: <5049F11E.8050004@saxonica.com> <82068eb04129387f7b45c8e8893644ab@tuis.net> Message-ID: Hello, On Fri, Sep 7, 2012 at 3:17 PM, Canadian Wilf wrote: > Let's focus on valid character data for xml. How to do this: > > String s = someRandomBytesNowAsString(); Java Strings are not actually random bytes. The bytes are UTF-16, if I remember correctly. > Element e = new Element("random") > e.setText(s) or e.addContent(new CDATA(s)) > > Currently this will fail. Sorry, you lost me here. How will this fail? Will it throw an exception? Or will it otherwise do something undesired? Maybe I'm missing something, but it sounds to me as if you are referring to specs that apply to XML character streams and not to JDOM objects. Take care Oliver >.. Which seems wrong because I should be able to > send whatever data I want as text in xml content. > > What use is xml (1.0 or 1.1) if I cannot represent various data? Is the > solution to make a custom escaper for my data? > > e.setText(encodeSpecial(s)) and decodeSpecial(e.getText()) > > Crazy! > > Wilf > > > On Fri, Sep 7, 2012 at 11:48 AM, Rolf Lear wrote: >> >> >> Hi Wilf. >> >> You are getting your wires crossed..... In your mail you referenced parsed >> and external entities. 
>> >> >> >> Wilf >> >> >> >> >> >> >> >> _______________________________________________ >> >> To control your jdom-interest >> >> >> >> membership:http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com >> >> >> >> >> >> >> >> _______________________________________________ >> >> To control your jdom-interest membership: >> >> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com >> >> > > > > _______________________________________________ > To control your jdom-interest membership: > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com -- Scientific Developer at PanGenX (http://www.pangenx.com) "Stagnation and the search for truth are always opposites." - Nadezhda Tolokonnikova From bjorn at xowave.com Fri Sep 7 15:27:28 2012 From: bjorn at xowave.com (Bjorn Roche) Date: Fri, 7 Sep 2012 18:27:28 -0400 Subject: [jdom-interest] Fwd: XML 1.1 -- Please stab me with a dull knife and trample my dead body In-Reply-To: References: <5049F11E.8050004@saxonica.com> <82068eb04129387f7b45c8e8893644ab@tuis.net> Message-ID: <7544681B-675D-481C-8FB9-017BC9FEC5CC@xowave.com> On Sep 7, 2012, at 4:43 PM, Canadian Wilf wrote: > I can do this: > > String random = new String(someRandomByte[]) Let me address this by pointing out a degenerate case. Strings in java are terminated by the null char (er, I think. Wow, it's been a while since I learned this insanely basic thing). If your someRandomBytes contains two consecutive zero bytes (= a single zero char), then the string "random" will obviously not be what you wanted, because it will end early -- if you are lucky. Another example is if the "someRandomByte" ends in the first half of a unicode codepoint. What happens then? So, yes you can construct a string from a byte array like you did here but please don't! RTFM: "The behavior of this constructor when the given bytes are not valid in the default charset is unspecified." Unspecified. 
As in "it might delete your hard drive, log on to facebook and unfriend your wife." That's what unspecified means, so those bytes need to be "sanitized" too. If that's the kind of data you want to put in XML (raw, random-assed binary), use Base64!

> However, the string cannot be passed to the Text of an XML Element since it may contain illegal characters (<= 0X20 ascii, vertical tab, etc.) This will fail:
>
> new Element("test").setText(random)
>
> XOM and JDOM both restrict the access and will throw IllegalDataException if one of the characters (0x--0xFFFF) is not in XML Unicode specs.

First off, I think maybe you should read this because we are not talking about 0x0 to 0xFFFF:
http://www.joelonsoftware.com/articles/Unicode.html

Secondly, yes there are values that must be escaped in XML. For example < and > for obvious reasons, but the library does this for you.

Then there are values you can't put into XML at all. These fall into other categories. "not valid in a string" (eg the NULL character usually used as a string terminator) is one. Yes, that's right, you can't put 0x00 in an XML string, 'cause you can't put it in a string! OMG! Stop the presses!

I also find this annoying, and have been bitten by it (I think it was 0x17 or something), but that's life. I agree, however, it would be nice to have some clarity on exactly what's allowed. When in doubt, use Base64! Or create sub elements for the weird chars, just like html does for, say, newlines: <br/>
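
The Base64 advice above can be sketched roughly as follows. This is only a minimal illustration using the JDK's own java.util.Base64 (Java 8 or later, so newer than this thread); the class and method names are made up for the example, and the JDOM call is left as a comment so the snippet runs without JDOM on the classpath.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of JDOM: turns arbitrary bytes into text
// that only uses characters legal in XML 1.0 element content.
final class Base64TextSketch {

    // Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=' padding.
    static String toXmlSafeText(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Reverses toXmlSafeText, giving back the original bytes.
    static byte[] fromXmlSafeText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Bytes that include values XML 1.0 text cannot carry directly.
        byte[] raw = {0x00, 0x0B, 0x17, (byte) 0xFF, 'o', 'k'};

        String text = toXmlSafeText(raw);
        // With JDOM available, this is the string one would hand to
        // new Element("test").setText(text) instead of new String(raw).

        byte[] back = fromXmlSafeText(text);
        System.out.println(text + " round-trips: " + Arrays.equals(raw, back));
    }
}
```

The trade-off is that the element text is no longer human-readable, and the reader of the document has to know to decode it.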
bjorn

-----------------------------
Bjorn Roche
http://www.xonami.com
Audio Collaboration
http://blog.bjornroche.com

From jdom at tuis.net Fri Sep 7 16:29:15 2012
From: jdom at tuis.net (Rolf Lear)
Date: Fri, 07 Sep 2012 19:29:15 -0400
Subject: [jdom-interest] Fwd: XML 1.1 -- Please stab me with a dull knife and trample my dead body
In-Reply-To: <82068eb04129387f7b45c8e8893644ab@tuis.net>
References: <5049F11E.8050004@saxonica.com> <82068eb04129387f7b45c8e8893644ab@tuis.net>
Message-ID: <504A834B.5050706@tuis.net>

So, I have been studying up on the Chars and RestrictedChars in the XML1.1 spec. My personal feeling is that the RestrictedChars mechanism for specifying the document format is somewhat complicated, but I now believe I have 'grokked' it.

It all boils down to these four constraints:

1. There are two sets of Characters defined for XML:

   Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
   RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]

   RestrictedChar is a subset of Char

2. a valid XML *unparsed* document is defined as:

   document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )

3. prolog, element, and Misc are all (indirectly) constrained to 'Char' based characters.

4. Character and entity references must resolve to data from the 'Char' set... http://www.w3.org/TR/xml11/#sec-references

Based on the four statements above it is apparent that a valid document consists of a prolog (which may be empty), an element (which must exist), and followed by optional comments, PI's and whitespace. Further, there are not allowed to be any restricted chars in the *unparsed* document anywhere.

But, a big difference between XML 1.0 and 1.1 is that the Char dataset for 1.1 is larger than 1.0 (it includes [#x1-#xD7FF] instead of 'just' #x9 | #xA | #xD | [#x20-#xD7FF] )

So, XML 1.1 includes all the low-value control characters.... but, it *Restricts* them from appearing *raw* in the unparsed document.
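
To make the Char and RestrictedChar productions quoted above concrete, here is a small sketch that encodes them as range checks on code points. The class and method names are invented for illustration and are not JDOM's Verifier API; the XML 1.0 production is completed from the 1.0 spec (the upper ranges are the same as in 1.1).

```java
// Hypothetical helper, not JDOM code: range checks taken from the
// XML 1.0 and XML 1.1 'Char' and 'RestrictedChar' productions.
final class XmlCharSketch {

    // XML 1.0: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // XML 1.1: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml11Char(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // XML 1.1: [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
    static boolean isXml11RestrictedChar(int cp) {
        return (cp >= 0x1 && cp <= 0x8)
                || (cp >= 0xB && cp <= 0xC)
                || (cp >= 0xE && cp <= 0x1F)
                || (cp >= 0x7F && cp <= 0x84)
                || (cp >= 0x86 && cp <= 0x9F);
    }

    // A Char that may appear literally (not as a reference) in a 1.1 document.
    static boolean isXml11RawAllowed(int cp) {
        return isXml11Char(cp) && !isXml11RestrictedChar(cp);
    }

    public static void main(String[] args) {
        int[] samples = {0x0, 0x1, 0x9, 0x17, 0x41, 0x85, 0xD800, 0xFFFF};
        for (int cp : samples) {
            System.out.printf("U+%04X  1.0 Char=%b  1.1 Char=%b  1.1 raw=%b%n",
                    cp, isXml10Char(cp), isXml11Char(cp), isXml11RawAllowed(cp));
        }
    }
}
```

Under this reading, a control character such as 0x17 is a 1.1 Char but not a 1.0 Char, and even in 1.1 it cannot appear raw.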
It goes even further, and it also restricts the following chars in the *unparsed* document: [#x7F-#x84] | [#x86-#x9F]. In XML 1.1 though, you can use a char reference to display these restricted chars like  Unfortunately for you, Wilf, XML 1.1 still makes the following Java char values illegal as XML characters: 0x0000, 0xD800-0xDFFF, and 0xFFFF JDOM 2.x follows JDOM 1.x and allows the set of characters defined for XML 1.0. This is likely a problem. Unfortunately, it is not easily possible for JDOM to 'infer' whether it is working with an XML 1.0 or 1.1 document. Perhaps this needs some thought. Rolf On 07/09/2012 2:48 PM, Rolf Lear wrote: > > Hi Wilf. > > You are getting your wires crossed..... In your mail you referenced parsed > and external entities. These have nothing to do with PCDATA (parsed > character data - regular XML text), and CDATA (unparsed character data - > ) > > Michael was answering your question based on the 'entities', where as you > want the details on the 'PCDATA' and the 'CDATA'. > > So, forget about the 'entity' references, and focus on the valid character > data for XML. > > The only difference between CDATA (character blocks between ]]> ) and PCDATA (element 'text'), is that the XML Parser will look for > '<' and '&' characters in PCDATA, but not in CDATA. > > With the correct escaping, all CDATA content can be expressed as PCDATA > content. > > This does not help you though, because not all Java 'char' characters are > valid Unicode characters, and thus not all chars are valid as either CDATA > or PCDATA. > > In XML 1.0 this distinction was clear. > > In XML 1.1 I am not certain how to interpret the difference between > 'Chars' and 'RestrictedChars': http://www.w3.org/TR/xml11/#charsets > > JDOM takes a 1.0 perspective on Characters... which may be a problem, but > it is not going to solve your issues even if it supports 1.1 chars. 
>>> >>> 1) It makes my programming task much more cumbersome because I must >>> ensure not to feed any of the new verboten and doomed ascii/UTF-8 >>> characters to store as xml text. >>> >>> 2) No one uses xml 1.1, do they? >>> >>> 3) It slows down the parsing (a very small amount) with all the > element >>> text checking. >>> >>> Now that JDOM2 is xml 1.1 compatible, is there any turning back. Can >>> this be undone? >>> >>> Does everyone understand that their software will bust if data > provided >>> as text is not adhering to the new standard? >>> >>> What about you? How do you deal with it when using the libraries? >>> >>> Wilf >>> >>> >>> >>> _______________________________________________ >>> To control your jdom-interest >>> > membership:http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com >>> >>> >>> >>> _______________________________________________ >>> To control your jdom-interest membership: >>> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com >>> > _______________________________________________ > To control your jdom-interest membership: > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com > From jdom at tuis.net Tue Sep 11 03:59:29 2012 From: jdom at tuis.net (Rolf Lear) Date: Tue, 11 Sep 2012 06:59:29 -0400 Subject: [jdom-interest] JDOM 2.0.3 released - special note for Maven users Message-ID: <504F1991.1050807@tuis.net> Hi all. JDOM 2.0.3 is now available from the regular locations, unless you are a maven user, in which case, it is not the normal location! See the maven notes at the end.... The changes for 2.0.3 are as follows: Bugs: Fixes Issue 88 - makes subclasses of JDOM content serializable even if they are not in the org.jdom2 package. Fixes Issue 90 - fixes a false-positive check for Attributes. See the issue for the details. 
Features:
  Fixes Issue 89 - extends the SAX processing in JDOM to allow specific (named) JAXP factories to be used
  Fixes Issue 91 - A performance improvement for AttributeList
  Fixes Issue 92 - Performance improvements for Verifier
  No issue, but includes performance improvements to the regular ContentList.

Procedural:
  Resolves Issue 87 - The name for the JDOM artifact in maven-central: JDOM 2.x from now on will be released into the jdom2 artifact, instead of the jdom artifact.

Please download the package from:
https://github.com/downloads/hunterhacker/jdom/jdom-2.0.3.zip

Maven Users
===========

If you use maven to access your JDOM resources, please note that this release was not made to the jdom artifact, but to the jdom2 artifact. This has all sorts of implications, but, I am assured, that this is the best way to reduce the headaches that were created when jdom 2.0.0 was first released. Please see the notes on issue #87 to understand the reasons why this decision was made...
https://github.com/hunterhacker/jdom/issues/87

In future, maven users should only reference JDOM 1.x versions from the jdom artifact, and all JDOM 2.x versions should be referenced from the jdom2 artifact.

Happy Coding

Rolf

From jdom at tuis.net Wed Sep 12 06:48:05 2012
From: jdom at tuis.net (Rolf Lear)
Date: Wed, 12 Sep 2012 09:48:05 -0400
Subject: [jdom-interest] Pending fix for issue #93
Message-ID:

Hi all.

Recently issue #93 was filed (this morning), and I have a fix out for it already....
https://github.com/hunterhacker/jdom/issues/93

This issue relates to using JDOM in a security-constrained environment (in this case, an applet). The actual issue is that some of the JDOM code references System.getProperties(), and some properties are not accessible from Applets.

This issue is contained within a very limited scope of JDOM usage, so it should have no impact on regular JDOM users. Still, you should probably be aware of it.
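
For anyone hitting the same kind of problem in their own code, the general pattern is to read one property at a time and treat a SecurityException as "not available". The sketch below only illustrates that pattern; it is not the actual change made for issue #93, and the class and method names are hypothetical.

```java
// Hypothetical helper, not the JDOM fix itself: reads a single system
// property and falls back to a default when access is denied.
final class SafePropertySketch {

    static String getPropertyOrDefault(String key, String fallback) {
        try {
            // Reading one named property needs a narrower permission than
            // System.getProperties(), which asks for access to all of them.
            String value = System.getProperty(key);
            return value != null ? value : fallback;
        } catch (SecurityException e) {
            // A sandbox (for example an applet's security manager) refused access.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(getPropertyOrDefault("java.version", "unknown"));
        System.out.println(getPropertyOrDefault("example.missing.property", "fallback-value"));
    }
}
```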
The issue has been fixed, and there is a hotfix package of JDOM with the fix available on the github download site. I will be scheduling a formal release of JDOM 2.0.4 for the October timeframe unless something else comes up before that. Thanks all Rolf From jdom at tuis.net Wed Sep 12 15:35:16 2012 From: jdom at tuis.net (Rolf Lear) Date: Wed, 12 Sep 2012 18:35:16 -0400 Subject: [jdom-interest] HotFix packages on GitHub Message-ID: <50510E24.2000207@tuis.net> Hi all. During the 2.x process I have uploaded a number of files to GitHub here: https://github.com/hunterhacker/jdom/downloads There are 'real' packages (2.0.0, 2.0.1, 2.0.2, and 2.0.3) as well as real support files (jdom2-dev-jars.zip). There are also a lot of 'low value' files, like the early beta versions of 2.x, and the various *issue*.zip interim fix packages. I intend to remove all except the 'current' issue packages, and I intend to remove the BETA packages. This would leave just the important stuff behind. Can anyone think of any reason to keep these 'low value' files? They are just taking up space..... aren't they? Unless I hear otherwise, I will remove the cruft this coming weekend.... Rolf From mike at saxonica.com Thu Sep 13 00:08:01 2012 From: mike at saxonica.com (Michael Kay) Date: Thu, 13 Sep 2012 08:08:01 +0100 Subject: [jdom-interest] Performance measurements with Saxon Message-ID: <50518651.4020309@saxonica.com> JDOM2 is now working as an external object model for Saxon. We've done some performance measurements which are summarised here: http://dev.saxonica.com/blog/mike/2012/09/index.html#000194 These figures show that of all the external object models, JDOM2 now comes second (to XOM) in the league. The Saxon driver for XOM is probably the most carefully tuned of all the drivers, which may have something to do with it; also, I believe that XOM added features explicitly for Saxon's use, to make sorting of nodes into document order more efficient. 
A more detailed breakdown of the results for JDOM1 and JDOM2 is given below. The first group of results are for JDOM1, the second group for JDOM2. For each query in the XMark benchmark, they show the execution time in seconds running against a 1Mb source document; the driver executes each query repeatedly until 1000 iterations or 30 seconds have elapsed. There's a consistent speed-up between JDOM and JDOM2. In the cases where the speed-up is greatest, however, this is in part because of improvements in the Saxon "wrapper": instead of using our own general-purpose implementation of the descendant axis, we now make use of Parent.getDescendants(). In these measurements, JDOM2 has slightly lower memory requirements but slightly higher tree-building time; but I wouldn't be 100% confident that either figure is consistent. Our intention is to release Saxon 9.5 (when it's ready) with support for both JDOM and JDOM2. Michael Kay Saxonica From jdom at tuis.net Thu Sep 13 06:19:40 2012 From: jdom at tuis.net (Rolf Lear) Date: Thu, 13 Sep 2012 09:19:40 -0400 Subject: [jdom-interest] Performance measurements with Saxon In-Reply-To: <50518651.4020309@saxonica.com> References: <50518651.4020309@saxonica.com> Message-ID: <9314207044bfed1bb22c2b905b38f336@tuis.net> Hi Michael. I look at those results and I am really pleased that JDOM 2.x is so much faster than JDOM 1.x on the query time (twice as fast as JDOM 1.x). There were a number of areas in JDOM 2.x that I focused on: memory footprint, iterator performance, and parse time. It is really good to see that the memory and iterator improvements are reflected in your 'independent' tests. Of course, it's also instinctive to be competitive....
and, in that light, I have to ask: - is it possible you can point me to the code you are using for the test (especially the 'wrapper layers') so I can inspect that code, and perhaps have a 'second opinion' to see whether the wrapper has room for improvement, and also whether JDOM can accommodate the Saxon logic more efficiently... I am willing (eager) to spend some time ensuring that the combination of JDOM and Saxon is as good as possible. - can you give an indication of what the baseline time is for the TinyTree query process? The ratios are good to compare one model against the other, but creating the JDOM model takes 110ms less than XOM, and if the queries are taking just a few ms, then it stands to reason that JDOM2 outperforms XOM substantially in such cases. For example, if the Query takes 5ms, then JDOM can query the document 22 times in the time it takes XOM to query it once.... Finally, I already have a scheduled release for JDOM 2.0.4 for early October. If it is possible to 'link up' with your Saxon team I think it is worth working together so that I can have an even better combination of JDOM 2.x and Saxon for release 9.5 of Saxon.... would that be possible? It would also be great to get some feedback on the JDOM 2.x APIs and whether the changes have made it easier (or harder) to integrate with Saxon.... a 'debriefing' would be nice. Thanks for the feedback on the performance though, it's great to see something independent. Rolf On Thu, 13 Sep 2012 08:08:01 +0100, Michael Kay wrote: > JDOM2 is now working as an external object model for Saxon. > > We've done some performance measurements which are summarised here: > > http://dev.saxonica.com/blog/mike/2012/09/index.html#000194 > > These figures show that of all the external object models, JDOM2 now > comes second (to XOM) in the league.
The Saxon driver for XOM is > probably the most carefully tuned of all the drivers, which may have > something to do with it; also, I believe that XOM added features > explicitly for Saxon's use, to make sorting of nodes into document order > more efficient. > > A more detailed breakdown of the results for JDOM1 and JDOM2 is given > below. The first group of results are for JDOM1, the second group for > JDOM2. For each query in the XMark benchmark, they show the execution > time in seconds running against a 1Mb source document; the driver > executes each query repeatedly until 1000 iterations or 30 seconds have > elapsed. > > There's a consistent speed-up between JDOM and JDOM2. In the cases where > the speed-up is greatest, however, this is in part because of > improvements in the Saxon "wrapper": instead of using our own > general-purpose implementation of the descendant axis, we now make use > of Parent.getDescendants(). > > In this measurements, JDOM2 has slightly lower memory requirements but > slightly higher tree-building time; but I wouldn't be 100% confident > that either figure is consistent. > > Our intention is to release Saxon 9.5 (when it's ready) with support for > both JDOM and JDOM2. > > Michael Kay > Saxonica > From mike at saxonica.com Thu Sep 13 07:28:12 2012 From: mike at saxonica.com (Michael Kay) Date: Thu, 13 Sep 2012 15:28:12 +0100 Subject: [jdom-interest] Performance measurements with Saxon In-Reply-To: <9314207044bfed1bb22c2b905b38f336@tuis.net> References: <50518651.4020309@saxonica.com> <9314207044bfed1bb22c2b905b38f336@tuis.net> Message-ID: <5051ED7C.90901@saxonica.com> O'Neil is working on some refactoring of the wrapper code at the moment, he'll send you a copy when it's stable. We're trying to reduce proliferation so that improvements to algorithms only need to be made once. Generally these queries run far faster than the tree construction time. 
In the table I posted, "build-time" is the time to build the model in ms (say 177ms) and "avg" is the time to run the query in ms (0.04ms for the simplest queries, about 30ms for the most expensive). So you are right that if the model has to be built in order to run a single query or transformation, the build time can be more important than the query time. This is of course the scenario where lazy construction ought to play a role. (Most of the XMark queries are linear with document size assuming the Saxon-EE optimizer is available; if I remember right only one is quadratic. Of course with non-linear queries, the query time quickly overtakes the build time as the document size grows.) In this test we wanted to test our own builders, so we are building the tree programmatically rather than just invoking the parser; we haven't tested how this build time compares with the "native" build using the parser. The only case for using JDOM with Saxon in preference to using the TinyTree is where the model is built programmatically by a previous step in the processing pipeline, so this isn't an unreasonable thing to do. Michael Kay Saxonica On 13/09/2012 14:19, Rolf Lear wrote: > Hi Michael. > > I look at those results and I am really pleased that JDOM 2.x is so much > faster than JDOM 1.x on the query time (twice as fast as JDOM 1.x). > > There were a number of areas in JDOM 2.x that I focused on, memory > footprint, iterator performance, and parse time. It is really good to see > that the memory and iterator improvements are reflected in your > 'independent' tests. > > Of course, it's also instinctive to be competitive.... and, in that light, > I have to ask: > > - is it possible you can point me to the code you are using for the test > (especially the 'wrapper layers' so I can inspect that code, and perhaps > have a 'second opinion' to see whether the wrapper has room for > improvement, and also whether JDOM can accommodate the Saxon logic more > efficiently... 
I am willing (eager) to spend some time ensuring that the > combination of JDOM and Saxon is as good as possible. > > - can you give an indication of what the baseline time is for the TinyTree > query process? The ratios are good to compare one model against the other, > but, creating the JDOM model takes 110ms less than XOM, and if the queries > are taking just a few ms, then it stands to reason that JDOM2 outperforms > XOM substantially for cases where. For example, if the Query takes 5ms, > then JDOM can query the document 22 times in the time it takes XOM to query > it once.... > > > Finally, I already have a scheduled release for JDOM 2.0.4 for early > October. If it is possible to 'link up' with your Saxon team I think it is > worth working together so that I can have an even better combination of > JDOM 2.x and Saxon for release 9.5 of Saxon.... would that be possible? It > would also be great to get some feedback on the JDOM 2.x apis and whether > the changes have made it easier (or harder) to integrate with Saxon.... a > 'debriefing' would be nice. > > Thanks for the feedack on the performance though, it's great to see > something independent. > > Rolf > > On Thu, 13 Sep 2012 08:08:01 +0100, Michael Kay wrote: >> JDOM2 is now working as an external object model for Saxon. >> >> We've done some performance measurements which are summarised here: >> >> http://dev.saxonica.com/blog/mike/2012/09/index.html#000194 >> >> These figures show that of all the external object models, JDOM2 now >> comes second (to XOM) in the league. The Saxon driver for XOM is >> probably the most carefully tuned of all the drivers, which may have >> something to do with it; also, I believe that XOM added features >> explicitly for Saxon's use, to make sorting of nodes into document order >> more efficient. >> >> A more detailed breakdown of the results for JDOM1 and JDOM2 is given >> below. The first group of results are for JDOM1, the second group for >> JDOM2. 
For each query in the XMark benchmark, they show the execution >> time in seconds running against a 1Mb source document; the driver >> executes each query repeatedly until 1000 iterations or 30 seconds have >> elapsed. >> >> There's a consistent speed-up between JDOM and JDOM2. In the cases where >> the speed-up is greatest, however, this is in part because of >> improvements in the Saxon "wrapper": instead of using our own >> general-purpose implementation of the descendant axis, we now make use >> of Parent.getDescendants(). >> >> In this measurements, JDOM2 has slightly lower memory requirements but >> slightly higher tree-building time; but I wouldn't be 100% confident >> that either figure is consistent. >> >> Our intention is to release Saxon 9.5 (when it's ready) with support for >> both JDOM and JDOM2. >> >> Michael Kay >> Saxonica >> > > From jdom at tuis.net Sat Sep 15 10:58:18 2012 From: jdom at tuis.net (Rolf Lear) Date: Sat, 15 Sep 2012 13:58:18 -0400 Subject: [jdom-interest] Enhance jdom by OSGi support In-Reply-To: <50543246.9090806@gmx.net> References: <50543246.9090806@gmx.net> Message-ID: <5054C1BA.8080907@tuis.net> Hi Benjamin. An early issue was created in the JDOM 2.x process: https://github.com/hunterhacker/jdom/issues/6 This has been resolved, and JDOM 2.x has no classes/files in anything other than the org.jdom2.* namespace. This should make it easier to make a bundle from JDOM 2.x. That's the good news. The bad news is that I know nothing about OSGi. I have no idea what it takes to support that model. You mention Maven in your mail. At the moment the 'maven' word is a swear word in my home.... It's not likely (while I am maintaining JDOM) that the code base will be converted to a maven build process. There are some real reasons, and some emotional reasons, but fundamentally I regret having committed to producing a JDOM artifact on maven-central. If a maven build process for JDOM is a requirement of OSGi support then it is a 'no-go' for me.
If maven is not required for creating a suitable OSGi system, then I will consider putting in effort to make it work on the following 'conditions': - there is some distinct reason why it is better for 'jdom' to create the bundle rather than some third party (as you have already pointed out, other people seem to be making OSGi bundles for JDOM already...) - there is an OSGi expert who has 'round-trip' experience in making OSGi bundles who can take responsibility for the JDOM OSGi bundle (responsibility for either 'doing it', or alternatively being a 'mentor' for someone who 'does it', and then 'validates' the result). The expert also has to be available for some time to answer any issues that may come up. - there is no need for maven in the JDOM build - there is no need to change any signatures of the JDOM API - there is a relatively easy system for testing the bundle to ensure it works. I have learned (from the maven-central artifact for JDOM) that there are issues when trying to support some protocol/system that you don't understand. I do not know OSGi. I do not use it. I do not even know its benefits. I am not equipped to produce it. I cannot even learn enough about it to get to the point where I am expert enough to do it properly. There needs to be a committed OSGi expert involved. Rolf On 15/09/2012 3:46 AM, Benjamin Graf wrote: > Hi, > > is OSGi support still of any interest? Maybe have a look on > https://github.com/apache/servicemix4-bundles/tree/trunk/jdom-2.0.2 to > get the right manifest entries for jdom2. It might be useful to switch > the whole project to maven at give it standardized structure and let all > the magic been done by plugins (package type bundle) > > Any comments? > > Greets > Benjamin