[jdom-interest] Parsing HTML elements

Paul Libbrecht paul at hoplahup.net
Wed Nov 21 12:59:44 PST 2012


Thanks Rolf,

that would indeed be the right approach; I had not thought of it.

For now, I've implemented a replacement of the raw data... that is simpler.
I certainly agree that JDOM should refuse to do anything with undeclared prefixes.
I had tried to add namespace declarations within the factory, but they were not taken into account.

thanks.

Paul


On 21 Nov 2012, at 00:08, Rolf Lear wrote:

> Hi Paul.
> 
> In the mail below I suggested using a parsing proxy. The term I meant to use is a 'Filter'. See this article here:
> 
> http://www.ibm.com/developerworks/xml/library/x-tipsaxfilter/
> 
> You can do some magic with http://www.jdom.org/docs/apidocs/org/jdom2/input/SAXBuilder.html#setXMLFilter(org.xml.sax.XMLFilter)
> 
> For example, your filter could extend http://docs.oracle.com/javase/6/docs/api/org/xml/sax/helpers/XMLFilterImpl.html
> 
> and then override the method http://docs.oracle.com/javase/6/docs/api/org/xml/sax/helpers/XMLFilterImpl.html#startElement(java.lang.String,%20java.lang.String,%20java.lang.String,%20org.xml.sax.Attributes)
> 
> to set the 'attrs' URIs correctly, and then call super.startElement(....).
> 
> Rolf
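
A minimal sketch of the filter described above, for illustration only. The
class name PrefixFixingFilter and the prefix-to-URI map are assumptions, not
part of JDOM or NekoHTML; the map has to be supplied by the caller because the
document itself never declares the prefixes. Only attributes that carry a
prefix but no namespace URI are touched.

```java
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical SAX filter: fills in the namespace URI of attributes whose
 * prefix was never declared in the source document.
 */
class PrefixFixingFilter extends XMLFilterImpl {

    // Assumed table of prefix -> namespace URI, chosen by the caller.
    private final Map<String, String> knownPrefixes;

    PrefixFixingFilter(Map<String, String> knownPrefixes) {
        this.knownPrefixes = new HashMap<>(knownPrefixes);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        // Work on a copy; the incoming Attributes object may be read-only.
        AttributesImpl fixed = new AttributesImpl(atts);
        for (int i = 0; i < fixed.getLength(); i++) {
            String attQName = fixed.getQName(i);
            String attUri = fixed.getURI(i);
            int colon = attQName.indexOf(':');
            boolean hasPrefix = colon > 0;
            boolean missingUri = attUri == null || attUri.isEmpty();
            if (hasPrefix && missingUri) {
                String mapped = knownPrefixes.get(attQName.substring(0, colon));
                if (mapped != null) {
                    // Repair the event: set the URI and the local name.
                    fixed.setURI(i, mapped);
                    fixed.setLocalName(i, attQName.substring(colon + 1));
                }
            }
        }
        // Pass the repaired event on to the next handler in the chain.
        super.startElement(uri, localName, qName, fixed);
    }
}
```

With JDOM 2 the filter would be installed through the setXMLFilter method
linked above, e.g. builder.setXMLFilter(new PrefixFixingFilter(map)), before
calling build(). Prefixes that are missing from the map are left unchanged.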
> 
> On 20/11/2012 12:14 PM, Rolf Lear wrote:
>> 
>> Hmmm not using the default API.
>> 
>> JDOM expects the getURI() method to have a value if there is a prefix
>> for the attribute. This is reasonable... ;)
>> 
>> This indicates the SAX stream is broken. JDOM should be throwing
>> "Namespace URIs must be non-null and non-empty Strings".
>> 
>> If you cannot fix the SAX stream code, you can maybe write a proxy class
>> that fixes the URIs as the events pass through.
>> 
>> Rolf
>> 
>> 
>> Paul Libbrecht <paul at hoplahup.net> wrote:
>> 
>> Hello JDOM experts,
>> 
>> I'm hitting a wall here and I am not sure who is responsible.
>> Just like in the previous series of posts, I am trying to parse an HTML
>> document.
>> In this case I use the CyberNeko HTML parser
>> http://nekohtml.sourceforge.net/ which creates a SAX stream and hence is
>> easily convertible to a JDOM document.
>> 
>> Now, my big issue is that the document I have (which I cannot easily
>> change right now) contains undeclared namespace-prefixed attribute-names!
>> 
>> Do I have a way to predefine the namespace somewhere?
>> 
>> thanks in advance
>> 
>> Paul
>> _______________________________________________
>> To control your jdom-interest membership:
>> http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
>> 
>> 
> 