[jdom-interest] Parsing HTML elements

Rolf Lear jdom at tuis.net
Tue Nov 20 15:08:49 PST 2012


Hi Paul.

In the mail below I suggested using a parsing proxy. The term I meant to 
use is a 'Filter'. See this article here:

http://www.ibm.com/developerworks/xml/library/x-tipsaxfilter/

You can do some magic with 
http://www.jdom.org/docs/apidocs/org/jdom2/input/SAXBuilder.html#setXMLFilter(org.xml.sax.XMLFilter)

For example, your filter could extend 
http://docs.oracle.com/javase/6/docs/api/org/xml/sax/helpers/XMLFilterImpl.html

and then override the method 
http://docs.oracle.com/javase/6/docs/api/org/xml/sax/helpers/XMLFilterImpl.html#startElement(java.lang.String,%20java.lang.String,%20java.lang.String,%20org.xml.sax.Attributes)

to set the attribute URIs correctly, and then call super.startElement(...).
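
As a rough illustration of the filter approach, here is a minimal sketch in Java that uses only the standard org.xml.sax classes. The class name AttributeNamespaceFilter, the fix() helper, and the prefix-to-URI map are assumptions made up for this example; they are not part of JDOM or NekoHTML, and the real prefixes and URIs depend on your document.

```java
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: fills in missing namespace URIs on prefixed attributes. */
class AttributeNamespaceFilter extends XMLFilterImpl {

    // Assumed lookup table from attribute prefix to namespace URI.
    private final Map<String, String> prefixToUri;

    AttributeNamespaceFilter(Map<String, String> prefixToUri) {
        this.prefixToUri = prefixToUri;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Pass the event on with a repaired copy of the attributes.
        super.startElement(uri, localName, qName, fix(atts));
    }

    /** Returns a copy of atts with URIs set for known, undeclared prefixes. */
    Attributes fix(Attributes atts) {
        AttributesImpl fixed = new AttributesImpl(atts);
        for (int i = 0; i < fixed.getLength(); i++) {
            String attQName = fixed.getQName(i);
            if (attQName == null) {
                continue;
            }
            int colon = attQName.indexOf(':');
            String attUri = fixed.getURI(i);
            boolean hasUri = attUri != null && !attUri.isEmpty();
            if (colon <= 0 || hasUri) {
                continue; // unprefixed, or the parser already supplied a URI
            }
            String known = prefixToUri.get(attQName.substring(0, colon));
            if (known != null) {
                fixed.setURI(i, known);
                fixed.setLocalName(i, attQName.substring(colon + 1));
            }
        }
        return fixed;
    }
}
```

An instance could then be handed to SAXBuilder.setXMLFilter(...) as mentioned above, or placed between the HTML parser and whatever ContentHandler builds the JDOM document. Whether setting the URI alone is enough is something to check against the events your parser actually produces.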

Rolf

On 20/11/2012 12:14 PM, Rolf Lear wrote:
>
> Hmmm not using the default API.
>
> JDOM expects the getURI() method to have a value if there is a prefix
> for the attribute. This is reasonable... ;)
>
> This indicates the sax stream is broken. JDOM should be throwing
> "Namespace URIs must be non-null and non-empty Strings".
>
> If you cannot fix the SAX stream code, you can maybe write a proxy class
> that fixes the URIs as the events pass through.
>
> Rolf
>
>
> Paul Libbrecht <paul at hoplahup.net> wrote:
>
> Hello JDOM experts,
>
> I'm hitting a wall here and I am not sure who is responsible.
> Just like the previous series of posts, I am trying to parse an HTML
> document.
> In this case I use the CyberNeko HTML parser
> http://nekohtml.sourceforge.net/ which creates a SAX stream and hence is
> easily convertible to a JDOM document.
>
> Now, my big issue is that the document I have (which I cannot easily
> change right now) contains undeclared namespace-prefixed attribute-names!
>
> Do I have a way to predefine the namespace somewhere?
>
> thanks in advance
>
> Paul
> _______________________________________________
> To control your jdom-interest membership:
> http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
>


