Re: javax.xml.transform and HTML

On Sep 4, 2008, at 14:43, Michael(tm) Smith wrote:

> Julian Reschke <julian.reschke@gmx.de>, 2008-09-03 10:10 +0200:
>
>> Henri Sivonen wrote:
>>> Oops. I didn't realize there was content after the signature.
>>> Is this commonly used? It's a rather unobvious use of a transform  
>>> package.
>>
>> I know it's commonly used for serializing XML (actually, as far as I
>> recall, it's the recommended way to do it when you have to rely on
>> what the JDK includes). Once you know it's there and realize that it
>> includes HTML serialization as well, it's kind of obvious to use it
>> for that as well.
>>
>> That being said, I don't recall whether it was recommended anywhere.
>> And no, I don't know how common it is.
>>
>> Is there a better alternative that doesn't require including
>> additional packages?
>
> That seems like a really good question. Henri, I'd think that
> after as much exploration as you've done around XML processing in
> Java, if there were some better way, you might know about it. Does
> anything come to mind?
>
> Or wait, I now note that qualification of "doesn't require
> including additional packages"... which I guess gets back to what
> Julian had mentioned earlier about developers not being at liberty
> to install additional packages into Java environments on shared
> hosts where they need to do their work.

I don't know of any better way to get a SAX to XML or SAX to HTML  
serializer from the APIs provided by the JDK.

Although I had been aware of the JDK including the Xalan serializer
behind TrAX, I was unaware that it could be used without a transform
until Julian mentioned it. That is, I didn't know that you can use a
Transformer without loading a transform into it. (And still, before I
form an opinion on whether doing so makes sense, I want to step  
through the process in a debugger to find out what exactly happens  
between the SAX events going into the empty Transformer and the  
OutputStream coming out.)
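
For reference, the empty-Transformer approach under discussion can be sketched roughly as follows. This is a minimal illustration, assuming the JDK's default TransformerFactory supports SAXTransformerFactory; the class name IdentitySerializer and the hard-coded sample events are made up for the example and do not come from anyone's actual code.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper: feeds a few SAX events into an identity
// TransformerHandler (no stylesheet loaded) and returns the markup.
final class IdentitySerializer {

    // method is an XSLT output method name such as "xml" or "html".
    static String serialize(String method) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("No SAX support in this TrAX implementation");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;

        // No Templates or Source argument: the handler performs an identity transform.
        TransformerHandler handler = stf.newTransformerHandler();
        Transformer identity = handler.getTransformer();
        identity.setOutputProperty(OutputKeys.METHOD, method);
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        identity.setOutputProperty(OutputKeys.INDENT, "no");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        // Sample SAX events; a real caller would pass the handler to a parser
        // or another SAX event source as its ContentHandler.
        char[] text = "a < b".toCharArray();
        handler.startDocument();
        handler.startElement("", "p", "p", new AttributesImpl());
        handler.characters(text, 0, text.length);
        handler.endElement("", "p", "p");
        handler.endDocument();

        return out.toString();
    }
}
```

Calling IdentitySerializer.serialize("html") selects the HTML output method and serialize("xml") the XML one. What happens between the events and the output is implementation-specific, which is exactly the part worth stepping through in a debugger.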

So far, I have used three ways to serialize SAX to XML in Java.

First, I used the serializer from GNU JAXP. Using it became
increasingly difficult as GNU JAXP started to depend on GCJ stuff and
stopped being fully functional on a pure JRE.

Then I started using the Xalan serializer as shipped by the Apache  
Software Foundation (i.e. not depending on the Sun-private copy inside  
the JDK). I got increasingly annoyed by the way it handled Namespaces,
its failure to sanitize non-XML characters, the verbosity of
instantiating it and the slowness of reaction to
https://issues.apache.org/jira/browse/XALANJ-2419

Now I am using a SAX to XML serializer that I wrote myself. It has no  
configurability, has no factories or providers, sanitizes non-XML  
characters in content, obeys my sense of Namespace aesthetics and is  
contained in one .java file.
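
As an illustration of what sanitizing non-XML characters in content can mean, here is a minimal sketch. It is not the serializer described above: the class name XmlCharSanitizer and the choice of U+FFFD as the replacement are assumptions made for the example. The character ranges are those of the Char production in XML 1.0.

```java
// Hypothetical helper: replaces code points that XML 1.0 does not allow
// in character data with U+FFFD REPLACEMENT CHARACTER.
final class XmlCharSanitizer {

    // Char production from XML 1.0:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String sanitize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            // codePointAt returns a lone surrogate as-is, so unpaired
            // surrogates fall outside the allowed ranges and get replaced.
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            sb.appendCodePoint(isXmlChar(cp) ? cp : 0xFFFD);
        }
        return sb.toString();
    }
}
```

A serializer's characters() callback could run its text through such a check before escaping markup-significant characters; whether to replace, drop or reject the offending characters is a design choice.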

For serializing SAX to HTML, for a long time, I used a serializer that  
a friend and I pair programmed as part of a university project. Now  
I'm using a serializer that I wrote from scratch by extrapolating from  
the DOM to Unicode algorithm that the HTML5 spec gives.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 4 September 2008 15:30:42 UTC