RE: xml-stylesheet issues (incl Arbortext behavior) from Grosso, Paul on 2009-02-26 (public-xml-core-wg@w3.org from February 2009)

From: Grosso, Paul <pgrosso@ptc.com>
Date: Thu, 26 Feb 2009 15:55:31 -0500
To: <public-xml-core-wg@w3.org>
Message-ID: <CF83BAA719FD2C439D25CBB1C9D1D3020E9E21AE@HQ-MAIL4.ptcnet.ptc.com>
Arbortext Editor has a UI to allow a user to associate
different stylesheets to their document for different
outputs (e.g., edit view, print/pdf, single html file,
chunked web, htmlhelp).  The UI causes an xml-stylesheet
PI to be written for each output for which there is an
association.  The generated PIs set the href, type,
media, and alternate pseudo-attributes.  

So to test the various issues, I hand-edited the PI
then brought up the document in Arbortext Editor,
and looked to see which associations were actually made.

The editor actually uses the href and media pseudo-attributes,
but it ignores (as far as processing goes) all other
pseudo-attributes, including title, charset, type, and
alternate.

If there is more than one PI with the same value for
the media attribute, the first such one in document order 
(where the internal subset acts as though it comes *after* 
the preceding prolog) wins and the rest are ignored.
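
For illustration only (this is a Python sketch, not Arbortext
code), the "first PI per media value wins" rule could be written
as below.  The function name, file names, and media values are
made up for the example, and the sketch does not model the
internal-subset ordering.

```python
def pick_associations(pis):
    """Keep the first (href, media) pair seen for each media value."""
    chosen = {}
    for href, media in pis:
        # A later PI with an already-seen media value is ignored.
        chosen.setdefault(media, href)
    return chosen

# Hypothetical PIs, listed in document order.
pis = [
    ("edit.style", "editor"),
    ("print.style", "print"),
    ("other.style", "editor"),  # same media as the first PI, so ignored
]
associations = pick_associations(pis)
print(associations)
```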

> -----Original Message-----
> From: public-xml-core-wg-request@w3.org 
> [mailto:public-xml-core-wg-request@w3.org] On Behalf Of Simon Pieters
> Sent: Tuesday, 2009 February 17 10:38
> To: public-xml-core-wg@w3.org
> Subject: xml-stylesheet issues
> 
> As promised here are some issues with the existing 
> xml-stylesheet spec (probably not exhaustive):
> 
> 
> * What happens when the PI is XML 1.0-well-formed but doesn't 
> follow the xml-stylesheet syntax?

The entire PI is ignored.  (Warning messages are sometimes given.)

> 
> * What happens when there are unknown pseudo-attributes?

They are ignored, but the rest of the PI is processed and
is effective as long as there are values for the href and 
media pseudo-attributes.

> 
> * What happens when there are unknown values?

If there is an unknown value for media, the PI is ignored.

The value of the href attribute is used to retrieve the
referenced resource, and either the retrieval is successful
or not.  If not, a warning message is given and some default
stylesheet association processing occurs.

The values of all other attributes are ignored, so unknown
values there don't matter:  the association is still made.

> 
> * What happens when there are duplicate pseudo-attributes? 
> (This seems to actually be allowed in the syntax.)

If there are duplicate occurrences of href or media, the last one
(parsing the PI left to right) wins and the other occurrences
are ignored.

All other pseudo-attributes are ignored anyway, and duplications
don't make any difference.
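
A rough Python sketch of the pseudo-attribute handling described
in the last few answers (unknown names ignored, last duplicate
wins, both href and media required) might look like this.  The
regular expression and function name are my own simplification
for illustration, not Arbortext's parser, and no character or
entity references are expanded.

```python
import re

# Matches one name="value" or name='value' pseudo-attribute.
PSEUDO_ATTR = re.compile(r"""([A-Za-z_][\w.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

def effective_association(pi_data):
    """Return (href, media) for the PI data, or None if either is missing."""
    attrs = {}
    for match in PSEUDO_ATTR.finditer(pi_data):
        name, double_quoted, single_quoted = match.groups()
        # Scanning left to right and overwriting means the last duplicate wins.
        attrs[name] = double_quoted if double_quoted is not None else single_quoted
    # Only href and media are consulted; every other name is simply unused.
    if "href" in attrs and "media" in attrs:
        return attrs["href"], attrs["media"]
    return None

result = effective_association(
    'href="a.style" media="print" foo="bar" href="b.style"'
)
print(result)
```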

> 
> * What happens when a CharRef hits the [WFC: Legal Character] 
> constraint in XML 1.0? (Unclear to me whether this is allowed 
> in the syntax.)

Arbortext's stylesheet PI processor doesn't seem to recognize 
charRefs.  If strings such as &#33; are used within the value
of the href or media attributes, they are left as strings of
five (or however many) characters.
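
To make the difference concrete, here is a small Python
comparison of the literal reading described above against a
processor that expands character references.  The href value is
hypothetical, and html.unescape follows HTML rules rather than
XML ones, though the two agree for a simple reference like &#33;.

```python
import html

# Hypothetical href value containing a character reference.
raw_value = "style&#33;.css"

# Behavior described above: the reference stays as five literal characters.
literal = raw_value

# What a processor that expands character references would see instead.
expanded = html.unescape(raw_value)

print(literal)
print(expanded)
```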

> 
> * When is the processing of the PI invoked?
>   - What happens if you change the PI's 'data'?
>   - What happens if you change the PI's 'target'?
>   - What happens if you remove the PI from the DOM?
>   - What happens if you add the PI to the DOM (with scripting)?
>   - What happens if you insert the PI somewhere other than in 
> the prologue?
>   - What happens if the PI is a child of Document but after 
> the root element and you then move the root element so that 
> the PI becomes part of the prologue?

When the editor first reads a document, it is turned into
an internal data structure.  The PIs are processed when the
document is first brought up in the editor and the various
associations are remembered as part of the edit session.  
Henceforth, the serialization of the document is ignored,
so changing the PI in the file on disk is irrelevant.

The editor does not show the prolog as part of the document
in the standard UI, so there is no way to "change the PI".

One can change the in-memory stylesheet associations via
various commands/interfaces while in the editor, but this
does not affect the PI (until the file is reserialized and
written out at which time the PIs that get written reflect
the latest in-memory values).

>   
> * Is it conforming for a document to have an xml-stylesheet 
> PI anywhere other than in the prologue? Is it used or ignored?

Arbortext recognizes/processes an xml-stylesheet PI anywhere
preceding the document start tag (i.e., anywhere in the prolog)
including in the internal subset with one oddity (which is
probably a bug):  if there are any stylesheet PIs in the
internal subset, any stylesheet PIs following the internal
subset and preceding the document start tag are ignored.

An xml-stylesheet PI anywhere after the document start tag
is treated just like any other regular PI in the document.

Stylesheet PIs in the external subset are ignored.
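
As a rough approximation of the "prolog only" part of this
behavior, the following Python sketch uses the standard
xml.dom.minidom module to collect xml-stylesheet PIs that come
before the document element.  It is not how Arbortext works
internally; the function name and sample document are made up,
and PIs in the internal subset (and the oddity noted above) are
not modeled.

```python
from xml.dom import minidom

def prolog_stylesheet_pis(xml_text):
    """Return the data of xml-stylesheet PIs that precede the root element."""
    doc = minidom.parseString(xml_text)
    found = []
    for node in doc.childNodes:
        if node is doc.documentElement:
            break  # everything from the start tag onward is not prolog
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            found.append(node.data)
    return found

sample = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="a.style" media="print"?>'
    '<doc><?xml-stylesheet href="b.style" media="print"?></doc>'
    '<?xml-stylesheet href="c.style" media="print"?>'
)
prolog_pis = prolog_stylesheet_pis(sample)
print(prolog_pis)
```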

When Arbortext reserializes a document to write it out, any
stylesheet PIs that were recognized as such (regardless of
where they were originally) are written out following the
XML declaration and preceding the doctype declaration if any.
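
The output ordering on reserialization can be pictured with a
tiny Python helper.  The helper and its arguments are purely
illustrative (they are not an Arbortext interface); it only shows
where the recognized stylesheet PIs end up relative to the XML
declaration and the doctype declaration.

```python
def serialize_prolog(xml_decl, stylesheet_pis, doctype=None):
    """Place stylesheet PIs after the XML declaration, before any doctype."""
    lines = [xml_decl]
    lines.extend(stylesheet_pis)
    if doctype is not None:
        lines.append(doctype)
    return "\n".join(lines)

# Hypothetical prolog pieces.
prolog = serialize_prolog(
    '<?xml version="1.0"?>',
    ['<?xml-stylesheet href="print.style" media="print"?>'],
    '<!DOCTYPE doc SYSTEM "doc.dtd">',
)
print(prolog)
```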

> 
> * Browsers support type="text/xsl" but text/xsl is not a 
> registered media type and is not an XML media type per RFC 3023.

I don't think this is an issue for the xml-stylesheet PI spec.

> 
> * If charset is specified and the PI points to an XSLT 
> transformation, should the charset='' information be used?

The charset attribute is ignored by Arbortext (and I don't 
have an opinion for what's the right answer here).

> 
> * media='' references HTML4 which is outdated; browsers use 
> the Media Queries spec here.

I'm not sure there is a problem here.  HTML4 says:

 The following is a list of recognized media descriptors 
 . . .
 Future versions of HTML may introduce new values and may allow
 parameterized values. To facilitate the introduction of these
 extensions, conforming user agents must be able to parse the
 media attribute value as follows....

That leaves a lot of leeway.  On the other hand, we could consider 
updating some of the references in the xml-stylesheet spec. 

> 
> * CSSOM integration: 
> http://dev.w3.org/cvsweb/~checkout~/csswg/cssom/Overview.html?
> content-type=text/html;%20charset=utf-8#the-linkstyle defines 
> the LinkStyle interface that HTML <link> and 
> <?xml-stylesheet?> implement -- we should coordinate with Anne here.

I don't quite understand this, but I'm pretty sure this is
outside the scope of the xml-stylesheet PI spec. 

> 
> * CSS issues: it's unclear whether referencing an element 
> should work if type="text/css" -- the type of the document 
> would be an XML type which is not a CSS type, and browsers 
> largely don't support this anyway.

I'm not sure I understand the issue here, and in any case I'm
not sure the xml-stylesheet spec needs to say anything about this.

It's not up to this spec to say what kinds of stylesheets can
profitably be associated with the xml document.  For example,
Arbortext has its own stylesheet type that we associate with
the XML document, and it doesn't matter that browsers wouldn't
know what to do with it.  Perhaps some application could figure
out how to use CSS to style an XML document.  It's not for
this spec to say.

paul
Received on Thursday, 26 February 2009 20:58:24 UTC