RE: xml-stylesheet issues--suggested resolutions from Grosso, Paul on 2009-06-11 (public-xml-core-wg@w3.org from June 2009)

From: Grosso, Paul <pgrosso@ptc.com>
Date: Thu, 11 Jun 2009 10:50:12 -0400
To: <public-xml-core-wg@w3.org>
Message-ID: <CF83BAA719FD2C439D25CBB1C9D1D3020FEA4924@HQ-MAIL4.ptcnet.ptc.com>
I see there are some things I missed, so it's good that we're 
having some more discussion.

Comments below.

> -----Original Message-----
> From: Simon Pieters [mailto:simonp@opera.com] 
> Sent: Thursday, 2009 June 11 3:27
> To: Grosso, Paul; public-xml-core-wg@w3.org
> Subject: Re: xml-stylesheet issues--suggested resolutions
> 
> On Wed, 10 Jun 2009 16:28:57 +0200, Grosso, Paul 
> <pgrosso@ptc.com> wrote:
> 
> >> > * What happens when the PI is XML 1.0-well-formed but doesn't
> >> > follow the xml-stylesheet syntax?
> >>
> >> > * What happens when there are duplicate pseudo-attributes? (This
> >> > seems to actually be allowed in the syntax.)
> >
> > I suggest:
> >
> >  This is an error; the xml-stylesheet processor MAY ignore the
> >  entire PI; if it tries to recover, it SHOULD ignore all but the
> >  last assignment to a given pseudo-attribute.
> >
> > This is what Arbortext currently does, and if we change the spec
> > to say "MUST ignore the entire PI", and we change our code to be
> > compliant, some user documents will suddenly stop working.  If we
> > don't change our code, then we would be non-compliant which looks
> > bad both for Arbortext and the AssocSS spec (because it generally
> > looks bad for a spec when implementors ignore it).
> >
> > In fact, from an XML Core point of view, I'm less worried about
> > what Arbortext does than what the "major browser vendors" do.
> > If we start changing the AssocSS spec to make current behavior
> > completely non-compliant, I'm quite sure there will be cases
> > (such as this one in Arbortext's case) where they will decide
> > they just can't invalidate existing documents, so they will
> > ignore the spec.  I'd rather not be in the position of setting
> > ourselves up to be ignored.
> 
> I doubt there is enough legacy content with invalid xml-stylesheet
> PIs to make browser vendors ignore the spec. I say this because
> there are surprisingly few bugs reported on Opera for our draconian
> handling of invalid xml-stylesheet PIs.
> 
> There's one bug that cites this test case:
> 
>     http://home.arcor.de/martin.honnen/operaBugs/op9/XML/ampersandInPI2.xml
> 
> The bug says that Opera is wrong in aborting parsing. I think 
> Firefox, Safari and IE ignore the PI here.

This is a reasonable discussion to have, and I'd like to
hear what others think.

My basic concern remains--if we are too strict, we risk
being ignored.  If we can convince ourselves--or get
assurances from implementors--that we won't get ignored,
then we can perhaps be stricter.

> 
> We have a much bigger problem with draconian error handling in XML
> proper in general than with xml-stylesheet. So from our perspective,
> defining error recovery for XML 1.0 and Namespaces in XML 1.0 is a
> higher priority.
> 
> 
> > In this duplicate pseudo-attribute case, I could live with
> > tightening my above suggestion to "...processor SHOULD ignore..."
> > because at least that way Arbortext could say "yes, we should,
> > but due to legacy issues, we decided instead to recover" and
> > still not be non-compliant with the spec.
> 
> Is the legacy situation for Arbortext so bad that people rely 
> on its error recovery behavior?

Probably not.  I was mostly using this as an example.
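
For concreteness, the two behaviors under discussion for duplicate
pseudo-attributes (recover by keeping the last assignment, or ignore
the entire PI) can be sketched as below. This is only an illustration:
the function name, the on_duplicate parameter, and the simplified name
pattern are assumptions made for this sketch, not anything defined by
the AssocSS spec or taken from Arbortext.

```python
import re

# Simplified pseudo-attribute pattern: whitespace, name, '=', quoted value.
# The name pattern is an approximation of the XML Name production.
PAIR = re.compile(r"""\s+([A-Za-z_][\w.\-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

def parse_pseudo_attrs(data, on_duplicate="last"):
    """Parse the data part of an xml-stylesheet PI into a dict.

    on_duplicate="last"   -> recover, keeping the last assignment
    on_duplicate="ignore" -> return None, i.e. ignore the entire PI
    Returns None as well when the data does not match the syntax.
    """
    attrs = {}
    # Whitespace separates the PI target from its data, so prepend a
    # space to let every pair be matched with leading whitespace.
    text = " " + data
    pos = 0
    while True:
        m = PAIR.match(text, pos)
        if m is None:
            break
        name = m.group(1)
        value = m.group(2) if m.group(2) is not None else m.group(3)
        if name in attrs and on_duplicate == "ignore":
            return None
        attrs[name] = value  # a later assignment overwrites an earlier one
        pos = m.end()
    if text[pos:].strip():
        return None  # leftover text that is not a pseudo-attribute
    return attrs
```

With on_duplicate="last" a PI with two href assignments still yields a
usable result; with on_duplicate="ignore" the same PI is dropped.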

> 
> >> > * What happens when a CharRef hits the [WFC: Legal Character]
> >> > constraint in XML 1.0? (Unclear to me whether this is allowed in
> >> > the syntax.)
> >>
> >> Syntax error: must ignore the entire PI. We should tighten up the
> >> syntax so that duplicate pseudo-attributes and NCRs that are syntax
> >> errors in XML 1.0 are also syntax errors in xml-stylesheet.
> >>
> >>
> >
> > As I said before:
> >
> > As far as I can tell, the XML Rec says:
> >
> > [16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
> >
> > and Char is "any unicode character..." so I don't see how there
> > could be a CharRef in a PI.
> 
> Not on the XML 1.0 layer, but on the xml-stylesheet layer.
> 
> [3] PseudoAttValue ::= ('"' ([^"<&] | CharRef | PredefEntityRef)* '"'
>                        | "'" ([^'<&] | CharRef | PredefEntityRef)* "'")
>                        - (Char* '?>' Char*)
>
> http://www.w3.org/TR/xml-stylesheet/#NT-PseudoAttValue
> 

Interesting--I missed that.

So the PI is well-formed at the XML 1.0 level, but when it is
parsed as an xml-stylesheet PI, the value of the pseudo-attribute
turns out to contain a CharRef to an illegal character.

I'd probably say we should treat that as an invalid value
for the pseudo-attribute--however we decide to handle that
(more below on that).
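
As an illustration of that second layer, here is a minimal sketch of
expanding a PseudoAttValue (without its surrounding quotes) and
rejecting a CharRef that names a character outside the XML 1.0 Char
production. The function names and the None-for-invalid convention are
assumptions made for this sketch; what a processor does with an invalid
value is exactly the open question.

```python
import re

PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}
REF = re.compile(r"&(?:#([0-9]+)|#x([0-9A-Fa-f]+)|([a-z]+));")

def is_xml_char(cp):
    """True if the code point matches the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def expand_value(raw):
    """Expand CharRefs and predefined entity refs in a pseudo-attribute value.

    Returns the expanded string, or None when the value is invalid
    (bare '&' or '<', unknown entity, or CharRef to an illegal character).
    """
    out = []
    pos = 0
    while pos < len(raw):
        ch = raw[pos]
        if ch == "<":
            return None
        if ch != "&":
            out.append(ch)
            pos += 1
            continue
        m = REF.match(raw, pos)
        if m is None:
            return None  # '&' that does not start a well-formed reference
        dec, hexa, name = m.groups()
        if name is not None:
            if name not in PREDEFINED:
                return None
            out.append(PREDEFINED[name])
        else:
            cp = int(dec) if dec is not None else int(hexa, 16)
            if not is_xml_char(cp):
                return None  # the [WFC: Legal Character] case
            out.append(chr(cp))
        pos = m.end()
    return "".join(out)
```

Returning None keeps "this value is invalid" separate from "ignore the
whole PI", so the caller can apply whichever policy is agreed on.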

> 
> >> > * What happens when there are unknown values?
> >>
> >
> > In general, I don't see why we have to say anything about the
> > values of attributes (except for 'alternate').  The original
> > idea behind the Assoc SS spec was to define how to map the
> > xml-stylesheet PI into the equivalent HTML 4.0 constructs, and
> > then let the semantics be driven by HTML 4.0, and I see no
> > reason to change that.

Another concern I have is that, regardless of the details of
what we say about invalid values, I don't want to require any
implementation to verify values for attributes it's going to
ignore.  So an invalid/unknown value for a pseudoattribute
should never require that the entire PI be ignored, because
a given implementation might not even be looking at the value
of that pseudoattribute.
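
A sketch of what that could look like in an implementation: only the
pseudo-attributes the application actually reads get checked, and
everything else is passed over without inspection. The function name
and the example predicates are hypothetical and chosen only for this
illustration.

```python
def consult(attrs, checks):
    """Validate only the pseudo-attributes this application reads.

    attrs  : raw pseudo-attribute values, e.g. {"href": "...", "media": "..."}
    checks : maps each name the application reads to a predicate on the value
    Returns (values, problems). Names absent from `checks` are never
    inspected, so an odd value there cannot affect the outcome.
    """
    values = {}
    problems = []
    for name, is_ok in checks.items():
        if name not in attrs:
            continue
        if is_ok(attrs[name]):
            values[name] = attrs[name]
        else:
            problems.append(name)
    return values, problems


# Example: an application that reads only href and type.
attrs = {"href": "a.css", "type": "text/css", "media": "((("}
checks = {"href": lambda v: bool(v), "type": lambda v: "/" in v}
values, problems = consult(attrs, checks)
```

Here the malformed media value is never looked at, so the PI remains
usable for an application that does not care about media.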

> 
> The reason is that the HTML4 spec does a poor job at specifying the  
> semantics and requirements.
> 
> Maybe we could cite HTML5 instead, though?
> 

We could, or we could live with HTML4's poor semantics.
Most of the world does.

> 
> >> Unexpected value for 'type': must either abort processing the PI or
> >> continue as if type was absent.
> >
> > The Assoc SS spec currently requires the type attribute (so
> > continuing as if it were absent is equivalent to aborting, since
> > it isn't allowed to be absent).
> 
> http://www.w3.org/1999/06/REC-xml-stylesheet-19990629/errata
> 

Oops, I missed that.  So it looks like nine years ago we'd already
come to the conclusion that I came to once again.  (Clearly, my
memory doesn't extend that far back.)

> 
> >> We should probably say that 'text/xsl' is to be treated the same
> >> as 'text/xml' for the purposes of 'type' (for compat with
> >> existing content).
> >
> > I don't see why we have to say anything about the values of
> > the type attribute.
> 
> I guess it could be argued that this is something for the 
> XSLT spec to worry about.
> 
> 
> >> Invalid IRI in 'href': interpret the value using the rules in "Web
> >> Addresses" (currently called "URLs" and specified here:
> >> http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#parsing-urls
> >> ). If that returns an error: must ignore the entire PI.
> >
> > Likewise per my previous paragraph.  We don't have to say anything
> > about how to handle the value of the href attribute.
> >
> >>
> >> 'media': refer to the Media Queries spec. If it's an invalid
> >> media query, must ignore the entire PI.
> >
> > Again, I don't think we should say anything about how to handle
> > the value of the media attribute.
> 
> If we refer to the HTML4 spec here, then we're not being 
> helpful. If we refer to the HTML5 spec, then it's fine.

We could refer to HTML5, or we could decide not to "be helpful".
I don't say that facetiously.  We could decide that the AssocSS
spec merely defines how to map the xml-stylesheet PI into values
for certain things like href, type, media, etc., and say that
the interpretation of those values is left to other specs.

On the other hand, I have no objections if we decide we want
to specify which spec to reference for the interpretation of
the various values.  I need to hear from the rest of the WG
about this.

> 
> 
> >> >  * Is it conforming for a document to have an xml-stylesheet PI
> >> > anywhere other than in the prolog? Is it used or ignored?
> >>
> >> Misplaced xml-stylesheet PI: must ignore the entire PI.
> >> Documents must not use misplaced xml-stylesheet PIs.
> >
> > I agree, but there is more to this.
> >
> > The Assoc SS spec says:
> >
> >  The xml-stylesheet processing instruction is allowed only in
> >  the prolog of an XML document.
> >
> > So an xml-stylesheet processor should ignore any PI not in the
> > prolog.
> 
> That doesn't follow. The spec needs to require that separately.
> 

I would read the spec as saying that any PI that looks like
an xml-stylesheet PI but can't be (because it isn't in the
prolog), isn't an xml-stylesheet PI, so it gets ignored by
the xml-stylesheet PI processor.

But I don't mind saying that explicitly.
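
To make the "only in the prolog" reading concrete, here is a minimal
sketch using Python's xml.parsers.expat: it collects xml-stylesheet PIs
seen before the root element starts and skips any that appear later.
The function name is hypothetical, and the sketch does not address PIs
coming from external entities.

```python
import xml.parsers.expat

def prolog_stylesheet_pis(xml_text):
    """Return the data of each xml-stylesheet PI found in the prolog.

    PIs with the same target inside or after the root element are
    skipped, i.e. treated as not being xml-stylesheet PIs at all.
    """
    found = []
    seen_root = False

    def on_start(name, attrs):
        nonlocal seen_root
        seen_root = True  # the prolog ends where the root element begins

    def on_pi(target, data):
        if target == "xml-stylesheet" and not seen_root:
            found.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.ProcessingInstructionHandler = on_pi
    parser.Parse(xml_text, True)
    return found


doc = ('<?xml-stylesheet href="a.css" type="text/css"?>'
       '<doc><?xml-stylesheet href="b.css" type="text/css"?></doc>'
       '<?xml-stylesheet href="c.css" type="text/css"?>')
pis = prolog_stylesheet_pis(doc)
```

Only the first PI is collected; the one inside the root element and the
one after it are both passed over.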

> 
> > I would add that an xml-stylesheet processor should ignore any PI
> > that is not physically in the document entity.
> 
> Yes.
> 
> 
> > I would also add that, in the case of multiple xml-stylesheet PIs
> > for the same media, the xml-stylesheet processor should ignore all
> > but the last in document order.
> 
> Why? Browsers support multiple for CSS. XSLT requires multiple to be
> supported, too, although browsers generally use either the first or
> last for XSLT.

I guess I'm not sure what it means to have multiple xml-stylesheet PIs
with the same media value, but if it does make sense, I'm okay with that
at the xml-stylesheet processor level, as long as we don't require
an editor/browser/composition application to necessarily know how to
handle multiple ones.
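
One way to picture that split: the xml-stylesheet processor hands over
all PIs in document order, grouped by media, and each application then
applies its own policy. The sketch below is only an illustration; the
function names, the policy names, and the use of literal string
equality for media values are all assumptions, and real media matching
would follow whatever spec ends up being referenced.

```python
def group_by_media(pis):
    """Group parsed xml-stylesheet PIs by literal media value.

    pis is a list of pseudo-attribute dicts in document order; a
    missing media pseudo-attribute is grouped under the empty string.
    """
    groups = {}
    for attrs in pis:
        groups.setdefault(attrs.get("media", ""), []).append(attrs)
    return groups

def choose(group, policy):
    """Apply an application-level policy to PIs sharing a media value."""
    if policy == "all":
        return list(group)   # e.g. a browser cascading several CSS sheets
    if policy == "first":
        return group[:1]
    if policy == "last":
        return group[-1:]    # e.g. an application that handles only one
    raise ValueError("unknown policy: " + policy)


pis = [
    {"href": "a.css", "media": "screen"},
    {"href": "b.css", "media": "screen"},
    {"href": "p.css", "media": "print"},
]
groups = group_by_media(pis)
```

An application that understands multiple style sheets would use "all",
while one that does not can pick "first" or "last" without the
processor itself having to decide.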

paul
Received on Thursday, 11 June 2009 14:55:13 UTC