Re: xml-stylesheet issues--suggested resolutions

On Wed, 10 Jun 2009 16:28:57 +0200, Grosso, Paul <pgrosso@ptc.com> wrote:

>> > * What happens when the PI is XML 1.0-well-formed but doesn't
>> > follow the xml-stylesheet syntax?
>>
>> > * What happens when there are duplicate pseudo-attributes? (This
>> > seems to actually be allowed in the syntax.)
>
> I suggest:
>
>  This is an error; the xml-stylesheet processor MAY ignore the
>  entire PI; if it tries to recover, it SHOULD ignore all but the
>  last assignment to a given pseudo-attribute.
>
> This is what Arbortext currently does, and if we change the spec
> to say "MUST ignore the entire PI", and we change our code to be
> compliant, some user documents will suddenly stop working.  If we
> don't change our code, then we would be non-compliant which looks
> bad both for Arbortext and the AssocSS spec (because it generally
> looks bad for a spec when implementors ignore it).
>
> In fact, from an XML Core point of view, I'm less worried about
> what Arbortext does than what the "major browser vendors" do.
> If we start changing the AssocSS spec to make current behavior
> completely non-compliant, I'm quite sure there will be cases
> (such as this one in Arbortext's case) where they will decide
> they just can't invalidate existing documents, so they will
> ignore the spec.  I'd rather not be in the position of setting
> ourselves up to be ignored.

I doubt there is enough legacy content with invalid xml-stylesheet PIs to
make browser vendors ignore the spec. I say this because surprisingly few
bugs have been reported against Opera for its draconian handling of
invalid xml-stylesheet PIs.

There's one bug that cites this test case:

    http://home.arcor.de/martin.honnen/operaBugs/op9/XML/ampersandInPI2.xml

The bug says that Opera is wrong in aborting parsing. I think Firefox,  
Safari and IE ignore the PI here.

For us, draconian error handling in XML proper is a much bigger problem
than draconian error handling in xml-stylesheet. So from our perspective,
defining error recovery for XML 1.0 and Namespaces in XML 1.0 is a higher
priority.


> In this duplicate pseudo-attribute case, I could live with
> tightening my above suggestion to "...processor SHOULD ignore..."
> because at least that way Arbortext could say "yes, we should,
> but due to legacy issues, we decided instead to recover" and
> still not be non-compliant with the spec.

Is the legacy situation for Arbortext so bad that people rely on its error
recovery behavior?
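
To make the two options for duplicate pseudo-attributes concrete, here is a
rough Python sketch. The function name parse_pseudo_attrs and the
on_duplicate switch are made up for illustration and come from neither the
spec nor any implementation. The name pattern is a simplification of XML
Name, whitespace between pseudo-attributes is not enforced, and values are
not decoded.

```python
import re

# Simplified pseudo-attribute pattern: name, '=', quoted value.
# Assumption: this approximates XML Name and does not decode references.
_PSEUDO_ATTR = re.compile(r"""\s*([A-Za-z_][\w.-]*)\s*=\s*("[^"<]*"|'[^'<]*')""")


def parse_pseudo_attrs(data, on_duplicate="ignore-pi"):
    """Return a dict of pseudo-attributes, or None if the PI is to be ignored.

    on_duplicate (hypothetical switch):
      "ignore-pi"  a duplicate name makes the whole PI ignored
      "last-wins"  recover by keeping only the last assignment
    """
    attrs = {}
    pos = 0
    while pos < len(data):
        m = _PSEUDO_ATTR.match(data, pos)
        if m is None:
            # Trailing whitespace is fine; anything else is a syntax error.
            return attrs if data[pos:].strip() == "" else None
        name = m.group(1)
        value = m.group(2)[1:-1]  # strip the surrounding quotes
        if name in attrs and on_duplicate == "ignore-pi":
            return None
        attrs[name] = value  # later assignments overwrite earlier ones
        pos = m.end()
    return attrs
```

With data such as href="a.css" type="text/css" href="b.css", the strict mode
returns None (ignore the PI), while "last-wins" yields href="b.css", which is
the recovery described in the quoted suggestion.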


>> > * What happens when a CharRef hits the [WFC: Legal Character]
>> > constraint in XML 1.0? (Unclear to me whether this is allowed in the
>> > syntax.)
>>
>> Syntax error: must ignore the entire PI. We should tighten up the
>> syntax so that duplicate pseudo-attributes and NCRs that are syntax
>> errors in XML 1.0 are also syntax errors in xml-stylesheet.
>>
>
> As I said before:
>
> As far as I can tell, the XML Rec says:
>
> [16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
>
> and Char is "any unicode character..." so I don't see how there
> could be a CharRef in a PI.

Not on the XML 1.0 layer, but on the xml-stylesheet layer.

[3]    PseudoAttValue    ::=    ('"' ([^"<&] | CharRef | PredefEntityRef)* '"'
                              | "'" ([^'<&] | CharRef | PredefEntityRef)* "'")
                              - (Char* '?>' Char*)
 
http://www.w3.org/TR/xml-stylesheet/#NT-PseudoAttValue
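
As a sketch of what "a CharRef that hits the Legal Character constraint"
means at the xml-stylesheet layer, the following Python decodes the text
between the quotes of a PseudoAttValue and reports a syntax error (None) for
a bare '&', an unknown entity, or a character reference outside the XML 1.0
Char production. The function names are illustrative assumptions; the Char
ranges are the ones from XML 1.0.

```python
import re

_PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}
# Decimal CharRef, hex CharRef, or one of the five predefined entities.
_REF = re.compile(r"&(?:#([0-9]+)|#x([0-9a-fA-F]+)|(amp|lt|gt|quot|apos));")


def is_xml_char(cp):
    """True if code point cp matches the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def decode_pseudo_att_value(raw):
    """Decode the text between the quotes; return None on a syntax error."""
    out = []
    pos = 0
    while pos < len(raw):
        ch = raw[pos]
        if ch == "<":
            return None  # '<' is not allowed in a PseudoAttValue
        if ch != "&":
            out.append(ch)
            pos += 1
            continue
        m = _REF.match(raw, pos)
        if m is None:
            return None  # bare '&' or an entity that is not predefined
        dec, hexa, named = m.groups()
        if named:
            out.append(_PREDEFINED[named])
        else:
            try:
                cp = int(dec, 10) if dec else int(hexa, 16)
            except ValueError:
                return None  # absurdly long digit strings can be rejected by int()
            if not is_xml_char(cp):
                return None  # e.g. &#0; fails the Legal Character constraint
            out.append(chr(cp))
        pos = m.end()
    return "".join(out)
```

Under the "must ignore the entire PI" proposal, a None result here would
cause the whole PI to be dropped.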


>> > * What happens when there are unknown values?
>>
>
> In general, I don't see why we have to say anything about the
> values of attributes (except for 'alternate').  The original
> idea behind the Assoc SS spec was to define how to map the
> xml-stylesheet PI into the equivalent HTML 4.0 constructs, and
> then let the semantics be driven by HTML 4.0, and I see no
> reason to change that.

The reason is that the HTML4 spec does a poor job at specifying the  
semantics and requirements.

Maybe we could cite HTML5 instead, though?


>> Unexpected value for 'type': must either abort processing the PI or
>> continue as if type was absent.
>
> The Assoc SS spec currently requires the type attribute (so continuing
> as if it were absent is equivalent to aborting, since it isn't allowed
> to be absent).

http://www.w3.org/1999/06/REC-xml-stylesheet-19990629/errata


>> We should probably say that 'text/xsl' is
>> to be treated the same as 'text/xml' for the purposes of 'type' (for
>> compat with existing content).
>
> I don't see why we have to say anything about the values of
> the type attribute.

I guess it could be argued that this is something for the XSLT spec to  
worry about.


>> Invalid IRI in 'href': interpret the value using the rules in "Web
>> Addresses" (currently called "URLs" and specified here:
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#parsing-urls
>> ). If that returns an error: must ignore the entire PI.
>
> Likewise per my previous paragraph.  We don't have to say anything
> about how to handle the value of the href attribute.
>
>>
>> 'media': refer to the Media Queries spec. If it's an invalid
>> media query, must ignore the entire PI.
>
> Again, I don't think we should say anything about how to handle
> the value of the media attribute.

If we refer to the HTML4 spec here, then we're not being helpful. If we  
refer to the HTML5 spec, then it's fine.


>> >  * Is it conforming for a document to have an xml-stylesheet PI
>> > anywhere other than in the prologue? Is it used or ignored?
>>
>> Misplaced xml-stylesheet PI: must ignore the entire PI.
>> Documents must not use misplaced xml-stylesheet PIs.
>
> I agree, but there is more to this.
>
> The Assoc SS spec says:
>
>  The xml-stylesheet processing instruction is allowed only in
>  the prolog of an XML document.
>
> So an xml-stylesheet processor should ignore any PI not in the prolog.

That doesn't follow: the quoted sentence constrains documents, not
processors. The spec needs to state the processor requirement separately.


> I would add that an xml-stylesheet processor should ignore any PI
> that is not physically in the document entity.

Yes.
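
For illustration, a processor-side check for "only PIs in the prolog count"
could look roughly like this in Python with the standard xml.dom.minidom
module. The function name prolog_stylesheet_pis is an assumption of this
sketch. It does not address PIs coming from external entities, and it
returns all matching PIs in document order, leaving it to the caller to
decide what to do with more than one.

```python
from xml.dom import minidom


def prolog_stylesheet_pis(xml_text):
    """Return the data of each xml-stylesheet PI that precedes the root element."""
    doc = minidom.parseString(xml_text)
    found = []
    for node in doc.childNodes:
        if node is doc.documentElement:
            break  # everything from here on is outside the prolog
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            found.append(node.data)
    return found


sample = ('<?xml-stylesheet href="a.css" type="text/css"?>'
          '<root><?xml-stylesheet href="b.css"?></root>'
          '<?xml-stylesheet href="c.css"?>')
# Only the first PI is in the prolog; the other two are skipped.
print(prolog_stylesheet_pis(sample))
```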


> I would also add that, in the case of multiple xml-stylesheet PIs
> for the same media, the xml-stylesheet processor should ignore all
> but the last in document order.

Why? Browsers support multiple xml-stylesheet PIs for CSS. XSLT requires
multiple PIs to be supported, too, although browsers generally use either
the first or the last one for XSLT.

-- 
Simon Pieters
Opera Software

Received on Thursday, 11 June 2009 08:27:46 UTC