- From: Grosso, Paul <pgrosso@ptc.com>
- Date: Thu, 11 Jun 2009 10:50:12 -0400
- To: <public-xml-core-wg@w3.org>
I see there are some things I missed, so it's good that we're
having some more discussion.
Comments below.
> -----Original Message-----
> From: Simon Pieters [mailto:simonp@opera.com]
> Sent: Thursday, 2009 June 11 3:27
> To: Grosso, Paul; public-xml-core-wg@w3.org
> Subject: Re: xml-stylesheet issues--suggested resolutions
>
> On Wed, 10 Jun 2009 16:28:57 +0200, Grosso, Paul
> <pgrosso@ptc.com> wrote:
>
> >> > * What happens when the PI is XML 1.0-well-formed but doesn't
> >> > follow the xml-stylesheet syntax?
> >>
> >> > * What happens when there are duplicate pseudo-attributes? (This
> >> > seems to actually be allowed in the syntax.)
> >
> > I suggest:
> >
> > This is an error; the xml-stylesheet processor MAY ignore the
> > entire PI; if it tries to recover, it SHOULD ignore all but the
> > last assignment to a given pseudo-attribute.
> >
> > This is what Arbortext currently does, and if we change the spec
> > to say "MUST ignore the entire PI", and we change our code to be
> > compliant, some user documents will suddenly stop working. If we
> > don't change our code, then we would be non-compliant which looks
> > bad both for Arbortext and the AssocSS spec (because it generally
> > looks bad for a spec when implementors ignore it).
> >
> > In fact, from an XML Core point of view, I'm less worried about
> > what Arbortext does than what the "major browser vendors" do.
> > If we start changing the AssocSS spec to make current behavior
> > completely non-compliant, I'm quite sure there will be cases
> > (such as this one in Arbortext's case) where they will decide
> > they just can't invalidate existing documents, so they will
> > ignore the spec. I'd rather not be in the position of setting
> > ourselves up to be ignored.
>
> I doubt there is enough legacy content with invalid xml-stylesheet
> PIs to make browser vendors ignore the spec. I say this because there
> are surprisingly few bugs reported on Opera for our Draconian
> handling of invalid xml-stylesheet PIs.
>
> There's one bug that cites this test case:
>
> http://home.arcor.de/martin.honnen/operaBugs/op9/XML/ampersandInPI2.xml
>
> The bug says that Opera is wrong in aborting parsing. I think
> Firefox, Safari and IE ignore the PI here.
This is a reasonable discussion to have, and I'd like to
hear what others think.
My basic concern remains--if we are too strict, we risk
being ignored. If we can convince ourselves--or get
assurances from implementors--that we won't get ignored,
then we can perhaps be stricter.
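To make the two options concrete, here is a minimal Python sketch of the duplicate pseudo-attribute case. It assumes a simplified reading of the pseudo-attribute grammar (whitespace, a name, '=', a quoted value; references inside values are left undecoded). In strict mode a duplicate raises an error so the caller can ignore the whole PI; in recovery mode the last assignment to a name wins, as in the suggestion quoted above. The function and exception names are hypothetical.

```python
import re

# Hypothetical, simplified grammar: S Name S? '=' S? quoted value.
# Values are returned raw; CharRefs and entity refs are not decoded here.
_PSEUDO_ATT = re.compile(r"""\s+([A-Za-z_:][\w.:-]*)\s*=\s*("[^"<]*"|'[^'<]*')""")


class PseudoAttError(ValueError):
    """The PI data could not be used; the caller ignores the entire PI."""


def parse_pseudo_atts(data, recover=False):
    """Parse the data of an xml-stylesheet PI into {name: raw_value}.

    recover=False: a duplicate pseudo-attribute is an error.
    recover=True: the last assignment to a given name wins.
    """
    text = data.strip()
    if not text:
        return {}  # no pseudo-attributes at all
    text = " " + text  # every pseudo-attribute must be preceded by whitespace
    atts = {}
    pos = 0
    while pos < len(text):
        m = _PSEUDO_ATT.match(text, pos)
        if m is None:
            raise PseudoAttError(f"syntax error near {text[pos:pos + 20]!r}")
        name, value = m.group(1), m.group(2)[1:-1]  # strip the quotes
        if name in atts and not recover:
            raise PseudoAttError(f"duplicate pseudo-attribute {name!r}")
        atts[name] = value  # overwrites any earlier assignment
        pos = m.end()
    return atts
```

With `recover=True`, `parse_pseudo_atts('href="a.css" href="b.css"')` yields `{'href': 'b.css'}`; with the default it raises `PseudoAttError`.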
>
> We have a much bigger problem with draconian error handling in XML
> proper in general than with xml-stylesheet. So from our perspective,
> defining error recovery for XML 1.0 and Namespaces in XML 1.0 is a
> higher priority.
>
>
> > In this duplicate pseudo-attribute case, I could live with
> > tightening my above suggestion to "...processor SHOULD ignore..."
> > because at least that way Arbortext could say "yes, we should,
> > but due to legacy issues, we decided instead to recover" and
> > still not be non-compliant with the spec.
>
> Is the legacy situation for Arbortext so bad that people rely
> on its error recovery behavior?
Probably not. I was mostly using this as an example.
>
> >> > * What happens when a CharRef hits the [WFC: Legal Character]
> >> > constraint in XML 1.0? (Unclear to me whether this is allowed in
> >> > the syntax.)
> >>
> >> Syntax error: must ignore the entire PI. We should tighten up the
> >> syntax so that duplicate pseudo-attributes and NCRs that are syntax
> >> errors in XML 1.0 are also syntax errors in xml-stylesheet.
> >>
> >
> > As I said before:
> >
> > As far as I can tell, the XML Rec says:
> >
> > [16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
> >
> > and Char is "any unicode character..." so I don't see how there
> > could be a CharRef in a PI.
>
> Not on the XML 1.0 layer, but on the xml-stylesheet layer.
>
> [3] PseudoAttValue ::= ('"' ([^"<&] | CharRef | PredefEntityRef)* '"'
>                      | "'" ([^'<&] | CharRef | PredefEntityRef)* "'")
>                      - (Char* '?>' Char*)
>
> http://www.w3.org/TR/xml-stylesheet/#NT-PseudoAttValue
>
Interesting--I missed that.
So the PI is well-formed, but when it is parsed as an xml-stylesheet
PI, the value of the pseudoattribute is found to contain what is
considered a CharRef to an illegal character.
I'd probably say we should treat that as an invalid value
for the pseudoattribute--however we decide to handle that
(more below on that).
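One way to read "invalid value" here, sketched in Python: expand the references in a raw pseudo-attribute value and report the value as invalid (here, by returning None) if a CharRef names a code point outside the XML 1.0 Char production, or if a reference is not one the xml-stylesheet grammar allows. What the processor then does with an invalid value is left to the caller; the function names are hypothetical.

```python
import re


def is_xml10_char(cp):
    """XML 1.0 production [2] Char."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


# PredefEntityRef in the xml-stylesheet grammar.
_PREDEF = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}
_REF = re.compile(r"&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|([A-Za-z]+));")


def decode_pseudo_att_value(raw):
    """Expand CharRefs and predefined entity refs in a raw value.

    Returns None when the value is invalid: a CharRef to a code point that
    is not an XML 1.0 Char, an unknown entity name, or a stray '&'.
    """
    out = []
    pos = 0
    for m in _REF.finditer(raw):
        literal = raw[pos:m.start()]
        if "&" in literal:
            return None  # '&' that does not start a well-formed reference
        out.append(literal)
        hex_digits, dec_digits, name = m.groups()
        if name is not None:
            if name not in _PREDEF:
                return None  # only the five predefined entities are allowed
            out.append(_PREDEF[name])
        else:
            cp = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
            if not is_xml10_char(cp):
                return None  # would violate [WFC: Legal Character]
            out.append(chr(cp))
        pos = m.end()
    tail = raw[pos:]
    if "&" in tail:
        return None
    out.append(tail)
    return "".join(out)
```

For example, `decode_pseudo_att_value("a&amp;b")` gives `"a&b"`, while `"&#0;"` and `"&#xFFFE;"` both give None.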
>
> >> > * What happens when there are unknown values?
> >>
> >
> > In general, I don't see why we have to say anything about the
> > values of attributes (except for 'alternate'). The original
> > idea behind the Assoc SS spec was to define how to map the
> > xml-stylesheet PI into the equivalent HTML 4.0 constructs, and
> > then let the semantics be driven by HTML 4.0, and I see no
> > reason to change that.
Another concern I have is that, regardless of the details of
what we say about invalid values, I don't want to require any
implementation to verify values for attributes it's going to
ignore. So an invalid/unknown value for a pseudoattribute
should never require that the entire PI be ignored, because
a given implementation might not even be looking at the value
of that pseudoattribute.
>
> The reason is that the HTML4 spec does a poor job at specifying the
> semantics and requirements.
>
> Maybe we could cite HTML5 instead, though?
>
We could, or we could live with HTML4's poor semantics.
Most of the world does.
>
> >> Unexpected value for 'type': must either abort processing the PI or
> >> continue as if type was absent.
> >
> > The Assoc SS spec currently requires the type attribute (so continuing
> > as if it were absent is equivalent to aborting, since it isn't allowed
> > to be absent).
>
> http://www.w3.org/1999/06/REC-xml-stylesheet-19990629/errata
>
Oops, I missed that. So it looks like nine years ago we'd already
come to the conclusion that I came to once again. (Clearly, my
memory doesn't extend that far back.)
>
> >> We should probably say that 'text/xsl' is to be treated the same as
> >> 'text/xml' for the purposes of 'type' (for compat with existing
> >> content).
> >
> > I don't see why we have to say anything about the values of
> > the type attribute.
>
> I guess it could be argued that this is something for the
> XSLT spec to worry about.
>
>
> >> Invalid IRI in 'href': interpret the value using the rules in "Web
> >> Addresses" (currently called "URLs" and specified here:
> >> http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#parsing-urls).
> >> If that returns an error: must ignore the entire PI.
> >
> > Likewise per my previous paragraph. We don't have to say anything
> > about how to handle the value of the href attribute.
> >
> >>
> >> 'media': refer to the Media Queries spec. If it's an invalid
> >> media query, must ignore the entire PI.
> >
> > Again, I don't think we should say anything about how to handle
> > the value of the media attribute.
>
> If we refer to the HTML4 spec here, then we're not being
> helpful. If we refer to the HTML5 spec, then it's fine.
We could refer to HTML5, or we could decide not to "be helpful".
I don't say that facetiously. We could decide that the AssocSS
spec merely defines how to map the xml-stylesheet PI into values
for certain things like href, type, media, etc., and say that
the interpretation of those values is left to other specs.
On the other hand, I have no objections if we decide we want
to specify which spec to reference for the interpretation of
the various values. I need to hear from the rest of the WG
about this.
>
>
> >> > * Is it conforming for a document to have an xml-stylesheet PI
> >> > anywhere other than in the prolog? Is it used or ignored?
> >>
> >> Misplaced xml-stylesheet PI: must ignore the entire PI.
> >> Documents must not use misplaced xml-stylesheet PIs.
> >
> > I agree, but there is more to this.
> >
> > The Assoc SS spec says:
> >
> > The xml-stylesheet processing instruction is allowed only in
> > the prolog of an XML document.
> >
> > So an xml-stylesheet processor should ignore any PI not in the prolog.
>
> That doesn't follow. The spec needs to require that separately.
>
I would read the spec as saying that a PI that looks like an
xml-stylesheet PI but can't be one (because it isn't in the prolog)
isn't an xml-stylesheet PI, so the xml-stylesheet PI processor
ignores it.
But I don't mind saying that explicitly.
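A minimal sketch of the "prolog only" reading, using Python's standard xml.dom.minidom: only xml-stylesheet PIs that are children of the document node and precede the document element are collected; PIs inside or after the document element are skipped. It does not attempt to address the separate point about PIs that are not physically in the document entity, and the function name is hypothetical.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def prolog_stylesheet_pis(xml_text):
    """Return the data of the xml-stylesheet PIs that appear in the prolog.

    A PI with target 'xml-stylesheet' that appears inside or after the
    document element is not treated as an xml-stylesheet PI.
    """
    doc = parseString(xml_text)
    found = []
    for node in doc.childNodes:
        if node.nodeType == Node.ELEMENT_NODE:
            break  # the prolog ends where the document element begins
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            found.append(node.data)
    return found
```

Given a document with one xml-stylesheet PI before the root element, one inside it and one after it, only the first is returned.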
>
> > I would add that an xml-stylesheet processor should ignore any PI
> > that is not physically in the document entity.
>
> Yes.
>
>
> > I would also add that, in the case of multiple xml-stylesheet PIs
> > for the same media, the xml-stylesheet processor should ignore all
> > but the last in document order.
>
> Why? Browsers support multiple for CSS. XSLT requires multiple to be
> supported, too, although browsers generally use either the first or
> last for XSLT.
I guess I'm not sure what it means to have multiple xml-stylesheet PIs
with the same media value, but if it does make sense, I'm okay with that
at the xml-stylesheet processor level, as long as we don't require
an editor/browser/composition application to necessarily know how to
handle multiple ones.
paul
Received on Thursday, 11 June 2009 14:55:13 UTC