RE: ACTION A-645-07: schema for serialization parameters from Abel Braaksma on 2016-06-25 (public-xsl-query@w3.org from June 2016)

From: Abel Braaksma <abel.braaksma@xs4all.nl>
Date: Sat, 25 Jun 2016 14:55:11 +0200
To: "'C. M. Sperberg-McQueen'" <cmsmcq@blackmesatech.com>, "'Public Joint XSLT XQuery XPath'" <public-xsl-query@w3.org>
Message-ID: <148a01d1cee0$d019c140$704d43c0$@xs4all.nl>
On the whitespace issue, whether it is allowed or disallowed, I stand corrected. I thought from the last meeting that the WG agreed that it was disallowed unless specified. Here it is implicitly specified. So I looked up what we say in the XPath text on EQName and there we write:

"The namespace URI value is whitespace normalized according to the rules for the xs:anyURI type in Section 3.2.17 anyURI" 

While I couldn't find anything on whitespace normalization there, the XSD spec does say:

"Spaces are, in principle, allowed in the ·lexical space· of anyURI, however, their use is highly discouraged (unless they are encoded by %20)."

So, I would assume that normalization does *not* take place (I will raise an XPath bug for this, as the specs seem to disagree with each other on this).

In short: I agree with your assessment that whitespace SHOULD be allowed (that it is discouraged does not mean it is disallowed). But it may introduce a new issue: currently we specify whitespace="collapse", which may not be the right thing to do.
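To make the concern concrete: collapse does not remove blanks inside the braces; it trims the ends of the whole value and shrinks every interior run of whitespace to a single space. A namespace URI written with two consecutive spaces would therefore be silently changed before any pattern facet sees it. The Python helper below is a hypothetical emulation of that rule for illustration, not code taken from any validator.

```python
import re


def collapse(value: str) -> str:
    """Hypothetical emulation of the XSD whiteSpace="collapse" facet.

    Tab, line feed and carriage return become spaces, every run of spaces
    shrinks to a single space, and leading/trailing spaces are removed.
    """
    spaced = re.sub(r"[\t\n\r]", " ", value)
    return re.sub(r" {2,}", " ", spaced).strip(" ")


# Blanks inside the braces survive; only their number changes.
print(collapse("  Q{ http://example.com/nss/foo }bar  "))  # Q{ http://example.com/nss/foo }bar
print(collapse("Q{  }html"))  # Q{ }html
```

So with collapse in place, "Q{  }html" and "Q{ }html" become the same value, which matters only if whitespace inside the braces is meant to be significant.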

I will respond on the other issue in-line in a follow-up mail.

(musings: if whitespace is allowed in a namespace URI, how can that namespace ever be used in an xsi:schemaLocation attribute?)
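On that musing: xsi:schemaLocation holds a whitespace-separated list of URI pairs (namespace name, then schema location), so a blank inside a namespace URI cannot be told apart from a separator. A minimal sketch of how the tokens get paired; the function name is made up for illustration.

```python
def schema_location_pairs(value: str) -> list[tuple[str, str]]:
    """Pair up the tokens of an xsi:schemaLocation value.

    The value is split on whitespace first, so a namespace URI that
    itself contains a blank ends up as two separate tokens.
    """
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError(f"odd number of tokens, cannot pair: {tokens}")
    return list(zip(tokens[::2], tokens[1::2]))


print(schema_location_pairs("http://example.com/nss/foo foo.xsd"))
# [('http://example.com/nss/foo', 'foo.xsd')]

# A URI with an interior blank, as in test 3 of the quoted message,
# cannot be referenced: the blank splits it in two.
try:
    schema_location_pairs("http://e xample.com/nss/foo foo.xsd")
except ValueError as err:
    print(err)
```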

Thanks,
Abel 

> -----Original Message-----
> From: C. M. Sperberg-McQueen [mailto:cmsmcq@blackmesatech.com]
> Sent: Saturday, June 25, 2016 4:11 AM
> To: Abel Braaksma
> Cc: C. M. Sperberg-McQueen; 'Public Joint XSLT XQuery XPath'
> Subject: Re: ACTION A-645-07: schema for serialization parameters
> 
> 
> On Jun 24, 2016, at 6:40 AM, Abel Braaksma wrote:
> 
> > I'm sending this with HTML layout in the hope that it is more readable.
> >
> > Currently, with this latest version, I tested a few scenarios, primarily with
> > method/@value. Here are the (sometimes surprising) results:
> 
> Thank you very much for these test cases.
> 
> >
> > Red: either incorrectly valid or incorrectly invalid
> > Orange: correct results but for incorrect reasons
> > Green: correctly valid or correctly invalid
> >
> > Note: Red and orange can also mean that the result is unexpected but the
> > validators pass the XSD correctly; in that case I added an "*", which means it
> > is likely a bug in the XSD.
> >
> > 0) xml:id
> > MSXML.NET: chokes on xml:id, so I removed that (should we include
> > that? Some WGs do in their schemas, I believe)
> 
> I think you are saying that MSXML.NET chokes on the xml:id attributes
> included in the test document I prepared.  Is that correct?
> 
> If it does, then I think it's in error.  I'll spare everyone the details, because on
> closer examination I suspect you mean that MSXML.NET chokes on some but
> not all of the xml:id attributes; a copy/paste error led to the ID "json-i-v-EQ1"
> occurring on three elements.
> 
> Can you clarify? Thanks.
> 
> Of course, we might wish not just to allow xml:id but to include the schema
> for the XML namespace; I'm agnostic on that.
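One way to confirm the copy/paste hypothesis before blaming the validator is to count xml:id values in the test document. A minimal sketch using Python's standard xml.etree.ElementTree, which does not itself check ID uniqueness, so duplicates can be counted instead of causing an error. The sample document is invented; only its ID values echo the ones mentioned in this thread.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes xml:id under the predeclared XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_xml_ids(document: str) -> dict[str, int]:
    """Return each xml:id value that occurs more than once, with its count."""
    root = ET.fromstring(document)
    counts = Counter(
        elem.get(XML_ID) for elem in root.iter() if elem.get(XML_ID) is not None
    )
    return {value: n for value, n in counts.items() if n > 1}


# Invented sample; only the ID values echo the ones discussed above.
sample = """<tests>
  <case xml:id="json-i-v-EQ1" value="a"/>
  <case xml:id="json-i-v-EQ1" value="b"/>
  <case xml:id="json-i-v-EQ1" value="c"/>
  <case xml:id="method-v-EQ2" value="d"/>
</tests>"""

print(duplicate_xml_ids(sample))  # {'json-i-v-EQ1': 3}
```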
> 
> >
> > 1) spaces in EQName URI
> > value="Q{ http://example.com/nss/foo }bar"
> > LibXML: valid
> > MSXML4: invalid (for incorrect reasons)
> > MSXML.NET: invalid
> > Saxon-EE: valid
> 
> This is the value on elements method-v-EQ2 and json-v-EQ2, correct?
> (I put the IDs into the test so that it would be easier to discuss specific values,
> without having to do a character-by-character comparison between literals.)
> 
> My reply seems to have lost the formatting, so I should specify that here you
> label the "invalid" results correct.
> 
> Why do you believe this is (or should be) invalid?  I read the grammar of
> XPath 3.1 as saying it should be valid.  The relevant productions are[1]
> 
> [117] URIQualifiedName ::= BracedURILiteral NCName /* ws: explicit */
> [118] BracedURILiteral ::= "Q" "{" [^{}]* "}" /* ws: explicit */
> 
> [1] http://www.w3.org/XML/Group/qtspecs/specifications/xquery-31/html/xpath-31-diff.html#prod-xpath31-URIQualifiedName
> 
> The "ws: explicit" rule says that "the EBNF notation explicitly notates, with S
> or otherwise, where whitespace characters are allowed."  Here, the rule says
> that any characters other than "{" and "}" are allowed within the braces, so I
> think the blanks (U+0020 in this case, but the same logic applies to other
> whitespace characters) found in this literal are allowed as well.
> 
> I think allowing whitespace within the braces is probably a mistake,
> since it's not allowed in URIs or IRIs, but I was trying to match the grammar,
> not improve it.
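Michael's reading of productions [117] and [118] can be checked mechanically. The sketch below transcribes them into a Python regular expression and runs the ten test values from Abel's message through it. Assumptions: NCName is approximated by its ASCII subset, and only the XPath grammar is tested, not the extra constraints the serialization spec places on method values (so "Q{}html" passing here says nothing about point 8).

```python
import re

# ASCII-only approximation of an XML NCName (real NCNames also admit many
# non-ASCII characters). It never contains ":".
NCNAME = r"[A-Za-z_][A-Za-z0-9._\-]*"

# [118] BracedURILiteral ::= "Q" "{" [^{}]* "}"
# [117] URIQualifiedName ::= BracedURILiteral NCName
# "ws: explicit" means no whitespace is permitted between the parts, but
# [^{}]* itself admits any whitespace inside the braces.
URI_QUALIFIED_NAME = re.compile(r"Q\{[^{}]*\}" + NCNAME)


def is_uri_qualified_name(value: str) -> bool:
    return URI_QUALIFIED_NAME.fullmatch(value) is not None


# Test values 1-10 from the quoted message, in the same order.
CASES = [
    "Q{ http://example.com/nss/foo }bar",
    "Q{http://example.com/nss/foo}}bar",
    "Q{http://e xample.com/nss/foo}bar",
    "Q{{http://example.com/nss/foo}bar",
    "Q{http://example.com/nss/foo}-bar",
    "Q{http://e%20xample.com/nss/foo}bar",
    "Q{http://example.com/nss/foo}",
    "Q{}html",
    "Q{  }html",
    "",
]

for number, value in enumerate(CASES, start=1):
    print(number, is_uri_qualified_name(value), repr(value))
```

Under these assumptions values 1, 3, 6, 8 and 9 are grammatical and 2, 4, 5, 7 and 10 are not: blanks inside the braces pass, while a stray curly brace or a bad local name does not.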
> 
> 
> >
> > 2) invalid EQName, extra "}"
> > value="Q{http://example.com/nss/foo}}bar"
> > LibXML: valid
> > MSXML4: invalid (for incorrect reasons)
> > MSXML.NET: valid
> > Saxon-EE: valid
> 
> I agree that this value is not allowed by the grammar in XPath and should be
> made invalid by the schema.
> 
> > 3) spaces within URI
> > value="Q{http://e xample.com/nss/foo}bar"
> > LibXML: valid
> > MSXML4: invalid (for incorrect reasons)
> > MSXML.NET: valid
> > Saxon-EE: valid
> 
> I believe (but have not checked within the last several years) that the current
> specs for URIs and IRIs do not allow whitespace within either.  So I agree that
> in principle this should probably be disallowed.
> 
> But it's currently allowed by the XPath spec, unless I have missed something,
> so I did not try to make the schema disallow it.
> 
> >
> > 4) invalid EQName, double starting {{
> > value="Q{{http://example.com/nss/foo}bar"
> > LibXML: valid
> > MSXML4: invalid (for incorrect reasons)
> > MSXML.NET: valid
> > Saxon-EE: valid
> 
> As for 2 above.
> 
> >
> > 5) invalid NCName part, wrong start-char
> > value="Q{http://example.com/nss/foo}-bar"
> > LibXML: invalid
> > MSXML4: valid
> > MSXML.NET: invalid
> > Saxon-EE: invalid
> 
> Agreed that this is and should be invalid.
> 
> >
> > 6) url-escaped URI (should be allowed)
> > value="Q{http://e%20xample.com/nss/foo}bar"
> > LibXML: valid
> > MSXML4: valid
> > MSXML.NET: valid
> > Saxon-EE: valid
> 
> Agreed that this is now and should be valid.
> 
> >
> > 7) missing NCName part
> > value="Q{http://example.com/nss/foo}"
> > LibXML: invalid
> > MSXML4: invalid
> > MSXML.NET: invalid
> > Saxon-EE: invalid
> 
> Agreed that this is and should be invalid.
> 
> >
> > 8) no-namespace EQName (in "method", this should only be variants of
> > "Q{}html", i.e. the allowed defaults)
> > value="Q{}html"
> > LibXML: invalid*
> > MSXML4: invalid*
> > MSXML.NET: invalid*
> > Saxon-EE: invalid*
> 
> I do not believe the spec intends for this to be valid.  Perhaps I'm wrong; I will
> have to reread the text.
> 
> Perhaps it should.
> 
> If it should be valid, is this a small enough change to make at this point?  Or is
> it too late?
> 
> >
> > 9) no-namespace EQName with spaces
> > value="Q{  }html"
> > LibXML: invalid* (for wrong reasons, missing enum)
> > MSXML4: invalid* (for wrong reasons, missing enum)
> > MSXML.NET: invalid* (for wrong reasons, missing enum)
> > Saxon-EE: invalid* (for wrong reasons, missing enum)
> 
> As for 8 and 1.
> 
> >
> > 10) empty value for method
> > value=""
> > LibXML: invalid
> > MSXML4: invalid
> > MSXML.NET: invalid
> > Saxon-EE: invalid
> 
> I agree that this is and should be invalid.
> 
> >
> > Findings:
> > - MSXML4 chokes on subtracting regexes such as "[\c-[:]]"; adding a type
> > hierarchy resolves it for MSXML4
> 
> Others will know better than I; is MSXML 4 currently in wide use?
> 
> Actually, I suppose my instinct is to try to make the schema work with it,
> even if it's not known to be currently in wide use.  Your sketches show a
> reasonably simple way.
> 
> > - The current expression "Q\{(.*)\}" can be made stricter to disallow
> > whitespace and curlies, or remove allowed whitespace by deriving by
> > restriction from a base type
> 
> Agreed as to the curly braces.  Unless we change the grammar of XPath, I am
> not persuaded as to the whitespace.
> 
> > - the no-namespace EQNames that are allowed in method-type and
> > json-node-output-method-type should be added
> 
> I'll have to think about this; I see a certain logic to it, but I don't see that logic
> in the spec.  Unless I am mistaken, this would require textual changes to the
> serialization spec as well as to the schema.
> 
> > - should we include xml.xsd, as we do in several other scenarios, to allow
> > xml:id etc.?
> 
> Agnostic.
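The MSXML4 workaround rests on a rule of XSD derivation: pattern facets on the same derivation step are ORed, but facets on different steps are ANDed, so a base-type pattern can take away what the derived pattern cannot express without class subtraction. The sketch below checks that idea for the local-name part only. It uses Python's re module, which has no [\c-[:]] syntax, so the subtracted classes are written out by hand; \i and \c are approximated by their ASCII subsets, which is an assumption of this sketch.

```python
import re

# ASCII approximations of the XSD escapes \i and \c (both include ":").
NAME_START = "A-Za-z_:"
NAME_CHAR = "A-Za-z0-9._:\\-"

# What "[\i-[:]][\c-[:]]*" means: the same classes with ":" removed.
SUBTRACTED = re.compile(r"[A-Za-z_][A-Za-z0-9._\-]*")

# Without subtraction: the derived type's pattern plus the base type's pattern.
DERIVED = re.compile("[" + NAME_START + "][" + NAME_CHAR + "]*")
BASE = re.compile(r"[^:]+")


def matches_both_steps(value: str) -> bool:
    """Pattern facets from different derivation steps must all match."""
    return bool(DERIVED.fullmatch(value)) and bool(BASE.fullmatch(value))


for value in ["bar", "b-a.r9", "foo:bar", ":bar", "-bar", ""]:
    assert matches_both_steps(value) == bool(SUBTRACTED.fullmatch(value)), value
print("single pattern with subtraction and two-step pair agree")
```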
> 
> >
> > Proposal
> > I propose a few minor changes that validate in all scenarios above and fix
> > a few bugs in the XSD:
> >
> > I experimented with these two definitions (the base type is needed to
> > remove the subtracting regex):
> >
> >   <xs:simpleType name="EQName-Base">
> >     <xs:restriction base="xs:token">
> >       <xs:pattern value="Q\{\S*\}[^:]+"/>
> >       <xs:whiteSpace value="collapse"/>
> >     </xs:restriction>
> >   </xs:simpleType>
> >
> >   <xs:simpleType name="EQName">
> >     <xs:restriction base="output:EQName-Base">
> >       <xs:pattern value="Q\{[^\s\{\}]*\}[\i][\c]*"/>
> >       <xs:whiteSpace value="collapse"/>
> >     </xs:restriction>
> >   </xs:simpleType>
> >
> >
> > And
> >
> >   <xs:simpleType name="EQName-Base">
> >     <xs:restriction base="xs:token">
> >       <xs:pattern value="Q\{[^\s\{\}]*\}[^:]+"/>
> >       <xs:whiteSpace value="collapse"/>
> >     </xs:restriction>
> >   </xs:simpleType>
> >
> >   <xs:simpleType name="EQName">
> >     <xs:restriction base="output:EQName-Base">
> >       <xs:pattern value="Q\{.*\}[\i][\c]*"/>
> >       <xs:whiteSpace value="collapse"/>
> >     </xs:restriction>
> >   </xs:simpleType>
> 
> I might be inclined to make EQName-Base anonymous, but I realize that
> some people find nesting in such contexts confusing (and I'd develop the
> definitions using a named form, anonymizing it at the end only because it's
> not a type that corresponds to anything outside the schema; it's *just* an
> artefact of our attempt to work around the bug in MSXML 4).
> 
> >
> > I think they are equivalent and should both rule out spaces and
> > contained "{" and "}", but the first correctly disallows the curlies according to
> > all four validators, while the second incorrectly allows them.
> 
> I'm not sure I follow.
> 
> > Both correctly refuse spaces. It's off-topic, but I wonder whether I made a
> > mistake here or whether you'd agree these are indeed equivalent (the base
> > type is never directly used).
> 
> They look equivalent to me, but I haven't tried to prove it.
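Whether the two pairs are equivalent can be probed by emulating XSD pattern semantics in Python. The sketch below is only an approximation (Python's re stands in for XSD regular expressions, and \i and \c are reduced to ASCII classes), not a schema processor. Under those assumptions the pairs agree on every test value except 2: in the second pair the base pattern accepts "}bar" as the part after the braces, and the derived pattern's .* lets the braced part absorb the extra "}", so nothing rejects the value. That is consistent with Abel's observation that the first pair rejects the stray curly brace in all four validators.

```python
import re

# ASCII approximations of the XSD escapes \i and \c (an assumption).
I_CLASS = "[A-Za-z_:]"
C_CLASS = "[A-Za-z0-9._:\\-]"

# (EQName-Base pattern, EQName pattern) for each proposed pair.
VARIANT_1 = (r"Q\{\S*\}[^:]+", r"Q\{[^\s\{\}]*\}" + I_CLASS + C_CLASS + "*")
VARIANT_2 = (r"Q\{[^\s\{\}]*\}[^:]+", r"Q\{.*\}" + I_CLASS + C_CLASS + "*")


def is_valid(value: str, patterns: tuple[str, str]) -> bool:
    """XSD patterns are implicitly anchored (hence fullmatch), and a value of
    the derived type must match the pattern of every derivation step."""
    return all(re.fullmatch(p, value) for p in patterns)


# Test values 1-7 and 10 from the quoted message. None contains a run of
# whitespace, so whiteSpace="collapse" would leave them unchanged.
CASES = {
    1: "Q{ http://example.com/nss/foo }bar",
    2: "Q{http://example.com/nss/foo}}bar",
    3: "Q{http://e xample.com/nss/foo}bar",
    4: "Q{{http://example.com/nss/foo}bar",
    5: "Q{http://example.com/nss/foo}-bar",
    6: "Q{http://e%20xample.com/nss/foo}bar",
    7: "Q{http://example.com/nss/foo}",
    10: "",
}

for number, value in CASES.items():
    print(number, is_valid(value, VARIANT_1), is_valid(value, VARIANT_2))
```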
> 
> > If we do not want this change (splitting EQName into a base and a derived type), then we
> > (only) lose compatibility with MSXML4. Other validators support subtracting
> > regexes.
> 
> Michael
> 
> --
> ****************************************************************
> * C. M. Sperberg-McQueen, Black Mesa Technologies LLC
> * http://www.blackmesatech.com
> * http://cmsmcq.com/mib
> * http://balisage.net
> ****************************************************************
> 
> 
>
Received on Saturday, 25 June 2016 12:55:51 UTC