- From: Abel Braaksma <abel.braaksma@xs4all.nl>
- Date: Sat, 25 Jun 2016 14:55:11 +0200
- To: "'C. M. Sperberg-McQueen'" <cmsmcq@blackmesatech.com>, "'Public Joint XSLT XQuery XPath'" <public-xsl-query@w3.org>
On the whitespace issue, whether it is allowed or disallowed, I stand corrected. I thought from last meeting that the WG agreed that it was disallowed unless specified. Here it is implicitly specified.

So I looked up what we say in the XPath text on EQName and there we write:

"The namespace URI value is whitespace normalized according to the rules for the xs:anyURI type in Section 3.2.17 anyURI"

While I couldn't find anything on whitespace normalization in the XSD spec, it does say:

"Spaces are, in principle, allowed in the ·lexical space· of anyURI, however, their use is highly discouraged (unless they are encoded by %20)."

So, I would assume that normalization does *not* take place (I will raise an XPath bug for this, as the specs seem to disagree with each other on this).

In short: I agree with your assessment that whitespace SHOULD be allowed (that it is discouraged does not mean it is disallowed). But it may introduce a new issue, as currently we specify whitespace="collapse", which may not be the right thing to do.

I will respond on the other issue in-line in a follow-up mail.

(musings: if whitespace is allowed in a namespace URI, how can that namespace ever be used in an xsi:schemaLocation attribute?)

Thanks,
Abel

> -----Original Message-----
> From: C. M. Sperberg-McQueen [mailto:cmsmcq@blackmesatech.com]
> Sent: Saturday, June 25, 2016 4:11 AM
> To: Abel Braaksma
> Cc: C. M. Sperberg-McQueen; 'Public Joint XSLT XQuery XPath'
> Subject: Re: ACTION A-645-07: schema for serialization parameters
>
>
> On Jun 24, 2016, at 6:40 AM, Abel Braaksma wrote:
>
> > I'm sending this with HTML layout in the hope it is better readable.
> >
> > Currently, with this latest version, I tested a few scenarios, primarily with method/@value, here are the -- sometimes surprising -- results:
>
> Thank you very much for these test cases.
> >
> > Red: either incorrectly valid or incorrectly invalid
> > Orange: correct results but for incorrect reasons
> > Green: correctly valid or correctly invalid
> >
> > Note: Red and orange can also mean that the result is unexpected but the validators pass the XSD correctly, in that case I added an "*", which means it is likely a bug in the XSD.
> >
> > 0) xml:id
> > MSXML.NET: chokes on xml:id, so I removed that (should we include that? Some WG's do in their schemas, I believe)
>
> I think you are saying that MSXML.NET chokes on the xml:id attributes included in the test document I prepared. Is that correct?
>
> If it does, then I think it's in error. I'll spare everyone the details, because on closer examination I suspect you mean that MSXML.NET chokes on some but not all of the xml:id attributes; a copy/paste error led to the ID "json-i-v-EQ1" occurring on three elements.
>
> Can you clarify? Thanks.
>
> Of course, we might wish not just to allow xml:id but to include the schema for the XML namespace; I'm agnostic on that.
>
> > 1) spaces in EQName URI
> > value="Q{ http://example.com/nss/foo }bar"
> > LibXML: valid
> > MSXML4: invalid (for incorrect reasons)
> > MSXML.NET: invalid
> > Saxon-EE: valid
>
> This is the value on elements method-v-EQ2 and json-v-EQ2, correct? (I put the IDs into the test so that it would be easier to discuss specific values, without having to do a character-by-character comparison between literals.)
>
> My reply seems to have lost the formatting, so I should specify that here you label the "invalid" results correct.
>
> Why do you believe this is (or should be) invalid? I read the grammar of XPath 3.1 as saying it should be valid.
> The relevant productions are[1]
>
> [117] URIQualifiedName ::= BracedURILiteral NCName /* ws: explicit */
> [118] BracedURILiteral ::= "Q" "{" [^{}]* "}" /* ws: explicit */
>
> [1] http://www.w3.org/XML/Group/qtspecs/specifications/xquery-31/html/xpath-31-diff.html#prod-xpath31-URIQualifiedName
>
> The "ws: explicit" rule says that "the EBNF notation explicitly notates, with S or otherwise, where whitespace characters are allowed." Here, the rule says that any characters other than "{" and "}" are allowed within the braces; I think the blanks (U+0020 in this case, but the same logic applies to other whitespace characters) found in this literal are therefore allowed.
>
> I think allowing whitespace within the braces is probably a mistake, since it's not allowed in URIs or IRIs, but I was trying to match the grammar, not improve it.
>
> > 2) invalid EQName, extra "}"
> > value="Q{http://example.com/nss/foo}}bar"
> > LibXML: valid
> > MSXML4: invalid (for incorrect reasons)
> > MSXML.NET: valid
> > Saxon-EE: valid
>
> I agree that this value is allowed by the grammar in XPath and should be made invalid by the schema.
>
> > 3) spaces within URI
> > value="Q{http://e xample.com/nss/foo}bar"
> > LibXML: valid
> > MSXML4: invalid (for incorrect reasons)
> > MSXML.NET: valid
> > Saxon-EE: valid
>
> I believe (but have not checked within the last several years) that the current specs for URIs and IRIs do not allow whitespace within either. So I agree that in principle this should probably be disallowed.
>
> But it's currently allowed by the XPath spec, unless I have missed something, so I did not try to make the schema disallow it.
>
> > 4) invalid EQName, double starting {{
> > value="Q{{http://example.com/nss/foo}bar"
> > LibXML: valid
> > MSXML4: invalid (for incorrect reasons)
> > MSXML.NET: valid
> > Saxon-EE: valid
>
> As for 2 above.
>
> > 5) invalid NCName part, wrong start-char
> > value="Q{http://example.com/nss/foo}-bar"
> > LibXML: invalid
> > MSXML4: valid
> > MSXML.NET: invalid
> > Saxon-EE: invalid
>
> Agreed that this is and should be invalid.
>
> > 6) url-escaped URI (should be allowed)
> > value="Q{http://e%20xample.com/nss/foo}bar"
> > LibXML: valid
> > MSXML4: valid
> > MSXML.NET: valid
> > Saxon-EE: valid
>
> Agreed that this is now and should be valid.
>
> > 7) missing NCName part
> > value="Q{http://example.com/nss/foo}"
> > LibXML: invalid
> > MSXML4: invalid
> > MSXML.NET: invalid
> > Saxon-EE: invalid
>
> Agreed that this is and should be invalid.
>
> > 8) no-namespace EQName (in "method", this should only be variants of "Q{}html", i.e. the allowed defaults)
> > value="Q{}html"
> > LibXML: invalid*
> > MSXML4: invalid*
> > MSXML.NET: invalid*
> > Saxon-EE: invalid*
>
> I do not believe the spec intends for this to be valid. Perhaps I'm wrong; I will have to reread the text.
>
> Perhaps it should.
>
> If it should be valid, is this a small enough change to make at this point? Or is it too late?
>
> > 9) no-namespace EQName with spaces
> > value="Q{ }html"
> > LibXML: invalid* (for wrong reasons, missing enum)
> > MSXML4: invalid* (for wrong reasons, missing enum)
> > MSXML.NET: invalid* (for wrong reasons, missing enum)
> > Saxon-EE: invalid* (for wrong reasons, missing enum)
>
> As for 8 and 1.
>
> > 10) empty value for method
> > value=""
> > LibXML: invalid
> > MSXML4: invalid
> > MSXML.NET: invalid
> > Saxon-EE: invalid
>
> I agree that this is and should be invalid.
>
> > Findings:
> > - MSXML4 chokes on subtracting regexes, i.e. "[\c-[:]]"; fixing that by adding a hierarchy resolves it for MSXML4
>
> Others will know better than I; is MSXML 4 currently in wide use?
>
> Actually, I suppose my instinct is to try to make the schema work with it, even if it's not known to be currently in wide use.
> Your sketches show a reasonably simple way.
>
> > - The current expression "Q\{(.*)\}" can be made stricter to disallow whitespace and curlies, or remove allowed whitespace by deriving by restriction from a base type
>
> Agreed as to the curly braces. Unless we change the grammar of XPath, I am not persuaded as to the whitespace.
>
> > - the no-namespace EQNames that are allowed in method-type and json-node-output-method-type should be added
>
> I'll have to think about this; I see a certain logic to it, but I don't see that logic in the spec. Unless I am mistaken, this would require textual changes to the serialization spec as well as to the schema.
>
> > - should we include xml.xsd as we do in several other scenarios, to allow xml:id etc?
>
> Agnostic.
>
> > Proposal
> > I propose a few minor changes that validate in all scenarios above and fixes a few bugs in the XSD:
> >
> > I experimented with these two definitions (the base type is needed to remove the subtracting regex):
> >
> > <xs:simpleType name="EQName-Base">
> >   <xs:restriction base="xs:token">
> >     <xs:pattern value="Q\{\S*\}[^:]+"/>
> >     <xs:whiteSpace value="collapse"/>
> >   </xs:restriction>
> > </xs:simpleType>
> >
> > <xs:simpleType name="EQName">
> >   <xs:restriction base="output:EQName-Base">
> >     <xs:pattern value="Q\{[^\s\{\}]*\}[\i][\c]*"/>
> >     <xs:whiteSpace value="collapse"/>
> >   </xs:restriction>
> > </xs:simpleType>
> >
> > And
> >
> > <xs:simpleType name="EQName-Base">
> >   <xs:restriction base="xs:token">
> >     <xs:pattern value="Q\{[^\s\{\}]*\}[^:]+"/>
> >     <xs:whiteSpace value="collapse"/>
> >   </xs:restriction>
> > </xs:simpleType>
> >
> > <xs:simpleType name="EQName">
> >   <xs:restriction base="output:EQName-Base">
> >     <xs:pattern value="Q\{.*\}[\i][\c]*"/>
> >     <xs:whiteSpace value="collapse"/>
> >   </xs:restriction>
> > </xs:simpleType>
>
> I might be inclined to make EQName-Base anonymous, but I realize that some people find nesting in such contexts
> confusing (and I'd develop the definitions using a named form, anonymizing it at the end only because it's not a type that corresponds to anything outside the schema, it's *just* an artefact of our attempt to work around the bug in MSXML 4).
>
> > I think they are equivalent and should both cancel out spaces and contained "{" and "}", but the first correctly disallows the curlies according to all four validators, the second incorrectly disallows them.
>
> I'm not sure I follow.
>
> > Both correctly refuse spaces. It's off-topic, but I wonder whether I made a mistake here or whether you'd agree these are indeed equivalent (the base-type is never directly used).
>
> They look equivalent to me, but I haven't tried to prove it.
>
> > If we do not want this change (split EQName in base and derived) then we (only) lose compatibility with MSXML4. Other validators support subtracting regexes.
>
> Michael
>
> --
> ****************************************************************
> * C. M. Sperberg-McQueen, Black Mesa Technologies LLC
> * http://www.blackmesatech.com
> * http://cmsmcq.com/mib
> * http://balisage.net
> ****************************************************************
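
For readers who want to replay the ten test values without a schema validator, here is a rough Python sketch of the first proposed EQName-Base/EQName pair next to the XPath production [118]. It is only an approximation under stated assumptions: the XSD escapes \i and \c are replaced by ASCII-only character classes, XSD patterns are treated as implicitly anchored, the two patterns of the derivation chain are ANDed, and the helper names (collapse, is_eqname, grammar_allows) are made up for this illustration. It checks the EQName type only, not the enumerations of method-type, so it says nothing about why case 8 came out "invalid*".

```python
import re

# ASCII-only stand-ins for the XSD escapes \i and \c (assumption: the real
# classes cover many more Unicode name characters).
NAME_START = r"A-Za-z_:"
NAME_CHAR = r"A-Za-z0-9._:\-"
# XSD whitespace: space, tab, line feed, carriage return.
XSD_SPACE = r" \t\n\r"

# First proposal: base pattern Q\{\S*\}[^:]+ and derived pattern
# Q\{[^\s\{\}]*\}[\i][\c]*; a value must satisfy both.
BASE = re.compile(r"Q\{[^" + XSD_SPACE + r"]*\}[^:]+")
DERIVED = re.compile(
    r"Q\{[^" + XSD_SPACE + r"{}]*\}[" + NAME_START + "][" + NAME_CHAR + "]*"
)

# XPath 3.1 production [118] followed by an ASCII-only NCName.
GRAMMAR = re.compile(r"Q\{[^{}]*\}[A-Za-z_][A-Za-z0-9._\-]*")


def collapse(value: str) -> str:
    """Approximate whiteSpace="collapse": squeeze runs of whitespace, trim."""
    return re.sub("[" + XSD_SPACE + "]+", " ", value).strip(" ")


def is_eqname(value: str) -> bool:
    """True if the collapsed value matches both the base and derived pattern."""
    v = collapse(value)
    return bool(BASE.fullmatch(v)) and bool(DERIVED.fullmatch(v))


def grammar_allows(value: str) -> bool:
    """True if the literal matches the XPath grammar sketch (no collapsing)."""
    return bool(GRAMMAR.fullmatch(value))


CASES = {
    1: "Q{ http://example.com/nss/foo }bar",
    2: "Q{http://example.com/nss/foo}}bar",
    3: "Q{http://e xample.com/nss/foo}bar",
    4: "Q{{http://example.com/nss/foo}bar",
    5: "Q{http://example.com/nss/foo}-bar",
    6: "Q{http://e%20xample.com/nss/foo}bar",
    7: "Q{http://example.com/nss/foo}",
    8: "Q{}html",
    9: "Q{ }html",
    10: "",
}

if __name__ == "__main__":
    for number, value in CASES.items():
        print(number, repr(value), is_eqname(value), grammar_allows(value))
```

In this sketch the two checks differ only on the values with whitespace inside the braces (cases 1, 3 and 9), which is the open question in this thread. The base pattern's [^:]+ is what keeps a colon out of the local name in place of the subtraction "[\c-[:]]".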
Received on Saturday, 25 June 2016 12:55:51 UTC