- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Fri, 24 Jun 2016 20:11:29 -0600
- To: "Abel Braaksma" <abel.braaksma@xs4all.nl>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, "'Public Joint XSLT XQuery XPath'" <public-xsl-query@w3.org>
On Jun 24, 2016, at 6:40 AM, Abel Braaksma wrote:

> I'm sending this with HTML layout in the hope it is better readable.
>
> Currently, with this latest version, I tested a few scenarios, primarily with method/@value, here are the -- sometimes surprising -- results:

Thank you very much for these test cases.

> Red: either incorrectly valid or incorrectly invalid
> Orange: correct results but for incorrect reasons
> Green: correctly valid or correctly invalid
>
> Note: Red and orange can also mean that the result is unexpected but the validators pass the XSD correctly, in that case I added an "*", which means it is likely a bug in the XSD.
>
> 0) xml:id
> MSXML.NET: chokes on xml:id, so I removed that (should we include that? Some WG's do in their schemas, I believe)

I think you are saying that MSXML.NET chokes on the xml:id attributes included in the test document I prepared. Is that correct? If it does, then I think it's in error. I'll spare everyone the details, because on closer examination I suspect you mean that MSXML.NET chokes on some but not all of the xml:id attributes; a copy/paste error led to the ID "json-i-v-EQ1" occurring on three elements. Can you clarify? Thanks.

Of course, we might wish not just to allow xml:id but to include the schema for the XML namespace; I'm agnostic on that.

> 1) spaces in EQName URI
> value="Q{ http://example.com/nss/foo }bar"
> LibXML: valid
> MSXML4: invalid (for incorrect reasons)
> MSXML.NET: invalid
> Saxon-EE: valid

This is the value on elements method-v-EQ2 and json-v-EQ2, correct? (I put the IDs into the test so that it would be easier to discuss specific values, without having to do a character-by-character comparison between literals.)

My reply seems to have lost the formatting, so I should specify that here you label the "invalid" results correct. Why do you believe this is (or should be) invalid? I read the grammar of XPath 3.1 as saying it should be valid.
The relevant productions are [1]:

  [117] URIQualifiedName ::= BracedURILiteral NCName /* ws: explicit */
  [118] BracedURILiteral ::= "Q" "{" [^{}]* "}" /* ws: explicit */

[1] http://www.w3.org/XML/Group/qtspecs/specifications/xquery-31/html/xpath-31-diff.html#prod-xpath31-URIQualifiedName

The "ws: explicit" rule says that "the EBNF notation explicitly notates, with S or otherwise, where whitespace characters are allowed." Here, the rule says that any characters other than "{" and "}" are allowed within the braces; I think the blanks (U+0020 in this case, but the same logic applies to other whitespace characters) found in this literal are therefore allowed.

I think allowing whitespace within the braces is probably a mistake, since it's not allowed in URIs or IRIs, but I was trying to match the grammar, not improve it.

> 2) invalid EQName, extra "}"
> value="Q{http://example.com/nss/foo}}bar"
> LibXML: valid
> MSXML4: invalid (for incorrect reasons)
> MSXML.NET: valid
> Saxon-EE: valid

I agree that this value is not allowed by the grammar in XPath and should be made invalid by the schema.

> 3) spaces within URI
> value="Q{http://e xample.com/nss/foo}bar"
> LibXML: valid
> MSXML4: invalid (for incorrect reasons)
> MSXML.NET: valid
> Saxon-EE: valid

I believe (but have not checked within the last several years) that the current specs for URIs and IRIs do not allow whitespace within either. So I agree that in principle this should probably be disallowed. But it's currently allowed by the XPath spec, unless I have missed something, so I did not try to make the schema disallow it.

> 4) invalid EQName, double starting {{
> value="Q{{http://example.com/nss/foo}bar"
> LibXML: valid
> MSXML4: invalid (for incorrect reasons)
> MSXML.NET: valid
> Saxon-EE: valid

As for 2 above.

> 5) invalid NCName part, wrong start-char
> value="Q{http://example.com/nss/foo}-bar"
> LibXML: invalid
> MSXML4: valid
> MSXML.NET: invalid
> Saxon-EE: invalid

Agreed that this is and should be invalid.
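
As a rough cross-check of the reading of productions [117] and [118] applied to values 1 through 5, here is a minimal Python sketch. It is only an approximation: NCNAME_APPROX is an ASCII-only stand-in for the real NCName production, and the names is_uri_qualified_name and CASES are made up for this illustration.

```python
import re

# Production [118]: BracedURILiteral ::= "Q" "{" [^{}]* "}"
BRACED_URI_LITERAL = r"Q\{[^{}]*\}"

# Assumption: ASCII-only approximation of NCName. The real production
# admits many more Unicode name characters.
NCNAME_APPROX = r"[A-Za-z_][A-Za-z0-9._\-]*"

# Production [117]: URIQualifiedName ::= BracedURILiteral NCName
# ("ws: explicit", so no whitespace is permitted between the two parts).
URI_QUALIFIED_NAME = re.compile(BRACED_URI_LITERAL + NCNAME_APPROX)


def is_uri_qualified_name(value: str) -> bool:
    """Return True if the whole value matches the approximated grammar."""
    return URI_QUALIFIED_NAME.fullmatch(value) is not None


# Test values 1 through 5 from the message, keyed by scenario number.
CASES = {
    1: "Q{ http://example.com/nss/foo }bar",
    2: "Q{http://example.com/nss/foo}}bar",
    3: "Q{http://e xample.com/nss/foo}bar",
    4: "Q{{http://example.com/nss/foo}bar",
    5: "Q{http://example.com/nss/foo}-bar",
}

for number, value in CASES.items():
    print(number, value, is_uri_qualified_name(value))
```

Under these assumptions, values 1 and 3 match (whitespace is not excluded by [^{}]*), while values 2, 4, and 5 do not.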
>
> 6) url-escaped URI (should be allowed)
> value="Q{http://e%20xample.com/nss/foo}bar"
> LibXML: valid
> MSXML4: valid
> MSXML.NET: valid
> Saxon-EE: valid

Agreed that this is now and should be valid.

> 7) missing NCName part
> value="Q{http://example.com/nss/foo}"
> LibXML: invalid
> MSXML4: invalid
> MSXML.NET: invalid
> Saxon-EE: invalid

Agreed that this is and should be invalid.

> 8) no-namespace EQName (in "method", this should only be variants of "Q{}html", i.e. the allowed defaults)
> value="Q{}html"
> LibXML: invalid*
> MSXML4: invalid*
> MSXML.NET: invalid*
> Saxon-EE: invalid*

I do not believe the spec intends for this to be valid. Perhaps I'm wrong; I will have to reread the text.

Perhaps it should. If it should be valid, is this a small enough change to make at this point? Or is it too late?

> 9) no-namespace EQName with spaces
> value="Q{ }html"
> LibXML: invalid* (for wrong reasons, missing enum)
> MSXML4: invalid* (for wrong reasons, missing enum)
> MSXML.NET: invalid* (for wrong reasons, missing enum)
> Saxon-EE: invalid* (for wrong reasons, missing enum)

As for 8 and 1.

> 10) empty value for method
> value=""
> LibXML: invalid
> MSXML4: invalid
> MSXML.NET: invalid
> Saxon-EE: invalid

I agree that this is and should be invalid.

> Findings:
> - MSXML4 chokes on subtracting regexes, i.e. "[\c-[:]]", fixing that by adding a hierarchy resolve it for MSXML4

Others will know better than I; is MSXML 4 currently in wide use?

Actually, I suppose my instinct is to try to make the schema work with it, even if it's not known to be currently in wide use. Your sketches show a reasonably simple way.

> - The current expression "Q\{(.*)\}" can be made stricter to disallow whitespace and curlies, or remove allowed whitespace by deriving by restriction from a base type

Agreed as to the curly braces. Unless we change the grammar of XPath, I am not persuaded as to the whitespace.
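
To make the difference between the brace patterns concrete, the sketch below compares the lenient form "Q\{(.*)\}" with two stricter variants. Python's re module stands in for the XSD regex dialect, which is an assumption: the dialects differ, and the implicit anchoring of XSD patterns is mimicked here with fullmatch. NCNAME_APPROX is an ASCII-only stand-in for the name part, and the labels in PATTERNS are hypothetical.

```python
import re

# Assumption: ASCII-only stand-in for the NCName part, not the schema's pattern.
NCNAME_APPROX = r"[A-Za-z_][A-Za-z0-9._\-]*"

# Hypothetical labels for the three brace patterns under discussion.
PATTERNS = {
    "lenient": re.compile(r"Q\{(.*)\}" + NCNAME_APPROX),
    "no-curlies": re.compile(r"Q\{[^{}]*\}" + NCNAME_APPROX),
    "no-curlies-no-space": re.compile(r"Q\{[^\s{}]*\}" + NCNAME_APPROX),
}

# Values from scenarios 1, 2, 4, and 6 of the message.
VALUES = [
    "Q{ http://example.com/nss/foo }bar",
    "Q{http://example.com/nss/foo}}bar",
    "Q{{http://example.com/nss/foo}bar",
    "Q{http://e%20xample.com/nss/foo}bar",
]


def verdicts(value: str) -> dict:
    """Map each pattern label to whether the whole value matches it."""
    return {
        label: pattern.fullmatch(value) is not None
        for label, pattern in PATTERNS.items()
    }


for value in VALUES:
    print(value, verdicts(value))
```

In this approximation the lenient pattern accepts the doubled braces of scenarios 2 and 4, the "no-curlies" variant rejects them while still accepting the blanks of scenario 1, and only the last variant rejects the blanks as well.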

> - the no-namespace EQNames that are allowed in method-type and json-node-output-method-type should be added

I'll have to think about this; I see a certain logic to it, but I don't see that logic in the spec. Unless I am mistaken, this would require textual changes to the serialization spec as well as to the schema.

> - should we include xml.xsd as we do in several other scenarios, to allow xml:id etc?

Agnostic.

> Proposal
> I propose a few minor changes that validate in all scenarios above and fixes a few bugs in the XSD:
>
> I experimented with these two definitions (the base type is needed to remove the subtracting regex):
>
> <xs:simpleType name="EQName-Base">
>   <xs:restriction base="xs:token">
>     <xs:pattern value="Q\{\S*\}[^:]+"/>
>     <xs:whiteSpace value="collapse"/>
>   </xs:restriction>
> </xs:simpleType>
>
> <xs:simpleType name="EQName">
>   <xs:restriction base="output:EQName-Base">
>     <xs:pattern value="Q\{[^\s\{\}]*\}[\i][\c]*"/>
>     <xs:whiteSpace value="collapse"/>
>   </xs:restriction>
> </xs:simpleType>
>
> And
>
> <xs:simpleType name="EQName-Base">
>   <xs:restriction base="xs:token">
>     <xs:pattern value="Q\{[^\s\{\}]*\}[^:]+"/>
>     <xs:whiteSpace value="collapse"/>
>   </xs:restriction>
> </xs:simpleType>
>
> <xs:simpleType name="EQName">
>   <xs:restriction base="output:EQName-Base">
>     <xs:pattern value="Q\{.*\}[\i][\c]*"/>
>     <xs:whiteSpace value="collapse"/>
>   </xs:restriction>
> </xs:simpleType>

I might be inclined to make EQName-Base anonymous, but I realize that some people find nesting in such contexts confusing (and I'd develop the definitions using a named form, anonymizing it at the end only because it's not a type that corresponds to anything outside the schema; it's *just* an artefact of our attempt to work around the bug in MSXML 4).

> I think they are equivalent and should both cancel out spaces and contained "{" and "}", but the first correctly disallows the curlies according to all four validators, the second incorrectly disallows them.

I'm not sure I follow.

> Both correctly refuse spaces. It's off-topic, but I wonder whether I made a mistake here or whether you'd agree these are indeed equivalent (the base-type is never directly used).

They look equivalent to me, but I haven't tried to prove it.

> If we do not want this change (split EQName in base and derived) then we (only) lose compatibility with MSXML4. Other validators support subtracting regexes.

Michael

--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Saturday, 25 June 2016 02:11:57 UTC