- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Fri, 24 Jun 2016 20:11:29 -0600
- To: "Abel Braaksma" <abel.braaksma@xs4all.nl>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, "'Public Joint XSLT XQuery XPath'" <public-xsl-query@w3.org>
On Jun 24, 2016, at 6:40 AM, Abel Braaksma wrote:
> I'm sending this with HTML layout in the hope that it is more readable.
>
> Using this latest version, I tested a few scenarios, primarily with method/@value. Here are the (sometimes surprising) results:
Thank you very much for these test cases.
>
> Red: either incorrectly valid or incorrectly invalid
> Orange: correct results but for incorrect reasons
> Green: correctly valid or correctly invalid
>
> Note: Red and orange can also mean that the result is unexpected even though the validators process the XSD correctly; in that case I added an "*", meaning it is likely a bug in the XSD.
>
> 0) xml:id
> MSXML.NET: chokes on xml:id, so I removed it (should we include it? Some WGs do in their schemas, I believe)
I think you are saying that MSXML.NET chokes on the xml:id attributes
included in the test document I prepared. Is that correct?
If it does, then I think it's in error. I'll spare everyone the
details, because on closer examination I suspect you mean that
MSXML.NET chokes on some but not all of the xml:id attributes; a
copy/paste error led to the ID "json-i-v-EQ1" occurring on three
elements.
Can you clarify? Thanks.
Of course, we might wish not just to allow xml:id but to
include the schema for the XML namespace; I'm agnostic on
that.
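Because ID values must be unique within a document, a repeated xml:id is a plausible reason for a validator to object even when the attribute itself is acceptable. The following is a minimal sketch of how such duplicates could be found using Python's standard-library ElementTree. The element names in SAMPLE are hypothetical stand-ins for the real test document; only the ID that appears three times mirrors the copy/paste error described above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The xml: prefix is always bound to this namespace, so ElementTree reports
# xml:id in {namespace}local form without any declaration in the document.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_xml_ids(doc: str) -> dict[str, int]:
    """Return each xml:id value that occurs more than once, with its count."""
    root = ET.fromstring(doc)
    counts = Counter(
        el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None
    )
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical sample: element names are invented for illustration.
SAMPLE = """<tests>
  <case xml:id="method-v-EQ2" value="Q{ http://example.com/nss/foo }bar"/>
  <case xml:id="json-i-v-EQ1" value="Q{http://example.com/nss/foo}}bar"/>
  <case xml:id="json-i-v-EQ1" value="Q{{http://example.com/nss/foo}bar"/>
  <case xml:id="json-i-v-EQ1" value="Q{http://example.com/nss/foo}-bar"/>
</tests>"""

print(duplicate_xml_ids(SAMPLE))  # {'json-i-v-EQ1': 3}
```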
>
> 1) spaces in EQName URI
> value="Q{ http://example.com/nss/foo }bar"
> LibXML: valid
> MSXML4: invalid (for incorrect reasons)
> MSXML.NET: invalid
> Saxon-EE: valid
This is the value on elements method-v-EQ2 and json-v-EQ2, correct?
(I put the IDs into the test so that it would be easier to discuss
specific values, without having to do a character-by-character
comparison between literals.)
My reply seems to have lost the formatting, so I should state explicitly
that here you label the "invalid" results as correct.
Why do you believe this is (or should be) invalid? I read the grammar
of XPath 3.1 as saying it should be valid. The relevant productions
are [1]:
[117] URIQualifiedName ::= BracedURILiteral NCName /* ws: explicit */
[118] BracedURILiteral ::= "Q" "{" [^{}]* "}" /* ws: explicit */
[1] http://www.w3.org/XML/Group/qtspecs/specifications/xquery-31/html/xpath-31-diff.html#prod-xpath31-URIQualifiedName
The "ws: explicit" rule says that "the EBNF notation explicitly
notates, with S or otherwise, where whitespace characters are
allowed." Here, the rule says that any characters other than "{" and
"}" are allowed within the braces, so I think the blanks found in this
literal (U+0020 in this case, but the same logic applies to other
whitespace characters) are allowed.
I think allowing whitespace within the braces is probably a
mistake, since it's not allowed in URIs or IRIs, but I was trying to
match the grammar, not improve it.
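To make the reading of productions [117] and [118] concrete, here is a rough transcription of them into a Python regular expression, applied to cases 1 to 4 from this thread. This sketch rests on one assumption: NCName is approximated by an ASCII-only class, whereas the real production admits many more characters. It is not a conformant EQName parser.

```python
import re

# [118] BracedURILiteral ::= "Q" "{" [^{}]* "}"  (any characters except braces)
BRACED_URI_LITERAL = r"Q\{[^{}]*\}"
# ASCII-only approximation of NCName (assumption, for illustration only).
NCNAME = r"[A-Za-z_][A-Za-z0-9._\-]*"
# [117] URIQualifiedName ::= BracedURILiteral NCName  (ws: explicit, so no gap)
URI_QUALIFIED_NAME = re.compile(BRACED_URI_LITERAL + NCNAME)


def matches_grammar(value: str) -> bool:
    """True if the whole value matches the transcribed productions."""
    return URI_QUALIFIED_NAME.fullmatch(value) is not None


CASES = {
    1: "Q{ http://example.com/nss/foo }bar",
    2: "Q{http://example.com/nss/foo}}bar",
    3: "Q{http://e xample.com/nss/foo}bar",
    4: "Q{{http://example.com/nss/foo}bar",
}
for n, value in CASES.items():
    print(n, matches_grammar(value))
```

Under this transcription, cases 1 and 3 (whitespace inside the braces) match, and cases 2 and 4 (an extra brace) do not.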
>
> 2) invalid EQName, extra "}"
> value="Q{http://example.com/nss/foo}}bar"
> LibXML: valid
> MSXML4: invalid (for incorrect reasons)
> MSXML.NET: valid
> Saxon-EE: valid
I agree that this value is not allowed by the XPath grammar
and should be made invalid by the schema.
> 3) spaces within URI
> value="Q{http://e xample.com/nss/foo}bar"
> LibXML: valid
> MSXML4: invalid (for incorrect reasons)
> MSXML.NET: valid
> Saxon-EE: valid
I believe (but have not checked within the last several years)
that the current specs for URIs and IRIs do not allow whitespace
within either. So I agree that in principle this should probably
be disallowed.
But it's currently allowed by the XPath spec, unless I have
missed something, so I did not try to make the schema
disallow it.
>
> 4) invalid EQName, double starting {{
> value="Q{{http://example.com/nss/foo}bar"
> LibXML: valid
> MSXML4: invalid (for incorrect reasons)
> MSXML.NET: valid
> Saxon-EE: valid
As for 2 above.
>
> 5) invalid NCName part, wrong start-char
> value="Q{http://example.com/nss/foo}-bar"
> LibXML: invalid
> MSXML4: valid
> MSXML.NET: invalid
> Saxon-EE: invalid
Agreed that this is and should be invalid.
>
> 6) url-escaped URI (should be allowed)
> value="Q{http://e%20xample.com/nss/foo}bar"
> LibXML: valid
> MSXML4: valid
> MSXML.NET: valid
> Saxon-EE: valid
Agreed that this is now and should be valid.
>
> 7) missing NCName part
> value="Q{http://example.com/nss/foo}"
> LibXML: invalid
> MSXML4: invalid
> MSXML.NET: invalid
> Saxon-EE: invalid
Agreed that this is and should be invalid.
>
> 8) no-namespace EQName (in "method", this should only be variants of "Q{}html", i.e. the allowed defaults)
> value="Q{}html"
> LibXML: invalid*
> MSXML4: invalid*
> MSXML.NET: invalid*
> Saxon-EE: invalid*
I do not believe the spec intends for this to be valid. Perhaps I'm
wrong; I will have to reread the text.
Perhaps it should be.
If it should be valid, is this a small enough change to
make at this point? Or is it too late?
>
> 9) no-namespace EQName with spaces
> value="Q{ }html"
> LibXML: invalid* (for wrong reasons, missing enum)
> MSXML4: invalid* (for wrong reasons, missing enum)
> MSXML.NET: invalid* (for wrong reasons, missing enum)
> Saxon-EE: invalid* (for wrong reasons, missing enum)
As for 8 and 1.
>
> 10) empty value for method
> value=""
> LibXML: invalid
> MSXML4: invalid
> MSXML.NET: invalid
> Saxon-EE: invalid
I agree that this is and should be invalid.
>
> Findings:
> - MSXML4 chokes on regex character-class subtraction, e.g. "[\c-[:]]"; adding a type hierarchy resolves this for MSXML4
Others will know better than I; is MSXML 4 currently in wide use?
Actually, I suppose my instinct is to try to make the schema work
with it, even if it's not known to be currently in wide use. Your
sketches show a reasonably simple way.
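The hierarchy workaround relies on an XSD rule: pattern facets on successive derivation steps must all be satisfied. As a result, "[^:]+" on a base type combined with "[\i][\c]*" on the derived type should accept the same local names as "[\i-[:]][\c-[:]]*". Below is a small exhaustive check of that claim for the local-name part only. It assumes ASCII-only stand-ins for \i and \c. Because Python's re module has no class subtraction, the colon is removed from those classes by hand.

```python
import itertools
import re

# ASCII stand-ins (assumption) for XSD's \i and \c; both include ':'.
NAME_START = "A-Za-z_:"
NAME_CHAR = r"A-Za-z0-9._:\-"

# What "[\i-[:]][\c-[:]]*" denotes: the same classes with ':' removed by hand.
subtracted = re.compile(r"[A-Za-z_][A-Za-z0-9._\-]*")
# The workaround: one pattern per derivation step; both must match.
base = re.compile(r"[^:]+")
derived = re.compile(f"[{NAME_START}][{NAME_CHAR}]*")


def via_subtraction(s: str) -> bool:
    return subtracted.fullmatch(s) is not None


def via_hierarchy(s: str) -> bool:
    return base.fullmatch(s) is not None and derived.fullmatch(s) is not None


# Compare both forms on every string of length 0..4 over a small alphabet.
ALPHABET = "a1-_.: "
mismatches = [
    "".join(t)
    for n in range(5)
    for t in itertools.product(ALPHABET, repeat=n)
    if via_subtraction("".join(t)) != via_hierarchy("".join(t))
]
print(mismatches)  # []
```

This is only a bounded check, not a proof. It does illustrate why the intersection works: once "[^:]+" excludes the colon, "[\i][\c]*" can only match colon-free name characters.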
> - The current expression "Q\{(.*)\}" can be made stricter to disallow whitespace and curly braces, or the allowed whitespace can be removed by deriving by restriction from a base type
Agreed as to the curly braces. Unless we change the grammar of XPath,
I am not persuaded as to the whitespace.
> - the no-namespace EQNames that are allowed in method-type and json-node-output-method-type should be added
I'll have to think about this; I see a certain logic to it, but I don't see
that logic in the spec. Unless I am mistaken, this would require textual
changes to the serialization spec as well as to the schema.
> - should we include xml.xsd as we do in several other scenarios, to allow xml:id etc?
Agnostic.
>
> Proposal
> I propose a few minor changes that validate correctly in all the scenarios above and fix a few bugs in the XSD:
>
> I experimented with these two definitions (the base type is needed to remove the subtracting regex):
>
> <xs:simpleType name="EQName-Base">
> <xs:restriction base="xs:token">
> <xs:pattern value="Q\{\S*\}[^:]+"/>
> <xs:whiteSpace value="collapse"/>
> </xs:restriction>
> </xs:simpleType>
>
> <xs:simpleType name="EQName">
> <xs:restriction base="output:EQName-Base">
> <xs:pattern value="Q\{[^\s\{\}]*\}[\i][\c]*"/>
> <xs:whiteSpace value="collapse"/>
> </xs:restriction>
> </xs:simpleType>
>
>
> And
>
> <xs:simpleType name="EQName-Base">
> <xs:restriction base="xs:token">
> <xs:pattern value="Q\{[^\s\{\}]*\}[^:]+"/>
> <xs:whiteSpace value="collapse"/>
> </xs:restriction>
> </xs:simpleType>
>
> <xs:simpleType name="EQName">
> <xs:restriction base="output:EQName-Base">
> <xs:pattern value="Q\{.*\}[\i][\c]*"/>
> <xs:whiteSpace value="collapse"/>
> </xs:restriction>
> </xs:simpleType>
I might be inclined to make EQName-Base anonymous, but I
realize that some people find nesting in such contexts confusing.
(I'd develop the definitions using a named form and anonymize
it at the end, only because it's not a type that corresponds to anything
outside the schema; it's *just* an artefact of our attempt to
work around the bug in MSXML 4.)
>
> I think they are equivalent and should both cancel out spaces and contained "{" and "}", but the first correctly disallows the curlies according to all four validators, the second incorrectly disallows them.
I'm not sure I follow.
> Both correctly refuse spaces. It's off-topic, but I wonder whether I made a mistake here or whether you'd agree these are indeed equivalent (the base-type is never directly used).
They look equivalent to me, but I haven't tried to prove it.
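Short of a proof, the question can be probed mechanically. The sketch below is an approximation, not a validator. It makes three assumptions:

- Python's re module is close enough to XSD regex syntax for these particular patterns.
- \i and \c can be replaced by ASCII-only classes.
- str.split is an adequate stand-in for whiteSpace="collapse".

It applies the XSD rule that the base-type and derived-type patterns must both match the whole value. Run over the test values from this thread, the two proposals disagree on one of them. "Q{http://example.com/nss/foo}}bar" (case 2) is rejected by the first pair but accepted by the second. The reason is that the second base type's "[^:]+" can absorb "}bar", while the derived type's "Q\{.*\}" absorbs the extra "}" on the URI side.

```python
import re

# ASCII-only stand-ins (assumption) for the XSD classes [\i] and [\c].
ESCAPES = {r"[\i]": "[A-Za-z_:]", r"[\c]": r"[A-Za-z0-9._:\-]"}


def compile_xsd(pattern: str) -> re.Pattern:
    """Rough XSD-to-Python translation: swap in the ASCII stand-in classes."""
    for xsd_class, py_class in ESCAPES.items():
        pattern = pattern.replace(xsd_class, py_class)
    return re.compile(pattern)


# (base-type pattern, derived-type pattern) for each proposal quoted above.
PROPOSALS = {
    "first": (r"Q\{\S*\}[^:]+", r"Q\{[^\s\{\}]*\}[\i][\c]*"),
    "second": (r"Q\{[^\s\{\}]*\}[^:]+", r"Q\{.*\}[\i][\c]*"),
}


def is_valid(value: str, proposal: str) -> bool:
    # Approximate whiteSpace="collapse", then require every pattern in the
    # derivation chain to match the whole value (XSD patterns are anchored).
    collapsed = " ".join(value.split())
    return all(
        compile_xsd(p).fullmatch(collapsed) is not None for p in PROPOSALS[proposal]
    )


CASES = [
    "Q{ http://example.com/nss/foo }bar",
    "Q{http://example.com/nss/foo}}bar",
    "Q{http://e xample.com/nss/foo}bar",
    "Q{{http://example.com/nss/foo}bar",
    "Q{http://example.com/nss/foo}-bar",
    "Q{http://e%20xample.com/nss/foo}bar",
    "Q{http://example.com/nss/foo}",
]
for value in CASES:
    print(value, is_valid(value, "first"), is_valid(value, "second"))
```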
> If we do not want this change (splitting EQName into a base type and a derived type), then we lose compatibility only with MSXML4. The other validators support regex subtraction.
Michael
--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Saturday, 25 June 2016 02:11:57 UTC