RE: ACTION A-645-07: schema for serialization parameters

I'm sending this with HTML layout in the hope that it is more readable.

 

With this latest version, I tested a few scenarios, primarily with method/@value. Here are the (sometimes surprising) results:

 

Red: either incorrectly valid or incorrectly invalid

Orange: correct results but for incorrect reasons

Green: correctly valid or correctly invalid

 

Note: red and orange can also mean that the result is unexpected even though the validators apply the XSD correctly. In that case I added an "*", which means it is likely a bug in the XSD.

 

0) xml:id

MSXML.NET: chokes on xml:id, so I removed it (should we include it? Some WGs do in their schemas, I believe)

 

1) spaces in EQName URI

value="Q{ http://example.com/nss/foo }bar"

LibXML: valid

MSXML4: invalid (for incorrect reasons)

MSXML.NET: invalid 

Saxon-EE: valid

 

2) invalid EQName, extra "}"

value="Q{http://example.com/nss/foo}}bar"

LibXML: valid

MSXML4: invalid (for incorrect reasons)

MSXML.NET: valid

Saxon-EE: valid

 

3) spaces within URI

value="Q{http://e xample.com/nss/foo}bar"

LibXML: valid

MSXML4: invalid (for incorrect reasons)

MSXML.NET: valid

Saxon-EE: valid

 

4) invalid EQName, double starting {{

value="Q{{http://example.com/nss/foo}bar"

LibXML: valid

MSXML4: invalid (for incorrect reasons)

MSXML.NET: valid

Saxon-EE: valid

 

5) invalid NCName part, wrong start-char

value="Q{http://example.com/nss/foo}-bar"

LibXML: invalid

MSXML4: valid

MSXML.NET: invalid

Saxon-EE: invalid

 

6) url-escaped URI (should be allowed)

value="Q{http://e%20xample.com/nss/foo}bar"

LibXML: valid

MSXML4: valid

MSXML.NET: valid

Saxon-EE: valid

 

7) missing NCName part

value="Q{http://example.com/nss/foo}"

LibXML: invalid

MSXML4: invalid

MSXML.NET: invalid

Saxon-EE: invalid

 

8) no-namespace EQName (in "method", this should only be variants of "Q{}html", i.e. the allowed defaults)

value="Q{}html"

LibXML: invalid*

MSXML4: invalid*

MSXML.NET: invalid*

Saxon-EE: invalid*

 

9) no-namespace EQName with spaces

value="Q{  }html"

LibXML: invalid* (for wrong reasons, missing enum)

MSXML4: invalid* (for wrong reasons, missing enum)

MSXML.NET: invalid* (for wrong reasons, missing enum)

Saxon-EE: invalid* (for wrong reasons, missing enum)

 

10) empty value for method

value=""

LibXML: invalid

MSXML4: invalid

MSXML.NET: invalid

Saxon-EE: invalid

 

 

 

Findings:

- MSXML4 chokes on character class subtraction in regexes, such as "[\c-[:]]"; replacing the subtraction with a type hierarchy (a base type plus a derived type) resolves this for MSXML4

- The current expression "Q\{(.*)\}" can be made stricter to disallow whitespace and curlies, or we can remove the allowed whitespace by deriving by restriction from a base type

- the no-namespace EQNames that are allowed in method-type and json-node-output-method-type should be added

- should we include xml.xsd as we do in several other scenarios, to allow xml:id etc?
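
To illustrate the first finding: in XSD, pattern facets from different derivation steps are ANDed, so a base type carrying "[^:]+" plus a derived type carrying "\c+" should accept the same values as "[\c-[:]]+". Below is a minimal sketch of that idea in Python rather than XSD. The ASCII-only stand-in for \c and all function names are my own assumptions for illustration, not anything from the schema.

```python
import re

# ASCII-only stand-in for the XSD escape \c (assumption: the real class
# covers far more of Unicode). Like \c, it includes ':'.
NAME_CHAR = r"[A-Za-z0-9._:\-]"

# What [\c-[:]] means for the stand-in class, written out by hand,
# because Python's re module has no character class subtraction.
NAME_CHAR_NO_COLON = r"[A-Za-z0-9._\-]"


def matches_with_subtraction(s):
    """Single pattern, as a validator supporting subtraction would see it."""
    return re.fullmatch(NAME_CHAR_NO_COLON + "+", s) is not None


def matches_with_two_steps(s):
    """Base pattern AND derived pattern, emulating the type hierarchy."""
    base_ok = re.fullmatch(r"[^:]+", s) is not None          # base type: no colon
    derived_ok = re.fullmatch(NAME_CHAR + "+", s) is not None  # derived: name chars
    return base_ok and derived_ok


# Hypothetical sample values, chosen to hit each failure mode.
SAMPLES = ["bar", "foo-bar", "a.b_c", "a:b", ":", "", "a b", "{x}"]
agree = all(matches_with_subtraction(s) == matches_with_two_steps(s)
            for s in SAMPLES)
print("both formulations agree on all samples:", agree)
```

This only shows agreement on a handful of samples under the stand-in class; it is not a proof for the full Unicode definitions of \i and \c.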

 

Proposal

I propose a few minor changes that validate correctly in all scenarios above and fix a few bugs in the XSD:

 

I experimented with these two definitions (the base type is needed to get rid of the character class subtraction):

 

  <xs:simpleType name="EQName-Base">
    <xs:restriction base="xs:token">
      <xs:pattern value="Q\{\S*\}[^:]+"/>      
      <xs:whiteSpace value="collapse"/>        
    </xs:restriction>
  </xs:simpleType>
  
  <xs:simpleType name="EQName">
    <xs:restriction base="output:EQName-Base">
      <xs:pattern value="Q\{[^\s\{\}]*\}[\i][\c]*"/>      
      <xs:whiteSpace value="collapse"/>        
    </xs:restriction>
  </xs:simpleType>

 

 

And

 

  <xs:simpleType name="EQName-Base">
    <xs:restriction base="xs:token">
      <xs:pattern value="Q\{[^\s\{\}]*\}[^:]+"/>      
      <xs:whiteSpace value="collapse"/>        
    </xs:restriction>
  </xs:simpleType>
  
  <xs:simpleType name="EQName">
    <xs:restriction base="output:EQName-Base">
      <xs:pattern value="Q\{.*\}[\i][\c]*"/>      
      <xs:whiteSpace value="collapse"/>        
    </xs:restriction>
  </xs:simpleType>

 

I think they are equivalent and should both rule out spaces and embedded "{" and "}". However, according to all four validators, the first correctly disallows the curlies, while the second incorrectly allows them. Both correctly refuse spaces. It's off-topic, but I wonder whether I made a mistake here or whether you'd agree these are indeed equivalent (the base type is never used directly).
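
As a rough cross-check of the equivalence question, the two variants can be emulated outside a schema processor. The sketch below is Python, not XSD, and rests on several assumptions: \i and \c are replaced by ASCII-only stand-ins, whitespace collapsing is approximated with split/join, the base and derived patterns are ANDed, and the names VARIANT_1, VARIANT_2 and is_valid are made up for illustration. Under those assumptions the two variants do not appear to be equivalent: in the second one, the base type's trailing "[^:]+" and the derived type's ".*" can each absorb an extra "}".

```python
import re

# ASCII-only stand-ins for the XSD escapes \i and \c (assumption: the real
# classes cover far more of Unicode). Both include ':' as in XSD.
NAME_START = r"[A-Za-z_:]"
NAME_CHAR = r"[A-Za-z0-9._:\-]"
LOCAL = NAME_START + NAME_CHAR + "*"

# (base pattern, derived pattern) for each of the two definitions.
VARIANT_1 = (r"Q\{\S*\}[^:]+", r"Q\{[^\s{}]*\}" + LOCAL)
VARIANT_2 = (r"Q\{[^\s{}]*\}[^:]+", r"Q\{.*\}" + LOCAL)


def is_valid(value, patterns):
    """Emulate whiteSpace=collapse, then require every step's pattern to match."""
    collapsed = " ".join(value.split())  # approximation of XSD collapsing
    # XSD patterns are implicitly anchored, hence fullmatch.
    return all(re.fullmatch(p, collapsed) is not None for p in patterns)


# Values taken from scenarios 1-7 above, plus one plain well-formed EQName.
CASES = [
    "Q{http://example.com/nss/foo}bar",
    "Q{ http://example.com/nss/foo }bar",
    "Q{http://example.com/nss/foo}}bar",
    "Q{http://e xample.com/nss/foo}bar",
    "Q{{http://example.com/nss/foo}bar",
    "Q{http://example.com/nss/foo}-bar",
    "Q{http://e%20xample.com/nss/foo}bar",
    "Q{http://example.com/nss/foo}",
]

for case in CASES:
    print(case, is_valid(case, VARIANT_1), is_valid(case, VARIANT_2))
```

With these stand-ins, the "}}" value from scenario 2 is rejected by the first variant and accepted by the second, while the "{{" value from scenario 4 and the values containing spaces are rejected by both. If that carries over to real XSD regexes, the difference lies in the definitions themselves rather than in the validators.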

 

If we do not want this change (splitting EQName into a base type and a derived type), then we lose compatibility only with MSXML4. The other validators support character class subtraction in regexes.

 

I've attached an XSD as a proposed improvement that contains these and other changes that now correctly validate against all four validators, including the one oXygen uses internally for debugging (not sure what validator that is).

 

Cheers,

Abel

 

 

> -----Original Message-----

> From: C. M. Sperberg-McQueen [mailto:cmsmcq@blackmesatech.com]

> Sent: Tuesday, June 21, 2016 5:49 PM

> To: Abel Braaksma

> Cc: C. M. Sperberg-McQueen; 'Public Joint XSLT XQuery XPath'

> Subject: Re: ACTION A-645-07: schema for serialization parameters

> 

> 

> On Jun 21, 2016, at 5:24 AM, Abel Braaksma wrote:

> > ...

> > I'm curious, where did you see the production *without* the ws:explicit?

> > Because in XPath 3.0, 3.1 public CR and 3.1 internal CR I see the following

> > production rules (under section A.2.1 Terminal Symbols, the inline rules in

> > the body of the text do not carry the ws:explicit comments, which is perhaps

> > a bit unfortunate):

> >

> > [117]    URIQualifiedName    ::=    BracedURILiteral NCName    /* ws: explicit */

> > [118]    BracedURILiteral    ::=    "Q" "{" [^{}]* "}"         /* ws: explicit */

> >

> > In other words, whitespace is prohibited for this production.

> 

> 

> Correction accepted; revised schema is attached.  This changes the

> characterization of the last several examples in the test file; they no

> longer become valid, but stay invalid.

> 

> 

>   <!--* Was invalid, stays invalid after all (whitespace in the EQName) *-->

>   <s:method xml:id="method-i-v-EQ1" value="Q{ http://example.com/nss/foo } bar"/>

>   <s:method xml:id="method-i-v-EQ2" value="Q { http://example.com/nss/foo }bar"/>

>   <s:method xml:id="method-i-v-EQ3" value="Q { http://example.com/nss/foo } bar"/>

> 

>   <s:json-node-output-method xml:id="json-i-v-EQ1" value="Q{http://example.com/nss/foo} bar"/>

>   <s:json-node-output-method xml:id="json-i-v-EQ1" value="Q {http://example.com/nss/foo}bar"/>

>   <s:json-node-output-method xml:id="json-i-v-EQ1" value="Q {http://example.com/nss/foo} bar"/>

> 

> 

> --

> ****************************************************************

> * C. M. Sperberg-McQueen, Black Mesa Technologies LLC

> * http://www.blackmesatech.com

> * http://cmsmcq.com/mib

> * http://balisage.net

> ****************************************************************

> 

> 

 

Received on Friday, 24 June 2016 12:41:40 UTC