- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Wed, 7 Mar 2018 11:37:20 -0700
- To: Steven Pemberton <steven.pemberton@cwi.nl>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, public-xformsusers@w3.org
> On Mar 7, 2018, at 10:39 AM, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
>
> The definition of anyURI doesn't allow IRIs, such as
> https://zh.wikipedia.org/wiki/Wikipedia:关于中文维基百科/en.
>
> Just as we added an iemail address type to match modern email addresses, it seems to me that we ought to also add an anyIRI type that accepts IRIs like the above.

I am puzzled; what leads you to the conclusion that xsd:anyURI does not accept IRIs?

In XSD 1.0 [1], the value space is described as that of RFC 2396, as modified by RFC 2732, and the lexical space is described (roughly) as the set of strings which, after escaping, turn into URIs as defined by those specs. The escaping in question is the then current algorithm for IRIs, as published in the XLink spec. I believe that later revisions of the concept of IRI changed the rules for whitespace, but I don’t recall any other changes likely to be noticeable to users of the datatype.

Certainly the intent of XSD 1.0 was to accept IRIs in the lexical space of the type anyURI. The spec says "This [the mapping from lexical space to value space] means that a wide range of internationalized resource identifiers can be specified when an anyURI is called for".

In XSD 1.1 [2], the spec is a little more explicit, since the IRI concept was a little more clearly developed by that time:

"anyURI represents an Internationalized Resource Identifier Reference (IRI). An anyURI value can be absolute or relative, and may have an optional fragment identifier (i.e., it may be an IRI Reference). This type should be used when the value fulfills the role of an IRI, as defined in [RFC 3987] or its successor(s) in the IETF Standards Track."

During the development of XSD 1.1 the WG responded to inconsistencies in the 1.0 implementations of the anyURI type (and, perhaps, to fears that future revisions of the RFCs for URIs and IRIs would continue to change the set of legal values) by seeking to simplify and future-proof the rules used for checking schema-validity of IRIs. For reasons I do not think I can successfully reconstruct (at least, not without falling into depression), it chose to do so by stating clearly that the grammar rules specified by the relevant RFCs are effectively only advisory, and that for purposes of schema validation, any sequence of XML characters constitutes a value of the type.

So in XSD 1.1 it is doubly untrue to say that IRIs are not accepted as lexical representations of xsd:anyURI: not only is it clearly stated that IRIs are to be accepted, but strings that do not match the current definition of IRIs will *also* be accepted as schema-valid.

XForms needs its own IRI type only if stricter validation of the grammar of URIs and IRIs is needed. If in fact stricter validation is needed, the XForms group may wish to consider using the datatypes defined in "XSD datatypes for strict validation of IRIs and URIs" [3].

It would be very disappointing if the amount of work that went into making xsd:anyURI accept IRIs turned out to be for naught.

[1] https://www.w3.org/TR/xmlschema-2/#anyURI
[2] https://www.w3.org/TR/xmlschema11-2/#anyURI
[3] https://www.w3.org/XML/Group/2004/06/exacturi/xsd-rfc-3986-uri-3986-iri.html

N.B. I am unable to verify URI [3], since my access privileges no longer seem sufficient to retrieve the document. [3] was prepared for publication as a WG note by the then XML Schema WG but never published, since the WG ran out of resources and time. When the XML Core WG took over responsibility for XSD, they decided they didn’t have the necessary resources, either. I would be glad if the work were finally published.

********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com
********************************************
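
To illustrate the XSD 1.0 mapping from lexical space to value space discussed above, here is a minimal Python sketch of an XLink-style escaping step. It rests on assumptions: the function name `escape_iri` is hypothetical, and the set of escaped ASCII characters is my reading of the excluded characters in RFC 2396 section 2.4.3 with '#', '%', '[' and ']' left alone. It is not a conformance-tested implementation of either spec.

```python
# Sketch only: escape an IRI-like string into a URI-like string by
# percent-encoding the UTF-8 bytes of each disallowed character.
# Assumption: "disallowed" means non-ASCII, ASCII control characters,
# space, and the RFC 2396 delims/unwise characters other than # % [ ].
EXCLUDED_ASCII = set('<>"{}|\\^` ')


def escape_iri(text: str) -> str:
    """Return text with disallowed characters replaced by %HH escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20 or cp > 0x7E or ch in EXCLUDED_ASCII:
            # One %HH triplet per UTF-8 byte of the character.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    iri = "https://zh.wikipedia.org/wiki/Wikipedia:关于中文维基百科/en"
    print(escape_iri(iri))
    print(escape_iri("a b"))  # a%20b
```

Under XSD 1.1 no such step is needed for schema validation, since any sequence of XML characters is accepted as a value of the type.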
Received on Wednesday, 7 March 2018 18:37:51 UTC