- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 16 Dec 2010 18:45:19 -0700
- To: www-xml-schema-comments@w3.org
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>

In the context of bug 6089

    http://www.w3.org/Bugs/Public/show_bug.cgi?id=6089

Murata Makoto has suggested that xsd:anyURI of 1.1 should allow LEIRIs of W3C (and IETF) and nothing else.

In the course of trying to figure out how to move forward on this issue, I have just reviewed the record of the XML Schema WG's discussions of this and related issues, specifically the issues originally opened as wd-25, wd-28, and wd-29 in the 1.1 issues list at

    http://www.w3.org/XML/2004/07/xs11-pre-lc-issues/

These were later transferred into Bugzilla as

    2751 wd-25: anyURI, RFCs 2396 and 3896
    2754 wd-28: Proposal from the i18n-core wg for changes of anyURI
    2755 wd-29: URI changes in RFC 3986

The rationale for the WG's decisions emerges tolerably well, I think, from the minutes of the meetings of May 2005 in Morrisville, North Carolina, and of August 2005 in San Mateo, California:

    http://lists.w3.org/Archives/Member/w3c-xml-schema-wg/2005May/att-0024/Minutes_of_the_W3C_XML_Schema_Working_Group_5th__38th__F2F_meeting.htm
    http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2005Aug/0020.html

Both sets of minutes are member-accessible. For the benefit of others, I summarize some of the technical arguments brought forward in those meetings.

- The datatypes xsd:language and xsd:anyURI both depend on and refer to external specifications, and in both cases the external specifications referred to by XSD 1.0 have been made obsolete by new specifications which specify different syntax for the construct. It would be desirable to resolve the attendant problems in the same way for both datatypes. In the case of xsd:language, XSD 1.1 specifies a simple regular language which is a superset of the languages specified by the old and new specifications for language codes, and type validity in XSD 1.1 is explicitly not sufficient to guarantee conformance to the external specification.
  Some members of the WG wanted a similar solution here: to specify a simple rule which is a superset of the rules of the various forms of the various external specifications.

- There were reliable reports that some URIs accepted by RFC 2396 were not accepted by RFC 3986.

- Some members of the WG were and are strongly opposed to any change which would change any document from valid under XSD 1.0 to invalid in XSD 1.1. These WG members are also concerned about changes that loosen the type validity rules and thus move documents from invalid to valid, but such changes are felt (I believe) to be less damaging and are not opposed so fiercely.

- It was believed at the time by some WG members (including me) that the language defined by the RFCs for URIs and IRIs was not regular and thus could not be reduced to a regular expression, so that the easiest way to specify a superset would be to allow any string as a type-valid form of anyURI. [Subsequent work has shown that this belief was wrong: the language of RFC 3986 is regular.]

- Some WG members argued that for realistic validation of URIs, the RFCs for generic URI syntax are insufficient, because they do not cover any of the scheme-specific rules of syntax. They concluded that requiring conformance to the RFC as a condition of type-validity was not actually very helpful.

- Some members of the WG felt that XSD 1.0 did not in fact impose tight constraints on anyURI values (at least, not effectively). It does say that

      The ·lexical space· of anyURI is finite-length character sequences which, when the algorithm defined in Section 5.4 of [XML Linking Language] is applied to them, result in strings which are legal URIs according to [RFC 2396], as amended by [RFC 2732].

  but "legal URI" is not a term defined by RFC 2396 or RFC 2732, and it is not clear at first glance to readers of those specifications whether they actually intend to define a clearly bounded class of conforming strings or not, and if so just what it is.
  The fact that multiple grammars are given, which accept different strings, may be part of the difficulty here.

  Some experienced Web programmers have claimed that really the only strings forbidden by RFC 2396 are strings with more than one # character in them. (This also turns out to be not quite true: if the RFC prohibits anything, it also prohibits strings with the various prohibited characters.)

  On this view, the statement in XSD 1.1 that any string is type-valid as an instance of anyURI is not so much a liberalization as a coming clean about the state of affairs. Some WG members were explicit that they believed the intent of 1.0 (whether successfully expressed or not) had been to have a type which allowed pretty much any string.

- On the dissenting side, some implementors noted that their implementations did check the rules of RFC 2396 and they had comments from users suggesting that some users at least do use the type in the expectation that it will enforce the RFC rules.

- The empirical data available on the behavior of existing XSD 1.0 processors suggested that the existing processors were not consistent in the rules they checked for URI values.

- Some WG members argued that XSD 1.0 had made a mistake in coupling the specification of URIs tightly to a specific version of a specific external specification (RFC 2396); users are better served by a loose coupling between specifications. Just as HTML validity does not depend on conformance to a particular version of the URI specification, so schema-validity should not either. Loose coupling allows specs to remain stable even as the external specs they refer to are revised.

  In international standards, normative references are often (not always) accompanied by text which says, roughly

      The following standards contain provisions which, through reference in this text, constitute provisions of [this specification]. At the time of publication, the editions indicated were valid.
      All standards are subject to revision, and parties to agreements based on [this specification] are encouraged to investigate the possibility of applying the most recent editions of the standards listed below.

  Some readers (including me) take this as an indication that conforming implementations of those ISO specifications are allowed to support the current version of the other specifications referred to, without losing their claim to conformance.

  Some WG members (including me) had believed that this was such a self-evidently necessary rule that no one could read any W3C specification (specifically including XSD 1.0) as forbidding implementations from supporting (for example) later versions of the URI spec, or the language-code spec, or Unicode, or XML, than those listed in the references. Experience has taught differently. Some readers, including some members of the XML Schema WG, read XSD 1.0 as requiring the use of specific versions of external specifications, and not allowing upgrades to later versions. Others may (and do) believe that those readers are wrong, but it is clear that they exist.

  It's empirically observable that the definitions of URI and IRI have changed as older versions of the specifications have been replaced by newer ones. Some members of the WG felt, when we made this decision, that it would be better NOT to try to track the details of external specifications; users (it was argued) would be better served by being able to use the current version of the IRI spec, rather than being trapped by their schema processor with an outdated version of the spec. Murata-san's original comment illustrates concisely the problem faced by users when XSD is tightly coupled to external specifications.

So much for the technical arguments advanced at the time.

Reviewing the decision record has persuaded me that the arguments for loose coupling are good ones.
I think now that XSD would perhaps do better to encourage, or even require, implementations to enforce the rules of *some* implementation-specified version of the relevant RFCs, but it's clear from the minutes that a proposal to require support for an implementation-specified RFC would never have gotten anywhere; a proposal to encourage it without requiring it was in fact made, and got nowhere.

I hope this helps clarify the design rationale for the current state of affairs in XSD 1.1 both with regard to IRIs and with regard to language codes.

CMSMcQ

--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Friday, 17 December 2010 01:45:54 UTC