[Bug 2754] wd-28: Proposal from the i18n-core wg for changes of anyURI

http://www.w3.org/Bugs/Public/show_bug.cgi?id=2754

           Summary: wd-28: Proposal from the i18n-core wg for changes of
                    anyURI
           Product: XML Schema
           Version: 1.1 only
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Keywords: needsAgreement
          Severity: normal
          Priority: P2
         Component: Datatypes: XSD Part 2
        AssignedTo: cmsmcq@w3.org
        ReportedBy: holstege@mathling.com
         QAContact: www-xml-schema-comments@w3.org


This is a proposal for changes to the datatype anyURI, as described by XML
Schema (cf. http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#anyURI). It is
sent on behalf of the i18n-core wg.

The i18n-core wg proposes an update of the datatype anyURI, which is defined in
the current version of XML Schema Part 2, cf.
http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#anyURI . Currently the
mapping from anyURI values to URIs is defined in terms of the XLink
specification, cf. http://www.w3.org/TR/2001/REC-xlink-20010627/#link-locators .
We think that anyURI should refer to the specification of Internationalized
Resource Identifiers (IRIs) instead, cf. http://www.ietf.org/rfc/rfc3987. The
IRI specification has reached a stable status. It specifies how to expand the
set of characters in URIs from a subset of US-ASCII to the Universal Character
Set (Unicode/ISO 10646). W3C has announced its support for the IRI
specification, so we propose applying it to anyURI. Our proposal for anyURI
consists of four points:

(1) anyURI should refer to sec. 3.1 of the IRI spec instead of XLink. This is
important, for example, because of the normalization requirements described in
the IRI specification: if an IRI in a legacy encoding is not normalized before
it is mapped from anyURI to a URI, the result might differ from the normalized
case. The IRI specification gives an example of such a legacy encoding,
Vietnamese encoded as windows-1258; cf. also sec. 3.1. The normalization problem
is only one example of the many important details discussed in the IRI
specification.
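To illustrate the normalization point, here is a minimal Python sketch of the two steps of sec. 3.1 of the IRI spec, written under assumptions of our own: an IRI arrives either as a Unicode string (used as is) or as raw bytes in a named legacy encoding (decoded, then normalized to NFC), and the input is assumed to be a syntactically valid IRI, since nothing is validated. The function name `iri_to_uri` is hypothetical.

```python
import unicodedata
from typing import Optional, Union


def iri_to_uri(iri: Union[str, bytes], legacy_encoding: Optional[str] = None) -> str:
    """Hypothetical sketch of the IRI-to-URI mapping in RFC 3987, sec. 3.1.

    Assumes a syntactically valid IRI; nothing is validated here.
    """
    if isinstance(iri, bytes):
        # Step 1a: an IRI in a legacy encoding is converted to UCS and
        # normalized to NFC, so decomposed and precomposed input agree.
        if legacy_encoding is None:
            raise ValueError("bytes input needs a legacy_encoding")
        chars = unicodedata.normalize("NFC", iri.decode(legacy_encoding))
    else:
        # Step 1b: an IRI that is already Unicode is used as is.
        chars = iri

    # Step 2: each non-ASCII character (ucschar or iprivate in a valid IRI)
    # becomes the %XX escapes of its UTF-8 bytes; ASCII passes through.
    out = []
    for ch in chars:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


# "Việt" in windows-1258: "ê" (0xEA) followed by a combining dot below (0xF2).
legacy = b"http://example.org/Vi\xea\xf2t"
print(iri_to_uri(legacy, "cp1258"))                # http://example.org/Vi%E1%BB%87t
print(iri_to_uri("http://example.org/Vi\u1ec7t"))  # http://example.org/Vi%E1%BB%87t
# Without NFC the same text yields a different URI:
print(iri_to_uri(legacy.decode("cp1258")))         # http://example.org/Vi%C3%AA%CC%A3t
```

With NFC applied, the windows-1258 bytes and the precomposed Unicode string map to the same URI; skipping normalization on the decoded legacy text does not, which is the divergence point (1) describes.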

(2) Any reference to URIs should be updated from RFC 2396 to RFC 3987. For domain
names, anyURI should refer to the IDN part of the ABNF in the IRI spec, cf. sec.
2.2 of the IRI spec. This will allow the use of internationalized domain names.
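For point (2), the conversion that makes internationalized domain names usable can be sketched in Python with the standard library's `idna` codec, which implements the IDNA 2003 ToASCII operation (RFC 3490). This is an approximation, not a conformant implementation: the codec runs ToASCII with UseSTD3ASCIIRules turned off, the hand-rolled authority parsing assumes a simple `userinfo@host:port` form, and the function name `ireg_name_to_ascii` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def ireg_name_to_ascii(iri: str) -> str:
    """Hypothetical helper: replace a non-ASCII host name by its IDNA form.

    Uses Python's built-in "idna" codec (IDNA 2003 ToASCII, RFC 3490).
    IP literals and hosts that are already ASCII are left unchanged.
    """
    parts = urlsplit(iri)
    # Split "userinfo@host:port" by hand so userinfo and port survive.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    if hostport.startswith("["):  # IPv6 / IPvFuture literal: nothing to do
        return iri
    host, colon, port = hostport.partition(":")
    if host.isascii():
        return iri
    ascii_host = host.encode("idna").decode("ascii")
    netloc = f"{userinfo}{at}{ascii_host}{colon}{port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(ireg_name_to_ascii("http://bücher.example/path"))        # http://xn--bcher-kva.example/path
print(ireg_name_to_ascii("http://user@bücher.example:8080/"))  # http://user@xn--bcher-kva.example:8080/
```

Only the host is touched; the path and any other non-ASCII characters would still go through the usual percent-encoding step.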

(3) The definition of anyURI may want to point to the following paragraph from 
section 3.1 of the IRI specification: "Systems accepting IRIs MAY also deal with 
the printable characters in US-ASCII that are not allowed in URIs, namely "<", 
">", '"', space, "{", "}", "|", "\", "^", and "`", in step 2 above. If these 
characters are found but are not converted, then the conversion SHOULD fail. 
Please note that the number sign ("#"), the percent sign ("%"), and the square 
bracket characters ("[", "]") are not part of the above list and MUST NOT be 
converted. Protocols and formats that have used earlier definitions of IRIs 
including these characters MAY require percent-encoding of these characters as a 
preprocessing step to extract the actual IRI from a given field. This 
preprocessing MAY also be used by applications allowing the user to enter an 
IRI."
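The paragraph quoted in point (3) amounts to an optional extension of the percent-encoding step. A self-contained Python sketch, assuming a flag (the hypothetical `accept_disallowed_ascii`) that selects whether a system handles the ten listed characters or rejects them, might look like this; "#", "%", "[" and "]" pass through unchanged, as the quote requires.

```python
# Printable US-ASCII characters that are not allowed in URIs (RFC 3987, 3.1).
DISALLOWED_ASCII = frozenset('<>" {}|\\^`')


def percent_encode_iri(iri: str, accept_disallowed_ascii: bool = True) -> str:
    """Hypothetical sketch of step 2, extended per the quoted paragraph.

    Non-ASCII characters are always percent-encoded as UTF-8. Characters in
    DISALLOWED_ASCII are encoded too if accept_disallowed_ascii is true;
    otherwise the conversion fails. "#", "%", "[" and "]" are never touched.
    """
    out = []
    for ch in iri:
        if ch in DISALLOWED_ASCII:
            if not accept_disallowed_ascii:
                # "If these characters are found but are not converted,
                # then the conversion SHOULD fail."
                raise ValueError(f"character {ch!r} is not allowed in a URI")
            out.append(f"%{ord(ch):02X}")
        elif ord(ch) >= 0x80:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(percent_encode_iri("http://example.org/a b{c}#frag"))  # http://example.org/a%20b%7Bc%7D#frag
```

Failing loudly when the flag is off, rather than silently passing the characters through, mirrors the SHOULD in the quoted text.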

(4) An editorial issue: the reference from anyURI to section 8 of the old
version of the "Character Model for the World Wide Web" specification should be
changed to the new charmod-resid specification, cf.
http://www.w3.org/TR/2004/CR-charmod-resid-20041122/

Proposal concerning: Part 2, anyURI

Transition history:
Raised on 4 Apr 2005 by fsasaki@w3.org, on behalf of I18N Core WG
(http://lists.w3.org/Archives/Public/www-xml-schema-comments/2005AprJun/0000.html)

Received on Friday, 20 January 2006 21:40:59 UTC