Re: IRIs

The tighter types defined in [1] may be what you want.

[1] https://www.w3.org/XML/Group/2004/06/exacturi/xsd-rfc-3986-uri-3986-iri.html

Actually, though, since it appears the access privileges for /XML/Group
have been made more restrictive, it's not clear that the document at [1]
is available to anyone outside the Team anymore.  So I attach a copy,
which I have munged to try to make it display plausibly from the
lists.w3.org archives.

The actual definitions of the types appear to be accessible in [2], [3], and [4].

[2] http://www.w3.org/2011/04/XMLSchema/TypeLibrary-IRI-URI-driver.xsd
[3] http://www.w3.org/2011/04/XMLSchema/TypeLibrary-URI-RFC3986.xsd
[4] http://www.w3.org/2011/04/XMLSchema/TypeLibrary-IRI-RFC3987.xsd

Note that these are marked as drafts, and contain text saying that the
current version of the types is in the schema datatype library at [5], which
is not true: since the draft note was never published by the Schema WG,
the types were never added to the type library.  If XForms wants to use
them, you should probably re-issue them by revising the schema documents
and publishing them in an appropriate location.  (And if you want to
publish [1] as a group document, that would probably be useful for
those who need to understand how the schema documents are 
constructed.)

Michael


On Mar 8, 2018, at 7:06 AM, Steven Pemberton wrote:

> But that said, the anyURI type is extremely liberal, accepting literally *any* string of characters. The only purpose of the type seems to be mandating transforming the characters into something acceptable to a URI when necessary.
> 
> It would still be useful to have a type that validates according to http://www.ietf.org/rfc/rfc3987.txt.
> 
> Steven
> 
> 
> On Thu, 08 Mar 2018 13:46:33 +0100, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
> 
>> You are absolutely right, and I am absolutely wrong.
>> 
>> What led me to the conclusion was writing the test suite for anyURI, and IRIs showing up as invalid, and me then following the wrong link.
>> 
>> So all is well, I can breathe a sigh of relief, and carry on with the test suite.
>> 
>> I'm happy that you are reading the XForms mailing list :-)
>> 
>> Steven
>> 
>> On Wed, 07 Mar 2018 19:37:20 +0100, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>> 
>>> 
>>>> On Mar 7, 2018, at 10:39 AM, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
>>>> 
>>>> The definition of anyURI doesn't allow IRIs, such as
>>>>  https://zh.wikipedia.org/wiki/Wikipedia:关于中文维基百科/en.
>>>> 
>>>> Just as we added an iemail address type to match modern email addresses, it seems to me that we ought to also add an anyIRI type that accepts IRIs like the above.
>>> 
>>> I am puzzled; what leads you to the conclusion that xsd:anyURI
>>> does not accept IRIs?
>>> 
>>> In XSD 1.0 [1], the value space is described as that of
>>> RFC 2396, as modified by RFC 2732, and the lexical space
>>> is described (roughly) as the set of strings which, after
>>> escaping, turn into URIs as defined by those specs.  The
>>> escaping in question is the then current algorithm for IRIs,
>>> as published in the XLink spec. I believe that later revisions
>>> of the concept of IRI changed the rules for whitespace, but
>>> I don’t recall any other changes likely to be noticeable to
>>> users of the datatype.  Certainly the intent of XSD 1.0
>>> was to accept IRIs in the lexical space of the type anyURI.
>>> 
>>> The spec says "This [the mapping from lexical space to value
>>> space] means that a wide range of internationalized resource
>>> identifiers can be specified when an anyURI is called for".
>>> 
>>> In XSD 1.1 [2], the spec is a little more explicit, since the
>>> IRI concept was a little more clearly developed by that time:
>>> "anyURI represents an Internationalized Resource Identifier
>>> Reference (IRI).  An anyURI value can be absolute or relative,
>>> and may have an optional fragment identifier (i.e., it may be
>>> an IRI Reference).  This type should be used when the value
>>> fulfills the role of an IRI, as defined in [RFC 3987] or its
>>> successor(s) in the IETF Standards Track."
>>> 
>>> During the development of XSD 1.1 the WG responded to
>>> inconsistencies in the 1.0 implementations of the anyURI
>>> type (and, perhaps, to fears that future revisions of the RFCs
>>> for URIs and IRIs would continue to change the set of legal
>>> values) by seeking to simplify and future-proof the rules used
>>> for checking schema-validity of IRIs.  For reasons I do not think
>>> I can successfully reconstruct (at least, not without falling
>>> into depression), it chose to do so by stating clearly that the
>>> grammar rules specified by the relevant RFCs are effectively
>>> only advisory, and that for purposes of schema validation,
>>> any sequence of XML characters constitutes a value of the
>>> type.
>>> 
>>> So in XSD 1.1 it is doubly untrue to say that IRIs are not
>>> accepted as lexical representations of xsd:anyURI:  not only
>>> is it clearly stated that IRIs are to be accepted, but strings
>>> that do not match the current definition of IRIs will *also*
>>> be accepted as schema-valid.
>>> 
>>> XForms needs its own IRI type only if stricter validation of the
>>> grammar of URIs and IRIs is needed.
>>> 
>>> If in fact stricter validation is needed, the XForms group may
>>> wish to consider using the datatypes defined in “XSD datatypes
>>> for strict validation of IRIs and URIs” [3].
>>> 
>>> It would be very disappointing if the amount of work that went
>>> into making xsd:anyURI accept IRIs turned out to be for
>>> naught.
>>> 
>>> [1] https://www.w3.org/TR/xmlschema-2/#anyURI
>>> [2] https://www.w3.org/TR/xmlschema11-2/#anyURI
>>> [3] https://www.w3.org/XML/Group/2004/06/exacturi/xsd-rfc-3986-uri-3986-iri.html
>>> 
>>> N.B. I am unable to verify URI [3], since my access privileges
>>> no longer seem sufficient to retrieve the document.  [3] was
>>> prepared for publication as a WG note by the then XML Schema
>>> WG but never published, since the WG ran out of resources and
>>> time.  When the XML Core WG took over responsibility for
>>> XSD, they decided they didn’t have the necessary resources, either.
>>> I would be glad if the work were finally published.
>>> 
>>> ********************************************
>>> C. M. Sperberg-McQueen
>>> Black Mesa Technologies LLC
>>> cmsmcq@blackmesatech.com
>>> http://www.blackmesatech.com
>>> ********************************************
>>> 
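
P.S. For readers who want to see the two behaviours described in the
quoted message side by side, here is a minimal Python sketch.  It is
an illustration under stated assumptions, not the normative algorithm
of either spec.  The names escape_iri and is_xsd11_anyuri_lexical are
made up for this example, and the set of ASCII characters left
unescaped is my approximation of the URI reserved and unreserved
characters plus '%'.  The first function percent-escapes the UTF-8
bytes of disallowed characters, roughly in the manner of the XSD 1.0
lexical mapping; the second checks only that every character is a
legal XML 1.0 character, which per the discussion above is all that
XSD 1.1 requires for schema-validity (whitespace normalization is
ignored here).

```python
from urllib.parse import quote

# Assumption: ASCII characters that are passed through unchanged, i.e. the
# URI reserved and unreserved punctuation plus '%' (so that existing
# escapes are not escaped a second time).
SAFE = "!#$%&'()*+,-./:;=?@[]_~"


def escape_iri(iri: str) -> str:
    """Percent-escape the UTF-8 bytes of characters not allowed in a URI."""
    return quote(iri, safe=SAFE, encoding="utf-8")


def is_xml_char(ch: str) -> bool:
    """True if ch matches the Char production of XML 1.0."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xsd11_anyuri_lexical(s: str) -> bool:
    """Sketch of the XSD 1.1 rule: any sequence of XML characters is accepted."""
    return all(is_xml_char(ch) for ch in s)


if __name__ == "__main__":
    iri = "https://zh.wikipedia.org/wiki/Wikipedia:关于中文维基百科/en"
    print(escape_iri(iri))
    print(is_xsd11_anyuri_lexical(iri))
    print(is_xsd11_anyuri_lexical("not an IRI at all <>"))
```

The second check shows why a stricter type is needed if XForms wants
grammar-level validation: strings that are plainly not IRIs pass it
just as well as the Wikipedia example does.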

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************

Received on Thursday, 8 March 2018 17:57:02 UTC