
Re: IRIs

From: Steven Pemberton <steven.pemberton@cwi.nl>
Date: Thu, 08 Mar 2018 15:06:17 +0100
To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, "Steven Pemberton" <steven.pemberton@cwi.nl>
Cc: public-xformsusers@w3.org
Message-ID: <op.zfj9grzasmjzpq@steven-xps>
But that said, the anyURI type is extremely liberal, accepting literally  
*any* string of characters. The only purpose of the type seems to be  
mandating transforming the characters into something acceptable to a URI  
when necessary.
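
As a rough illustration of that transformation, here is a minimal Python
sketch, not taken from any of the specs. It assumes the escaping amounts to
percent-encoding, as UTF-8 bytes, every character that is not allowed in a
URI. The name iri_to_uri and the exact set of characters left untouched are
assumptions made for this example.

```python
from urllib.parse import quote

# Assumption: leave URI reserved characters alone, plus "%" so that
# escapes already present in the input are not encoded a second time.
# quote() never touches ASCII letters, digits, or "_.-~".
_KEEP = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(text: str) -> str:
    """Percent-encode (as UTF-8) every character not kept by _KEEP."""
    return quote(text, safe=_KEEP, encoding="utf-8")


if __name__ == "__main__":
    print(iri_to_uri("https://zh.wikipedia.org/wiki/Wikipedia:关于中文维基百科/en"))
```

A string that is already a plain ASCII URI passes through unchanged, which
is why a type defined this way ends up rejecting almost nothing.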

It would still be useful to have a type that validates according to  
http://www.ietf.org/rfc/rfc3987.txt.
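
To make the contrast concrete, here is a hedged Python sketch of a check
that is stricter than anyURI. It is deliberately much looser than the full
RFC 3987 grammar: it only requires a scheme, rejects whitespace, control
characters and a handful of ASCII characters that never appear literally in
an IRI, and insists that every "%" starts a two-digit hex escape. It also
rejects relative references, which anyURI permits. The function name and
the regular expressions are assumptions for illustration only.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":"
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")
# ASCII characters that may not appear unescaped (simplified list)
_FORBIDDEN = re.compile(r'[\x00-\x20\x7f<>"{}|\\^`]')
# a "%" that is not followed by exactly two hex digits
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def looks_like_absolute_iri(text: str) -> bool:
    """Loose plausibility check; NOT a full RFC 3987 validator."""
    if not _SCHEME.match(text):
        return False
    if _FORBIDDEN.search(text):
        return False
    if _BAD_PERCENT.search(text):
        return False
    return True
```

A real strict type would have to implement the complete grammar, including
the permitted ranges of non-ASCII characters, rather than this approximation.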

Steven


On Thu, 08 Mar 2018 13:46:33 +0100, Steven Pemberton  
<steven.pemberton@cwi.nl> wrote:

> You are absolutely right, and I am absolutely wrong.
>
> What led me to the conclusion was writing the test suite for anyURI, and  
> IRIs showing up as invalid, and me then following the wrong link.
>
> So all is well, I can breathe a sigh of relief, and carry on with the  
> test suite.
>
> I'm happy that you are reading the XForms mailing list :-)
>
> Steven
>
> On Wed, 07 Mar 2018 19:37:20 +0100, C. M. Sperberg-McQueen  
> <cmsmcq@blackmesatech.com> wrote:
>
>>
>>> On Mar 7, 2018, at 10:39 AM, Steven Pemberton  
>>> <steven.pemberton@cwi.nl> wrote:
>>>
>>> The definition of anyURI doesn't allow IRIs, such as
>>> 	https://zh.wikipedia.org/wiki/Wikipedia:关于中文维基百科/en.
>>>
>>> Just as we added an iemail address type to match modern email  
>>> addresses, it seems to me that we ought to also add an anyIRI type  
>>> that accepts IRIs like the above.
>>
>> I am puzzled; what leads you to the conclusion that xsd:anyURI
>> does not accept IRIs?
>>
>> In XSD 1.0 [1], the value space is described as that of
>> RFC 2396, as modified by RFC 2732, and the lexical space
>> is described (roughly) as the set of strings, which after
>> escaping, turn into URIs as defined by those specs.  The
>> escaping in question is the then current algorithm for IRIs,
>> as published in the XLink spec. I believe that later revisions
>> of the concept of IRI changed the rules for whitespace, but
>> I don’t recall any other changes likely to be noticeable to
>> users of the datatype.  Certainly the intent of XSD 1.0
>> was to accept IRIs in the lexical space of the type anyURI.
>>
>> The spec says "This [the mapping from lexical space to value
>> space] means that a wide range of internationalized resource
>> identifiers can be specified when an anyURI is called for".
>>
>> In XSD 1.1 [2], the spec is a little more explicit, since the
>> IRI concept was a little more clearly developed by that time:
>> "anyURI represents an Internationalized Resource Identifier
>> Reference (IRI).  An anyURI value can be absolute or relative,
>> and may have an optional fragment identifier (i.e., it may be
>> an IRI Reference).  This type should be used when the value
>> fulfills the role of an IRI, as defined in [RFC 3987] or its
>> successor(s) in the IETF Standards Track."
>>
>> During the development of XSD 1.1 the WG responded to
>> inconsistencies in the 1.0 implementations of the anyURI
>> type (and, perhaps, to fears that future revisions of the RFCs
>> for URIs and IRIs would continue to change the set of legal
>> values) by seeking to simplify and future-proof the rules used
>> for checking schema-validity of IRIs.  For reasons I do not think
>> I can successfully reconstruct (at least, not without falling
>> into depression), it chose to do so by stating clearly that the
>> grammar rules specified by the relevant RFCs are effectively
>> only advisory, and that for purposes of schema validation,
>> any sequence of XML characters constitutes a value of the
>> type.
>>
>> So in XSD 1.1 it is doubly untrue to say that IRIs are not
>> accepted as lexical representations of xsd:anyURI:  not only
>> is it clearly stated that IRIs are to be accepted, but strings
>> that do not match the current definition of IRIs will *also*
>> be accepted as schema-valid.
>>
>> XForms needs its own IRI type only if stricter validation of the
>> grammar of URIs and IRIs is needed.
>>
>> If in fact stricter validation is needed, the XForms group may
>> wish to consider using the datatypes defined in “XSD datatypes
>> for strict validation of IRIs and URIs” [3].
>>
>> It would be very disappointing if the amount of work that went
>> into making xsd:anyURI accept IRIs turned out to be for
>> naught.
>>
>> [1] https://www.w3.org/TR/xmlschema-2/#anyURI
>> [2] https://www.w3.org/TR/xmlschema11-2/#anyURI
>> [3] https://www.w3.org/XML/Group/2004/06/exacturi/xsd-rfc-3986-uri-3986-iri.html
>>
>> N.B. I am unable to verify URI [3], since my access privileges
>> no longer seem sufficient to retrieve the document.  [3] was
>> prepared for publication as a WG note by the then XML Schema
>> WG but never published, since the WG ran out of resources and
>> time.  When the XML Core WG took over responsibility for
>> XSD, they decided they didn’t have the necessary resources, either.
>> I would be glad if the work were finally published.
>>
>> ********************************************
>> C. M. Sperberg-McQueen
>> Black Mesa Technologies LLC
>> cmsmcq@blackmesatech.com
>> http://www.blackmesatech.com
>> ********************************************
>>
Received on Thursday, 8 March 2018 14:06:52 UTC
