
RE: Closing Issue 502 ( was RE: Issue 502 is closed )

From: Martin Duerst <duerst@w3.org>
Date: Thu, 21 Oct 2004 15:59:58 +0900
Message-Id: <6.0.0.20.2.20041019162744.0606a208@localhost>
To: "Martin Gudgin" <mgudgin@microsoft.com>, <aphillips@webmethods.com>, "I18n WSTF" <public-i18n-ws@w3.org>, <xml-dist-app@w3.org>
Cc: "Yves Lafon" <ylafon@w3.org>

Hello Martin,

I'm not sure anymore about the exact wording of the original comment,
but the intention was definitely to make sure that IRIs worked, and
that the spec, test cases, and implementations would not do anything
that contradicted that.

I think the problem in the CR text at
http://www.w3.org/TR/2004/CR-soap12-rep-20040826/#rep-resource is that
it says that "the value of the resource attribute information item
is a URI", while the definition of anyURI at
http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#anyURI
very clearly does NOT say that the value space of anyURI is URIs.
In particular, it says "The mapping from anyURI values to URIs is...",
and so makes it clear that in terms of XML Schema, the value space
is the space of IRIs, not URIs.
(see also
http://www.w3.org/TR/2004/PER-xmlschema-2-20040318/datatypes.html#anyURI,
which hasn't changed this).

So I was taking the CR text as restricting the attribute to URIs only,
and I think that anybody else may also easily read it that way.
If that, as you say, is not the intention of the XMLP WG, then the
text should be changed. I propose the following:

 >>>>
The type of the resource attribute information item is xs:anyURI.
The value of the resource attribute information item
identifies the Web resource whose representation is carried in the
rep:Representation element information item parent of the resource
attribute information item.
 >>>>

And maybe add a note such as:

 >>>>
Note: The anyURI type allows non-ASCII characters, and defines how
       to convert an anyURI value to an (ASCII-only) URI if necessary.
 >>>>
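
The conversion mentioned in the note can be sketched as follows. This is
only an illustration, not text from any of the specs: the function name
any_uri_to_uri is made up, and it covers only the non-ASCII case (each such
character becomes its UTF-8 octets written as %HH). ASCII characters,
including existing %HH escapes, are left untouched; the excluded ASCII
characters that the anyURI mapping also escapes are not handled here.

```python
def any_uri_to_uri(value: str) -> str:
    """Hypothetical helper: escape non-ASCII characters as UTF-8 %HH."""
    out = []
    for ch in value:
        if ord(ch) < 128:
            # ASCII (including '%' of existing escapes) passes through as-is.
            out.append(ch)
        else:
            # One %HH triplet per UTF-8 octet of the character.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


# Example with an illustrative address (example.org is a placeholder).
print(any_uri_to_uri("http://example.org/r\u00e9sum\u00e9"))
```

The point of the sketch is that the attribute value itself stays as it was
written; the ASCII-only form is derived from it only "if necessary".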

At 14:43 04/10/19, Martin Gudgin wrote:
 >
 >Speaking for myself; my understanding from the issue raised was that
 >IRIs contain actual Unicode octets outside the ASCII range, hence the
 >examples you provided.

Yes indeed.

 >xs:anyURI allows this. The type of the attribute
 >in question is xs:anyURI.

Yes, but your language seemed to disallow this, as explained above.


 >The HTTP spec clearly disallows this as only
 >ASCII characters are allowed in the URI portion, hence the encoding as
 >UTF-8 using %HH

Yes, but this only applies to URIs in the HTTP protocol (e.g. in
a GET request). In the resource attribute, non-ASCII characters are
allowed, independent of the URI scheme (i.e. even for http://....).
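
To illustrate where the ASCII restriction applies, here is a rough sketch
under several assumptions: the namespace URI is a placeholder (not the real
one from the spec), the resource address is invented, and request_target is
a hypothetical helper that handles only the path. The attribute value is
read from the XML as is, non-ASCII characters included; escaping happens
only when building the HTTP request.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlsplit

# Placeholder namespace, assumed for illustration only.
REP_NS = "http://example.org/rep"

doc = (
    '<rep:Representation xmlns:rep="' + REP_NS + '" '
    'resource="http://example.org/r\u00e9sum\u00e9.png"/>'
)

# In the XML, the attribute carries the non-ASCII characters directly.
resource = ET.fromstring(doc).get("resource")


def request_target(iri: str) -> str:
    """Hypothetical helper: ASCII-only path for an HTTP request line."""
    path = urlsplit(iri).path or "/"
    # quote() encodes str input as UTF-8 by default; '/' and existing
    # '%' escapes are kept unchanged.
    return quote(path, safe="/%")


print(resource)
print("GET " + request_target(resource) + " HTTP/1.1")
```

Query and fragment parts are ignored in this sketch.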

 >If you really believe that IRI == Unicode octets == ASCII encoded
 >unicode octets

Well, this is not a matter of believing; it is a matter of specification
and implementation. And it depends on what exactly you mean by "==".

 >then I really don't understand your original issue
 >because as far as I can tell ALL three versions of the text we have
 >provided to you would allow one or more of the two encodings. Our
 >original text in the CR spec allowed both. The first amended version
 >provided to you allowed ASCII encoded unicode octets, the latest version
 >allows both.

As I have shown above, that doesn't seem to be the case.

 >So I don't understand your concern. You wanted the spec to allow IRIs.
 >As far as I can tell, given your definition below, it always has.

No, it hasn't, because it restricts the value space of anyURI from
IRIs to URIs. If that wasn't the intention of the XMLP WG, then it's
easy to fix.

Regards,    Martin.

 >Gudge
 >
 >> -----Original Message-----
 >> From: Martin Duerst [mailto:duerst@w3.org]
 >> Sent: 18 October 2004 21:59
 >> To: Martin Gudgin; aphillips@webmethods.com; I18n WSTF;
 >> xml-dist-app@w3.org
 >> Cc: Yves Lafon
 >> Subject: RE: Closing Issue 502 ( was RE: Issue 502 is closed )
 >>
 >> At 23:51 04/10/15, Martin Gudgin wrote:
 >>  >I think the sentence makes sense as is, but I've added the 'the' anyway. We
 >>  >used 'schemes' because our understanding is that it's the scheme which
 >>  >defines what characters are legal in an identifier per that scheme.
 >>
 >> I was confused quite a bit by this because I assumed that 'scheme'
 >> was referring to the XML Schema that would restrict the use of anyURI
 >> to ASCII only for the time being.
 >>
 >> Now that I have again read through the thread, my understanding is
 >> that by "scheme", you mean URI scheme. If that's the case, then
 >> the text (independent of the various tweaks discussed) is based on
 >> some very wrong assumptions.
 >>
 >> As discussed quite explicitly and extensively in issue iri-scheme-38
 >> (http://www.w3.org/International/iri-edit/Overview.html#iri-scheme-38),
 >> and reflected in the spec itself in many ways (not the least being
 >> various examples), there is no a priori distinction between URI
 >> schemes and IRI schemes. There are only URI schemes, but every
 >> URI scheme can, potentially at least, be used with IRIs.
 >>
 >> The condition for use with IRIs is, roughly, that the scheme requires
 >> or allows non-ASCII characters to be encoded in UTF-8 and %HH in the
 >> URI scheme or actual URIs or parts thereof.
 >>
 >> As such, in particular the HTTP scheme definitely qualifies for use
 >> with IRIs, because it allows non-ASCII characters to be encoded in
 >> UTF-8 and %HH. Because it only allows, rather than requires, this,
 >> individual HTTP URIs, or parts thereof, may work more or less well
 >> with IRIs. Indeed, if you put an HTTP URI containing a %HH sequence
 >> based on UTF-8 in its path into the location field of a modern
 >> browser (e.g. Opera or Safari), it will automatically convert
 >> this to actual (Unicode) characters. On the other hand, if you
 >> input an http: IRI there, these browsers (and some others) will
 >> automatically convert using UTF-8 and %HH as part of their
 >> HTTP resolution.
 >>
 >> So the fundamental assumption behind the text is wrong; IRIs
 >> can be used already with many existing URI schemes.
 >>
 >>
 >> Regards,     Martin.
 >>
 >>
 >>  >> > Dear Martin and I18N,
 >>  >> >
 >>  >> > Regarding issue 502[1], the XMLP Working Group has amended section 4.2.2
 >>  >> > of the Resource Representation SOAP Header Block specification to read:
 >>  >> >
 >>  >> > "The type of the resource attribute information item is xs:anyURI. The
 >>  >> > value of the resource attribute information item is a URI that
 >>  >> > identifies the Web resource whose representation is carried in the
 >>  >> > rep:Representation element information item parent of the resource
 >>  >> > attribute information item. NOTE: the use of the xs:anyURI type
 >>  >> > anticipates the possibility that in future schemes will be developed
 >>  >> > that use IRI rather than URI naming for resources."
 >>  >> >
 >>  >> > We trust this addresses your concern about allowing IRIs in the resource
 >>  >> > attribute.
 >>  >> >
 >>  >> > Regards
 >>  >> >
 >>  >> > Martin Gudgin
 >>  >> > For the XMLP WG
 >>  >> >
 >>  >> > [1] http://www.w3.org/2000/xp/Group/xmlp-cr-issues.html#x502
 >>  >>
 >>  >>
 >>
 >> 
Received on Thursday, 21 October 2004 07:02:49 UTC
