
Re: Feedback on tutorials

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 02 Jun 2005 11:09:37 +0900
Message-ID: <429E6A61.2080409@w3.org>
To: Chris Lilley <chris@w3.org>
Cc: GEO <public-i18n-geo@w3.org>

Chris Lilley wrote:

>On Wednesday, June 1, 2005, 6:59:29 PM, Felix wrote:
>FS> - "Various document formats already support IRIs. Examples include HTML
>FS> 4.0, XML (system identifiers), the XLink href  attribute, XML Schema's
>FS> anyURI datatype,": Unfortunately this is not true, as I had to realize
>FS> myself while reviewing QT: anyURI does not support IRI directly yet, it
>FS> still refers to XLink 1.0
>Which is most of IRI, apart from the international domain names. It has
>all the "utf-8 and hexify" stuff.
>and XLink 1.1 should fix that

A crucial part is missing from XLink 1.0: the normalization procedure 
applied when you map an IRI to a URI (see section 3.1 of the IRI spec, 
step 1.b). But you are right, XLink 1.1 will fix that.
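
To make the mapping concrete, here is a minimal Python sketch of the idea 
(not code from either spec). The function name iri_to_uri_sketch is 
hypothetical, and two simplifications are assumptions of mine: NFC is 
applied unconditionally, whereas section 3.1 of the IRI spec only calls 
for it in certain cases, and internationalized domain names get no 
special treatment.

```python
import unicodedata


def iri_to_uri_sketch(iri: str) -> str:
    """Rough IRI-to-URI mapping: NFC, then UTF-8, then %HH for non-ASCII.

    Simplified on purpose: normalization is unconditional and the host
    part is not handled separately.
    """
    # Normalization step (the part XLink 1.0 does not describe).
    normalized = unicodedata.normalize("NFC", iri)

    pieces = []
    for ch in normalized:
        if ord(ch) < 0x80:
            # ASCII characters pass through unchanged in this sketch.
            pieces.append(ch)
        else:
            # Non-ASCII: encode as UTF-8 and percent-escape each octet.
            pieces.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(pieces)


# "e" followed by a combining acute accent is composed to U+00E9 first.
print(iri_to_uri_sketch("http://example.org/cafe\u0301"))
```

Without the normalization step, the decomposed input above would come 
out as "cafe%CC%81" rather than "caf%C3%A9".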

>  3.1 Processing Dependencies
>  XLink processing depends on [XML], [XML Names], [XML Base], and [IETF
>  RFC 3987].
>  The value of the href attribute must be an IRI reference as defined in
>  [IETF RFC 3987] or must result in an IRI reference after the escaping
>  procedure described below is applied. (By design, all URIs (Uniform
>  Resource Identifiers) as defined in [IETF RFC 3986] are also IRIs.)
>  XLink 1.0 described a procedure for escaping characters found in the
>  href attribute value that were not allowed in URIs. For XLink 1.1,
>  those details are normatively described in Section 3.1 of [IETF RFC
>  3987]. However, for backwards compatibility, XLink 1.1 processors must
>  escape one additional character, the space. All occurrences of a space
>  in the value of an href attribute must be replaced by %20.
>/me wonders about that last sentence.

See http://www.w3.org/TR/REC-xml/#AVNormalize (sec. 3.3.3) in the XML 
spec., step 3 of attribute-value normalization, the third bullet point, 
for an explanation of why this has to go through %20. There are further 
processing steps in the XML spec. which rely only on #x20, but not on 
#xD, #xA, or #x9, e.g. normalization of attribute values whose type is 
other than CDATA.
In section 4.2.2, the XML spec also says:
"System identifiers (and other XML strings meant to be used as URI 
references) MAY contain characters that, according to [IETF RFC 2396] 
and [IETF RFC 2732], must be escaped before a URI can be used to 
retrieve the referenced resource. The characters to be escaped are the 
control characters #x0 to #x1F and #x7F (most of which cannot appear in 
XML), space #x20, ..."

So this explains why %20 is the bottleneck for backwards compatibility.
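
As a small illustration of that point, the Python sketch below imitates 
the white space bullet of attribute-value normalization and then the 
extra space escape from the XLink 1.1 draft. Both function names are 
hypothetical, and a real XML parser does more (line-end handling, 
entity and character references) than is shown here.

```python
XML_WHITE_SPACE = "\x20\x0D\x0A\x09"  # space, CR, LF, tab


def normalize_attr_white_space(value: str) -> str:
    """Replace each literal white space character with a single #x20."""
    return "".join(" " if ch in XML_WHITE_SPACE else ch for ch in value)


def escape_spaces(href: str) -> str:
    """Extra step from the XLink 1.1 draft: every space becomes %20."""
    return href.replace(" ", "%20")


raw = "my\tfile name.xml"
# After normalization only #x20 can remain, so one escape covers it.
print(escape_spaces(normalize_attr_white_space(raw)))
```

Once the attribute value has been normalized, a literal tab or line 
break can no longer be told apart from a space, which is why only the 
space needs to be escaped.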

>FS> - "Unfortunately, not so many protocols allow IRIs to pass through 
>FS> unchanged.": Why unfortunately? The mapping from IRI to URI is 
>FS> reversible,
>But comparisons before and after the mapping do not yield identity. 

That's true.

>it would be easier if the mapping was not needed. So "unfortunately" is
>reasonable there.
You are right in saying that comparisons before and after the mapping do 
not yield identity, but taking into account the special security 
considerations for IRIs (see sec. 8 of the IRI spec.), I think it is 
reasonable to have the mapping, which, as I said, encompasses a 
normalization step. Normalization Form C [1] helps to get rid of 
some of the security issues.
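
A short Python sketch of why the normalization step matters for 
comparison: two strings that render identically can differ in code 
points and only compare equal after NFC. The helper name nfc_equal and 
the sample strings are made up for illustration.

```python
import unicodedata

composed = "r\u00e9sum\u00e9"      # precomposed U+00E9
decomposed = "re\u0301sume\u0301"  # "e" plus combining acute U+0301


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after bringing both to Normalization Form C."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)           # code point comparison
print(nfc_equal(composed, decomposed))  # comparison after NFC
```

This covers only the canonical-equivalence case. Look-alike characters 
from different scripts, which sec. 8 of the IRI spec also discusses, 
are not resolved by NFC.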

>FS> and the protocols you mention have good reasons for the 
>FS> ASCII-escaping; the IRI spec itself mentions HTTP:  "The intent is not
>FS> to introduce IRIs into contexts that are not defined to accept them.
>FS> For example, XML schema [XMLSchema] has an explicit type "anyURI" that
>FS> includes IRIs and IRI references. Therefore, IRIs and IRI references can
>FS> be in attributes and elements of type "anyURI".  On the other  hand, in
>FS> the HTTP protocol [RFC2616], the Request URI is defined as a URI, which
>FS> means that direct use of IRIs is not allowed in HTTP requests. "
>Right. Are there any protocols that accept IRI without mapping?
I'm not aware of any. RFC 2718 (sec. 2.2.5) recommends the mapping for 
new protocols.

[1] http://www.unicode.org/reports/tr15/

-- Felix
Received on Thursday, 2 June 2005 02:09:45 UTC
