Re: URIs, RFCs, IRIs

On Thursday, June 1, 2006, 11:09:23 AM, Maciej wrote:

MS> On Jun 1, 2006, at 12:55 AM, Chris Lilley wrote:


>> Hello public-webapi,

>>  In http://www.w3.org/TR/2006/WD-Window-20060407/ there are  
>> multiple references to RFC 2396. This is obsoleted by RFC 3986;  
>> please replace references to the former by the latter and be sure  
>> to use terminology consistent with 3986 which has some  
>> clarifications and changes relative to RFC 2396.

MS> That's the plan. There is already an editorial note to this effect.

I saw that, but interpreted the closing question mark to indicate a
point of discussion rather than a definite plan.

Thanks for clarifying. That satisfies the part of my comment relating to
RFC 2396 vs. 3986, assuming the change appears in the next version of the
spec.

>> Also, why is a URI rather than an IRI or an XRI specified here? Is  
>> this an oversight, or is it a deliberate decision to exclude usage  
>> in countries such as China and Korea that make extensive use of  
>> IRIs, or to require such IRIs to be hexified before using this  
>> interface?

MS> It is neither a mistake nor a deliberate malfeasance. The goal for  
MS> the first version of the Window spec is to provide a rationalized  
MS> subset of features that already exist in web browsers. Many do not  
MS> support IRIs either in these APIs or anywhere else. As a new feature,  
MS> IRI support would be out of scope for the first version.

IRIs are not a new feature. Most of the functionality (apart from the
IDN part) has been supported for years. The XLink href and the W3C XML
Schema 'anyURI' type, as used, for example, in the XHTML schema
definition, are in fact IRIs, not URIs.

MS> In fact, it is hard to see how browsers will ever roll out universal  
MS> support for IRIs, since the current practice for characters outside  
MS> the ASCII range is to use the page encoding rather than UTF-8.

This leads to well-known problems when URIs are bookmarked, or copied
and pasted into another document. The problem is particularly acute in
the case of form submissions.
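
As a concrete illustration of the bookmarking and copy-and-paste problem,
here is a minimal Python sketch. The sample string and the choice of
gb2312 as the page encoding are assumptions made for illustration only.
The point is that the same characters escape to different octets
depending on the encoding, and that the escaped form carries no record of
which encoding was used.

```python
from urllib.parse import quote, unquote

text = "中文"  # illustrative sample string, not taken from any real site

# Escape using the page encoding (assumed here to be gb2312).
page_encoded = quote(text, encoding="gb2312")

# Escape using UTF-8, as RFC 3987 specifies when mapping an IRI to a URI.
utf8_encoded = quote(text, encoding="utf-8")

print(page_encoded)
print(utf8_encoded)

# The round trip only works if the consumer knows the original encoding.
recovered = unquote(page_encoded, encoding="gb2312")

# A consumer that assumes UTF-8 cannot recover the original characters.
misread = unquote(page_encoded, encoding="utf-8", errors="replace")

print(recovered == text)
print(misread == text)
```

Once the escaped URI leaves the page it was written in, the encoding
information is gone, which is why the result is ambiguous elsewhere.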

However, my point here was not to debate the merits of IRIs, which are
becoming widely deployed (something like 40% of Korean domain name
registrations, for example, and the new TLDs in China) but merely to
enquire whether the restriction to URIs in this API was

a) an accidental omission
b) a deliberate decision

You have clarified that it's the latter, so I expect that I18n Core will
be contacting you in due course.

MS>  And in  
MS> fact there is a lot of content, particularly in China and Korea, that  
MS> depends on the practice of using the page encoding. On UTF-8 pages  
MS> (like google's chinese search results) this is equivalent to IRIs,  
MS> but when using other common encodings like gb2312, changing over  
MS> would be incompatible. An example of a site affected by this is  
MS> http://www.sz.net.cn/.

MS> XRIs would be even less appropriate, since an arbitrary valid URI is  
MS> not a valid XRI, so far as I can tell. So this would be even more of  
MS> an incompatible change.

Agreed there, I was just listing all the possibilities since scripts can
be and are used inside XML. But yes, IRI would be better than XRI for an
API call.

MS> Given these complexities I think it's best to table this issue until  
MS> the next version of the Window spec. 

Note that this makes it less likely that SVG will be able to adopt this
version. Along with the rest of the XML family of specifications, we
already moved to IRIs.
 
MS> It is clear that using  
MS> international characters in resource identifiers is desirable, but  
MS> specifying something that contradicts existing practice would be bad,  

Just to clarify - existing practice also includes using IRIs. This is
not a far future scenario - it's something that has already happened.
Current practice is, as you say, mixed; but IRIs are in daily use
already.

MS> and it's not clear if there is a spec that is compatible with  
MS> existing implementations and content.

There is a range of practice in existing content. Some of it depends on
the page encoding and some (e.g. IDN usage) does not.
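
The IDN case can be sketched as follows, in Python. The host name
bücher.example is a hypothetical example, and Python's built-in idna
codec (which implements the older IDNA 2003 rules) merely stands in for
whatever a given browser does. The conversion operates on Unicode
characters, so the encoding of the page the name came from does not
affect the result.

```python
host = "bücher.example"  # hypothetical host name, for illustration only

# Simulate the same name arriving from pages in two different encodings:
# once the bytes are decoded to characters, the strings are identical.
from_latin1 = host.encode("iso-8859-1").decode("iso-8859-1")
from_utf8 = host.encode("utf-8").decode("utf-8")

# Convert each to its ASCII-compatible (Punycode) form.
ace_latin1 = from_latin1.encode("idna")
ace_utf8 = from_utf8.encode("idna")

print(ace_latin1)
print(ace_utf8)
print(ace_latin1 == ace_utf8)
```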

However, if the Window spec intends to codify a 'use the page encoding,
not UTF-8' rule for characters outside US-ASCII, please say so
explicitly in the spec and please note this would contradict RFC 3986,
as well as 3987.

MS> Regards,
MS> Maciej




-- 
 Chris Lilley                    mailto:chris@w3.org
 Interaction Domain Leader
 Co-Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
 Co-Chair, W3C Hypertext CG

Received on Thursday, 1 June 2006 10:47:22 UTC