
Re: mediafragment track names and IRIs.

From: Jack Jansen <Jack.Jansen@cwi.nl>
Date: Wed, 23 Jun 2010 23:18:41 +0200
Cc: Yves Lafon <ylafon@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "public-media-fragment@w3.org" <public-media-fragment@w3.org>
Message-Id: <6DE5CA8A-556B-470C-A204-B2BB650D6BDB@cwi.nl>
To: "Phillips, Addison" <addison@lab126.com>

On 23 jun 2010, at 17:12, Phillips, Addison wrote:

> Hi Jack,
> 
>> I am confused, maybe you can enlighten me.
> 
> Hopefully I can untangle any confusion I've caused!
> 
>> 
>> We have based our media fragment URIs on rfc3987. For the one area
>> where things make a difference (encoding track name and ID
>> parameters), this document specifically states that percent-escapes
>> should be interpreted as UTF-8 (last paragraph of section 3.2.2).
> 
> When I looked at the Media Fragments draft, I didn't see a reference to IRI (3987) and a number of references to URI (3986). I may not have been looking in the right place, of course. I'm looking at:
> 
>   http://www.w3.org/TR/2010/WD-media-frags-20100413/
> 
> The key thing about the section regarding track names to me would be to put things "the other way around". That is, if you're using IRI, then a track name would be a sequence of Unicode characters. The sequence is encoded to a URI by percent-encoding using UTF-8 according to the rules in IRI. Instead I see a definition in terms of URI in which a "utf8string" is the percent-encoded representation of the underlying track name.

You're right on both counts (of course :-):
1) The document currently refers to RFC 3986 (URIs) and not RFC 3987 (IRIs).
2) RFC 3986 indeed specifies that percent-encoding encodes bytes, not Unicode characters. Here I was misled by the various other places in RFC 3986 where it explicitly says to use Unicode.
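
To make the distinction in point 2 concrete, here is a minimal Python sketch of the two views. The track name and the "#track=" fragment form are illustrative assumptions, not text taken from the draft. In the IRI view the name is a sequence of Unicode characters that is mapped to a URI by percent-encoding its UTF-8 bytes; in the URI view the percent-escapes only denote bytes, and reading them as text requires agreeing on UTF-8.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# Hypothetical track name with a non-ASCII character (assumption for illustration).
track_name = "Audio français"

# IRI view: the name is Unicode text. Mapping it into a URI means encoding
# the text as UTF-8 and percent-encoding every byte that is not unreserved.
encoded = quote(track_name, safe="", encoding="utf-8")

# URI view (RFC 3986): percent-escapes stand for raw bytes, nothing more.
raw_bytes = unquote_to_bytes(encoded)

# The bytes become characters again only if both sides agree on UTF-8.
decoded = unquote(encoded, encoding="utf-8", errors="strict")

# Illustrative fragment form; the parameter name is an assumption here.
fragment = "#track=" + encoded
print(fragment)
print(raw_bytes)
print(decoded)
```

A receiver that decodes the same escapes with another charset (Latin-1, say) would get a different string from the same URI, which is why it matters whether the name is defined as characters or as bytes.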

> 
>> 
>> But, the CharMod <http://www.w3.org/TR/CharMod-resid> reference you
>> cite refers to the much older URI specification rfc2396 (and then
>> adds stuff to it to say things should be utf-8 encoded). Rfc2396 is
>> indeed "not good enough" for us, as it talks about byte values for
>> percent encoding.
> 
> CharMod-Resid has the problem of having been published before IRI or the most recent URI were final. So it couldn't reference them normatively. 

Ah! Thanks for the clarification.

> I'm concerned that perhaps there is confusion about what an IRI is vs. what a URI is. Would it be useful for (selected members of) our WG to attend one of your teleconferences (or vice versa)?


For me personally, it would work better if you could explain things over email. Not that I don't want you to attend our teleconf, far from it:-), but if we have it in writing it is much easier to refer back to.

Specifically, I would like to know whether there's any documentation on transitioning from URIs to IRIs. In another group I was active in (SYMM) we decided at the last moment to use IRIs instead of URIs. It was a simple drop-in replacement, because SMIL is primarily a standalone format. But for Media Fragments this is different: our spec is not meant as a standalone spec, but to be used in numerous client formats (HTML5, to name an important one), protocols (HTTP) and server implementations (HTTP servers and caches). One thing I'm worried about is what it means if we specify IRIs: how would that interact with existing standards (HTML, HTTP) that now specify URIs? And what would it mean for client and server implementations: can they not formally be Media-Fragment-compliant unless they convert their code to use IRIs instead of URIs?

If the latter is the case, a standard such as ours would be best served with a URI-or-IRI-whatever-suits-you-best scheme...

OTOH, if there is documentation about this, and how to handle it, I'd love to get a pointer.
--
Jack Jansen, <Jack.Jansen@cwi.nl>, http://www.cwi.nl/~jack
If I can't dance I don't want to be part of your revolution -- Emma Goldman
Received on Wednesday, 23 June 2010 21:19:26 GMT
