Re: Web Character Model and IRI spec (Re: FW: character encoding.)

> OK, what do we need to specify:
>
> 1) display of member names in collections: take the last URI
>    segment, then URL-unescape, then UTF-8-decode
>
> 2) creation of new resources:
>
>   2a) all URI segments MUST be UTF-8-decodable after URL-unescaping
>   2b) if the "local display name" (for instance document name when
>       typing into file selector box) contains non-ASCII characters,
>       it MUST be UTF-8 encoded then URL-escaped.
>
> 3) Forbid member names in collections that aren't
>    Unicode-normalized after URL-de-escaping and UTF-8-decoding.

Yes, these should cover most of the requirements. But rather than
specifying each case separately, it might be easier to say that all
strings derived from, or related to, a resource URI must fulfill the
following requirements:

  - All URI segments MUST be URL-escaped after being converted into
    the "normalized form" of a UTF-8 string.

  - Each implementation MUST do the necessary conversion between
    the above form and the "local representation" of the resource (name
    typed in a selector box, name stored on the client/server
    filesystem, etc.).
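
As a rough sketch of the two requirements above (not a proposed
implementation), the conversion in both directions could look like the
following in Python. The function names are made up for illustration,
and NFC is only my assumption for the "normalized form"; which form to
require is still open.

```python
import unicodedata
from urllib.parse import quote, unquote_to_bytes


def segment_from_local_name(name: str) -> str:
    """Local display name -> URI segment (hypothetical helper)."""
    # Assumption: "normalized form" means NFC; the spec text does not say.
    normalized = unicodedata.normalize("NFC", name)
    # UTF-8 encode, then URL-escape everything outside the unreserved set.
    return quote(normalized.encode("utf-8"), safe="")


def local_name_from_segment(segment: str) -> str:
    """URI segment -> local display name (hypothetical helper)."""
    # URL-unescape to raw bytes first, then UTF-8 decode.
    raw = unquote_to_bytes(segment)
    # Raises UnicodeDecodeError if the bytes are not valid UTF-8,
    # which is the case requirement 2a is meant to rule out.
    return raw.decode("utf-8")


if __name__ == "__main__":
    # "e" followed by a combining acute accent, as a file selector might give it.
    typed = "re\u0301sume\u0301.txt"
    segment = segment_from_local_name(typed)
    print(segment)
    print(local_name_from_segment(segment))
```

Note that the round trip gives back the normalized name, not
necessarily the exact code points that were typed.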

For 2b, you don't need to handle the ASCII and non-ASCII cases
separately, since ASCII is a subset of UTF-8.
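
To illustrate, here is a small Python check (the helper name is
hypothetical) showing that one code path handles both kinds of names:
an ASCII name encodes to the same bytes under UTF-8 as under ASCII, so
only reserved characters such as the space end up escaped.

```python
from urllib.parse import quote


def escape_name(name: str) -> str:
    """UTF-8 encode, then URL-escape (hypothetical helper)."""
    return quote(name.encode("utf-8"), safe="")


# ASCII names produce identical bytes under either encoding.
same_bytes = "report.txt".encode("utf-8") == "report.txt".encode("ascii")

if __name__ == "__main__":
    print(same_bytes)
    print(escape_name("report.txt"))
    print(escape_name("my file.txt"))
```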

Also, notes on several HTTP headers such as Destination: and
Status-URI: might be needed, as they seem to be protocol-specific
headers.

Regards,
--
Taisuke Yamada <tai@iij.ad.jp>
Internet Initiative Japan Inc., Technical Planning Division

Received on Tuesday, 30 July 2002 09:47:18 UTC