Re: Comments on charmod from Chris from Tim Bray on 2002-05-27 (www-tag@w3.org from May 2002)

From: Tim Bray <tbray@textuality.com>
Date: Mon, 27 May 2002 13:56:13 -0700
Cc: www-tag@w3.org
Message-ID: <3CF29D6D.7020405@textuality.com>

Chris Lilley wrote:
> Hello folks,
> 
>  Here are my own, personal, comments on the charmod last call draft
>  [1]. I am posting them here to encourage discussion and as a step
>  towards a TAG finding on issue 17 [2].

My comments too are personal.

> These sections (collectively "character 101"): 
> 3 Characters. 
> 5 Compatibility and Formatting Characters. 
> 6 String Identity Matching. 
> 7 String Indexing 
> 9 Referencing the Unicode Standard and ISO/IEC 10646
> 
> taken as a group, are great, in general, and should be collected together
> with appropriate introductory and reference material as a separate document
> and move to Proposed Rec once it exits Last Call.

Agreed.

> Section 4 Early Uniform Normalization is very important, but affects a lot
> of specifications and needs, I believe, a CR period as does section 8

Haven't made up my mind about 8, but I agree about 4.  In particular, 
this one clearly has a complex cost/benefit trade-off; there is a 
substantial advantage to enforcing Early Uniform Normalization on 
everything, but also a nontrivial cost.  I think we need more 
information on costs and benefits before we can make this call, and CR 
seems like the right way to get it.
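
To make the trade-off concrete, here is a minimal sketch in Python, assuming NFC as the normalization form and using only the standard `unicodedata` module. The sample strings are illustrative, not taken from the draft. It shows the benefit (two spellings of the same text compare equal once normalized) and hints at the cost (every producer has to run the normalization step).

```python
import unicodedata

# Two encodings of the same visible text "é":
composed = "\u00e9"      # single precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# Without normalization, a plain comparison treats them as different strings.
equal_before = composed == decomposed

# After normalizing both to NFC, they compare equal.
equal_after = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(equal_before, equal_after)
```

If every producer normalized early, consumers could rely on the plain comparison; the normalization call is the work that gets pushed onto producers.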

> 3.7
> 
> "[S] Escaped characters SHOULD be acceptable wherever unescaped characters are;
> this does not preclude that a syntax-significant character, when escaped, loses
> its significance in the syntax. In particular, escaped characters SHOULD be
> acceptable in identifiers and comments."
> 
> XML should allow NCRs everywhere, for example inside element and attribute names?

Yes, it probably should have.  If I'm stuck in an ASCII environment I 
can create Arabic content in XML using NCRs, but if there's even one 
Arabic attribute/tag name I'm probably stuck.  It may be too late to fix 
XML, but I think the SHOULD is accurate.
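
The asymmetry can be seen in a rough sketch using Python's standard `xml.etree.ElementTree` parser. The element names and sample text are made up for illustration, and the helper `parses` is hypothetical. An NCR is expanded in character data, but the same NCR in a tag name is rejected, while the literal Arabic letter is accepted there.

```python
import xml.etree.ElementTree as ET

# NCRs in character data are expanded by the parser.
doc = ET.fromstring("<greeting>&#x645;&#x631;&#x62D;&#x628;&#x627;</greeting>")
content = doc.text  # five Arabic letters, written with ASCII only in the source


def parses(xml_text: str) -> bool:
    """Hypothetical helper: True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The same letter as an NCR inside a tag name is not well-formed XML 1.0 ...
ncr_in_name_ok = parses("<&#x645;>text</&#x645;>")

# ... although the literal letter is a legal name character.
literal_name_ok = parses("<\u0645>text</\u0645>")

print(ncr_in_name_ok, literal_name_ok)
```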

> 8 Character Encoding in URI References
> 
> Why not go further and say that the IRI form is used in the document instance 
> and the hexified URI form when it goes over the wire? 

Indeed, why not?  -Tim
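
For illustration only, a minimal Python sketch of the "hexified" form: each non-ASCII character is encoded as UTF-8 and each resulting byte is written as `%HH`. The function name `iri_to_uri` and the example address are assumptions, and the sketch deliberately ignores host names, which would need separate treatment.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode every non-ASCII character as its UTF-8 bytes.

    ASCII characters, including existing %HH escapes and URI delimiters,
    are passed through unchanged.
    """
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


# Readable form kept in the document instance:
in_document = "http://example.org/r\u00e9sum\u00e9"

# Form that would go over the wire:
on_the_wire = iri_to_uri(in_document)
print(on_the_wire)
```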

Received on Monday, 27 May 2002 16:56:04 UTC