Re: Liam's comments on Authoring Techniques for XHTML & HTML Internationalization 1.0, 9th October 2003 draft from 'Liam Quin' on 2004-05-06 (www-i18n-comments@w3.org from May 2004)

From: 'Liam Quin' <liam@w3.org>
Date: Thu, 6 May 2004 18:35:46 -0400
To: www-i18n-comments@w3.org
Cc: Richard Ishida <ishida@w3.org>
Message-ID: <20040506223546.GI8106@w3.org>

Many thanks for the (helpful and satisfactory) response to my comments.

The I18N WG is clearly doing very useful work in these documents!

You asked for clarification here:

>> [6] 3.3 Avoid escapes when the characters to be expressed are 
>> representable in the character encoding of the document.
>> 
>> Why?  It can be very convenient in some environments, for 
>> example, to stick to 7-bit ("ASCII") values and to use 
>> character references for all code points that require 8 or more 
>> bits.  This can make document transportation and processing 
>> significantly more robust.
> 
> 
> I'm not sure we can argue for better internationalization by recommending
> the use of ASCII.  Note that we are talking specifically about HTML/XHTML
> pages here.  Do you have some specific examples you could share with us?

Keeping to the 7-bit subset and using escaping to represent other
characters is the safest and most conservative approach when one is
working in a mixed character encoding environment.
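A minimal sketch of that conservative approach in Python. The function name is hypothetical; the only library feature relied on is Python's standard `xmlcharrefreplace` codec error handler, which turns every character the ASCII codec cannot encode into a decimal numeric character reference. It assumes the markup-significant characters (`&`, `<`, `>`) have already been escaped separately.

```python
def to_ascii_with_ncrs(text: str) -> str:
    """Return text with every non-ASCII character replaced by a decimal
    numeric character reference such as &#233; (hypothetical helper)."""
    # 'xmlcharrefreplace' is a standard codec error handler: characters
    # that ASCII cannot represent become &#NNNN; references.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_ascii_with_ncrs("café €5"))  # caf&#233; &#8364;5
```

The output is pure 7-bit text, so it passes unchanged through any ASCII-compatible encoding or transport.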

One might be editing a document that is labelled as ISO 8859-15 while
working in a UTF-8 environment. Copying and pasting 8-bit characters
from that document into one's other documents (or the other way round)
will then not always work as intended.

Another example might be a Macintosh user: the default Mac character
encoding is not compatible with Unicode, so when an HTML document is
edited with Mac-native software, its non-ASCII characters are displayed
incorrectly.  HTML-specific tools are far from perfect in this area.
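Both failure modes can be reproduced in a few lines of Python, which may make the argument more concrete. This is an illustrative sketch using the standard `iso8859_15`, `mac_roman` and `utf-8` codecs and `html.unescape`; the sample strings are arbitrary. Raw 8-bit characters are mangled when the bytes are read under the wrong encoding, while the ASCII-plus-references form decodes identically under all three.

```python
import html

original = "café €5"

# Case 1: UTF-8 bytes opened by a tool that trusts an ISO 8859-15 label.
garbled_latin9 = original.encode("utf-8").decode("iso8859_15")

# Case 2: Mac Roman bytes opened by a tool that expects UTF-8;
# the undecodable byte becomes U+FFFD REPLACEMENT CHARACTER.
garbled_mac = "café".encode("mac_roman").decode("utf-8", errors="replace")

# The conservative alternative: plain ASCII plus numeric character references.
escaped = original.encode("ascii", errors="xmlcharrefreplace")

# ASCII bytes decode the same way under every ASCII-compatible encoding,
# so the references survive and resolve back to the original characters.
for enc in ("utf-8", "iso8859_15", "mac_roman"):
    assert html.unescape(escaped.decode(enc)) == original

print(repr(garbled_latin9))
print(repr(garbled_mac))
```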


>> [8] 6.5
>> 
>> Suggest documenting interaction with <span> and inline bidi 
>> text; RTL and LTR marks are evil because they are non-hierarchical and 
>> stateful.
> 
> We disagree and wonder whether you are thinking about the RLE and LRE (with
> PDF) characters.

In principle one could use additional span elements in XHTML to
avoid the need for LTR and RTL marks, but perhaps it's a case of
pragmatism versus elegance.  You are right, though: I was thinking of
RLE and LRE. Sorry.

> We agree that the latter shouldn't be used for the reason
> you mention. Perhaps we should make the distinction clearer and point to
> http://www.w3.org/TR/unicode-xml/#Bidi .  In fact, we could even add a
> technique saying 'don't use RLE/RLO/etc'.

I think either of those would be useful.
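If a "don't use RLE/RLO/etc." technique were added, a simple check for those characters might look like the sketch below. The helper name is hypothetical; it relies only on the standard `unicodedata.bidirectional` and `unicodedata.name` functions and flags the five stateful explicit formatting characters (LRE, RLE, LRO, RLO, PDF), leaving the LRM/RLM marks alone.

```python
import unicodedata

# Bidi classes of the stateful explicit formatting characters
# (U+202A..U+202E) that such a technique would discourage.
EMBEDDING_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF"}


def find_embedding_controls(text: str) -> list[tuple[int, str]]:
    """Return (index, character name) for each LRE/RLE/LRO/RLO/PDF in text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if unicodedata.bidirectional(ch) in EMBEDDING_CLASSES
    ]


sample = "abc \u202bשלום\u202c def"
for index, name in find_embedding_controls(sample):
    print(index, name)
```

In XHTML, a span carrying a `dir` attribute would be the hierarchical replacement for each flagged RLE/LRE...PDF pair.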

> For information about RTL and LTR see
> http://www.w3.org/International/articles/inline-bidi-markup/#where (you
> might want to read the whole article).

That article doesn't really make clear whether RTL (for example) affects
only the next character, or everything from here to the end of the
document, or from here to the end of the containing element (which is
how I read it). But on re-reading, maybe it applies only to the single
character before it.  Thanks for the reference.  I'm not an expert in
RTL/RLE, as you can tell, so I'll accept your decision on how (or
whether) to clarify!
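On the scope question: as we understand the Unicode Bidirectional Algorithm (UAX #9), the marks LRM (U+200E) and RLM (U+200F) have no scope at all. Each is an invisible character whose bidi class is simply L or R, so it acts like an unseen strong letter and mainly influences how adjacent neutral characters (spaces, punctuation) are resolved. The embeddings, by contrast, have their own classes and stay in effect until a matching PDF or the end of the paragraph. The character properties can be inspected with Python's standard `unicodedata` module; the `describe` helper is purely illustrative.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format a character's code point, name and bidi class (illustrative)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.bidirectional(ch)}"


for ch in ("\u200e", "\u200f", "\u202a", "\u202b", "\u202c"):
    print(describe(ch))
```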


The Hebrew text appears to be OK now.

Liam

-- 
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/

Received on Thursday, 6 May 2004 18:49:09 UTC