Re: Two new encoding related articles for review from 신정식 on 2014-03-20 (www-international@w3.org from January to March 2014)

From: 신정식 <jshin1987+w3@gmail.com>
Date: Thu, 20 Mar 2014 13:18:09 -0700
To: Gunnar Bittersmann <gunnar@bittersmann.de>
Cc: "www-international@w3.org" <www-international@w3.org>
Message-ID: <CAE1ONj9p9aaD0ihnicpCyYbNSbwA+EFPHvfthO0NnEkaqPz7XA@mail.gmail.com>
Documents must not use JIS_C6226-1983, JIS_X0212-1990, HZ-GB-2312, JOHAB
(Windows code page 1361), encodings based on ISO-2022, or encodings based
on EBCDIC. This is because they allow ASCII code points to represent
non-ASCII characters, which poses a security threat.

Well, JOHAB is ASCII-compatible. (NOT that I would encourage anybody to use
it. Nobody has actually used it on the web except for testing, so it may
not be worth mentioning here. Whether it's mentioned or not, nobody
will use it.) I don't know what encoding JIS X 0212-1990 is meant to be (it's
a coded character set that can be used in an encoding like EUC-JP, under its
ISO-2022-based definition. Well, the new definition of EUC-JP in the encoding
standard does not allow it.).

Moreover, 'encodings based on ISO-2022' include EUC-JP and EUC-KR (well, in
the new encoding standard, EUC-KR is now synonymous with Windows-949 and is
not ISO-2022-based any more) as well as ISO-2022-{KR,JP,CN} etc. Obviously,
the encodings in the former group are ASCII-compatible (with the possible
exception of \x5C).

Therefore, to be precise (pedantic), 'encodings based on ISO-2022' has to
be replaced with 'ISO-2022-JP*, ISO-2022-KR, ISO-2022-CN*'.
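
To illustrate the distinction, here is a minimal sketch in Python. It assumes
that CPython's built-in codecs (iso2022_kr, iso2022_jp, euc_kr, euc_jp) are
close enough to the encodings discussed here; byte_profile is a hypothetical
helper name, not part of any library. It checks whether non-ASCII text ends up
encoded with bytes from the ASCII range.

```python
def byte_profile(text: str, encoding: str) -> str:
    """Classify the bytes produced when `text` is encoded with `encoding`.

    Returns "7-bit" if every byte is below 0x80, "8-bit" if every byte is
    0x80 or above, and "mixed" otherwise.
    """
    data = text.encode(encoding)
    if all(b < 0x80 for b in data):
        return "7-bit"
    if all(b >= 0x80 for b in data):
        return "8-bit"
    return "mixed"


# Sample non-ASCII strings (illustrative choices, not from the article).
SAMPLES = [
    ("한글", "iso2022_kr"),
    ("한글", "euc_kr"),
    ("日本語", "iso2022_jp"),
    ("日本語", "euc_jp"),
]

if __name__ == "__main__":
    for text, encoding in SAMPLES:
        # "7-bit" means ASCII-range bytes are standing in for non-ASCII text.
        print(f"{encoding:12} {byte_profile(text, encoding)}")
```

The ISO-2022-KR and ISO-2022-JP samples should come out as 7-bit (escape
sequences plus bytes in the ASCII range), while the EUC samples use only bytes
at or above 0x80. That is the difference the narrower wording is meant to
capture.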

Jungshik

On Thu, Mar 20, 2014 at 12:08 PM, Gunnar Bittersmann
<gunnar@bittersmann.de> wrote:

> Richard Ishida scripsit (2014-03-17 17:29+01:00):
>
>>> http://www.w3.org/International/questions/qa-choosing-encodings-new
>>>
>>>
>>> However, I don’t think that the keywords should be marked-up as
>>> <strong class="kw">
>>>
>>> Stick with code elements, or use span or b. Or for the character
>>> encodings, no markup at all, as before.
>>>
>>> (Don’t replace all occurrences of ‘strong’ with ‘code’, there’s a
>>> ‘strongly’ in the text.)
>>>
>>
>> The idea was to make them stand out visually. I replaced strong with b.
>>
>
> You were using ‘ASCII’, ‘UTF-8’, ‘UTF-16’ and ‘UTF-32’ with no special
> visual emphasis throughout the upper three quarters of the article. Why
> here?
>
> To my taste, it does not improve the readability of the text, quite the
> contrary.
>
> If you really want to make them stand out visually: There’s still ‘UTF-8’
> and ‘ISO-8859-8-i’ without that markup in one of these paragraphs. And in
> other articles, such keywords are marked-up as <code class="kw"> and set in
> normal font weight. Here it’s <b class="kw">, bold font, inconsistently.
>
> My proposal is: Display encoding names as normal text, no markup.
>
> ‘replacement’ and ‘x-user-defined’ are good candidates for that keyword
> markup, though. But not in bold, but in normal monospaced font, i.e. use
> the code element.
>
>
>
>>> And shouldn’t this link to
>>> http://www.w3.org/International/questions/qa-visual-vs-logical#term_visualordering
>>>
>>> given that ‘logically ordered’ links to
>>> http://www.w3.org/International/questions/qa-visual-vs-logical#term_logicalordering
>>>
>>> ?
>>>
>> No. That's what I wanted.
>>
>
> To me it’s strange that ‘logically ordered’ (marked-up as "termref")
> points to the description of that term while ‘visual encoding’ (also
> marked-up as "termref") does not accordingly, but points to the whole
> article instead.
>
> I think the best phrase to use as link title for the whole article would
> be ‘should also be avoided’.
>
> The anchor links might be out of the scope of this article; most of the
> target audience of qa-choosing-encodings don’t have to deal with RTL
> scripts, and Hebrew in particular. And those who do will read the entire
> article qa-visual-vs-logical anyway.
>
> My proposal is: Link to that article just once, without fragment
> identifier:
>
> … (Hebrew visual encoding) <a href="/International/questions/qa-visual-vs-logical">should
> also be avoided</a>, in favour of an encoding that works with logically
> ordered text …
>
>
>
> »»
> that maps every octet to the Unicode code point
> ««
>
> This is the only time when the term ‘octet’ is used in this article. Would
> the term be clear to the reader? Or would it be better to use ‘byte’ in
> this context (even though that might be less accurate)?
>
> Cheers,
> Gunnar
>
>
Received on Thursday, 20 March 2014 20:18:37 UTC