Re: Two new encoding related articles for review from 신정식 on 2014-03-24 (www-international@w3.org from January to March 2014)

From: 신정식 <jshin1987+w3@gmail.com>
Date: Sun, 23 Mar 2014 23:47:40 -0700
To: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Cc: Gunnar Bittersmann <gunnar@bittersmann.de>, "www-international@w3.org" <www-international@w3.org>
Message-ID: <CAE1ONj-8-EeVn9YQ3hb+vgDXB7-Z3ZBKq4sM9PK_G_Yw=Z2NWg@mail.gmail.com>
On Sun, Mar 23, 2014 at 10:16 PM, "Martin J. Dürst"
<duerst@it.aoyama.ac.jp> wrote:

> What Jungshik says. In addition, even encodings such as iso-8859-* can be
> understood in terms of the ISO-2022 framework/toolbox, so 'encodings based
> on ISO-2022' is really best avoided.
>

Yup. I meant to mention that too, but forgot while actually writing it.
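
Here is a rough Python sketch of the distinction, purely as an illustration and not something taken from the article under review. The helper name has_only_ascii_bytes is made up, and the codec names are the ones that ship with CPython (assumed to be available). It checks whether non-ASCII text is encoded entirely as bytes below 0x80, which is the property the "must not use" wording is worried about:

```python
def has_only_ascii_bytes(text: str, encoding: str) -> bool:
    """Return True if every byte of the encoded text is below 0x80."""
    return all(byte < 0x80 for byte in text.encode(encoding))


# (CPython codec name, non-ASCII sample text) pairs, chosen for illustration.
SAMPLES = [
    ("iso2022_jp", "日本"),  # stateful, escape-sequence based
    ("iso2022_kr", "한"),    # stateful, shift-in/shift-out based
    ("hz", "中"),            # stateful, uses ~{ ... ~} delimiters
    ("euc_jp", "日本"),      # non-ASCII characters use high bytes
    ("euc_kr", "한"),        # non-ASCII characters use high bytes
    ("latin_1", "é"),        # iso-8859-1, single high byte
]


def main() -> None:
    for encoding, text in SAMPLES:
        flag = has_only_ascii_bytes(text, encoding)
        print(f"{encoding:12} ASCII bytes only for non-ASCII text: {flag}")


if __name__ == "__main__":
    main()
```

For these samples the three stateful encodings report True, while EUC-JP, EUC-KR and iso-8859-1 report False, which is why naming ISO-2022-JP, ISO-2022-KR and ISO-2022-CN explicitly is more accurate than 'encodings based on ISO-2022'.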

Jungshik


> Regards,    Martin.
>
>
> On 2014/03/21 05:18, Jungshik SHIN (신정식) wrote:
>
>> Documents must not use JIS_C6226-1983, JIS_X0212-1990, HZ-GB-2312, JOHAB
>> (Windows code page 1361), encodings based on ISO-2022, or encodings based
>> on EBCDIC. This is because they allow ASCII code points to represent
>> non-ASCII characters, which poses a security threat.
>>
>> Well, JOHAB is ASCII-compatible (NOT that I would encourage anybody to use
>> it. Nobody has actually used it on the web except for testing, so it may
>> not be worth mentioning here. Whether it's mentioned or not, nobody will
>> use it). I don't know what an encoding called JIS X 0212-1990 would look
>> like (it's a coded character set that can be used in encodings such as
>> EUC-JP under its ISO-2022-based definition. Well, the new definition of
>> EUC-JP in the encoding standard does not allow it.).
>>
>> Moreover, 'encodings based on ISO-2022' include EUC-JP and EUC-KR (well, in
>> the new encoding standard, the latter is now synonymous with Windows-949
>> and is not ISO-2022-based any more) as well as ISO-2022-{KR,JP,CN} etc.
>> Obviously, the encodings in the former group are ASCII-compatible (with
>> the possible exception of \x5C).
>>
>> Therefore, to be precise (pedantic), 'encodings based on ISO-2022' has to
>> be replaced with 'ISO-2022-JP*, ISO-2022-KR, ISO-2022-CN*'.
>>
>> Jungshik
>>
>> On Thu, Mar 20, 2014 at 12:08 PM, Gunnar Bittersmann
>> <gunnar@bittersmann.de> wrote:
>>
>>> Richard Ishida scripsit (2014-03-17 17:29+01:00):
>>>
>>>   http://www.w3.org/International/questions/qa-choosing-encodings-new
>>>
>>>>
>>>>>
>>>>> However, I don’t think that the keywords should be marked up as
>>>>> <strong class="kw">
>>>>>
>>>>> Stick with code elements, or use span or b. Or for the character
>>>>> encodings, no markup at all, as before.
>>>>>
>>>>> (Don’t replace all occurrences of ‘strong’ with ‘code’, there’s a
>>>>> ‘strongly’ in the text.)
>>>>>
>>>>>
>>>> The idea was to make them stand out visually. I replaced strong with b.
>>>>
>>>>
>>> You were using ‘ASCII’, ‘UTF-8’, ‘UTF-16’ and ‘UTF-32’ with no special
>>> visual emphasis throughout the upper three quarters of the article. Why
>>> here?
>>>
>>> To my taste, it does not improve the readability of the text, quite the
>>> contrary.
>>>
>>> If you really want to make them stand out visually: There’s still ‘UTF-8’
>>> and ‘ISO-8859-8-i’ without that markup in one of these paragraphs. And in
>>> other articles, such keywords are marked up as <code class="kw"> and set
>>> in normal font weight. Here it’s <b class="kw">, in bold, which is
>>> inconsistent.
>>>
>>> My proposal is: Display encoding names as normal text, no markup.
>>>
>>> ‘replacement’ and ‘x-user-defined’ are good candidates for that keyword
>>> markup, though: not in bold, but in a normal monospaced font, i.e. use
>>> the code element.
>>>
>>>
>>>
>>>>> And shouldn’t this link to
>>>>> http://www.w3.org/International/questions/qa-visual-vs-logical#term_visualordering
>>>>> given that ‘logically ordered’ links to
>>>>> http://www.w3.org/International/questions/qa-visual-vs-logical#term_logicalordering
>>>>> ?
>>>>>
>>>> No. That's what I wanted.
>>>>
>>> To me it’s strange that ‘logically ordered’ (marked-up as "termref")
>>> points to the description of that term while ‘visual encoding’ (also
>>> marked-up as "termref") does not accordingly, but points to the whole
>>> article instead.
>>>
>>> I think the best phrase to use as link title for the whole article would
>>> be ‘should also be avoided’.
>>>
>>> The anchor links might be out of the scope of this article; most of the
>>> target audience of qa-choosing-encodings don’t have to deal with RTL
>>> scripts, and Hebrew in particular. And those who do will read the entire
>>> article qa-visual-vs-logical anyway.
>>>
>>> My proposal is: Link to that article just once, without fragment
>>> identifier:
>>>
>>> … (Hebrew visual encoding) <a
>>> href="/International/questions/qa-visual-vs-logical">should also be
>>> avoided</a>, in favour of an encoding that works with logically
>>> ordered text …
>>>
>>>
>>>
>>> »»
>>> that maps every octet to the Unicode code point
>>> ««
>>>
>>> This is the only time the term ‘octet’ is used in this article. Would
>>> the term be clear to the reader? Or would it be better to use ‘byte’ in
>>> this context (even though that might be less accurate)?
>>>
>>> Cheers,
>>> Gunnar
>>>
>>>
>>>
>>
Received on Monday, 24 March 2014 06:48:07 UTC