Re: Two new encoding related articles for review from Martin J. Dürst on 2014-03-24 (www-international@w3.org from January to March 2014)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Mon, 24 Mar 2014 14:16:34 +0900
To: "Jungshik SHIN (신정식)" <jshin1987+w3@gmail.com>, Gunnar Bittersmann <gunnar@bittersmann.de>
CC: "www-international@w3.org" <www-international@w3.org>
Message-ID: <532FBFB2.5030900@it.aoyama.ac.jp>
What Jungshik says. In addition, even encodings such as iso-8859-* can 
be understood in terms of the ISO-2022 framework/toolbox, so 'encodings 
based on ISO-2022' is really best avoided.

Regards,    Martin.
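
A small sketch of the distinction drawn in this thread, assuming Python 3 and its bundled codecs (iso2022_jp, iso2022_kr, hz, euc_jp, euc_kr), which may differ in detail from the definitions in the Encoding Standard. The helper name uses_only_ascii_bytes and the sample strings are illustrative only. It checks whether an encoding spells non-ASCII text using nothing but bytes in the ASCII range, which is the property the quoted sentence treats as a security concern:

```python
# Sample non-ASCII text for each codec (hypothetical test strings).
SAMPLES = {
    "iso2022_jp": "日本語",
    "euc_jp": "日本語",
    "iso2022_kr": "한국어",
    "euc_kr": "한국어",
    "hz": "中文",
}


def uses_only_ascii_bytes(text: str, codec: str) -> bool:
    """Return True if every byte of the encoded text is below 0x80."""
    return all(byte < 0x80 for byte in text.encode(codec))


for codec, text in SAMPLES.items():
    # Escape-sequence encodings (ISO-2022-*, HZ) are expected to report True;
    # the EUC encodings use bytes >= 0x80 for these characters.
    print(codec, uses_only_ascii_bytes(text, codec))
```

With these codecs, the ISO-2022-JP, ISO-2022-KR and HZ samples come out as pure ASCII-range bytes, while the EUC-JP and EUC-KR samples do not, which is why lumping the EUC encodings under 'encodings based on ISO-2022' is misleading for this purpose.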

On 2014/03/21 05:18, Jungshik SHIN (신정식) wrote:
> Documents must not use JIS_C6226-1983, JIS_X0212-1990, HZ-GB-2312, JOHAB
> (Windows code page 1361), encodings based on ISO-2022, or encodings based
> on EBCDIC. This is because they allow ASCII code points to represent
> non-ASCII characters, which poses a security threat.
>
> Well, JOHAB is ASCII-compatible (NOT that I would encourage anybody to use
> it. Nobody has actually used it on the web except for testing, so it may
> not be worth mentioning here. Whether it's mentioned or not, nobody
> will use it). I don't know what kind of encoding JIS X 0212-1990 is meant
> to be (it's a coded character set that can be used within an encoding such
> as EUC-JP under its ISO-2022-based definition. Well, the new definition of
> EUC-JP in the encoding standard does not allow it.).
>
> Moreover, 'encodings based on ISO-2022' include EUC-JP, EUC-KR (well, in
> the new encoding standard, it's now synonymous with Windows-949 and it's
> not ISO-2022-based any more) as well as ISO-2022-{KR,JP,CN} etc. Obviously,
> the encodings in the former group are ASCII-compatible (with the possible
> exception of \x5C).
>
> Therefore, to be precise (pedantic), 'encodings based on ISO-2022' has to
> be replaced with 'ISO-2022-JP*, ISO-2022-KR, ISO-2022-CN*'.
>
> Jungshik
>
> On Thu, Mar 20, 2014 at 12:08 PM, Gunnar Bittersmann
> <gunnar@bittersmann.de> wrote:
>
>> Richard Ishida scripsit (2014-03-17 17:29+01:00):
>>
>>   http://www.w3.org/International/questions/qa-choosing-encodings-new
>>>>
>>>>
>>>> However, I don’t think that the keywords should be marked-up as <strong
>>>> class="kw">
>>>>
>>>> Stick with code elements, or use span or b. Or for the character
>>>> encodings, no markup at all, as before.
>>>>
>>>> (Don’t replace all occurrences of ‘strong’ with ‘code’, there’s a
>>>> ‘strongly’ in the text.)
>>>>
>>>
>>> The idea was to make them stand out visually. I replaced strong with b.
>>>
>>
>> You were using ‘ASCII’, ‘UTF-8’, ‘UTF-16’ and ‘UTF-32’ with no special
>> visual emphasis throughout the upper three quarters of the article. Why
>> here?
>>
>> To my taste, it does not improve the readability of the text, quite the
>> contrary.
>>
>> If you really want to make them stand out visually: There’s still ‘UTF-8’
>> and ‘ISO-8859-8-i’ without that markup in one of these paragraphs. And in
>> other articles, such keywords are marked-up as <code class="kw"> and set in
>> normal font weight. Here it’s <b class="kw">, bold font, inconsistently.
>>
>> My proposal is: Display encoding names as normal text, no markup.
>>
>> ‘replacement’ and ‘x-user-defined’ are good candidates for that keyword
>> markup, though. But not in bold, but in normal monospaced font, i.e. use
>> the code element.
>>
>>
>>
>>   And shouldn’t this link to
>>>> http://www.w3.org/International/questions/qa-visual-vs-logical#term_visualordering
>>>>
>>>> given that ‘logically ordered’ links to
>>>> http://www.w3.org/International/questions/qa-visual-vs-logical#term_logicalordering
>>>>
>>>> ?
>>>>
>>> No. That's what I wanted.
>>>
>>
>> To me it’s strange that ‘logically ordered’ (marked-up as "termref")
>> points to the description of that term while ‘visual encoding’ (also
>> marked-up as "termref") does not accordingly, but points to the whole
>> article instead.
>>
>> I think the best phrase to use as link title for the whole article would
>> be ‘should also be avoided’.
>>
>> The anchor links might be out of the scope of this article; most of the
>> target audience of qa-choosing-encodings don’t have to deal with RTL
>> scripts, and Hebrew in particular. And those who do will read the entire
>> article qa-visual-vs-logical anyway.
>>
>> My proposal is: Link to that article just once, without fragment
>> identifier:
>>
>> … (Hebrew visual encoding) <a href="/International/questions/qa-visual-vs-logical">should
>> also be avoided</a>, in favour of an encoding that works with logically
>> ordered text …
>>
>>
>>
>> »»
>> that maps every octet to the Unicode code point
>> ««
>>
>> This is the only time when the term ‘octet’ is used in this article. Would
>> the term be clear to the reader? Or would it be better to use ‘byte’ in
>> this context (even though that might be less accurate)?
>>
>> Cheers,
>> Gunnar
>>
>>
>
Received on Monday, 24 March 2014 05:17:11 UTC