RE: Rephrasing of Document Character Set article from Martin Duerst on 2008-06-10 (public-i18n-core@w3.org from April to June 2008)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Tue, 10 Jun 2008 13:03:04 +0900
To: "Phillips, Addison" <addison@amazon.com>, Richard Ishida <ishida@w3.org>
Cc: "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-Id: <6.0.0.20.2.20080610125339.0586c7c0@localhost>

I agree that this sentence should be improved. However,
I'm not completely happy with the proposals below.

At 03:31 08/06/10, Phillips, Addison wrote:
>I think we do need to rephrase it. The first sentence, while entirely 
>correct, is nearly impenetrable. The following clarification should, thus, 
>be a lot clearer.
>
>I would suggest:
>
>--
>This means that XML or HTML documents are always processed as a sequence of 
>characters from the Unicode character set.
>--

This may not always be true. It is perfectly fine to have an
XML parser that works internally in US-ASCII when it handles
US-ASCII documents, and similarly for other encodings. That may
not be a good idea in terms of implementation, but it wouldn't
violate the XML Recommendation.
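
To illustrate the "as if" point, here is a minimal sketch in Python. It
assumes the standard library's xml.etree.ElementTree as the parser; the
sample documents and the helper name text_of are made up for illustration.
It shows only the observable result: a US-ASCII document using a numeric
character reference and a UTF-8 document using the literal character yield
the same sequence of Unicode characters. It says nothing about what a
parser does internally, which is the freedom the XML Recommendation leaves
open.

```python
import xml.etree.ElementTree as ET

# The same logical document, serialized in two different encodings.
# U+00FC is written as a numeric character reference in the US-ASCII
# version and as a literal character in the UTF-8 version.
# (Both sample documents are hypothetical, for illustration only.)
ascii_doc = b'<?xml version="1.0" encoding="US-ASCII"?><name>D&#xFC;rst</name>'
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><name>D\u00fcrst</name>'
).encode("utf-8")


def text_of(doc: bytes) -> str:
    """Parse the bytes and return the root element's text as a Python str."""
    return ET.fromstring(doc).text


ascii_text = text_of(ascii_doc)
utf8_text = text_of(utf8_doc)

# Compare the results as sequences of Unicode code points.
print([f"U+{ord(c):04X}" for c in ascii_text])
print(ascii_text == utf8_text)
```

Both documents come out as the same five characters (D, U+00FC, r, s, t),
whatever bytes the parser handled on the way there.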



>> -----Original Message-----
>> From: public-i18n-core-request@w3.org [mailto:public-i18n-core-
>> request@w3.org] On Behalf Of Richard Ishida
>> Sent: Monday, June 09, 2008 7:21 AM
>> To: 'Martin Duerst'
>> Cc: public-i18n-core@w3.org
>> Subject: Rephrasing of Document Character Set article
>>
>>
>> Hi Martin, others,
>>
>> In our article FAQ: Document character set
>> (http://www.w3.org/International/questions/qa-doc-charset) we have
>> a key sentence that says:
>>
>> "This means that the logical model describing how XML and HTML are
>> processed is described in terms of the Unicode character set."

The double use of 'describe' is definitely a problem.


>> A German translator just suggested that we work on making that
>> clearer, and I tend to agree that it might help.  I thought of a
>> minimal rewording as follows:
>>
>> "This means that when XML and HTML are processed, they use a model
>> that describes characters in terms of the Unicode character set."

'they' is unclear. Grammatically, it refers to XML and HTML, but
it is the processing that uses the model, not XML and HTML themselves.


Some ideas:

"This means that the logical model of how XML and HTML are
processed is described in terms of the Unicode character set."

"This means that the logical model of how XML and HTML are
processed is based on the Unicode character set."

"This means that the logical model of how XML and HTML are
processed uses the Unicode character set."

"This means that the logical model of how XML and HTML are
processed makes use of the Unicode character set."

"This means that when XML and HTML are processed, this is done
using a model based on the Unicode character set."

"This means that XML or HTML documents are always processed as,
or as if they were, a sequence of characters from the Unicode character set."

"This means that XML or HTML documents are logically processed
as a sequence of characters from the Unicode character set."


Hope this helps.     Regards,    Martin.

>> I'd welcome any further suggestions for improvement.
>>
>> Cheers,
>> RI
>>
>> ============
>> Richard Ishida
>> Internationalization Lead
>> W3C (World Wide Web Consortium)
>>
>> http://www.w3.org/International/
>> http://rishida.net/blog/
>> http://rishida.net/


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp

Received on Tuesday, 10 June 2008 06:33:35 UTC