Re: [www-font] WOFF metadata - should we require (rather than recommend) the use of UTF-8?

Martin J. Dürst wrote:
> Just for the record, some comments below.
> 
> On 2011/06/01 13:13, Robert O'Callahan wrote:
>> On Wed, Jun 1, 2011 at 4:09 PM, Robert O'Callahan <robert@ocallahan.org> wrote:
>>
>>> On Wed, Jun 1, 2011 at 4:03 PM, <mpsuzuki@hiroshima-u.ac.jp> wrote:
>>>
>>>> "Anything" is too broad to understand... Excuse me,
>>>> could you give me a concrete example of the system
>>>> or usecase that consumes WOFF but has some difficulty
>>>> to handle an XML in UTF-16?
> 
> The requirement to support UTF-16 in XML in addition to UTF-8 was added 
> because there was a concern that otherwise, e.g. Japanese data might 
> expand considerably. That concern turned out to be mostly non-justified, 
> because in the arbitrary XML, there is a fair percentage of ASCII 
> characters. The compatibility with US-ASCII has led to UTF-8 being way, 
> way more popular on the Web than UTF-16. Various XML applications as 
> well as non-XML formats have switched to UTF-8 only. The reduction from 
> two encodings to one is very significant for interoperability, to the 
> extent that in the networking/protocol area, there is a saying 
> "zero-one-many".
> 
>>> Jonathan already gave an example in his first message.
>>
>>
>> Hmm, maybe that example wasn't clear enough.
>>
>> What Jonathan is actually doing is creating a Javascript API that returns a
>> string containing the WOFF metadata. So that code isn't going to be parsing
>> the XML, but it does need to know the encoding so the text can be correctly
>> converted to a Javascript string.
> 
> That is a good example, with one twist: Strings in Javascript happen to 
> be UTF-16. But that's all under the hood, nothing to worry about.
> 
> 
>> Any consumer that needs to convert the WOFF metadata to some kind of string
>> (and isn't immediately parsing the XML) will have the same problem.
> 
> Yes. Saying UTF-8 and only UTF-8 is a good solution.

Umm, I'm not discussing which is better, UTF-8 or UTF-16.
My question was whether a WOFF-specific restriction on
the XML in WOFF is really needed.

My current understanding is that the XML in WOFF is not consumed
only by XML parsers, and that some frameworks for web-related
technology do not support UTF-16 as a text encoding, so excluding
UTF-16 is safer. (Is JavaScript one of them, or not?)

"Writing an XML document for a system without an XML parser, or
for a human reader" is a slightly confusing task, but I think WOFF
is primarily designed for web technology, not for pure XML, so an
extra restriction that comes from a non-XML issue would be acceptable.
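
To make the non-XML case concrete, here is a minimal sketch in Python
of a consumer that turns the (already decompressed) metadata bytes
into a string without running an XML parser. The function name
decode_metadata, the utf8_only flag and the sample document are
hypothetical, made up only for illustration. The sketch also ignores
the XML declaration's encoding attribute, which a real consumer
accepting both encodings might still have to check.

```python
import codecs


def decode_metadata(raw: bytes, utf8_only: bool = True) -> str:
    """Convert decompressed WOFF metadata bytes to a string, no XML parsing.

    Hypothetical helper: the name and the utf8_only flag are assumptions.
    """
    if utf8_only:
        # One fixed encoding: a single call, no sniffing needed.
        # "utf-8-sig" also drops an optional UTF-8 byte order mark.
        return raw.decode("utf-8-sig")

    # Both encodings allowed: the consumer has to look at the bytes first.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM, picks the byte order, strips it.
        return raw.decode("utf-16")
    return raw.decode("utf-8-sig")


# Illustrative sample only, not taken from any real font.
sample = '<metadata version="1.0"><vendor name="日本語フォント"/></metadata>'

as_utf8 = sample.encode("utf-8")
as_utf16 = sample.encode("utf-16")  # Python prepends a BOM here

print(decode_metadata(as_utf8) == sample)
print(decode_metadata(as_utf16, utf8_only=False) == sample)
```

With UTF-8 only, the second branch disappears and every consumer
makes the same single decode call. A UTF-16 document passed to the
UTF-8-only path raises UnicodeDecodeError instead of silently
producing garbage.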

Regards,
mpsuzuki

Received on Wednesday, 1 June 2011 06:24:14 UTC