- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Thu, 21 Jul 2011 18:41:45 +0900
- To: Richard Ishida <ishida@w3.org>
- CC: "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Hello Richard,

On 2011/07/20 23:55, Richard Ishida wrote:

> 8.2.2.2 Character encodings
> http://www.w3.org/TR/html5/parsing.html#character-encodings-0
>
> "When a user agent is to use the UTF-16 encoding but no BOM has been
> found, user agents must default to UTF-16LE."
>
> If the HTTP header declares the file to be UTF-16BE, which I believe it
> can, and in which case a BOM should *not* be used, then I think that
> this would not be true.

This strictly depends on what "the UTF-16 encoding" means in the
sentence you cite. If it means "the encoding labeled as 'UTF-16'", then
this doesn't include encodings labeled UTF-16BE, and therefore there is
no problem. If "the UTF-16 encoding" means "any encoding that works like
UTF-16, independent of the label and other details", then you are right.

My impression from reading "8.2.2.2 Character encodings" is that it's
talking about the encoding labeled "UTF-16", but it might be helpful to
check and/or clarify.

UTF-16 is a very special case (UTF-32 has similar issues, but is much
less important in practice, in particular across the network), because
it's easy to mix up UTF-16 the general encoding method used for Unicode
with code units of 16 bits and 'UTF-16' the character encoding (charset)
label. (Also, in implementations, it's sometimes important to be able to
separately set "BOM/noBOM", "LE/BE", and the actual label, which is
difficult if a converter or output routine only takes a 'charset' label
as a parameter.)

> If the HTTP header declares the file to be
> UTF-16, then there must be a BOM, so I assume that this is a recovery
> mechanism if someone does declare UTF-16 in HTTP but omits the BOM. I'd
> think that some kind of error message would be in order though.

You want an error message like "missing BOM on UTF-16 page"? That's good
for a validator, but not for a browser.
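To make the distinction between the three labels concrete, here is a minimal Python sketch of the reading in which "the UTF-16 encoding" means only the encoding labeled 'UTF-16'. The function name decode_utf16 and the choice to leave a leading U+FEFF in place for the explicit BE/LE labels are assumptions made for illustration; this is not taken from the HTML5 draft and is not a description of what any browser actually does.

```python
import codecs


def decode_utf16(data: bytes, label: str) -> str:
    """Decode `data` according to a charset label (hypothetical helper).

    Only the plain 'UTF-16' label looks for a BOM and, if none is found,
    falls back to little-endian, as in the sentence quoted from 8.2.2.2.
    """
    label = label.strip().lower()

    # Explicit byte order in the label: no BOM sniffing, no fallback.
    if label == "utf-16be":
        return data.decode("utf-16-be")
    if label == "utf-16le":
        return data.decode("utf-16-le")

    if label == "utf-16":
        # A BOM, when present, decides the byte order and is dropped.
        if data.startswith(codecs.BOM_UTF16_BE):
            return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
        if data.startswith(codecs.BOM_UTF16_LE):
            return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
        # No BOM: default to UTF-16LE (the recovery rule under discussion).
        return data.decode("utf-16-le")

    raise ValueError(f"not a UTF-16 label: {label!r}")
```

Under this reading, BOM-less big-endian data labeled 'UTF-16BE' decodes correctly, while the same bytes labeled just 'UTF-16' would be misread as little-endian. Note that the label is handled separately from the BOM check, which is the kind of separation that is hard to get from a converter that only takes a 'charset' parameter.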
> 4.6.7 The q element
> http://www.w3.org/TR/html5/text-level-semantics.html#the-q-element
>
> The default stylesheet of browsers should render quotes differently
> according to the language of the text. It would be helpful to point this
> out in this section. It would also be helpful to clarify that the
> default stylesheet rendering can be overridden by a user stylesheet. It
> would be nice to have an example that illustrated this.
>
> It would also be useful to provide a few ready-made examples in section
> http://www.w3.org/TR/html5/rendering.html#punctuation-and-decorations,
> including styles for quotes within quotes, which are also done
> differently in non-English text.
>
> See http://www.w3.org/TR/CSS2/generate.html#quotes-specify for the CSS
> quotes property, which would be more appropriate for the rendering section.
>
> [I need to consider this last comment more carefully after reading the
> relevant CSS info. I'm leaving here just to remind me to do that.]

The story of <q> is really interesting. I think Francois was the one
proposing it, or at least the one proposing the language-dependent
quotes thing. Semantically, this was the right thing, but for about 15
years, implementations were hopelessly behind, to the extent that I
thought we'd have to give up on the quotes (adding them in the text
wouldn't be that big of a problem; people are used to adding
./;/:/!/?/...). Apparently, lately browsers have finally caught up, and
it looks like this is going to work out.

As for the default stylesheet, it would be great to have lots of
languages specified, but it'll be a lot of work, and no end.

In any case, please make sure that the quotes are added based on the
language outside of the quotation, not the language of the quotation
itself. As an example,

    <p lang='fr'>Il dit <q lang='en'>Hello everybody!</q>.</p>

should be rendered something like

    Il dit «Hello everybody!».

and

    <p lang='en'>He said <q lang='fr'>Bonjour tout le monde!</q>.</p>

should be rendered something like

    He said "Bonjour tout le monde!".

But if you look at
http://en.wikipedia.org/wiki/Non-English_usage_of_quotation_marks, you
see that's only the start; there are issues with spacing, with
multi-line quotes, and so on.

Regards,    Martin.
Received on Thursday, 21 July 2011 09:43:07 UTC