Review of tracker issues for best practices (Part IV)

From: Phillips, Addison <addison@lab126.com>
Date: Tue, 31 Mar 2015 16:13:47 +0000
To: "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <7C0AF84C6D560544A17DDDEB68A9DFB52ECF717B@ex10-mbx-9007.ant.amazon.com>

Continuing from where I left off...


BP: When matching strings (such as attribute values), use case-sensitive matching
BP: When matching strings case-insensitively, use canonical caseless matching, not compatibility caseless matching
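
To make the distinction between the two BPs above concrete, here is a minimal Python sketch. It assumes the Unicode Standard's default definitions of canonical and compatibility caseless matching and uses Python's built-in str.casefold() as the full case folding step. The function names are illustrative only and do not come from any spec.

```python
import unicodedata


def nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)


def nfkd(s: str) -> str:
    return unicodedata.normalize("NFKD", s)


def case_sensitive_match(a: str, b: str) -> bool:
    # Plain code point comparison: no folding, no normalization.
    return a == b


def canonical_caseless_key(s: str) -> str:
    # NFD(toCasefold(NFD(X)))
    return nfd(nfd(s).casefold())


def compatibility_caseless_key(s: str) -> str:
    # NFKD(toCasefold(NFKD(toCasefold(NFD(X)))))
    return nfkd(nfkd(nfd(s).casefold()).casefold())


def canonical_caseless_match(a: str, b: str) -> bool:
    return canonical_caseless_key(a) == canonical_caseless_key(b)


def compatibility_caseless_match(a: str, b: str) -> bool:
    return compatibility_caseless_key(a) == compatibility_caseless_key(b)
```

With these definitions, U+00C9 and "e" followed by U+0301 match canonically but not case-sensitively, while FULLWIDTH LATIN CAPITAL LETTER A (U+FF21) matches "a" only under compatibility caseless matching, which is the kind of over-matching the second BP is trying to avoid.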


BP: When processing query strings in URIs and applying the character encoding of the host document, supply a health warning about the handling of characters not in that character encoding.
// probably more BPs here related to encoding in query
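
As an illustration of why the health warning is needed, the sketch below percent-encodes a query value using the host document's character encoding. It assumes one particular fallback for unencodable characters (a decimal numeric character reference, which is then itself percent-encoded), similar to what browsers do; a spec might choose differently. The helper name is hypothetical, and "latin-1" here is Python's strict ISO 8859-1 codec, not the windows-1252 superset browsers use for that label.

```python
from urllib.parse import quote


def encode_query_value(value: str, document_encoding: str) -> str:
    # Characters the document encoding cannot represent are first turned
    # into "&#NNNN;" and only then percent-encoded, so the receiver
    # cannot tell them apart from a literal "&#NNNN;" typed by the user.
    return quote(
        value,
        safe="",
        encoding=document_encoding,
        errors="xmlcharrefreplace",
    )
```

For example, the euro sign in a UTF-8 document becomes %E2%82%AC, but in a Latin-1 document it becomes %26%238364%3B, which is exactly the sort of silent data change the warning should call out.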


BP: when specifying the handling of unencodable sequences, specify that the character encoding's replacement character should be used. For example, in Unicode this is U+FFFD. In Latin-1 this is ? (0x3F). In some other encodings it may be another character.
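
A small Python sketch of the behavior described above, using the standard codecs' errors="replace" handler. The two wrapper names are illustrative only.

```python
def encode_with_replacement(text: str, encoding: str) -> bytes:
    # Characters the target encoding cannot represent become that
    # codec's replacement character ("?" for Latin-1).
    return text.encode(encoding, errors="replace")


def decode_with_replacement(data: bytes, encoding: str) -> str:
    # Byte sequences that are invalid in the source encoding become
    # U+FFFD REPLACEMENT CHARACTER in the resulting Unicode string.
    return data.decode(encoding, errors="replace")
```

Encoding "café €" to Latin-1 this way yields the bytes for "café ?", and decoding the invalid UTF-8 byte 0xFF yields U+FFFD.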


BP: When referring to character encodings, do not use the term "encoding" on its own, as this can be confused with other types of encoding, such as transfer encoding.


// this issue is about the date/time format used in html lastModified being MM/DD/YYYY hh:mm:ss


BP: APIs should provide a way to obtain the direction of element, attribute, or text values
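
One possible shape for such an API, sketched in Python. This is an assumption for illustration, not a proposal from the tracker: the class and method names are hypothetical, and the fallback used when no direction was declared is a simple first-strong-character heuristic based on the Unicode Bidi_Class property.

```python
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    # Return the direction implied by the first strongly directional
    # character, or None if the text has no strong characters.
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


class TextValue:
    """A text value that carries its (optional) declared direction."""

    def __init__(self, text: str, direction: Optional[str] = None):
        self.text = text
        self.direction = direction  # "ltr", "rtl", or None if undeclared

    def resolved_direction(self, default: str = "ltr") -> str:
        # Declared metadata wins; the heuristic is only a fallback.
        return self.direction or first_strong_direction(self.text) or default
```

The point of the design is that callers can always ask for the direction, and that an explicitly declared value is never overridden by guessing.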


BP: when the language of a text value is set to the empty string, this should be taken to mean that the language of the value is explicitly unknown (??)
BP: when the language of a text value is explicitly unknown, implementations should be permitted to apply appropriate processing


BP: when describing bidirectional processing, an informative or normative reference to UAX9 (UBA) should be provided


BP: for markup grammars that provide their own directionality, Unicode bidirectional controls should be restricted so that any embedding or overrides generated by these characters do not start and end with different parent elements, and so that all such embeddings and overrides are treated as if the character U+202C were inserted at that point.
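
The restriction above can be sketched as a per-element cleanup step. The Python below is a minimal illustration under two assumptions of mine: embeddings or overrides still open at the end of an element's text are closed by appending U+202C there, and a stray U+202C that would close something opened in a parent element is dropped. It deliberately ignores the isolate controls (U+2066 to U+2069).

```python
LRE, RLE, PDF, LRO, RLO = "\u202A", "\u202B", "\u202C", "\u202D", "\u202E"
OPENERS = {LRE, RLE, LRO, RLO}


def balance_bidi_controls(element_text: str) -> str:
    depth = 0  # number of embeddings/overrides currently open
    out = []
    for ch in element_text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                # Nothing opened in this element: skip the stray PDF
                # rather than let it reach into the parent.
                continue
            depth -= 1
        out.append(ch)
    # Close whatever is still open, as if U+202C were inserted here.
    out.append(PDF * depth)
    return "".join(out)
```

Applied to each element's text independently, this keeps the effect of any embedding or override inside the element where it started.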


BP: when a markup grammar includes bidirectional markup, the document's bidirectional markup (such as HTML's @dir) should be normatively preferred over Unicode bidi control characters, so that document authors do not need to control directionality manually.


BP: for document formats that allow for a document wide language declaration, there should be a health warning that this information should be supplied. (??)
BP: ditto for direction


// do not override Content-Language ?


BP: when defining a document format that can use legacy character encodings, a file-internal character encoding declaration must be supplied
BP: define UTF-8 as the preferred encoding for any document format
BP: for new document formats, define UTF-8 as the only accepted character encoding
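
For a format that accepts only UTF-8, a consumer can be as simple as the sketch below. It assumes that a leading byte order mark is tolerated and stripped, and that anything that is not valid UTF-8 is a hard error. Both are choices a spec would need to state, and the function name is hypothetical.

```python
def read_utf8_document(data: bytes) -> str:
    # "utf-8-sig" removes an optional leading BOM; the default strict
    # error handling raises UnicodeDecodeError on invalid input.
    return data.decode("utf-8-sig")
```

With a single accepted encoding there is nothing to sniff or declare, which is much of the appeal of the third BP.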


// <pre> handling for newlines for Unicode bidi and references to the CSS. Needs more investigation


BP: when defining list styles, refer explicitly to CSS and to our WG note
// note: should we reexamine the WONTFIX on this issue? Potentially some text should be included even if the attribute values don't change


BP: the internal storage format for numeric values should be locale-neutral even if the display or input of the value is not.
BP: numeric values should be formatted for display and accepted for input from users in a locale-sensitive manner
BP: the locale used for display/formatting of numbers should be controllable by the content author using language attributes
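
The three numeric BPs above fit together roughly as in this Python sketch: the stored value is a locale-neutral Decimal, and only formatting and parsing look at the language tag. The separator table is hand-written and hypothetical, covering three tags purely for illustration; real code would take this data from CLDR via a library.

```python
from decimal import Decimal

# Hypothetical table: language tag -> (grouping separator, decimal separator)
SEPARATORS = {
    "en-US": (",", "."),
    "de-DE": (".", ","),
    "fr-FR": ("\u202f", ","),
}


def format_number(value: Decimal, lang: str) -> str:
    group, decimal_sep = SEPARATORS[lang]
    neutral = f"{value:,f}"  # e.g. "1,234,567.89"
    # translate() swaps both separators in one pass, so they cannot clash.
    return neutral.translate(str.maketrans({",": group, ".": decimal_sep}))


def parse_number(text: str, lang: str) -> Decimal:
    group, decimal_sep = SEPARATORS[lang]
    neutral = text.replace(group, "").replace(decimal_sep, ".")
    return Decimal(neutral)
```

Here the lang argument stands in for the language attribute set by the content author: the same stored Decimal("1234567.89") displays as 1,234,567.89 under en-US and as 1.234.567,89 under de-DE, and user input in either form parses back to the same stored value.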


// multilingual <q> nesting again, someone should spend time to extract what we've learned


// complex ruby support

Addison Phillips
Globalization Architect (Amazon Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.

Received on Tuesday, 31 March 2015 16:14:26 UTC