
Review of tracker issues for best practices

From: Phillips, Addison <addison@lab126.com>
Date: Wed, 25 Mar 2015 21:50:43 +0000
To: "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <7C0AF84C6D560544A17DDDEB68A9DFB52ECD7DDE@ex10-mbx-9007.ant.amazon.com>

I have started to review all of the issues in our tracker for potential best practices ("BP") items to include in our guidelines for specification writers. The BP text here is not a fully fleshed-out proposal per se. Issues not listed here did not, in my view, contain anything actionable. The last current issue number in the modern tracker is 436, so I've completed the first 10% of the survey.



Here are results so far:


BP: do not create your own lang attribute for XML formats. Use existing @lang.
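
As a rough illustration of why reusing the existing attribute helps, here is a minimal sketch showing that generic tooling already understands it. It assumes that "@lang" means xml:lang for an XML format; the element names (font, vendor, note, license) are made up for the example.

```python
import xml.etree.ElementTree as ET

# Expanded name that ElementTree uses for the predeclared xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def languages_in_scope(element, inherited=""):
    """Yield (tag, language) pairs, applying xml:lang inheritance."""
    lang = element.get(XML_LANG, inherited)
    yield element.tag, lang
    for child in element:
        yield from languages_in_scope(child, lang)


# Hypothetical document; only xml:lang itself comes from the XML spec.
doc = ET.fromstring(
    '<font><vendor xml:lang="ja">日本フォント</vendor>'
    '<note xml:lang="en"><em>sample</em></note><license/></font>'
)
result = list(languages_in_scope(doc))
```

A format that invents its own language attribute would need custom code for the same inheritance walk.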


BP: reference BCP 47 for language tags
BP: reference BCP 47 for language tag matching
BP: be specific about the form of language tags you expect. The word "valid" has special meaning in BCP 47. Generally "well-formed" is a better choice. 
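
To make the "well-formed" versus "valid" distinction and the matching reference more concrete, here is a minimal sketch. It assumes a simplified form of the BCP 47 langtag grammar (grandfathered tags are deliberately omitted) and the basic filtering scheme from the matching part of BCP 47. The function names are illustrative only. Checking validity would additionally require the IANA subtag registry, which this sketch does not consult.

```python
import re

# Simplified BCP 47 "langtag" grammar.
# Assumption: grandfathered tags such as "i-klingon" are left out.
_WELL_FORMED = re.compile(
    r"""
    (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4,8})   # language, optional extlang
    (?:-[a-z]{4})?                                # script
    (?:-(?:[a-z]{2}|[0-9]{3}))?                   # region
    (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*      # variants
    (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*          # extensions
    (?:-x(?:-[a-z0-9]{1,8})+)?                    # private use
    |
    x(?:-[a-z0-9]{1,8})+                          # private-use-only tag
    """,
    re.VERBOSE | re.IGNORECASE | re.ASCII,
)


def is_well_formed(tag):
    """True if the tag matches the grammar; says nothing about validity."""
    return _WELL_FORMED.fullmatch(tag) is not None


def basic_filter(language_range, tags):
    """Basic filtering: a range matches a tag that equals it or that
    starts with the range followed by a hyphen (case-insensitive)."""
    rng = language_range.lower()
    if rng == "*":
        return list(tags)
    return [
        t for t in tags
        if t.lower() == rng or t.lower().startswith(rng + "-")
    ]
```

For instance, is_well_formed("en-Abcd") is true because the tag has the right shape, even though a validity check against the registry would be a separate step.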


BP: In the schema description, various items that contain human readable text are stored as attribute values. We normally recommend that you don't do this (see http://www.w3.org/TR/xml-i18n-bp/#DevAttributes) because of potential translation and annotation difficulties (e.g. markup of bidi text). In several cases these attributes are the only content on empty elements.


BP: for any natural language text elements, provide a localization mechanism that allows for in-language presentation of the item
Example: A Japanese font vendor would probably want a Japanese audience to see its name in kanji, but present a Latin transcription to non-Japanese viewers. To enable this, the localised version access mechanism (i.e. use of the text element) should also apply to the content of the vendor element.
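
One possible shape for such a mechanism, sketched under assumptions: the localized strings are keyed by language tag, and the consumer picks one by progressively truncating the viewer's preferred tag (in the spirit of the lookup scheme in BCP 47 matching). The function name, the vendor strings, and the choice of "en" as the default are all hypothetical.

```python
def lookup_localized(names, preferred, default):
    """Return the best text in `names` for the `preferred` language tag.

    The preferred tag is truncated from the right until a match is found;
    if nothing matches, the entry for `default` is returned.
    """
    available = {tag.lower(): text for tag, text in names.items()}
    subtags = preferred.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in available:
            return available[candidate]
        subtags.pop()
        # Do not leave a dangling single-character subtag (e.g. "-x", "-u").
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return names[default]


# Hypothetical vendor names for illustration.
vendor_names = {"ja": "日本フォント株式会社", "en": "Nippon Font Co."}
for_japanese_viewer = lookup_localized(vendor_names, "ja-JP", "en")
for_french_viewer = lookup_localized(vendor_names, "fr-CA", "en")
```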


BP: provide inline markup, most especially a <span> type element for flowing or paragraph-organized text.


BP: provide a direction attribute for all natural language text elements


"The automatic removal of OpenType features such as GPOS and GSUB information at any stage in the process of deploying a WOFF file is strongly discouraged. Many writing systems around the world rely on these features for very basic display of text in the script that they use."


BP: when writing specs that show examples of mixed direction text, use UPPER CASE FOR RTL and lower case for LTR in ASCII-examples. E.g. "SDROW EMOS ERA EREH


BP: do not isolate bidi on line break; use paragraph separators to reset the bidi algorithm
Comment: is this comment obsolete?


BP: Use proper U+XXXX syntax to represent Unicode code points. These are space-separated when appearing in a sequence. No additional decoration is needed. Note that a code point may contain four, five, or six hexadecimal digits. When fewer than four digits are needed, the code point number is zero-filled. E.g. U+0020.
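
A small sketch of a formatter that follows this notation (the function name is illustrative): uppercase hexadecimal, zero-filled to at least four digits, space-separated in a sequence.

```python
def format_code_points(text):
    """Render each code point of `text` as U+XXXX, separated by spaces."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)
```

The `04X` format only pads short values, so five- and six-digit code points are emitted without extra zeros.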


Needs further investigation as to whether there is a BP here.


Needs further investigation. This one has to do with recommendations regarding character rotation in vertical formats.


BP: do not assume that the positioning of text decoration, such as underlines or bouten marks, will follow your expectations from some other language (???)


Needs further investigation as to whether there is a BP here. Extensive text in this issue suggests there is one.


BP: Allow in-file encoding declarations, even when they are superfluous. For example, allow <meta charset=utf-16> even though the BOM and byte-sniffing will already have determined the encoding long before this tag is read. This allows files to be self-documenting and encourages content authors to write the encoding down in the file consistently and not just as a special case.
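
A minimal sketch of what "allow even when superfluous" might look like in a processor, assuming the in-file declaration has already been extracted and is passed in as `declared` (a hypothetical parameter). The BOM, if present, decides the encoding; a declaration is accepted without error either way.

```python
import codecs

# BOMs checked in this sketch, with the encoding label each one implies.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
]


def sniff_bom(data):
    """Return the encoding implied by a leading BOM, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def effective_encoding(data, declared=None):
    """The BOM wins when present; otherwise fall back to the declaration.

    A declaration that merely repeats what the BOM already established
    is tolerated rather than rejected.
    """
    return sniff_bom(data) or declared
```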


This became part of issue-88. HTML's resolution of this issue may not override the need for a BP related to content language (target audience) declaration in other specifications.


BP: use (allow) and prefer markup in a document format for bidirectionality to the use of document-external styling


This and other comments related to bidi isolation and bidi direction auto-detection probably need to be examined for BP.


BP: Provide a means to send the direction (either detected from the content or supplied by the document) with any natural language text submitted in forms to the server.
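
A rough sketch of the idea, under assumptions: direction is detected with a simplified first-strong-character heuristic (it ignores the isolate handling in the full UBA rules), and it is sent as a companion field whose ".dir" suffix is invented for this example. HTML's dirname attribute on form controls serves a comparable purpose.

```python
import unicodedata
from urllib.parse import urlencode


def first_strong_direction(text, default="ltr"):
    """Direction of the first strongly directional character in `text`."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default


def encode_field_with_direction(name, value, supplied_dir=None):
    """Encode a form field plus a companion field carrying its direction.

    `supplied_dir` stands for a direction given by the document; when it
    is absent, the direction is detected from the content.
    """
    direction = supplied_dir or first_strong_direction(value)
    return urlencode({name: value, name + ".dir": direction})
```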


Needs further investigation as to whether there is a BP here. Has to do with bidibreak.


BP: U+2028 and U+2029 should affect UBA processing


BP: In elements where line breaks are not collapsed, e.g. <textarea> and elements with white-space:pre|pre-line|pre-wrap, line breaks should constitute UBA paragraph breaks.

BP: Line breaks in the plain text displayed by the page's scripts using functions such as JavaScript's alert() and confirm() should constitute UBA paragraph breaks.
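
To illustrate the effect of treating line breaks as UBA paragraph breaks, here is a minimal sketch that resolves a base direction separately for each line. It assumes a simplified first-strong heuristic in place of the full algorithm, and it relies on Python's str.splitlines(), which also splits on U+2028 and U+2029. The function names are illustrative.

```python
import unicodedata


def base_direction(paragraph, default="ltr"):
    """Simplified base direction: the first strong character decides."""
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default


def paragraph_directions(text):
    """Treat every line break as a paragraph break and resolve the
    direction of each resulting paragraph independently."""
    return [(line, base_direction(line)) for line in text.splitlines()]
```

Without the per-line reset, the direction chosen for the first line would carry over to every following line of the text.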

Addison Phillips
Globalization Architect (Amazon Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.

Received on Wednesday, 25 March 2015 21:51:09 UTC
