- From: r12a via GitHub <sysbot+gh@w3.org>
- Date: Mon, 08 May 2017 12:07:03 +0000
- To: public-i18n-archive@w3.org
@chaals how's this?

## Language

### Language basics

1. [ ] <a class="self" href="#lang_basics_1"></a>It should be possible to associate a language with any piece of natural language text that will be read by a user. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#sec_lang_decl">more</a>
1. [ ] <a class="self" href="#lang_basics_inline"></a>Where possible, there should be a way to label natural language changes in inline text. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#sec_lang_decl">more</a>
1. [ ] <a class="self" href="#lang_basics_meta"></a>Consider whether it is useful to express the intended linguistic audience of a resource, in addition to specifying the language used for text processing. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#sec_lang_decl">more</a>
1. [ ] <a class="self" href="#tp_lang_values"></a>A language declaration that indicates the text-processing language for a range of text must associate a single language value with a specific range of text. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#sec_text_processing_lang">more</a>
1. [ ] <a class="self" href="#lang_attribute_xml"></a>Use the HTML <code class="kw" translate="no">lang</code> and XML <code class="kw" translate="no">xml:lang</code> language attributes where appropriate, rather than creating a new attribute or mechanism. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#lang_attribute_xml">more</a>
1. [ ] <a class="self" href="#metadata_lang_values"></a>A metadata-type language declaration that indicates the intended use of the resource, rather than the language of a specific range of text, may be associated with multiple language values. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#sec_lang_meta">more</a>

### Defining language values

1. [ ] <a class="self" href="#lang_use_bcp47"></a>Values for language declarations must use BCP 47. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#sec_lang_values">more</a>
1. [ ] <a class="self" href="#lang_bcp_not_rfc"></a>Refer to BCP 47, not to RFC 5646. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#sec_lang_values">more</a>
1. [ ] <a class="self" href="#lang_values_valid"></a>Be specific about what level of conformance you expect for language tags. The word "valid" has special meaning in BCP 47. Generally "well-formed" is a better choice.
1. [ ] <a class="self" href="#lang_matching_bcp"></a>Reference BCP 47 for language tag matching.

### Declaring language at the resource level

1. [ ] <a class="self" href="#lang_whole_res"></a>The specification should indicate how to define the default text-processing language for the resource as a whole. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#lang_whole_res">more</a>
1. [ ] <a class="self" href="#lang_inherit"></a>Content within the resource should inherit the text-processing language declared at the resource level, unless it is specifically overridden.
1. [ ] <a class="self" href="#lang_tp_meta"></a>Consider whether it is necessary to have separate declarations to indicate the text-processing language versus metadata about the expected use of the resource. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#lang_tp_meta">more</a>
1. [ ] <a class="self" href="#lang_mixing"></a>If there is only one language declaration for a resource, and it has more than one language tag as a value, it must be possible to identify the default text-processing language for the resource. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#lang_tp_meta">more</a>

### Establishing the language of a content block

1. [ ] <a class="self" href="#lang_block_inherit"></a>By default, blocks of content should inherit any text-processing language set for the resource as a whole. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#lang_block_inherit">more</a>
1. [ ] <a class="self" href="#lang_block_change"></a>It should be possible to indicate a change in language for blocks of content where the language changes. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#lang_block_change">more</a>

### Establishing the language of inline runs

1. [ ] <a class="self" href="#lang_inline_spans"></a>It should be possible to indicate language for spans of inline text where the language changes. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#lang_inline_spans">more</a>

## Text direction

### Basic requirements

1. [ ] <a class="self" href="#dir_paragraphs"></a>It must be possible to indicate base direction for each individual paragraph-level item of natural language text that will be read by someone. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#dir_paragraphs">more</a>
1. [ ] <a class="self" href="#dir_inline"></a>It must be possible to indicate base direction changes for embedded runs of inline bidirectional text for all natural language text that will be read by someone. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#dir_inline">more</a>
1. [ ] <a class="self" href="#dir_reasonable"></a>Annotating right-to-left text must require the minimum amount of effort for people who work natively with right-to-left scripts. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#dir_reasonable">more</a>

### Background information

1. [ ] <a class="self" href="#bidi_lang"></a>Do not assume that direction can be determined from language information. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#bidi_lang">more</a>

### Handling direction in markup

1. [ ] <a class="self" href="#bidi_whole_res"></a>The spec should indicate how to define a default base direction for the resource as a whole, i.e., set the overall base direction. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#bidi_whole_res">more</a>
1. [ ] <a class="self" href="#bidi_res_default"></a>The default base direction, in the absence of other information, should be LTR. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#bidi_res_default">more</a>
1. [ ] <a class="self" href="#bidi_values"></a>Values for the default base direction should include left-to-right, right-to-left, and auto. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#bidi_values">more</a>
1. [ ] <a class="self" href="#bidi_block_change"></a>The content author must be able to indicate parts of the text where the base direction changes. At the block level, this should be achieved using attributes or metadata, and should not rely on Unicode control characters.
1. [ ] <a class="self" href="#bidi_block_auto"></a>It must be possible to also set the direction for content fragments to <code class="kw" translate="no">auto</code>. This means that the base direction will be determined by examining the content itself.
1. [ ] <a class="self" href="#bidi_block_para"></a>If the overall base direction is set to <code class="kw" translate="no">auto</code> for plain text, the direction of content paragraphs should be determined on a paragraph-by-paragraph basis.
1. [ ] <a class="self" href="#bidi_block_befaft"></a>To indicate the sides of a block of text relative to the start and end of its contained lines, you should use 'before' and 'after' (maybe block-start/block-end – the terminology is changing), rather than 'top' and 'bottom'.
1. [ ] <a class="self" href="#bidi_inline_start_end"></a>To indicate the start/end of a line you should use 'start' and 'end' rather than 'left' and 'right'.
1. [ ] <a class="self" href="#bidi_dedicated_attr"></a>Provide dedicated attributes for control of base direction and bidirectional overrides; do not rely on the user applying style properties to arbitrary markup to achieve bidi control.
1. [ ] <a class="self" href="#bidi_inline_change"></a>It must be possible to indicate spans of inline text where the base direction changes. If markup is available, this is the preferred method. Otherwise your specification must require that Unicode control characters are recognized by the receiving application, and correctly implemented.
1. [ ] <a class="self" href="#bidi_inline_auto"></a>It must be possible to also set the direction for a span to auto. This means that the base direction will be determined by examining the content itself. A typical approach here would be to set the direction based on the first strong directional character outside of any markup. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#bidi_inline_auto">more</a>
1. [ ] <a class="self" href="#bidi_inline_rli"></a>If users use Unicode bidirectional control characters, the RLI/LRI/FSI with PDI characters must be supported by the application and recommended (rather than RLE/LRE with PDF) by the spec.
1. [ ] <a class="self" href="#bidi_inline_rlm"></a>Use of RLM/LRM should be appropriate, and expectations of what those controls can and cannot do should be clear in the spec. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#bidi_inline_rlm">more</a>
1. [ ] <a class="self" href="#bidi_inline_dedicated_attr"></a>Provide dedicated attributes for control of base direction and bidirectional overrides; do not rely on the user applying style properties to arbitrary markup to achieve bidi control.
1. [ ] <a class="self" href="#bidi_inline_all_elems"></a>Allow bidi attributes on all inline elements in markup that contain text.
1. [ ] <a class="self" href="#bidi_inline_embed"></a>Provide attributes that allow the user to (a) create an embedded base direction or (b) override the bidirectional algorithm altogether; the attribute should allow the user to set the direction to LTR or RTL in either of these two scenarios.

## Characters

### Choosing a definition of 'character'

1. [ ] <a class="self" href="#char_sounds"></a>Specifications, software and content <em class="rfc2119" title="MUST NOT">MUST NOT</em> require or depend on a one-to-one correspondence between characters and the sounds of a language. <a href="https://www.w3.org/TR/charmod/#C001">more</a>
1. [ ] <a class="self" href="#char_display"></a>Specifications, software and content <em class="rfc2119" title="MUST NOT">MUST NOT</em> require or depend on a one-to-one mapping between characters and units of displayed text. <a href="https://www.w3.org/TR/charmod/#C002">more</a>
1. [ ] <a class="self" href="#char_logical"></a>Protocols, data formats and APIs <em class="rfc2119" title="MUST">MUST</em> store, interchange or process text data in logical order. <a href="https://www.w3.org/TR/charmod/#C003">more</a>
1. [ ] <a class="self" href="#char_logical_storage"></a>Independent of whether some implementation uses logical selection or visual selection, characters selected <em class="rfc2119" title="MUST">MUST</em> be kept in logical order in storage. <a href="https://www.w3.org/TR/charmod/#C075">more</a>
1. [ ] <a class="self" href="#char_logical_discontiguous"></a>Specifications of protocols and APIs that involve selection of ranges <em class="rfc2119" title="SHOULD">SHOULD</em> provide for discontiguous logical selections, at least to the extent necessary to support implementation of visual selection on screen on top of those protocols and APIs. <a href="https://www.w3.org/TR/charmod/#C004">more</a>
1. [ ] <a class="self" href="#char_keystroke"></a>Specifications and software <em class="rfc2119" title="MUST NOT">MUST NOT</em> require nor depend on a single keystroke resulting in a single character, nor that a single character be input with a single keystroke (even with modifiers), nor that keyboards are the same all over the world. <a href="https://www.w3.org/TR/charmod/#C005">more</a>
1. [ ] <a class="self" href="#char_physical_storage"></a>Specifications, software and content <em class="rfc2119" title="MUST NOT">MUST NOT</em> require or depend on a one-to-one relationship between characters and units of physical storage. <a href="https://www.w3.org/TR/charmod/#C009">more</a>
1. [ ] <a class="self" href="#char_define"></a>When specifications use the term 'character' the specifications <em class="rfc2119" title="MUST">MUST</em> define which meaning they intend. <a href="https://www.w3.org/TR/charmod/#C010">more</a>
1. [ ] <a class="self" href="#char_specific"></a>Specifications <em class="rfc2119" title="SHOULD">SHOULD</em> use specific terms, when available, instead of the general term 'character'. <a href="https://www.w3.org/TR/charmod/#C067">more</a>

### Defining a Reference Processing Model

1. [ ] <a class="self" href="#char_single_enc"></a>Textual data objects defined by protocol or format specifications <em class="rfc2119" title="MUST">MUST</em> be in a single character encoding. <a href="https://www.w3.org/TR/charmod/#C013">more</a>
1. [ ] <a class="self" href="#char_rpm"></a>All specifications that involve processing of text <em class="rfc2119" title="MUST">MUST</em> specify the processing of text according to the Reference Processing Model described by the rest of the recommendations in this list. <a href="https://www.w3.org/TR/charmod/#C014">more</a>
1. [ ] <a class="self" href="#char_unicode_chars"></a>Specifications <em class="rfc2119" title="MUST">MUST</em> define text in terms of Unicode characters, not bytes or glyphs. <a href="https://www.w3.org/TR/charmod/#C014">more</a>
1. [ ] <a class="self" href="#char_transcode"></a>For their textual data objects specifications <em class="rfc2119" title="MAY">MAY</em> allow use of any character encoding which can be transcoded to a Unicode encoding form. <a href="https://www.w3.org/TR/charmod/#C014">more</a>
1. [ ] <a class="self" href="#char_as_unicode"></a>Specifications <em class="rfc2119" title="MAY">MAY</em> choose to disallow or deprecate some character encodings and to make others mandatory. Independent of the actual character encoding, the specified behavior <em class="rfc2119" title="MUST">MUST</em> be the same as if the processing happened as follows: (a) The character encoding of any textual data object received by the application implementing the specification <em class="rfc2119" title="MUST">MUST</em> be determined and the data object <em class="rfc2119" title="MUST">MUST</em> be interpreted as a sequence of Unicode characters - this <em class="rfc2119" title="MUST">MUST</em> be equivalent to transcoding the data object to some Unicode encoding form, adjusting any character encoding label if necessary, and receiving it in that Unicode encoding form, (b) All processing <em class="rfc2119" title="MUST">MUST</em> take place on this sequence of Unicode characters, (c) If text is output by the application, the sequence of Unicode characters <em class="rfc2119" title="MUST">MUST</em> be encoded using a character encoding chosen among those allowed by the specification. <a href="https://www.w3.org/TR/charmod/#C014">more</a>
1. [ ] <a class="self" href="#char_different_enc"></a>If a specification is such that multiple textual data objects are involved (such as an XML document referring to external parsed entities), it <em class="rfc2119" title="MAY">MAY</em> choose to allow these data objects to be in different character encodings. In all cases, the Reference Processing Model <em class="rfc2119" title="MUST">MUST</em> be applied to all textual data objects. <a href="https://www.w3.org/TR/charmod/#C014">more</a>

### Including and excluding character ranges

1. [ ] <a class="self" href="#char_exclude"></a>Specifications <em class="rfc2119" title="SHOULD NOT">SHOULD NOT</em> arbitrarily exclude code points from the full range of Unicode code points from U+0000 to U+10FFFF inclusive. <a href="https://www.w3.org/TR/charmod/#C070">more</a>
1. [ ] <a class="self" href="#char_10ffff"></a>Specifications <em class="rfc2119" title="MUST NOT">MUST NOT</em> allow code points above U+10FFFF. <a href="https://www.w3.org/TR/charmod/#C077">more</a>
1. [ ] <a class="self" href="#char_internal_use"></a>Specifications <em class="rfc2119" title="SHOULD NOT">SHOULD NOT</em> allow the use of code points reserved by Unicode for internal use. <a href="https://www.w3.org/TR/charmod/#C079">more</a>
1. [ ] <a class="self" href="#char_surrogate"></a>Specifications <em class="rfc2119" title="MUST NOT">MUST NOT</em> allow the use of surrogate code points. <a href="https://www.w3.org/TR/charmod/#C078">more</a>
1. [ ] <a class="self" href="#char_compatibility"></a>Specifications <em class="rfc2119" title="SHOULD">SHOULD</em> exclude compatibility characters in the syntactic elements (markup, delimiters, identifiers) of the formats they define. <a href="https://www.w3.org/TR/charmod/#C050">more</a>

### Using the Private Use Area

1. [ ] <a class="self" href="#char_not_pua"></a>Specifications <em class="rfc2119" title="MUST NOT">MUST NOT</em> require the use of private use area characters with particular assignments. <a href="https://www.w3.org/TR/charmod/#C038">more</a>
1. [ ] <a class="self" href="#char_pua_mechanisms"></a>Specifications <em class="rfc2119" title="MUST NOT">MUST NOT</em> require the use of mechanisms for defining agreements of private use code points. <a href="https://www.w3.org/TR/charmod/#C039">more</a>
1. [ ] <a class="self" href="#char_pua_allow"></a>Specifications and implementations <em class="rfc2119" title="SHOULD NOT">SHOULD NOT</em> disallow the use of private use code points by private agreement. <a href="https://www.w3.org/TR/charmod/#C040">more</a>
1. [ ] <a class="self" href="#char_symbols"></a>Specifications <em class="rfc2119" title="MAY">MAY</em> define markup to allow the transmission of symbols not in Unicode or to identify specific variants of Unicode characters. <a href="https://www.w3.org/TR/charmod/#C041">more</a>
1. [ ] <a class="self" href="#char_pictures"></a>Specifications <em class="rfc2119" title="SHOULD">SHOULD</em> allow the inclusion of or reference to pictures and graphics where appropriate, to eliminate the need to (mis)use character-oriented mechanisms for pictures or graphics. <a href="https://www.w3.org/TR/charmod/#C068">more</a>

### Choosing character encodings

1. [ ] <a class="self" href="#char_identification"></a>Specifications <em class="rfc2119" title="MUST">MUST</em> either specify a unique character encoding, or provide character encoding identification mechanisms such that the encoding of text can be reliably identified. <a href="https://www.w3.org/TR/charmod/#C015">more</a>
1. [ ] <a class="self" href="#char_unique_for_new"></a>When designing a new protocol, format or API, specifications <em class="rfc2119" title="SHOULD">SHOULD</em> require a unique character encoding. <a href="https://www.w3.org/TR/charmod/#C016">more</a>
1. [ ] <a class="self" href="#char_enc_rules"></a>When basing a protocol, format, or API on a protocol, format, or API that already has rules for character encoding, specifications <em class="rfc2119" title="SHOULD">SHOULD</em> use rather than change these rules. <a href="https://www.w3.org/TR/charmod/#C017">more</a>
1. [ ] <a class="self" href="#char_use_utf8"></a>When a unique character encoding is required, the character encoding <em class="rfc2119" title="MUST">MUST</em> be UTF-8, UTF-16 or UTF-32. <a href="https://www.w3.org/TR/charmod/#C018">more</a>
1. [ ] <a class="self" href="#char_charset"></a>Specifications <em class="rfc2119" title="SHOULD">SHOULD</em> avoid using the terms 'character set' and 'charset' to refer to a character encoding, except when the latter is used to refer to the MIME charset parameter or its IANA-registered values. The term 'character encoding', or in specific cases the terms 'character encoding form' or 'character encoding scheme', are <em class="rfc2119" title="RECOMMENDED">RECOMMENDED</em>. <a href="https://www.w3.org/TR/charmod/#C020">more</a>
1. [ ] <a class="self" href="#char_iana"></a>If the unique encoding approach is not taken, specifications <em class="rfc2119" title="SHOULD">SHOULD</em> require the use of the IANA charset registry names, and in particular the names identified in the registry as 'MIME preferred names', to designate character encodings in protocols, data formats and APIs. <a href="https://www.w3.org/TR/charmod/#C021">more</a>
1. [ ] <a class="self" href="#char_non_iana"></a>Character encodings that are not in the IANA registry <em class="rfc2119" title="SHOULD NOT">SHOULD NOT</em> be used, except by private agreement. <a href="https://www.w3.org/TR/charmod/#C022">more</a>
1. [ ] <a class="self" href="#char_x"></a>If an unregistered character encoding is used, the convention of using 'x-' at the beginning of the name <em class="rfc2119" title="MUST">MUST</em> be followed. <a href="https://www.w3.org/TR/charmod/#C023">more</a>
1. [ ] <a class="self" href="#char_not_unique"></a>If the unique encoding approach is not chosen, specifications <em class="rfc2119" title="MUST">MUST</em> designate at least one of the UTF-8 and UTF-16 encoding forms of Unicode as admissible character encodings and <em class="rfc2119" title="SHOULD">SHOULD</em> choose at least one of UTF-8 or UTF-16 as required encoding forms (encoding forms that <em class="rfc2119" title="MUST">MUST</em> be supported by implementations of the specification). <a href="https://www.w3.org/TR/charmod/#C026">more</a>
1. [ ] <a class="self" href="#char_default"></a>Specifications that require a default encoding <em class="rfc2119" title="MUST">MUST</em> define either UTF-8 or UTF-16 as the default, or both if they define suitable means of distinguishing them. <a href="https://www.w3.org/TR/charmod/#C027">more</a>

### Identifying character encodings

1. [ ] <a class="self" href="#char_heuristics"></a>Specifications <em class="rfc2119" title="MUST NOT">MUST NOT</em> propose the use of heuristics to determine the encoding of data. <a href="https://www.w3.org/TR/charmod/#C028">more</a>
1. [ ] <a class="self" href="#char_conflict"></a>Specifications <em class="rfc2119" title="MUST">MUST</em> define conflict-resolution mechanisms (e.g. priorities) for cases where there is multiple or conflicting information about character encoding. <a href="https://www.w3.org/TR/charmod/#C035">more</a>

### Designing character escapes

1. [ ] <a class="self" href="#char_escaping"></a>Specifications should provide a mechanism for escaping characters, particularly those which are invisible or ambiguous. <a href="https://w3c.github.io/bp-i18n-specdev/#char_heuristics">more</a>
1. [ ] <a class="self" href="#char_esc_new"></a>Specifications <em class="rfc2119" title="SHOULD NOT">SHOULD NOT</em> invent a new escaping mechanism if an appropriate one already exists. <a href="https://www.w3.org/TR/charmod/#C042">more</a>
1. [ ] <a class="self" href="#char_esc_alternates"></a>The number of different ways to escape a character <em class="rfc2119" title="SHOULD">SHOULD</em> be minimized (ideally to one). <a href="https://www.w3.org/TR/charmod/#C043">more</a>
1. [ ] <a class="self" href="#char_esc_end"></a>Escape syntax <em class="rfc2119" title="SHOULD">SHOULD</em> require either explicit end delimiters or a fixed number of characters in each character escape. Escape syntaxes where the end is determined by any character outside the set of characters admissible in the character escape itself <em class="rfc2119" title="SHOULD">SHOULD</em> be avoided. <a href="https://www.w3.org/TR/charmod/#C044">more</a>
1. [ ] <a class="self" href="#char_esc_hex"></a>Whenever specifications define character escapes that allow the representation of characters using a number, the number <em class="rfc2119" title="MUST">MUST</em> represent the Unicode code point of the character and <em class="rfc2119" title="SHOULD">SHOULD</em> be in hexadecimal notation. <a href="https://www.w3.org/TR/charmod/#C045">more</a>
1. [ ] <a class="self" href="#char_esc_acceptable"></a>Escaped characters <em class="rfc2119" title="SHOULD">SHOULD</em> be acceptable wherever their unescaped forms are; this does not preclude that syntax-significant characters, when escaped, lose their significance in the syntax. In particular, if a character is acceptable in identifiers and comments, then its escaped form should also be acceptable. <a href="https://www.w3.org/TR/charmod/#C046">more</a>

### Storing text

1. [ ] <a class="self" href="#char_storing_logical"></a>Protocols, data formats and APIs <em class="rfc2119" title="MUST">MUST</em> store, interchange or process text data in logical order. <a href="https://www.w3.org/TR/charmod/#C003">more</a>
1. [ ] <a class="self" href="#char_storing_discontiguous"></a>Specifications of protocols and APIs that involve selection of ranges <em class="rfc2119" title="SHOULD">SHOULD</em> provide for discontiguous logical selections, at least to the extent necessary to support implementation of visual selection on screen on top of those protocols and APIs. <a href="https://www.w3.org/TR/charmod/#C004">more</a>

### Specifying sort and search functionality

1. [ ] <a class="self" href="#char_sort_units"></a>Software that sorts or searches text for users <em class="rfc2119" title="SHOULD">SHOULD</em> do so on the basis of appropriate collation units and ordering rules for the relevant language and/or application. <a href="https://www.w3.org/TR/charmod/#C006">more</a>
1. [ ] <a class="self" href="#char_sort_user"></a>Where searching or sorting is done dynamically, particularly in a multilingual environment, the 'relevant language' <em class="rfc2119" title="SHOULD">SHOULD</em> be determined to be that of the current user, and may thus differ from user to user. <a href="https://www.w3.org/TR/charmod/#C007">more</a>
1. [ ] <a class="self" href="#char_sort_alternatives"></a>Software that allows users to sort or search text <em class="rfc2119" title="SHOULD">SHOULD</em> allow the user to select alternative rules for collation units and ordering. <a href="https://www.w3.org/TR/charmod/#C066">more</a>
1. [ ] <a class="self" href="#char_sort_anything"></a>Specifications and implementations of sorting and searching algorithms <em class="rfc2119" title="SHOULD">SHOULD</em> accommodate text that contains any character in Unicode. <a href="https://www.w3.org/TR/charmod/#C008">more</a>

### Converting to a Common Unicode Form

1. [ ] <a class="self" href="#char_n11n_nfc"></a>Specifications of text-based formats and protocols <em class="rfc2119" title="MAY">MAY</em> specify that all or part of the textual content of that format or protocol is normalized using Unicode Normalization Form C (NFC). <a href="https://www.w3.org/TR/charmod-norm/#h-convertingtocommonunicodeform">more</a>
1. [ ] <a class="self" href="#char_n11n_security"></a>Specifications that do not normalize <em class="rfc2119" title="MUST">MUST</em> document or provide a health-warning if canonically equivalent but disjoint Unicode character sequences represent a security issue. <a href="https://www.w3.org/TR/charmod-norm/#h-non-normalizing">more</a>
1. [ ] <a class="self" href="#char_n11n_assumptions"></a>Specifications and implementations <em class="rfc2119" title="MUST NOT">MUST NOT</em> assume that content is in any particular normalization form. <a href="https://www.w3.org/TR/charmod-norm/#h-non-normalizing">more</a>
1. [ ] <a class="self" href="#char_n11n_comparison"></a>Specifications <em class="rfc2119" title="MUST">MUST</em> specify that string matching takes the form of "code point-by-code point" comparison of the Unicode character sequence, or, if a specific Unicode character encoding is specified, code unit-by-code unit comparison of the sequences. <a href="https://www.w3.org/TR/charmod-norm/#h-non-normalizing">more</a>
1. [ ] <a class="self" href="#char_n11n_regex"></a>Specifications that define a regular expression syntax <em class="rfc2119" title="MUST">MUST</em> provide at least Basic Unicode Level 1 support per <cite>Unicode Technical Standard #18: Unicode Regular Expressions</cite> and <em class="rfc2119" title="SHOULD">SHOULD</em> provide Extended or Tailored (Levels 2 and 3) support. <a href="https://www.w3.org/TR/charmod-norm/#h-non-normalizing">more</a>
1. [ ] <a class="self" href="#char_n11n_comparison2"></a>Specifications of text-based formats and protocols that, as part of their syntax definition, require that the text be in normalized form <em class="rfc2119" title="MUST">MUST</em> define string matching in terms of normalized string comparison and <em class="rfc2119" title="MUST">MUST</em> define the normalized form to be NFC. <a href="https://www.w3.org/TR/charmod-norm/#h-normalizing-spec">more</a>
1. [ ] <a class="self" href="#char_n11n_suspect"></a>A normalizing text-processing component which receives suspect text <em class="rfc2119" title="MUST NOT">MUST NOT</em> perform any normalization-sensitive operations unless it has first either confirmed through inspection that the text is in normalized form or it has re-normalized the text itself. Private agreements <em class="rfc2119" title="MAY">MAY</em>, however, be created within private systems which are not subject to these rules, but any externally observable results <em class="rfc2119" title="MUST">MUST</em> be the same as if the rules had been obeyed. <a href="https://www.w3.org/TR/charmod-norm/#h-normalizing-spec">more</a>
1. [ ] <a class="self" href="#char_n11n_constructs"></a>Specifications of text-based languages and protocols <em class="rfc2119" title="SHOULD">SHOULD</em> define precisely the construct boundaries necessary to obtain a complete definition of full-normalization. These definitions <em class="rfc2119" title="SHOULD">SHOULD</em> include at least the boundaries between markup and character data as well as entity boundaries (if the language has any include mechanism), <em class="rfc2119" title="SHOULD">SHOULD</em> include any other boundary that may create denormalization when instances of the language are processed, but <em class="rfc2119" title="SHOULD NOT">SHOULD NOT</em> include character escapes designed to express arbitrary characters. <a href="https://www.w3.org/TR/charmod-norm/#h-normalizing-spec">more</a>
1. [ ] <a class="self" href="#char_n11n_implementation"></a>Where operations can produce denormalized output from normalized text input, specifications of API components (functions/methods) that implement these operations <em class="rfc2119" title="MUST">MUST</em> define whether normalization is the responsibility of the caller or the callee. Specifications <em class="rfc2119" title="MAY">MAY</em> state that performing normalization is optional for some API components; in this case the default <em class="rfc2119" title="SHOULD">SHOULD</em> be that normalization is performed, and an explicit option <em class="rfc2119" title="SHOULD">SHOULD</em> be used to switch normalization off. Specifications <em class="rfc2119" title="SHOULD NOT">SHOULD NOT</em> make the implementation of normalization optional. <a href="https://www.w3.org/TR/charmod-norm/#h-normalizing-spec">more</a>
1. [ ] <a class="self" href="#char_n11n_mechanism"></a>Specifications that define a mechanism (for example an API or a defining language) for producing textual data objects <em class="rfc2119" title="SHOULD">SHOULD</em> require that the final output of this mechanism be normalized. <a href="https://www.w3.org/TR/charmod-norm/#h-normalizing-spec">more</a>

### Handling Case Folding

1. [ ] <a class="self" href="#char_case_sensitive"></a>Case-sensitive matching is <em class="rfc2119" title="RECOMMENDED">RECOMMENDED</em> as the default for new protocols and formats. <a href="https://www.w3.org/TR/charmod-norm/#h-handlingcasefolding">more</a>
1. [ ] <a class="self" href="#char_case_unicodecf"></a>Because the "simple" case-fold mapping removes information that can be important to forming an identity match, the "Common plus Full" (or "Unicode C+F") case fold mapping is <em class="rfc2119" title="RECOMMENDED">RECOMMENDED</em> for Unicode case-insensitive matching. <a href="https://www.w3.org/TR/charmod-norm/#h-handlingcasefolding">more</a>
1. [ ] <a class="self" href="#char_case_asciici"></a>ASCII case-insensitive matching <em class="rfc2119" title="MUST">MUST</em> only be applied to vocabularies that are restricted to ASCII. Unicode case-insensitivity <em class="rfc2119" title="MUST">MUST</em> be used for all other vocabularies. <a href="https://www.w3.org/TR/charmod-norm/#h-handlingcasefolding">more</a>
1. [ ] <a class="self" href="#char_case_asciicinot"></a>If the vocabulary is not restricted to ASCII or permits user-defined values that use a broader range of Unicode, ASCII case-insensitive matching <em class="rfc2119" title="MUST NOT">MUST NOT</em> be required. <a href="https://www.w3.org/TR/charmod-norm/#h-handlingcasefolding">more</a>
1. [ ] <a class="self" href="#char_case_vocabularies"></a>The Unicode C+F case-fold form is <em class="rfc2119" title="RECOMMENDED">RECOMMENDED</em> as the case-insensitive matching for vocabularies. The Unicode C+S form <em class="rfc2119" title="MUST NOT">MUST NOT</em> be used for string identity matching on the Web. <a href="https://www.w3.org/TR/charmod-norm/#h-handlingcasefolding">more</a>
1. [ ] <a class="self" href="#char_case_options"></a>Specifications and implementations that define string matching as part of the definition of a format, protocol, or formal language (which might include operations such as parsing, matching, tokenizing, etc.) <em class="rfc2119" title="MUST">MUST</em> define the criteria and matching forms used. These <em class="rfc2119" title="MUST">MUST</em> be one of: (a) Case-sensitive (b) Unicode case-insensitive using Unicode case-folding C+F (c) ASCII case-insensitive. <a href="https://www.w3.org/TR/charmod-norm/#h-handlingcasefolding">more</a>
1. [ ] <a class="self" href="#char_case_noci"></a>Specifications <em class="rfc2119" title="SHOULD NOT">SHOULD NOT</em> specify case-insensitive comparison of strings. <a href="https://www.w3.org/TR/charmod-norm/#h-handlingcasefolding">more</a>
1. [ ] <a class="self" href="#char_case_unicodecf2"></a>Specifications that specify case-insensitive comparison for non-ASCII vocabularies <em class="rfc2119" title="SHOULD">SHOULD</em> specify Unicode case-folding C+F. <a href="https://www.w3.org/TR/charmod-norm/#h-handlingcasefolding">more</a>
1. [ ] <a class="self" href="#char_case_asciionly"></a>Specifications <em class="rfc2119" title="MAY">MAY</em> specify ASCII case-insensitive comparison for portions of a format or protocol that are restricted to an ASCII-only vocabulary. <a href="https://www.w3.org/TR/charmod-norm/#h-handlingcasefolding">more</a>
1. [ ] <a class="self" href="#char_case_nonascii"></a>Specifications and implementations <em class="rfc2119" title="MUST NOT">MUST NOT</em> specify ASCII-only case-insensitive matching for values or constructs that permit non-ASCII characters. <a href="https://www.w3.org/TR/charmod-norm/#h-handlingcasefolding">more</a>

### Defining 'string'

1. [ ] <a class="self" href="#char_string_byte"></a>Specifications <em class="rfc2119" title="SHOULD NOT">SHOULD NOT</em> define a string as a 'byte string'. <a href="https://www.w3.org/TR/charmod/#C011">more</a>
1. [ ] <a class="self" href="#char_string_char"></a>The 'character string' definition <em class="rfc2119" title="SHOULD">SHOULD</em> be used by most specifications. <a href="https://www.w3.org/TR/charmod/#C012">more</a>

### Indexing strings

1. [ ] <a class="self" href="#char_index_char"></a>The character string is <em class="rfc2119" title="RECOMMENDED">RECOMMENDED</em> as a basis for string indexing. <a href="https://www.w3.org/TR/charmod/#C051">more</a>
1. [ ] <a class="self" href="#char_index_codeunit"></a>A code unit string <em class="rfc2119" title="MAY">MAY</em> be used as a basis for string indexing if this results in a significant improvement in the efficiency of internal operations when compared to the use of character string. <a href="https://www.w3.org/TR/charmod/#C052">more</a>
1. [ ] <a class="self" href="#char_index_grapheme"></a>Grapheme clusters <em class="rfc2119" title="MAY">MAY</em> be used as a basis for string indexing in applications where user interaction is the primary concern. <a href="https://www.w3.org/TR/charmod/#C071">more</a>
1. 
[ ] <a class="self" href="#char_index_grapheme_plus"></a>Specifications that define indexing in terms of grapheme clusters <em class="rfc2119" title="MUST">MUST</em> either: (a) define grapheme clusters in terms of default grapheme clusters as defined in Unicode Standard Annex #29, Text Boundaries [UTR #29], or (b) define specifically how tailoring is applied to the indexing operation. <a href="https://www.w3.org/TR/charmod/#C074">more</a> 1. [ ] <a class="self" href="#char_index_byte"></a>The use of byte strings for indexing is <em class="rfc2119" title="NOT RECOMMENDED">NOT RECOMMENDED</em>. <a href="https://www.w3.org/TR/charmod/#C072">more</a> 1. [ ] <a class="self" href="#char_index_substrings"></a>Specifications that need a way to identify substrings or point within a string <em class="rfc2119" title="SHOULD">SHOULD</em> provide ways other than string indexing to perform this operation. <a href="https://www.w3.org/TR/charmod/#C053">more</a> 1. [ ] <a class="self" href="#char_index_counting"></a>Specifications <em class="rfc2119" title="SHOULD">SHOULD</em> understand and process single characters as substrings, and treat indices as boundary positions between counting units, regardless of the choice of counting units. <a href="https://www.w3.org/TR/charmod/#C055">more</a> 1. [ ] <a class="self" href="#char_index_api"></a>Specifications of APIs <em class="rfc2119" title="SHOULD NOT">SHOULD NOT</em> specify single characters or single 'units of encoding' as argument or return types. <a href="https://www.w3.org/TR/charmod/#C056">more</a> 1. [ ] <a class="self" href="#char_index_0"></a>When the positions between the units are counted for string indexing, starting with an index of 0 for the position at the start of the string is the <em class="rfc2119" title="RECOMMENDED">RECOMMENDED</em> solution, with the last index then being equal to the number of counting units in the string. 
<a href="https://www.w3.org/TR/charmod/#C057">more</a> ### Referring to Unicode characters 1. [ ] <a class="self" href="#char_ref_Uchar"></a>Use U+XXXX syntax to represent Unicode code points in the specification. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#char_ref_Uchar">more</a> ### Referencing the Unicode Standard 1. [ ] <a class="self" href="#char_unicoderef_do"></a>Since specifications in general need both a definition for their characters and the semantics associated with these characters, specifications <em class="rfc2119" title="SHOULD">SHOULD</em> include a reference to the Unicode Standard, whether or not they include a reference to ISO/IEC 10646. <a href="https://www.w3.org/TR/charmod/#C062">more</a> 1. [ ] <a class="self" href="#char_unicoderef_generic"></a>A generic reference to the Unicode Standard <em class="rfc2119" title="MUST">MUST</em> be made if it is desired that characters allocated after a specification is published are usable with that specification. A specific reference to the Unicode Standard <em class="rfc2119" title="MAY">MAY</em> be included to ensure that functionality depending on a particular version is available and will not change over time. <a href="https://www.w3.org/TR/charmod/#C063">more</a> 1. [ ] <a class="self" href="#char_unicoderef_latest"></a>All generic references to the Unicode Standard <em class="rfc2119" title="MUST">MUST</em> refer to the latest version of the Unicode Standard available at the date of publication of the containing specification. <a href="https://www.w3.org/TR/charmod/#C064">more</a> 1. [ ] <a class="self" href="#char_unicoderef_10646"></a>All generic references to ISO/IEC 10646 <em class="rfc2119" title="MUST">MUST</em> refer to the latest version of ISO/IEC 10646 available at the date of publication of the containing specification. <a href="https://www.w3.org/TR/charmod/#C065">more</a> ## Resource identifiers ### Basics 1. 
[ ] <a href="#resid_use_iris"></a>Resource identifiers must permit the use of characters outside those of plain ASCII. <a href="https://github.com/w3c/web-annotation/issues/241">discussion</a> 1. [ ] <a class="self" href="#resid_iri_conversion"></a> Specifications <em class="rfc2119" title="MUST">MUST</em> define when the conversion from IRI references to URI references (or subsets thereof) takes place, in accordance with Internationalized Resource Identifiers (IRIs). <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#resid_iri_conversion">more</a> ## Markup & syntax ### Defining elements and attributes 1. [ ] <a class="self" href="#markup_attributes"></a>Do not define attribute values that will contain user readable content. Use elements for such content. <a href="https://www.w3.org/TR/xml-i18n-bp/#DevAttributes">more</a> 1. [ ] <a class="self" href="#markup_attributes_fallback"></a>If you do define attribute values containing user readable content, provide a means to indicate directional and language information for that text separately from the text contained in the element. 1. [ ] <a class="self" href="#markup_span"></a>Provide a way for authors to annotate arbitrary inline content using a <code class="kw" translate="no">span</code>-like element or construct. <a href="https://www.w3.org/TR/xml-i18n-bp/#DevSpan">more</a> ### Defining identifiers 1. [ ] <a class="self" href="#identifier_case"></a>Identifiers should be case-sensitive. ### Working with plain text 1. [ ] <a class="self" href="#plain_avoid"></a>Avoid natural language text in elements that only allow for plain text and in attribute values. 1. [ ] <a class="self" href="#plain_span"></a>Provide a span-like element that can be used for any text content to apply information needed for internationalization. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#plain_span">more</a> ## Typographic support ### Text decoration 1. 
[ ] <a class="self" href="#textdec_skip"></a>Text decoration such as underline and overline should allow lines to skip ink. 1. [ ] <a class="self" href="#textdec_distance"></a>It should be possible to specify the distance of overlines and underlines from the text. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#textdec_distance">more</a> ### Vertical text 1. [ ] <a class="self" href="#vertical_support"></a>It should be possible to render text vertically for languages such as Japanese, Chinese, Korean, Mongolian, etc. 1. [ ] <a class="self" href="#vertical_lr_rl"></a>Vertical text must support line progression from LTR (eg. Mongolian) and RTL (eg. Japanese) 1. [ ] <a class="self" href="#vertical_lr_rl"></a>By default, text decoration, ruby, and the like in vertical text where lines are stacked from left to right (eg. Mongolian) should appear on the same side as for CJK vertical text. Placement should not rely on the <code class="kw" translate="no">before</code> and <code class="kw" translate="no">after</code> line locations. 1. [ ] <a class="self" href="#vertical_utr50"></a>Vertical writing modes that are equivalent to the <code class="kw" translate="no">vertical-</code> values in CSS (only) should use UTR50 to apply default text orientation of characters. (This does not apply to writing modes that are equivalent to <code class="kw" translate="no">sideways-</code> in CSS.) 1. [ ] <a class="self" href="#vertical_upright"></a>By default, glyphs of scripts that are normally horizontal should run along a line in vertical text such that the top of the character is toward the right side of the vertical line, but there should also be a mechanism to allow them to progress down the line in upright orientation. Such a mechanism should use grapheme clusters as a minimum text unit, but where necessary allow syllabic clusters to be treated as a unit when they involve more than one grapheme cluster. 1. 
[ ] <a class="self" href="#vertical_upright_arabic"></a>Upright Arabic text in vertical lines should use isolated letter forms and the order of text should read top to bottom. 1. [ ] <a class="self" href="#vertical_tatechuyoko"></a>It should be possible for some sequences of characters (particularly digits) to run horizontally within vertical lines (tate chu yoko). 1. [ ] <a class="self" href="#vertical_sideways"></a>Writing modes should provide values like <code class="kw" translate="no">sideways-lr</code> and <code class="kw" translate="no">sideways-rl</code> in CSS to allow for vertical rotation of lines of horizontal script text. UTR50 is not applicable for these cases. ### Setting box positioning coordinates when text direction varies 1. [ ] <a class="self" href="#vertical_box_posn"></a>Box positioning coordinates must take into account whether the text is horizontal or vertical. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#vertical_box_posn">more</a> ### Ruby text annotations 1. [ ] <a class="self" href="#type_ruby"></a>'Ruby' style annotations alongside base text should be supported for Chinese, Japanese, Korean and Mongolian text, in both horizontal and vertical writing modes. 1. [ ] <a class="self" href="#ruby_zhuyin"></a>Ruby implementations should support zhuyin fuhao (bopomofo) ruby for Traditional Chinese. 1. [ ] <a class="self" href="#ruby_tabular"></a>Ruby implementations should support a tabular content model (such that ruby contents can be arranged in a sequence approximating to <code class="kw" translate="no">rb rb rt rt</code>). 1. [ ] <a class="self" href="#ruby_rb"></a>Ruby implementations should make it possible to use an explicit <code class="kw" translate="no">rb</code> tag for ruby bases. 1. [ ] <a class="self" href="#ruby_dblsided"></a>Ruby implementations should allow annotations to appear on either or both sides of the base text. ### Miscellaneous 1. 
[ ] <a class="self" href="#type_line_height"></a>Line heights must allow for characters that are taller than English. 1. [ ] <a class="self" href="#type_box_size"></a>Box sizes must allow for text expansion in translation. 1. [ ] <a class="self" href="#type_linebreak"></a>Line wrapping should take into account the special rules needed for non-Latin scripts. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#type_linebreak">more</a> 1. [ ] <a class="self" href="#type_presentational_tags"></a>Avoid specifying presentational tags, such as <code class="kw" translate="no">b</code> for bold, and <code class="kw" translate="no">i</code> for italic. <a class="local" href="https://w3c.github.io/bp-i18n-specdev/#type_presentational_tags">more</a> ## Local dates, times and formats ### Working with time 1. [ ] <a class="self" href="#loc_time_preCE"></a>When defining calendar and date systems, be sure to allow for dates prior to the common era, or at least define handling of dates outside the most common range. 1. [ ] <a class="self" href="#loc_time_utc"></a>When defining time or date data types, ensure that the time zone or relationship to UTC is always defined. 1. [ ] <a class="self" href="#loc_time_floating"></a>Provide a health warning for conversion of time or date data types that are "floating" to/from incremental types, referring as necessary to the <a href="https://www.w3.org/TR/timezone/"><cite>Time Zones</cite> WG Note</a>. <a href="https://www.w3.org/TR/timezone/">more</a> 1. [ ] <a class="self" href="#loc_time_leapsec"></a>Allow for leap seconds in date and time data types. <a href="https://w3c.github.io/bp-i18n-specdev/#loc_time_leapsec">more</a> 1. [ ] <a class="self" href="#loc_time_consistency"></a>Use consistent terminology when discussing date and time values. Use 'floating' time for time zone independent values. 1. [ ] <a class="self" href="#loc_time_zone_offset"></a>Keep separate the definition of time zone from time zone offset. 1. 
[ ] <a class="self" href="#loc_time_zone_ids"></a>Use IANA time zone IDs to identify time zones. Do not use offsets or LTO as a proxy for time zone. 1. [ ] <a class="self" href="#loc_time_zone_field"></a>Use a separate field to identify time zone. 1. [ ] <a class="self" href="#loc_time_week"></a>When defining rules for a "week", allow for culturally specific rules to be applied. <a href="https://w3c.github.io/bp-i18n-specdev/#loc_time_week">more</a> 1. [ ] <a class="self" href="#loc_time_week_number"></a>When defining rules for week number of year, allow for culturally specific rules to be applied. 1. [ ] <a class="self" href="#loc_time_13"></a>When non-Gregorian calendars are permitted, note that the "month" field can go to 13 (undecimber). ### Designing forms 1. [ ] <a class="self" href="#loc_forms_eai"></a>When defining email field validation, allow for EAI (smtputf8) names. ### Working with numbers 1. [ ] <a class="self" href="#loc_numbers_shape_parse"></a>When parsing user input of numeric values, allow for digit shaping (non-ASCII digits). 1. [ ] <a class="self" href="#loc_numbers_shape_display"></a>When formatting numeric values for display, allow for culturally sensitive display, including the use of non-ASCII digits (digit shaping). ## Navigation ### Providing for content negotiation based on language 1. [ ] <a class="self" href="#lang_neg"></a>In a multilingual environment it must be possible for the user to receive text in the language they prefer. This may depend on implicit user preferences based on the user's system or browser setup, or on user settings explicitly negotiated with the user. -- GitHub Notification of comment by r12a Please view or discuss this issue at https://github.com/w3c/bp-i18n-specdev/issues/22#issuecomment-299849301 using your GitHub account
Received on Monday, 8 May 2017 12:07:15 UTC