Fwd: Comments on Considerations in draft-hollenbeck-ietf-xml-guide.txt

Misha, Workshop participants,

I've commented on draft-hollenbeck-ietf-xml-guide.txt, written by Scott
Hollenbeck, Marshall Rose, and Larry Masinter. This memo deprecates the
use of any encoding other than UTF-8 for use with XML, which I think is
inconsistent with rfc2277. It also deprecates mechanisms for language
indication that do not rely upon iso639 or iso3166, and as neither of
my two non-European languages have 639 codes, and neither of my non-State
polities have 3166 codes, I must "non-hum". It also continues an abuse of
language that confuses "i18n" with Unicode (or encodings generally), and
fails to state the collation issue, which arises when objects are "named"
using strings, and names are matched, searched, or sorted.

The text of the memo is available at:
http://www.imc.org/ietf-xml-use/draft-hollenbeck-ietf-xml-guide.{html,txt}

Comments to me, or this list, or to that list, if so inclined. I'll get
them either way.

Eric
------- Forwarded Message

Message-Id: <200204111713.g3BHD4X74227@nic-naa.net>
To: ietf-xml-use@imc.org
Subject: Comments on Section 5
Date: Thu, 11 Apr 2002 13:13:04 -0400
From: Eric Brunner-Williams in Portland Maine <brunner@nic-naa.net>
Sender: owner-ietf-xml-use@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-use/mail-archive/>
List-Unsubscribe: <mailto:ietf-xml-use-request@imc.org?body=unsubscribe>
List-ID: <ietf-xml-use.imc.org>



5. Internationalization Considerations

   This section describes internationalization considerations for the
   use of XML to represent data in IETF protocols.  Readers should be
   familiar with IETF policy on the use of character sets and languages
   as described in RFC 2277 [3].

Suggestion:

   This section describes character set and language attribute declarations
   available to authors of protocols using XML, and the text directionality
   attribute declarations available using XHTML.

   Readers are encouraged to be familiar with RFC 2277, which requires
   that protocols MUST identify which charset is used, suggests that
   protocols contain a mechanism for charset negotiation, and additionally
   requires that UTF-8 support MUST be possible.

   RFC 2277 also requires that protocols that transfer text MUST provide
   a mechanism capable of carrying information about the language of that
   text, suggests that protocols contain mechanisms for language naming
   and language negotiation, and additionally requires a default value
   for language, which MUST be understandable by an English-speaking
   person.

   This section does not describe considerations for the use of locales
   in XML to represent character properties, such as collation orderings,
   word breaking or formats for dates, numbers, or currency.

[Meta-Comment: I doubt the wisdom of leaving locales out of IETF i18n
 boiler-plate, and it's my experience that most IETF contributors who
 encounter i18n casually read 2277 as an issue-free license cum requirement
 to use Unicode.]
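
A minimal sketch, in Python, of why choosing Unicode does not by itself
settle matching or sorting of names. The sample strings and the helper
name same_name are assumptions made for illustration; locale-aware
collation is deliberately not shown, since it depends on which locales
are installed.

```python
import unicodedata

# Two spellings of the same visible name: precomposed U+00E9 versus
# "e" followed by combining acute accent U+0301.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

def same_name(a, b):
    """Hypothetical matcher: compare names after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Sorting by code point puts uppercase before lowercase and accented
# letters after "z", which is rarely what a reader of any language expects.
names = ["zebra", "Zulu", "\u00e9clair", "apple"]
by_code_point = sorted(names)
```

Here composed == decomposed is False while same_name(composed, decomposed)
is True, and by_code_point places "Zulu" first and the accented name last.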

5.1 Character Sets

   XML provides native support for encoding information using the
   Unicode character set and its more compact representations including
   UTF-8 [4] and UTF-16 [26].  Other encodings are also supported and
   can be specified using an "encoding" attribute in a document's XML
   declaration.  It is strongly recommended that UTF-8 be mandated for
   protocols that represent data using XML.

Suggestion:

   ...
   UTF-8 [4] and UTF-16 [26].  Other encodings are also supported and
   may be specified using the encoding pseudo-attribute in the xml
   declaration at the start of a document or the text declaration at the
   start of an entity.

   Examples:
   <?xml version="1.0" encoding='iso-8859-1' ?> 
   <?xml version="1.0" encoding='iso-8859-2' ?> 
   ...
   <?xml version="1.0" encoding='iso-2022-JP' ?>
   <?xml version="1.0" encoding='Shift-JIS' ?>
   <?xml version="1.0" encoding='EUC-JP' ?>
   ...
   <?xml version="1.0" encoding='i-mingo' ?>

[Comment: Even if I agreed with the last sentence of the original paragraph,
 real examples are a good thing.]
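
A minimal sketch, in Python, of a receiver honoring the encoding
pseudo-attribute rather than assuming UTF-8. The helper name
declared_encoding and the sample documents are assumptions made for
illustration; the regular expression only handles ASCII-compatible
encodings and does no byte-order-mark or UTF-16 detection.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper: extract the encoding pseudo-attribute from an
# XML declaration at the very start of the byte stream.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def declared_encoding(data):
    """Return the declared encoding name, or None if none is declared."""
    match = _DECL.match(data)
    return match.group(1).decode("ascii") if match else None

# The same content serialized in two different encodings.
latin1_doc = (
    '<?xml version="1.0" encoding="iso-8859-1"?><name>caf\u00e9</name>'
).encode("iso-8859-1")
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><name>caf\u00e9</name>'
).encode("utf-8")

# The parser reads the declaration itself and decodes accordingly.
latin1_name = ET.fromstring(latin1_doc).text
utf8_name = ET.fromstring(utf8_doc).text
```

Both documents parse to the same text, so a protocol that permits the
declaration can interoperate with a legacy sender without either side
guessing the charset.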

   Guidelines for the use of XML declarations can be found in Section
   4.1.

[Comment: I don't see the import of this back-reference. How does sec. 4.1 
 provide guidelines for use, and meaningfully for charsets?]

   ...  If an XML declaration is omitted, it is strongly urged to
   require use of a consistent character set, and to require UTF-8 as
   the most appropriate character set.  If an XML declaration is
   allowed, it is again strongly urged to require use of a consistent
   character set, to require UTF-8 as the most appropriate character
   set, and to recommend inclusion of an "encoding" attribute that
   explicitly notes use of UTF-8 encoding.

Suggestion:

   ... and to require UTF-8 as the most appropriate character set, if
   it is in fact the most appropriate character set. ...

[Comment: The original text is over-reaching. Either it is repeating, and
 removing the conditional applicability from, 2277, or it is promoting a
 universal structured data over protocol-specific data.

 Now the W3C can discard the encoding pseudo-attribute, and mandate UTF-8,
 in XML, but that's their business. Ours is interoperability, legacy systems
 included. This shouldn't seem to be an end-run on the non-UTF-8 bits of 2277
 and 3066. Remove the UTF-8 theocracy, and retain secular data exchange. Thx.]
 
5.2 Language Declaration

[Comment: The reference to http://www.w3.org/TR/2000/REC-xml-20001006, sec.
 2.12, refers to rfc1766, which in turn refers to iso639 and iso3166.
 
 The substitution of International Treaty Organization normative references
 and Nation State identifiers for the purpose of interoperable text concerns
 me. My concerns of course may be misplaced, I'm just an ignorant Indian.
 I suggest that we invite more comments, e.g.,

	To:	 golla@ssila.org <SSILA LIST>,
		 endangered-languages-l@cleo.murdoch.edu.au
	Subject: Request for Comments: Proposed IETF Guidelines for Language
		 Identifiers in protocols using XML 

	Body
		The IETF is considering a proposal for the authors of protocols
		using XML to limit the identifiers for human languages to the
		set defined in iso639-1 [1] and iso3166 [2].

		This would have the effect of discouraging the development of
		internet protocols which use XML for structured data exchange,
		and which use identifiers not defined in these references. 

		For reference, using North America as an example

		The set of Indigenous Languages of North America for which
		an iso639-1 identifier exists is:
		ik (inupiaq), iu (inuktitut), nv (navajo), qu (quechua)

		The set of additional identifiers for which a value exists
		in iso3166 for North America is:
		gl (greenland), ca (canada), us (united states of america),
		mx (united states of mexico)

		Comments, particularly those that describe use cases for which
		the above is unsuitable, and any alternatives, should be sent
		to the ietf-xml-use@imc.org mailing list. To join the list,
		send a message to "ietf-xml-use-request@imc.org" with the
		word "subscribe" in the body of the message. There is a web
		site for the list archives at

			http://www.imc.org/ietf-xml-use/.

		[1] http://linux.infoterm.org/infoterm-e/i-infoterm.htm?raiso639-1_start.htm~Mitte
		[2] http://www.din.de/gremien/nas/nabd/iso3166ma/codlstp1/en_listp1.html

	EOT

	SSILA == Society for the Study of the Indigenous Languages of the
	Americas.

 Both scholarly and affected community input would be useful. The alternative
 is an inapplicability statement, something of the form "the use of minority,
 rare, infrequently taught, endangered, or extinct languages is deprecated in
 IETF standards-track protocols that use XML for the delivery of structured
 text."
]

Suggestion:

   language used to represent data in an XML document.  The xml:lang
   attribute is defined in section 2.12 of [8], and has values
   defined in [ISO 639]. [Add to References]

   It is strongly recommended that protocols representing data in a
   human language use an xml:lang attribute if the XML instance
   might be interpreted in language-dependent contexts, and if the
   language identifier is defined in [ISO 639].
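
A minimal sketch, in Python, of how a consumer might read xml:lang and
apply it to descendant elements, which is how the attribute is scoped in
XML. The function name text_with_language, the element names, and the
sample text are assumptions made for illustration; tail text after child
elements is ignored for brevity.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def text_with_language(element, inherited=None):
    """Yield (language, text) pairs; xml:lang applies to descendants
    until an inner element overrides it."""
    lang = element.get(XML_LANG, inherited)
    if element.text and element.text.strip():
        yield lang, element.text.strip()
    for child in element:
        yield from text_with_language(child, lang)

doc = ET.fromstring(
    '<greeting xml:lang="en">'
    '<p>first sample</p>'
    '<p xml:lang="iu">second sample</p>'
    '</greeting>'
)
pairs = list(text_with_language(doc))
```

The first paragraph inherits "en" from its parent and the second carries
its own tag. Text with no xml:lang in scope is reported with a language
of None, which is the case a protocol would still need a default for.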



Meta-Nit: 2277, sec 6, lines 325 - 328, suggests that "Internationalization
considerations" be placed next to the Security Considerations section.

Suggestion: Reorder sections 6 and 7.

Eric

------- End of Forwarded Message

Received on Thursday, 11 April 2002 23:52:01 UTC