- From: Jeremy Carroll <jeremy@topquadrant.com>
- Date: Tue, 23 Aug 2011 09:18:44 -0700
- To: public-rdf-wg@w3.org
- Message-ID: <4E53D2E4.2050907@topquadrant.com>
On 8/19/2011 6:34 PM, Zhe Wu wrote:
> I don't see how adding UTF8 encoding can make N-TRIPLES much more useful.

Dear Zhe,

The simple answer is that several groups of experts on making the internet work worldwide have considered the general problem for many years and come up with an answer that almost everyone seems happy enough with.

Please have your manager and your AC rep read http://www.w3.org/TR/charmod/#sec-Background and RFC 2277.

_*Charmod*_

The choice of Unicode was motivated by the fact that Unicode:

  * is the only universal character repertoire available,
  * provides a way of referencing characters independent of the encoding of the text,
  * is being updated/completed carefully,
  * is widely accepted and implemented by industry.

Characters outside the US-ASCII [ISO/IEC 646][MIME-charset] repertoire are being used in more and more places.

With the international Internet follows an absolute requirement to interchange data in a multiplicity of languages, which in turn utilize a bewildering number of characters.

_*RFC 2277*_

Internationalization is for humans. This means that protocols are not subject to internationalization; text strings are. Where protocol elements look like text tokens, such as in many IETF application layer protocols, protocols MUST specify which parts are protocol and which are text. [WR 2.2.1.1]

Names are a problem, because people feel strongly about them, many of them are mostly for local usage, and all of them tend to leak out of the local context at times. RFC 1958 [RFC 1958] recommends US-ASCII for all globally visible names.

This document does not mandate a policy on name internationalization, but requires that all protocols describe whether names are internationalized or US-ASCII.

*** Jeremy's note: in RDF the names are explicitly IRIs, i.e., internationalized.
_*RFC 2277*_

Protocols MUST be able to use the UTF-8 charset

****

Zhe, I currently believe Oracle is threatening a formal objection if this WG follows mandated practice from IETF and W3C policy documents. Is this the intent?

Jeremy
Received on Tuesday, 23 August 2011 16:19:03 UTC