- From: Jeremy Carroll <jeremy@topquadrant.com>
- Date: Tue, 23 Aug 2011 09:18:44 -0700
- To: public-rdf-wg@w3.org
- Message-ID: <4E53D2E4.2050907@topquadrant.com>
On 8/19/2011 6:34 PM, Zhe Wu wrote:
> I don't see how adding UTF8 encoding can make N-TRIPLES much more useful.
Dear Zhe
The simple answer is that several groups of experts on making the
Internet work worldwide have considered the general problem for many
years and have come up with an answer that almost everyone seems happy
enough with.
Please have your manager and your AC (Advisory Committee) representative read
http://www.w3.org/TR/charmod/#sec-Background
and RFC 2277.
_*Charmod*_
The choice of Unicode was motivated by the fact that Unicode:
* is the only universal character repertoire available,
* provides a way of referencing characters independent of the encoding
of the text,
* is being updated/completed carefully,
* is widely accepted and implemented by industry.
Characters outside the US-ASCII [ISO/IEC 646] [MIME-charset] repertoire
are being used in more and more places.
With the international Internet follows an absolute requirement to
interchange data in a multiplicity of languages, which in turn utilize a
bewildering number of characters.
_*RFC 2277*_
Internationalization is for humans. This means that protocols are not
subject to internationalization; text strings are. Where protocol
elements look like text tokens, such as in many IETF application layer
protocols, protocols MUST specify which parts are protocol and which are
text. [WR 2.2.1.1] Names are a problem, because people feel strongly
about them, many of them are mostly for local usage, and all of them
tend to leak out of the local context at times. RFC 1958 [RFC 1958]
recommends US-ASCII for all globally visible names. This document does
not mandate a policy on name internationalization, but requires that all
protocols describe whether names are internationalized or US-ASCII.
***
Jeremy's note: in RDF, names are explicitly IRIs, i.e. internationalized.
_*RFC 2277*_
Protocols MUST be able to use the UTF-8 charset
****
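To make the practical difference concrete, here is a minimal Python sketch (not part of the original message) contrasting the 2004 N-Triples convention, where, as we understand the RDF Test Cases definition, a document was 7-bit US-ASCII and every other character had to be written as a `\uXXXX` or `\UXXXXXXXX` escape, with a UTF-8 serialization in which the character appears directly. The helper names `ntriples_ascii_literal` and `ntriples_utf8_literal` are hypothetical, and only a subset of the escaping rules is handled.

```python
# Hypothetical escape table: the few ASCII characters that need a
# backslash escape inside an N-Triples literal (subset only).
SIMPLE_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def ntriples_ascii_literal(s: str) -> str:
    """Quote s as an N-Triples literal using only 7-bit US-ASCII.

    Assumption: printable ASCII (0x20-0x7E) is written as-is; everything
    else becomes \\uXXXX (BMP) or \\UXXXXXXXX (above U+FFFF).
    """
    out = ['"']
    for ch in s:
        cp = ord(ch)
        if ch in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[ch])
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            out.append("\\U%08X" % cp)
    out.append('"')
    return "".join(out)


def ntriples_utf8_literal(s: str) -> bytes:
    """Quote s with non-ASCII characters written directly, as UTF-8 bytes.

    Sketch only: other control characters are not escaped here.
    """
    body = "".join(SIMPLE_ESCAPES.get(ch, ch) for ch in s)
    return ('"' + body + '"').encode("utf-8")


if __name__ == "__main__":
    name = "Zürich"
    print(ntriples_ascii_literal(name))  # "Z\u00FCrich"
    print(ntriples_utf8_literal(name))   # b'"Z\xc3\xbcrich"'
```

Both forms carry the same characters; the UTF-8 form simply lets a human read the literal without decoding escapes, which is the RFC 2277 point that internationalization is for humans.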
Zhe, I currently believe Oracle is threatening a formal objection if this
WG follows the practice mandated by IETF and W3C policy documents.
Is that the intent?
Jeremy
Received on Tuesday, 23 August 2011 16:19:03 UTC