- From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
- Date: Wed, 24 Aug 2011 12:57:47 +0200
- To: "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>
Just a thought on that matter: IIRC, N-Triples is supposed to be declared as "text/plain". As I see it, what we intend to standardize (whether it is still called N-Triples or not) will have its own MIME type, right? So there *should* be no interoperability problem. Parsers given "text/plain" will continue to interpret the content as old-style, ASCII-encoded N-Triples, while they can expect UTF-8 when given "text/whatever-we-call-it".

The spec could include something like:

  The content type for N-Triples is "text/n-triples". For legacy reasons [ref], some N-Triples content may be delivered with the "text/plain" content type; in that case, however, the delivered content MUST/SHOULD use only the ASCII subset of UTF-8.

To avoid interoperability problems with software that relies on the file extension rather than an explicit content type, we could recommend the ".nt8" extension for UTF-8 N-Triples (as suggested by Steve) and reserve the legacy ".nt" extension for pure ASCII.

So it seems to me that Zhe's concerns about interoperability are somewhat overestimated.

  pa

On 08/23/2011 06:18 PM, Jeremy Carroll wrote:
> On 8/19/2011 6:34 PM, Zhe Wu wrote:
>> I don't see how adding UTF8 encoding can make N-TRIPLES much more useful.
>
> Dear Zhe
>
> The simple answer is that several groups of experts on making the
> internet work world wide have considered the general problem for many
> years and come up with an answer that almost everyone seems happy enough
> with.
>
> Please have your manager and your AC rep read
> http://www.w3.org/TR/charmod/#sec-Background
>
> and RFC 2277
>
>
> _*Charmod*_
> The choice of Unicode was motivated by the fact that Unicode:
>
>  * is the only universal character repertoire available,
>  * provides a way of referencing characters independent of the
>    encoding of the text,
>  * is being updated/completed carefully,
>  * is widely accepted and implemented by industry.
>
> Characters outside the US-ASCII [ISO/IEC 646][MIME-charset]
> repertoire are being used in more and more places.
>
> With the international Internet follows an absolute requirement to
> interchange data in a multiplicity of languages, which in turn utilize a
> bewildering number of characters.
>
> _*RFC 2277*_
>
> Internationalization is for humans. This means that protocols are not
> subject to internationalization; text strings are. Where protocol
> elements look like text tokens, such as in many IETF application layer
> protocols, protocols MUST specify which parts are protocol and which are
> text. [WR 2.2.1.1] Names are a problem, because people feel strongly
> about them, many of them are mostly for local usage, and all of them
> tend to leak out of the local context at times. RFC 1958 [RFC 1958]
> recommends US-ASCII for all globally visible names. This document does
> not mandate a policy on name internationalization, but requires that all
> protocols describe whether names are internationalized or US-ASCII.
>
> ***
>
> Jeremy's note: in RDF the names are explicitly IRIs i.e. internationalized.
>
> _*RFC 2277*_
> Protocols MUST be able to use the UTF-8 charset
>
> ****
>
> Zhe - I currently believe Oracle is threatening a formal objection if this
> WG follows mandated practice from IETF and W3C policy documents.
> Is this the intent?
>
> Jeremy
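For illustration, the content-type and file-extension rule proposed at the top of this message can be sketched as a small decoding helper. This is a minimal Python sketch, not part of any spec. The function names `choose_encoding` and `decode_ntriples` are hypothetical, "text/n-triples" is only the name proposed in this message, and two choices are assumptions: any "charset" parameter is ignored, and UTF-8 is the fallback when neither signal is present.

```python
from typing import Optional

# Hypothetical mapping of the proposal: the new media type means UTF-8,
# while legacy "text/plain" means ASCII-only N-Triples.
UTF8_TYPES = {"text/n-triples"}
LEGACY_TYPES = {"text/plain"}


def choose_encoding(content_type: Optional[str], filename: Optional[str] = None) -> str:
    """Return the codec name to use when decoding an N-Triples document."""
    # An explicit content type takes precedence over the file extension.
    if content_type:
        # Strip parameters such as "; charset=..." (assumption: ignored) and normalise case.
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type in UTF8_TYPES:
            return "utf-8"
        if media_type in LEGACY_TYPES:
            return "ascii"
    # Fall back to the extension: ".nt8" for UTF-8, legacy ".nt" for pure ASCII.
    if filename:
        name = filename.lower()
        if name.endswith(".nt8"):
            return "utf-8"
        if name.endswith(".nt"):
            return "ascii"
    # Assumption: default to UTF-8, since ASCII is a subset of it.
    return "utf-8"


def decode_ntriples(data: bytes, content_type: Optional[str] = None,
                    filename: Optional[str] = None) -> str:
    """Decode raw bytes strictly, so non-ASCII legacy content raises UnicodeDecodeError."""
    return data.decode(choose_encoding(content_type, filename))
```

Because the ASCII decoder is strict, a "text/plain" payload that violates the proposed MUST/SHOULD is rejected rather than silently misread. A lenient implementation might instead retry as UTF-8.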
Received on Wednesday, 24 August 2011 12:04:50 UTC