Re: Oracle's stand regarding N-TRIPLES from Pierre-Antoine Champin on 2011-08-24 (public-rdf-wg@w3.org from August 2011)

From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
Date: Wed, 24 Aug 2011 12:57:47 +0200
To: "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>
Message-ID: <4E54D92B.6000406@liris.cnrs.fr>
Just a thought on that matter,

IIRC, N-Triples is supposed to be served as "text/plain". As I see it,
what we intend to standardize (whether it is still called N-Triples or
not) will have its own MIME type, right?

So there *should* be no interoperability problem: parsers provided with
"text/plain" will continue to interpret it as old-style ASCII-encoded
N-Triples, while they can expect UTF-8 when provided with
"text/whatever-we-call-it". The spec could include something like:

  The content-type for N-Triples is "text/n-triples". For legacy reasons
  [ref], some N-Triples content may be delivered with the "text/plain"
  content-type; however, in that case, the delivered content MUST/SHOULD
  only use the ASCII subset of UTF-8.

In order to avoid interoperability problems with software that relies
on the file extension rather than an explicit content-type, we can
recommend using the ".nt8" extension for UTF-8 N-Triples (as suggested
by Steve), and reserve the legacy extension ".nt" for pure ASCII.
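
To make the dispatch described above concrete, here is a rough sketch in
Python. It is only an illustration under assumptions: "text/n-triples" and
".nt8" are the values floated in this thread, not registered ones, and the
function names (pick_encoding, decode_ntriples) are made up for the example.

```python
import os

# Hypothetical values taken from the proposal in this thread;
# neither the media type nor the ".nt8" extension is registered.
NEW_TYPE = "text/n-triples"
LEGACY_TYPE = "text/plain"


def pick_encoding(content_type=None, filename=None):
    """Return the charset to decode N-Triples with, or None if unknown."""
    if content_type:
        # Drop parameters such as "; charset=..." and normalise case.
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type == NEW_TYPE:
            return "utf-8"
        if media_type == LEGACY_TYPE:
            return "ascii"
    if filename:
        # Fall back to the file extension when no usable type is given.
        ext = os.path.splitext(filename)[1].lower()
        if ext == ".nt8":
            return "utf-8"
        if ext == ".nt":
            return "ascii"
    return None


def decode_ntriples(data, content_type=None, filename=None):
    """Decode raw bytes, rejecting non-ASCII bytes in legacy content."""
    encoding = pick_encoding(content_type, filename)
    if encoding is None:
        raise ValueError("cannot determine N-Triples flavour")
    # bytes.decode("ascii") raises UnicodeDecodeError on any byte >= 0x80,
    # which is how the "ASCII only" rule for legacy content is enforced here.
    return data.decode(encoding)
```

With this, a legacy consumer that only ever sees "text/plain" or ".nt"
keeps getting pure ASCII, and anything carrying raw UTF-8 is rejected
unless it is labelled with the new type or extension.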

So it seems to me that Zhe's concerns about interoperability are a bit
overstated.

  pa

On 08/23/2011 06:18 PM, Jeremy Carroll wrote:
> On 8/19/2011 6:34 PM, Zhe Wu wrote:
>> I don't see how adding UTF8 encoding can make N-TRIPLES much more useful.
> 
> Dear Zhe
> 
> The simple answer is that several groups of experts on making the
> internet work world wide have considered the general problem for many
> years and come up with an answer that almost everyone seems happy enough
> with.
> 
> Please have your manager and your AC rep read
> http://www.w3.org/TR/charmod/#sec-Background
> 
> and RFC 2277
> 
> 
> _*Charmod*_
> The choice of Unicode was motivated by the fact that Unicode:
> 
>     * is the only universal character repertoire available,
>     * provides a way of referencing characters independent of the
>       encoding of the text,
>     * is being updated/completed carefully,
>     * is widely accepted and implemented by industry.
> 
> Characters outside the US-ASCII [ISO/IEC 646]
> <http://www.w3.org/TR/charmod/#iso646>[MIME-charset]
> <http://www.w3.org/TR/charmod/#MIME-charset> repertoire are being used
> in more and more places.
> 
> With the international Internet follows an absolute requirement to
> interchange data in a multiplicity of languages, which in turn utilize a
> bewildering number of characters.
> 
> _*RFC 2277*_
> 
> Internationalization is for humans. This means that protocols are not
> subject to internationalization; text strings are. Where protocol
> elements look like text tokens, such as in many IETF application layer
> protocols, protocols MUST specify which parts are protocol and which are
> text. [WR 2.2.1.1] Names are a problem, because people feel strongly
> about them, many of them are mostly for local usage, and all of them
> tend to leak out of the local context at times. RFC 1958 [RFC 1958]
> recommends US-ASCII for all globally visible names. This document does
> not mandate a policy on name internationalization, but requires that all
> protocols describe whether names are internationalized or US-ASCII.  
> 
> ***
> 
> Jeremy's note: in RDF the names are explicitly IRIs i.e. internationalized.
> 
> _*RFC 2277*_
> Protocols MUST be able to use the UTF-8 charset
> 
> 
> ****
> 
> 
> 
> Zhe - I currently believe Oracle is threatening a formal objection if this
> WG follows mandated practice from IETF and W3C policy documents.
> Is this the intent?
> 
> Jeremy
> 
> 
> 
> 
>
Received on Wednesday, 24 August 2011 12:04:50 UTC