Re: Oracle's stand regarding N-TRIPLES from Richard Cyganiak on 2011-08-25 (public-rdf-wg@w3.org from August 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 25 Aug 2011 14:53:44 +0100
To: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
Cc: "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>
Message-Id: <094B132A-82A6-484A-A7CF-D7228A81D07C@cyganiak.de>
Paraphrasing:

US-ASCII
- media type: text/plain
- extension: .nt

UTF-8
- media type: text/n-triples;charset=utf-8
- extension: .nt8

And add some text:

[[
Some N-Triples parsers only support the US-ASCII subset of N-Triples. N-Triples serializers that intend to maximize compatibility with such parsers SHOULD restrict output to the US-ASCII subset.

To maximize forward compatibility, N-Triples parsers SHOULD accept UTF-8 encoded content even if it is served with a text/plain media type or with a .nt file extension.
]]

I think that would be great.
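
The mapping and the two SHOULDs above can be sketched roughly as follows. This is only an illustration, not proposed spec text: the function names are hypothetical, and it assumes the \uXXXX / \UXXXXXXXX escapes of the existing N-Triples grammar are what a serializer would use to stay inside the US-ASCII subset.

```python
# Hypothetical helpers; only the media types and file extensions
# are taken from the proposal above.

ASCII_LABEL = ("text/plain", ".nt")
UTF8_LABEL = ("text/n-triples;charset=utf-8", ".nt8")


def label_for(document: str):
    """Pick the (media type, extension) pair that matches the content."""
    return ASCII_LABEL if document.isascii() else UTF8_LABEL


def escape_to_ascii(document: str) -> str:
    """Serializer side: restrict output to the US-ASCII subset.

    Assumption: every non-ASCII character may be written with the
    \\uXXXX or \\UXXXXXXXX escape of the existing N-Triples grammar.
    """
    out = []
    for ch in document:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                # plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")    # Basic Multilingual Plane
        else:
            out.append(f"\\U{cp:08X}")    # supplementary planes
    return "".join(out)


def decode_lenient(data: bytes) -> str:
    """Parser side: accept UTF-8 regardless of the declared label.

    US-ASCII is a subset of UTF-8, so one decoder covers content
    served as text/plain or .nt as well as the new media type.
    """
    return data.decode("utf-8")
```

A serializer that cares about old parsers would call escape_to_ascii() and publish the result as text/plain or .nt; one that does not would write UTF-8 directly and use the label returned by label_for().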

Best,
Richard



On 24 Aug 2011, at 11:57, Pierre-Antoine Champin wrote:
> Just a thought on that matter,
> 
> IIRC, N-Triples is supposed to be declared as "text/plain". As I see it,
> what we intend to standardize (whether it is still called N-Triples or
> not) will have its own mimetype, right?
> 
> So there *should* be no interoperability problem: parsers provided with
> "text/plain" will continue to interpret it as old-style ASCII-encoded
> N-Triples, while they can expect UTF-8 when provided with
> "text/whatever-we-call-it". The spec could include something like:
> 
>  The content-type for N-Triples is "text/n-triples". For legacy reasons
>  [ref], some N-Triples content may be delivered with the "text/plain"
>  content-type; however, in that case, the delivered content MUST/SHOULD
>  only use the ASCII subset of UTF-8.
> 
> In order to avoid interoperability problems with software that relies
> on file extension rather than explicit content-type, we can recommend
> using the ".nt8" extension for UTF-8 N-Triples (as suggested by Steve),
> and reserve the legacy extension ".nt" for pure ASCII.
> 
> So it seems to me that Zhe's concerns about interoperability are a bit
> overstated.
> 
>  pa
> 
> On 08/23/2011 06:18 PM, Jeremy Carroll wrote:
>> On 8/19/2011 6:34 PM, Zhe Wu wrote:
>>> I don't see how adding UTF8 encoding can make N-TRIPLES much more useful.
>> 
>> Dear Zhe
>> 
>> The simple answer is that several groups of experts on making the
>> internet work world wide have considered the general problem for many
>> years and come up with an answer that almost everyone seems happy enough
>> with.
>> 
>> Please have your manager and your AC rep read
>> http://www.w3.org/TR/charmod/#sec-Background
>> 
>> and RFC 2277
>> 
>> 
>> _*Charmod*_
>> The choice of Unicode was motivated by the fact that Unicode:
>> 
>>    * is the only universal character repertoire available,
>>    * provides a way of referencing characters independent of the
>>      encoding of the text,
>>    * is being updated/completed carefully,
>>    * is widely accepted and implemented by industry.
>> 
>> Characters outside the US-ASCII [ISO/IEC 646]
>> <http://www.w3.org/TR/charmod/#iso646> [MIME-charset]
>> <http://www.w3.org/TR/charmod/#MIME-charset> repertoire are being used
>> in more and more places.
>> 
>> With the international Internet follows an absolute requirement to
>> interchange data in a multiplicity of languages, which in turn utilize a
>> bewildering number of characters.
>> 
>> _*RFC 2277*_
>> 
>> Internationalization is for humans. This means that protocols are not
>> subject to internationalization; text strings are. Where protocol
>> elements look like text tokens, such as in many IETF application layer
>> protocols, protocols MUST specify which parts are protocol and which are
>> text. [WR 2.2.1.1] Names are a problem, because people feel strongly
>> about them, many of them are mostly for local usage, and all of them
>> tend to leak out of the local context at times. RFC 1958 [RFC 1958]
>> recommends US-ASCII for all globally visible names. This document does
>> not mandate a policy on name internationalization, but requires that all
>> protocols describe whether names are internationalized or US-ASCII.  
>> 
>> ***
>> 
>> Jeremy's note: in RDF the names are explicitly IRIs i.e. internationalized.
>> 
>> _*RFC 2277*_
>> Protocols MUST be able to use the UTF-8 charset
>> 
>> 
>> ****
>> 
>> 
>> 
>> Zhe - I currently believe Oracle is threatening a formal objection if this
>> WG follows mandated practice from IETF and W3C policy documents.
>> Is this the intent?
>> 
>> Jeremy
>> 
>> 
>> 
>> 
>> 
> 
>
Received on Thursday, 25 August 2011 13:54:17 UTC