Re: Oracle's stand regarding N-TRIPLES

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 25 Aug 2011 14:53:44 +0100
Cc: "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>
Message-Id: <094B132A-82A6-484A-A7CF-D7228A81D07C@cyganiak.de>
To: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>

- media type: text/plain
- extension: .nt

- media type: text/n-triples;charset=utf-8
- extension: .nt8

And add some text:

Some N-Triples parsers only support the US-ASCII subset of N-Triples. N-Triples serializers that intend to maximize compatibility with such parsers SHOULD restrict output to the US-ASCII subset.

To maximize forward compatibility, N-Triples parsers SHOULD accept UTF-8 encoded content even if it is served with a text/plain media type or with a .nt file extension.

I think that would be great.
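
A minimal sketch of how the two recommendations above might look in code, assuming Python. The function names (`escape_to_ascii`, `choose_labels`, `decode_ntriples`) are hypothetical and not taken from any existing library, and the input line is assumed to be otherwise valid, already-escaped N-Triples; only non-ASCII characters are handled here.

```python
def escape_to_ascii(line):
    """Rewrite non-ASCII characters as \\uXXXX or \\UXXXXXXXX escapes,
    so that the result stays within the US-ASCII subset."""
    out = []
    for ch in line:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                 # plain ASCII, keep as-is
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)     # 4-digit escape
        else:
            out.append("\\U%08X" % cp)     # 8-digit escape
    return "".join(out)


def choose_labels(content):
    """Pick (media type, extension) following the proposal above:
    legacy labels for pure ASCII, new labels for UTF-8 content."""
    if content.isascii():
        return ("text/plain", ".nt")
    return ("text/n-triples;charset=utf-8", ".nt8")


def decode_ntriples(data, media_type):
    """Lenient parser side: decode as UTF-8 whatever the media type,
    since US-ASCII is a strict subset of UTF-8. The media_type
    argument is deliberately ignored in this sketch."""
    return data.decode("utf-8")
```

A serializer aiming for maximum compatibility would call `escape_to_ascii` before writing and could then keep the legacy labels; a parser following the second recommendation would call `decode_ntriples` without looking at the media type or the file extension.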


On 24 Aug 2011, at 11:57, Pierre-Antoine Champin wrote:
> Just a thought on that matter,
> IIRC, N-Triples is supposed to be declared as "text/plain". As I see it,
> what we intend to standardize (whether it is still called N-Triples or
> not) will have its own mimetype, right?
> So there *should* be no interoperability problem: parsers provided with
> "text/plain" will continue to interpret it as old-style ASCII-encoded
> N-Triples, while they can expect UTF-8 when provided with
> "text/whatever-we-call-it". The spec could include something like:
>  The content-type for N-Triples is "text/n-triples". For legacy reasons
>  [ref], some N-Triples content may be delivered with the "text/plain"
>  content-type; however, in that case, the delivered content MUST/SHOULD
>  only use the ASCII subset of UTF-8.
> In order to avoid interoperability problems with software that would
> rely on the file extension rather than an explicit content-type, we can
> recommend using the ".nt8" extension for UTF-8 N-Triples (as suggested
> by Steve), and reserve the legacy extension ".nt" for pure ASCII.
> So it seems to me that Zhe's concerns about interoperability are a bit
> overstated.
>  pa
> On 08/23/2011 06:18 PM, Jeremy Carroll wrote:
>> On 8/19/2011 6:34 PM, Zhe Wu wrote:
>>> I don't see how adding UTF8 encoding can make N-TRIPLES much more useful.
>> Dear Zhe
>> The simple answer is that several groups of experts on making the
>> internet work world wide have considered the general problem for many
>> years and come up with an answer that almost everyone seems happy enough
>> with.
>> Please have your manager and your AC rep read
>> http://www.w3.org/TR/charmod/#sec-Background
>> and RFC 2277
>> _*Charmod*_
>> The choice of Unicode was motivated by the fact that Unicode:
>>    * is the only universal character repertoire available,
>>    * provides a way of referencing characters independent of the
>>      encoding of the text,
>>    * is being updated/completed carefully,
>>    * is widely accepted and implemented by industry.
>> Characters outside the US-ASCII [ISO/IEC 646]
>> <http://www.w3.org/TR/charmod/#iso646>[MIME-charset]
>> <http://www.w3.org/TR/charmod/#MIME-charset> repertoire are being used
>> in more and more places.
>> With the international Internet follows an absolute requirement to
>> interchange data in a multiplicity of languages, which in turn utilize a
>> bewildering number of characters.
>> _*RFC 2277*_
>> Internationalization is for humans. This means that protocols are not
>> subject to internationalization; text strings are. Where protocol
>> elements look like text tokens, such as in many IETF application layer
>> protocols, protocols MUST specify which parts are protocol and which are
>> text. [WR] Names are a problem, because people feel strongly
>> about them, many of them are mostly for local usage, and all of them
>> tend to leak out of the local context at times. RFC 1958 [RFC 1958]
>> recommends US-ASCII for all globally visible names. This document does
>> not mandate a policy on name internationalization, but requires that all
>> protocols describe whether names are internationalized or US-ASCII.  
>> ***
>> Jeremy's note: in RDF the names are explicitly IRIs i.e. internationalized.
>> _*RFC 2277*_
>> Protocols MUST be able to use the UTF-8 charset
>> ****
>> Zhe - I currently believe Oracle is threatening a formal objection if this
>> WG follows mandated practice from IETF and W3C policy documents.
>> Is this the intent?
>> Jeremy
Received on Thursday, 25 August 2011 13:54:17 UTC
