- From: Richard Cyganiak <richard@cyganiak.de>
- Date: Thu, 25 Aug 2011 14:53:44 +0100
- To: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
- Cc: "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>
Paraphrasing:

US-ASCII
- media type: text/plain
- extension: .nt

UTF-8
- media type: text/n-triples;charset=utf-8
- extension: .nt8

And add some text:

[[
Some N-Triples parsers only support the US-ASCII subset of N-Triples. N-Triples serializers that intend to maximize compatibility with such parsers SHOULD restrict output to the US-ASCII subset.

To maximize forward compatibility, N-Triples parsers SHOULD accept UTF-8 encoded content even if it is served with a text/plain media type or with a .nt file extension.
]]

I think that would be great.

Best,
Richard

On 24 Aug 2011, at 11:57, Pierre-Antoine Champin wrote:

> Just a thought on that matter,
>
> IIRC, N-Triples is supposed to be declared as "text/plain". As I see it,
> what we intend to standardize (whether it is still called N-Triples or
> not) will have its own mimetype, right?
>
> So there *should* be no interoperability problem: parsers provided with
> "text/plain" will continue to interpret it as old-style ASCII-encoded
> N-Triples, while they can expect UTF-8 when provided with
> "text/whatever-we-call-it". The spec could include something like:
>
>   The content-type for N-Triples is "text/n-triples". For legacy reasons
>   [ref], some N-Triples content may be delivered with the "text/plain"
>   content-type; however, in that case, the delivered content MUST/SHOULD
>   only use the ASCII subset of UTF-8.
>
> In order to avoid interoperability problems with software that would
> rely on file extension rather than explicit content-type, we can
> recommend using the ".nt8" extension for UTF-8 N-Triples (as suggested
> by Steve), and reserve the legacy extension ".nt" for pure ASCII.
>
> So it seems to me that Zhe's concerns about interoperability are a bit
> overestimated.
>
>   pa
>
> On 08/23/2011 06:18 PM, Jeremy Carroll wrote:
>> On 8/19/2011 6:34 PM, Zhe Wu wrote:
>>> I don't see how adding UTF8 encoding can make N-TRIPLES much more useful.
>>
>> Dear Zhe
>>
>> The simple answer is that several groups of experts on making the
>> internet work world wide have considered the general problem for many
>> years and come up with an answer that almost everyone seems happy enough
>> with.
>>
>> Please have your manager and your AC rep read
>> http://www.w3.org/TR/charmod/#sec-Background
>>
>> and RFC 2277
>>
>>
>> _*Charmod*_
>> The choice of Unicode was motivated by the fact that Unicode:
>>
>>     * is the only universal character repertoire available,
>>     * provides a way of referencing characters independent of the
>>       encoding of the text,
>>     * is being updated/completed carefully,
>>     * is widely accepted and implemented by industry.
>>
>> Characters outside the US-ASCII [ISO/IEC 646]
>> <http://www.w3.org/TR/charmod/#iso646>[MIME-charset]
>> <http://www.w3.org/TR/charmod/#MIME-charset> repertoire are being used
>> in more and more places.
>>
>> With the international Internet follows an absolute requirement to
>> interchange data in a multiplicity of languages, which in turn utilize a
>> bewildering number of characters.
>>
>> _*RFC 2277*_
>>
>> Internationalization is for humans. This means that protocols are not
>> subject to internationalization; text strings are. Where protocol
>> elements look like text tokens, such as in many IETF application layer
>> protocols, protocols MUST specify which parts are protocol and which are
>> text. [WR 2.2.1.1] Names are a problem, because people feel strongly
>> about them, many of them are mostly for local usage, and all of them
>> tend to leak out of the local context at times. RFC 1958 [RFC 1958]
>> recommends US-ASCII for all globally visible names. This document does
>> not mandate a policy on name internationalization, but requires that all
>> protocols describe whether names are internationalized or US-ASCII.
>>
>> ***
>>
>> Jeremy's note: in RDF the names are explicitly IRIs i.e. internationalized.
>>
>> _*RFC 2277*_
>> Protocols MUST be able to use the UTF-8 charset
>>
>> ****
>>
>> Zhe - I currently believe Oracle is threatening a formal objection if this
>> WG follows mandated practice from IETF and W3C policy documents.
>> Is this the intent?
>>
>> Jeremy
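
For illustration only, the labelling scheme proposed at the top of this message (text/plain and .nt for US-ASCII, text/n-triples;charset=utf-8 and .nt8 for UTF-8) could be sketched roughly as below. All function names are hypothetical and belong to no existing library. The escaping helper assumes the \uXXXX and \UXXXXXXXX escape forms of classic N-Triples and, as a simplification, is applied to a whole line rather than only to literals and URI references.

```python
from typing import Tuple


def is_us_ascii(data: bytes) -> bool:
    """True if every byte is in the US-ASCII range."""
    return all(b < 0x80 for b in data)


def choose_label(data: bytes) -> Tuple[str, str]:
    """Pick (media type, file extension) under the proposed scheme."""
    if is_us_ascii(data):
        # Legacy labelling, readable by ASCII-only parsers.
        return ("text/plain", ".nt")
    return ("text/n-triples;charset=utf-8", ".nt8")


def decode_leniently(data: bytes) -> str:
    """Parser side: accept UTF-8 whatever the label says.

    US-ASCII is a subset of UTF-8, so one decoder covers both cases.
    """
    return data.decode("utf-8")


def escape_to_ascii(text: str) -> str:
    """Serializer side: restrict output to the US-ASCII subset.

    Assumption: non-ASCII characters are written as \\uXXXX, or as
    \\UXXXXXXXX for code points above U+FFFF.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            out.append("\\U%08X" % cp)
    return "".join(out)
```

A serializer that wants the widest compatibility would call escape_to_ascii before writing and label the result as text/plain; one that writes raw UTF-8 would use the second label returned by choose_label.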
Received on Thursday, 25 August 2011 13:54:17 UTC