Re: N-triples (1.4) from Satoshi Nakamura on 2001-07-19 (w3c-rdfcore-wg@w3.org from July 2001)

From: Satoshi Nakamura <snakamura@infoteria.co.jp>
Date: Thu, 19 Jul 2001 12:01:12 -0400
To: Graham Klyne <GK@NineByNine.org>
Cc: RDF core WG <w3c-rdfcore-wg@w3.org>
Message-Id: <53620010719112907snakamura@infoteria.co.jp>

Hi,

At 18 Jul 2001 21:49:06 +0100 Graham Klyne wrote:
> At 08:26 PM 7/18/01 +0100, Dave Beckett wrote:
> > > 3. Outstanding issue 1
> > >
> > > N-Triples is a text/plain MIME type format - consider character and
> > > encoding issues with requirement to be able to express all Unicode chars.
> > >
> > > That much is easy, I think.  Use:
> > >    Content-type: text/plain;charset=utf-8
> >
> >That out-of-band information cannot be picked up by the parser just
> >reading the bytes.  I would prefer the format to be self-contained if
> >possible, not depending on charset, so all unicode chars can be
> >handled inside US-ASCII.  Asking N-Triples parsers to have to add an
> >entire UTF-8 decoding step seems rather an large step, when \u
> >etc. below could do the work when required.
> 
> OK, I accept the "out of band" point.
> 
> Assuming that the only place where non-ASCII characters can appear is in 
> string literals, that might work.  Then I think we'd also need to be 
> careful about requiring non-USASCII characters to always be escaped in 
> string literals so that the higher Unicode code points don't appear 
> anywhere in the N-triples source code.
> 
> But then there's the internationalized URIs and domain name work that's 
> waiting in the wings, so I don't suppose that approach would last for ever.
> 
> If you want to stick with just US-ASCII in an N-triples file then I won't 
> fight it, but my own feeling is that it would be easier to just 
> say:  always use UTF-8 encoding.  That seems fairly future-proof.

If every character other than US-ASCII in N-Triples is escaped, the text
cannot be read in an ordinary viewer or editor. That may be a problem
for people who use a language other than English.
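
As a rough illustration of the readability cost, here is a minimal Python
sketch. It assumes the \uXXXX / \UXXXXXXXX escape forms under discussion
in this thread; the function name escape_non_ascii is hypothetical and
not part of any N-Triples draft.

```python
def escape_non_ascii(text: str) -> str:
    """Replace each character outside US-ASCII with a \\u or \\U escape.

    Assumption: 4 hex digits for code points up to U+FFFF, 8 hex digits
    above that. ASCII characters are passed through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                # plain US-ASCII stays readable
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)    # Basic Multilingual Plane
        else:
            out.append("\\U%08X" % cp)    # supplementary planes
    return "".join(out)


# A three-character Japanese literal becomes 18 ASCII characters.
print(escape_non_ascii("日本語"))  # \u65E5\u672C\u8A9E
```

The escaped form is pure US-ASCII, but a person cannot read the literal
without decoding it first.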

However, there are some problems with the UTF-8 approach as well.

1. Is it possible to use an encoding other than UTF-8?

For example, if N-Triples uses a text/* content type, some servers or
gateways may perform content negotiation and convert the charset. What
happens if the server converts the encoding to something other than UTF-8
and replaces the charset parameter?

2. What happens if N-Triples are transferred with just 'Content-Type:
text/plain'?

The MIME specification says that text without a charset parameter must be
treated as US-ASCII, while HTTP says ISO-8859-1. Is that an error?
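
A small Python sketch of the ambiguity, assuming a receiver that falls
back to the defaults named above. The function effective_charset and its
transport argument are hypothetical names used only for illustration.

```python
from email.message import Message

# Defaults as stated above: MIME falls back to US-ASCII, HTTP to ISO-8859-1.
DEFAULT_CHARSET = {"mime": "us-ascii", "http": "iso-8859-1"}


def effective_charset(content_type: str, transport: str) -> str:
    """Return the charset a receiver would assume for this header."""
    msg = Message()
    msg["Content-Type"] = content_type
    declared = msg.get_content_charset()  # None when no charset parameter
    return declared if declared else DEFAULT_CHARSET[transport]


print(effective_charset("text/plain;charset=utf-8", "http"))  # utf-8
print(effective_charset("text/plain", "mime"))                # us-ascii
print(effective_charset("text/plain", "http"))                # iso-8859-1

# UTF-8 bytes read under the HTTP default decode without error,
# but to the wrong characters.
print("é".encode("utf-8").decode("iso-8859-1"))               # Ã©
```

Under the MIME default the same bytes would fail to decode at all, since
they fall outside US-ASCII, so the two transports disagree on whether the
document is broken or merely garbled.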

RDF avoids these problems by using XML as a container, since XML settles
the encoding question itself. So it may be pointless to discuss
transferring N-Triples at all. In that sense, escaping all characters
other than US-ASCII and handling N-Triples as US-ASCII text seems like a
good idea.

Satoshi

---
Satoshi Nakamura <snakamura@infoteria.co.jp>
Infoteria Corporation

Received on Thursday, 19 July 2001 12:01:55 UTC