- From: Satoshi Nakamura <snakamura@infoteria.co.jp>
- Date: Thu, 19 Jul 2001 12:01:12 -0400
- To: Graham Klyne <GK@NineByNine.org>
- Cc: RDF core WG <w3c-rdfcore-wg@w3.org>
Hi,

At 18 Jul 2001 21:49:06 +0100 Graham Klyne wrote:
> At 08:26 PM 7/18/01 +0100, Dave Beckett wrote:
> > > 3. Outstanding issue 1
> > >
> > > N-Triples is a text/plain MIME type format - consider character and
> > > encoding issues with requirement to be able to express all Unicode chars.
> > >
> > > That much is easy, I think. Use:
> > > Content-type: text/plain;charset=utf-8
> >
> >That out-of-band information cannot be picked up by the parser just
> >reading the bytes. I would prefer the format to be self-contained if
> >possible, not depending on charset, so all unicode chars can be
> >handled inside US-ASCII. Asking N-Triples parsers to have to add an
> >entire UTF-8 decoding step seems rather an large step, when \u
> >etc. below could do the work when required.
>
> OK, I accept the "out of band" point.
>
> Assuming that the only place where non-ASCII characters can appear is in
> string literals, that might work. Then I think we'd also need to be
> careful about requiring non-USASCII characters to always be escaped in
> string literals so that the higher Unicode code points don't appear
> anywhere in the N-triples source code.
>
> But then there's the internationalized URIs and domain name work that's
> waiting in the wings, so I don't suppose that approach would last for ever.
>
> If you want to stick with just US-ASCII in an N-triples file then I won't
> fight it, but my own feeling is that it would be easier to just
> say: always use UTF-8 encoding. That seems fairly future-proof.

If every character outside US-ASCII in N-Triples is escaped, the file
cannot be read in an ordinary viewer or editor. That may be a problem
for people who use a language other than English.

However, there are some problems with the UTF-8 approach.

1. Is it possible to use an encoding other than UTF-8?
   For example, some servers or gateways may perform content negotiation
   and convert the charset if N-Triples uses a text/* content type.
   What happens if the server converts the encoding to something other
   than UTF-8 and replaces the charset parameter?

2. What happens if N-Triples are transferred using
   'Content-Type: text/plain'?
   The MIME specification says that if there is no charset parameter,
   the content must be treated as US-ASCII, and HTTP says ISO-8859-1.
   Is it an error?

RDF avoids these problems by using XML as a container. So it may be
pointless to discuss transferring N-Triples. In that sense, escaping
all characters other than US-ASCII and handling N-Triples as US-ASCII
text seems a good idea.

Satoshi
---
Satoshi Nakamura <snakamura@infoteria.co.jp>
Infoteria Corporation
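
As an illustration of the "escape everything outside US-ASCII" option discussed above, here is a minimal Python sketch. The thread only mentions "\u etc."; the exact escape forms used here (\uXXXX for code points up to U+FFFF, \UXXXXXXXX above that, and \\ for a literal backslash) are assumptions, and the function names are hypothetical.

```python
import re


def escape_non_ascii(text: str) -> str:
    """Return text using only US-ASCII characters.

    Assumed escape forms (not confirmed by the thread):
    \\uXXXX up to U+FFFF, \\UXXXXXXXX above, and a doubled backslash
    for a literal backslash so that unescaping is unambiguous.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            out.append("\\U%08X" % cp)
    return "".join(out)


# Matches a doubled backslash, a \uXXXX escape, or a \UXXXXXXXX escape.
_ESCAPE = re.compile(r"\\\\|\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape(text: str) -> str:
    """Reverse escape_non_ascii."""

    def replace(match: "re.Match[str]") -> str:
        if match.group(0) == "\\\\":
            return "\\"
        return chr(int(match.group(1) or match.group(2), 16))

    return _ESCAPE.sub(replace, text)


if __name__ == "__main__":
    literal = "日本語 text"
    escaped = escape_non_ascii(literal)
    # The escaped form can be stored or sent as plain US-ASCII bytes.
    print(escaped.encode("ascii"))
    print(unescape(escaped) == literal)
```

The escaped form survives a missing or rewritten charset parameter because it contains only US-ASCII bytes, but, as noted above, it is not readable in an ordinary editor without unescaping it first.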
Received on Thursday, 19 July 2001 12:01:55 UTC