- From: Graham Klyne <GK@NineByNine.org>
- Date: Thu, 19 Jul 2001 20:43:03 +0100
- To: Dave Beckett <dave.beckett@bristol.ac.uk>
- Cc: RDF core WG <w3c-rdfcore-wg@w3.org>
At 04:38 PM 7/19/01 +0100, Dave Beckett wrote:
>I've managed to update the document
>   http://www.w3.org/2001/sw/RDFCore/ntriples/
>with the notation changes you previously described, and with the
>issues you have brought up, along with some solutions.

A nit: you still have:

   eoln ::= cr? lf

Given that
(a) email converts to CRLF in transit, and back to local conventions on receipt,
(b) HTTP does not touch EOL sequences,
(c) systems exist that use CR/LF (PCs), bare LF (Un*x), bare CR (Macs, I believe),
I think any of these may appear, however the document is created. Thus (contrary to my earlier comments) I'd suggest:

   eoln ::= cr lf | cr | lf

(Aaron: am I right about the Mac?)

>
>>>Graham Klyne said:
><snip/>
> > If you want to stick with just US-ASCII in an N-triples file then I won't
> > fight it, but my own feeling is that it would be easier to just
> > say: always use UTF-8 encoding. That seems fairly future-proof.
>
>I don't mind saying N-Triples is UTF-8 since I've got code around to
>do that and it comes for free with Java and Python for example.
>However it just moves the escaping to a different level and makes it
>impossible for anyone to generate unicode characters with plain text
>(ASCII that is) editors.

If you want to have a single encoded representation for a string, that's true. I guess I don't mind either!

>Dave said:
> > >How about just one escape \UXXXXXXXX for all chars not made available
> > >by \-escapes or used in-situ - that seems more appealing for this
> > >little syntax.
>Graham said:
> > Well, that could work too.
>
>Yes, but is it better than my other suggestions?
>I've listed all the suggestions in the updated doc.

OK, my vote is for your option 1. My reasons, FWIW, are:
(a) to achieve a common representation for any string, and to make it possible to create N-triples with non-UTF-8 tools. (The common encoded representation isn't so important to me, but the requirement has been expressed...)
(b) that the higher 32-bit code-points seem very rare, and 4 leading zeros is a lot of overhead for a feature that is required very occasionally, if ever.

> > > > 5. eoln format
><snip/>
>Graham said:
> > I suppose, then, we must go back to allowing CRLF, LF or CR as a line
> > break, to be compatible with anything that can be served via HTTP.
>
>which was actually where I started in the first version, Doh!

Yes, I acknowledge that. That was before I'd taken on board the fact that HTTP breaks the MIME text handling model that I'm most familiar with. Sorry for the roundabout (but I learned something, so it wasn't a total waste ;-).

#g

------------
Graham Klyne
(GK@ACM.ORG)
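
For illustration only, here is a minimal Python sketch of how a reader might honour the suggested production eoln ::= cr lf | cr | lf when splitting a document into lines. The names EOLN and split_lines are hypothetical and are not taken from the N-Triples document or from any existing parser. The one design point worth noting is that the CRLF alternative is tried first, so a CR followed by LF counts as a single line end rather than two.

```python
import re

# Mirrors the suggested production: eoln ::= cr lf | cr | lf
# CRLF is listed first so that it is consumed as one terminator, not two.
EOLN = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Split text into lines on CRLF, bare CR, or bare LF (hypothetical helper)."""
    lines = EOLN.split(text)
    # A terminator at the very end leaves one empty trailing piece; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


if __name__ == "__main__":
    # One input mixing all three conventions (PC, old Mac, Unix).
    for line in split_lines("first\r\nsecond\rthird\nfourth"):
        print(repr(line))
```

By contrast, a splitter built on the earlier production (cr? lf) would treat the bare CR in the sample input as ordinary line content, so "second" and "third" would come back as one line.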
Received on Thursday, 19 July 2001 15:52:37 UTC