- From: Graham Klyne <GK@NineByNine.org>
- Date: Thu, 19 Jul 2001 20:43:03 +0100
- To: Dave Beckett <dave.beckett@bristol.ac.uk>
- Cc: RDF core WG <w3c-rdfcore-wg@w3.org>
At 04:38 PM 7/19/01 +0100, Dave Beckett wrote:
>I've managed to update the document
> http://www.w3.org/2001/sw/RDFCore/ntriples/
>with the notation changes you previously described, and with the
>issues you have brought up, along with some solutions.
A nit: you still have:
eoln ::= cr? lf
Given that (a) email converts line endings to CRLF in transit, and back to
local conventions on receipt, (b) HTTP does not touch EOL sequences, and
(c) systems exist that use CRLF (PCs), bare LF (Un*x) and bare CR (Macs, I
believe), I think any of these may appear, however the document is created.
Thus (contrary to my earlier comments) I'd suggest:
eoln ::= cr lf | cr | lf
(Aaron: am I right about the Mac?)
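To make the difference between the two productions concrete, here is a
minimal sketch in Python. It assumes nothing beyond the two grammar rules
quoted above; the names EOLN_OLD, EOLN_NEW and split_lines are hypothetical
and do not come from the N-Triples document.

```python
import re

# eoln ::= cr? lf            (the production currently in the document)
EOLN_OLD = re.compile(r"\r?\n")

# eoln ::= cr lf | cr | lf   (the suggested production)
# CRLF is listed first so that it is consumed as one line break, not two.
EOLN_NEW = re.compile(r"\r\n|\r|\n")


def split_lines(text, eoln=EOLN_NEW):
    # Split text into lines using the given end-of-line pattern.
    return eoln.split(text)


# One line break in each of the three conventions: CRLF, bare CR, bare LF.
sample = "a\r\nb\rc\nd"
new_result = split_lines(sample)            # every break is recognised
old_result = split_lines(sample, EOLN_OLD)  # the bare CR is not a break
```

With the suggested production all three breaks are recognised, whereas
"cr? lf" leaves the bare CR embedded in the middle of a line.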
> >>>Graham Klyne said:
><snip/>
> > If you want to stick with just US-ASCII in an N-triples file then I won't
> > fight it, but my own feeling is that it would be easier to just
> > say: always use UTF-8 encoding. That seems fairly future-proof.
>
>I don't mind saying N-Triples is UTF-8 since I've got code around to
>do that and it comes for free with Java and Python for example.
>However it just moves the escaping to a different level and makes it
>impossible for anyone to generate unicode characters with plain text
>(ASCII that is) editors.
If you want to have a single encoded representation for a string, that's true.
I guess I don't mind either!
>Dave said:
> > >How about just one escape \UXXXXXXXX for all chars not made available
> > >by \-escapes or used in-situ - that seems more appealing for this
> > >little syntax.
>Graham said:
> > Well, that could work too.
>
>Yes, but is it better than my other suggestions?
>I've listed all the suggestions in the updated doc.
OK, my vote is for your option 1.
My reasons, FWIW, are:
(a) it gives a common representation for any string, and makes it possible
to create N-triples with non-UTF-8 tools. (The common encoded
representation isn't so important to me, but the requirement has been
expressed...)
(b) the higher 32-bit code-points seem very rare, and 4 leading zeros
are a lot of overhead for a feature that is required very occasionally,
if ever.
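The overhead in point (b) can be illustrated with a small Python sketch. It
assumes the single \UXXXXXXXX form quoted above (eight hex digits for every
escaped character), not the exact syntax of any of the options in the
document. The name escape_long is hypothetical, and the sketch leaves all
ASCII characters untouched, so it ignores the other \-escapes that a real
writer would need.

```python
def escape_long(s):
    # Replace each non-ASCII character with \UXXXXXXXX (8 hex digits).
    # Assumption: ASCII passes through unchanged; quotes, backslashes and
    # control characters are deliberately not handled in this sketch.
    out = []
    for ch in s:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        else:
            out.append("\\U%08X" % cp)
    return "".join(out)


# U+00E9 fits in two hex digits, yet the escape spends ten characters on it.
escaped = escape_long("caf\u00e9")
```

Here a single accented letter becomes \U000000E9, of which six hex digits
are leading zeros.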
> > > > 5. eoln format
><snip/>
>Graham said:
> > I suppose, then, we must go back to allowing CRLF, LF or CR as a line
> > break, to be compatible with anything that can be served via HTTP.
>
>which was actually where I started in the first version, Doh!
Yes, I acknowledge that. That was before I'd taken on board the fact that
HTTP breaks the MIME text handling model that I'm most familiar
with. Sorry for the roundabout (but I learned something, so it wasn't a
total waste ;-).
#g
------------
Graham Klyne
(GK@ACM.ORG)
Received on Thursday, 19 July 2001 15:52:37 UTC